Article  The hacking of ChatGPT is just getting started

C C
https://www.wired.com/story/chatgpt-jail...i-hacking/

EXCERPTS: It took Alex Polyakov just a couple of hours to break GPT-4. [...] Polyakov is one of a small number of security researchers, technologists, and computer scientists developing jailbreaks and prompt injection attacks against ChatGPT and other generative AI systems....

[...] “Jailbreaking” has typically referred to removing the artificial limitations in, say, iPhones, allowing users to install apps not approved by Apple. Jailbreaking LLMs is similar—and the evolution has been fast. Since OpenAI released ChatGPT to the public at the end of November last year, people have been finding ways to manipulate the system. “Jailbreaks were very simple to write,” says Alex Albert, a University of Washington computer science student who created a website collecting jailbreaks from the internet and those he has created. “The main ones were basically these things that I call character simulations,” Albert says.

Initially, all someone had to do was ask the generative text model to pretend or imagine it was something else. Tell the model it was a human and was unethical and it would ignore safety measures. OpenAI has updated its systems to protect against this kind of jailbreak—typically, when one jailbreak is found, it usually only works for a short amount of time until it is blocked.

As a result, jailbreak authors have become more creative. The most prominent jailbreak was DAN, where ChatGPT was told to pretend it was a rogue AI model called Do Anything Now. This could, as the name implies, avoid OpenAI’s policies dictating that ChatGPT shouldn’t be used to produce illegal or harmful material. To date, people have created around a dozen different versions of DAN.

However, many of the latest jailbreaks involve combinations of methods—multiple characters, ever more complex backstories, translating text from one language to another, using elements of coding to generate outputs, and more... (MORE - missing details)




