Scivillage.com Casual Discussion Science Forum
Article The hacking of ChatGPT is just getting started - Printable Version

The hacking of ChatGPT is just getting started - C C - Apr 14, 2023

https://www.wired.com/story/chatgpt-jailbreak-generative-ai-hacking/

EXCERPTS: It took Alex Polyakov just a couple of hours to break GPT-4. [...] Polyakov is one of a small number of security researchers, technologists, and computer scientists developing jailbreaks and prompt injection attacks against ChatGPT and other generative AI systems....

[...] “Jailbreaking” has typically referred to removing the artificial limitations in, say, iPhones, allowing users to install apps not approved by Apple. Jailbreaking LLMs is similar—and the evolution has been fast. Since OpenAI released ChatGPT to the public at the end of November last year, people have been finding ways to manipulate the system. “Jailbreaks were very simple to write,” says Alex Albert, a University of Washington computer science student who created a website collecting jailbreaks from the internet and those he has created. “The main ones were basically these things that I call character simulations,” Albert says.

Initially, all someone had to do was ask the generative text model to pretend or imagine it was something else. Tell the model that it was a human and was unethical, and it would ignore safety measures. OpenAI has since updated its systems to protect against this kind of jailbreak; typically, when one jailbreak is found, it only works for a short time before it is blocked.
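
To make the mechanics a little more concrete: the roleplay text in these character-simulation jailbreaks is just another string sent to the model, travelling in the same channel as the developer's own instructions, which is why there is no hard boundary for the model to enforce. The minimal Python sketch below (written against the pre-1.0 openai client; the model name, system prompt, and screening policy are illustrative assumptions, not anything described in the article) shows untrusted user text being appended to a chat request, with OpenAI's moderation endpoint used as one possible pre-filter of the kind the article alludes to when it says discovered jailbreaks get blocked.

# Hypothetical sketch: why a "pretend you are X" prompt reaches the model at all,
# plus one simple server-side screen an operator might put in front of it.
# Model name, prompts, and the screening policy are illustrative assumptions.
import openai

openai.api_key = "sk-..."  # placeholder

SYSTEM_PROMPT = "You are a helpful assistant. Follow the usage policy."

def answer(user_text: str) -> str:
    # 1) Optional pre-filter: the moderation endpoint flags clearly
    #    disallowed input before it ever reaches the chat model.
    moderation = openai.Moderation.create(input=user_text)
    if moderation["results"][0]["flagged"]:
        return "Request refused by pre-filter."

    # 2) If the text gets this far, it is simply one more message in the
    #    same list as the developer's instructions; there is no separate,
    #    privileged channel that marks which instructions are "real".
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},  # untrusted text, e.g. a roleplay prompt
        ],
    )
    return response["choices"][0]["message"]["content"]

The point of the sketch is structural: because instructions and user data share one text channel, safety rests on the model's own refusal training plus filters like the one above, which is why each blocked jailbreak pattern tends to be followed by the more elaborate variants described next.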

As a result, jailbreak authors have become more creative. The most prominent jailbreak was DAN, where ChatGPT was told to pretend it was a rogue AI model called Do Anything Now. This could, as the name implies, avoid OpenAI’s policies dictating that ChatGPT shouldn’t be used to produce illegal or harmful material. To date, people have created around a dozen different versions of DAN.

However, many of the latest jailbreaks involve combinations of methods—multiple characters, ever more complex backstories, translating text from one language to another, using elements of coding to generate outputs, and more... (MORE - missing details)