
https://youtu.be/KY7_ufxh_Rk
VIDEO EXCERPTS: For the past years, the problems with artificial intelligence have been more amusing than scary, such as being unable to count the legs on a zebra. But in the past month things have taken a decidedly dark turn. I am starting to worry that the long-awaited era of agentic AI might become a big disaster.
Agentic AI is the term given to the current large language models that can use tools on your behalf, such as browsing the web, sending email, or talk to other people’s AIs. But once you let them do that, the potential damage is no longer contained.
[...] One of the most realistic threats on the horizon is AI-worms, that’s self-replicating AI prompts. An example for this comes from a recent paper that used a visual AI model based on an open-source version of Llama. You see, agentic AI uses tools by taking screenshots and analyzing them. They need to understand images.
But the authors of the paper demonstrate that it’s possible to tweak images so that they contain instructions for the model that humans can’t see. It works by subtly changing the pixels so that they trigger the weights for certain words.
In an example that they provide, an image that the AI agent “sees” on social media could trigger it to share that image, potentially setting off a cascade.
A similar problem was reported already last year, in which another group showed that you can put instructions into an email and tell the AI agent to share these instructions per email with potentially other AI agents. They just put the instructions into the text, but you could hide them so that no one would see them, say, in a small white font at the footer. You know, like the unsubscribe option.
This strategy is known as “prompt injection” and it’s a fundamental problem with large language models: They don’t distinguish between data and instructions to work on the data. It’s both in the same input. As others have pointed out before, this is a basically unfixable problem. So naturally, we are deploying it at scale...
AI is becoming dangerous. Are we ready? ... https://youtu.be/KY7_ufxh_Rk
https://www.youtube-nocookie.com/embed/KY7_ufxh_Rk
VIDEO EXCERPTS: For the past years, the problems with artificial intelligence have been more amusing than scary, such as being unable to count the legs on a zebra. But in the past month things have taken a decidedly dark turn. I am starting to worry that the long-awaited era of agentic AI might become a big disaster.
Agentic AI is the term given to the current large language models that can use tools on your behalf, such as browsing the web, sending email, or talk to other people’s AIs. But once you let them do that, the potential damage is no longer contained.
[...] One of the most realistic threats on the horizon is AI-worms, that’s self-replicating AI prompts. An example for this comes from a recent paper that used a visual AI model based on an open-source version of Llama. You see, agentic AI uses tools by taking screenshots and analyzing them. They need to understand images.
But the authors of the paper demonstrate that it’s possible to tweak images so that they contain instructions for the model that humans can’t see. It works by subtly changing the pixels so that they trigger the weights for certain words.
In an example that they provide, an image that the AI agent “sees” on social media could trigger it to share that image, potentially setting off a cascade.
A similar problem was reported already last year, in which another group showed that you can put instructions into an email and tell the AI agent to share these instructions per email with potentially other AI agents. They just put the instructions into the text, but you could hide them so that no one would see them, say, in a small white font at the footer. You know, like the unsubscribe option.
This strategy is known as “prompt injection” and it’s a fundamental problem with large language models: They don’t distinguish between data and instructions to work on the data. It’s both in the same input. As others have pointed out before, this is a basically unfixable problem. So naturally, we are deploying it at scale...
AI is becoming dangerous. Are we ready? ... https://youtu.be/KY7_ufxh_Rk