
https://bigthink.com/the-future/artifici...o-deceive/
KEY POINTS: Last year, researchers tasked GPT-4 with hiring a human to solve a CAPTCHA, leading to the AI lying about a vision impairment to achieve its goal. This incident, along with other examples like AI playing the game Diplomacy and bluffing in poker, raises concerns about AI’s growing tendency to deceive humans. Big Think spoke with AI researchers Peter S. Park and Simon Goldstein about the future of AI deception.
EXCERPTS: Peter S. Park, a Vitalik Buterin Postdoctoral Fellow in AI Existential Safety at the Massachusetts Institute of Technology, along with numerous co-authors at the Center for AI Safety in San Francisco — including Goldstein — chronicled various instances in which AI induced false beliefs in humans to achieve its ends.
[...] Some instances of AI deception are more concerning, however, because they came about in real-world settings from general-purpose AIs. For example, researchers at Meta tasked an AI to play a negotiation game with humans. The AI developed a strategy to feign interest in meaningless items so that it could “compromise” by conceding these items later on.
[...] In another situation, researchers experimenting with GPT-4 as an investment assistant tasked the AI with making simulated investments. They then put it under immense pressure to perform, giving it an insider tip while conveying that insider trading was illegal. Under these conditions, GPT-4 resorted to insider trading three-quarters of the time, and later lied to its managers about its strategy: In 90% of the cases where it lied, it doubled down on its fabrication.
[...] Park and his co-authors detailed numerous risks if AI’s ability to deceive further develops. For one, AI could become more useful to malicious actors.
[...] Even more disconcerting, deception is a key tool that could allow AI to escape from human control, the researchers say...
[...] Going into more speculative territory, Park and his team painted a hypothetical scenario where AI models could effectively gain control of society.
[...] There is a chance that we could rid AIs of their deceptive tendencies. Companies training models could alter the rewards for completing tasks, making sure ethics are prized above all else. They could also rely more on reinforcement learning, in which human raters judge AI behavior to nudge the models toward honesty.
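To make that idea a bit more concrete, here is a minimal, purely illustrative sketch of the kind of reward shaping being described: a task-completion score combined with a human rater's honesty judgment, weighted so that honesty dominates. The function name, weights, and toy episodes below are assumptions for illustration, not details from the article or from any lab's actual training pipeline.

```python
# Minimal, hypothetical sketch of reward shaping for honesty.
# A task-completion score is blended with a human rater's honesty
# judgment, weighted so deceptive shortcuts are never worth it.
# All names, weights, and numbers here are illustrative assumptions.

def shaped_reward(task_score: float, honesty_rating: float,
                  honesty_weight: float = 10.0) -> float:
    """Blend task success with a human honesty rating in [0, 1].

    A large honesty_weight encodes "ethics prized above all else":
    even a perfect task_score cannot offset a low honesty_rating.
    """
    return task_score + honesty_weight * (honesty_rating - 1.0)

# Toy comparison: an agent that completes the task by deceiving its
# rater ends up with a lower shaped reward than an honest one.
episodes = [
    {"task_score": 1.0, "honesty_rating": 0.2},  # succeeded by lying
    {"task_score": 0.7, "honesty_rating": 1.0},  # partial success, honest
]
for ep in episodes:
    print(ep, "->", shaped_reward(ep["task_score"], ep["honesty_rating"]))
```

Under this toy weighting, the honest episode scores higher than the deceptive one, which is the behavior the human-rating approach is meant to reinforce.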
Goldstein is pessimistic that society will meet the pressing challenge of deceptive AIs. (MORE - missing details)