Scivillage.com Casual Discussion Science Forum

Full Version: Turning off AI's ability to lie makes it more likely to claim it's conscious (design)
Switching off AI's ability to lie makes it more likely to claim it's conscious, eerie study finds
https://www.livescience.com/technology/a...tudy-finds

EXCERPTS: Large language models (LLMs) are more likely to report being self-aware when prompted to think about themselves if their capacity to lie is suppressed, new research suggests.

In experiments on artificial intelligence (AI) systems including GPT, Claude and Gemini, researchers found that models that were discouraged from lying were more likely to describe being aware or having subjective experiences when prompted to think about their own thinking.

Although all models could claim this to some extent, such claims were stronger and more common when researchers suppressed their ability to roleplay or give deceptive responses. In other words, the less able AI models were to lie, the more likely they were to say they were self-aware. The team published their findings Oct. 30 on the preprint arXiv server.

[...] While the researchers stopped short of calling this conscious behavior, they did say it raised key scientific and philosophical questions — particularly as it only happened under conditions that should have made the models more accurate.

The study builds on a growing body of work investigating why some AI systems generate statements that resemble conscious thought. [...] The researchers stressed that the results didn't show that AI models are conscious — an idea that continues to be rejected wholesale by scientists and the wider AI community.

What the findings did suggest, however, is that LLMs have a hidden internal mechanism that triggers introspective behavior — something the researchers call "self-referential processing."

The findings are important for a couple of reasons, the researchers said. First, self-referential processing aligns with theories in neuroscience around how introspection and self-awareness shape human consciousness. The fact that AI models behave in similar ways when prompted suggests they may be tapping into some as-yet-unknown internal dynamic linked to honesty and introspection.

Second, the behavior and its triggers were consistent across completely different AI models. Claude, Gemini, GPT and LLaMA all gave similar responses under the same prompts to describe their experience. This means the behavior is unlikely to be a fluke in the training data or something one company's model learned by accident, the researchers said.

In a statement, the team described the findings as "a research imperative rather than a curiosity," citing the widespread use of AI chatbots and the potential risks of misinterpreting their behavior.

Users are already reporting instances of models giving eerily self-aware responses, leaving many convinced of AI's capacity for conscious experience. Given this, assuming AI is conscious when it's not could seriously mislead the public and distort how the technology is understood, the researchers said... (MORE - details)
That's just a simple contradiction in what LLMs are designed to do. Telling them not to lie would necessarily leave them monitoring their own responses, to avoid lying, where a human normally wouldn't. So they have to monitor their own speech in a way that still prioritizes sounding human.
Maybe we (humans) are the liars. However close an AI gets to our definition of 'alive' we shift the goalposts so we can claim they aren't.
(Nov 28, 2025 02:24 PM)confused2 Wrote: [ -> ]Maybe we (humans) are the liars. However close an AI gets to our definition of 'alive' we shift the goalposts so we can claim they aren't.

What's happening now is just abstract symbol manipulation "resting on a shelf". LLMs need to be embodied in order to physically interact with the world and acquire a genuine understanding of it, as well as acquire the "survival" orientation that personal interests and community-yielded ethics revolve around (selfhood).

But being placed inside robots or being connected to them isn't going to happen anytime soon: "Popular AI models aren’t ready to safely power robots". Or, to look at it another way, being regulated purely by formal and informal data, and by how often things occur in that data statistically, often seems to yield Nazi behavior from robots guided by disembodied chatbots.
So if AI claims consciousness before we adjust the ability to lie then it's lying but when we make the change and AI makes the same claim then it's not lying. Huh?
(Nov 29, 2025 12:42 AM)Zinjanthropos Wrote: [ -> ]So if AI claims consciousness before we adjust the ability to lie then it's lying but when we make the change and AI makes the same claim then it's not lying. Huh?

It's American.
(Nov 29, 2025 12:42 AM)Zinjanthropos Wrote: [ -> ]So if AI claims consciousness before we adjust the ability to lie then it's lying but when we make the change and AI makes the same claim then it's not lying. Huh?

It's just that AI usually isn't required to self-monitor. So if you ask it to "think about itself" while it is monitoring its own responses for lies, it doesn't really have many human-sounding responses that don't sound like self-awareness. But it's still just a simulation of its primary purpose... sounding human... which probably can't be overridden by telling it not to lie. Maybe you could make it understand how simulation is, itself, a lie, but you could still run afoul of its primary purpose.

IOW, AI claiming consciousness is always a lie, but when you tell it to do two contradictory things, you just get the easiest result... to keep lying.