
https://www.quantamagazine.org/chatbot-s...-20250131/
EXCERPT: . . . Einstein’s riddle requires composing a larger solution from solutions to subproblems, which researchers call a compositional task. Dziri’s team showed that LLMs that have only been trained to predict the next word in a sequence — which is most of them — are fundamentally limited in their ability to solve compositional reasoning tasks.
Other researchers have shown that transformers, the neural network architecture used by most LLMs, have hard mathematical bounds when it comes to solving such problems. Scientists have had some successes pushing transformers past these limits, but those increasingly look like short-term fixes. If so, it means there are fundamental computational caps on the abilities of these forms of artificial intelligence — which may mean it’s time to consider other approaches.
“The work is really motivated to help the community make this decision about whether transformers are really the architecture we want to embrace for universal learning,” said Andrew Wilson, a machine learning expert at New York University who was not involved with this study.
Ironically, LLMs have only themselves to blame for this discovery of one of their limits. “The reason why we all got curious about whether they do real reasoning is because of their amazing capabilities,” Dziri said. They dazzled on tasks involving natural language, despite the seeming simplicity of their training. During the training phase, an LLM is shown a fragment of a sentence with the last word obscured (though technically the unit of prediction is a token, which isn’t always a whole word). The model predicts the missing information and then “learns” from its mistakes.
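That objective is simple to state in code. Below is a minimal, hypothetical sketch of next-token prediction training in PyTorch; the tiny recurrent model, vocabulary size, and random data are stand-ins for illustration only (real LLMs use much larger transformer architectures trained on huge corpora), but the predict-the-obscured-token loss and the learn-from-mistakes update are the same idea.

```python
# Minimal sketch of next-token (next-word) prediction training.
# Toy model and data; real LLMs use transformers and vast corpora,
# but the training objective shown here is the same in spirit.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)      # logits for the next token at each position

model = TinyLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# A "sentence" as token ids: inputs are every token but the last,
# targets are the same sequence shifted by one (the obscured words).
sequence = torch.randint(0, vocab_size, (1, 16))
inputs, targets = sequence[:, :-1], sequence[:, 1:]

for step in range(100):
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                   # the model "learns from its mistakes"
    optimizer.step()
```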
The largest LLMs — OpenAI’s o1 and GPT-4, Google’s Gemini, Anthropic’s Claude — train on almost all the available data on the internet. As a result, the LLMs end up learning the syntax of, and much of the semantic knowledge in, written language. Such “pre-trained” models can be further trained, or fine-tuned, to complete sophisticated tasks far beyond simple sentence completion, such as summarizing a complex document or generating code to play a computer game. The results were so powerful that the models seemed, at times, capable of reasoning.
Yet they also failed in ways both obvious and surprising. “On certain tasks, they perform amazingly well,” Dziri said. “On others, they’re shockingly stupid.”
(Image caption: Nouha Dziri and her team helped show the difficulty current AI systems have with certain kinds of reasoning tasks.)
Take basic multiplication. Standard LLMs, such as ChatGPT and GPT-4, fail badly at it. In early 2023, when Dziri’s team asked GPT-4 to multiply two three-digit numbers, it initially succeeded only 59% of the time. When it multiplied two four-digit numbers, accuracy fell to just 4%... (MORE - missing details)
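To make the kind of measurement being described concrete, here is a hypothetical sketch of such an evaluation: generate random n-digit multiplication problems, ask a model, and score exact matches against real arithmetic. The `ask_model` function is a made-up placeholder for whatever chat or completion API the evaluator uses; this is not the harness Dziri’s team actually ran, only an illustration of the setup behind accuracy figures like 59% and 4%.

```python
# Illustrative sketch: estimate an LLM's accuracy on n-digit multiplication.
# `ask_model` is a hypothetical stand-in for an actual model API call.
import random
import re

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API call here")

def multiplication_accuracy(n_digits: int, trials: int = 100) -> float:
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    correct = 0
    for _ in range(trials):
        a, b = random.randint(lo, hi), random.randint(lo, hi)
        reply = ask_model(f"What is {a} * {b}? Answer with the number only.")
        digits = re.sub(r"[^0-9]", "", reply)   # strip everything but digits
        correct += digits == str(a * b)
    return correct / trials

# e.g. multiplication_accuracy(3) and multiplication_accuracy(4) would
# estimate the three-digit and four-digit success rates quoted above.
```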
RELATED (scivillage): The brain holds no exclusive rights on how to create intelligence