Article  Chatbot software begins to face fundamental limitations

#1
C C
https://www.quantamagazine.org/chatbot-s...-20250131/

EXCERPT: . . . Einstein’s riddle requires composing a larger solution from solutions to subproblems, which researchers call a compositional task. Dziri’s team showed that LLMs that have only been trained to predict the next word in a sequence — which is most of them — are fundamentally limited in their ability to solve compositional reasoning tasks.

Other researchers have shown that transformers, the neural network architecture used by most LLMs, have hard mathematical bounds when it comes to solving such problems. Scientists have had some successes pushing transformers past these limits, but those increasingly look like short-term fixes. If so, it means there are fundamental computational caps on the abilities of these forms of artificial intelligence — which may mean it’s time to consider other approaches.

“The work is really motivated to help the community make this decision about whether transformers are really the architecture we want to embrace for universal learning,” said Andrew Wilson, a machine learning expert at New York University who was not involved with this study.

Ironically, LLMs have only themselves to blame for this discovery of one of their limits. “The reason why we all got curious about whether they do real reasoning is because of their amazing capabilities,” Dziri said. They dazzled on tasks involving natural language, despite the seeming simplicity of their training. During the training phase, an LLM is shown a fragment of a sentence with the last word obscured (though technically it isn’t always a single word). The model predicts the missing information and then “learns” from its mistakes.
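The "hide the last word, predict it" setup described above can be illustrated with a toy sketch. This is only an assumption-laden stand-in: it counts which word follows which (a bigram table) instead of training a neural network, so nothing here "learns from its mistakes" the way an LLM does. The corpus and the function names `train_bigram` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which word follows each word in the training sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, fragment):
    """Guess the obscured next word from the last visible word only."""
    words = fragment.lower().split()
    if not words or words[-1] not in counts:
        return None  # nothing seen after this word during training
    return counts[words[-1]].most_common(1)[0][0]

# Hypothetical three-sentence training corpus.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigram(corpus)
print(predict_next(model, "the dog sat on the"))
```

Because the toy model only looks at the single preceding word, it picks whatever most often followed "the" in the corpus, regardless of the rest of the fragment. Real LLMs condition on far more context, but the training target (the next token) is the same kind of thing.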

The largest LLMs — OpenAI’s o1 and GPT-4, Google’s Gemini, Anthropic’s Claude — train on almost all the available data on the internet. As a result, the LLMs end up learning the syntax of, and much of the semantic knowledge in, written language. Such “pre-trained” models can be further trained, or fine-tuned, to complete sophisticated tasks far beyond simple sentence completion, such as summarizing a complex document or generating code to play a computer game. The results were so powerful that the models seemed, at times, capable of reasoning.

Yet they also failed in ways both obvious and surprising. “On certain tasks, they perform amazingly well,” Dziri said. “On others, they’re shockingly stupid.”

[Photo caption: Nouha Dziri and her team helped show the difficulty current AI systems have with certain kinds of reasoning tasks.]

Take basic multiplication. Standard LLMs, such as ChatGPT and GPT-4, fail badly at it. In early 2023, when Dziri’s team asked GPT-4 to multiply two three-digit numbers, it initially succeeded only 59% of the time. When it multiplied two four-digit numbers, accuracy fell to just 4%... (MORE - missing details)
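For anyone curious how an accuracy figure like the ones quoted above might be measured, here is a minimal sketch of an exact-match test harness. It is not the study's actual protocol: the `solver` argument is a hypothetical stand-in for a call to a model, and the two example solvers, the trial count, and the seed are all assumptions chosen for illustration. It does not reproduce the 59% or 4% results.

```python
import random

def multiplication_accuracy(solver, digits, trials=200, seed=0):
    """Fraction of random n-digit multiplication problems answered exactly."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    correct = 0
    for _ in range(trials):
        a, b = rng.randint(low, high), rng.randint(low, high)
        if solver(a, b) == a * b:  # exact match only, no partial credit
            correct += 1
    return correct / trials

def exact_solver(a, b):
    """Reference solver: ordinary integer multiplication."""
    return a * b

def off_by_one_solver(a, b):
    """Hypothetical faulty solver whose answer is always off by one."""
    return a * b + 1

print(multiplication_accuracy(exact_solver, digits=3))
print(multiplication_accuracy(off_by_one_solver, digits=4))
```

To test a real chatbot, `solver` would have to wrap a prompt, an API call, and parsing of the text reply into an integer, none of which is shown here.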

RELATED (scivillage): The brain holds no exclusive rights on how to create intelligence

