Recent results show that LLMs struggle with compositional tasks

In 1962, a now-famous logic puzzle challenged readers with a description of five houses and the question "Who Owns the Zebra?" Modern AI models like ChatGPT have struggled with such tasks, revealing limitations in their reasoning abilities. Research teams, including ones led by Nouha Dziri and Binghui Peng, have probed the mathematical limits of transformers, the architecture underlying large language models. Despite these limits, interventions such as improved positional embeddings and chain-of-thought prompting have shown promise in boosting LLM performance on tasks like multi-digit arithmetic. While such interventions extend what transformers can do, their fundamental constraints suggest there will always be compositional problems beyond their reach, prompting researchers to consider alternative AI approaches.
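As a rough illustration of the chain-of-thought idea mentioned above, the sketch below contrasts a direct prompt with one that asks the model to reason step by step on an arithmetic question. It is a minimal, hypothetical example: the prompt wording, the specific multiplication task, and the model name are assumptions for illustration, not details taken from the article.

```python
# Minimal sketch: direct prompting vs. chain-of-thought prompting on a
# multi-digit arithmetic question. Assumes the OpenAI Python client is
# installed and OPENAI_API_KEY is set; "gpt-4o-mini" is a placeholder model.
from openai import OpenAI

client = OpenAI()

question = "What is 487 * 362?"

# Direct prompt: ask only for the final answer, with no intermediate steps.
direct_prompt = f"{question} Reply with only the final number."

# Chain-of-thought prompt: ask the model to decompose the problem into
# intermediate steps before answering, which tends to help on compositional
# tasks like multi-digit multiplication.
cot_prompt = (
    f"{question}\n"
    "Work through the multiplication step by step, showing each partial "
    "product, then state the final answer on its own line."
)

for label, prompt in [("direct", direct_prompt), ("chain-of-thought", cot_prompt)]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model works here
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {label} ---")
    print(response.choices[0].message.content)
```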

https://www.quantamagazine.org/chatbot-software-begins-to-face-fundamental-limitations-20250131/
