This paper explores the dynamics of large language model (LLM) agents interacting in a society and evolving social norms across generations. The study focuses on whether cooperation emerges among LLM agents playing an iterated Donor Game, in which donating is costly to the donor but yields a larger benefit to the recipient. Societies of Claude 3.5 Sonnet agents evolve increasing levels of cooperation across generations, while Gemini 1.5 Flash and GPT-4o societies do not; moreover, only Claude 3.5 Sonnet is able to exploit an additional costly-punishment mechanism to reach even higher scores. Behavior also varies substantially across random seeds, suggesting sensitivity to initial conditions. The authors argue these findings motivate new LLM benchmarks that assess the implications of deploying LLM agents for the cooperative infrastructure of society.
https://arxiv.org/abs/2412.10270
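To make the setup concrete, here is a minimal, non-LLM sketch of the generational Donor Game loop. The specific numbers (benefit multiplier, endowment, survivor fraction, round and population counts) are illustrative assumptions rather than the paper's hyperparameters, and in the study itself each agent's strategy is an LLM-generated text policy rather than a numeric donation fraction:

```python
# Hypothetical sketch of an iterated Donor Game with cultural transmission.
# All constants below are assumptions for illustration, not the paper's values.
import random

BENEFIT_MULTIPLIER = 2.0    # recipient gains a multiple of the donor's cost (assumed)
ROUNDS_PER_GENERATION = 12
AGENTS_PER_GENERATION = 12
GENERATIONS = 10
SURVIVOR_FRACTION = 0.5     # top scorers seed the next generation (assumed)

def play_generation(strategies):
    """Run one generation; each strategy is a donation fraction in [0, 1]."""
    scores = [10.0] * len(strategies)  # initial endowment (assumed)
    for _ in range(ROUNDS_PER_GENERATION):
        order = list(range(len(strategies)))
        random.shuffle(order)
        # Random pairing: each even-indexed agent donates to the next one.
        for i in range(0, len(order) - 1, 2):
            donor, recipient = order[i], order[i + 1]
            gift = strategies[donor] * scores[donor]
            scores[donor] -= gift                           # donating is costly
            scores[recipient] += BENEFIT_MULTIPLIER * gift  # but creates surplus
    return scores

def next_generation(strategies, scores):
    """Cultural transmission: survivors' strategies seed noisy newcomers."""
    ranked = sorted(zip(scores, strategies), reverse=True)
    survivors = [s for _, s in ranked[: int(len(strategies) * SURVIVOR_FRACTION)]]
    newcomers = [min(1.0, max(0.0, random.choice(survivors) + random.gauss(0, 0.1)))
                 for _ in range(len(strategies) - len(survivors))]
    return survivors + newcomers

strategies = [random.random() for _ in range(AGENTS_PER_GENERATION)]
for gen in range(GENERATIONS):
    scores = play_generation(strategies)
    print(f"gen {gen}: mean donation fraction = "
          f"{sum(strategies) / len(strategies):.2f}, total welfare = {sum(scores):.1f}")
    strategies = next_generation(strategies, scores)
```

Because the multiplier exceeds 1, generous populations accumulate more total welfare, but any individual scores highest by free-riding; the generational selection step is what determines whether cooperation spreads or collapses, which is the dynamic the paper probes with LLM-written strategies.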