In this study, we examine whether large language models (LLMs) simply learn superficial statistics or instead develop a coherent model of the data-generating process, also known as a world model. By analyzing the learned representations of several spatial and temporal datasets in the Llama-2 family of models, we find evidence for the latter. Our findings reveal that LLMs acquire linear representations of space and time across multiple scales, and that these representations are robust to prompting variations and consistent across entity types. Surprisingly, we also identify individual neurons that reliably encode spatial and temporal coordinates. These results suggest that modern LLMs possess structured knowledge and literal world models, challenging the notion that they merely learn surface-level statistics.
https://arxiv.org/abs/2310.02207
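
As a rough illustration of how such linear representations can be probed (a minimal sketch, not the paper's exact setup: the model name, probe layer, toy place list, and ridge-regression probe below are all assumptions for demonstration), one can fit a linear regression from a model's hidden activations to real-world coordinates and check whether it generalizes to held-out entities:

```python
# Sketch: probing hidden activations for linearly encoded spatial information.
# The model name, layer index, and tiny toy dataset are illustrative stand-ins,
# not the paper's actual datasets or hyperparameters.
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumed; any causal LM exposing hidden states works
LAYER = 20                                # assumed probe layer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# Toy (entity, latitude, longitude) examples; the paper uses much larger datasets.
places = [
    ("Paris", 48.86, 2.35),
    ("New York City", 40.71, -74.01),
    ("Tokyo", 35.68, 139.69),
    ("Cairo", 30.04, 31.24),
    ("Sydney", -33.87, 151.21),
]

def entity_activation(name: str) -> np.ndarray:
    """Return the hidden activation at the final token of the entity name."""
    inputs = tokenizer(name, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    hidden = out.hidden_states[LAYER][0, -1]  # shape: (hidden_dim,)
    return hidden.float().numpy()

X = np.stack([entity_activation(name) for name, _, _ in places])
y = np.array([[lat, lon] for _, lat, lon in places])

# Linear probe: if space is linearly encoded, a ridge regression from
# activations to coordinates should generalize to held-out entities.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
probe = Ridge(alpha=1.0).fit(X_train, y_train)
print("held-out R^2:", probe.score(X_test, y_test))
```

The toy dataset above is far too small to yield a meaningful score; the point of the sketch is only the shape of the method, i.e. a simple linear map from internal activations to spatial (or, analogously, temporal) coordinates.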