In a project called BIG-bench, researchers tested large language models, the kind that power chatbots like ChatGPT, on 204 tasks. On most tasks, performance scaled predictably with model size, but on certain tasks it remained low and then suddenly jumped. This “breakthrough” behavior was likened to a phase transition in physics. However, a new paper from Stanford argues that these seemingly emergent abilities are not unpredictable but a consequence of how performance is measured. By changing the metrics for tasks like three-digit addition, for example by giving partial credit for correct digits rather than requiring an exactly correct answer, the researchers found that the models’ abilities improved gradually and predictably with increasing parameters, undercutting the notion of emergence as a sudden, unpredictable phenomenon. The debate around AI safety and potential continues as these large models evolve.
https://www.quantamagazine.org/how-quickly-do-large-language-models-learn-unexpected-skills-20240213/
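
The measurement argument summarized above can be illustrated with a toy calculation. The sketch below is not the Stanford paper's code. It assumes that each digit of an answer is produced correctly with some probability, independently of the other digits, and that this per-digit accuracy rises smoothly with model size. The function name, the four-digit answer length, and the accuracy values are all hypothetical and chosen only for illustration.

```python
def exact_match_accuracy(per_digit_accuracy: float, num_digits: int = 4) -> float:
    """Chance that every digit of the answer is correct.

    Assumes digit errors are independent, which is a simplification.
    """
    return per_digit_accuracy ** num_digits


# Hypothetical per-digit accuracies for models of increasing size.
# These numbers are made up, not measurements.
per_digit_accuracies = [0.5, 0.6, 0.7, 0.8, 0.9, 0.99]

for p in per_digit_accuracies:
    # The per-digit metric gives partial credit; exact match gives none.
    print(f"per-digit: {p:.2f}   exact match: {exact_match_accuracy(p):.3f}")
```

Under these assumptions the per-digit score climbs in even steps, while the all-or-nothing score stays low for the smaller models and rises steeply only near the end. Both columns describe the same underlying improvement, so the apparent jump comes from the choice of metric.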