Getting 50% (SoTA) on ARC-AGI with GPT-4o

The author describes how they reached 50% accuracy on ARC-AGI's public test set with GPT-4o. GPT-4o generates many candidate Python programs for each problem, guided by few-shot prompts that contain meticulous step-by-step reasoning. The approach then keeps the programs that reproduce the problem's example outputs. Additional tweaks and prompt variations improved performance significantly. The previous state of the art on this set was 34% accuracy. On a held-out subset of the training set, where humans score 85%, the approach reaches 72%, so it still falls short of human performance. Having GPT-4o revise its programs further improved accuracy. The author points to GPT-4o's weak vision on grid inputs and its frequent coding errors as key limitations, which suggests that improvements in these areas could raise performance further. Overall, the results are promising but leave clear room for improvement and future work.
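The core selection idea (sample many programs, keep only those that reproduce every example output, then apply a survivor to the test input) can be sketched as follows. This is a minimal illustration, not the author's code. It assumes that hand-written lambdas stand in for GPT-4o samples and that a simple majority vote picks among the passing programs' test outputs. The names `select_output`, `passes_examples`, `CANDIDATES`, and `EXAMPLES` are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Program = Callable[[Grid], Grid]


def passes_examples(program: Program, examples: List[Tuple[Grid, Grid]]) -> bool:
    """Return True only if the program reproduces every example output exactly."""
    for inp, expected in examples:
        try:
            if program(inp) != expected:
                return False
        except Exception:
            # A crashing program counts as a failure.
            return False
    return True


def select_output(
    candidates: List[Program], examples: List[Tuple[Grid, Grid]], test_input: Grid
) -> Optional[Grid]:
    """Filter candidates on the examples, then majority-vote on their test outputs.

    The majority vote is an assumption made for this sketch.
    """
    votes: Counter = Counter()
    outputs = {}
    for program in candidates:
        if not passes_examples(program, examples):
            continue
        try:
            out = program(test_input)
        except Exception:
            continue
        key = tuple(tuple(row) for row in out)  # hashable form of the grid
        votes[key] += 1
        outputs[key] = out
    if not votes:
        return None  # no candidate survived the example check
    best_key, _ = votes.most_common(1)[0]
    return outputs[best_key]


# Hypothetical stand-ins for sampled programs.
CANDIDATES: List[Program] = [
    lambda g: [row[::-1] for row in g],            # horizontal flip
    lambda g: [list(reversed(row)) for row in g],  # same transform, written differently
    lambda g: [list(r) for r in zip(*g)],          # transpose (wrong for this task)
    lambda g: g,                                   # identity (wrong for this task)
    lambda g: g[10],                               # crashes on small grids
]

# One example pair: the task is a horizontal flip.
EXAMPLES = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]

if __name__ == "__main__":
    print(select_output(CANDIDATES, EXAMPLES, [[5, 6, 7]]))  # [[7, 6, 5]]
```

Filtering on the examples discards wrong or crashing programs cheaply without any model call. Voting across the surviving programs is one plausible way to choose among them when they disagree on the test input.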

https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt