ChatGPT-4o vs. Math

In this series, the author tests the ability of OpenAI’s ChatGPT-4o to solve a math problem involving a roll of tape. Several experiments evaluate whether GPT-4o can solve the problem from the prompt alone, from the prompt plus an image, and with prompt engineering. Surprisingly, a simple prompt engineering technique, Chain-of-Thought, produced the best performance, while multi-modal image input led to confusion and incorrect answers. The conclusion: the most effective and most consistent approach was a text-only prompt with zero-shot Chain-of-Thought prompt engineering.
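Zero-shot Chain-of-Thought, the technique the article found most effective, amounts to appending a reasoning trigger phrase (commonly "Let's think step by step.") to the question so the model works through intermediate steps before answering. A minimal sketch; the sample question is illustrative, not the author's exact tape problem:

```python
def zero_shot_cot(question: str) -> str:
    """Wrap a question with a zero-shot Chain-of-Thought trigger phrase."""
    # The trailing phrase nudges the model to emit intermediate reasoning
    # steps before its final answer, which tends to improve math accuracy.
    return f"{question}\nLet's think step by step."


# Hypothetical example question, shaped like the article's tape problem.
prompt = zero_shot_cot(
    "A roll of tape has an outer diameter of 10 cm, an inner diameter of "
    "5 cm, and the tape is 0.1 mm thick. How long is the tape?"
)
print(prompt)
```

The wrapped prompt would then be sent as the user message in a normal chat completion request; no few-shot examples are needed, which is what makes the technique "zero-shot."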

https://www.sabrina.dev/p/chatgpt4o-vs-math