Can generalist foundation models beat special-purpose tuning?

The authors examine whether GPT-4, a generalist foundation model, can match fine-tuned specialist models without domain-specific training, challenging the assumption that such generalists fall short of specialists. Through a systematic exploration of prompt engineering, they develop Medprompt, a composition of several prompting strategies: dynamic few-shot example selection, self-generated chain of thought, and choice-shuffling ensembling. With Medprompt, GPT-4 achieves state-of-the-art results on all nine benchmark datasets in the MultiMedQA suite, outperforming leading specialist models such as Med-PaLM 2. The approach also generalizes beyond medicine, showing gains in electrical engineering, machine learning, philosophy, accounting, law, nursing, and clinical psychology.
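As a rough illustration of how such a composition might fit together, here is a minimal Python sketch of a Medprompt-style pipeline. The three stages follow the paper's high-level description (kNN-based dynamic few-shot selection, self-generated chain-of-thought exemplars, and choice-shuffling ensembling), but everything concrete here is a hypothetical stand-in: `embed`, `call_llm`, and the exemplar dictionary fields are placeholders, not the authors' implementation.

```python
import random
from collections import Counter

# Hypothetical stand-ins so the sketch stays self-contained and runnable;
# a real pipeline would call an embedding model and GPT-4 here.
def embed(text: str) -> list[float]:
    return [float(sum(map(ord, text)) % 97), float(len(text))]

def call_llm(prompt: str) -> str:
    return "Let's think step by step... Answer: A"  # placeholder completion

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

# Stage 1: dynamic few-shot selection -- pick the k training questions
# closest to the test question in embedding space.
def select_few_shot(question: str, train_set: list[dict], k: int = 5):
    q = embed(question)
    return sorted(train_set, key=lambda ex: -cosine(q, embed(ex["question"])))[:k]

# Stage 2: self-generated chain of thought -- each exemplar carries a
# model-written reasoning trace (the "cot" field) rather than a
# hand-authored one.
def build_prompt(question: str, choices: list[str], exemplars: list[dict]):
    parts = [f"Q: {ex['question']}\n{ex['cot']}\nAnswer: {ex['answer']}"
             for ex in exemplars]
    labeled = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(choices))
    parts.append(f"Q: {question}\n{labeled}\n"
                 "Think step by step, then finish with 'Answer: <letter>'.")
    return "\n\n".join(parts)

# Stage 3: choice-shuffling ensemble -- shuffle the answer options across
# several samples and majority-vote on option *content* to cancel the
# model's positional bias.
def medprompt_answer(question: str, choices: list[str],
                     train_set: list[dict], n_votes: int = 5):
    exemplars = select_few_shot(question, train_set)
    votes = Counter()
    for _ in range(n_votes):
        shuffled = random.sample(choices, len(choices))
        reply = call_llm(build_prompt(question, shuffled, exemplars))
        letter = reply.rsplit("Answer:", 1)[-1].strip()[:1].upper()
        idx = ord(letter) - ord("A")
        if 0 <= idx < len(shuffled):
            votes[shuffled[idx]] += 1
    return votes.most_common(1)[0][0] if votes else None
```

In the paper, the chain-of-thought exemplars are produced by GPT-4 itself over the training questions and kept only when the model's final answer matches the label; the field and function names above are illustrative, not taken from the paper's codebase.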

https://arxiv.org/abs/2311.16452
