Coping with dumb LLMs using classic ML

The author explores using a local LLM to improve search relevance without relying on expensive AI. The goal is a reliable search relevance evaluator that runs on a laptop, checked by comparing the LLM's product relevance preferences against human ratings. The experiments include prompting the LLM about different product attributes and combining multiple LLM decisions into a single, smarter choice. Surprisingly, training a decision tree classifier on the LLM evaluations gives promising results in predicting human preferences for product relevance. This approach could help guide search solutions and provide insight into human preferences. The broader point is that traditional ML is an effective way to combine LLM outputs.
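
To make the idea of combining LLM outputs with a decision tree concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the data is invented, the three per-attribute prompts (name, description, category) are hypothetical, and the small hand-rolled tree is not the author's code. It only exists to keep the example dependency-free; in practice a library implementation such as scikit-learn's DecisionTreeClassifier would be the usual choice.

```python
from collections import Counter

# Hypothetical data: each row holds three LLM judgments for one product pair,
# one judgment per attribute prompt. -1 = LLM preferred the left product,
# 1 = the right product, 0 = no preference. All values are invented.
FEATURES = ["name", "description", "category"]
X = [
    [-1, -1, 0],
    [-1, 0, -1],
    [1, 1, 0],
    [1, 0, 1],
    [0, -1, -1],
    [0, 1, 1],
    [-1, 1, 1],
    [1, -1, -1],
]
# Invented human preference for the same pairs (-1 = left, 1 = right).
y = [-1, -1, 1, 1, -1, 1, 1, -1]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (score, feature_index, threshold) with the lowest weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        # Skip the largest value so both sides of the split are non-empty.
        for t in sorted({row[f] for row in rows})[:-1]:
            left = [lab for row, lab in zip(rows, labels) if row[f] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build a tree; leaves are class labels, nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no feature has more than one distinct value
        return majority
    _, f, t = split
    left = [(row, lab) for row, lab in zip(rows, labels) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(rows, labels) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


tree = build_tree(X, y)
print("root split on:", FEATURES[tree["feature"]])
print("prediction for an unseen pair:", predict(tree, [1, 1, 1]))
```

The tree learns which LLM judgments to trust and in what order, rather than treating every judgment as an equal vote. On real data, the predictions would need to be scored against held-out human ratings, not the training pairs.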

https://softwaredoug.com/blog/2025/01/21/llm-judge-decision-tree