Large language models (LLMs) have sparked interest in retrieval-augmented generation (RAG). However, the naive approach of embedding the user's query and searching a vector store directly is not always effective. This "dumb" RAG model assumes the query embedding is similar to the content embedding, collapses complex requests into a single query string, and ignores other contextual information such as dates or source filters. Query understanding and rewriting, in which the LLM extracts a structured query with tools like Pydantic, can address these limitations. Metaphor Systems demonstrates how structuring queries can improve search performance, and a personal-assistant example illustrates the multiple-dispatch pattern for unifying results from several backends. The flexibility of the instructor framework supports these and many other applications.
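The ideas above (rewriting a user request into structured sub-queries, then dispatching each to the right backend and merging the results) can be sketched as follows. This is a minimal illustration using stdlib dataclasses in place of Pydantic models, with hypothetical backend names and stubbed search functions; in practice the `QueryPlan` would be produced by an LLM via instructor rather than constructed by hand.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional

class Backend(Enum):
    EMAIL = "email"
    CALENDAR = "calendar"

@dataclass
class SearchQuery:
    # A rewritten, backend-specific query with optional structured context
    # (here a date range) that a single embedded string would lose.
    rewritten_query: str
    backend: Backend
    start_date: Optional[date] = None
    end_date: Optional[date] = None

@dataclass
class QueryPlan:
    # One user request may expand into several sub-queries.
    queries: List[SearchQuery]

# Stub backends standing in for real search services.
def search_email(q: SearchQuery) -> List[str]:
    return [f"email result for {q.rewritten_query!r}"]

def search_calendar(q: SearchQuery) -> List[str]:
    return [f"calendar result for {q.rewritten_query!r}"]

DISPATCH = {Backend.EMAIL: search_email, Backend.CALENDAR: search_calendar}

def execute(plan: QueryPlan) -> List[str]:
    # Multiple dispatch: route each sub-query to its backend, merge results.
    results: List[str] = []
    for q in plan.queries:
        results.extend(DISPATCH[q.backend](q))
    return results

# e.g. "what meetings do I have with Alice this week?" rewritten into a plan:
plan = QueryPlan(queries=[
    SearchQuery("meetings with Alice", Backend.CALENDAR,
                start_date=date(2023, 9, 11), end_date=date(2023, 9, 17)),
    SearchQuery("emails from Alice about meetings", Backend.EMAIL),
])
print(execute(plan))
```

The key design point is that the schema, not the prompt, carries the contract: each backend receives only well-typed fields it knows how to use, and adding a backend means adding an enum member and a handler rather than rewriting the retrieval pipeline.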
https://jxnl.github.io/instructor/blog/2023/09/17/rag-is-more-than-just-embedding-search/