The author compares Midas turning everything into gold to data scientists turning everything into vectors, highlighting the role of vectors as the language of AI. They caution against blindly applying cosine similarity to vectors, because it can produce misleading results. The post examines the pitfalls of comparing embeddings with cosine similarity and offers tips for getting more useful results from similarity search. The author stresses that you must first define what "similar" means for your task, and recommends task-specific embeddings for more meaningful comparisons. Less common techniques, such as pre-prompt engineering and context extraction, are also discussed as practical ways to improve vector similarity across different use cases.
https://p.migdal.pl/blog/2025/01/dont-use-cosine-similarity/