Understanding the BM25 full text search algorithm

BM25 is a widely used full text search ranking algorithm and the default ranking function in Lucene/Elasticsearch and in SQLite's full text search. In hybrid search, BM25 full text results are combined with vector similarity search results. The author set out to understand BM25 by working through its components and the way it ranks documents probabilistically. BM25 orders documents by their probability of relevance without ever calculating that probability, relying on the assumption that most documents are irrelevant to a given query. Scores for documents in the same collection can be compared with one another, which helps when ranking items in a personalized content feed against a user's interests. The backstory of BM25’s development includes some impressive leaps of faith. Although scores are of limited use for comparisons across different queries, BM25 remains a powerful tool for ranking relevance within a document collection.
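To make the scoring concrete, here is a minimal Python sketch of a BM25 scorer. It is an illustration under stated assumptions rather than the article's or any library's implementation: the function name `bm25_scores`, the toy documents, and the whitespace tokenization are hypothetical, `k1=1.2` and `b=0.75` are commonly used defaults, and the IDF uses the Lucene-style variant that adds 1 inside the logarithm.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher weight; the "1 +" keeps the IDF non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 makes repeated occurrences saturate; b scales the length penalty.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "dogs chase the cat".split(),
    "stock markets fell sharply today".split(),
]
scores = bm25_scores(["cat", "mat"], docs)
```

Here the first document matches both query terms and scores highest, the second matches only the more common term, and the third matches nothing and scores zero. The values are meaningful only relative to one another for this query and this collection.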

https://emschwartz.me/understanding-the-bm25-full-text-search-algorithm/
