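The authors present SPANN, a memory-disk hybrid indexing and search system for approximate nearest neighbor search (ANNS). It follows the inverted-index methodology and aims to keep costs low at billion scale. SPANN stores the centroid points of the posting lists in memory and the large posting lists themselves on disk. It achieves low latency and high recall by reducing the number of disk accesses while still retrieving high-quality posting lists. At index-building time, a hierarchical balanced clustering algorithm keeps posting-list lengths balanced, and each posting list is augmented with the points that lie near the boundary of its cluster. At search time, a query-aware scheme dynamically prunes accesses to unnecessary posting lists. On three billion-scale datasets, SPANN is about 2× faster than DiskANN at reaching the same 90% recall with the same memory cost. It reaches 90% recall@1 and recall@10 in around one millisecond with only 32GB of memory. The code is publicly available as part of Microsoft's SPTAG library.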
https://arxiv.org/abs/2111.08566
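The build-and-search flow described above can be illustrated with a minimal pure-Python sketch. It is not SPANN's implementation and makes several simplifying assumptions. The "disk" is just a dict of posting lists, and plain k-means stands in for hierarchical balanced clustering. The names `build_index` and `search` and the parameters `eps1`, `eps2`, `max_replicas` and `max_lists` are hypothetical. The sketch uses two distance-ratio rules, which are intended to mirror the paper's boundary-point augmentation and query-aware pruning:

- **Duplication rule:** a point is copied into every nearby list whose centroid lies within a factor (1 + eps1) of the point's distance to its nearest centroid.
- **Pruning rule:** a query reads only those lists whose centroid lies within a factor (1 + eps2) of the query's distance to its nearest centroid.

```python
import math
import random

# Toy sketch of a SPANN-style memory-disk inverted index (assumptions only).
# The "disk" is simulated by a dict; plain k-means replaces SPANN's
# hierarchical balanced clustering. eps1, eps2, max_replicas and max_lists
# are hypothetical illustrative parameters, not values from the paper.

def kmeans(points, k, iters=10, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            buckets[j].append(p)
        for j, b in enumerate(buckets):
            if b:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(xs) / len(b) for xs in zip(*b))
    return centroids

def build_index(points, k, eps1=0.1, max_replicas=2):
    centroids = kmeans(points, k)          # small, kept "in memory"
    postings = {j: [] for j in range(k)}   # stands in for on-disk posting lists
    for pid, p in enumerate(points):
        order = sorted(range(k), key=lambda c: math.dist(p, centroids[c]))
        nearest = math.dist(p, centroids[order[0]])
        # Boundary augmentation: a point close to several centroids is
        # duplicated into up to max_replicas lists.
        for j in order[:max_replicas]:
            if math.dist(p, centroids[j]) <= (1 + eps1) * nearest:
                postings[j].append(pid)
    return centroids, postings

def search(query, points, centroids, postings, topk=1, eps2=0.5, max_lists=3):
    order = sorted(range(len(centroids)),
                   key=lambda c: math.dist(query, centroids[c]))
    nearest = math.dist(query, centroids[order[0]])
    # Query-aware pruning: fetch only lists whose centroid is close enough.
    chosen = [j for j in order[:max_lists]
              if math.dist(query, centroids[j]) <= (1 + eps2) * nearest]
    # Union of the fetched lists = the simulated "disk reads".
    candidates = {pid for j in chosen for pid in postings[j]}
    ranked = sorted(candidates, key=lambda pid: math.dist(query, points[pid]))
    return ranked[:topk], len(chosen)

# Usage on random 2-D data.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(200)]
centroids, postings = build_index(points, k=8)
ids, lists_read = search((0.5, 0.5), points, centroids, postings, topk=3)
print(ids, lists_read)
```

In this sketch, raising `eps2` or `max_lists` reads more simulated posting lists per query. That trades latency for recall, which is the knob the pruning scheme is meant to expose.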