Understanding Pgvector’s HNSW Index Storage in Postgres

Creating a vector index with pgvector is as simple as running CREATE INDEX ON t USING hnsw (col vector_l2_ops). This article looks at how pgvector stores the resulting HNSW index on disk in Postgres. It walks through the metadata page and the HNSW graph pages that follow it, describing the layout of each page and its components. One notable optimization lets rows with duplicate vector values share element tuples and neighbor-info tuples. The article visualizes how index tuples connect to one another and maps hexdumps of the index file onto the corresponding structs, giving a detailed picture of how pgvector's HNSW index is laid out. It also presents a parser tool that converts the index into JSON for easier visualization.

https://lantern.dev/blog/pgvector-storage
