TL;DR – Voyage introduces voyage-code-3, an embedding model for code retrieval that outperforms OpenAI-v3-large and CodeSage-large by an average of 13.80% and 16.81%, respectively, across 32 code retrieval datasets. It supports smaller embedding dimensions through Matryoshka learning and quantized output formats (int8, uint8, binary, and ubinary), which reduce storage and search costs, and it has an extended context length of 32K tokens. Matryoshka embeddings can be truncated to shorter vectors with only a small loss in retrieval quality. The model targets the specific challenges of code retrieval and is trained on a carefully curated dataset. Evaluation shows superior retrieval quality, and binary embeddings can optionally be combined with a rescoring step for further quality improvement. Free tokens are available for trial.
https://blog.voyageai.com/2024/12/04/voyage-code-3/
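
The mechanics behind the storage savings mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not Voyage's implementation and not a call to the Voyage API: random vectors stand in for real voyage-code-3 embeddings, all function names (`truncate_matryoshka`, `quantize_ubinary`, `search_with_rescoring`) are hypothetical, and thresholding at zero for the binary format is an assumption. It shows a Matryoshka-style truncation to the first 256 dimensions, 1-bit quantization, a cheap Hamming-distance search, and a rescoring pass over the shortlist using the full-precision vectors.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def truncate_matryoshka(vec, dim):
    """Keep the first `dim` components and re-normalize the shorter vector."""
    return normalize(vec[:dim])


def quantize_ubinary(vec):
    """Pack one bit per dimension into an int (bit set when the value is > 0).

    Assumption: a zero threshold; the real quantizer may differ.
    """
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_with_rescoring(query, docs, dim=256, shortlist=10, top_k=3):
    """Two-stage search: Hamming distance on binary codes, then cosine rescoring."""
    q_bits = quantize_ubinary(truncate_matryoshka(query, dim))
    doc_bits = [quantize_ubinary(truncate_matryoshka(d, dim)) for d in docs]

    # Stage 1: cheap candidate retrieval on the truncated binary codes.
    order = sorted(range(len(docs)), key=lambda i: hamming(q_bits, doc_bits[i]))
    candidates = order[:shortlist]

    # Stage 2: rescore only the shortlist with full-precision cosine similarity.
    q_full = normalize(query)
    rescored = sorted(
        candidates,
        key=lambda i: dot(q_full, normalize(docs[i])),
        reverse=True,
    )
    return rescored[:top_k]


# Synthetic stand-ins for embeddings: 200 random 1024-dimensional vectors.
random.seed(0)
docs = [[random.gauss(0.0, 1.0) for _ in range(1024)] for _ in range(200)]
# The query is a slightly noisy copy of document 42.
query = [x + 0.1 * random.gauss(0.0, 1.0) for x in docs[42]]

top = search_with_rescoring(query, docs)
print(top)
```

In this toy setup the first stage stores 256 bits per document instead of 1024 floats, and only the ten shortlisted documents are compared at full precision. Because the query is a noisy copy of document 42, that document should come back first; the shortlist size and the truncated dimension are arbitrary choices here and would need tuning on real data.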