WordLlama – Things you can do with the token embeddings of an LLM

WordLlama is a compact NLP toolkit for tasks like fuzzy deduplication and ranking, optimized for CPU hardware with minimal inference-time dependencies. It recycles the token embeddings of large language models into efficient word representations that are a fraction of the size of models like GloVe. The library offers Matryoshka Representations, which let embeddings be truncated to smaller dimensions, and runs fast on CPU with low resource requirements; binarization for even faster calculations is planned. Performance benchmarks demonstrate its efficiency, and multiple models are available for different tasks. The training notes recommend 512 or 1024 dimensions for binary embeddings, and the roadmap includes semantic text splitting and example notebooks.
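To make that concrete, here is a minimal sketch of the workflow the summary describes, assuming the `load()`, `similarity()`, `rank()`, and `deduplicate()` entry points and the `trunc_dim` Matryoshka option shown in the project's README; exact signatures may differ in current releases, so treat this as illustrative rather than definitive.

```python
from wordllama import WordLlama

# Load a model; trunc_dim (per the README) exploits the Matryoshka
# property to keep only the first N embedding dimensions. 64 is an
# illustrative choice, not a recommended setting.
wl = WordLlama.load(trunc_dim=64)

# Semantic similarity between two strings.
score = wl.similarity("i went to the car", "i went to the vehicle")
print(score)

# Rank candidate documents against a query.
docs = ["a trip to the park", "buying a used truck", "walking the dog"]
print(wl.rank("i went to the car", docs))

# Fuzzy-deduplicate near-identical strings above a similarity threshold.
candidates = ["the cat sat on the mat", "the cat sat on the mat.", "dogs bark loudly"]
print(wl.deduplicate(candidates, threshold=0.8))
```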

https://github.com/dleemiller/WordLlama
