Is Cosine-Similarity of Embeddings Really About Similarity?

Cosine-similarity, the cosine of the angle between two vectors, is widely used to measure semantic similarity between learned embeddings, but in practice it sometimes works better and sometimes worse than the unnormalized dot-product. Embeddings derived from regularized linear models help explain why: in these models, cosine-similarity can yield arbitrary and therefore meaningless similarities, which in some cases are not even unique and in others are implicitly controlled by the regularization. The regularization used when learning deep models can have similar unintended effects, making the resulting cosine-similarities opaque and possibly arbitrary. Harald Steck and his co-authors therefore caution against blindly relying on cosine-similarity to compare high-dimensional objects, and they outline alternatives.
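The arbitrariness can be illustrated with a small numerical sketch. It is not taken from the paper's code; the matrices `A` and `B`, the scaling vector `d`, and the helper `cosine_similarity_matrix` are hypothetical names chosen for illustration. The idea is that when a model only constrains the dot-products between two sets of embeddings, rescaling each latent dimension in one set and inversely in the other leaves every prediction unchanged while the cosine-similarities within a set shift.

```python
import numpy as np


def cosine_similarity_matrix(M):
    """Pairwise cosine-similarity between the rows of M."""
    unit = M / np.linalg.norm(M, axis=1, keepdims=True)
    return unit @ unit.T


rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))  # hypothetical "user" embeddings
B = rng.normal(size=(5, 3))  # hypothetical "item" embeddings

# Arbitrary positive rescaling of each latent dimension (an assumption
# made for this sketch): scale A by d and B by 1/d.
d = np.array([0.1, 1.0, 10.0])
A_scaled = A * d
B_scaled = B / d

# The dot-products between users and items are unchanged ...
same_predictions = np.allclose(A @ B.T, A_scaled @ B_scaled.T)

# ... but the item-item cosine-similarities are not.
cos_before = cosine_similarity_matrix(B)
cos_after = cosine_similarity_matrix(B_scaled)
same_cosines = np.allclose(cos_before, cos_after)

print("predictions unchanged:", same_predictions)
print("cosine-similarities unchanged:", same_cosines)
```

Both embedding pairs fit the same dot-product objective equally well, so nothing in that objective alone says which set of cosine-similarities is the "right" one.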

https://arxiv.org/abs/2403.05440
