Overflow in consistent hashing (2018)

In 1997, David Karger et al. introduced consistent hashing, a technique now widely used in systems such as Apache Cassandra. It minimizes the amount of data that must move between machines when the cluster grows or shrinks. One hash function places servers on a ring, a second places data items on the same ring, and each item is stored on the first server that follows it around the ring.

Because every server has finite storage, a natural question is how likely it is that some server overflows. M.V. Ramakrishna’s 1987 “urn” analysis addresses this question: items are thrown into a fixed number of bins of fixed capacity, and the overflow probability depends on the interplay between the load factor (items divided by total capacity), the number of items, the bin capacity, and the number of bins. Ramakrishna’s approximations give results that run against intuition: as the system scales, the overflow probability changes in unexpected ways, and changing either the size or the number of nodes can drastically change how much storage is actually usable. Google and Vimeo have described techniques for keeping overflow under control in consistent hashing systems.
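The ring described above can be sketched in a few lines of Python. This is a minimal illustration, not Karger et al.'s construction or Cassandra's implementation: the class and function names (`ConsistentHashRing`, `hash_server`, `hash_item`) are hypothetical, the two hash functions are assumed to be BLAKE2b with different personalization strings, and the `replicas` parameter (several ring points per server) is an assumed, commonly used refinement.

```python
import bisect
import hashlib


def _hash(data: str, person: bytes) -> int:
    # Assumption: BLAKE2b personalization yields two distinct hash functions
    # from one primitive, one for servers and one for data items.
    digest = hashlib.blake2b(data.encode(), digest_size=8, person=person).digest()
    return int.from_bytes(digest, "big")


def hash_server(name: str) -> int:
    return _hash(name, b"server")


def hash_item(key: str) -> int:
    return _hash(key, b"item")


class ConsistentHashRing:
    """Hypothetical sketch: maps items to servers by position on a 64-bit ring.

    Ignores the astronomically unlikely case of two ring points colliding.
    """

    def __init__(self, replicas: int = 1):
        self.replicas = replicas          # ring points placed per server
        self._points: list[int] = []      # sorted ring positions
        self._owner: dict[int, str] = {}  # ring position -> server name

    def add_server(self, name: str) -> None:
        for i in range(self.replicas):
            point = hash_server(f"{name}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = name

    def remove_server(self, name: str) -> None:
        for i in range(self.replicas):
            point = hash_server(f"{name}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First server point clockwise from the item's position, wrapping around.
        idx = bisect.bisect(self._points, hash_item(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = ConsistentHashRing(replicas=50)
for server in ("a", "b", "c"):
    ring.add_server(server)

keys = [f"key-{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.add_server("d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys claimed by the new server "d" change owners.
print(len(moved), all(after[k] == "d" for k in moved))
```

The usage lines show the property that motivates the technique: adding a server only reassigns the keys that now fall just before one of its ring points, and removing it again restores the previous assignment.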

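To make the urn model concrete, the sketch below computes the exact probability that at least one bin overflows when n items land independently and uniformly at random in m bins of capacity k (load factor n / (m·k)). This is the standard balls-into-bins count, not Ramakrishna’s approximation, and the uniform-placement assumption idealizes a consistent hashing ring; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def no_overflow_ways(n: int, m: int, k: int) -> int:
    """Count assignments of n labeled items to m bins with every bin holding at most k."""
    # ways[i] = number of ways to place i labeled items into the bins processed so far
    ways = [1] + [0] * n
    for _ in range(m):
        # Add one bin: choose which j of the i items go into it (j <= k).
        ways = [
            sum(comb(i, j) * ways[i - j] for j in range(min(k, i) + 1))
            for i in range(n + 1)
        ]
    return ways[n]


def overflow_probability(n: int, m: int, k: int) -> Fraction:
    """P(some bin receives more than k items) under uniform, independent placement."""
    return 1 - Fraction(no_overflow_ways(n, m, k), m ** n)


# Hold the load factor at 0.75 with bins of capacity 4 and vary the number of bins.
k = 4
for m in (4, 16, 64):
    n = 3 * m  # n / (m * k) = 0.75
    print(m, n, float(overflow_probability(n, m, k)))
```

Exact integer arithmetic keeps the result free of rounding error; the dynamic program runs in roughly m·n·k steps, which is fine for exploring small and medium clusters.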
