Yelp Rebuilds Corrupted Cassandra Cluster Using Its Data Streaming Architecture

Yelp recently faced data corruption in one of its Apache Cassandra clusters and needed a way to sanitize the data. After exploring various options, the team decided to move the data to a new cluster, leaving the corrupted records behind. The company typically runs multiple smaller Cassandra clusters for different purposes, some hosted on EC2 and others on Kubernetes. The team used a data pipeline, inspired by sortation systems in the manufacturing industry, to filter out the defective data and transfer the sanitized data to the new cluster. Statistical sampling was used to validate the migration. Ultimately, Yelp successfully migrated to the new cluster and retired the corrupted one.
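The filter-then-validate idea can be illustrated with a minimal sketch. This is not Yelp's implementation: the in-memory lists and dicts stand in for the source and destination clusters, and `is_corrupt`, `sanitize`, and `validate_by_sampling` are hypothetical names chosen for illustration. The corruption check (a missing `id` or `value`) is likewise an assumption.

```python
import random


def is_corrupt(row):
    # Hypothetical check: treat rows with a missing key or value as corrupt.
    return row.get("id") is None or row.get("value") is None


def sanitize(rows):
    """Route each row to a 'clean' or 'rejected' bin, like a sortation line."""
    clean, rejected = [], []
    for row in rows:
        (rejected if is_corrupt(row) else clean).append(row)
    return clean, rejected


def validate_by_sampling(source_rows, destination, sample_size, seed=0):
    """Compare a random sample of clean source rows with the destination."""
    rng = random.Random(seed)
    sample = rng.sample(source_rows, min(sample_size, len(source_rows)))
    mismatches = [r for r in sample if destination.get(r["id"]) != r]
    return len(sample), mismatches


# Toy data standing in for the corrupted cluster (assumed shape).
source = [
    {"id": 1, "value": "a"},
    {"id": 2, "value": None},
    {"id": 3, "value": "c"},
    {"id": None, "value": "d"},
]

clean, rejected = sanitize(source)
destination = {row["id"]: row for row in clean}  # stand-in for the new cluster
checked, mismatches = validate_by_sampling(clean, destination, sample_size=2)
```

Sampling checks only a subset of rows, so it gives statistical confidence in the migration rather than a full row-by-row comparison.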

https://www.infoq.com/news/2023/07/yelp-corrupted-cassandra-rebuild/
