In April 2018, the author was working at a startup preparing to launch a highly anticipated feature. They had built a waitlist to generate interest but kept the launch date secret. After completing the feature and testing it only lightly, they enabled it for all customers. They were unprepared for the influx of traffic that followed, which caused incidents including crashed services and high latency. The author attributes this to the thundering herd problem, which occurs when a large number of requests overload an API after a period of unavailability. To prevent such incidents, the author suggests scaling the API horizontally or vertically, prioritizing certain workloads, and implementing rate limiting, caching, circuit breaking, and alerts. They also emphasize communicating with clients and making infrastructure changes that protect other services.
https://encore.dev/blog/thundering-herd-problem
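
Of the mitigations listed in the summary, rate limiting is the easiest to illustrate in a few lines. The sketch below is a minimal token-bucket limiter in Python. It is not taken from the article: the class name `TokenBucket`, its parameters `rate` and `capacity`, and the injectable `clock` argument are all assumptions made for illustration. It is also not thread-safe and keeps its state in a single process, so a real deployment would likely need a lock or a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token-bucket rate limiter (hypothetical, not from the article)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # tokens added back per second
        self.capacity = capacity  # maximum burst the bucket can absorb
        self.clock = clock        # injectable so the limiter can be tested
        self.tokens = capacity    # start with a full bucket
        self.last = clock()       # time of the most recent refill

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A request handler could call `allow()` before doing any work and return HTTP 429 when it yields `False`. With `rate=2.0` and `capacity=3.0`, for instance, a burst of three requests passes, later requests are rejected until tokens refill, and sustained traffic is held to about two requests per second, which caps the load a sudden herd of clients can place on the API.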