Debugging an evil Go runtime bug: From heat guns to kernel compiler flags

The author, a former SRE at Google, is a big Prometheus and Grafana fan who uses Prometheus to monitor personal servers, client work, and internal events like Euskal Encounter. An unexpected crash in node_exporter, a Prometheus component written in Go, was followed by a series of unusual crashes that hinted at a hardware problem. Further testing revealed bad RAM: one weak cell whose behavior worsened with temperature. The author masked out the bad bits using a GRUB feature, sacrificing 3MiB of RAM for long-term reliability of the device. Along the way, the author heated the RAM sticks with a heat gun to uncover additional weak bits.

https://marcan.st/2017/12/debugging-an-evil-go-runtime-bug/
