Hoffmann et al. (2022) present three methods for estimating a compute-optimal scaling law. We attempt to replicate their third estimation procedure and find that the reported estimates are inconsistent with the first two methods, fail to fit the data, and come with implausibly narrow confidence intervals. Intervals this narrow would require over 600,000 experiments, whereas the authors likely ran fewer than 500. In contrast, our rederivation of the scaling law using the third approach produces results that agree with the findings from the first two methods.
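The precision argument can be illustrated with a rough sketch. It assumes the usual asymptotic behaviour in which the width of a confidence interval shrinks in proportion to 1/sqrt(n), so narrowing an interval by a factor k requires roughly k^2 times as many observations. The function name `required_sample_size` and the numbers in the usage line are hypothetical, chosen only to show the arithmetic; they are not taken from either paper and this is not necessarily the calculation the authors performed.

```python
import math


def required_sample_size(n_current: int, width_current: float, width_target: float) -> int:
    """Estimate how many observations give a target confidence-interval width.

    Assumption (not from the source): interval width scales as 1/sqrt(n),
    so n_target = n_current * (width_current / width_target) ** 2.
    """
    if n_current <= 0 or width_current <= 0 or width_target <= 0:
        raise ValueError("all inputs must be positive")
    ratio = width_current / width_target
    return math.ceil(n_current * ratio ** 2)


# Hypothetical illustration: 400 observations yield an interval of width 4.0;
# shrinking it to width 0.5 (8x narrower) needs 64x as many observations.
print(required_sample_size(n_current=400, width_current=4.0, width_target=0.5))  # 25600
```

The quadratic growth is the point: an interval that is orders of magnitude narrower than a given dataset can support implies a far larger number of experiments than was plausibly run.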
https://arxiv.org/abs/2404.10102