We study the generalization properties of binary logistic classification in a random feature model and characterize the dynamics of Grokking in this setting. We find that Grokking, marked by delayed generalization and non-monotonic test loss, is amplified when the training set is close to linear separability. Even though a perfectly generalizing solution exists, the implicit bias of the logistic loss can drive the model to overfit when the training data is linearly separable. Consistent with recent literature, our results indicate that Grokking occurs predominantly near the interpolation threshold, reminiscent of critical phenomena in physical systems.
https://arxiv.org/abs/2410.04489
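The setup described above lends itself to a compact numerical illustration. Below is a minimal sketch, not the paper's exact experiment: it assumes a Gaussian data model with a linear teacher, a fixed random-ReLU feature map, and full-batch gradient descent on the logistic loss. All dimensions, the learning rate, and the helper names (`make_data`, `phi`, `logistic_loss`) are illustrative choices. With more random features than training points, the featurized training set is typically linearly separable, so the train loss decays toward zero while the test loss can behave non-monotonically, in the spirit of the delayed generalization discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (assumed, for illustration): Gaussian inputs,
# labels given by the sign of a fixed "teacher" direction.
d, n_train, n_test, n_features = 50, 60, 2000, 400
w_star = rng.normal(size=d) / np.sqrt(d)

def make_data(n):
    X = rng.normal(size=(n, d))
    y = np.sign(X @ w_star)
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

# Random feature map: fixed random projection followed by ReLU.
F = rng.normal(size=(d, n_features)) / np.sqrt(d)
phi = lambda X: np.maximum(X @ F, 0.0)
Z_tr, Z_te = phi(X_tr), phi(X_te)

def logistic_loss(Z, y, w):
    # mean log(1 + exp(-y * <w, z>)), computed stably
    margins = y * (Z @ w)
    return np.mean(np.logaddexp(0.0, -margins))

# Full-batch gradient descent on the logistic loss.
w = np.zeros(n_features)
lr = 1.0
for step in range(1, 50001):
    margins = y_tr * (Z_tr @ w)
    s = 1.0 / (1.0 + np.exp(margins))          # sigmoid(-margins)
    grad = -(Z_tr * (s * y_tr)[:, None]).mean(axis=0)
    w -= lr * grad
    if step % 5000 == 0:
        print(f"step {step:6d}  "
              f"train {logistic_loss(Z_tr, y_tr, w):.4f}  "
              f"test {logistic_loss(Z_te, y_te, w):.4f}")
```

Varying `n_train` relative to `n_features` moves the run toward or away from the interpolation threshold, which is one way to probe, under these assumed choices, the regime where the abstract reports Grokking is most pronounced.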