Deep learning’s success is often credited to its ability to discover new representations of the data automatically rather than relying on handcrafted features. The paper linked below (Domingos, 2020) argues otherwise: deep networks trained by gradient descent are approximately equivalent to kernel machines, models that effectively memorize the training data and predict by comparing new inputs to it through a similarity function (the kernel). This view makes deep network weights more interpretable, since they can be read as a superposition of the training examples, and it recasts the role of architecture: the network’s structure builds knowledge of the target function into the kernel rather than into learned features. The result challenges common assumptions about deep learning, offers a different picture of how these models operate, and could point toward improved learning algorithms.
https://arxiv.org/abs/2012.00152
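For concreteness, a rough sketch of the paper’s central result as I understand it: a model trained by gradient descent (in the continuous-time, gradient-flow limit) can be written approximately as a kernel machine over the training set, where the “path kernel” measures how similarly two inputs move the model’s parameters along the training trajectory. The notation below loosely follows the paper; details such as the exact weighting that defines the coefficients a_i are omitted.

\[
f_w(x) \;\approx\; \sum_{i=1}^{m} a_i \, K^{p}(x, x_i) \;+\; b,
\qquad
K^{p}(x, x') \;=\; \int_{c(t)} \nabla_w f_{w(t)}(x) \cdot \nabla_w f_{w(t)}(x') \, dt,
\]

where x_1, …, x_m are the training examples, c(t) is the path taken by the weights during gradient descent, a_i depends on the loss derivative for example i averaged along that path, and b is the output of the initial (untrained) model.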