Self-Compressing Neural Networks

This work focuses on reducing the size of neural networks to improve efficiency in execution time, power consumption, bandwidth, and memory footprint. The proposed method, called Self-Compression, removes redundant weights and reduces the number of bits needed to represent the remaining weights by folding both objectives into a generalized loss function. Surprisingly, the experiments show that floating point accuracy is maintained with only 3% of the bits and 18% of the weights remaining in the network. The approach requires no specialized hardware, making it a simple and general route to efficient training and inference.

https://arxiv.org/abs/2301.13142
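
To make the idea concrete, below is a minimal PyTorch sketch of how a learned-quantization-with-size-penalty setup like this can look: each output channel gets a learnable bit depth and exponent, weights are quantized with a straight-through estimator, and the network's total bit count is added to the task loss. The class name, tensor shapes, initial values, and the `gamma` weighting are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfCompressingConv2d(nn.Module):
    """Convolution whose weights are quantized with a learnable per-output-channel
    bit depth (b) and exponent (e). Channels whose bit depth is driven to zero
    carry no information and can be pruned after training.
    Illustrative sketch only -- not the authors' reference implementation."""

    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.05)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.b = nn.Parameter(torch.full((out_ch, 1, 1, 1), 8.0))   # learnable bit depth
        self.e = nn.Parameter(torch.full((out_ch, 1, 1, 1), -4.0))  # learnable exponent

    def quantize(self, w):
        b = F.relu(self.b)                      # bit depth cannot go negative
        scale = 2.0 ** self.e
        lo, hi = -(2.0 ** (b - 1)), 2.0 ** (b - 1) - 1
        q = torch.minimum(torch.maximum(w / scale, lo), hi)
        # straight-through estimator: round in the forward pass, identity gradient
        q = q + (torch.round(q) - q).detach()
        return q * scale

    def forward(self, x):
        k = self.weight.shape[-1]
        return F.conv2d(x, self.quantize(self.weight), self.bias, padding=k // 2)

    def size_in_bits(self):
        # per-channel bit depth times the number of weights in that channel
        return F.relu(self.b).sum() * self.weight[0].numel()


# Training objective: task loss plus a penalty on the average bits per weight.
# `gamma` trades accuracy against compression and is a hypothetical value here.
def total_loss(logits, targets, layers, gamma=0.01):
    bits = sum(l.size_in_bits() for l in layers)
    weights = sum(l.weight.numel() for l in layers)
    return F.cross_entropy(logits, targets) + gamma * bits / weights
```

After training, channels whose learned bit depth has collapsed to zero can be dropped outright and the surviving weights stored at their learned bit depths, which is where the reductions in both weight count and bits come from.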
