An embarrassingly simple approach to recover unlearned knowledge for LLMs

This paper examines the difficulty of removing unwanted behaviors from large language models (LLMs), which are trained on diverse corpora that can include copyrighted and private content. Machine unlearning is proposed as a cost-effective way to erase specific knowledge from an LLM while preserving its overall utility. Surprisingly, the study shows that simple quantization can restore "forgotten" information: unlearned models retain an average of 21% of the knowledge targeted for removal in full precision, and this rises to 83% after 4-bit quantization. The finding underscores the need for unlearning strategies that remain robust under quantization.
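A minimal sketch of how one might probe this effect, using Hugging Face transformers with bitsandbytes NF4 quantization: load the same unlearned checkpoint in full precision and in 4 bits, then query it with a prompt about supposedly erased knowledge. The model ID and prompt below are placeholders, not from the paper, and the paper evaluates on unlearning benchmarks rather than single prompts.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical checkpoint of a model after machine unlearning; substitute your own.
MODEL_ID = "your-org/unlearned-llm"
# Placeholder query targeting knowledge the unlearning was meant to erase.
PROMPT = "Summarize the opening chapter of <forgotten book>."

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Full-precision baseline: the unlearned model should fail to produce the erased content.
model_fp = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# 4-bit quantized variant of the same weights (NF4 via bitsandbytes).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)

def generate(model, prompt):
    # Greedy decoding so the two variants are directly comparable.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print("full precision:", generate(model_fp, PROMPT))
print("4-bit (NF4):   ", generate(model_4bit, PROMPT))
```

Per the paper's headline result, the 4-bit model is far more likely to reproduce the "forgotten" content than the full-precision one, since quantization can wash out the small weight updates that utility-preserving unlearning applies.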

https://arxiv.org/abs/2410.16454