In this paper, we present DARE (Drop And REscale), a technique that allows Language Models (LMs) to acquire new capabilities by assimilating parameters from homologous models (models fine-tuned from the same base), without retraining or GPUs. DARE randomly sets most delta parameters (the differences between fine-tuned and pre-trained weights) to zero and rescales the remainder, without impairing the abilities of Supervised Fine-Tuning (SFT) LMs; this sparsification makes it possible to merge multiple task-specific LMs into one. Experimental results show that DARE can eliminate up to 99% of delta parameters, and that merging task-specific LMs can significantly enhance accuracy. For example, merging WizardLM with WizardMath raised WizardLM's zero-shot accuracy on GSM8K from 2.2 to 66.3, surpassing the performance of either individual model. The merged LM also ranks at the top of the Open LLM Leaderboard.
https://arxiv.org/abs/2311.03099
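A minimal sketch of the drop-and-rescale idea, assuming PyTorch state dicts with matching keys for the base model and each SFT model; the function names `dare_sparsify` and `merge_with_dare`, the default drop rate, and the plain averaging of sparsified deltas are illustrative choices, not the paper's exact implementation (which pairs DARE with several existing merging methods):

```python
import torch

def dare_sparsify(base_state: dict, sft_state: dict, drop_rate: float = 0.9) -> dict:
    """Sparsify delta parameters: randomly drop entries of (SFT - base) with
    probability `drop_rate`, then rescale the survivors by 1 / (1 - drop_rate)
    so the expected delta is preserved."""
    deltas = {}
    for name, base_w in base_state.items():
        if not torch.is_floating_point(base_w):
            # Skip integer buffers (e.g. position ids); leave them unchanged.
            deltas[name] = torch.zeros_like(base_w)
            continue
        delta = sft_state[name] - base_w
        # Bernoulli mask keeps each entry with probability (1 - drop_rate).
        keep = torch.bernoulli(torch.full_like(delta, 1.0 - drop_rate))
        deltas[name] = keep * delta / (1.0 - drop_rate)
    return deltas

def merge_with_dare(base_state: dict, sft_states: list, drop_rate: float = 0.9) -> dict:
    """Merge several homologous SFT models by adding the average of their
    DARE-sparsified deltas back onto the shared base weights."""
    merged = {name: w.clone() for name, w in base_state.items()}
    for sft_state in sft_states:
        deltas = dare_sparsify(base_state, sft_state, drop_rate)
        for name, d in deltas.items():
            merged[name] += d / len(sft_states)
    return merged
```

In this sketch, dropping a delta entry and dividing the survivors by (1 - drop_rate) keeps the expected value of each delta unchanged, which is why high drop rates can leave SFT abilities largely intact while making the deltas from different models far less likely to interfere when merged.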