“Imprecise” language models are smaller, speedier, and nearly as accurate

Large language models, such as the one behind ChatGPT, keep improving in performance, but they are also growing larger and more energy-hungry. To counter this trend, researchers are developing 1-bit LLMs small enough to run on devices like cellphones, compressing the networks by rounding each high-precision weight to 1 or -1. One approach, a post-training method called BiLLM, cut a model's memory footprint to roughly one-tenth of the original while largely preserving performance. Another, BitNet, trains 1-bit models from scratch and yielded LLMs about 10 times as energy-efficient as their full-precision counterparts. Such models could also pave the way for custom hardware and systems designed specifically for 1-bit computation.
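
For intuition, here is a minimal Python sketch of sign-based weight binarization, the general idea behind these methods (not the exact BiLLM or BitNet recipe): each weight is rounded to -1 or +1, and a single per-tensor scale factor is kept to preserve the tensor's average magnitude.

```python
import numpy as np

def binarize_weights(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a float weight tensor to {-1, +1} plus one scale factor.

    A sketch of generic sign-based binarization; the per-tensor scale
    alpha = mean(|w|) minimizes the reconstruction error for sign(w).
    """
    alpha = float(np.abs(w).mean())        # per-tensor scale factor
    w_bin = np.where(w >= 0, 1.0, -1.0)    # round each weight to +/-1
    return w_bin, alpha

# Usage: a 4096x4096 float32 layer (64 MB) needs only 1 bit per weight
# (~2 MB) plus one float once binarized.
rng = np.random.default_rng(0)
w = rng.normal(size=(4096, 4096)).astype(np.float32)
w_bin, alpha = binarize_weights(w)
w_approx = alpha * w_bin                   # dequantized approximation of w
print("mean abs error:", np.abs(w - w_approx).mean())
```

Because the quantized weights are just signs, the expensive multiplications in matrix products can in principle be replaced by additions and subtractions, which is where the speed and energy savings come from.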

https://spectrum.ieee.org/1-bit-llm
