New LLM optimization technique slashes memory costs up to 75%

Researchers at Sakana AI have developed a technique called “universal transformer memory” that optimizes how language models use their memory. It relies on neural attention memory models (NAMMs), small networks that decide which tokens in a Transformer's context are worth keeping and which redundant details can be discarded, letting the model focus on the information that matters. Once trained, NAMMs can be applied to other Transformer models without additional training, improving performance on natural-language, coding, and other long-sequence tasks while cutting cache memory by as much as 75%. Because NAMMs automatically adjust their behavior to the task at hand, the researchers position them as a versatile tool for enterprise applications and a basis for further advances in Transformer memory management.
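
The summary glosses over the mechanism, so here is a hedged sketch of the general idea: a small learned network inspects attention statistics for each token held in the KV cache and decides which tokens to keep and which to evict. Everything below (the NumPy scorer, the windowed-mean features standing in for the paper's attention-based features, and the names score_tokens, prune_kv_cache, W1, W2) is an illustrative assumption, not Sakana AI's implementation, which reportedly trains NAMMs with evolutionary optimization and applies them per attention layer.

```python
# Illustrative sketch of NAMM-style KV-cache pruning (not Sakana AI's code):
# a small scorer reads each cached token's recent attention statistics and
# marks the token for keeping or eviction.

import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" scorer weights; in the real method these would be learned,
# e.g. via the evolutionary optimization the researchers describe.
D_FEAT, D_HID = 8, 16
W1 = rng.normal(scale=0.3, size=(D_FEAT, D_HID))
W2 = rng.normal(scale=0.3, size=(D_HID, 1))

def attention_features(attn: np.ndarray, n_feat: int = D_FEAT) -> np.ndarray:
    """Summarize each cached token's column of the attention matrix.

    attn: (n_queries, n_tokens) attention weights from recent queries.
    Returns (n_tokens, n_feat): mean attention in n_feat equal time windows,
    a simple stand-in for the richer features used in the paper.
    """
    n_q, _ = attn.shape
    windows = np.array_split(np.arange(n_q), n_feat)
    return np.stack([attn[w].mean(axis=0) for w in windows], axis=1)

def score_tokens(attn: np.ndarray) -> np.ndarray:
    """Score each cached token; a negative score means 'evict'."""
    feats = attention_features(attn)
    hidden = np.tanh(feats @ W1)
    return (hidden @ W2).squeeze(-1)

def prune_kv_cache(keys, values, attn):
    """Keep only the cached tokens the scorer marks as useful."""
    keep = score_tokens(attn) > 0.0
    return keys[keep], values[keep], keep

# Usage: a toy cache of 32 tokens with 64-dim keys/values and attention
# weights collected from the last 40 queries.
keys = rng.normal(size=(32, 64))
values = rng.normal(size=(32, 64))
attn = rng.random(size=(40, 32))
attn /= attn.sum(axis=1, keepdims=True)  # rows sum to 1, like softmax output

keys, values, keep = prune_kv_cache(keys, values, attn)
print(f"kept {keep.sum()} of {keep.size} cached tokens")
```

Because such a scorer looks only at attention values rather than model weights, the same memory model can in principle be dropped into different Transformers, which is what makes the zero-shot transfer claimed in the article plausible.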

https://venturebeat.com/ai/new-llm-optimization-technique-slashes-memory-costs-up-to-75/
