MK-1

OpenAI, Anthropic, and Google can serve their large language models economically because they employ a range of techniques to optimize their inference stacks. MK-1 aims to bring similar capabilities to other companies running AI models. Its first product, MKML, is a software package that can cut LLM inference costs on GPUs by 2x with just a few lines of Python. MKML shrinks a model's memory footprint and improves inference time, and it works with popular ecosystems like Hugging Face and PyTorch. Depending on the use case, it can be tuned for either cost or speed. The compressed models retain high fidelity and integrate easily into existing workflows.
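MK-1 has not published how MKML compresses models, but the general idea of shrinking a model's memory footprint can be illustrated with simple int8 weight quantization. This is a generic sketch of the technique, not MKML's actual algorithm or API:

```python
import numpy as np

# Generic illustration of weight compression (not MKML's method):
# per-row int8 quantization stores float32 weights in a quarter of
# the memory, at the cost of a small reconstruction error.

def quantize_int8(w: np.ndarray):
    """Quantize each row of a float32 matrix to int8 with a per-row scale."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate float32 matrix from int8 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

ratio = w.nbytes / q.nbytes       # 4x: int8 vs float32 storage
err = np.abs(w - w_hat).max()     # worst-case per-weight error
print(f"compression {ratio:.0f}x, max abs error {err:.4f}")
```

Real systems layer further tricks on top (lower bit widths, sparsity, fused kernels), which is how a 2x cost reduction on actual GPU serving becomes plausible beyond the raw storage savings shown here.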

https://mkone.ai/blog/introducing-mk1