Cost Per 1M Tokens Of Self-Hosting Llama-3

Self-hosting Llama-3 8B-Instruct on EKS costs around $17 per 1M tokens, while ChatGPT charges about $1 per 1M tokens. Buying and running your own hardware can bring the cost below $0.01 per 1M tokens, but it takes roughly 5.5 years to break even. On AWS, a g4dn.2xlarge instance was not efficient because of out-of-memory (OOM) issues; upgrading to a g4dn.16xlarge improved response time. Adding vLLM with 4 GPUs improved performance. Self-hosted hardware with 4 NVIDIA Tesla T4s can reduce costs significantly. Despite the initial challenges, managing and scaling your own hardware could potentially undercut ChatGPT’s prices, though the comparison depends on actual utilization.
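The two figures in this summary, cost per 1M tokens and time to break even, come from simple arithmetic. The sketch below shows one way to compute them. The function names and every input value are illustrative assumptions, not numbers from the linked post, so substitute your own instance price, measured throughput, hardware cost, and monthly token volume.

```python
import math


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """USD per 1M generated tokens for a machine billed by the hour.

    Assumes the machine generates tokens at a steady rate for the whole hour,
    i.e. 100% utilization. Real utilization is usually lower, which raises the cost.
    """
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def break_even_years(
    hardware_cost_usd: float,
    api_price_per_million: float,
    self_host_price_per_million: float,
    tokens_per_month: float,
) -> float:
    """Years until purchased hardware pays for itself versus a paid API.

    Returns infinity when self-hosting is not cheaper per token.
    """
    saving_per_million = api_price_per_million - self_host_price_per_million
    monthly_saving = saving_per_million * tokens_per_month / 1_000_000
    if monthly_saving <= 0:
        return math.inf
    return hardware_cost_usd / monthly_saving / 12


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the functions.
    cloud = cost_per_million_tokens(hourly_cost_usd=3.60, tokens_per_second=100)
    years = break_even_years(
        hardware_cost_usd=1200,
        api_price_per_million=1.00,
        self_host_price_per_million=0.0,
        tokens_per_month=100_000_000,
    )
    print(f"cloud cost: ${cloud:.2f} per 1M tokens")
    print(f"break-even: {years:.1f} years")
```

Because break-even time scales inversely with monthly token volume, halving utilization doubles the payback period, which is why actual utilization matters so much to the comparison.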

https://blog.lytix.co/posts/self-hosting-llama-3
