Numbers every LLM Developer should know

At Google, Jeff Dean created a document called “Numbers every Engineer should know.” In the same spirit, Anyscale shares numbers that LLM developers should have at hand for back-of-envelope calculations.

On prompting and cost: appending “Be Concise” to a prompt can cut costs by 40-90%, since you pay per generated token and shorter completions are cheaper. LLMs operate on tokens rather than words, averaging roughly 1.3 tokens per word for English text. The cost ratio between GPT-4 and GPT-3.5 Turbo is approximately 50:1, which makes GPT-3.5 Turbo the more cost-effective option for tasks it handles well, such as summarization. Likewise, looking up information with a neural information retrieval system is far cheaper than asking an LLM to generate the same text.

On training: training a 13-billion-parameter model from scratch can cost around $1 million and take several weeks, whereas fine-tuning an existing model is comparatively inexpensive.

On serving: GPU memory is the key constraint when self-hosting, since the weights of an LLM alone consume tens of gigabytes. Batching requests can greatly improve throughput, and the memory required for generating output with a 13-billion-parameter model grows in direct proportion to the maximum number of tokens requested.

To learn more about building LLM applications with Ray, attend Ray Summit 2023.
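The 1.3 tokens-per-word average above makes quick cost estimates easy. A minimal sketch, with the caveat that the per-1K-token prices below are assumptions chosen for illustration (real pricing varies by vendor and by prompt vs. completion tokens, which is why the blended GPT-4 to GPT-3.5 Turbo ratio can land anywhere around the quoted ~50:1):

```python
# Back-of-envelope token and cost estimates. The 1.3 tokens/word figure
# is the average cited above; the prices below are ASSUMED illustrative
# values, not current vendor pricing.

TOKENS_PER_WORD = 1.3

# Hypothetical completion prices per 1K tokens (USD) -- assumptions.
PRICE_PER_1K = {"gpt-4": 0.06, "gpt-3.5-turbo": 0.002}

def estimate_tokens(num_words: int) -> int:
    """Rough token count for English prose."""
    return round(num_words * TOKENS_PER_WORD)

def estimate_cost(num_words: int, model: str) -> float:
    """Approximate cost of generating `num_words` worth of output."""
    return estimate_tokens(num_words) / 1000 * PRICE_PER_1K[model]

words = 750  # roughly one page of text
print(estimate_tokens(words))  # -> 975 tokens
print(estimate_cost(words, "gpt-4") / estimate_cost(words, "gpt-3.5-turbo"))
```

The exact ratio printed depends entirely on the assumed prices; the takeaway is that it is an order of magnitude or more, which is what makes routing summarization-style work to the cheaper model worthwhile.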
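The GPU-memory point can also be made concrete. This sketch assumes fp16 weights and a LLaMA-13B-like shape (40 layers, hidden size 5120); those architecture numbers are assumptions for illustration, not a spec of any particular model:

```python
# Serving-memory sketch for a 13B-parameter model.
# Assumptions: fp16 (2 bytes/param), 40 layers, hidden size 5120.

PARAMS = 13e9
BYTES_PER_PARAM = 2          # fp16
N_LAYERS, HIDDEN = 40, 5120  # assumed architecture

# Weights alone: ~26 GB, more than a single 24 GB consumer GPU holds.
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"weights: ~{weights_gb:.0f} GB")  # -> weights: ~26 GB

# Generation keeps a KV cache: keys + values, per layer, per token.
kv_bytes_per_token = 2 * N_LAYERS * HIDDEN * BYTES_PER_PARAM
print(f"KV cache: ~{kv_bytes_per_token / 1e6:.1f} MB per token")

# Cache memory grows linearly with the maximum tokens requested.
max_tokens = 512
print(f"cache for {max_tokens} tokens: "
      f"~{kv_bytes_per_token * max_tokens / 1e9:.1f} GB")
```

This is why the memory needed for generation scales directly with the maximum number of output tokens: every additional token adds a fixed-size slice to the KV cache.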
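Why does batching help throughput so much? Each forward pass pays a roughly fixed cost (moving weights, launching kernels) plus a small marginal cost per request, so amortizing the fixed cost over a batch multiplies requests per second. The timing constants in this toy model are made-up illustrative numbers, not measurements of any real system:

```python
# Toy latency model for batched LLM inference.
# FIXED_MS and PER_REQ_MS are ASSUMED values for illustration only.

FIXED_MS = 100.0   # per-pass overhead, paid once regardless of batch size
PER_REQ_MS = 5.0   # marginal cost of each additional request in the batch

def throughput(batch_size: int) -> float:
    """Requests served per second by one batched forward pass."""
    latency_ms = FIXED_MS + PER_REQ_MS * batch_size
    return batch_size / (latency_ms / 1000)

for bs in (1, 8, 32):
    print(f"batch={bs:2d}  {throughput(bs):6.1f} req/s")
```

Under these assumed constants, going from batch size 1 to 32 improves throughput by more than 10x, while per-request latency grows far more slowly.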
