Paddler
Paddler is an open-source load balancer and reverse proxy designed specifically for llama.cpp servers, which need their own request distribution strategy because each server handles requests through a limited number of slots. Paddler's load balancer is stateful: it tracks each server's available slots and relies on agents that monitor the individual llama.cpp instances. Servers can be added or removed dynamically, so Paddler can be integrated with autoscaling tools. Agents register llama.cpp instances by reporting their health status to the load balancer. To install Paddler, either download the latest release or build the project from source; building requires Go 1.21 or later and Node.js. Once running, the agents collect data from the llama.cpp instances and the load balancer acts as the reverse proxy. The roadmap includes a circuit breaker, an OpenTelemetry observer, and integration with cloud providers.
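The slot-aware balancing described above can be illustrated with a small Go sketch. This is not Paddler's actual code: the `Upstream` and `Balancer` types, the `Report`, `Remove`, and `Pick` methods, and the "most idle slots wins" rule are all assumptions made for illustration. It only shows the general idea of a stateful balancer that stores what agents report and routes each request to a healthy instance with a free slot.

```go
package main

import (
	"fmt"
	"sync"
)

// Upstream is a hypothetical record of the last status an agent
// reported for one llama.cpp instance.
type Upstream struct {
	Addr      string
	IdleSlots int
	Healthy   bool
}

// Balancer keeps the latest report per upstream; this stored state is
// what makes the balancer "stateful" in this sketch.
type Balancer struct {
	mu        sync.Mutex
	upstreams map[string]*Upstream
}

func NewBalancer() *Balancer {
	return &Balancer{upstreams: make(map[string]*Upstream)}
}

// Report registers a new upstream or overwrites the status of a known one.
func (b *Balancer) Report(addr string, idleSlots int, healthy bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.upstreams[addr] = &Upstream{Addr: addr, IdleSlots: idleSlots, Healthy: healthy}
}

// Remove drops an upstream, for example when an autoscaler terminates it.
func (b *Balancer) Remove(addr string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.upstreams, addr)
}

// Pick returns the healthy upstream with the most idle slots and
// reserves one slot on it. Ties are broken by address so the result
// is deterministic. It returns false when no slot is available.
func (b *Balancer) Pick() (string, bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	var best *Upstream
	for _, u := range b.upstreams {
		if !u.Healthy || u.IdleSlots <= 0 {
			continue
		}
		if best == nil || u.IdleSlots > best.IdleSlots ||
			(u.IdleSlots == best.IdleSlots && u.Addr < best.Addr) {
			best = u
		}
	}
	if best == nil {
		return "", false
	}
	best.IdleSlots--
	return best.Addr, true
}

func main() {
	b := NewBalancer()
	// Addresses and slot counts below are made-up example values.
	b.Report("10.0.0.1:8080", 1, true)
	b.Report("10.0.0.2:8080", 3, true)
	b.Report("10.0.0.3:8080", 4, false) // unhealthy, never picked

	for i := 0; i < 5; i++ {
		addr, ok := b.Pick()
		fmt.Println(addr, ok)
	}
}
```

In this sketch a slot is reserved as soon as an upstream is picked, and the next agent report simply overwrites the count. A real deployment would also need to release slots when requests finish and to expire upstreams whose agents stop reporting.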
https://github.com/distantmagic/paddler