How to generate tested software packages using LLMs, a sandbox and a while loop

devlooper is a program synthesis agent that autonomously fixes its own output by running tests. It runs the tests in a sandbox, an isolated environment from which it fetches the output, and iterates until all tests pass, updating the code and fixing the environment as necessary. Environment “templates” define the setup and test harness for each language or framework. In each iteration, the agent runs the test command and diagnoses any errors in a separate step that generates a DebugPlan. Using devlooper requires a Modal account and token, plus an OpenAI account and API key. The project is a proof of concept, with several possible directions for future improvement.
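
The run-diagnose-fix loop can be sketched in a few lines. This is only an illustration under stated assumptions, not devlooper's actual code: a local subprocess stands in for the Modal sandbox, a caller-supplied `diagnose` function stands in for the LLM step, and the `DebugPlan` fields and the names `run_tests` and `iterate_until_green` are hypothetical.

```python
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class DebugPlan:
    # Hypothetical shape: the summary names DebugPlan but not its fields.
    file_updates: dict[str, str] = field(default_factory=dict)  # relative path -> new contents
    install_commands: list[list[str]] = field(default_factory=list)  # environment fixes


def run_tests(workdir: Path, test_cmd: list[str]) -> tuple[bool, str]:
    """Run the test command in workdir and return (passed, combined output).

    Assumption: a plain subprocess replaces the isolated sandbox here.
    """
    proc = subprocess.run(test_cmd, cwd=workdir, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def iterate_until_green(
    workdir: Path,
    test_cmd: list[str],
    diagnose: Callable[[str], DebugPlan],
    max_fixes: int = 5,
) -> bool:
    """Run tests, and on failure apply a DebugPlan, for at most max_fixes plans."""
    for _ in range(max_fixes):
        passed, output = run_tests(workdir, test_cmd)
        if passed:
            return True
        # Separate diagnosis step: turn the test output into a plan.
        plan = diagnose(output)
        # Fix the environment first, then update the code.
        for cmd in plan.install_commands:
            subprocess.run(cmd, cwd=workdir, check=False)
        for rel_path, contents in plan.file_updates.items():
            target = workdir / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(contents)
    # Check once more so the last applied plan is also verified.
    return run_tests(workdir, test_cmd)[0]
```

In a real agent, `diagnose` would send the test output and the current source files to a model and parse the response into a plan. The cap on fix attempts is a design choice made for this sketch so that the loop terminates even if the tests never pass.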

https://github.com/modal-labs/devlooper