Smallpond – A lightweight data processing framework built on DuckDB and 3FS

smallpond is a lightweight data processing framework powered by DuckDB and 3FS. It delivers high-performance processing and can handle petabyte-scale datasets. It is also easy to operate, because it requires no long-running services. It supports Python 3.8 through 3.12. The quick start walks through downloading example data, initializing a session, loading and processing the data, and saving and displaying the results. The documentation includes detailed guides and an API reference. In a GraySort benchmark run on a cluster, smallpond reached an average throughput of 3.66 TiB/min. For contributors, the repository also describes development tasks such as running the unit tests and building the documentation. The project is released under the MIT License.
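The summary does not say what the processing step looks like. Distributed frameworks of this kind commonly split a dataset into partitions by a key and then run the same query on each partition independently. The Python sketch below illustrates that idea using only the standard library. It is not smallpond's API. The sample rows and the names `hash_partition`, `partial_min_max`, and `run` are hypothetical, and a small in-memory list stands in for the downloaded example data.

```python
from zlib import crc32

# Hypothetical (ticker, price) rows standing in for the downloaded example data.
ROWS = [
    ("AAPL", 101.0), ("MSFT", 250.5), ("AAPL", 99.5),
    ("GOOG", 140.0), ("MSFT", 248.0), ("GOOG", 142.5),
]


def hash_partition(rows, num_partitions):
    """Hypothetical helper: put every row with the same key into the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        # crc32 gives the same result on every run, unlike the built-in hash()
        # of a str, which is randomized per process.
        idx = crc32(key.encode("utf-8")) % num_partitions
        partitions[idx].append((key, value))
    return partitions


def partial_min_max(partition):
    """Hypothetical helper: aggregate one partition on its own, giving per-key (min, max)."""
    stats = {}
    for key, value in partition:
        lo, hi = stats.get(key, (value, value))
        stats[key] = (min(lo, value), max(hi, value))
    return stats


def run(rows, num_partitions=3):
    """Partition by key, aggregate each partition, then merge the partial results."""
    result = {}
    for part in hash_partition(rows, num_partitions):
        # A key never spans two partitions, so the partial results can be merged
        # with a plain dictionary update.
        result.update(partial_min_max(part))
    return result


if __name__ == "__main__":
    for ticker, (lo, hi) in sorted(run(ROWS).items()):
        print(ticker, lo, hi)
```

Because hashing sends all rows for a given ticker to the same partition, each partition's per-key result is already final. That is what allows the partitions to be processed independently, for example on different workers, without a second aggregation pass.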

https://github.com/deepseek-ai/smallpond