DeepSeek has been making waves since releasing its R1 model in January 2025, which was reported to rival or surpass OpenAI’s o1 on some benchmarks. Its efficient infrastructure cuts costs significantly without sacrificing performance. DeepSeek is now targeting data engineers with smallpond, a distributed compute framework built on DuckDB. smallpond evaluates lazily: operations are recorded first and computation is deferred until a result is actually needed, which avoids unnecessary work. For distribution, it relies on Ray to scale out across multiple nodes rather than scaling up a single machine. For storage, it uses 3FS (Fire-Flyer File System), DeepSeek’s custom distributed file system, which aims for higher speed and throughput than object stores such as AWS S3. While smallpond simplifies distributed computing, it may not be well optimized for complex queries. Overall, it’s exciting to see DuckDB being used creatively in AI-heavy workloads.
https://mehdio.substack.com/p/duckdb-goes-distributed-deepseeks
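
The deferred-computation idea summarized above can be illustrated with a minimal sketch in plain Python. This is not smallpond's API: the LazyFrame class, the hash_partition helper, and the sample rows are all hypothetical names invented for illustration. The sketch only shows the general pattern of recording operations first and running them when a result is requested, plus a simple hash partitioning step of the kind a scale-out engine might use to split rows across workers.

```python
import zlib


class LazyFrame:
    """Hypothetical lazily evaluated dataset (illustration only, not smallpond's API)."""

    def __init__(self, rows, ops=None):
        self._rows = rows
        self._ops = ops or []  # recorded operations, not yet executed

    def filter(self, pred):
        # Record the filter; no row is touched here.
        return LazyFrame(self._rows, self._ops + [("filter", pred)])

    def map(self, fn):
        # Record the transformation; no row is touched here.
        return LazyFrame(self._rows, self._ops + [("map", fn)])

    def collect(self):
        # Only now are the recorded operations applied, in order.
        out = list(self._rows)
        for kind, fn in self._ops:
            if kind == "filter":
                out = [r for r in out if fn(r)]
            else:
                out = [fn(r) for r in out]
        return out


def hash_partition(rows, key, n):
    """Assign each row to one of n partitions by a stable hash of row[key]."""
    parts = [[] for _ in range(n)]
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode()) % n
        parts[idx].append(row)
    return parts


calls = []  # tracks when the "expensive" step actually runs


def expensive(row):
    calls.append(row["ticker"])
    return {**row, "price_cents": round(row["price"] * 100)}


# Made-up sample data (assumption, for illustration).
rows = [
    {"ticker": "AAA", "price": 1.5},
    {"ticker": "BBB", "price": 2.0},
    {"ticker": "AAA", "price": 3.25},
]

plan = LazyFrame(rows).filter(lambda r: r["price"] > 1.75).map(expensive)
calls_before_collect = len(calls)  # still 0: building the plan ran nothing

result = plan.collect()            # the work happens here
parts = hash_partition(rows, "ticker", 2)
```

Building the plan is cheap because it only appends to a list of pending operations; the expensive function runs once per surviving row when collect() is called. Rows with the same key always land in the same partition, which is what lets each partition be processed independently.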