Daft: A High-Performance Distributed Dataframe Library for Multimodal Data

Daft is a distributed dataframe library for developers working with multimodal data, with an API that will feel familiar to anyone who has used pandas or polars. It natively supports complex types and memory formats such as images, and its Rust computation engine uses SIMD optimizations to get the most out of modern hardware. Daft also optimizes memory usage so that small clusters can handle large datasets efficiently, and its out-of-core processing lets users work with datasets of any size. Compared with other distributed dataframes such as Spark, Modin, and Dask, Daft stands out for its results on the TPC-H benchmark, which showcase its reliable processing of larger-than-memory datasets and its ease of use on cloud infrastructure.

https://blog.getdaft.io/p/introducing-daft-a-high-performance
