Arroyo, an open-source stream processing engine, is preparing its 0.10 release, which features a new SQL engine built on Apache Arrow and the DataFusion SQL toolkit. The update promises better performance, a simpler architecture, and better interaction with the wider open-source community. The shift to Arrow is meant to make Arroyo's performance competitive with top batch engines, significantly shrink its Docker image, and increase both pipeline throughput and startup speed. Arroyo was originally designed for a managed cloud service, and its move toward self-hosting has exposed challenges, including the user experience and its reliance on code generation. Adopting Arrow's columnar format instead of the static types Arroyo used before opens up new possibilities: more efficient data processing, multitasking, and smoother integration with other data systems and languages.
https://www.arroyo.dev/blog/why-arrow-and-datafusion
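To illustrate the row-versus-columnar distinction the post turns on, here is a minimal pure-Python sketch. It does not use Arrow or DataFusion. The field names and the helpers `to_columns` and `filter_batch` are hypothetical. Real Arrow arrays are typed, contiguous buffers processed by vectorized kernels, and plain Python lists only mimic that layout.

```python
# Row-oriented input: one record per event (hypothetical fields).
rows = [
    {"user": "a", "amount": 10},
    {"user": "b", "amount": 25},
    {"user": "a", "amount": 5},
]


def to_columns(records):
    """Transpose row-oriented records into a column-per-field layout."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def filter_batch(columns, predicate_field, predicate):
    """Evaluate a predicate over a single column to build a boolean mask,
    then apply that mask to every column in the batch."""
    mask = [predicate(v) for v in columns[predicate_field]]
    return {
        field: [v for v, keep in zip(values, mask) if keep]
        for field, values in columns.items()
    }


batch = to_columns(rows)
large = filter_batch(batch, "amount", lambda a: a > 8)
total = sum(large["amount"])
print(batch)
print(large)
print(total)
```

The point of the columnar shape is that an operator such as a filter or a sum reads one field's values in sequence rather than visiting every field of every record. This is the access pattern that Arrow-based engines exploit with vectorized execution.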