Building and scaling Notion’s data lake

Notion's data has grown tremendously over the past three years, doubling every 6-12 months as user activity and content creation increased. To meet the demands of critical use cases, especially AI features, Notion built and scaled a data lake for efficient data management. All data in Notion, from text to images, is stored as “blocks” in Postgres, and the total exceeded 200 billion blocks by 2023. As scaling challenges mounted, Notion built the data lake in-house using Spark, S3, Kafka, and Hudi. The move resulted in significant cost savings and improved data freshness, and it supported the launch of Notion AI features.

https://www.notion.so/de-de/blog/building-and-scaling-notions-data-lake
