This article surveys the file formats and storage layers used in data lakes and data lakehouses. It gives an overview of file formats such as Avro, Parquet, ORC, Arrow, and Feather, highlighting the features that distinguish each. It then covers higher-level storage layers such as the Hive format, Iceberg, and Delta Lake, which provide metadata management and schema evolution. The author explains how Iceberg and Delta Lake differ and how these storage layers enable partitioning, schema evolution, data compression, and query optimization. The article also explains what data lakes and data lakehouses are, emphasizing the role of scalable storage and query engines, and gives Databricks and Dremio as examples of data lakehouse products.
https://davidgomes.com/understanding-parquet-iceberg-and-data-lakehouses-at-broad/