Apache Iceberg: open table format for data lakehouse

Apache Iceberg (2020-2021): a table format that brings schema evolution, time travel and ACID transactions to Parquet files in object storage. It is one of the three lakehouse standards, alongside Delta Lake and Apache Hudi, and graduated to an Apache Software Foundation top-level project in 2020.


Data lakes with ACID

Data lakes built on Parquet/ORC files in S3/HDFS have limits: no multi-file ACID transactions, no controlled schema evolution, no time travel, and poor query performance when a table spans huge file listings. Table formats add a metadata layer above the files to address these issues.
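To make the idea of a metadata layer concrete, here is a minimal sketch in plain Python. It is a toy model, not Iceberg's real API or file layout: the names ToyTable, Snapshot, commit and files are hypothetical. It assumes each commit records the complete list of visible data files, and that publishing a commit is a single pointer swap guarded by an optimistic check.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    data_files: Tuple[str, ...]  # every file visible in this table version


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table-format metadata layer."""
    snapshots: List[Snapshot] = field(default_factory=list)
    current_id: Optional[int] = None  # pointer to the current snapshot

    def files(self, as_of_ms: Optional[int] = None) -> Tuple[str, ...]:
        """Files of the current snapshot, or of the latest one at as_of_ms."""
        if as_of_ms is None:
            visible = [s for s in self.snapshots if s.snapshot_id == self.current_id]
        else:
            # Assumes snapshots were committed in increasing timestamp order.
            visible = [s for s in self.snapshots if s.timestamp_ms <= as_of_ms]
        return visible[-1].data_files if visible else ()

    def commit(self, expected_id: Optional[int], added: Sequence[str],
               removed: Sequence[str] = (), timestamp_ms: int = 0) -> int:
        """Publish a new snapshot if nobody else committed in the meantime."""
        if expected_id != self.current_id:
            raise RuntimeError("table changed since read; retry on the new snapshot")
        kept = tuple(f for f in self.files() if f not in removed)
        snap = Snapshot(len(self.snapshots) + 1, timestamp_ms, kept + tuple(added))
        self.snapshots.append(snap)
        self.current_id = snap.snapshot_id  # one pointer swap makes all files visible
        return snap.snapshot_id


table = ToyTable()
s1 = table.commit(None, ["a.parquet", "b.parquet"], timestamp_ms=1000)
s2 = table.commit(s1, ["c.parquet"], removed=["a.parquet"], timestamp_ms=2000)

print(table.files())               # current table state
print(table.files(as_of_ms=1500))  # state as of an earlier point in time
```

Because old snapshots are never modified, a reader that started on the first snapshot keeps seeing the same file list while the second commit is published, and a writer that based its work on a stale snapshot is rejected instead of silently overwriting newer data.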

Apache Iceberg was created by Ryan Blue and Daniel Weeks at Netflix starting in 2017, donated to the Apache Software Foundation in 2018, and graduated to a top-level project (TLP) in May 2020. It is released under the Apache 2.0 licence.

Features

  • Snapshot isolation — each commit produces a snapshot; readers see a consistent point in time
  • Time travel — queries on historical snapshots (in Spark SQL, for example: SELECT ... TIMESTAMP AS OF '2021-01-15'; the exact syntax depends on the engine)
  • Schema evolution — add/rename/drop column with backward compatibility
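  • Partition evolution — change the partitioning scheme without rewriting existing data
  • Hidden partitioning — Iceberg derives partition values from column values (for example, the day of a timestamp), so queries filter on the original column and need no separate partition column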
  • Row-level operations — efficient UPDATE, DELETE, MERGE
  • ACID transactions — atomic commits across multiple files
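Hidden partitioning and partition evolution from the list above can be sketched in a few lines of plain Python. This is an illustrative model under stated assumptions, not Iceberg code: the helpers write and plan_scan and the file names are hypothetical, and the day and month transforms are modelled as days and months since 1970-01-01 UTC. Each file remembers the partition spec it was written under, so changing the spec leaves old files untouched.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc(*args: int) -> datetime:
    """Build a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Days since 1970-01-01 (UTC)."""
    return (ts - EPOCH).days


def month_transform(ts: datetime) -> int:
    """Months since 1970-01."""
    return (ts.year - 1970) * 12 + (ts.month - 1)


TRANSFORMS: Dict[str, Callable[[datetime], int]] = {
    "day": day_transform,
    "month": month_transform,
}

# Hypothetical file-level metadata: path, spec used at write time, partition value.
data_files: List[dict] = []


def write(path: str, event_ts: datetime, spec: str) -> None:
    """Record a data file; the partition value is derived, never user-supplied."""
    data_files.append(
        {"path": path, "spec": spec, "partition": TRANSFORMS[spec](event_ts)}
    )


def plan_scan(event_ts: datetime) -> List[str]:
    """Files that may hold rows for event_ts, evaluated per file's own spec."""
    return [
        f["path"]
        for f in data_files
        if f["partition"] == TRANSFORMS[f["spec"]](event_ts)
    ]


# Table starts out partitioned by month ...
write("m-2021-01.parquet", utc(2021, 1, 10), "month")
write("m-2021-02.parquet", utc(2021, 2, 3), "month")
# ... then evolves to daily partitions; the monthly files are not rewritten.
write("d-2021-03-01.parquet", utc(2021, 3, 1), "day")
write("d-2021-03-02.parquet", utc(2021, 3, 2), "day")

print(plan_scan(utc(2021, 1, 15)))        # pruned with the month spec
print(plan_scan(utc(2021, 3, 2, 8, 30)))  # pruned with the day spec
```

The query side only ever mentions the timestamp itself; the translation into partition values happens per file, which is what allows old and new layouts to coexist in one table.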

Compute engines

Iceberg is decoupled from the compute engine. It is natively supported by:

  • Apache Spark
  • Trino / Presto
  • Apache Flink
  • Amazon Athena, Google BigQuery (external tables)
  • DuckDB, Dremio

The three table formats

Three main lakehouse table formats are establishing themselves:

  • Delta Lake (Databricks) — open source under the Linux Foundation, more closely tied to Spark
  • Apache Iceberg — neutral, multi-engine support
  • Apache Hudi — Uber-originated, streaming upsert focus

Iceberg’s “vendor-neutral” positioning is a key strength versus Delta Lake.

In the Italian context

In Italy, adoption is concentrated in organisations with mature data lakes: banks, telecom operators, large retailers and research institutes.


References: Apache Iceberg. Ryan Blue, Daniel Weeks, Netflix (2017). Apache TLP (May 2020). Apache 2.0 licence. Alternatives: Delta Lake (Databricks), Apache Hudi (Uber).
