Data lakes with ACID

Data lakes built on Parquet/ORC files in S3 or HDFS have limits: no multi-file ACID transactions, no controlled schema evolution, no time travel, and slow query planning when a table spans huge file listings. Table formats add a metadata layer above the files to solve these issues.
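To make the idea of a metadata layer concrete, here is a minimal sketch in Python. It is a toy model, not Iceberg's actual metadata format or API: the names `Snapshot`, `ToyTable`, `commit` and `as_of` are hypothetical. It assumes each table version is an immutable list of data files, and that a commit succeeds only if no other writer has moved the "current" pointer in the meantime.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One immutable table version: the complete list of data files."""
    snapshot_id: int
    timestamp_ms: int
    data_files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table's metadata layer."""
    snapshots: List[Snapshot] = field(default_factory=list)
    current_id: Optional[int] = None  # the single pointer a commit swaps

    def current(self) -> Optional[Snapshot]:
        for snap in self.snapshots:
            if snap.snapshot_id == self.current_id:
                return snap
        return None

    def commit(self, base_id: Optional[int], added: Sequence[str],
               removed: Sequence[str], timestamp_ms: int) -> Snapshot:
        # Optimistic concurrency: refuse the commit if another writer
        # already replaced the snapshot this change was based on.
        if base_id != self.current_id:
            raise RuntimeError("stale base snapshot, retry the commit")
        base = self.current()
        base_files = base.data_files if base else ()
        files = tuple(f for f in base_files if f not in removed) + tuple(added)
        snap = Snapshot(len(self.snapshots) + 1, timestamp_ms, files)
        # All file additions and removals become visible in one step.
        self.snapshots.append(snap)
        self.current_id = snap.snapshot_id
        return snap

    def as_of(self, timestamp_ms: int) -> Optional[Snapshot]:
        """Time travel: latest snapshot committed at or before the timestamp."""
        candidates = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        return max(candidates, key=lambda s: s.timestamp_ms) if candidates else None


table = ToyTable()
s1 = table.commit(None, added=["a.parquet", "b.parquet"], removed=[], timestamp_ms=1000)
s2 = table.commit(s1.snapshot_id, added=["c.parquet"], removed=["a.parquet"], timestamp_ms=2000)

print(table.current().data_files)   # files visible to new readers
print(table.as_of(1500).data_files) # files visible "as of" an earlier time
```

A reader that resolved `s1` keeps seeing the same file list even after `s2` is committed, which is the essence of snapshot isolation. A real table format stores this metadata durably and relies on a catalog for the atomic pointer swap.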

Apache Iceberg was created by Ryan Blue and Daniel Weeks at Netflix, starting in 2017. It was donated to the Apache Software Foundation in 2018 and graduated to a Top-Level Project (TLP) in May 2020. It is released under the Apache 2.0 licence.

Features

Snapshot isolation: each commit produces a snapshot; readers see a consistent point in time
Time travel: queries on historical snapshots (in Spark SQL, for example, SELECT ... TIMESTAMP AS OF '2021-01-15'; the exact syntax varies by engine)
Schema evolution: add, rename or drop columns while keeping existing data readable
Partition evolution: change the partitioning strategy without rewriting existing data
Hidden partitioning: Iceberg derives partition values from column values (for example, the day of a timestamp), so queries do not need to reference partition columns
Row-level operations: efficient UPDATE, DELETE, MERGE
ACID transactions: atomic commits across multiple files
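Hidden partitioning can be sketched in a few lines of Python. This is an illustrative toy, not Iceberg code: `day_transform`, `write` and `scan` are hypothetical names, and the row layout is an assumption. The sketch derives a partition value (days since 1970-01-01 UTC) from a timestamp column at write time, then uses the same transform on the query's filter to skip partitions that cannot match.

```python
from collections import defaultdict
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Partition value derived from a column value: days since the epoch (UTC)."""
    return (ts - EPOCH).days


def write(rows):
    """Group rows by the derived partition value; writers never supply it."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[day_transform(row["event_ts"])].append(row)
    return dict(partitions)


def scan(partitions, start: datetime, end: datetime):
    """Return rows with start <= event_ts < end, plus the number of partitions read.

    The filter is written against event_ts only; pruning applies the same
    transform to the filter bounds to decide which partitions to open.
    """
    lo, hi = day_transform(start), day_transform(end)
    touched = [day for day in partitions if lo <= day <= hi]
    rows = [r for day in touched for r in partitions[day]
            if start <= r["event_ts"] < end]
    return rows, len(touched)


def utc(*args) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


events = [
    {"id": 1, "event_ts": utc(2021, 1, 14, 23, 0)},
    {"id": 2, "event_ts": utc(2021, 1, 15, 8, 0)},
    {"id": 3, "event_ts": utc(2021, 1, 15, 20, 0)},
    {"id": 4, "event_ts": utc(2021, 1, 16, 1, 0)},
]
partitions = write(events)
rows, partitions_read = scan(partitions, utc(2021, 1, 15, 6, 0), utc(2021, 1, 15, 22, 0))
print([r["id"] for r in rows], partitions_read, "of", len(partitions))
```

Because the query never names a partition column, the partitioning scheme can later change (say, from daily to hourly) without queries being rewritten, which is what makes partition evolution practical.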

Compute engines

Iceberg is decoupled from the compute engine. It is natively supported by:

Apache Spark
Trino / Presto
Apache Flink
Amazon Athena, Google BigQuery (external tables)
DuckDB, Dremio

The three table formats

Three main lakehouse table formats are establishing themselves:

Delta Lake (Databricks): open source under the Linux Foundation, most closely tied to Spark
Apache Iceberg: neutral, multi-engine support
Apache Hudi: originated at Uber, focused on streaming upserts

Iceberg’s “vendor-neutral” positioning is a key strength versus Delta Lake.

In the Italian context

In Italy, adoption is concentrated in organisations with mature data lakes: banks, telcos, large retailers and research institutes.

References: Apache Iceberg. Ryan Blue, Daniel Weeks, Netflix (2017). Apache TLP (May 2020). Apache 2.0 licence. Alternatives: Delta Lake (Databricks), Apache Hudi (Uber).

Apache Iceberg: open table format for data lakehouse