ELT instead of ETL
The cloud data warehouse paradigm (Snowflake, BigQuery, Redshift) and lakehouses make SQL storage and processing low-cost. The modern pattern is ELT — Extract, Load (raw data into the warehouse), then Transform in SQL directly in the DB. The Transform layer needs software engineering practices: versioning, testing, documentation, modularity.
dbt — data build tool — was created by Tristan Handy and Drew Banin at Fishtown Analytics (later renamed dbt Labs) from 2016. Apache 2.0 licence for dbt Core. Version 0.15 (December 2019) consolidates production maturity.
What dbt does
A dbt project is a directory of SQL files with Jinja2 templating:
-- models/orders_summary.sql
SELECT
customer_id,
COUNT(*) as total_orders,
SUM(amount) as total_revenue
FROM {{ ref('raw_orders') }}
WHERE status = 'completed'
GROUP BY customer_id
dbt run compiles models, resolves ref() and source(), generates execution topological order, executes SQL against the warehouse.
Features
- Version control — every transformation is Git code
- Testing —
dbt testverifies data constraints (unique, not_null, accepted_values, relationships) - Documentation —
dbt docsgenerates a static site with graphical model lineage - Incremental models — efficient refresh of large tables
- Snapshots — automatic SCD Type 2
- Seeds — static CSVs loaded as reference data
- Macros — reusable Jinja SQL functions
- Packages — npm/PyPI-like dependencies
DB adapters
dbt is database-agnostic via adapters:
- dbt-snowflake, dbt-bigquery, dbt-redshift, dbt-databricks
- dbt-postgres, dbt-spark
- dbt-duckdb (popular for local dev)
Modern Data Stack
dbt is the “T” of the Modern Data Stack:
- Extract/Load — Fivetran, Airbyte, Meltano
- Transform — dbt
- Warehouse — Snowflake, BigQuery, Redshift, Databricks
- BI — Looker, Metabase, Superset
dbt Labs
dbt Labs (ex Fishtown Analytics) is a VC-funded startup with commercial dbt Cloud product (SaaS) alongside open source dbt Core. Open core model.
In the Italian context
Italian adoption in companies with modern data warehouses: banks, retail, e-commerce, telco. Active Italian dbt meetup community from 2021.
References: dbt Core. Tristan Handy, Drew Banin, Fishtown Analytics/dbt Labs (2016). Apache 2.0 licence. Jinja2 templating. Modern Data Stack. dbt Cloud (commercial).
