dbt Core: versioned SQL data transformation

dbt (data build tool) Core: data transformation framework by Fishtown Analytics (later dbt Labs), 2016+. SQL with Jinja2, Git version control, testing, auto-generated documentation. De facto standard of the modern data stack.

Open SourceR&D dbtData TransformationAnalytics EngineeringSQLdbt LabsOpen Source

ELT instead of ETL

The cloud data warehouse paradigm (Snowflake, BigQuery, Redshift) and lakehouses make SQL storage and processing low-cost. The modern pattern is ELTExtract, Load (raw data into the warehouse), then Transform in SQL directly in the DB. The Transform layer needs software engineering practices: versioning, testing, documentation, modularity.

dbtdata build tool — was created by Tristan Handy and Drew Banin at Fishtown Analytics (later renamed dbt Labs) from 2016. Apache 2.0 licence for dbt Core. Version 0.15 (December 2019) consolidates production maturity.

What dbt does

A dbt project is a directory of SQL files with Jinja2 templating:

-- models/orders_summary.sql
SELECT
    customer_id,
    COUNT(*) as total_orders,
    SUM(amount) as total_revenue
FROM {{ ref('raw_orders') }}
WHERE status = 'completed'
GROUP BY customer_id

dbt run compiles models, resolves ref() and source(), generates execution topological order, executes SQL against the warehouse.

Features

  • Version control — every transformation is Git code
  • Testingdbt test verifies data constraints (unique, not_null, accepted_values, relationships)
  • Documentationdbt docs generates a static site with graphical model lineage
  • Incremental models — efficient refresh of large tables
  • Snapshots — automatic SCD Type 2
  • Seeds — static CSVs loaded as reference data
  • Macros — reusable Jinja SQL functions
  • Packages — npm/PyPI-like dependencies

DB adapters

dbt is database-agnostic via adapters:

  • dbt-snowflake, dbt-bigquery, dbt-redshift, dbt-databricks
  • dbt-postgres, dbt-spark
  • dbt-duckdb (popular for local dev)

Modern Data Stack

dbt is the “T” of the Modern Data Stack:

  • Extract/Load — Fivetran, Airbyte, Meltano
  • Transform — dbt
  • Warehouse — Snowflake, BigQuery, Redshift, Databricks
  • BI — Looker, Metabase, Superset

dbt Labs

dbt Labs (ex Fishtown Analytics) is a VC-funded startup with commercial dbt Cloud product (SaaS) alongside open source dbt Core. Open core model.

In the Italian context

Italian adoption in companies with modern data warehouses: banks, retail, e-commerce, telco. Active Italian dbt meetup community from 2021.


References: dbt Core. Tristan Handy, Drew Banin, Fishtown Analytics/dbt Labs (2016). Apache 2.0 licence. Jinja2 templating. Modern Data Stack. dbt Cloud (commercial).

Need support? Under attack? Service Status
Need support? Under attack? Service Status