From incubation to top-level project
In January 2019, Apache Airflow is promoted to a top-level project of the Apache Software Foundation, completing the incubation process begun in 2016. The promotion reflects maturity in both code and community: hundreds of companies run Airflow in production to orchestrate data pipelines, and the contributor base has grown past seven hundred active developers.
Since its introduction as a tool for defining workflows as DAGs (Directed Acyclic Graphs) in Python code, Airflow has consolidated its role as the reference platform for batch process orchestration in the data world.
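The "workflow as Python code" model is easiest to see in a minimal DAG file. This is an illustrative sketch using the Airflow 1.10-era API; the DAG id, task names and schedule are assumptions, not taken from the text, and the file only does something useful inside an Airflow installation:

```python
# Minimal sketch of a DAG (Airflow 1.10-era API; names are illustrative).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="daily_etl",                 # hypothetical pipeline name
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
)

extract = BashOperator(task_id="extract", bash_command="echo extract", dag=dag)
load = BashOperator(task_id="load", bash_command="echo load", dag=dag)

# The >> operator declares an edge of the acyclic graph:
# "load" runs only after "extract" has succeeded.
extract >> load
```

The scheduler parses files like this, builds the graph, and runs each task when its upstream dependencies are satisfied.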
KubernetesExecutor and scalability
The most significant architectural addition is the KubernetesExecutor, which runs each DAG task as an isolated Kubernetes pod. Every task receives its own container with specific dependencies, allocated resources and process isolation. When execution finishes, the pod is destroyed. This model eliminates the need to maintain a fixed worker pool and enables elastic scalability: the cluster allocates resources only when tasks are running.
The KubernetesExecutor sits alongside the CeleryExecutor, which remains the preferred choice when task startup latency is critical. The ability to choose the executor based on workload makes Airflow adaptable to different scenarios — from a single server to a distributed cluster.
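Switching executor is a single cluster-wide setting (`executor = KubernetesExecutor` under `[core]` in `airflow.cfg`), while per-task pod customisation goes through `executor_config`. The snippet below is a sketch of the 1.10-era per-task configuration; the image name, resource values and task are hypothetical:

```python
# Hypothetical task with per-pod overrides (Airflow 1.10 KubernetesExecutor
# syntax; requires `executor = KubernetesExecutor` in airflow.cfg).
from airflow.operators.python_operator import PythonOperator

def train():
    pass  # placeholder for the real work

train_task = PythonOperator(
    task_id="train_model",
    python_callable=train,
    executor_config={
        "KubernetesExecutor": {
            "image": "my-registry/train:latest",  # dedicated dependencies
            "request_memory": "1Gi",              # resources for this pod only
            "request_cpu": "500m",
        }
    },
    dag=dag,  # assumes a DAG object defined elsewhere in the file
)
```

Each such task gets its own pod with exactly these resources and this image; when the task finishes, the pod is torn down.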
Connections, pools and web interface
The connections management system centralises credentials for databases, APIs and external services. Pools limit the number of concurrent tasks that can access a shared resource, preventing overload on databases or capacity-constrained services.
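In a DAG, both mechanisms appear as plain task parameters: a connection is referenced by id, a pool by name. A sketch, assuming a connection `warehouse_db` and a pool `warehouse_pool` have been created beforehand in the UI or via the CLI (both names are hypothetical):

```python
# Hypothetical task using a centralised connection and a concurrency pool
# (Airflow 1.10-era operator import path).
from airflow.operators.postgres_operator import PostgresOperator

aggregate = PostgresOperator(
    task_id="aggregate_daily",
    postgres_conn_id="warehouse_db",  # credentials resolved from the metastore
    sql="SELECT 1",                   # placeholder query
    pool="warehouse_pool",            # caps concurrent tasks hitting the DB
    dag=dag,                          # assumes a DAG object defined elsewhere
)
```

If the pool has, say, 5 slots, at most 5 tasks assigned to it run at once; the rest queue, which is what protects the capacity-constrained database.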
The web interface has matured into a complete operational tool: DAG status visualisation, Gantt charts for analysing execution times, per-task logs, and manual retry management. For data engineering teams managing hundreds of pipelines, the visibility the UI provides is an operational requirement, not an optional extra.
Plugins and industrial adoption
The plugin system allows extending Airflow with custom operators, hooks to external systems, sensors and macros. The community has produced hundreds of operators for cloud services — AWS, Google Cloud, Azure — databases, messaging systems and machine learning platforms. This extensibility has transformed Airflow from a scheduling tool into a generic orchestration platform.
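The extension mechanism amounts to subclassing the base abstractions and registering them through a plugin class. A sketch of the Airflow 1.10 plugin API; the operator, its behaviour and all names are illustrative assumptions:

```python
# Sketch of a plugin exposing a custom operator (Airflow 1.10 plugin API;
# class and plugin names are hypothetical).
from airflow.models import BaseOperator
from airflow.plugins_manager import AirflowPlugin
from airflow.utils.decorators import apply_defaults


class NotifyOperator(BaseOperator):
    """Hypothetical operator: logs a message when executed."""

    @apply_defaults
    def __init__(self, message, *args, **kwargs):
        super(NotifyOperator, self).__init__(*args, **kwargs)
        self.message = message

    def execute(self, context):
        # A real implementation would call an external service via a hook.
        self.log.info("Notifying: %s", self.message)


class MyCompanyPlugin(AirflowPlugin):
    name = "my_company_plugin"
    operators = [NotifyOperator]
```

Dropped into the plugins folder, the operator becomes importable in any DAG file, exactly like the built-in ones.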
Adoption spans diverse use cases: ETL into data warehouses, feeding machine learning pipelines, cross-system synchronisation, and report generation. For organisations that need to coordinate dozens of interdependent processes with reliability, retry and monitoring requirements, Airflow provides a consolidated infrastructure and an active community.
Link: airflow.apache.org
