Polars: DataFrames in Rust, pandas successor

Polars (Ritchie Vink, 2020-2022) is a DataFrame library written in Rust with Python bindings: columnar, lazily evaluated, and backed by a query optimiser. It is reported to be 10-100x faster than pandas on large datasets and offers its own fluent, expression-based API.


pandas limits

pandas (Wes McKinney, 2008) is the dominant Python DataFrame library, but it shows its limits at modern data scales:

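  • Single-threaded by default: most operations do not use multi-core CPUs
  • Eager evaluation: every operation runs immediately, so a chain of operations cannot be optimised as a whole
  • Memory: the whole DataFrame is loaded into RAM, which causes problems when the data exceeds available memory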

Polars, started in 2020 by the Dutch engineer Ritchie Vink, addresses these limits with a new design written in Rust:

  • Rust core built on the Apache Arrow columnar memory format
  • Lazy evaluation: the operation chain is optimised before execution
  • Multi-threaded by default
  • Streaming: datasets larger than RAM are processed in batches
  • Query optimiser: predicate pushdown (filter as early as possible) and projection pushdown (read only the columns needed)

Polars is released under the MIT licence. Versions 0.14-0.15 (autumn 2022) consolidated its production maturity; version 1.0 followed in July 2024.

API

Polars has two modes:

Eager (familiar from pandas):

import polars as pl

# read_csv loads the whole file into memory immediately
df = pl.read_csv("data.csv")
# each step runs as soon as it is called
result = df.filter(pl.col("age") > 30).group_by("country").agg(pl.col("salary").mean())

Lazy (optimised):

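import polars as pl

result = (
    pl.scan_csv("data.csv")            # returns a LazyFrame; nothing is read yet
      .filter(pl.col("age") > 30)
      .group_by("country")
      .agg(pl.col("salary").mean())
      .collect()                       # optimise the plan, then execute it
)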

In lazy mode, Polars builds an execution plan, optimises it, and only then executes it, so work that the result does not need is skipped.

Performance

Public benchmarks (TPC-H, the H2O.ai db-benchmark) report Polars as 10-100x faster than pandas on medium to large datasets, depending on the query. On a single node it is competitive with Spark; for distributed workloads, Dask or Ray remain the usual options.

Interoperability

Polars integrates with:

  • pandas: df.to_pandas() and pl.from_pandas()
  • NumPy
  • Arrow (shared data format with Spark, DuckDB, others)
  • Parquet, CSV, JSON, Avro
  • PyArrow, pyarrow-flight

In the Italian context

Italian data teams have adopted Polars rapidly since 2023 for workloads where pandas is too slow but Spark is overkill.


References: Polars. Ritchie Vink. Rust + Python bindings. MIT licence. Arrow format. Lazy evaluation + query optimiser. 1.0 (July 2024). Modern alternative to pandas.
