Platform Documentation

Documentation for database platforms supported by BenchBox.

Platform Guides

DataFrame Platforms (Native API)

BenchBox supports benchmarking DataFrame libraries using their native APIs instead of SQL. This enables direct performance comparison between SQL and DataFrame paradigms on identical workloads.

Available DataFrame Platforms

Platform     CLI Name        Family       Status            Documentation
Polars       polars-df       Expression   Production-ready  Polars DataFrame
Pandas       pandas-df       Pandas       Production-ready  Pandas DataFrame
Modin        modin-df        Pandas       Production-ready  Modin DataFrame
Dask         dask-df         Pandas       Production-ready  Dask DataFrame
cuDF         cudf-df         Pandas       Production-ready  cuDF DataFrame
PySpark      pyspark-df      Expression   Production-ready  PySpark DataFrame
DataFusion   datafusion-df   Expression   Production-ready  DataFusion DataFrame

# Quick start with DataFrame platforms
benchbox run --platform polars-df --benchmark tpch --scale 0.1    # Recommended - fast
benchbox run --platform pandas-df --benchmark tpch --scale 0.1    # Familiar API
benchbox run --platform modin-df --benchmark tpch --scale 0.1     # Parallel Pandas
benchbox run --platform dask-df --benchmark tpch --scale 0.1      # Distributed
benchbox run --platform cudf-df --benchmark tpch --scale 0.1      # GPU (Linux only)
benchbox run --platform pyspark-df --benchmark tpch --scale 0.1   # Spark ecosystem
benchbox run --platform datafusion-df --benchmark tpch --scale 0.1

# Compare SQL vs DataFrame on same workload
benchbox run --platform polars --benchmark tpch --scale 0.1       # SQL mode
benchbox run --platform polars-df --benchmark tpch --scale 0.1    # DataFrame mode
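
To run the same benchmark on several DataFrame platforms in one pass, the commands above can be wrapped in a small shell loop. The sketch below is an illustration and not part of BenchBox: the function name `run_df_platforms` and the `DRY_RUN` variable are hypothetical, and it uses only the `benchbox run` flags already shown in this guide. With `DRY_RUN=1` it prints each command instead of executing it.

```shell
# Hypothetical helper (not shipped with BenchBox): run one benchmark at one
# scale factor on each platform name passed as an argument.
run_df_platforms() {
    local benchmark="$1" scale="$2"
    shift 2
    local platform
    for platform in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: print the command that would be executed.
            echo "benchbox run --platform $platform --benchmark $benchmark --scale $scale"
        else
            # Continue with the remaining platforms if one run fails.
            benchbox run --platform "$platform" --benchmark "$benchmark" --scale "$scale" \
                || echo "FAILED: $platform" >&2
        fi
    done
}
```

For example, `DRY_RUN=1 run_df_platforms tpch 0.1 polars-df pandas-df` prints the two commands without running them. Drop `DRY_RUN=1` to execute the runs.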

SQL Platforms

Core Local Databases

These platforms are included in the base BenchBox installation with no additional dependencies:

  • DuckDB - Embedded analytical database (default local platform)

  • SQLite - Embedded row-store database for lightweight testing

Local/Embedded Analytics Engines

Traditional Relational Databases

  • PostgreSQL - Open-source relational database with TimescaleDB support

Distributed SQL Engines

  • PrestoDB - Distributed SQL query engine (the original Presto project, started at Facebook)

  • Trino - Distributed SQL query engine (community fork, formerly PrestoSQL)

  • Apache Spark - Unified analytics engine for large-scale data processing

Cloud Data Warehouses

GPU-Accelerated Platforms

  • cuDF - NVIDIA RAPIDS GPU-accelerated DataFrames

Platform Categories

By Installation Complexity

Zero Config (included in base install):

  • DuckDB, SQLite

Single Extra (one pip install command):

  • DataFusion, Polars, PostgreSQL, ClickHouse

Cloud SDK Required (authentication setup needed):

  • Databricks, BigQuery, Redshift, Snowflake, Athena, Firebolt, Azure Synapse

Infrastructure Required (external cluster needed):

  • Trino, Presto, Spark, ClickHouse (server mode)

By Use Case

Local Development & Testing:

  • DuckDB (recommended), SQLite, DataFusion, Polars

Production Benchmarking:

  • Databricks, Snowflake, BigQuery, Redshift

Self-Hosted Analytics:

  • ClickHouse, PostgreSQL, Trino, Presto, Spark

GPU Workloads:

  • cuDF (NVIDIA GPUs required)

Quick Start by Platform

Local Platforms (No Setup Required)

# DuckDB - Default, included in base install
benchbox run --platform duckdb --benchmark tpch --scale 0.01

# SQLite - Included in base install
benchbox run --platform sqlite --benchmark tpch --scale 0.01

Cloud Platforms (Credentials Required)

# Databricks - Requires DATABRICKS_TOKEN and DATABRICKS_HOST
benchbox run --platform databricks --benchmark tpch --scale 1.0

# BigQuery - Requires GOOGLE_APPLICATION_CREDENTIALS
benchbox run --platform bigquery --benchmark tpch --scale 1.0

# Snowflake - Requires SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_ACCOUNT
benchbox run --platform snowflake --benchmark tpch --scale 1.0
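
Because each cloud platform depends on the environment variables named in the comments above, it can help to check for them before starting a run. The following is a minimal sketch, not a BenchBox feature: `require_env` is a hypothetical helper name, and it only verifies that the variables are exported and non-empty, not that the credentials are valid.

```shell
# Hypothetical pre-flight check (not part of BenchBox): return non-zero if any
# named environment variable is unset, empty, or not exported.
require_env() {
    local missing=0 name
    for name in "$@"; do
        # printenv only sees exported variables, which is what a child
        # process such as benchbox would inherit.
        if [ -z "$(printenv "$name")" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A run can then be guarded with the check, for example: `require_env SNOWFLAKE_USER SNOWFLAKE_PASSWORD SNOWFLAKE_ACCOUNT && benchbox run --platform snowflake --benchmark tpch --scale 1.0`.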

Future Platforms