# Platform Documentation
Documentation for database platforms supported by BenchBox.
## Platform Guides

- Platform Selection Guide - Choose the right platform for your needs
- Quick Reference - Quick comparison and feature matrix
- Comparison Matrix - Detailed platform comparison
- Deployment Modes - Platform deployment architecture (local, self-hosted, cloud)
## DataFrame Platforms (Native API)
BenchBox supports benchmarking DataFrame libraries using their native APIs instead of SQL. This enables direct performance comparison between SQL and DataFrame paradigms on identical workloads.
- DataFrame Platforms Overview - Architecture, installation, and usage guide

### Available DataFrame Platforms
| Platform | CLI Name | Family | Status |
|---|---|---|---|
| Polars | `polars-df` | Expression | Production-ready |
| Pandas | `pandas-df` | Pandas | Production-ready |
| Modin | `modin-df` | Pandas | Production-ready |
| Dask | `dask-df` | Pandas | Production-ready |
| cuDF | `cudf-df` | Pandas | Production-ready |
| PySpark | `pyspark-df` | Expression | Production-ready |
| DataFusion | `datafusion-df` | Expression | Production-ready |
```bash
# Quick start with DataFrame platforms
benchbox run --platform polars-df --benchmark tpch --scale 0.1      # Recommended - fast
benchbox run --platform pandas-df --benchmark tpch --scale 0.1      # Familiar API
benchbox run --platform modin-df --benchmark tpch --scale 0.1       # Parallel Pandas
benchbox run --platform dask-df --benchmark tpch --scale 0.1        # Distributed
benchbox run --platform cudf-df --benchmark tpch --scale 0.1        # GPU (Linux only)
benchbox run --platform pyspark-df --benchmark tpch --scale 0.1     # Spark ecosystem
benchbox run --platform datafusion-df --benchmark tpch --scale 0.1

# Compare SQL vs DataFrame on the same workload
benchbox run --platform polars --benchmark tpch --scale 0.1         # SQL mode
benchbox run --platform polars-df --benchmark tpch --scale 0.1      # DataFrame mode
```
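The quick-start commands can be scripted into a sweep over every DataFrame platform. The following is a minimal POSIX shell sketch that uses only the `--platform`, `--benchmark`, and `--scale` flags shown above. The helper names, the `BENCHBOX_CMD` override (set it to `echo` for a dry run), and the use of `nvidia-smi` on `PATH` as a heuristic for "GPU present" when deciding whether to include `cudf-df` are assumptions for illustration, not BenchBox features.

```bash
# Hypothetical helpers (not part of BenchBox) for sweeping the DataFrame platforms.
# BENCHBOX_CMD is an assumed override: BENCHBOX_CMD=echo prints commands instead of running them.

dataframe_platforms() {
  list="polars-df pandas-df modin-df dask-df pyspark-df datafusion-df"
  # cudf-df is Linux-only; treating nvidia-smi on PATH as "GPU present" is a heuristic assumption.
  if [ "$(uname -s)" = "Linux" ] && command -v nvidia-smi >/dev/null 2>&1; then
    list="$list cudf-df"
  fi
  echo "$list"
}

run_dataframe_sweep() {
  scale="${1:-0.1}"  # TPC-H scale factor; defaults to the 0.1 used above
  for p in $(dataframe_platforms); do
    ${BENCHBOX_CMD:-benchbox} run --platform "$p" --benchmark tpch --scale "$scale"
  done
}
```

For example, setting `BENCHBOX_CMD=echo` and then calling `run_dataframe_sweep 0.1` prints one `benchbox run` command per platform; unset the override to execute them.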
## SQL Platforms

### Core Local Databases
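These platforms are included in the base BenchBox installation with no additional dependencies:

- DuckDB - In-process analytical database and the default BenchBox platform
- SQLite - Embedded relational database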
### Local/Embedded Analytics Engines

- Apache DataFusion - Rust-based in-memory query engine
- ClickHouse Local Mode - ClickHouse running embedded, without a server
- Polars - High-performance DataFrame library with SQL support
### Traditional Relational Databases

- PostgreSQL - Open-source relational database with TimescaleDB support
### Distributed SQL Engines

- PrestoDB - Distributed SQL query engine (the original project developed at Facebook)
- Trino - Distributed SQL query engine (community fork, formerly PrestoSQL)
- Apache Spark - Unified analytics engine for large-scale data processing
### Cloud Data Warehouses

- Snowflake - Cloud-native data platform (AWS, Azure, GCP)
- Databricks - Lakehouse platform with Unity Catalog support
- Google BigQuery - Serverless data warehouse
- Amazon Redshift - Columnar data warehouse
- MotherDuck - Serverless DuckDB cloud (inherits DuckDB dialect)
- Starburst - Managed Trino / Starburst Galaxy (inherits Trino dialect)
- AWS Athena - Serverless query service for S3
- Firebolt - High-performance cloud analytics (Core + Cloud modes)
- Azure Synapse Analytics - Microsoft cloud analytics platform
- Microsoft Fabric - Microsoft unified analytics platform
### GPU-Accelerated Platforms

- cuDF - NVIDIA RAPIDS GPU-accelerated DataFrames
## Platform Categories

### By Installation Complexity

- Zero Config (included in base install): DuckDB, SQLite
- Single Extra (one pip install command): DataFusion, Polars, PostgreSQL, ClickHouse (local mode)
- Cloud SDK Required (authentication setup needed): Databricks, BigQuery, Redshift, Snowflake, Athena, Firebolt, Azure Synapse
- Infrastructure Required (external cluster needed): Trino, Presto, Spark, ClickHouse (server mode)
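When scripting runs (for example in CI), the tiers above can be encoded as a lookup so that platforms needing credentials or a cluster are skipped. The following is a minimal POSIX shell sketch; `setup_tier` and the lowercase labels it accepts (including `clickhouse-server`) are hypothetical names chosen for this example, not confirmed BenchBox CLI names.

```bash
# Hypothetical lookup (not part of BenchBox): map a platform label to the
# installation tier listed above. Labels are illustrative, not CLI names.
setup_tier() {
  case "$1" in
    duckdb|sqlite)
      echo "zero-config" ;;
    datafusion|polars|postgresql|clickhouse)
      echo "single-extra" ;;
    databricks|bigquery|redshift|snowflake|athena|firebolt|synapse)
      echo "cloud-sdk" ;;
    trino|presto|spark|clickhouse-server)
      echo "infrastructure" ;;
    *)
      echo "unknown"
      return 1 ;;
  esac
}
```

For instance, a CI job could run only zero-config platforms with `[ "$(setup_tier sqlite)" = zero-config ] && benchbox run --platform sqlite --benchmark tpch --scale 0.01`.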
### By Use Case

- Local Development & Testing: DuckDB (recommended), SQLite, DataFusion, Polars
- Production Benchmarking: Databricks, Snowflake, BigQuery, Redshift
- Self-Hosted Analytics: ClickHouse, PostgreSQL, Trino, Presto, Spark
- GPU Workloads: cuDF (requires an NVIDIA GPU)
## Quick Start by Platform

### Local Platforms (No Setup Required)

```bash
# DuckDB - Default, included in base install
benchbox run --platform duckdb --benchmark tpch --scale 0.01

# SQLite - Included in base install
benchbox run --platform sqlite --benchmark tpch --scale 0.01
```
### Cloud Platforms (Credentials Required)

```bash
# Databricks - Requires DATABRICKS_TOKEN and DATABRICKS_HOST
benchbox run --platform databricks --benchmark tpch --scale 1.0

# BigQuery - Requires GOOGLE_APPLICATION_CREDENTIALS
benchbox run --platform bigquery --benchmark tpch --scale 1.0

# Snowflake - Requires SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_ACCOUNT
benchbox run --platform snowflake --benchmark tpch --scale 1.0
```
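To fail fast before a long cloud run, the required variables can be checked up front. The following is a minimal POSIX shell sketch of a hypothetical preflight helper, `require_cloud_env` (not a BenchBox command), that looks for the variables named in the comments above in the exported environment. It covers only these three platforms because the variable lists for the others are not given here.

```bash
# Hypothetical preflight helper (not part of BenchBox). Variable lists are taken
# from the comments above; only these three platforms are covered.
require_cloud_env() {
  case "$1" in
    databricks) vars="DATABRICKS_TOKEN DATABRICKS_HOST" ;;
    bigquery)   vars="GOOGLE_APPLICATION_CREDENTIALS" ;;
    snowflake)  vars="SNOWFLAKE_USER SNOWFLAKE_PASSWORD SNOWFLAKE_ACCOUNT" ;;
    *) echo "no credential list for platform: $1" >&2; return 2 ;;
  esac
  missing=""
  for v in $vars; do
    # printenv reads the exported environment, i.e. what the benchbox process will see;
    # unset or empty variables are both reported as missing.
    [ -n "$(printenv "$v")" ] || missing="$missing $v"
  done
  if [ -n "$missing" ]; then
    echo "missing for $1:$missing" >&2
    return 1
  fi
  return 0
}
```

Export the variables, then chain the check in front of the run, e.g. `require_cloud_env snowflake && benchbox run --platform snowflake --benchmark tpch --scale 1.0`.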
## Future Platforms

- Future Platforms - Planned platform support and roadmap