Platform Documentation¶
Documentation for database platforms supported by BenchBox.
Platform Guides¶
Platform Selection Guide - Choose the right platform for your needs
Quick Reference - Quick comparison and feature matrix
Comparison Matrix - Detailed platform comparison
Deployment Modes - Platform deployment architecture (local, self-hosted, cloud)
DataFrame Platforms (Native API)¶
BenchBox supports benchmarking DataFrame libraries using their native APIs instead of SQL. This enables direct performance comparison between SQL and DataFrame paradigms on identical workloads.
DataFrame Platforms Overview - Architecture, installation, and usage guide
Available DataFrame Platforms¶
| Platform | CLI Name | Family | Status | Documentation |
|---|---|---|---|---|
| Polars | polars-df | Expression | Production-ready | |
| Pandas | pandas-df | Pandas | Production-ready | |
| Modin | modin-df | Pandas | Production-ready | |
| Dask | dask-df | Pandas | Production-ready | |
| cuDF | cudf-df | Pandas | Production-ready | |
| PySpark | pyspark-df | Expression | Production-ready | |
| LakeSail | lakesail-df | Expression | Production-ready | |
| DataFusion | datafusion-df | Expression | Production-ready | |
# Quick start with DataFrame platforms
benchbox run --platform polars-df --benchmark tpch --scale 0.1 # Recommended - fast
benchbox run --platform pandas-df --benchmark tpch --scale 0.1 # Familiar API
benchbox run --platform modin-df --benchmark tpch --scale 0.1 # Parallel Pandas
benchbox run --platform dask-df --benchmark tpch --scale 0.1 # Distributed
benchbox run --platform cudf-df --benchmark tpch --scale 0.1 # GPU (Linux only)
benchbox run --platform pyspark-df --benchmark tpch --scale 0.1 # Spark ecosystem
benchbox run --platform lakesail-df --benchmark tpch --scale 0.1 # Sail (fast Spark)
benchbox run --platform datafusion-df --benchmark tpch --scale 0.1
# Compare SQL vs DataFrame on same workload
benchbox run --platform polars --benchmark tpch --scale 0.1 # SQL mode
benchbox run --platform polars-df --benchmark tpch --scale 0.1 # DataFrame mode
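The SQL-versus-DataFrame comparison above can be wrapped in a small helper. This is a minimal sketch, not part of BenchBox: `compare_modes` and the `DRY_RUN` variable are hypothetical names, and the `<engine>` / `<engine>-df` naming pair is only confirmed above for Polars. It uses only the `run`, `--platform`, `--benchmark`, and `--scale` options shown in this guide.

```shell
#!/bin/sh
# Hypothetical helper: run the SQL and DataFrame variants of one engine back to back.
# Assumes the CLI names follow the "<engine>" / "<engine>-df" pattern (shown above for polars).
# DRY_RUN=1 (the default in this sketch) prints the commands instead of executing them.
compare_modes() {
  engine="$1"
  scale="${2:-0.1}"
  for platform in "$engine" "$engine-df"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "benchbox run --platform $platform --benchmark tpch --scale $scale"
    else
      benchbox run --platform "$platform" --benchmark tpch --scale "$scale"
    fi
  done
}

compare_modes polars 0.1
```

Setting `DRY_RUN=0` before calling `compare_modes` executes both runs instead of printing them; keeping the scale identical across the pair is what makes the two results comparable.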
SQL Platforms¶
Core Local Databases¶
DuckDB and SQLite are included in the base BenchBox installation with no additional dependencies.
Local/Embedded Analytics Engines¶
Apache DataFusion - Rust-based in-memory query engine
ClickHouse Local Mode - Running ClickHouse embedded
Polars - High-performance DataFrame library with SQL support
Traditional Relational Databases¶
PostgreSQL - Open-source relational database with TimescaleDB support
Self-Hosted OLAP Databases¶
StarRocks - MPP columnar OLAP database (MySQL protocol + Stream Load)
Apache Doris - MPP real-time analytical database (MySQL protocol + Stream Load)
QuestDB - Time-series database (PostgreSQL wire protocol + REST API)
Distributed SQL Engines¶
PrestoDB - Distributed SQL query engine (original Facebook project)
Trino - Distributed SQL query engine (community fork, formerly PrestoSQL)
Apache Spark - Unified analytics engine for large-scale data processing
LakeSail Sail - High-performance Spark-compatible engine (Spark Connect)
Cloud Data Warehouses¶
ClickHouse Cloud - Managed ClickHouse service (inherits ClickHouse dialect)
Snowflake - Cloud-native data platform (AWS, Azure, GCP)
Databricks - Lakehouse platform with Unity Catalog support
Onehouse Quanton - Serverless Spark with multi-format support (Hudi, Iceberg, Delta)
Google BigQuery - Serverless data warehouse
Amazon Redshift - Columnar data warehouse
MotherDuck - Serverless DuckDB cloud (inherits DuckDB dialect)
Starburst - Managed Trino / Starburst Galaxy (inherits Trino dialect)
Amazon Athena - Serverless query service for S3
Firebolt - High-performance cloud analytics (Core + Cloud modes)
Databend - Cloud-native OLAP warehouse (Snowflake-compatible SQL, S3/MinIO storage)
Azure Synapse Analytics - Microsoft cloud analytics platform
Microsoft Fabric - Microsoft unified analytics platform
TimescaleDB / TigerData - TimescaleDB with managed TigerData cloud mode (timescaledb:cloud)
GPU-Accelerated Platforms¶
cuDF - NVIDIA RAPIDS GPU-accelerated DataFrames
Platform Categories¶
By Installation Complexity¶
Zero Config (included in base install):
DuckDB, SQLite
Single Extra (one pip install command):
DataFusion, Polars, PostgreSQL, ClickHouse
Cloud SDK Required (authentication setup needed):
Databricks SQL, Databend, BigQuery, Redshift, Snowflake, Amazon Athena, Firebolt, Azure Synapse Analytics
Infrastructure Required (external cluster needed):
Trino, Presto, Spark, LakeSail, StarRocks, Doris, QuestDB, ClickHouse (server mode)
By Use Case¶
Local Development & Testing:
DuckDB (recommended), SQLite, DataFusion, Polars
Production Benchmarking:
Databricks, Snowflake, BigQuery, Redshift
Self-Hosted Analytics:
ClickHouse, StarRocks, Doris, QuestDB, PostgreSQL, Trino, Presto, Spark, LakeSail
GPU Workloads:
cuDF (NVIDIA GPUs required)
Quick Start by Platform¶
Local Platforms (No Setup Required)¶
# DuckDB - Default, included in base install
benchbox run --platform duckdb --benchmark tpch --scale 0.01
# SQLite - Included in base install
benchbox run --platform sqlite --benchmark tpch --scale 0.01
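Because both zero-config platforms take the same arguments, the two commands above can serve as a quick installation smoke test. The following is a sketch under stated assumptions: `smoke_test_local` and the `BENCHBOX` override variable are hypothetical names introduced here for illustration, not BenchBox features.

```shell
#!/bin/sh
# Hypothetical override: set BENCHBOX="echo benchbox" to preview the commands
# without running them. By default the real benchbox executable is used.
BENCHBOX="${BENCHBOX:-benchbox}"

# Run the smallest TPC-H scale shown above on each zero-config platform.
# Returns the number of platforms whose run exited with a non-zero status.
smoke_test_local() {
  failed=0
  for platform in duckdb sqlite; do
    if ! $BENCHBOX run --platform "$platform" --benchmark tpch --scale 0.01; then
      echo "FAILED: $platform" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}

# Usage (not executed here): smoke_test_local && echo "local platforms OK"
```

The function keeps going after a failure so that a single broken platform does not hide the status of the other.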
Cloud Platforms (Credentials Required)¶
# Databricks - Requires DATABRICKS_TOKEN and DATABRICKS_HOST
benchbox run --platform databricks --benchmark tpch --scale 1.0
# BigQuery - Requires GOOGLE_APPLICATION_CREDENTIALS
benchbox run --platform bigquery --benchmark tpch --scale 1.0
# Snowflake - Requires SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_ACCOUNT
benchbox run --platform snowflake --benchmark tpch --scale 1.0
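A cloud run fails late if a credential is missing, so it can help to check the environment first. The sketch below is illustrative only: `missing_credentials` and `run_if_ready` are hypothetical helper names, and the variable lists are copied from the comments above rather than from BenchBox itself, which may accept other authentication methods.

```shell
#!/bin/sh
# Print the name of every required environment variable that is unset or empty.
# The per-platform lists mirror the comments in this guide; platforms not listed
# here are treated as having no required variables (an assumption of this sketch).
missing_credentials() {
  case "$1" in
    databricks) required="DATABRICKS_TOKEN DATABRICKS_HOST" ;;
    bigquery)   required="GOOGLE_APPLICATION_CREDENTIALS" ;;
    snowflake)  required="SNOWFLAKE_USER SNOWFLAKE_PASSWORD SNOWFLAKE_ACCOUNT" ;;
    *)          required="" ;;
  esac
  for name in $required; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Launch the benchmark only when nothing is missing; otherwise report and return 1.
run_if_ready() {
  missing="$(missing_credentials "$1")"
  if [ -n "$missing" ]; then
    echo "Missing for $1: $(echo $missing)" >&2
    return 1
  fi
  benchbox run --platform "$1" --benchmark tpch --scale 1.0
}

# Usage (not executed here): run_if_ready snowflake
```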
Future Platforms¶
Development Roadmap - Planned platform and benchmark additions