# BenchBox Architecture

## Overview
BenchBox is a modular SQL and DataFrame benchmarking framework for OLAP databases. The architecture separates concerns into four layers:
Benchmarks — dataset definitions, schemas, queries, and data generation
Platforms — database adapters for SQL and DataFrame execution
Core — shared infrastructure (runner, results, validation, visualization, etc.)
CLI — user-facing commands and execution orchestration
## Execution Model
BenchBox uses a lifecycle-based execution model rather than the Template Method pattern. A benchmark run progresses through phases orchestrated by run_benchmark_lifecycle() in benchbox.core.runner.runner:
generate → load → warmup → power → throughput → maintenance
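The phase ordering above can be sketched as plain data plus a loop. This is an illustrative sketch, not BenchBox's actual implementation: only the function name `run_benchmark_lifecycle` and the phase names come from this document; the handler/selection shapes are assumptions.

```python
# Illustrative sketch of a data-driven phase lifecycle (not BenchBox's
# actual code). Phases are plain data, so callers can enable or skip
# any subset without subclassing a template class.
from typing import Callable

PHASES: list[str] = ["generate", "load", "warmup", "power", "throughput", "maintenance"]

def run_benchmark_lifecycle(
    handlers: dict[str, Callable[[], dict]],
    enabled: set[str],
) -> dict[str, dict]:
    """Run the enabled phases in canonical order, collecting per-phase results."""
    results: dict[str, dict] = {}
    for phase in PHASES:
        if phase in enabled and phase in handlers:
            results[phase] = handlers[phase]()
    return results

# Example: a run with only the load and power phases, against stub handlers.
handlers = {p: (lambda p=p: {"phase": p, "ok": True}) for p in PHASES}
out = run_benchmark_lifecycle(handlers, enabled={"load", "power"})
```

Because selection is a set rather than an overridden method, a "power-only" or "load-only" run is a configuration change, not a new subclass.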
### Key Types
| Type | Location | Purpose |
|---|---|---|
|  |  | Controls which phases to run |
| `BenchmarkResults` |  | Complete run output with phase results |
|  |  | Benchmark name, scale, query selection (Pydantic) |
|  |  | Connection and platform configuration (Pydantic) |
| `PlatformAdapter` | `benchbox/platforms/base/adapter.py` | Abstract base for all SQL platform adapters |
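The two Pydantic configuration types in the table might look roughly like the following. This is a dependency-free sketch using dataclasses in place of Pydantic; all field names are illustrative assumptions, not BenchBox's actual schema.

```python
# Hypothetical shape of the benchmark and platform configuration objects
# described above, sketched with stdlib dataclasses instead of Pydantic.
# Field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class BenchmarkConfig:
    benchmark: str                 # CLI benchmark name, e.g. "tpch"
    scale_factor: float = 1.0
    queries: list[str] = field(default_factory=list)  # empty = all queries

@dataclass
class PlatformConfig:
    platform: str                  # adapter name, e.g. "duckdb"
    connection: dict = field(default_factory=dict)    # driver-specific settings

cfg = BenchmarkConfig(benchmark="tpch", scale_factor=0.1, queries=["q1", "q6"])
```

In the real framework, Pydantic additionally gives these models parsing and validation (e.g., rejecting a negative scale factor), which plain dataclasses do not.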
### SQL Execution Path
```
CLI (run command)
  → BenchmarkOrchestrator (cli/orchestrator.py)
  → run_benchmark_lifecycle() (core/runner/runner.py)
  → PlatformAdapter.execute_query() for each query
  → BenchmarkResults
```
The PlatformAdapter base class (benchbox/platforms/base/adapter.py) provides the interface that all 33 SQL platform adapters implement. Each adapter handles connection management, DDL generation, data loading, and query execution for its target database.
### DataFrame Execution Path
DataFrame benchmarks use a parallel execution path:
```
CLI (run command with --platform *-df)
  → run_dataframe_benchmark() (core/runner/dataframe_runner.py)
  → DataFrameContext (core/dataframe/context.py)
  → ExpressionFamilyAdapter or PandasFamilyAdapter
  → BenchmarkResults
```
| Type | Location | Purpose |
|---|---|---|
|  |  | Protocol for table access and column references |
| `ExpressionFamilyAdapter` |  | Base for Polars, PySpark, DataFusion, LakeSail |
| `PandasFamilyAdapter` |  | Base for Pandas, Modin, cuDF, Dask |
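A protocol for table access and column references, as described in the first row of the table, can be expressed with `typing.Protocol`. This is a sketch under assumed method names (`table`, `col`), not BenchBox's actual interface.

```python
# Sketch of a structural table-access protocol, in the spirit of the
# protocol described above. Method names are illustrative assumptions.
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class TableAccess(Protocol):
    def table(self, name: str) -> Any:
        """Return a platform-native table/DataFrame handle."""
        ...

    def col(self, table: str, column: str) -> Any:
        """Return a platform-native column reference or expression."""
        ...

class DictTables:
    """Toy implementation backed by plain dicts of column lists."""
    def __init__(self, tables: dict[str, dict[str, list]]):
        self._tables = tables

    def table(self, name: str):
        return self._tables[name]

    def col(self, table: str, column: str):
        return self._tables[table][column]

ctx = DictTables({"lineitem": {"l_quantity": [1, 2, 3]}})
assert isinstance(ctx, TableAccess)  # structural check via @runtime_checkable
```

A structural protocol lets each platform family supply its own handle types (Polars LazyFrames, Pandas DataFrames, and so on) without a shared base class.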
The family-based adapter architecture means adding a new expression-style platform (e.g., Polars-like API) requires only implementing a thin adapter on top of ExpressionFamilyAdapter, inheriting query translation, tuning, and execution logic.
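The "thin adapter" idea can be illustrated with a minimal hierarchy. Only the name `ExpressionFamilyAdapter` comes from this document; the method names and the hypothetical `MyLazyFrameAdapter` platform are invented for illustration.

```python
# Illustrative family-based adapter hierarchy (not BenchBox's actual code).
# The family base holds the shared pipeline; platforms override one hook.
class ExpressionFamilyAdapter:
    """Shared query translation/execution for expression-style APIs."""
    def run(self, query: str):
        plan = self.translate(query)     # shared translation logic
        return self.execute(plan)        # platform-specific execution

    def translate(self, query: str) -> str:
        return f"plan({query})"

    def execute(self, plan: str):
        raise NotImplementedError

class MyLazyFrameAdapter(ExpressionFamilyAdapter):
    """A hypothetical new Polars-like platform: only the thin hook is needed."""
    def execute(self, plan: str):
        return f"mylazyframe:{plan}"
```

The new platform inherits translation (and, in the real framework, tuning and execution plumbing) for free; only the final execution step is platform code.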
## Benchmark Layer
Each benchmark (TPC-H, TPC-DS, SSB, ClickBench, etc.) lives under benchbox/core/<benchmark_id>/ and provides:
Schema — table definitions and DDL generation via `get_create_tables_sql(dialect, tuning_config)`
Queries — SQL templates with dialect translation via sqlglot
Data generation — using official TPC tools (dbgen/dsdgen) or built-in generators
Validation — expected result counts and answer verification
All benchmarks inherit from BaseBenchmark (benchbox/base.py). Benchmarks are registered in benchbox/core/benchmark_registry.py which maps CLI names (e.g., tpch) to class names and metadata.
There are currently 20 benchmarks across four categories: TPC standards, academic, industry, and primitives.
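A name-to-class registry like the one in `benchmark_registry.py` might look as follows. This is a sketch: the entry shape, metadata fields, and the `get_benchmark` helper are assumptions; only `BaseBenchmark` and the CLI name `tpch` come from this document.

```python
# Sketch of a CLI-name → benchmark-class registry with metadata.
# Entry shape and helper names are illustrative assumptions.
class BaseBenchmark:  # stand-in for benchbox/base.py's BaseBenchmark
    name = "base"

class TPCHBenchmark(BaseBenchmark):
    name = "tpch"

BENCHMARK_REGISTRY: dict[str, dict] = {
    "tpch": {"cls": TPCHBenchmark, "category": "tpc", "default_scale": 1.0},
}

def get_benchmark(cli_name: str) -> BaseBenchmark:
    """Resolve a CLI name (e.g. 'tpch') to a benchmark instance."""
    try:
        return BENCHMARK_REGISTRY[cli_name]["cls"]()
    except KeyError:
        raise ValueError(f"unknown benchmark: {cli_name!r}") from None
```

Keeping the mapping in one module means the CLI, MCP tools, and docs can all enumerate benchmarks from a single source of truth.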
## Platform Layer
### SQL Platforms (33 adapters)
All SQL adapters inherit from PlatformAdapter and implement:
`get_connection_from_pool()` / `close_connection()` — connection lifecycle
`execute_query()` — query execution with timing
`get_create_tables_sql()` — platform-specific DDL
`load_data()` — bulk data loading
Platforms span local engines (DuckDB, SQLite, DataFusion), cloud warehouses (Snowflake, BigQuery, Databricks, Redshift), and specialized systems (ClickHouse, StarRocks, QuestDB, TimescaleDB, etc.).
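A minimal concrete adapter, using SQLite (one of the listed local engines) and the stdlib `sqlite3` module, could look like this. The real `PlatformAdapter` base class carries far more responsibility; the result-dict fields here are illustrative assumptions.

```python
# Minimal sketch of the adapter interface against SQLite (not BenchBox's
# actual adapter). Shows the connection lifecycle plus timed execution.
import sqlite3
import time

class SQLiteAdapter:
    def __init__(self, path: str = ":memory:"):
        self._path = path
        self._conn = None

    def get_connection_from_pool(self):
        """Lazily open (and then reuse) a single connection."""
        if self._conn is None:
            self._conn = sqlite3.connect(self._path)
        return self._conn

    def close_connection(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None

    def execute_query(self, sql: str) -> dict:
        """Run one query and report rows plus wall-clock duration."""
        conn = self.get_connection_from_pool()
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        return {"rows": rows, "seconds": time.perf_counter() - start}

adapter = SQLiteAdapter()
result = adapter.execute_query("SELECT 1 + 1")
adapter.close_connection()
```

Real adapters for cloud warehouses would additionally handle authentication, staged bulk loading, and dialect-specific DDL behind the same interface.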
### DataFrame Platforms (8 adapters)
DataFrame adapters are organized by API family:
Expression family: Polars, PySpark, DataFusion, LakeSail
Pandas family: Pandas, Modin, cuDF, Dask
## Core Infrastructure
The benchbox/core/ directory contains 39 subsystems:
| Subsystem | Purpose |
|---|---|
| `runner` | Benchmark lifecycle orchestration |
|  | Result models, serialization, aggregation |
|  | Answer validation, data verification |
| `visualization` | ASCII chart generation and result rendering |
| `dataframe` | DataFrame execution context, profiling, tuning |
|  | Query plan capture and analysis |
|  | Unified tuning configuration system |
|  | Sorted ingestion, clustering strategies |
|  | Result comparison and regression detection |
|  | Cloud cost estimation |
|  | Statistical analysis of benchmark results |
|  | Interface contracts and type validation |
## CLI Layer
The CLI (benchbox/cli/) uses Click and provides 25+ commands. The main run command orchestrates the full benchmark lifecycle through BenchmarkOrchestrator (cli/orchestrator.py) → run_benchmark_lifecycle().
Key command groups: run, compare, visualize, report, metrics, tuning, platforms, shell, datagen, setup.
## Visualization
BenchBox provides ASCII chart visualization via core/visualization/:
ASCII charts (`core/visualization/ascii/`) — 15+ chart types rendered as terminal text (bar, box plot, heatmap, histogram, scatter, CDF, sparkline, etc.)
ResultPlotter (`core/visualization/result_plotter.py`) — normalizes JSON results and orchestrates chart rendering
Templates (`core/visualization/templates.py`) — named chart combinations (e.g., `flagship`, `comparison`, `executive_summary`)
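The idea of terminal-rendered charts can be shown with a toy bar renderer. This is not BenchBox code; the function name and output format are invented for illustration.

```python
# Toy ASCII bar chart in the spirit of core/visualization/ascii/.
# The real renderers support many more chart types and options.
def ascii_bar_chart(data: dict[str, float], width: int = 30) -> str:
    """Render label → value pairs as horizontal bars of '#' characters."""
    peak = max(data.values()) or 1.0          # guard against all-zero data
    label_w = max(len(k) for k in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / peak)
        lines.append(f"{label.ljust(label_w)} | {bar} {value:g}")
    return "\n".join(lines)

# Hypothetical per-platform query latencies, in seconds.
chart = ascii_bar_chart({"duckdb": 1.2, "sqlite": 4.8})
print(chart)
```

Plain-text output like this works anywhere a terminal does, which is what makes it useful for CI/CD logs and MCP tool responses.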
## MCP Integration
The benchbox/mcp/ package exposes BenchBox functionality as MCP (Model Context Protocol) tools, enabling AI assistants to discover, run, and analyze benchmarks.
## Key Design Decisions
Lifecycle over Template Method — execution phases are data-driven, not inheritance-driven
Family-based DataFrame adapters — minimize code duplication across similar platforms
Official TPC tools — use dbgen/dsdgen for specification-compliant data generation
sqlglot translation — single query definition with automatic dialect translation
Lazy platform loading — heavy SDK imports deferred until platform is actually used
ASCII visualization — terminal-rendered charts for CI/CD and MCP integration