BenchBox Repository Structure
Top-Level Layout
BenchBox/
├── benchbox/ # Main package (see below)
├── tests/ # Test suite
├── docs/ # Documentation (Sphinx/MyST)
├── docker/ # Docker configurations
├── examples/ # Example scripts and configs
├── scripts/ # Development and CI scripts
├── _binaries/ # TPC tool binaries (dsdgen, dsqgen)
├── _project/ # Project management (TODOs, indexes)
├── _sources/ # TPC specification source files
├── pyproject.toml # Project metadata and dependencies
├── Makefile # Development task shortcuts
└── CLAUDE.md # AI assistant instructions
Package Structure (benchbox/)
Root Module Files
Benchmark wrapper classes that re-export core implementations:
benchbox/
├── __init__.py # Package exports
├── base.py # BaseBenchmark abstract base class
├── tpch.py # TPC-H wrapper
├── tpcds.py # TPC-DS wrapper
├── tpcdi.py # TPC-DI wrapper
├── ssb.py # Star Schema Benchmark wrapper
├── clickbench.py # ClickBench wrapper
├── h2odb.py # H2ODB wrapper
├── amplab.py # AMPLab wrapper
├── joinorder.py # Join Order Benchmark wrapper
├── coffeeshop.py # CoffeeShop wrapper
├── nyctaxi.py # NYC Taxi wrapper
├── tsbs_devops.py # TSBS DevOps wrapper
├── datavault.py # Data Vault wrapper
├── tpcds_obt.py # TPC-DS One Big Table wrapper
├── tpch_skew.py # TPC-H Skew wrapper
├── tpchavoc.py # TPC-Havoc wrapper
├── read_primitives.py # Read Primitives wrapper
├── write_primitives.py # Write Primitives wrapper
├── transaction_primitives.py # Transaction Primitives wrapper
└── metadata_primitives.py # Metadata Primitives wrapper
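The wrapper relationship described above can be pictured roughly as follows. This is a minimal sketch, not the BenchBox code: the method name `get_queries`, the constructor arguments, and the `_CoreTPCH` class are assumptions made for illustration, since this page only names the files and `BaseBenchmark`.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BaseBenchmark(ABC):
    """Stand-in for the abstract base in benchbox/base.py (interface assumed)."""

    def __init__(self, scale_factor: float = 1.0) -> None:
        self.scale_factor = scale_factor

    @abstractmethod
    def get_queries(self) -> dict[str, str]:
        """Return a mapping of query ID to SQL text (hypothetical method)."""


class _CoreTPCH:
    """Hypothetical implementation class, as might live under benchbox/core/tpch/."""

    def __init__(self, scale_factor: float) -> None:
        self.scale_factor = scale_factor

    def get_queries(self) -> dict[str, str]:
        # Placeholder text, not a real TPC-H query.
        return {"1": "SELECT 1"}


class TPCH(BaseBenchmark):
    """Thin wrapper: builds the core implementation and forwards calls to it."""

    def __init__(self, scale_factor: float = 1.0) -> None:
        super().__init__(scale_factor)
        self._impl = _CoreTPCH(scale_factor)

    def get_queries(self) -> dict[str, str]:
        return self._impl.get_queries()
```

Under this arrangement the root module stays small and import-friendly, while the bulk of the logic sits in the matching `core/` subdirectory.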
core/ — Core Infrastructure (39 subdirectories)
benchbox/core/
├── runner/ # Benchmark lifecycle execution
│ ├── runner.py # run_benchmark_lifecycle(), LifecyclePhases
│ ├── dataframe_runner.py # run_dataframe_benchmark()
│ └── conversion.py # Format conversion orchestration
├── results/ # Result models and serialization
│ ├── models.py # BenchmarkResults, ExecutionPhases, QueryExecution
│ ├── builder.py # ResultBuilder (centralized aggregation)
│ └── ... # loader, exporter, normalizer, etc.
├── schemas.py # Pydantic models (BenchmarkConfig, DatabaseConfig, etc.)
├── benchmark_registry.py # Benchmark name → class mapping and metadata
├── platform_registry.py # Platform capabilities and registration
│
├── validation/ # Data and result validation
├── visualization/ # ASCII chart generation (15+ chart types)
│ ├── result_plotter.py # ResultPlotter orchestration
│ ├── templates.py # Named chart combinations
│ └── ascii/ # Chart implementations (bar, box, heatmap, etc.)
├── dataframe/ # DataFrame execution context and profiling
│ ├── context.py # DataFrameContext protocol
│ ├── query.py # DataFrameQuery (dual pandas/expression impls)
│ ├── profiling.py # DataFrameProfiler
│ └── tuning/ # DataFrame tuning configuration
├── query_plans/ # Query plan capture and analysis
├── tuning/ # Unified SQL+DataFrame tuning system
├── data_organization/ # Sorted ingestion, clustering strategies
├── comparison/ # Cross-run result comparison
├── analysis/ # Statistical analysis
├── cost/ # Cloud cost estimation
├── contracts/ # Interface contracts
├── manifest/ # Data manifest tracking
├── expected_results/ # Expected query results for validation
├── databases/ # Database metadata
├── operations/ # Complex operation implementations
├── publishing/ # Result publishing
├── primitives/ # Shared primitives infrastructure
│
├── tpch/ # TPC-H benchmark implementation
├── tpcds/ # TPC-DS benchmark implementation
├── tpcdi/ # TPC-DI benchmark implementation
├── ssb/ # Star Schema Benchmark implementation
├── clickbench/ # ClickBench implementation
├── h2odb/ # H2ODB implementation
├── amplab/ # AMPLab implementation
├── joinorder/ # Join Order Benchmark implementation
├── coffeeshop/ # CoffeeShop implementation
├── nyctaxi/ # NYC Taxi implementation
├── tsbs_devops/ # TSBS DevOps implementation
├── datavault/ # Data Vault implementation
├── tpcds_obt/ # TPC-DS One Big Table implementation
├── tpch_skew/ # TPC-H Skew implementation
├── tpchavoc/ # TPC-Havoc implementation
├── read_primitives/ # Read Primitives implementation
├── write_primitives/ # Write Primitives implementation
├── transaction_primitives/ # Transaction Primitives implementation
├── metadata_primitives/ # Metadata Primitives implementation
├── ai_primitives/ # AI/ML Primitives implementation
└── utils/ # Core utilities
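The tree describes `benchmark_registry.py` as a mapping from benchmark name to class plus metadata. A registry of that kind might look like the sketch below. The `BenchmarkInfo` fields, the `resolve_benchmark` function, and the stand-in classes are all hypothetical; the real module's contents are not shown on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkInfo:
    """Hypothetical metadata record; the real registry's fields may differ."""

    display_name: str
    benchmark_class: type


class TPCHBenchmark:
    """Stand-in for an implementation class under benchbox/core/tpch/."""


class SSBBenchmark:
    """Stand-in for an implementation class under benchbox/core/ssb/."""


# Keys mirror directory names from the tree above.
_REGISTRY: dict[str, BenchmarkInfo] = {
    "tpch": BenchmarkInfo("TPC-H", TPCHBenchmark),
    "ssb": BenchmarkInfo("Star Schema Benchmark", SSBBenchmark),
}


def resolve_benchmark(name: str) -> type:
    """Look up a benchmark class by name, ignoring case and outer whitespace."""
    key = name.strip().lower()
    try:
        return _REGISTRY[key].benchmark_class
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown benchmark {name!r}; known: {known}") from None
```

Keeping the mapping in one place means a new benchmark directory only needs one extra registry entry to become discoverable by name.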
platforms/ — Database Adapters
benchbox/platforms/
├── base/ # Base adapter classes
│ ├── adapter.py # PlatformAdapter (abstract base, ~5000 lines)
│ ├── sql_execution.py # SQL execution engine
│ ├── data_loading.py # Data loading and staging
│ ├── models.py # Phase data models
│ └── ... # Mixins, validation, format capabilities
│
├── duckdb.py # DuckDB adapter
├── sqlite.py # SQLite adapter
├── postgresql.py # PostgreSQL adapter
├── snowflake.py # Snowflake adapter
├── bigquery.py # BigQuery adapter
├── databricks/ # Databricks adapter (subpackage)
├── redshift.py # Redshift adapter
├── athena.py # Athena adapter
├── clickhouse/ # ClickHouse adapter (subpackage)
├── clickhouse_cloud.py # ClickHouse Cloud adapter
├── datafusion.py # DataFusion SQL adapter
├── trino.py # Trino adapter
├── presto.py # Presto adapter
├── motherduck.py # MotherDuck adapter
├── pg_duckdb.py # pg_duckdb extension adapter
├── pg_mooncake.py # pg_mooncake extension adapter
├── firebolt.py # Firebolt adapter
├── doris.py # Apache Doris adapter
├── databend/ # Databend adapter (subpackage)
├── starrocks/ # StarRocks adapter (subpackage)
├── timescaledb.py # TimescaleDB adapter
├── questdb.py # QuestDB adapter
├── influxdb/ # InfluxDB adapter (subpackage)
├── azure_synapse.py # Azure Synapse adapter
├── fabric_warehouse.py # Microsoft Fabric Warehouse adapter
├── fabric_lakehouse.py # Microsoft Fabric Lakehouse adapter
├── fabric_spark.py # Microsoft Fabric Spark adapter
├── starburst.py # Starburst adapter
├── snowpark_connect.py # Snowpark Connect adapter
├── lakesail.py # LakeSail adapter
├── spark.py # Generic Spark adapter
├── pyspark/ # PySpark SQL adapter (subpackage)
├── polars_platform.py # Polars SQL adapter
├── cudf.py # cuDF SQL adapter
├── adapter_factory.py # Unified adapter factory (get_adapter)
│
└── dataframe/ # DataFrame platform adapters
    ├── expression_family.py # ExpressionFamilyAdapter base class
    ├── pandas_family.py # PandasFamilyAdapter base class
    ├── benchmark_mixin.py # BenchmarkExecutionMixin (run_benchmark)
    ├── polars_df.py # Polars DataFrame adapter
    ├── pandas_df.py # Pandas DataFrame adapter
    ├── pyspark_df.py # PySpark DataFrame adapter
    ├── datafusion_df.py # DataFusion DataFrame adapter
    ├── cudf_df.py # cuDF DataFrame adapter
    ├── modin_df.py # Modin DataFrame adapter
    ├── dask_df.py # Dask DataFrame adapter
    ├── lakesail_df.py # LakeSail DataFrame adapter
    ├── shared_loading.py # Shared data loading logic
    └── platform_checker.py # Platform capability detection
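Several engines appear twice in this tree, once as a SQL adapter (for example `polars_platform.py`) and once as a DataFrame adapter (`dataframe/polars_df.py`). One way a unified factory could choose between them is sketched below. The signature of the real `get_adapter` in `adapter_factory.py` is not documented here, so `get_adapter_sketch`, the `mode` argument, and every class name are assumptions for illustration only.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class AdapterSketch(ABC):
    """Stand-in for an adapter base class (interface assumed)."""

    def __init__(self, **config: object) -> None:
        self.config = config

    @abstractmethod
    def describe(self) -> str:
        """Return a short label identifying the adapter."""


class PolarsSQLSketch(AdapterSketch):
    def describe(self) -> str:
        return "polars (sql)"


class PolarsDataFrameSketch(AdapterSketch):
    def describe(self) -> str:
        return "polars (dataframe)"


class DuckDBSketch(AdapterSketch):
    def describe(self) -> str:
        return "duckdb (sql)"


# Keyed on (platform, mode) so one engine can expose both execution styles.
_ADAPTERS: dict[tuple[str, str], type[AdapterSketch]] = {
    ("polars", "sql"): PolarsSQLSketch,
    ("polars", "dataframe"): PolarsDataFrameSketch,
    ("duckdb", "sql"): DuckDBSketch,
}


def get_adapter_sketch(platform: str, mode: str = "sql", **config: object) -> AdapterSketch:
    """Instantiate the adapter registered for a platform and execution mode."""
    key = (platform.lower(), mode.lower())
    try:
        adapter_cls = _ADAPTERS[key]
    except KeyError:
        raise ValueError(f"No adapter for platform={platform!r}, mode={mode!r}") from None
    return adapter_cls(**config)
```

For example, `get_adapter_sketch("polars", mode="dataframe")` returns the DataFrame variant, while omitting `mode` falls back to the SQL variant.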
cli/ — Command-Line Interface
benchbox/cli/
├── main.py # CLI entry point and command registration
├── orchestrator.py # BenchmarkOrchestrator (run lifecycle)
├── execution.py # BenchmarkExecutor
├── execution_pipeline.py # ExecutionPipeline
├── commands/ # Click command implementations
│ ├── run.py # benchbox run (main command)
│ ├── compare.py # benchbox compare
│ ├── visualize.py # benchbox visualize
│ ├── report.py # benchbox report (group: rankings, trends, etc.)
│ ├── metrics.py # benchbox metrics (group: qphh)
│ ├── aggregate.py # benchbox aggregate
│ ├── datagen.py # benchbox datagen
│ ├── setup.py # benchbox setup (cloud credentials)
│ ├── shell.py # benchbox shell
│ ├── convert.py # benchbox convert
│ ├── tuning_group.py # benchbox tuning (init, validate, defaults)
│ ├── show_plan.py # benchbox show-plan
│ ├── plan_history.py # benchbox plan-history
│ ├── download_answers.py # benchbox download-answers
│ └── ... # export, results, profile, checks, etc.
├── config.py # CLI configuration
├── display.py # Output formatting
├── progress.py # Progress bars
├── onboarding.py # Interactive onboarding
├── presentation/ # Output presentation layer
└── ... # Validation, platform hooks, etc.
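The split between `main.py` (entry point and command registration) and one module per command under `commands/` can be illustrated with a small dispatcher. BenchBox implements its commands with Click; the sketch below deliberately substitutes the standard library's `argparse` so that it runs without extra dependencies. The handler functions, their return strings, and the `benchbox-sketch` program name are hypothetical, and no real BenchBox options are modeled.

```python
from __future__ import annotations

import argparse
from typing import Callable, Optional, Sequence


def run_command(args: argparse.Namespace) -> str:
    # In the layout above, this logic would live in commands/run.py.
    return "would run a benchmark"


def compare_command(args: argparse.Namespace) -> str:
    # In the layout above, this logic would live in commands/compare.py.
    return "would compare results"


# Registration table: subcommand name -> handler function.
COMMANDS: dict[str, Callable[[argparse.Namespace], str]] = {
    "run": run_command,
    "compare": compare_command,
}


def build_parser() -> argparse.ArgumentParser:
    """Create the top-level parser and register one subparser per command."""
    parser = argparse.ArgumentParser(prog="benchbox-sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, handler in COMMANDS.items():
        sub = subparsers.add_parser(name)
        sub.set_defaults(handler=handler)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> str:
    """Parse arguments and dispatch to the selected command's handler."""
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

Adding a command in this style means writing one handler and adding one entry to `COMMANDS`, which keeps the entry point itself short.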
Other Top-Level Packages
benchbox/
├── mcp/ # MCP server for AI assistant integration
├── utils/ # Shared utilities (file I/O, formatting, clock)
├── monitoring/ # Performance monitoring and profiling
├── security/ # Credential management
├── experimental/ # Experimental features
├── data/ # Static data resources
├── examples/ # Example configurations
├── release/ # Release management
└── _binaries/ # TPC tool binaries (platform-specific)