BenchBox Architecture¶
High-level overview of BenchBox’s design, components, and extension points.
Design Philosophy¶
BenchBox is built on three core principles:
Separation of Concerns: Benchmark logic, platform adapters, and query execution are independent layers
Extensibility: New benchmarks and platforms can be added without modifying core code
Standards Compliance: TPC benchmarks follow official specifications for reproducibility
Architecture Diagram¶
┌─────────────────────────────────────────────────────────────┐
│ CLI / Python API │
│ (User Entry Points) │
└──────────────────┬──────────────────────────────────────────┘
│
┌──────────────────┴──────────────────────────────────────────┐
│ Orchestration Layer │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Run Configuration │ Query Selection │ │
│ │ Parameter Generation│ Result Collection │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────┬──────────────────────────────────────────┘
│
┌─────────────┼─────────────┐
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│Benchmark│ │Platform │ │ Results │
│ Layer │ │ Adapter │ │ Model │
└─────────┘ └─────────┘ └─────────┘
│ │ │
┌────▼────────────▼─────────────▼────┐
│ Database Connection │
│ (Platform-Specific Driver) │
└────────────────────────────────────┘
Core Components¶
1. Benchmark Layer¶
Location: benchbox/core/{benchmark}/
Responsibility: Encapsulates benchmark-specific logic
Key Classes:
BaseBenchmark: Abstract base class all benchmarks inherit from
{Benchmark}: Implementation class (e.g., TPCH, TPCDS)
{Benchmark}Generator: Data generation logic
{Benchmark}Queries: Query templates and parameterization
Example:
from benchbox import TPCH
# Benchmark knows:
# - How to generate data (dbgen invocation)
# - How to retrieve queries (with parameter substitution)
# - Schema definitions
# - Validation rules
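A minimal usage sketch, assuming only names that appear elsewhere on this page (TPCH, scale_factor, get_query, generate_data); it is illustrative rather than a complete API reference:
benchmark = TPCH(scale_factor=1.0)
data_files = benchmark.generate_data()   # invokes dbgen if data is not already present
query_sql = benchmark.get_query("q1")    # query text with parameters substituted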
2. Platform Adapter Layer¶
Location: benchbox/platforms/{platform}/
Responsibility: Abstracts database-specific connection and execution logic
Key Classes:
{Platform}Adapter: Main adapter class (e.g., DuckDBAdapter)
{Platform}Connection: Connection wrapper implementing DatabaseConnection
Platform-specific optimizations (bulk loading, query hints)
Example:
from benchbox.platforms.duckdb import DuckDBAdapter
# Adapter handles:
# - Connection management
# - Data loading strategies (COPY, INSERT, external tables)
# - Query execution and error handling
# - Result collection and formatting
Supported Adapters:
DuckDB (embedded, local files)
ClickHouse (native protocol)
Databricks (SQL Warehouse, Unity Catalog)
BigQuery (serverless, cloud storage)
Snowflake (warehouse, stages)
Redshift (serverless, provisioned)
SQLite (testing, minimal datasets)
3. Results Model¶
Location: benchbox/core/results/
Responsibility: Structured representation of benchmark execution results
Key Classes:
BenchmarkResults: Main result object with timing, metadata, validation
QueryResult: Individual query execution details
ExecutionPhases: Setup, power test, throughput test phases
ValidationResult: Data quality and correctness checks
Schema:
{
  "benchmark_name": "TPC-H",
  "platform": "DuckDB",
  "scale_factor": 1.0,
  "execution_id": "tpch_1234567890",
  "timestamp": "2025-10-12T10:30:00Z",
  "total_execution_time": 45.2,
  "query_results": [
    {
      "query_id": "q1",
      "execution_time": 2.1,
      "status": "SUCCESS",
      "row_count": 4
    }
  ],
  "execution_phases": { ... },
  "validation_status": "PASSED"
}
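As a hedged illustration of consuming this schema, the snippet below reads a saved results file with Python's standard json module and prints per-query timings; the file name is hypothetical:
import json

with open("tpch_1234567890.json") as f:
    results = json.load(f)

for q in results["query_results"]:
    print(f"{q['query_id']}: {q['execution_time']:.2f}s ({q['status']})")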
4. Connection Abstraction¶
Location: benchbox/core/connection.py
Responsibility: Unified interface for database operations
Key Interface:
from abc import ABC, abstractmethod
from typing import Any

class DatabaseConnection(ABC):
    @abstractmethod
    def execute(self, query: str) -> Any:
        """Execute query, return cursor/result"""

    @abstractmethod
    def fetchall(self, cursor) -> list:
        """Fetch all results from cursor"""

    @abstractmethod
    def close(self) -> None:
        """Close connection"""
All platform adapters implement this interface, enabling benchmark code to remain platform-agnostic.
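For illustration, a minimal implementation of this interface could use Python's built-in sqlite3 module; the SQLiteConnection class below is a hypothetical sketch, not BenchBox's actual SQLite adapter:
import sqlite3
from typing import Any

class SQLiteConnection(DatabaseConnection):
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)

    def execute(self, query: str) -> Any:
        # Return the cursor so callers decide how to fetch
        return self._conn.execute(query)

    def fetchall(self, cursor) -> list:
        return cursor.fetchall()

    def close(self) -> None:
        self._conn.close()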
5. Data Generation¶
Location: benchbox/core/{benchmark}/generator.py
Responsibility: Create synthetic benchmark data
Mechanisms:
TPC Benchmarks: Invoke official C binaries (dbgen, dsdgen, datagen)
Custom Benchmarks: Python-based generation using Faker, NumPy, Pandas
Output Formats: Parquet (default), CSV, JSON
Example:
# TPC-H uses official dbgen binary
generator = TPCHGenerator(scale_factor=1.0, output_dir="./data")
file_paths = generator.generate() # Returns list of .parquet files
# Custom benchmarks use Python
generator = CoffeeShopGenerator(scale_factor=0.001)
file_paths = generator.generate() # Generates with Faker
6. CLI Orchestration¶
Location: benchbox/cli/
Responsibility: Command-line interface and workflow orchestration
Commands:
# Run benchmark end-to-end
benchbox run --benchmark tpch --platform duckdb --scale 1
# Generate data only
benchbox datagen --benchmark tpcds --scale 0.1
# Dry run (preview queries)
benchbox run --benchmark tpch --dry-run ./output
# Check dependencies
benchbox check-deps --matrix
Orchestrator Flow:
Parse CLI arguments and config files
Validate platform dependencies
Initialize benchmark and platform adapter
Execute benchmark phases (setup, queries, validation)
Collect and save results
Generate reports
See: CLI Quick Start
Data Flow¶
End-to-End Execution Flow¶
1. User Command
├── benchbox run --benchmark tpch --platform duckdb --scale 1
└── Parsed by CLI → creates RunConfiguration
2. Benchmark Initialization
├── TPCH(scale_factor=1.0)
├── Checks if data exists at output_dir
└── generate_data() if needed
3. Platform Adapter Setup
├── DuckDBAdapter()
├── Establishes database connection
└── Creates schema (CREATE TABLE statements)
4. Data Loading Phase
├── Platform-specific bulk load (COPY FROM, external tables)
├── Validates row counts
└── Creates indexes/constraints
5. Query Execution Phase
├── For each query in benchmark:
│ ├── Get query text with parameters
│ ├── Execute via platform adapter
│ ├── Measure execution time
│ └── Collect results
└── Aggregate timing statistics
6. Results Collection
├── Create BenchmarkResults object
├── Include execution metadata (platform info, system profile)
├── Validate results (if validation rules exist)
└── Save to JSON file
7. Output
├── Print summary to console
├── Save detailed JSON to output_dir
└── Optional: Upload to cloud storage
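The same flow can be driven from the Python API. The imports and constructors below appear earlier on this page; run_benchmark is a hypothetical stand-in for the adapter's execution entry point:
from benchbox import TPCH
from benchbox.platforms.duckdb import DuckDBAdapter

benchmark = TPCH(scale_factor=1.0)           # step 2: benchmark initialization
benchmark.generate_data()                    # generates data if it does not already exist

adapter = DuckDBAdapter()                    # step 3: platform adapter setup
results = adapter.run_benchmark(benchmark)   # steps 4-6 (hypothetical method name)

print(results.total_execution_time)          # step 7: summary; detailed JSON is saved to output_dir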
Extension Points¶
Adding a New Benchmark¶
Create benchbox/core/{benchmark}/ directory
Implement classes:
{Benchmark}BenchmarkImpl(BaseBenchmark)
{Benchmark}Generator
{Benchmark}Queries
{Benchmark}Schema
Register in benchbox/__init__.py
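A skeleton for the implementation step might look like the sketch below; the import location and the exact methods BaseBenchmark requires are assumptions, so treat the names as illustrative:
from benchbox import BaseBenchmark  # assumed import location

class CoffeeShopBenchmark(BaseBenchmark):
    """Hypothetical custom benchmark skeleton."""

    def get_query(self, query_id: str) -> str:   # assumed abstract method
        ...

    def generate_data(self) -> list:             # assumed abstract method
        ...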
Adding a New Platform¶
Create benchbox/platforms/{platform}/ directory
Implement:
{Platform}Adapter
{Platform}Connection(DatabaseConnection)
Add platform extras to pyproject.toml
Register in benchbox/platforms/__init__.py
See: Adding New Platforms
Adding Query Parameter Variants¶
TPC benchmarks support query variants with different parameter substitutions:
# Assumes a benchmark instance created as shown earlier, e.g.:
from benchbox import TPCH
benchmark = TPCH(scale_factor=1.0)

# Get query with random parameters (default)
query = benchmark.get_query("q1")
# Get query with specific parameters
query = benchmark.get_query("q1", params={"date": "1998-09-02", "quantity": 24})
# Generate multiple variants for seed sweep
variants = benchmark.generate_query_variants("q1", count=5, seed_start=42)
See: TPC Patterns Usage
Design Patterns¶
1. Adapter Pattern¶
Platform adapters isolate database-specific logic, allowing benchmarks to remain platform-agnostic.
2. Template Method Pattern¶
BaseBenchmark defines the benchmark execution workflow, with subclasses providing specific implementations.
3. Strategy Pattern¶
Data loading strategies vary by platform (bulk COPY, INSERT batches, external tables) but implement a common interface.
4. Factory Pattern¶
get_platform_adapter(name) creates appropriate adapter instances based on configuration.
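A minimal sketch of this factory; the registry layout below is illustrative and not the actual get_platform_adapter implementation:
from benchbox.platforms.duckdb import DuckDBAdapter

_ADAPTERS = {
    "duckdb": DuckDBAdapter,
    # "clickhouse": ClickHouseAdapter, and so on for the other platforms
}

def get_platform_adapter(name: str):
    try:
        return _ADAPTERS[name.lower()]()
    except KeyError:
        raise ValueError(f"Unsupported platform: {name}") from None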
Performance Considerations¶
Data Loading Optimization¶
Parquet Format: Columnar storage for fast analytical queries
Bulk Loading: Platform-specific optimizations (DuckDB COPY, ClickHouse INSERT, BigQuery external tables)
Parallel Loading: Multi-threaded data ingestion where supported
Query Execution Optimization¶
Connection Pooling: Reuse connections across queries
Result Streaming: Fetch results incrementally for large result sets (see the sketch after this list)
Query Compilation: Leverage platform-specific prepared statements
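The result-streaming point can be illustrated with a generic helper over any DB-API-style cursor; this is a sketch, not BenchBox-specific code:
def stream_rows(cursor, batch_size: int = 10_000):
    # Yield rows in fixed-size batches instead of materializing the full result set
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch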
Memory Management¶
Scale Factor Validation: Prevent OOM by validating scale vs. available memory
Streaming Results: Don’t materialize full result sets unless needed
Cleanup: Explicit connection and resource cleanup
Security Considerations¶
Credential Management¶
Environment variables for sensitive data
Support for platform-specific auth (OAuth, IAM, service accounts)
No credentials in configuration files or logs
SQL Injection Prevention¶
Parameterized queries where possible (illustrated after this list)
Input validation on query IDs and parameters
No direct string concatenation into SQL
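As a generic illustration of the parameterized-query point (not BenchBox-specific code), values are bound as parameters rather than concatenated into the SQL text:
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineitem (l_quantity INTEGER)")

# The placeholder style (?, %s, :name) varies by driver; the value is bound, not interpolated
rows = conn.execute(
    "SELECT * FROM lineitem WHERE l_quantity < ?", (24,)
).fetchall()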
Data Privacy¶
Support for synthetic data only (no real customer data)
Anonymization support for custom benchmarks
Data residency controls for cloud platforms
Testing Architecture¶
Test Layers¶
Unit Tests (tests/unit/): Component-level testing
Integration Tests (tests/integration/): Database interaction tests
Live Tests (tests/integration/platforms/): Real cloud platform testing
See: Testing Guide