Base Benchmark API¶
The benchbox.base module provides the foundational abstract class that all benchmarks inherit from.
Overview¶
Every benchmark in BenchBox extends BaseBenchmark, which provides a standardized interface for:
Data generation and schema setup
Query execution and timing
Platform adapter integration
SQL dialect translation
Results collection and formatting
This abstraction ensures consistent behavior across all benchmark implementations (TPC-H, TPC-DS, ClickBench, etc.).
Quick Example¶
from benchbox.tpch import TPCH
from benchbox.platforms import DuckDBAdapter
# Create benchmark instance
benchmark = TPCH(scale_factor=0.01)
# Generate data files
data_files = benchmark.generate_data()
# Run with platform adapter
adapter = DuckDBAdapter()
results = benchmark.run_with_platform(adapter)
print(f"Completed {results.successful_queries}/{results.total_queries} queries")
print(f"Average query time: {results.average_query_time:.3f}s")
Core Classes¶
- class BaseBenchmark(scale_factor=1.0, output_dir=None, **kwargs)[source]¶
Bases: VerbosityMixin, ABC
Base class for all benchmarks.
All benchmarks inherit from this class.
- __init__(scale_factor=1.0, output_dir=None, **kwargs)[source]¶
Initialize a benchmark.
- Parameters:
scale_factor (float) – Scale factor (1.0 = standard size)
output_dir (str | Path | None) – Data output directory
**kwargs (Any) – Additional options
- get_data_source_benchmark()[source]¶
Return the canonical source benchmark when data is shared.
Benchmarks that reuse data generated by another benchmark (for example, Primitives reusing TPC-H datasets) should override this method and return the lower-case identifier of the source benchmark. Benchmarks that produce their own data should return None (default).
- abstractmethod generate_data()[source]¶
Generate benchmark data.
- Returns:
List of data file paths
- Return type:
list[str | Path]
- abstractmethod get_queries()[source]¶
Get all benchmark queries.
- Returns:
Dictionary mapping query IDs to query strings
- Return type:
dict[str, str]
- abstractmethod get_query(query_id, *, params=None)[source]¶
Get a benchmark query.
- Parameters:
query_id (int | str) – Query ID
params (dict[str, Any] | None) – Optional parameters
- Returns:
Query string with parameters resolved
- Raises:
ValueError – If query_id is invalid
- Return type:
str
- setup_database(connection)[source]¶
Set up database with schema and data.
Creates necessary database schema and loads benchmark data into the database.
- Parameters:
connection (DatabaseConnection) – Database connection to set up
- Raises:
ValueError – If data generation fails
Exception – If database setup fails
- run_query(query_id, connection, params=None, fetch_results=False)[source]¶
Execute single query and return timing and results.
- Parameters:
query_id (int | str) – ID of the query to execute
connection (DatabaseConnection) – Database connection to execute query on
params (dict[str, Any] | None) – Optional parameters for query customization
fetch_results (bool) – Whether to fetch and return query results
- Returns:
Dictionary containing:
query_id: Executed query ID
execution_time: Time taken to execute query in seconds
query_text: Executed query text
results: Query results if fetch_results=True, otherwise None
row_count: Number of rows returned (if results fetched)
- Return type:
dict
- Raises:
ValueError – If query_id is invalid
Exception – If query execution fails
- run_benchmark(connection, query_ids=None, fetch_results=False, setup_database=True)[source]¶
Run the complete benchmark suite.
- Parameters:
connection (DatabaseConnection) – Database connection to execute queries on
query_ids (list[int | str] | None) – Optional list of specific query IDs to run (defaults to all)
fetch_results (bool) – Whether to fetch and return query results
setup_database (bool) – Whether to set up the database first
- Returns:
Dictionary containing:
benchmark_name: Name of the benchmark
total_execution_time: Total time for all queries
total_queries: Number of queries executed
successful_queries: Number of queries that succeeded
failed_queries: Number of queries that failed
query_results: List of individual query results
setup_time: Time taken for database setup (if performed)
- Return type:
dict
- Raises:
Exception – If benchmark execution fails
- run_with_platform(platform_adapter, **run_config)[source]¶
Run complete benchmark using platform-specific optimizations.
This method provides a unified interface for running benchmarks using database platform adapters that handle connection management, data loading optimizations, and query execution.
This is the standard method that all benchmarks should support for integration with the CLI and other orchestration tools.
- Parameters:
platform_adapter – Platform adapter instance (e.g., DuckDBAdapter)
**run_config – Configuration options:
categories: List of query categories to run (if supported by the benchmark)
query_subset: List of specific query IDs to run
connection: Connection configuration
benchmark_type: Type hint for optimizations (‘olap’, ‘oltp’, etc.)
- Returns:
BenchmarkResults object with execution results
Example:
from benchbox.platforms import DuckDBAdapter

benchmark = SomeBenchmark(scale_factor=0.1)
adapter = DuckDBAdapter()
results = benchmark.run_with_platform(adapter)
- format_results(benchmark_result)[source]¶
Format benchmark results for display.
- Parameters:
benchmark_result (dict[str, Any]) – Result dictionary from run_benchmark()
- Returns:
Formatted string representation of the results
- Return type:
str
- translate_query(query_id, dialect)[source]¶
Translate a query to a specific SQL dialect.
- Parameters:
query_id (int | str) – The ID of the query to translate
dialect (str) – The target SQL dialect
- Returns:
The translated query string
- Raises:
ValueError – If the query_id is invalid
ImportError – If sqlglot is not installed
ValueError – If the dialect is not supported
- Return type:
str
- property benchmark_name: str¶
Get the human-readable benchmark name.
- create_enhanced_benchmark_result(platform, query_results, execution_metadata=None, phases=None, resource_utilization=None, performance_characteristics=None, **kwargs)[source]¶
Create a BenchmarkResults object with standardized fields.
This centralizes the logic for creating benchmark results that was previously duplicated across platform adapters and the CLI orchestrator.
- Parameters:
platform (str) – Platform name (e.g., “DuckDB”, “ClickHouse”)
query_results (list[dict[str, Any]]) – List of query execution results
execution_metadata (dict[str, Any] | None) – Optional execution metadata
phases (dict[str, dict[str, Any]] | None) – Optional phase tracking information
resource_utilization (dict[str, Any] | None) – Optional resource usage metrics
performance_characteristics (dict[str, Any] | None) – Optional performance analysis
**kwargs (Any) – Additional fields to override defaults
- Returns:
Fully configured BenchmarkResults object
- Return type:
BenchmarkResults
Key Methods¶
Data Generation¶
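- abstractmethod BaseBenchmark.generate_data()[source]¶
Generate benchmark data.
- Returns:
List of data file paths
- Return type:
list[str | Path]
Required override - Produces the benchmark's data files and returns their paths.
Example of a subclass override - a minimal sketch; MiniBenchmark, its CSV layout, and its single query are hypothetical and shown only to illustrate the pattern:
import csv
from pathlib import Path

from benchbox.base import BaseBenchmark

class MiniBenchmark(BaseBenchmark):
    """Hypothetical benchmark used only to illustrate generate_data()."""

    def generate_data(self):
        # Write one table sized by the scale factor and return its path.
        # output_dir is the property documented below; it is assumed to
        # point at a writable directory.
        out = Path(self.output_dir) / "items.csv"
        out.parent.mkdir(parents=True, exist_ok=True)
        with out.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "value"])
            for i in range(int(1000 * self.scale_factor)):
                writer.writerow([i, i * 2])
        return [out]

    def get_queries(self):
        return {"q1": "SELECT count(*) FROM items"}

    def get_query(self, query_id, *, params=None):
        # params are ignored in this minimal sketch.
        queries = self.get_queries()
        if query_id not in queries:
            raise ValueError(f"Unknown query ID: {query_id}")
        return queries[query_id]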
Query Access¶
- abstractmethod BaseBenchmark.get_queries()[source]¶
Get all benchmark queries.
- Returns:
Dictionary mapping query IDs to query strings
- Return type:
dict[str, str]
Required override - Returns all queries for the benchmark.
Example return value:
{ "q1": "SELECT ...", "q2": "SELECT ...", # ... }
- abstractmethod BaseBenchmark.get_query(query_id, *, params=None)[source]¶
Get a benchmark query.
- Parameters:
query_id (int | str) – Query ID
params (dict[str, Any] | None) – Optional parameters
- Returns:
Query string with parameters resolved
- Raises:
ValueError – If query_id is invalid
- Return type:
str
Required override - Get single query by ID with optional parameters.
Example:
query_sql = benchmark.get_query("q1", params={"date": "1998-09-02"})
Database Setup¶
- BaseBenchmark.setup_database(connection)[source]¶
Set up database with schema and data.
Creates necessary database schema and loads benchmark data into the database.
- Parameters:
connection (DatabaseConnection) – Database connection to set up
- Raises:
ValueError – If data generation fails
Exception – If database setup fails
Sets up database schema and loads data. Automatically calls generate_data() if needed.
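For example - a sketch assuming connection is a DatabaseConnection already created for your target platform (how that object is obtained depends on the adapter in use):
# Creates the schema and loads data, generating the data files first if
# they do not exist yet.
benchmark.setup_database(connection)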
Execution¶
- BaseBenchmark.run_query(query_id, connection, params=None, fetch_results=False)[source]¶
Execute single query and return timing and results.
- Parameters:
query_id (int | str) – ID of the query to execute
connection (DatabaseConnection) – Database connection to execute query on
params (dict[str, Any] | None) – Optional parameters for query customization
fetch_results (bool) – Whether to fetch and return query results
- Returns:
Dictionary containing:
query_id: Executed query ID
execution_time: Time taken to execute query in seconds
query_text: Executed query text
results: Query results if fetch_results=True, otherwise None
row_count: Number of rows returned (if results fetched)
- Return type:
dict
- Raises:
ValueError – If query_id is invalid
Exception – If query execution fails
Execute single query and return detailed results including timing and row counts.
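For example, timing a single query and fetching its rows (connection is assumed to be set up as in the previous section):
result = benchmark.run_query("q1", connection, fetch_results=True)

print(f"{result['query_id']}: {result['execution_time']:.3f}s, "
      f"{result['row_count']} rows")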
- BaseBenchmark.run_benchmark(connection, query_ids=None, fetch_results=False, setup_database=True)[source]¶
Run the complete benchmark suite.
- Parameters:
connection (DatabaseConnection) – Database connection to execute queries on
query_ids (list[int | str] | None) – Optional list of specific query IDs to run (defaults to all)
fetch_results (bool) – Whether to fetch and return query results
setup_database (bool) – Whether to set up the database first
- Returns:
Dictionary containing:
benchmark_name: Name of the benchmark
total_execution_time: Total time for all queries
total_queries: Number of queries executed
successful_queries: Number of queries that succeeded
failed_queries: Number of queries that failed
query_results: List of individual query results
setup_time: Time taken for database setup (if performed)
- Return type:
dict
- Raises:
Exception – If benchmark execution fails
Execute complete benchmark suite with optional filtering by query IDs.
Example:
# Run all queries
results = benchmark.run_benchmark(connection)

# Run specific queries only
results = benchmark.run_benchmark(
    connection,
    query_ids=["q1", "q3", "q7"]
)
- BaseBenchmark.run_with_platform(platform_adapter, **run_config)[source]¶
Run complete benchmark using platform-specific optimizations.
This method provides a unified interface for running benchmarks using database platform adapters that handle connection management, data loading optimizations, and query execution.
This is the standard method that all benchmarks should support for integration with the CLI and other orchestration tools.
- Parameters:
platform_adapter – Platform adapter instance (e.g., DuckDBAdapter)
**run_config – Configuration options:
categories: List of query categories to run (if supported by the benchmark)
query_subset: List of specific query IDs to run
connection: Connection configuration
benchmark_type: Type hint for optimizations (‘olap’, ‘oltp’, etc.)
- Returns:
BenchmarkResults object with execution results
Example:
from benchbox.platforms import DuckDBAdapter

benchmark = SomeBenchmark(scale_factor=0.1)
adapter = DuckDBAdapter()
results = benchmark.run_with_platform(adapter)
Recommended entry point - Runs the benchmark through a platform adapter for optimized execution.
This method delegates to the platform adapter’s run_benchmark() implementation, which handles:
Connection management
Data loading optimizations (bulk loading, parallel ingestion)
Query execution with retry logic
Results collection and validation
Example:
from benchbox.tpcds import TPCDS
from benchbox.platforms import DatabricksAdapter

benchmark = TPCDS(scale_factor=1)

adapter = DatabricksAdapter(
    host="https://your-workspace.cloud.databricks.com",
    token="your-token",
    http_path="/sql/1.0/warehouses/abc123"
)

results = benchmark.run_with_platform(
    adapter,
    query_subset=["q1", "q2", "q3"]  # Optional filtering
)
SQL Translation¶
- BaseBenchmark.translate_query(query_id, dialect)[source]¶
Translate a query to a specific SQL dialect.
- Parameters:
query_id (int | str) – The ID of the query to translate
dialect (str) – The target SQL dialect
- Returns:
The translated query string
- Raises:
ValueError – If the query_id is invalid
ImportError – If sqlglot is not installed
ValueError – If the dialect is not supported
- Return type:
str
Translate query to different SQL dialect using sqlglot.
Supported dialects: postgres, mysql, sqlite, duckdb, snowflake, bigquery, redshift, clickhouse, databricks, and more.
Note
Dialect Translation vs Platform Adapters: BenchBox can translate queries to many SQL dialects, but this doesn’t mean platform adapters exist for all those databases. Currently supported platforms: DuckDB, ClickHouse, Databricks, BigQuery, Redshift, Snowflake, SQLite. See Potential Future Platforms for planned platforms (PostgreSQL, MySQL, etc.).
Example:
# Translate TPC-H query to Snowflake dialect (fully supported)
snowflake_sql = benchmark.translate_query("q1", dialect="snowflake")

# Translate to BigQuery (fully supported)
bigquery_sql = benchmark.translate_query("q1", dialect="bigquery")

# Translate to PostgreSQL dialect (translation only - adapter not yet available)
postgres_sql = benchmark.translate_query("q1", dialect="postgres")
Results Creation¶
- BaseBenchmark.create_enhanced_benchmark_result(platform, query_results, execution_metadata=None, phases=None, resource_utilization=None, performance_characteristics=None, **kwargs)[source]¶
Create a BenchmarkResults object with standardized fields.
This centralizes the logic for creating benchmark results that was previously duplicated across platform adapters and the CLI orchestrator.
- Parameters:
platform (str) – Platform name (e.g., “DuckDB”, “ClickHouse”)
query_results (list[dict[str, Any]]) – List of query execution results
execution_metadata (dict[str, Any] | None) – Optional execution metadata
phases (dict[str, dict[str, Any]] | None) – Optional phase tracking information
resource_utilization (dict[str, Any] | None) – Optional resource usage metrics
performance_characteristics (dict[str, Any] | None) – Optional performance analysis
**kwargs (Any) – Additional fields to override defaults
- Returns:
Fully configured BenchmarkResults object
- Return type:
BenchmarkResults
Create standardized BenchmarkResults object with structured metadata.
Used internally by platform adapters to ensure consistent result formatting.
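For example - a sketch of how a platform adapter might assemble results; the query_results entries follow the run_query() dictionary shape described above, and the timing values are placeholders:
query_results = [
    {"query_id": "q1", "execution_time": 0.042, "row_count": 4},
    {"query_id": "q2", "execution_time": 0.118, "row_count": 460},
]

results = benchmark.create_enhanced_benchmark_result(
    platform="DuckDB",
    query_results=query_results,
    execution_metadata={"scale_factor": benchmark.scale_factor},
)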
Properties¶
- BaseBenchmark.benchmark_name: str¶
Get the human-readable benchmark name.
Human-readable benchmark name (e.g., “TPC-H”, “ClickBench”).
- BaseBenchmark.scale_factor: float¶
Data scale factor (1.0 = standard size, 0.01 = 1% size, 10 = 10x size).
- BaseBenchmark.output_dir: Path¶
Directory where generated data files are stored.
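For example:
print(benchmark.benchmark_name)   # e.g., "TPC-H"
print(benchmark.scale_factor)     # e.g., 0.01
print(benchmark.output_dir)       # directory holding the generated data files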
Utility Methods¶
- BaseBenchmark.format_results(benchmark_result)[source]¶
Format benchmark results for display.
- Parameters:
benchmark_result (dict[str, Any]) – Result dictionary from run_benchmark()
- Returns:
Formatted string representation of the results
- Return type:
str
Format benchmark results dictionary into human-readable string.
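For example:
results = benchmark.run_benchmark(connection)
print(benchmark.format_results(results))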
- BaseBenchmark.get_data_source_benchmark()[source]¶
Return the canonical source benchmark when data is shared.
Benchmarks that reuse data generated by another benchmark (for example, Primitives reusing TPC-H datasets) should override this method and return the lower-case identifier of the source benchmark. Benchmarks that produce their own data should return None (default).
Returns the name of the source benchmark if this benchmark reuses data from another.
For example, the Primitives benchmark reuses TPC-H data, so it returns "tpch".
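A sketch of an override - MyDerivedBenchmark is hypothetical and shown only to illustrate the pattern; its other required methods are omitted:
from benchbox.base import BaseBenchmark

class MyDerivedBenchmark(BaseBenchmark):
    # Hypothetical benchmark that runs its own queries over TPC-H datasets.
    # generate_data(), get_queries(), and get_query() are omitted here.

    def get_data_source_benchmark(self):
        # Lower-case identifier of the benchmark whose data is reused.
        return "tpch"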
Best Practices¶
- Always use platform adapters - Call run_with_platform() instead of run_benchmark() directly for production use. Platform adapters provide optimized data loading and query execution.
- Handle scale factors carefully - Scale factors ≥1 must be integers. Use 0.1, 0.01, etc. for small-scale testing.
- Check data generation - Call generate_data() explicitly if you need to inspect or manipulate data files before loading (see the sketch after this list).
- Use query subsets for debugging - Pass query_subset=["q1"] to test single queries during development.
- Leverage SQL translation - Use translate_query() to adapt queries to platform-specific dialects when needed.
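For example, generating data explicitly makes it possible to inspect the files before loading:
data_files = benchmark.generate_data()
for path in data_files:
    print(path)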
See Also¶
Getting Started in 5 Minutes - Getting started guide with complete examples
Platform Selection Guide - Platform adapter documentation
/benchmarks/README - Available benchmark implementations
/usage/api-reference - High-level API overview