Base Benchmark API

Tags: reference, python-api, contributor

The benchbox.base module provides the foundational abstract class that all benchmarks inherit from.

Overview

Every benchmark in BenchBox extends BaseBenchmark, which provides a standardized interface for:

  • Data generation and schema setup

  • Query execution and timing

  • Platform adapter integration

  • SQL dialect translation

  • Results collection and formatting

This abstraction ensures consistent behavior across all benchmark implementations (TPC-H, TPC-DS, ClickBench, etc.).

Quick Example

from benchbox.tpch import TPCH
from benchbox.platforms import DuckDBAdapter

# Create benchmark instance
benchmark = TPCH(scale_factor=0.01)

# Generate data files
data_files = benchmark.generate_data()

# Run with platform adapter
adapter = DuckDBAdapter()
results = benchmark.run_with_platform(adapter)

print(f"Completed {results.successful_queries}/{results.total_queries} queries")
print(f"Average query time: {results.average_query_time:.3f}s")

Core Classes

class BaseBenchmark(scale_factor=1.0, output_dir=None, **kwargs)[source]

Bases: VerbosityMixin, ABC

Base class that all BenchBox benchmarks inherit from.

__init__(scale_factor=1.0, output_dir=None, **kwargs)[source]

Initialize a benchmark.

Parameters:
  • scale_factor (float) – Scale factor (1.0 = standard size)

  • output_dir (str | Path | None) – Data output directory

  • **kwargs (Any) – Additional options

get_data_source_benchmark()[source]

Return the canonical source benchmark when data is shared.

Benchmarks that reuse data generated by another benchmark (for example, Primitives reusing TPC-H datasets) should override this method and return the lower-case identifier of the source benchmark. Benchmarks that produce their own data should return None (default).

abstractmethod generate_data()[source]

Generate benchmark data.

Returns:

List of data file paths

Return type:

list[str | Path]

abstractmethod get_queries()[source]

Get all benchmark queries.

Returns:

Dictionary mapping query IDs to query strings

Return type:

dict[str, str]

abstractmethod get_query(query_id, *, params=None)[source]

Get a benchmark query.

Parameters:
  • query_id (int | str) – Query ID

  • params (dict[str, Any] | None) – Optional parameters

Returns:

Query string with parameters resolved

Raises:

ValueError – If query_id is invalid

Return type:

str

setup_database(connection)[source]

Set up database with schema and data.

Creates necessary database schema and loads benchmark data into the database.

Parameters:

connection (DatabaseConnection) – Database connection to set up

Raises:
  • ValueError – If data generation fails

  • Exception – If database setup fails

run_query(query_id, connection, params=None, fetch_results=False)[source]

Execute single query and return timing and results.

Parameters:
  • query_id (int | str) – ID of the query to execute

  • connection (DatabaseConnection) – Database connection to execute query on

  • params (dict[str, Any] | None) – Optional parameters for query customization

  • fetch_results (bool) – Whether to fetch and return query results

Returns:

Dictionary containing:

  • query_id: Executed query ID

  • execution_time: Time taken to execute query in seconds

  • query_text: Executed query text

  • results: Query results if fetch_results=True, otherwise None

  • row_count: Number of rows returned (if results fetched)

Return type:

dict[str, Any]

Raises:
  • ValueError – If query_id is invalid

  • Exception – If query execution fails

run_benchmark(connection, query_ids=None, fetch_results=False, setup_database=True)[source]

Run the complete benchmark suite.

Parameters:
  • connection (DatabaseConnection) – Database connection to execute queries on

  • query_ids (list[int | str] | None) – Optional list of specific query IDs to run (defaults to all)

  • fetch_results (bool) – Whether to fetch and return query results

  • setup_database (bool) – Whether to set up the database first

Returns:

Dictionary containing:

  • benchmark_name: Name of the benchmark

  • total_execution_time: Total time for all queries

  • total_queries: Number of queries executed

  • successful_queries: Number of queries that succeeded

  • failed_queries: Number of queries that failed

  • query_results: List of individual query results

  • setup_time: Time taken for database setup (if performed)

Return type:

dict[str, Any]

Raises:

Exception – If benchmark execution fails

run_with_platform(platform_adapter, **run_config)[source]

Run complete benchmark using platform-specific optimizations.

This method provides a unified interface for running benchmarks using database platform adapters that handle connection management, data loading optimizations, and query execution.

This is the standard method that all benchmarks should support for integration with the CLI and other orchestration tools.

Parameters:
  • platform_adapter – Platform adapter instance (e.g., DuckDBAdapter)

  • **run_config – Configuration options:

      - categories: List of query categories to run (if the benchmark supports categories)

      - query_subset: List of specific query IDs to run

      - connection: Connection configuration

      - benchmark_type: Type hint for optimizations ('olap', 'oltp', etc.)

Returns:

BenchmarkResults object with execution results

Example

from benchbox.platforms import DuckDBAdapter

benchmark = SomeBenchmark(scale_factor=0.1)
adapter = DuckDBAdapter()
results = benchmark.run_with_platform(adapter)

format_results(benchmark_result)[source]

Format benchmark results for display.

Parameters:

benchmark_result (dict[str, Any]) – Result dictionary from run_benchmark()

Returns:

Formatted string representation of the results

Return type:

str

translate_query(query_id, dialect)[source]

Translate a query to a specific SQL dialect.

Parameters:
  • query_id (int | str) – The ID of the query to translate

  • dialect (str) – The target SQL dialect

Returns:

The translated query string

Raises:
  • ValueError – If the query_id is invalid

  • ImportError – If sqlglot is not installed

  • ValueError – If the dialect is not supported

Return type:

str

property benchmark_name: str

Get the human-readable benchmark name.

create_enhanced_benchmark_result(platform, query_results, execution_metadata=None, phases=None, resource_utilization=None, performance_characteristics=None, **kwargs)[source]

Create a BenchmarkResults object with standardized fields.

This centralizes the logic for creating benchmark results that was previously duplicated across platform adapters and CLI orchestrator.

Parameters:
  • platform (str) – Platform name (e.g., “DuckDB”, “ClickHouse”)

  • query_results (list[dict[str, Any]]) – List of query execution results

  • execution_metadata (dict[str, Any] | None) – Optional execution metadata

  • phases (dict[str, dict[str, Any]] | None) – Optional phase tracking information

  • resource_utilization (dict[str, Any] | None) – Optional resource usage metrics

  • performance_characteristics (dict[str, Any] | None) – Optional performance analysis

  • **kwargs (Any) – Additional fields to override defaults

Returns:

Fully configured BenchmarkResults object

Return type:

BenchmarkResults
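
Only the three abstractmethods above must be implemented by a subclass; everything else is inherited. Below is a minimal, illustrative sketch; the class and query names are hypothetical and not part of BenchBox:

from pathlib import Path

from benchbox.base import BaseBenchmark


class TinyBenchmark(BaseBenchmark):
    """Illustrative two-query benchmark (not shipped with BenchBox)."""

    _QUERIES = {"q1": "SELECT 1", "q2": "SELECT 1 + 1"}

    def generate_data(self):
        # Write a tiny data file into output_dir (provided by BaseBenchmark)
        # and return its path.
        out = Path(self.output_dir or ".") / "tiny.csv"
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text("id\n1\n2\n")
        return [out]

    def get_queries(self):
        return dict(self._QUERIES)

    def get_query(self, query_id, *, params=None):
        key = str(query_id)
        if key not in self._QUERIES:
            raise ValueError(f"Unknown query ID: {query_id}")
        return self._QUERIES[key]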

Key Methods

Data Generation

abstractmethod BaseBenchmark.generate_data()[source]

Generate benchmark data.

Returns:

List of data file paths

Return type:

list[str | Path]

Required override - Each benchmark implements data generation logic.

Returns list of paths to generated data files (Parquet, CSV, etc.).
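
For example, to generate the data and inspect the resulting file paths before loading:

data_files = benchmark.generate_data()
for path in data_files:
    print(path)  # each entry is a str or Path pointing at a generated file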

Query Access

abstractmethod BaseBenchmark.get_queries()[source]

Get all benchmark queries.

Returns:

Dictionary mapping query IDs to query strings

Return type:

dict[str, str]

Required override - Returns all queries for the benchmark.

Example return value:

{
    "q1": "SELECT ...",
    "q2": "SELECT ...",
    # ...
}

abstractmethod BaseBenchmark.get_query(query_id, *, params=None)[source]

Get a benchmark query.

Parameters:
  • query_id (int | str) – Query ID

  • params (dict[str, Any] | None) – Optional parameters

Returns:

Query string with parameters resolved

Raises:

ValueError – If query_id is invalid

Return type:

str

Required override - Get single query by ID with optional parameters.

Example:

query_sql = benchmark.get_query("q1", params={"date": "1998-09-02"})

Database Setup

BaseBenchmark.setup_database(connection)[source]

Set up database with schema and data.

Creates necessary database schema and loads benchmark data into the database.

Parameters:

connection (DatabaseConnection) – Database connection to set up

Raises:
  • ValueError – If data generation fails

  • Exception – If database setup fails

Sets up database schema and loads data. Automatically calls generate_data() if needed.
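
A usage sketch, assuming connection is an already-open DatabaseConnection (how you obtain one depends on the driver or platform adapter you use):

from benchbox.tpch import TPCH

benchmark = TPCH(scale_factor=0.01)
benchmark.setup_database(connection)  # creates the schema and loads the data,
                                      # generating it first if necessary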

Execution

BaseBenchmark.run_query(query_id, connection, params=None, fetch_results=False)[source]

Execute single query and return timing and results.

Parameters:
  • query_id (int | str) – ID of the query to execute

  • connection (DatabaseConnection) – Database connection to execute query on

  • params (dict[str, Any] | None) – Optional parameters for query customization

  • fetch_results (bool) – Whether to fetch and return query results

Returns:

Dictionary containing:

  • query_id: Executed query ID

  • execution_time: Time taken to execute query in seconds

  • query_text: Executed query text

  • results: Query results if fetch_results=True, otherwise None

  • row_count: Number of rows returned (if results fetched)

Return type:

dict[str, Any]

Raises:
  • ValueError – If query_id is invalid

  • Exception – If query execution fails

Execute single query and return detailed results including timing and row counts.
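
For example (again assuming an open connection), the returned dictionary exposes the fields listed above:

result = benchmark.run_query("q1", connection, fetch_results=True)
print(f"{result['query_id']}: {result['execution_time']:.3f}s, "
      f"{result['row_count']} rows")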

BaseBenchmark.run_benchmark(connection, query_ids=None, fetch_results=False, setup_database=True)[source]

Run the complete benchmark suite.

Parameters:
  • connection (DatabaseConnection) – Database connection to execute queries on

  • query_ids (list[int | str] | None) – Optional list of specific query IDs to run (defaults to all)

  • fetch_results (bool) – Whether to fetch and return query results

  • setup_database (bool) – Whether to set up the database first

Returns:

Dictionary containing:

  • benchmark_name: Name of the benchmark

  • total_execution_time: Total time for all queries

  • total_queries: Number of queries executed

  • successful_queries: Number of queries that succeeded

  • failed_queries: Number of queries that failed

  • query_results: List of individual query results

  • setup_time: Time taken for database setup (if performed)

Return type:

dict[str, Any]

Raises:

Exception – If benchmark execution fails

Execute complete benchmark suite with optional filtering by query IDs.

Example:

# Run all queries
results = benchmark.run_benchmark(connection)

# Run specific queries only
results = benchmark.run_benchmark(
    connection,
    query_ids=["q1", "q3", "q7"]
)
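
The returned dictionary can then be inspected via the documented keys (each entry of query_results is an individual query result, typically shaped like the run_query() dictionary):

print(results["benchmark_name"])
print(f"{results['successful_queries']}/{results['total_queries']} queries succeeded")
for query_result in results["query_results"]:
    print(query_result["query_id"], query_result["execution_time"])
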
BaseBenchmark.run_with_platform(platform_adapter, **run_config)[source]

Run complete benchmark using platform-specific optimizations.

This method provides a unified interface for running benchmarks using database platform adapters that handle connection management, data loading optimizations, and query execution.

This is the standard method that all benchmarks should support for integration with the CLI and other orchestration tools.

Parameters:
  • platform_adapter – Platform adapter instance (e.g., DuckDBAdapter)

  • **run_config – Configuration options:

      - categories: List of query categories to run (if the benchmark supports categories)

      - query_subset: List of specific query IDs to run

      - connection: Connection configuration

      - benchmark_type: Type hint for optimizations ('olap', 'oltp', etc.)

Returns:

BenchmarkResults object with execution results

Example

from benchbox.platforms import DuckDBAdapter

benchmark = SomeBenchmark(scale_factor=0.1)
adapter = DuckDBAdapter()
results = benchmark.run_with_platform(adapter)

Recommended entry point - Run benchmark using platform adapter for optimized execution.

This method delegates to the platform adapter’s run_benchmark() implementation, which handles:

  • Connection management

  • Data loading optimizations (bulk loading, parallel ingestion)

  • Query execution with retry logic

  • Results collection and validation

Example:

from benchbox.tpcds import TPCDS
from benchbox.platforms import DatabricksAdapter

benchmark = TPCDS(scale_factor=1)
adapter = DatabricksAdapter(
    host="https://your-workspace.cloud.databricks.com",
    token="your-token",
    http_path="/sql/1.0/warehouses/abc123"
)

results = benchmark.run_with_platform(
    adapter,
    query_subset=["q1", "q2", "q3"]  # Optional filtering
)

SQL Translation

BaseBenchmark.translate_query(query_id, dialect)[source]

Translate a query to a specific SQL dialect.

Parameters:
  • query_id (int | str) – The ID of the query to translate

  • dialect (str) – The target SQL dialect

Returns:

The translated query string

Raises:
  • ValueError – If the query_id is invalid

  • ImportError – If sqlglot is not installed

  • ValueError – If the dialect is not supported

Return type:

str

Translate query to different SQL dialect using sqlglot.

Supported dialects: postgres, mysql, sqlite, duckdb, snowflake, bigquery, redshift, clickhouse, databricks, and more.

Note

Dialect Translation vs Platform Adapters: BenchBox can translate queries to many SQL dialects, but this doesn’t mean platform adapters exist for all those databases. Currently supported platforms: DuckDB, ClickHouse, Databricks, BigQuery, Redshift, Snowflake, SQLite. See Potential Future Platforms for planned platforms (PostgreSQL, MySQL, etc.).

Example:

# Translate TPC-H query to Snowflake dialect (fully supported)
snowflake_sql = benchmark.translate_query("q1", dialect="snowflake")

# Translate to BigQuery (fully supported)
bigquery_sql = benchmark.translate_query("q1", dialect="bigquery")

# Translate to PostgreSQL dialect (translation only - adapter not yet available)
postgres_sql = benchmark.translate_query("q1", dialect="postgres")

Results Creation

BaseBenchmark.create_enhanced_benchmark_result(platform, query_results, execution_metadata=None, phases=None, resource_utilization=None, performance_characteristics=None, **kwargs)[source]

Create a BenchmarkResults object with standardized fields.

This centralizes the logic for creating benchmark results that was previously duplicated across platform adapters and CLI orchestrator.

Parameters:
  • platform (str) – Platform name (e.g., “DuckDB”, “ClickHouse”)

  • query_results (list[dict[str, Any]]) – List of query execution results

  • execution_metadata (dict[str, Any] | None) – Optional execution metadata

  • phases (dict[str, dict[str, Any]] | None) – Optional phase tracking information

  • resource_utilization (dict[str, Any] | None) – Optional resource usage metrics

  • performance_characteristics (dict[str, Any] | None) – Optional performance analysis

  • **kwargs (Any) – Additional fields to override defaults

Returns:

Fully configured BenchmarkResults object

Return type:

BenchmarkResults

Create standardized BenchmarkResults object with structured metadata.

Used internally by platform adapters to ensure consistent result formatting.
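
A minimal sketch of a direct call (field values are illustrative; the exact shape of each query_results entry follows the run_query() dictionary described above):

results = benchmark.create_enhanced_benchmark_result(
    platform="DuckDB",
    query_results=[
        {"query_id": "q1", "execution_time": 0.42, "query_text": "SELECT ..."},
    ],
    execution_metadata={"scale_factor": 0.01},
)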

Properties

BaseBenchmark.benchmark_name: str

Human-readable benchmark name (e.g., “TPC-H”, “ClickBench”).

BaseBenchmark.scale_factor: float

Data scale factor (1.0 = standard size, 0.01 = 1% size, 10 = 10x size).

BaseBenchmark.output_dir: Path

Directory where generated data files are stored.
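
For example:

from benchbox.tpch import TPCH

benchmark = TPCH(scale_factor=0.01)
print(benchmark.benchmark_name)  # "TPC-H"
print(benchmark.scale_factor)    # 0.01
print(benchmark.output_dir)      # Path to the directory holding generated data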

Utility Methods

BaseBenchmark.format_results(benchmark_result)[source]

Format benchmark results for display.

Parameters:

benchmark_result (dict[str, Any]) – Result dictionary from run_benchmark()

Returns:

Formatted string representation of the results

Return type:

str

Format benchmark results dictionary into human-readable string.
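
For example, to print a summary after a run (assuming an open connection):

results = benchmark.run_benchmark(connection)
print(benchmark.format_results(results))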

BaseBenchmark.get_data_source_benchmark()[source]

Return the canonical source benchmark when data is shared.

Benchmarks that reuse data generated by another benchmark (for example, Primitives reusing TPC-H datasets) should override this method and return the lower-case identifier of the source benchmark. Benchmarks that produce their own data should return None (default).

Returns name of source benchmark if this benchmark reuses data from another.

For example, Primitives benchmark reuses TPC-H data, so it returns "tpch".
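
A sketch of how such an override might look (the class name is hypothetical and this is not the actual Primitives implementation):

from benchbox.base import BaseBenchmark

class MyTpchDerivedBenchmark(BaseBenchmark):
    # generate_data/get_queries/get_query omitted for brevity

    def get_data_source_benchmark(self):
        # Reuse the datasets generated by the TPC-H benchmark.
        return "tpch"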

Best Practices

  1. Always use platform adapters - Call run_with_platform() instead of calling run_benchmark() directly for production use. Platform adapters provide optimized data loading and query execution.

  2. Handle scale factors carefully - Scale factors ≥1 must be integers. Use 0.1, 0.01, etc. for small-scale testing.

  3. Check data generation - Call generate_data() explicitly if you need to inspect or manipulate data files before loading.

  4. Use query subsets for debugging - Pass query_subset=["q1"] to test single queries during development.

  5. Leverage SQL translation - Use translate_query() to adapt queries to platform-specific dialects when needed.

See Also