Base Benchmark API

Tags: reference, python-api, contributor

The benchbox.base module provides the foundational abstract class that all benchmarks inherit from.

Overview

Every benchmark in BenchBox extends BaseBenchmark, which provides a standardized interface for:

  • Data generation and schema setup

  • Query execution and timing

  • Platform adapter integration

  • SQL dialect translation

  • Results collection and formatting

This abstraction ensures consistent behavior across all benchmark implementations (TPC-H, TPC-DS, ClickBench, etc.).

Quick Example

from benchbox.tpch import TPCH
from benchbox.platforms import DuckDBAdapter

# Create benchmark instance
benchmark = TPCH(scale_factor=0.01)

# Generate data files
data_files = benchmark.generate_data()

# Run with platform adapter
adapter = DuckDBAdapter()
results = benchmark.run_with_platform(adapter)

print(f"Completed {results.successful_queries}/{results.total_queries} queries")
print(f"Average query time: {results.average_query_time:.3f}s")

Core Classes

class BaseBenchmark(scale_factor=1.0, output_dir=None, **kwargs)[source]

Bases: VerbosityMixin, ABC

Base class that all BenchBox benchmarks inherit from.

__init__(scale_factor=1.0, output_dir=None, **kwargs)[source]

Initialize a benchmark.

Parameters:
  • scale_factor (float) – Scale factor (1.0 = standard size)

  • output_dir (str | Path | None) – Data output directory

  • **kwargs (Any) – Additional options

get_data_source_benchmark()[source]

Return the canonical source benchmark when data is shared.

Benchmarks that reuse data generated by another benchmark (for example, Primitives reusing TPC-H datasets) should override this method and return the lower-case identifier of the source benchmark. Benchmarks that produce their own data should return None (default).

abstractmethod generate_data()[source]

Generate benchmark data.

Returns:

List of data file paths

Return type:

list[str | Path]

abstractmethod get_queries()[source]

Get all benchmark queries.

Returns:

Dictionary mapping query IDs to query strings

Return type:

dict[str, str]

abstractmethod get_query(query_id, *, params=None)[source]

Get a benchmark query.

Parameters:
  • query_id (int | str) – Query ID

  • params (dict[str, Any] | None) – Optional parameters

Returns:

Query string with parameters resolved

Raises:

ValueError – If query_id is invalid

Return type:

str

setup_database(connection)[source]

Set up database with schema and data.

Creates necessary database schema and loads benchmark data into the database.

Parameters:

connection (DatabaseConnection) – Database connection to set up

Raises:
  • ValueError – If data generation fails

  • Exception – If database setup fails

run_query(query_id, connection, params=None, fetch_results=False)[source]

Execute single query and return timing and results.

Parameters:
  • query_id (int | str) – ID of the query to execute

  • connection (DatabaseConnection) – Database connection to execute query on

  • params (dict[str, Any] | None) – Optional parameters for query customization

  • fetch_results (bool) – Whether to fetch and return query results

Returns:

Dictionary containing:

  • query_id: Executed query ID

  • execution_time: Time taken to execute query in seconds

  • query_text: Executed query text

  • results: Query results if fetch_results=True, otherwise None

  • row_count: Number of rows returned (if results fetched)

Return type:

dict[str, Any]

Raises:
  • ValueError – If query_id is invalid

  • Exception – If query execution fails

run_benchmark(connection, query_ids=None, fetch_results=False, setup_database=True)[source]

Run the complete benchmark suite.

Parameters:
  • connection (DatabaseConnection) – Database connection to execute queries on

  • query_ids (list[int | str] | None) – Optional list of specific query IDs to run (defaults to all)

  • fetch_results (bool) – Whether to fetch and return query results

  • setup_database (bool) – Whether to set up the database first

Returns:

Dictionary containing:

  • benchmark_name: Name of the benchmark

  • total_execution_time: Total time for all queries

  • total_queries: Number of queries executed

  • successful_queries: Number of queries that succeeded

  • failed_queries: Number of queries that failed

  • query_results: List of individual query results

  • setup_time: Time taken for database setup (if performed)

Return type:

dict[str, Any]

Raises:

Exception – If benchmark execution fails

run_with_platform(platform_adapter, **run_config)[source]

Run complete benchmark using platform-specific optimizations.

This method provides a unified interface for running benchmarks using database platform adapters that handle connection management, data loading optimizations, and query execution.

This is the standard method that all benchmarks should support for integration with the CLI and other orchestration tools.

Parameters:
  • platform_adapter – Platform adapter instance (e.g., DuckDBAdapter)

  • **run_config – Configuration options:

      - categories: List of query categories to run (if the benchmark supports categories)

      - query_subset: List of specific query IDs to run

      - connection: Connection configuration

      - benchmark_type: Type hint for optimizations ('olap', 'oltp', etc.)

Returns:

BenchmarkResults object with execution results

Example

from benchbox.platforms import DuckDBAdapter

benchmark = SomeBenchmark(scale_factor=0.1)
adapter = DuckDBAdapter()
results = benchmark.run_with_platform(adapter)

format_results(benchmark_result)[source]

Format benchmark results for display.

Parameters:

benchmark_result (dict[str, Any]) – Result dictionary from run_benchmark()

Returns:

Formatted string representation of the results

Return type:

str

translate_query(query_id, dialect)[source]

Translate a query to a specific SQL dialect.

Parameters:
  • query_id (int | str) – The ID of the query to translate

  • dialect (str) – The target SQL dialect

Returns:

The translated query string

Raises:
  • ValueError – If the query_id is invalid

  • ImportError – If sqlglot is not installed

  • ValueError – If the dialect is not supported

Return type:

str

property benchmark_name: str

Get the human-readable benchmark name.

create_enhanced_benchmark_result(platform, query_results, execution_metadata=None, phases=None, resource_utilization=None, performance_characteristics=None, **kwargs)[source]

Create a BenchmarkResults object with standardized fields.

This centralizes the logic for creating benchmark results that was previously duplicated across platform adapters and CLI orchestrator.

Parameters:
  • platform (str) – Platform name (e.g., “DuckDB”, “ClickHouse”)

  • query_results (list[dict[str, Any]]) – List of query execution results

  • execution_metadata (dict[str, Any] | None) – Optional execution metadata

  • phases (dict[str, dict[str, Any]] | None) – Optional phase tracking information

  • resource_utilization (dict[str, Any] | None) – Optional resource usage metrics

  • performance_characteristics (dict[str, Any] | None) – Optional performance analysis

  • **kwargs (Any) – Additional fields to override defaults

Returns:

Fully configured BenchmarkResults object

Return type:

BenchmarkResults
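
Only the three abstractmethods above must be implemented by a subclass; everything else is inherited. Below is a minimal, illustrative sketch; the class and query names are hypothetical and not part of BenchBox:

from pathlib import Path

from benchbox.base import BaseBenchmark


class TinyBenchmark(BaseBenchmark):
    """Illustrative two-query benchmark (not shipped with BenchBox)."""

    _QUERIES = {"q1": "SELECT 1", "q2": "SELECT 1 + 1"}

    def generate_data(self):
        # Write a tiny data file into output_dir (provided by BaseBenchmark)
        # and return its path.
        out = Path(self.output_dir or ".") / "tiny.csv"
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text("id\n1\n2\n")
        return [out]

    def get_queries(self):
        return dict(self._QUERIES)

    def get_query(self, query_id, *, params=None):
        key = str(query_id)
        if key not in self._QUERIES:
            raise ValueError(f"Unknown query ID: {query_id}")
        return self._QUERIES[key]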

Key Methods

Data Generation

abstractmethod BaseBenchmark.generate_data()[source]

Generate benchmark data.

Returns:

List of data file paths

Return type:

list[str | Path]

Required override - Each benchmark implements data generation logic.

Returns list of paths to generated data files (Parquet, CSV, etc.).
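
For example, to generate the data and inspect the resulting file paths before loading:

data_files = benchmark.generate_data()
for path in data_files:
    print(path)  # each entry is a str or Path pointing at a generated file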

Query Access

abstractmethod BaseBenchmark.get_queries()[source]

Get all benchmark queries.

Returns:

Dictionary mapping query IDs to query strings

Return type:

dict[str, str]

Required override - Returns all queries for the benchmark.

Example return value:

{
    "q1": "SELECT ...",
    "q2": "SELECT ...",
    # ...
}

abstractmethod BaseBenchmark.get_query(query_id, *, params=None)[source]

Get a benchmark query.

Parameters:
  • query_id (int | str) – Query ID

  • params (dict[str, Any] | None) – Optional parameters

Returns:

Query string with parameters resolved

Raises:

ValueError – If query_id is invalid

Return type:

str

Required override - Get single query by ID with optional parameters.

Example:

query_sql = benchmark.get_query("q1", params={"date": "1998-09-02"})

Database Setup

BaseBenchmark.setup_database(connection)[source]

Set up database with schema and data.

Creates necessary database schema and loads benchmark data into the database.

Parameters:

connection (DatabaseConnection) – Database connection to set up

Raises:
  • ValueError – If data generation fails

  • Exception – If database setup fails

Sets up database schema and loads data. Automatically calls generate_data() if needed.
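
A usage sketch, assuming connection is an already-open DatabaseConnection (how you obtain one depends on the driver or platform adapter you use):

from benchbox.tpch import TPCH

benchmark = TPCH(scale_factor=0.01)
benchmark.setup_database(connection)  # creates the schema and loads the data,
                                      # generating it first if necessary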

Execution

BaseBenchmark.run_query(query_id, connection, params=None, fetch_results=False)[source]

Execute single query and return timing and results.

Parameters:
  • query_id (int | str) – ID of the query to execute

  • connection (DatabaseConnection) – Database connection to execute query on

  • params (dict[str, Any] | None) – Optional parameters for query customization

  • fetch_results (bool) – Whether to fetch and return query results

Returns:

Dictionary containing:

  • query_id: Executed query ID

  • execution_time: Time taken to execute query in seconds

  • query_text: Executed query text

  • results: Query results if fetch_results=True, otherwise None

  • row_count: Number of rows returned (if results fetched)

Return type:

dict[str, Any]

Raises:
  • ValueError – If query_id is invalid

  • Exception – If query execution fails

Execute single query and return detailed results including timing and row counts.
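
For example (again assuming an open connection), the returned dictionary exposes the fields listed above:

result = benchmark.run_query("q1", connection, fetch_results=True)
print(f"{result['query_id']}: {result['execution_time']:.3f}s, "
      f"{result['row_count']} rows")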

BaseBenchmark.run_benchmark(connection, query_ids=None, fetch_results=False, setup_database=True)[source]

Run the complete benchmark suite.

Parameters:
  • connection (DatabaseConnection) – Database connection to execute queries on

  • query_ids (list[int | str] | None) – Optional list of specific query IDs to run (defaults to all)

  • fetch_results (bool) – Whether to fetch and return query results

  • setup_database (bool) – Whether to set up the database first

Returns:

Dictionary containing:

  • benchmark_name: Name of the benchmark

  • total_execution_time: Total time for all queries

  • total_queries: Number of queries executed

  • successful_queries: Number of queries that succeeded

  • failed_queries: Number of queries that failed

  • query_results: List of individual query results

  • setup_time: Time taken for database setup (if performed)

Return type:

dict[str, Any]

Raises:

Exception – If benchmark execution fails

Execute complete benchmark suite with optional filtering by query IDs.

Example:

# Run all queries
results = benchmark.run_benchmark(connection)

# Run specific queries only
results = benchmark.run_benchmark(
    connection,
    query_ids=["q1", "q3", "q7"]
)
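
The returned dictionary can then be inspected via the documented keys (each entry of query_results is an individual query result, typically shaped like the run_query() dictionary):

print(results["benchmark_name"])
print(f"{results['successful_queries']}/{results['total_queries']} queries succeeded")
for query_result in results["query_results"]:
    print(query_result["query_id"], query_result["execution_time"])
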
BaseBenchmark.run_with_platform(platform_adapter, **run_config)[source]

Run complete benchmark using platform-specific optimizations.

This method provides a unified interface for running benchmarks using database platform adapters that handle connection management, data loading optimizations, and query execution.

This is the standard method that all benchmarks should support for integration with the CLI and other orchestration tools.

Parameters:
  • platform_adapter – Platform adapter instance (e.g., DuckDBAdapter)

  • **run_config – Configuration options:

      - categories: List of query categories to run (if the benchmark supports categories)

      - query_subset: List of specific query IDs to run

      - connection: Connection configuration

      - benchmark_type: Type hint for optimizations ('olap', 'oltp', etc.)

Returns:

BenchmarkResults object with execution results

Example

from benchbox.platforms import DuckDBAdapter

benchmark = SomeBenchmark(scale_factor=0.1)
adapter = DuckDBAdapter()
results = benchmark.run_with_platform(adapter)

Recommended entry point - Run benchmark using platform adapter for optimized execution.

This method delegates to the platform adapter’s run_benchmark() implementation, which handles:

  • Connection management

  • Data loading optimizations (bulk loading, parallel ingestion)

  • Query execution with retry logic

  • Results collection and validation

Example:

from benchbox.tpcds import TPCDS
from benchbox.platforms import DatabricksAdapter

benchmark = TPCDS(scale_factor=1)
adapter = DatabricksAdapter(
    host="https://your-workspace.cloud.databricks.com",
    token="your-token",
    http_path="/sql/1.0/warehouses/abc123"
)

results = benchmark.run_with_platform(
    adapter,
    query_subset=["q1", "q2", "q3"]  # Optional filtering
)

SQL Translation

BaseBenchmark.translate_query(query_id, dialect)[source]

Translate a query to a specific SQL dialect.

Parameters:
  • query_id (int | str) – The ID of the query to translate

  • dialect (str) – The target SQL dialect

Returns:

The translated query string

Raises:
  • ValueError – If the query_id is invalid

  • ImportError – If sqlglot is not installed

  • ValueError – If the dialect is not supported

Return type:

str

Translate query to different SQL dialect using sqlglot.

Supported dialects: postgres, mysql, sqlite, duckdb, snowflake, bigquery, redshift, clickhouse, databricks, and more.

Note

Dialect Translation vs Platform Adapters: BenchBox can translate queries to many SQL dialects, but this doesn’t mean platform adapters exist for all those databases. Currently supported platforms: DuckDB, ClickHouse, Databricks, BigQuery, Redshift, Snowflake, SQLite. See Potential Future Platforms for planned platforms (PostgreSQL, MySQL, etc.).

Example:

# Translate TPC-H query to Snowflake dialect (fully supported)
snowflake_sql = benchmark.translate_query("q1", dialect="snowflake")

# Translate to BigQuery (fully supported)
bigquery_sql = benchmark.translate_query("q1", dialect="bigquery")

# Translate to PostgreSQL dialect (translation only - adapter not yet available)
postgres_sql = benchmark.translate_query("q1", dialect="postgres")

Results Creation

BaseBenchmark.create_enhanced_benchmark_result(platform, query_results, execution_metadata=None, phases=None, resource_utilization=None, performance_characteristics=None, **kwargs)[source]

Create a BenchmarkResults object with standardized fields.

This centralizes the logic for creating benchmark results that was previously duplicated across platform adapters and CLI orchestrator.

Parameters:
  • platform (str) – Platform name (e.g., “DuckDB”, “ClickHouse”)

  • query_results (list[dict[str, Any]]) – List of query execution results

  • execution_metadata (dict[str, Any] | None) – Optional execution metadata

  • phases (dict[str, dict[str, Any]] | None) – Optional phase tracking information

  • resource_utilization (dict[str, Any] | None) – Optional resource usage metrics

  • performance_characteristics (dict[str, Any] | None) – Optional performance analysis

  • **kwargs (Any) – Additional fields to override defaults

Returns:

Fully configured BenchmarkResults object

Return type:

BenchmarkResults

Create standardized BenchmarkResults object with structured metadata.

Used internally by platform adapters to ensure consistent result formatting.
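
A minimal sketch of a direct call (field values are illustrative; the exact shape of each query_results entry follows the run_query() dictionary described above):

results = benchmark.create_enhanced_benchmark_result(
    platform="DuckDB",
    query_results=[
        {"query_id": "q1", "execution_time": 0.42, "query_text": "SELECT ..."},
    ],
    execution_metadata={"scale_factor": 0.01},
)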

Properties

BaseBenchmark.benchmark_name: str

Human-readable benchmark name (e.g., “TPC-H”, “ClickBench”).

BaseBenchmark.scale_factor: float

Data scale factor (1.0 = standard size, 0.01 = 1% size, 10 = 10x size).

BaseBenchmark.output_dir: Path

Directory where generated data files are stored.
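
For example:

from benchbox.tpch import TPCH

benchmark = TPCH(scale_factor=0.01)
print(benchmark.benchmark_name)  # "TPC-H"
print(benchmark.scale_factor)    # 0.01
print(benchmark.output_dir)      # Path to the directory holding generated data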

Utility Methods

BaseBenchmark.format_results(benchmark_result)[source]

Format benchmark results for display.

Parameters:

benchmark_result (dict[str, Any]) – Result dictionary from run_benchmark()

Returns:

Formatted string representation of the results

Return type:

str

Format benchmark results dictionary into human-readable string.
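
For example, to print a summary after a run (assuming an open connection):

results = benchmark.run_benchmark(connection)
print(benchmark.format_results(results))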

BaseBenchmark.get_data_source_benchmark()[source]

Return the canonical source benchmark when data is shared.

Benchmarks that reuse data generated by another benchmark (for example, Primitives reusing TPC-H datasets) should override this method and return the lower-case identifier of the source benchmark. Benchmarks that produce their own data should return None (default).

Returns name of source benchmark if this benchmark reuses data from another.

For example, Primitives benchmark reuses TPC-H data, so it returns "tpch".
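
A sketch of how such an override might look (the class name is hypothetical and this is not the actual Primitives implementation):

from benchbox.base import BaseBenchmark

class MyTpchDerivedBenchmark(BaseBenchmark):
    # generate_data/get_queries/get_query omitted for brevity

    def get_data_source_benchmark(self):
        # Reuse the datasets generated by the TPC-H benchmark.
        return "tpch"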

Best Practices

  1. Always use platform adapters - Call run_with_platform() instead of calling run_benchmark() directly for production use. Platform adapters provide optimized data loading and query execution.

  2. Handle scale factors carefully - Scale factors ≥1 must be integers. Use 0.1, 0.01, etc. for small-scale testing.

  3. Check data generation - Call generate_data() explicitly if you need to inspect or manipulate data files before loading.

  4. Use query subsets for debugging - Pass query_subset=["q1"] to test single queries during development.

  5. Leverage SQL translation - Use translate_query() to adapt queries to platform-specific dialects when needed.

See Also