H2O.ai Database Benchmark API

Tags: reference, python-api, h2odb

Complete Python API reference for the H2O.ai Database Benchmark.

Overview

The H2O.ai Database Benchmark tests analytical database performance using real-world taxi trip data patterns. Developed by H2O.ai for their database benchmarking initiative, this benchmark focuses on fundamental analytical operations common in data science and machine learning workflows: aggregations, grouping, and time-series analysis.

Key Features:

  • Real-World Data: Based on NYC taxi trip data structure

  • Data Science Focus: Tests operations common in ML pipelines

  • Single-Table Design: Emphasizes aggregation performance

  • Time-Series Operations: Tests temporal aggregation patterns

  • Scalable: From 10K rows (SF 0.01) to 1B rows (SF 1000)

  • Analytics-Oriented: Focuses on data exploration patterns

Reference: https://h2oai.github.io/db-benchmark/

Quick Start

from benchbox.h2odb import H2ODB
from benchbox.platforms.duckdb import DuckDBAdapter

# Create benchmark
benchmark = H2ODB(scale_factor=0.1)
benchmark.generate_data()

# Run benchmark
adapter = DuckDBAdapter()
results = adapter.run_benchmark(benchmark)

API Reference

H2ODB Class

class H2ODB(scale_factor=1.0, output_dir=None, **kwargs)[source]

Bases: BaseBenchmark

H2O Database Benchmark implementation.

This class provides an implementation of the H2O Database Benchmark, including data generation and access to analytical queries for taxi trip data.

Reference: H2O.ai benchmarking suite for analytical workloads

__init__(scale_factor=1.0, output_dir=None, **kwargs)[source]

Initialize an H2O Database Benchmark instance.

Parameters:
  • scale_factor (float) – Scale factor for the benchmark (1.0 ≈ 1M rows, ~100 MB)

  • output_dir (str | Path | None) – Directory to output generated data files

  • **kwargs – Additional implementation-specific options
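
A minimal construction sketch; the output directory below is illustrative and can be any writable path:

from pathlib import Path
from benchbox.h2odb import H2ODB

# Small scale factor for quick local runs; output_dir is an illustrative path
benchmark = H2ODB(scale_factor=0.01, output_dir=Path("./h2odb_data"))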

generate_data()[source]

Generate H2O Database Benchmark data.

Returns:

A list of paths to the generated data files

Return type:

list[str | Path]
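
For example (the exact file names depend on the generator):

benchmark = H2ODB(scale_factor=0.01)
data_files = benchmark.generate_data()

for path in data_files:
    print(path)  # paths to the generated data files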

get_queries(dialect=None)[source]

Get all H2O Database Benchmark queries.

Parameters:

dialect (str | None) – Target SQL dialect for query translation. If None, returns original queries.

Returns:

A dictionary mapping query IDs to query strings

Return type:

dict[str, str]
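
For example, to fetch both the original and dialect-translated query sets (translation assumes sqlglot is available; "duckdb" is shown as an example dialect name):

queries = benchmark.get_queries()                         # original queries, keyed Q1-Q10
duckdb_queries = benchmark.get_queries(dialect="duckdb")  # translated variants

for query_id, sql in duckdb_queries.items():
    print(f"{query_id}: {len(sql)} characters")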

get_query(query_id, *, params=None)[source]

Get a specific H2O Database Benchmark query.

Parameters:
  • query_id (int | str) – The ID of the query to retrieve (Q1-Q10)

  • params (dict[str, Any] | None) – Optional parameters to customize the query

Returns:

The query string

Raises:

ValueError – If the query_id is invalid

Return type:

str

get_schema()[source]

Get the H2O Database Benchmark schema.

Returns:

A list of dictionaries describing the tables in the schema

Return type:

list[dict]

get_create_tables_sql(dialect='standard', tuning_config=None)[source]

Get SQL to create all H2O Database Benchmark tables.

Parameters:
  • dialect (str) – SQL dialect to use

  • tuning_config – Unified tuning configuration for constraint settings

Returns:

SQL script for creating all tables

Return type:

str

apply_verbosity(settings)

Apply verbosity settings to the mixin consumer.

property benchmark_name: str

Get the human-readable benchmark name.

create_enhanced_benchmark_result(platform, query_results, execution_metadata=None, phases=None, resource_utilization=None, performance_characteristics=None, **kwargs)

Create a BenchmarkResults object with standardized fields.

This centralizes the logic for creating benchmark results that was previously duplicated across platform adapters and the CLI orchestrator.

Parameters:
  • platform (str) – Platform name (e.g., “DuckDB”, “ClickHouse”)

  • query_results (list[dict[str, Any]]) – List of query execution results

  • execution_metadata (dict[str, Any] | None) – Optional execution metadata

  • phases (dict[str, dict[str, Any]] | None) – Optional phase tracking information

  • resource_utilization (dict[str, Any] | None) – Optional resource usage metrics

  • performance_characteristics (dict[str, Any] | None) – Optional performance analysis

  • **kwargs (Any) – Additional fields to override defaults

Returns:

Fully configured BenchmarkResults object

Return type:

BenchmarkResults
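
This helper is mainly used by platform adapters. A sketch, assuming query-result dictionaries shaped like the output of run_query() and an illustrative metadata key:

# Build a results object from per-query timings (values are illustrative)
query_results = [
    {
        "query_id": "Q1",
        "execution_time": 0.012,     # seconds (illustrative)
        "query_text": "SELECT ...",  # placeholder text
        "results": None,
        "row_count": None,
    },
]

results = benchmark.create_enhanced_benchmark_result(
    platform="DuckDB",
    query_results=query_results,
    execution_metadata={"scale_factor": 0.01},  # assumed/illustrative key
)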

format_results(benchmark_result)

Format benchmark results for display.

Parameters:

benchmark_result (dict[str, Any]) – Result dictionary from run_benchmark()

Returns:

Formatted string representation of the results

Return type:

str
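
For example, assuming `connection` is an established DatabaseConnection (a sketch):

result = benchmark.run_benchmark(connection)
print(benchmark.format_results(result))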

get_data_source_benchmark()

Return the canonical source benchmark when data is shared.

Benchmarks that reuse data generated by another benchmark (for example, Primitives reusing TPC-H datasets) should override this method and return the lower-case identifier of the source benchmark. Benchmarks that produce their own data should return None (default).

log_debug_info(context='Debug')

Log comprehensive debug information including version details.

log_error_with_debug_info(error, context='Error')

Log an error with comprehensive debug information.

log_operation_complete(operation, duration=None, details='')
log_operation_start(operation, details='')
log_verbose(message)
log_version_warning()

Log version consistency warnings if any exist.

log_very_verbose(message)
property logger: Logger

Return the logger configured for the verbosity mixin consumer.

quiet: bool = False

run_benchmark(connection, query_ids=None, fetch_results=False, setup_database=True)

Run the complete benchmark suite.

Parameters:
  • connection (DatabaseConnection) – Database connection to execute queries on

  • query_ids (list[int | str] | None) – Optional list of specific query IDs to run (defaults to all)

  • fetch_results (bool) – Whether to fetch and return query results

  • setup_database (bool) – Whether to set up the database first

Returns:

A dictionary containing:

  • benchmark_name: Name of the benchmark

  • total_execution_time: Total time for all queries

  • total_queries: Number of queries executed

  • successful_queries: Number of queries that succeeded

  • failed_queries: Number of queries that failed

  • query_results: List of individual query results

  • setup_time: Time taken for database setup (if performed)

Return type:

dict

Raises:

Exception – If benchmark execution fails
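
A sketch of a connection-driven run; `connection` is assumed to be a DatabaseConnection obtained elsewhere (for example from a platform adapter):

# Run a subset of queries against an existing connection
result = benchmark.run_benchmark(
    connection,
    query_ids=["Q1", "Q3", "Q7"],
    fetch_results=False,
    setup_database=True,
)

print(result["benchmark_name"])
print(f"{result['successful_queries']}/{result['total_queries']} queries succeeded")
print(f"Total time: {result['total_execution_time']:.3f}s")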

run_query(query_id, connection, params=None, fetch_results=False)

Execute a single query and return timing and results.

Parameters:
  • query_id (int | str) – ID of the query to execute

  • connection (DatabaseConnection) – Database connection to execute query on

  • params (dict[str, Any] | None) – Optional parameters for query customization

  • fetch_results (bool) – Whether to fetch and return query results

Returns:

A dictionary containing:

  • query_id: Executed query ID

  • execution_time: Time taken to execute query in seconds

  • query_text: Executed query text

  • results: Query results if fetch_results=True, otherwise None

  • row_count: Number of rows returned (if results fetched)

Return type:

dict

Raises:
  • ValueError – If query_id is invalid

  • Exception – If query execution fails
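
A sketch for timing a single query, again assuming an existing `connection`:

q3 = benchmark.run_query("Q3", connection, fetch_results=True)
print(f"Q3 took {q3['execution_time']:.3f}s, returned {q3['row_count']} rows")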

run_with_platform(platform_adapter, **run_config)

Run complete benchmark using platform-specific optimizations.

This method provides a unified interface for running benchmarks using database platform adapters that handle connection management, data loading optimizations, and query execution.

This is the standard method that all benchmarks should support for integration with the CLI and other orchestration tools.

Parameters:
  • platform_adapter – Platform adapter instance (e.g., DuckDBAdapter)

  • **run_config – Configuration options:
      • categories: List of query categories to run (if the benchmark supports categories)
      • query_subset: List of specific query IDs to run
      • connection: Connection configuration
      • benchmark_type: Type hint for optimizations ('olap', 'oltp', etc.)

Returns:

BenchmarkResults object with execution results

Example

from benchbox.platforms import DuckDBAdapter

benchmark = SomeBenchmark(scale_factor=0.1)
adapter = DuckDBAdapter()
results = benchmark.run_with_platform(adapter)

setup_database(connection)

Set up database with schema and data.

Creates necessary database schema and loads benchmark data into the database.

Parameters:

connection (DatabaseConnection) – Database connection to set up

Raises:
  • ValueError – If data generation fails

  • Exception – If database setup fails
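
setup_database() is invoked automatically by run_benchmark() when setup_database=True; it can also be called directly when executing queries manually (a sketch assuming an existing `connection`):

benchmark.setup_database(connection)  # create the schema and load generated data
query = benchmark.get_query("Q1")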

translate_query(query_id, dialect)

Translate a query to a specific SQL dialect.

Parameters:
  • query_id (int | str) – The ID of the query to translate

  • dialect (str) – The target SQL dialect

Returns:

The translated query string

Raises:
  • ValueError – If the query_id is invalid

  • ImportError – If sqlglot is not installed

  • ValueError – If the dialect is not supported

Return type:

str
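
A hedged sketch of dialect translation; it requires sqlglot, and "duckdb" is shown as an example dialect name:

try:
    translated = benchmark.translate_query("Q1", dialect="duckdb")
    print(translated)
except ImportError:
    print("sqlglot is required for query translation")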

verbose: bool = False
verbose_enabled: bool = False
verbose_level: int = 0
property verbosity_settings: VerbositySettings

Return the current verbosity settings.

very_verbose: bool = False

Constructor:

H2ODB(
    scale_factor: float = 1.0,
    output_dir: Optional[Union[str, Path]] = None,
    **kwargs
)

Parameters:

  • scale_factor (float): Data size multiplier (1.0 ≈ 1M rows, ~100MB)

  • output_dir (str | Path | None): Output directory for generated data

Schema Methods

get_create_tables_sql(dialect='standard', tuning_config=None) → str

Get SQL to create the TRIPS table with taxi trip data schema.

Example:

schema_sql = benchmark.get_create_tables_sql(dialect="duckdb")

get_schema() → list[dict]

Get detailed schema information for the TRIPS table.
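
Example (a sketch; the exact keys in each table dictionary depend on the implementation):

schema = benchmark.get_schema()
for table in schema:
    print(table)  # one dictionary per table in the schema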

Query Methods

get_query(query_id, *, params=None) → str

Get specific H2O.ai query with optional parameters.

Parameters:

  • query_id (int | str): Query ID (Q1-Q10 or 1-10)

  • params (dict | None): Query parameters

Supported Parameters:

  • start_date (str): Start date for temporal filtering

  • end_date (str): End date for temporal filtering

  • min_fare (float): Minimum fare amount

  • max_fare (float): Maximum fare amount

Example:

# Basic aggregation query
count_query = benchmark.get_query("Q1")

# Temporal analysis with date range
temporal_query = benchmark.get_query("Q8", params={
    'start_date': '2020-01-01',
    'end_date': '2020-01-31'
})

# Statistical analysis with fare filter
stats_query = benchmark.get_query("Q9", params={
    'min_fare': 5.0,
    'max_fare': 100.0
})

get_queries(dialect=None) → dict[str, str]

Get all H2O.ai queries (10 queries total).

Example:

queries = benchmark.get_queries()
print(f"Available queries: {list(queries.keys())}")
# Output: Available queries: ['Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6', 'Q7', 'Q8', 'Q9', 'Q10']

Query Categories

The H2O.ai benchmark queries are organized by analytical operation type:

Basic Aggregation Queries (Q1-Q2):
  • Q1: Simple COUNT(*)

  • Q2: SUM and AVG aggregations

  • Performance focus: Sequential scan and basic aggregation speed

Grouping Queries (Q3-Q6):
  • Q3: Single-column GROUP BY

  • Q4: Multi-aggregate grouping

  • Q5: Two-column GROUP BY

  • Q6: Complex multi-column grouping

  • Performance focus: Hash aggregation and grouping algorithms

Temporal Analysis Queries (Q7-Q8):
  • Q7: Time-based grouping (hourly analysis)

  • Q8: Complex temporal aggregation with date filtering

  • Performance focus: Date/time function evaluation and temporal grouping

Advanced Analytics Queries (Q9-Q10):
  • Q9: Statistical analysis (STDDEV, percentiles)

  • Q10: Complex multi-metric analytics

  • Performance focus: Statistical function computation and sorting

Usage Examples

Basic Benchmark Execution

from benchbox.h2odb import H2ODB
from benchbox.platforms.duckdb import DuckDBAdapter

# Initialize with testing scale
benchmark = H2ODB(scale_factor=0.1)
data_files = benchmark.generate_data()

# Run with DuckDB
adapter = DuckDBAdapter(memory_limit="4GB")
results = adapter.run_benchmark(benchmark)

print(f"Queries: {results.total_queries}")
print(f"Average time: {results.average_query_time:.3f}s")

Query Group Testing

from benchbox.h2odb import H2ODB
from benchbox.platforms.duckdb import DuckDBAdapter
import time

benchmark = H2ODB(scale_factor=0.01)
benchmark.generate_data()

adapter = DuckDBAdapter()
conn = adapter.create_connection()
adapter.create_schema(benchmark, conn)
adapter.load_data(benchmark, conn, benchmark.output_dir)

# Test query groups
query_groups = {
    'Basic': ['Q1', 'Q2'],
    'Grouping': ['Q3', 'Q4', 'Q5', 'Q6'],
    'Temporal': ['Q7', 'Q8'],
    'Advanced': ['Q9', 'Q10']
}

params = {
    'start_date': '2020-01-01',
    'end_date': '2020-01-31',
    'min_fare': 5.0,
    'max_fare': 100.0
}

for group_name, query_ids in query_groups.items():
    print(f"\n{group_name} Queries:")
    for query_id in query_ids:
        query = benchmark.get_query(query_id, params=params)

        start = time.time()
        result = conn.execute(query).fetchall()
        elapsed = time.time() - start

        print(f"  {query_id}: {elapsed*1000:.1f} ms ({len(result)} rows)")

Aggregation Performance Analysis

from benchbox.h2odb import H2ODB
from benchbox.platforms.duckdb import DuckDBAdapter
import time
from statistics import mean

# Test aggregation performance across scales
scale_factors = [0.01, 0.1, 1.0]

for sf in scale_factors:
    print(f"\n=== Scale Factor {sf} ===")

    benchmark = H2ODB(scale_factor=sf)
    benchmark.generate_data()

    adapter = DuckDBAdapter()
    conn = adapter.create_connection()
    adapter.create_schema(benchmark, conn)
    adapter.load_data(benchmark, conn, benchmark.output_dir)

    # Test different aggregation patterns
    aggregation_tests = [
        ('Q1', 'Simple count'),
        ('Q2', 'Sum and avg'),
        ('Q3', 'Single-column GROUP BY'),
        ('Q5', 'Two-column GROUP BY'),
        ('Q9', 'Statistical functions')
    ]

    for query_id, description in aggregation_tests:
        query = benchmark.get_query(query_id)

        # Run 3 times for stability
        times = []
        for _ in range(3):
            start = time.time()
            result = conn.execute(query).fetchall()
            times.append(time.time() - start)

        print(f"{query_id} ({description}): {mean(times)*1000:.1f} ms")

Data Science Workflow Integration

from benchbox.h2odb import H2ODB
from benchbox.platforms.duckdb import DuckDBAdapter
import pandas as pd
import time

# Generate data for ML preprocessing simulation
benchmark = H2ODB(scale_factor=1.0)
benchmark.generate_data()

# Load into pandas for data science operations
trips_df = pd.read_csv(benchmark.output_dir / "trips.csv")

print("Data Science Operations Performance:")

# Feature engineering
start = time.time()
trips_df['hour'] = pd.to_datetime(trips_df['pickup_datetime']).dt.hour
trips_df['day_of_week'] = pd.to_datetime(trips_df['pickup_datetime']).dt.dayofweek
trips_df['trip_duration'] = (
    pd.to_datetime(trips_df['dropoff_datetime']) -
    pd.to_datetime(trips_df['pickup_datetime'])
).dt.total_seconds()
print(f"Feature engineering: {time.time() - start:.3f}s")

# Aggregation operations (similar to H2O queries)
start = time.time()
hourly_stats = trips_df.groupby('hour').agg({
    'fare_amount': ['sum', 'mean', 'std', 'count'],
    'trip_distance': ['mean'],
    'passenger_count': ['mean']
})
print(f"GroupBy aggregation: {time.time() - start:.3f}s")

# Statistical analysis
start = time.time()
vendor_stats = trips_df.groupby('vendor_id').agg({
    'fare_amount': ['count', 'sum', 'mean', 'std', 'min', 'max'],
    'tip_amount': ['mean', 'std']
})
print(f"Statistical analysis: {time.time() - start:.3f}s")

Best Practices

  1. Use Appropriate Scale Factors

    # Development and testing
    dev = H2ODB(scale_factor=0.01)  # ~10K rows, ~1 MB
    
    # Standard benchmark
    test = H2ODB(scale_factor=1.0)  # ~1M rows, ~100 MB
    
    # Large-scale testing
    prod = H2ODB(scale_factor=10.0)  # ~10M rows, ~1 GB
    
  2. Test Query Groups Separately

    # Test basic aggregation performance
    basic_queries = ['Q1', 'Q2']
    
    # Test grouping performance
    grouping_queries = ['Q3', 'Q4', 'Q5', 'Q6']
    
    # Test temporal analysis performance
    temporal_queries = ['Q7', 'Q8']
    
    # Test advanced analytics performance
    advanced_queries = ['Q9', 'Q10']
    
  3. Parameterize Temporal Queries

    params = {
        'start_date': '2020-01-01',
        'end_date': '2020-01-31',
        'min_fare': 5.0,
        'max_fare': 100.0
    }
    
    query = benchmark.get_query("Q8", params=params)
    
  4. Monitor Memory for Large Grouping Operations

    # Large scale factors may require memory configuration
    adapter = DuckDBAdapter(memory_limit="8GB")
    
  5. Use Multiple Iterations for Timing

    from statistics import mean
    
    times = []
    for _ in range(3):
        start = time.time()
        result = conn.execute(query).fetchall()
        times.append(time.time() - start)
    
    avg_time = mean(times)
    

Common Issues

Issue: Slow aggregation queries on large datasets
  • Solution: Use columnar storage and appropriate indices

  • Consider partitioning by date for temporal queries

  • Increase memory limits for large GROUP BY operations

Issue: Memory errors with high scale factors
  • Solution: Start with smaller scale factors (0.01, 0.1)

  • Increase database memory limits

  • Use external aggregation if available

Issue: Incorrect temporal analysis results
  • Solution: Ensure proper timezone handling

  • Filter out NULL or invalid dates

  • Use appropriate date truncation functions
