Benchmarks

This section provides systematic documentation for each benchmark implementation in BenchBox. Each benchmark is designed for specific testing scenarios and database evaluation needs.

Contents

Roadmap & Research

Potential Future Benchmarks

Snapshot of planned additions captured in the benchmark-expansion worktree, including NYC Taxi analytics, open data lake formats, TPC-H Skew, TSBS scenarios, and TPC-DS hardening goals. Several of these (NYC Taxi, TSBS DevOps, TPC-H Skew) are now implemented and documented below.

TPC Standards

Official industry standards for comparing databases

TPC-H

TPC-H Decision Support Benchmark

  • Purpose: Standard decision support and analytical query performance

  • Queries: 22 business-oriented analytical queries

  • Schema: 8 tables modeling a wholesale supplier database

  • Scale: 0.001 to 1000+ scale factors supported

  • Use Cases: General analytical performance, database comparison, regression testing
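
A minimal sketch of driving TPC-H from BenchBox, using the TPCH(scale_factor=...) constructor shown in Common Patterns below; the get_query accessor is an assumption by analogy with the primitives API:

from benchbox import TPCH

# A small scale factor keeps data generation and execution fast for smoke tests
tpch = TPCH(scale_factor=0.1)

# get_query(n) is assumed by analogy with ReadPrimitives.get_query below;
# check your BenchBox version for the exact accessor
for query_number in [1, 3, 6, 12]:  # representative subset of the 22 queries
    sql = tpch.get_query(query_number)
    print(f"Q{query_number}: {sql[:60]}...")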

TPC-DS

TPC-DS Decision Support Benchmark

  • Purpose: Complex analytical workloads with advanced SQL features

  • Queries: 99 complex analytical queries with CTEs, window functions

  • Schema: 24 tables modeling a retail data warehouse

  • Scale: 0.001 to 100+ scale factors supported

  • Use Cases: Advanced analytics, complex query optimization, modern SQL feature testing

TPC-DI

TPC-DI Data Integration Benchmark

  • Purpose: Data integration and ETL performance testing

  • Queries: Data transformation and loading operations

  • Schema: Dimensional data warehouse with fact and dimension tables

  • Scale: Variable based on data volume requirements

  • Use Cases: ETL performance, data pipeline testing, integration workflows

Data Modeling Variants

Alternative schema designs derived from TPC standards

TPC-H Data Vault

TPC-H Data Vault Benchmark

  • Purpose: TPC-H adapted for Data Vault 2.0 modeling pattern performance testing

  • Queries: 22 TPC-H queries adapted for Hub-Link-Satellite joins

  • Schema: 21 tables (7 Hubs, 6 Links, 8 Satellites) from TPC-H source

  • Scale: Same as TPC-H (0.01 to 100+)

  • Use Cases: Data Vault architecture evaluation, hub-link-satellite join performance, enterprise DWH patterns

Academic Benchmarks

Research benchmarks from academia

Star Schema Benchmark (SSB)

Star Schema Benchmark

  • Purpose: Star schema analytical query performance

  • Queries: 13 queries across 4 query flights

  • Schema: Classic star schema with fact and dimension tables

  • Scale: 1 to 1000+ scale factors supported

  • Use Cases: OLAP performance, dimensional modeling validation, columnar database testing

AMPLab

AMPLab Big Data Benchmark

  • Purpose: Big data processing and analytics performance

  • Queries: Web analytics and user behavior queries

  • Schema: Web analytics data model

  • Scale: Large-scale data processing focused

  • Use Cases: Big data performance, distributed computing, web analytics

Join Order Benchmark

Join Order Optimization Benchmark

  • Purpose: Query optimizer join order evaluation

  • Queries: 113 complex multi-table join queries

  • Schema: 21 tables modeling IMDB movie database

  • Scale: Complex join patterns with realistic cardinalities

  • Use Cases: Query optimizer testing, join algorithm evaluation, cardinality estimation
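
A minimal sketch of loading the suite for optimizer experiments, using the JoinOrder constructor shown in Common Patterns below (the get_query accessor is an assumption by analogy with the primitives API):

from benchbox import JoinOrder

# IMDB-based join workload; a small scale factor keeps setup quick
job = JoinOrder(scale_factor=0.1)

# get_query is an assumed accessor, named by analogy with ReadPrimitives below
for query_id in [1, 2, 3]:
    sql = job.get_query(query_id)
    print(sql)  # feed each statement to your database's EXPLAIN to inspect join orders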

Industry Benchmarks

Real-world benchmarks from practitioners

ClickBench

ClickHouse Analytical Benchmark

  • Purpose: High-performance analytical query testing

  • Queries: 43 analytical queries from real web analytics

  • Schema: Web analytics event data

  • Scale: High-volume analytical workloads

  • Use Cases: Analytical database performance, real-time analytics, columnar storage

H2ODB

H2O Database Benchmark

  • Purpose: Data science and machine learning workloads

  • Queries: Aggregation and groupby operations

  • Schema: Data science oriented tables

  • Scale: Memory and computation intensive

  • Use Cases: Data science performance, in-memory analytics, aggregation optimization

NYC Taxi

NYC Taxi OLAP Benchmark

  • Purpose: Real-world transportation data analytics

  • Queries: 25 OLAP queries across 9 categories

  • Schema: NYC TLC trip records with taxi zones dimension

  • Scale: From 300K to 3B trips

  • Use Cases: Geospatial analytics, temporal aggregation, multi-dimensional OLAP

TSBS DevOps

Time Series Benchmark Suite for DevOps

  • Purpose: Infrastructure monitoring and observability workloads

  • Queries: 18 DevOps monitoring queries across 10 categories

  • Schema: CPU, memory, disk, network metrics with host tags

  • Scale: From 10 hosts to thousands, configurable duration

  • Use Cases: Time-series database evaluation, monitoring dashboards, alerting queries

CoffeeShop

CoffeeShop Point-of-Sale Benchmark

  • Purpose: Real-world transactional and analytical workloads

  • Queries: Point-of-sale operations and reporting

  • Schema: Retail transaction data model

  • Scale: Configurable based on business volume

  • Use Cases: Hybrid workload testing, real-world performance validation

BenchBox Primitives

Fundamental database operation testing

Read Primitives

Database Read Primitives Benchmark

  • Purpose: Individual database operation testing

  • Queries: 109 primitive operation queries across 26 categories

  • Schema: TPC-H schema for familiar data patterns

  • Scale: Lightweight focused testing

  • Use Cases: Unit testing, regression detection, optimizer sniff tests, development validation

Write Primitives

Database Write Operations Benchmark (v2)

  • Purpose: Comprehensive non-transactional write operation testing

  • Queries: 109 write operations across 6 categories (INSERT, UPDATE, DELETE, BULK_LOAD, MERGE, DDL)

  • Schema: TPC-H schema with 16 category-based staging tables

  • Scale: Configurable via scale factors

  • Platform Support: Broad (ClickHouse, BigQuery, Snowflake, Databricks, Redshift, DuckDB, SQLite)

  • Use Cases: Write performance testing, MERGE/UPSERT optimization, CDC workloads, ETL validation
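
A sketch of pulling a single category of write operations, assuming a WritePrimitives class that mirrors the ReadPrimitives API shown in Common Patterns below (the class name and accessor are assumptions):

from benchbox import WritePrimitives  # class name assumed; mirrors ReadPrimitives

writes = WritePrimitives(scale_factor=0.01)

# The six documented categories are INSERT, UPDATE, DELETE, BULK_LOAD, MERGE, DDL;
# get_queries_by_category is assumed by analogy with the read primitives
merge_ops = writes.get_queries_by_category("MERGE")
for op in merge_ops:
    print(op)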

Transaction Primitives

Database Transaction Testing Benchmark

  • Purpose: ACID transaction guarantees and isolation level testing

  • Queries: 8 transaction operations (COMMIT, ROLLBACK, SAVEPOINT, isolation levels)

  • Schema: TPC-H schema with transaction staging tables

  • Scale: Configurable via scale factors

  • Platform Support: Designed for ACID databases (PostgreSQL, MySQL, SQL Server, Oracle - adapters planned; currently limited support via DuckDB/SQLite)

  • Use Cases: Transaction isolation testing, ACID compliance validation, concurrency testing

BenchBox Experimental

Experimental benchmarks for specialized testing

TPC-DS-OBT

TPC-DS One Big Table Benchmark

  • Purpose: Denormalized wide-table analytical workloads

  • Queries: TPC-DS queries adapted for single-table schema

  • Schema: Single denormalized “One Big Table” with all TPC-DS attributes

  • Scale: Same as TPC-DS (1.0 to 100+)

  • Use Cases: Wide-table analytics, columnar database testing, denormalization trade-off analysis

TPC-Havoc

TPC-H Syntax Variants Benchmark

  • Purpose: Query optimizer stress testing through syntax variations

  • Queries: 220 variants (22 TPC-H queries × 10 syntax variants each)

  • Schema: TPC-H schema with identical data

  • Scale: Same as TPC-H but focused on optimization differences

  • Use Cases: Optimizer robustness testing, syntax transformation analysis, performance variance evaluation
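
A sketch of the variance analysis this enables, assuming a TPCHavoc class that exposes the ten syntax variants per base query (all names here are assumptions; measure_query stands in for your own timing harness, as in the Regression Testing example below):

from statistics import mean

from benchbox import TPCHavoc  # class name assumed

havoc = TPCHavoc(scale_factor=0.1)

# get_variants(n) is assumed: the ten syntax rewrites of TPC-H query n
timings = [measure_query(sql) for sql in havoc.get_variants(1)]

# A max/min spread well above 1.0 suggests the optimizer is syntax-sensitive
print(f"Q1 mean={mean(timings):.3f}s, spread={max(timings) / min(timings):.2f}x")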

TPC-H Skew

TPC-H Data Skew Benchmark

  • Purpose: Test optimizer behavior on non-uniform data distributions

  • Queries: 22 TPC-H queries with configurable skew parameters

  • Schema: TPC-H schema with Zipfian, normal, or exponential data distributions

  • Scale: Same as TPC-H (0.01 to 10+)

  • Use Cases: Skew handling, cardinality estimation testing, join strategy evaluation under data imbalance
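
A sketch of selecting a distribution, assuming skew is configured through constructor parameters (the class name and the distribution/skew parameter names are illustrative assumptions; check the benchmark's documentation for the actual interface):

from benchbox import TPCHSkew  # class name assumed

# distribution and skew parameter names are illustrative assumptions
skewed = TPCHSkew(scale_factor=1.0, distribution="zipfian", skew=1.5)

# Same 22 queries as TPC-H, now against non-uniform data, which stresses
# cardinality estimates and join strategy selection; get_query assumed as elsewhere
sql = skewed.get_query(21)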

Benchmark Selection Guide

By Use Case

General Performance Testing

  • Primary: TPC-H - Industry standard, widely comparable

  • Secondary: SSB - Simpler star schema alternative

  • Advanced: TPC-DS - Complex modern analytical workloads

  • Wide Tables: TPC-DS-OBT - Denormalized single-table analytics

  • Enterprise DWH: Data Vault - Hub-Link-Satellite modeling patterns

Query Optimizer Testing

  • Primary: Join Order Benchmark - Realistic multi-table joins for join ordering and cardinality estimation

  • Stress: TPC-Havoc - Syntax-variant robustness testing

  • Skew: TPC-H Skew - Plan quality under non-uniform data distributions

Specific Operation Testing

  • Read Primitives - Individual read operations and optimizer sniff tests

  • Write Primitives - INSERT/UPDATE/DELETE/BULK_LOAD/MERGE/DDL operations

  • Transaction Primitives - ACID and isolation-level behavior

Development and CI/CD

  • Read Primitives at small scale factors (e.g., 0.001) for fast regression detection in pipelines

Research and Analysis

  • Academic suites - SSB for dimensional modeling, AMPLab for big data processing, Join Order Benchmark for optimizer research

By Database Type

OLTP Databases

  • Transaction Primitives for ACID compliance and isolation testing

  • CoffeeShop for transactional point-of-sale workloads

OLAP/Analytical Databases

  • TPC-DS for complex analytics

  • ClickBench for high-performance analytics

  • SSB for dimensional modeling

Distributed/Big Data Systems

  • AMPLab for big data workloads

  • TPC-DS for complex distributed queries

  • H2ODB for data science workloads

In-Memory Databases

  • H2ODB for memory- and computation-intensive aggregation workloads

By Performance Characteristics

CPU-Intensive Workloads

  • TPC-DS - Complex analytical queries with CTEs and window functions

I/O-Intensive Workloads

  • ClickBench - High-volume scans over web analytics event data

  • Write Primitives - Bulk loads and large write operations

Memory-Intensive Workloads

  • H2ODB - Memory-intensive aggregation and groupby operations

Common Patterns

Quick Performance Test

from benchbox import TPCH, ReadPrimitives

# Standard analytical performance
tpch = TPCH(scale_factor=0.1)
key_queries = [1, 3, 6, 12]  # Representative subset

# Focused operation testing
read_primitives = ReadPrimitives(scale_factor=0.01)
perf_critical = read_primitives.get_queries_by_category("aggregation")

Comprehensive Evaluation

from benchbox import TPCH, TPCDS, JoinOrder

# Multi-benchmark evaluation
benchmarks = {
    "tpch": TPCH(scale_factor=1.0),
    "tpcds": TPCDS(scale_factor=1.0),
    "joinorder": JoinOrder(scale_factor=0.1)
}

# Run the full suite; run_benchmark stands in for your own execution harness
results = {}
for name, benchmark in benchmarks.items():
    results[name] = run_benchmark(benchmark)

Regression Testing

from benchbox import ReadPrimitives

# Focused regression detection
read_primitives = ReadPrimitives(scale_factor=0.001)  # Fast execution
regression_queries = [
    "aggregation_basic",
    "join_inner_simple",
    "filter_selective",
    "sort_large_result"
]

# Compare against a stored baseline; measure_query and baseline_time stand in
# for your own timing harness and previously recorded reference numbers
for query_id in regression_queries:
    current_time = measure_query(read_primitives.get_query(query_id))
    assert current_time < baseline_time * 1.1  # Allow 10% variance

Resource Characteristics

Query Complexity by Benchmark

| Benchmark  | Simple Queries     | Complex Queries        | Total Queries |
|------------|--------------------|------------------------|---------------|
| TPC-H      | Q1, Q6, Q14        | Q2, Q9, Q21            | 22            |
| TPC-DS     | Q1, Q6, Q11        | Q14, Q67, Q78          | 99            |
| SSB        | Q1.1-Q1.3          | Q4.1-Q4.3              | 13            |
| Join Order | 2-3 table joins    | 6+ table joins         | 113           |
| Primitives | Single-table scans | Complex aggregations   | 50+           |
| ClickBench | Simple scans       | Multi-column analytics | 43            |

Note: Execution times vary significantly based on platform, hardware, and configuration. Run benchmarks to establish baselines for your environment.
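
A minimal sketch of establishing such a baseline, reusing the measure_query placeholder from the Regression Testing example above (a stand-in for your own timing harness; get_query is assumed as elsewhere on this page):

import json

from benchbox import TPCH

tpch = TPCH(scale_factor=0.1)

# Record per-query timings once on a known-good build, then compare later runs
baseline = {q: measure_query(tpch.get_query(q)) for q in [1, 3, 6, 12]}
with open("baseline.json", "w") as f:
    json.dump(baseline, f, indent=2)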

Memory Requirements (Scale Factor 1.0)

| Benchmark  | Data Size | Working Memory | Peak Memory |
|------------|-----------|----------------|-------------|
| TPC-H      | ~1GB      | ~2GB           | ~4GB        |
| TPC-DS     | ~3GB      | ~6GB           | ~12GB       |
| SSB        | ~600MB    | ~1GB           | ~2GB        |
| Join Order | ~17MB     | ~50MB          | ~100MB      |
| Primitives | ~1GB      | ~2GB           | ~4GB        |

Integration Examples

Each benchmark documentation includes:

  • Setup instructions for popular databases

  • Performance tuning recommendations

  • Common pitfalls and troubleshooting

  • Integration patterns for different use cases

  • Expected results and validation guidelines

Getting Help

Benchmark-Specific Issues

  • Check individual benchmark documentation for known issues

  • Review database-specific integration examples

  • Validate scale factor and configuration settings

General Support


Choose the right benchmark for your specific testing needs. For general performance evaluation, start with TPC-H. For focused testing, use Primitives. For optimizer research, explore Join Order Benchmark.