End-to-End (E2E) Testing Guide

Tags: contributor guide, testing, e2e

This guide covers BenchBox’s end-to-end test suite, which validates complete benchmark workflows through the CLI.

Overview

The E2E test suite (tests/e2e/) validates that BenchBox works correctly as an integrated system. Unlike unit tests that test individual components or integration tests that test component interactions, E2E tests exercise the full CLI workflow from command invocation through result generation.

Test Count: 125+ test functions across 6 test modules

Test Structure

tests/e2e/
├── conftest.py                 # Fixtures and helpers
├── test_cli_options.py         # CLI option validation (27 tests)
├── test_error_handling.py      # Error handling coverage (25 tests)
├── test_result_validation.py   # Result schema validation (28 tests)
├── test_local_platforms.py     # Local platform tests (14 tests)
├── test_cloud_platforms.py     # Cloud platform dry-run tests (15 tests)
├── test_dataframe_platforms.py # DataFrame platform tests (16 tests)
└── utils/
    ├── platform_detection.py   # Platform availability detection
    └── result_validators.py    # Result JSON validation
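
The helpers in utils/ are shared by all six test modules. As a rough illustration, platform availability detection can be as simple as probing for the platform's Python driver; the sketch below is an assumption about the approach, not the actual contents of platform_detection.py:

import importlib.util

def platform_available(module_name: str) -> bool:
    """Return True if the Python driver for a platform can be imported."""
    # find_spec returns None when the module is not installed
    return importlib.util.find_spec(module_name) is not None

# Example: skip DuckDB-only tests when the duckdb package is missing
DUCKDB_AVAILABLE = platform_available("duckdb")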

Quick Start

Run All E2E Tests

# All E2E tests
uv run -- python -m pytest tests/e2e/ -v

# With parallel execution
uv run -- python -m pytest tests/e2e/ -n auto

Run Quick E2E Tests (Dry-Run Mode)

# Quick tests using dry-run mode (no actual benchmark execution)
make test-e2e-quick
# or
uv run -- python -m pytest -m e2e_quick

Run Local Platform Tests

# Full execution against local databases
uv run -- python -m pytest -m e2e_local

# Specific platform
uv run -- python -m pytest tests/e2e/test_local_platforms.py -v

Test Categories

CLI Options Tests (test_cli_options.py)

Validates all CLI options work correctly:

Option          Tests
--benchmark     TPC-H, TPC-DS, SSB, ClickBench, H2ODB selection
--scale         Scale factor validation and constraints
--phases        Phase selection (generate, load, power, throughput)
--queries       Query subset selection (Q1,Q6,Q17 format)
--tuning        Tuning modes (tuned, notuning, auto)
--compression   Compression formats (zstd, gzip, none)
--validation    Validation levels (exact, loose, range, disabled)
--seed          Deterministic seeding
--dry-run       Dry-run mode output

# Run CLI option tests
uv run -- python -m pytest tests/e2e/test_cli_options.py -v

# Test specific option
uv run -- python -m pytest tests/e2e/test_cli_options.py -k "scale" -v
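
For example, a test for the --queries option can be built from the conftest.py helpers described later in this guide; the "queries" and "output" config keys and the exact assertion below are assumptions, not copied from the suite:

from tests.e2e.conftest import run_benchmark

def test_query_subset_selection(duckdb_config, results_dir):
    """Selecting a query subset with --queries should exit cleanly."""
    # Hypothetical config keys mapping onto --queries and --output
    config = {**duckdb_config, "queries": "Q1,Q6,Q17", "output": str(results_dir)}

    result = run_benchmark(config)

    assert result.returncode == 0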

Error Handling Tests (test_error_handling.py)

Validates proper error messages and exit codes:

  • Missing required parameters
  • Invalid platform/benchmark names
  • Invalid scale factors (TPC-DS requires scale ≥ 1)
  • Query format violations (max 100, alphanumeric IDs)
  • Phase validation errors
  • Compression format errors
  • Constraint violations

# Run error handling tests
uv run -- python -m pytest tests/e2e/test_error_handling.py -v
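
As a sketch, a query-format violation test follows the same pattern as the examples in "Writing E2E Tests" below; the config key and the error-message wording checked here are assumptions:

from tests.e2e.conftest import run_benchmark

def test_malformed_query_id_rejected(duckdb_config):
    """Query IDs must be alphanumeric, so 'Q1;DROP' should be rejected."""
    config = {**duckdb_config, "queries": "Q1;DROP"}

    result = run_benchmark(config)

    assert result.returncode != 0
    # The error message should point at the offending option
    assert "quer" in result.stderr.lower()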

Result Validation Tests (test_result_validation.py)

Validates benchmark output files:

  • JSON schema compliance
  • Required fields present (platform, benchmark, scale_factor)
  • Metrics validation (successful_queries, execution_times)
  • Result file discovery and parsing
  • Manifest file validation

# Run result validation tests
uv run -- python -m pytest tests/e2e/test_result_validation.py -v
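
In the spirit of utils/result_validators.py, a required-field check can be written in a few lines; the function name and signature below are illustrative, not the module's actual API:

import json
from pathlib import Path

REQUIRED_FIELDS = {"platform", "benchmark", "scale_factor"}

def missing_required_fields(path: Path) -> list[str]:
    """Return the required top-level fields absent from one result JSON file."""
    data = json.loads(path.read_text())
    return sorted(REQUIRED_FIELDS - set(data))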

Local Platform Tests (test_local_platforms.py)

Full benchmark execution against local databases:

Platform     Tests
DuckDB       Full TPC-H execution, query subsets
SQLite       Full TPC-H execution
DataFusion   Full TPC-H execution

# Run local platform tests
uv run -- python -m pytest tests/e2e/test_local_platforms.py -v

# Run DuckDB tests only
uv run -- python -m pytest tests/e2e/test_local_platforms.py -k duckdb -v

Note: Local platform tests execute actual benchmarks and may take several minutes.

Cloud Platform Tests (test_cloud_platforms.py)

Dry-run tests for cloud platforms (no credentials required):

Platform     Tests
Snowflake    Dry-run validation, query generation
BigQuery     Dry-run validation, query generation
Redshift     Dry-run validation, query generation
Athena       Dry-run validation, query generation
Databricks   Dry-run validation, query generation

# Run cloud platform dry-run tests
uv run -- python -m pytest tests/e2e/test_cloud_platforms.py -v
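
Such a test can lean on the snowflake_dry_run_config fixture shown in the Fixtures section; the assertion below assumes that dry-run mode writes generated queries into dry_run_dir instead of contacting Snowflake:

from tests.e2e.conftest import run_benchmark

def test_snowflake_dry_run(snowflake_dry_run_config, dry_run_dir):
    """Dry-run should succeed without credentials."""
    result = run_benchmark(snowflake_dry_run_config)

    assert result.returncode == 0
    # Assumed behaviour: the dry-run output directory is populated
    assert any(dry_run_dir.iterdir())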

DataFrame Platform Tests (test_dataframe_platforms.py)

Tests for DataFrame-based platforms:

Platform   Tests
Polars     Full TPC-H execution
Pandas     Full TPC-H execution
Dask       Full TPC-H execution

# Run DataFrame platform tests
uv run -- python -m pytest tests/e2e/test_dataframe_platforms.py -v
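
A parametrized sketch that covers all three platforms (the lowercase platform names and the e2e_dataframe marker usage are assumptions about the suite's conventions):

import pytest
from tests.e2e.conftest import run_benchmark

@pytest.mark.e2e_dataframe
@pytest.mark.parametrize("platform", ["polars", "pandas", "dask"])
def test_dataframe_platform_tpch(platform, results_dir):
    """Run a tiny TPC-H workload on each DataFrame platform."""
    config = {"platform": platform, "benchmark": "tpch", "scale": "0.01",
              "output": str(results_dir)}

    assert run_benchmark(config).returncode == 0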

Test Markers

E2E tests use several pytest markers for selective execution:

Marker          Description
e2e             All E2E tests
e2e_quick       Quick dry-run tests
e2e_local       Local platform tests (full execution)
e2e_cloud       Cloud platform dry-run tests
e2e_dataframe   DataFrame platform tests

# Examples
uv run -- python -m pytest -m e2e_quick          # Quick tests only
uv run -- python -m pytest -m e2e_local          # Local platforms
uv run -- python -m pytest -m "e2e and not slow" # Fast E2E tests
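
Within a test module, markers are typically applied once at module level rather than per test; the snippet below shows one common pattern, not a copy of the suite's modules:

import pytest

# Every test in this module counts as an E2E cloud dry-run test
pytestmark = [pytest.mark.e2e, pytest.mark.e2e_cloud]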

Fixtures

The E2E test suite provides fixtures in conftest.py:

Directory Fixtures

@pytest.fixture
def results_dir(tmp_path):
    """Temporary directory for benchmark results."""

@pytest.fixture
def dry_run_dir(tmp_path):
    """Temporary directory for dry-run output."""

Platform Configuration Fixtures

@pytest.fixture
def duckdb_config():
    """Default DuckDB configuration."""
    return {"platform": "duckdb", "benchmark": "tpch", "scale": "0.01"}

@pytest.fixture
def snowflake_dry_run_config(dry_run_dir):
    """Snowflake dry-run configuration."""
    return {"platform": "snowflake", "benchmark": "tpch", "dry_run": str(dry_run_dir)}

Helper Functions

def build_cli_args(config, extra_args=None):
    """Build CLI arguments from config dictionary."""

def run_benchmark(config, extra_args=None, env=None, timeout=600):
    """Run a benchmark with configuration."""

def find_result_files(directory, pattern="*.json"):
    """Find result files in directory."""

Writing E2E Tests

Basic Test Pattern

import pytest
from tests.e2e.conftest import build_cli_args, run_benchmark

def test_basic_benchmark(duckdb_config, results_dir):
    """Test basic benchmark execution."""
    config = {**duckdb_config, "output": str(results_dir)}

    result = run_benchmark(config)

    assert result.returncode == 0
    assert "Benchmark completed" in result.stdout

Testing Error Conditions

from tests.e2e.conftest import run_benchmark

def test_invalid_scale_factor():
    """Test that invalid scale factors produce clear errors."""
    config = {"platform": "duckdb", "benchmark": "tpcds", "scale": "0.1"}

    result = run_benchmark(config)

    assert result.returncode != 0
    assert "scale factor" in result.stderr.lower()

Testing Result Files

import json
from tests.e2e.conftest import find_result_files, run_benchmark

def test_result_file_schema(duckdb_config, results_dir):
    """Test that result files match expected schema."""
    config = {**duckdb_config, "output": str(results_dir)}
    run_benchmark(config)

    result_files = find_result_files(results_dir / "results")
    assert len(result_files) > 0

    with open(result_files[0]) as f:
        data = json.load(f)

    assert "platform" in data
    assert "benchmark" in data
    assert "successful_queries" in data

CI/CD Integration

GitHub Actions

jobs:
  e2e-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Install dependencies
        run: uv sync --extra dev

      - name: Run E2E quick tests
        run: uv run -- python -m pytest -m e2e_quick -v

      - name: Run E2E local tests
        run: uv run -- python -m pytest -m e2e_local -v
        timeout-minutes: 15

Makefile Targets

test-e2e-quick:
	uv run -- python -m pytest -m e2e_quick -v

test-e2e-local:
	uv run -- python -m pytest -m e2e_local -v

test-e2e-all:
	uv run -- python -m pytest tests/e2e/ -v

Troubleshooting

Tests Timing Out

E2E tests have a 10-minute timeout by default. For slower systems:

# Increase timeout
uv run -- python -m pytest tests/e2e/ --timeout=900
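
The --timeout flag suggests the pytest-timeout plugin is installed; if so, an individual slow test can also carry its own limit via the plugin's marker:

import pytest
from tests.e2e.conftest import run_benchmark

@pytest.mark.timeout(900)  # per-test limit in seconds (pytest-timeout)
def test_full_tpch_on_slow_hardware(duckdb_config, results_dir):
    config = {**duckdb_config, "output": str(results_dir)}
    assert run_benchmark(config).returncode == 0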

ClickHouse Tests Without Dependencies

The test suite includes automatic stub generation for ClickHouse dependencies:

@pytest.fixture
def clickhouse_stub_dir(tmp_path):
    """Create minimal chDB and clickhouse_driver stubs."""
    # Stubs are created automatically for dry-run tests
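
One way to wire such stubs up is to write placeholder modules into a temporary directory and expose it to the benchmark subprocess via PYTHONPATH; the sketch below is illustrative only, not the suite's actual stub code:

import os

def make_clickhouse_stub_env(tmp_path):
    """Create importable placeholder packages for chdb and clickhouse_driver."""
    for name in ("chdb", "clickhouse_driver"):
        pkg = tmp_path / name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("# stub module for dry-run tests\n")
    # Prepend the stub directory so the subprocess imports the stubs first
    existing = os.environ.get("PYTHONPATH", "")
    return dict(os.environ, PYTHONPATH=f"{tmp_path}{os.pathsep}{existing}")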

Debugging Test Failures

# Run with verbose output
uv run -- python -m pytest tests/e2e/test_cli_options.py -v -s

# Run single test with debugging
uv run -- python -m pytest tests/e2e/test_cli_options.py::test_benchmark_selection -v -s --tb=long

Result File Inspection

# Find generated result files
find /tmp -name "*.json" -path "*/benchmark_results/*" 2>/dev/null

# Check that a result file is well-formed JSON
python -c "import json; json.load(open('result.json'))"