# End-to-End (E2E) Testing Guide
This guide covers BenchBox’s end-to-end test suite, which validates complete benchmark workflows through the CLI interface.
## Overview
The E2E test suite (`tests/e2e/`) validates that BenchBox works correctly as an integrated system. Unlike unit tests, which cover individual components, and integration tests, which cover component interactions, E2E tests exercise the full CLI workflow from command invocation through result generation.
**Test count:** 125+ test functions across 6 test modules
## Test Structure

```text
tests/e2e/
├── conftest.py                   # Fixtures and helpers
├── test_cli_options.py           # CLI option validation (27 tests)
├── test_error_handling.py        # Error handling coverage (25 tests)
├── test_result_validation.py     # Result schema validation (28 tests)
├── test_local_platforms.py       # Local platform tests (14 tests)
├── test_cloud_platforms.py       # Cloud platform dry-run tests (15 tests)
├── test_dataframe_platforms.py   # DataFrame platform tests (16 tests)
└── utils/
    ├── platform_detection.py     # Platform availability detection
    └── result_validators.py      # Result JSON validation
```
## Quick Start

### Run All E2E Tests

```bash
# All E2E tests
uv run -- python -m pytest tests/e2e/ -v

# With parallel execution
uv run -- python -m pytest tests/e2e/ -n auto
```
### Run Quick E2E Tests (Dry-Run Mode)

```bash
# Quick tests using dry-run mode (no actual benchmark execution)
make test-e2e-quick

# or
uv run -- python -m pytest -m e2e_quick
```
### Run Local Platform Tests

```bash
# Full execution against local databases
uv run -- python -m pytest -m e2e_local

# Specific platform
uv run -- python -m pytest tests/e2e/test_local_platforms.py -v
```
## Test Categories
### CLI Options Tests (`test_cli_options.py`)

Validates that all CLI options work correctly, covering:

- TPC-H, TPC-DS, SSB, ClickBench, and H2ODB benchmark selection
- Scale factor validation and constraints
- Phase selection (generate, load, power, throughput)
- Query subset selection (Q1,Q6,Q17 format)
- Tuning modes (tuned, notuning, auto)
- Compression formats (zstd, gzip, none)
- Validation levels (exact, loose, range, disabled)
- Deterministic seeding
- Dry-run mode output
```bash
# Run CLI option tests
uv run -- python -m pytest tests/e2e/test_cli_options.py -v

# Test specific option
uv run -- python -m pytest tests/e2e/test_cli_options.py -k "scale" -v
```
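For orientation, here is a sketch of how one of these option checks can be written. It leans on the `run_benchmark` helper and `dry_run_dir` fixture described in the Fixtures section below; the test name, marker usage, and parametrized values are illustrative rather than copied from `test_cli_options.py`.

```python
# Illustrative sketch, not copied from test_cli_options.py: parametrize one
# option (scale factor) and validate it in dry-run mode via run_benchmark.
import pytest

from tests.e2e.conftest import run_benchmark


@pytest.mark.e2e_quick
@pytest.mark.parametrize("scale", ["0.01", "0.1", "1"])
def test_scale_factor_accepted(dry_run_dir, scale):
    config = {
        "platform": "duckdb",
        "benchmark": "tpch",
        "scale": scale,
        "dry_run": str(dry_run_dir),
    }
    result = run_benchmark(config)
    assert result.returncode == 0
```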
### Error Handling Tests (`test_error_handling.py`)

Validates proper error messages and exit codes:

- Missing required parameters
- Invalid platform/benchmark names
- Invalid scale factors (TPC-DS requires scale ≥ 1)
- Query format violations (max 100, alphanumeric IDs)
- Phase validation errors
- Compression format errors
- Constraint violations
```bash
# Run error handling tests
uv run -- python -m pytest tests/e2e/test_error_handling.py -v
```
### Result Validation Tests (`test_result_validation.py`)

Validates benchmark output files:

- JSON schema compliance
- Required fields present (platform, benchmark, scale_factor)
- Metrics validation (successful_queries, execution_times)
- Result file discovery and parsing
- Manifest file validation
```bash
# Run result validation tests
uv run -- python -m pytest tests/e2e/test_result_validation.py -v
```
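These checks are centralized in `tests/e2e/utils/result_validators.py`. As a rough sketch of the idea (the function name and exact field list here are assumptions based on the checks listed above, not the module's actual API):

```python
# Sketch of a result validator in the spirit of tests/e2e/utils/result_validators.py.
# The function name and field list are assumptions based on the checks listed above.
import json
from pathlib import Path

REQUIRED_FIELDS = ("platform", "benchmark", "scale_factor",
                   "successful_queries", "execution_times")


def missing_result_fields(path):
    """Return the required fields that are absent from a result JSON file."""
    data = json.loads(Path(path).read_text())
    return [field for field in REQUIRED_FIELDS if field not in data]
```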
### Local Platform Tests (`test_local_platforms.py`)

Full benchmark execution against local databases:

| Platform   | Tests                               |
|------------|-------------------------------------|
| DuckDB     | Full TPC-H execution, query subsets |
| SQLite     | Full TPC-H execution                |
| DataFusion | Full TPC-H execution                |
```bash
# Run local platform tests
uv run -- python -m pytest tests/e2e/test_local_platforms.py -v

# Run DuckDB tests only
uv run -- python -m pytest tests/e2e/test_local_platforms.py -k duckdb -v
```
**Note:** Local platform tests execute actual benchmarks and may take several minutes.
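These tests are only meaningful when the corresponding database driver is installed. One common way to guard them is sketched below using pytest's `importorskip`; the suite's own `tests/e2e/utils/platform_detection.py` may use a different mechanism.

```python
# Sketch: skip cleanly when the DuckDB driver is not installed. The real suite
# centralizes this kind of check in tests/e2e/utils/platform_detection.py.
import pytest

from tests.e2e.conftest import run_benchmark


def test_duckdb_full_run(duckdb_config, results_dir):
    pytest.importorskip("duckdb")  # skip this test if the duckdb package is missing
    config = {**duckdb_config, "output": str(results_dir)}
    result = run_benchmark(config)
    assert result.returncode == 0
```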
### Cloud Platform Tests (`test_cloud_platforms.py`)

Dry-run tests for cloud platforms (no credentials required):

| Platform   | Tests                                |
|------------|--------------------------------------|
| Snowflake  | Dry-run validation, query generation |
| BigQuery   | Dry-run validation, query generation |
| Redshift   | Dry-run validation, query generation |
| Athena     | Dry-run validation, query generation |
| Databricks | Dry-run validation, query generation |
```bash
# Run cloud platform dry-run tests
uv run -- python -m pytest tests/e2e/test_cloud_platforms.py -v
```
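A typical cloud dry-run test reuses the `snowflake_dry_run_config` fixture shown in the Fixtures section. The sketch below is illustrative; in particular, the assertion that files appear under `dry_run_dir` is an assumption about what dry-run mode writes.

```python
# Sketch of a cloud dry-run test; no Snowflake credentials are needed.
# The check on dry_run_dir contents is an assumption about dry-run output.
from tests.e2e.conftest import run_benchmark


def test_snowflake_dry_run(snowflake_dry_run_config, dry_run_dir):
    result = run_benchmark(snowflake_dry_run_config)
    assert result.returncode == 0
    assert any(dry_run_dir.iterdir())  # dry-run should emit generated artifacts
```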
### DataFrame Platform Tests (`test_dataframe_platforms.py`)

Tests for DataFrame-based platforms:

| Platform | Tests                |
|----------|----------------------|
| Polars   | Full TPC-H execution |
| Pandas   | Full TPC-H execution |
| Dask     | Full TPC-H execution |
```bash
# Run DataFrame platform tests
uv run -- python -m pytest tests/e2e/test_dataframe_platforms.py -v
```
## Test Markers

E2E tests use several pytest markers for selective execution:

| Marker      | Description                           |
|-------------|---------------------------------------|
| `e2e`       | All E2E tests                         |
| `e2e_quick` | Quick dry-run tests                   |
| `e2e_local` | Local platform tests (full execution) |

The cloud platform dry-run tests and DataFrame platform tests carry their own markers; those suites can also be selected by running their test modules directly, as shown in the sections above.
```bash
# Examples
uv run -- python -m pytest -m e2e_quick           # Quick tests only
uv run -- python -m pytest -m e2e_local           # Local platforms
uv run -- python -m pytest -m "e2e and not slow"  # Fast E2E tests
```
## Fixtures

The E2E test suite provides fixtures in `conftest.py`:
### Directory Fixtures

```python
@pytest.fixture
def results_dir(tmp_path):
    """Temporary directory for benchmark results."""


@pytest.fixture
def dry_run_dir(tmp_path):
    """Temporary directory for dry-run output."""
```
### Platform Configuration Fixtures

```python
@pytest.fixture
def duckdb_config():
    """Default DuckDB configuration."""
    return {"platform": "duckdb", "benchmark": "tpch", "scale": "0.01"}


@pytest.fixture
def snowflake_dry_run_config(dry_run_dir):
    """Snowflake dry-run configuration."""
    return {"platform": "snowflake", "benchmark": "tpch", "dry_run": str(dry_run_dir)}
```
### Helper Functions

```python
def build_cli_args(config, extra_args=None):
    """Build CLI arguments from config dictionary."""


def run_benchmark(config, extra_args=None, env=None, timeout=600):
    """Run a benchmark with configuration."""


def find_result_files(directory, pattern="*.json"):
    """Find result files in directory."""
```
## Writing E2E Tests

### Basic Test Pattern
```python
import pytest

from tests.e2e.conftest import build_cli_args, run_benchmark


def test_basic_benchmark(duckdb_config, results_dir):
    """Test basic benchmark execution."""
    config = {**duckdb_config, "output": str(results_dir)}
    result = run_benchmark(config)
    assert result.returncode == 0
    assert "Benchmark completed" in result.stdout
```
### Testing Error Conditions

```python
from tests.e2e.conftest import run_benchmark


def test_invalid_scale_factor():
    """Test that invalid scale factors produce clear errors."""
    config = {"platform": "duckdb", "benchmark": "tpcds", "scale": "0.1"}
    result = run_benchmark(config)
    assert result.returncode != 0
    assert "scale factor" in result.stderr.lower()
```
### Testing Result Files

```python
import json

from tests.e2e.conftest import find_result_files, run_benchmark


def test_result_file_schema(duckdb_config, results_dir):
    """Test that result files match expected schema."""
    config = {**duckdb_config, "output": str(results_dir)}
    run_benchmark(config)

    result_files = find_result_files(results_dir / "results")
    assert len(result_files) > 0

    with open(result_files[0]) as f:
        data = json.load(f)

    assert "platform" in data
    assert "benchmark" in data
    assert "successful_queries" in data
```
## CI/CD Integration

### GitHub Actions

```yaml
jobs:
  e2e-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -e ".[dev]"
      - name: Run E2E quick tests
        run: python -m pytest -m e2e_quick -v
      - name: Run E2E local tests
        run: python -m pytest -m e2e_local -v
        timeout-minutes: 15
```

Because this workflow installs dependencies with `pip` rather than `uv`, the test steps invoke pytest directly via `python -m pytest`.
### Makefile Targets

```makefile
test-e2e-quick:
	uv run -- python -m pytest -m e2e_quick -v

test-e2e-local:
	uv run -- python -m pytest -m e2e_local -v

test-e2e-all:
	uv run -- python -m pytest tests/e2e/ -v
```
## Troubleshooting

### Tests Timing Out

E2E tests have a 10-minute timeout by default. For slower systems:

```bash
# Increase timeout
uv run -- python -m pytest tests/e2e/ --timeout=900
```
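The `--timeout` flag comes from the pytest-timeout plugin. Assuming that plugin is installed, a single long-running test can also be given its own budget; the test below is illustrative.

```python
# Illustrative only: assumes the pytest-timeout plugin and the documented
# run_benchmark helper; adjust the budget to match your hardware.
import pytest

from tests.e2e.conftest import run_benchmark


@pytest.mark.timeout(1200)  # allow 20 minutes instead of the default 10
def test_full_tpch_on_slow_hardware(duckdb_config, results_dir):
    config = {**duckdb_config, "output": str(results_dir)}
    assert run_benchmark(config, timeout=1200).returncode == 0
```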
### ClickHouse Tests Without Dependencies

The test suite includes automatic stub generation for ClickHouse dependencies:

```python
@pytest.fixture
def clickhouse_stub_dir(tmp_path):
    """Create minimal chDB and clickhouse_driver stubs."""
    # Stubs are created automatically for dry-run tests
```
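The fixture above shows only the interface. One way such a stub can be built is to write a minimal fake package into a temporary directory and prepend it to `sys.path`, as sketched below; the real fixture may differ, and the chDB stub is omitted for brevity.

```python
# Sketch of stub generation: create a fake clickhouse_driver package so that
# imports succeed during dry-run tests. The real fixture may differ in detail.
import pytest


@pytest.fixture
def clickhouse_stub_dir(tmp_path, monkeypatch):
    stub = tmp_path / "clickhouse_driver"
    stub.mkdir()
    (stub / "__init__.py").write_text("class Client:  # minimal stand-in\n    pass\n")
    monkeypatch.syspath_prepend(str(tmp_path))  # make the stub importable
    return tmp_path
```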
### Debugging Test Failures

```bash
# Run with verbose output
uv run -- python -m pytest tests/e2e/test_cli_options.py -v -s

# Run single test with debugging
uv run -- python -m pytest tests/e2e/test_cli_options.py::test_benchmark_selection -v -s --tb=long
```
### Result File Inspection

```bash
# Find generated result files
find /tmp -name "*.json" -path "*/benchmark_results/*" 2>/dev/null

# Check that a result file parses as valid JSON
python -c "import json; json.load(open('result.json'))"
```