Contributing Benchmark Results

Thank you for contributing to the BenchBox community results dataset! Community-submitted results help everyone compare platforms on real workloads and broaden benchmark coverage.

Prerequisites

  1. Install BenchBox - follow the Getting Started guide

  2. Run a benchmark - you need a complete benchmark result to submit

Step-by-Step Submission Flow

1. Run a benchmark

Run a complete benchmark suite (not a cherry-picked subset of queries):

benchbox run --platform duckdb --benchmark tpch --scale 0.01

The result JSON is written to benchmark_runs/results/.
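If you're not sure which file was just written, listing the results directory by modification time is one quick way to find it (plain shell, not a BenchBox command):

# Show the most recently written result files first
ls -lt benchmark_runs/results/ | head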

2. Package the submission

Use benchbox submit to create a submission package:

# Package the most recent result
benchbox submit --last --output ./submission

# Or specify a result file directly
benchbox submit benchmark_runs/results/tpch_sf001_duckdb_20260401_120000.json --output ./submission

# Preview what would be packaged (no files written)
benchbox submit --last --dry-run

This creates a submission/ directory containing:

  • bundle/<result>.json - The canonical schema-v2 result bundle

  • bundle/<result>.plans.json - Query execution plans (if captured)

  • bundle/<result>.tuning.json - Tuning configuration (if used)

  • submission-manifest.json - Metadata: hash, benchmark, platform, contributor

  • CONTRIBUTING.md - PR instructions (for reference)
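For the TPC-H run above, the packaged directory might look roughly like this (filenames are illustrative and follow the result file's name; the plans and tuning files only appear when captured):

# Inspect the package before opening a PR
ls -R submission/
# submission:
# CONTRIBUTING.md  bundle  submission-manifest.json
#
# submission/bundle:
# tpch_sf001_duckdb_20260401_120000.json
# tpch_sf001_duckdb_20260401_120000.plans.json
# tpch_sf001_duckdb_20260401_120000.tuning.json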

3. Fork and open a PR

  1. Fork the BenchBox repository on GitHub (or use your existing fork)

  2. Copy the contents of submission/bundle/ into results-data/bundles/ in your fork

  3. Copy submission/submission-manifest.json alongside the bundle files

  4. Regenerate the inventory before you commit:

    uv run -- python scripts/generate_corpus_inventory.py --write
    
  5. Commit and open a pull request against the published-results branch of joeharris76/BenchBox (the public repository); a sketch of one possible command sequence follows this list
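One possible command sequence for steps 2-5 (up to the push), assuming your fork is already cloned and the submission package sits in ./submission (the fork path, branch name, and commit message are illustrative):

# Copy the bundle files and the manifest into your fork
cp submission/bundle/* path/to/your-fork/results-data/bundles/
cp submission/submission-manifest.json path/to/your-fork/results-data/bundles/

# Regenerate the inventory, then commit on a new branch and push
cd path/to/your-fork
uv run -- python scripts/generate_corpus_inventory.py --write
git checkout -b results-tpch-duckdb-sf001
git add results-data/
git commit -m "results: tpch duckdb sf0.01"
git push origin results-tpch-duckdb-sf001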

Use this PR title format:

results: <benchmark> <platform> sf<scale>

Example: results: tpch DuckDB sf1.0
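If you use the GitHub CLI, a command along these lines can open the PR against the public repository; this is optional, and the title and body shown here are illustrative:

gh pr create --repo joeharris76/BenchBox --base published-results \
  --title "results: tpch duckdb sf0.01" \
  --body "Community submission: TPC-H on DuckDB at scale factor 0.01"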

4. CI validation

When your PR is opened, the Validate Submission workflow runs automatically. It checks:

  • Schema compliance - the bundle is valid schema-v2 JSON with all required fields

  • Hash verification - the SHA-256 hash in the manifest matches the bundle contents

  • Sanity checks - no all-zero timings, no negative durations, valid platform/benchmark names

  • Metadata extraction - a summary comment is posted on the PR showing what the submission adds

If validation fails, the PR comment will explain what to fix. The workflow also checks that results-data/corpus-inventory.json matches the submitted bundles. If that check fails, rerun:

uv run -- python scripts/generate_corpus_inventory.py --write
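Then commit the regenerated inventory to the same PR branch so the check can re-run (standard git commands):

git add results-data/corpus-inventory.json
git commit -m "Regenerate corpus inventory"
git push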

5. Review and merge

A maintainer reviews the submission for quality and environment consistency. Once approved and merged, the docs CI workflow automatically rebuilds the results explorer with the new data.

What Makes a Good Submission

  • Complete benchmark suite - run the full query set, not a cherry-picked subset

  • Stable environment - run on a dedicated machine or instance, not a shared laptop under load

  • Default configuration - unless you’re specifically benchmarking a tuned configuration, use defaults

  • Reproducible - include enough environment metadata that someone else could replicate the run

  • Honest results - don’t hand-optimize queries or cherry-pick favorable runs

Trust Labels

Results in the explorer carry trust labels:

  • Maintainer Run - Generated by BenchBox CI or project maintainers

  • Community Submission - Contributed via PR from the community

  • CI - Generated by automated CI pipelines

  • Local - Local/development runs

Community submissions are labeled “Community” in the explorer to distinguish them from maintainer-curated results.

Quality Expectations

Submissions that don’t meet these criteria may be sent back for revisions:

  1. Full query coverage - all queries in the benchmark must be included

  2. No synthetic data - results must come from actual benchmark execution

  3. Reasonable timings - query durations should be plausible for the platform and scale factor

  4. Valid metadata - benchmark ID, platform name, and scale factor must match known values

  5. Schema v2 format - only the current schema version is accepted

Running Validation Locally

You can validate your bundle before opening a PR:

# Validate a specific bundle
uv run -- python scripts/validate_submission.py path/to/result.json

# Validate all bundles in a directory
uv run -- python scripts/validate_submission.py results-data/bundles/

# Verify the inventory is current before you open the PR
uv run -- python scripts/generate_corpus_inventory.py --check

If you use pre-commit locally, install the hooks once so inventory drift is checked automatically:

pre-commit install
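You can also trigger the hooks on demand instead of waiting for a commit (a standard pre-commit command):

# Run every configured hook against the full working tree
pre-commit run --all-files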

Questions?

Open an issue or start a discussion.

Maintainers: see Phase 2 Results Operations Runbook.