Contributing Benchmark Results¶
Thank you for contributing to the BenchBox community results dataset! Community-submitted results help everyone compare platforms on real workloads and grow the benchmark coverage.
Prerequisites¶
Install BenchBox - follow the Getting Started guide
Run a benchmark - you need a complete benchmark result to submit
Step-by-Step Submission Flow¶
1. Run a benchmark¶
Run a complete benchmark suite (not a cherry-picked subset of queries):
benchbox run --platform duckdb --benchmark tpch --scale 0.01
The result JSON is written to benchmark_runs/results/.
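If you are not sure which file is the most recent, a quick check with standard shell tools (this is not a BenchBox command, just an illustration):
# Show the most recently written result file
ls -t benchmark_runs/results/*.json | head -n 1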
2. Package the submission¶
Use benchbox submit to create a submission package:
# Package the most recent result
benchbox submit --last --output ./submission
# Or specify a result file directly
benchbox submit benchmark_runs/results/tpch_sf001_duckdb_20260401_120000.json --output ./submission
# Preview what would be packaged (no files written)
benchbox submit --last --dry-run
This creates a submission/ directory containing:
| File | Description |
|---|---|
| bundle/ | The canonical schema-v2 result bundle |
| | Query execution plans (if captured) |
| | Tuning configuration (if used) |
| submission-manifest.json | Metadata: hash, benchmark, platform, contributor |
| | PR instructions (for reference) |
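Before moving on, it can help to confirm what was packaged. A minimal check with standard tools (jq is optional; any JSON viewer works):
# List every file that will go into the PR
find submission -type f
# Pretty-print the manifest to confirm the hash, benchmark, and platform were recorded
jq . submission/submission-manifest.json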
3. Fork and open a PR¶
Fork the BenchBox repository on GitHub (or use your existing fork)
Copy the contents of submission/bundle/ into results-data/bundles/ in your fork
Copy submission/submission-manifest.json alongside the bundle files
Regenerate the inventory before you commit:
uv run -- python scripts/generate_corpus_inventory.py --write
Commit and open a pull request against the published-results branch of joeharris76/BenchBox (the public repository)
Use this PR title format:
results: <benchmark> <platform> sf<scale>
Example: results: tpch DuckDB sf1.0
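If you use the GitHub CLI, one way to open the PR from your pushed fork branch (the gh flags are standard; the PR body text is up to you):
# Open a PR against the published-results branch of the public repository
gh pr create --repo joeharris76/BenchBox --base published-results --title "results: tpch DuckDB sf1.0" --body "Community benchmark submission"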
4. CI validation¶
When your PR is opened, the Validate Submission workflow runs automatically. It checks:
Schema compliance - the bundle is valid schema-v2 JSON with all required fields
Hash verification - the SHA-256 hash in the manifest matches the bundle contents (a local check sketch appears at the end of this step)
Sanity checks - no all-zero timings, no negative durations, valid platform/benchmark names
Metadata extraction - a summary comment is posted on the PR showing what the submission adds
If validation fails, the PR comment will explain what to fix. The workflow also
checks that results-data/corpus-inventory.json matches the submitted bundles.
If that check fails, rerun:
uv run -- python scripts/generate_corpus_inventory.py --write
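You can also reproduce the hash check yourself before pushing, using standard tools (the exact field name in the manifest may vary, so compare the values by eye):
# Compute the bundle's SHA-256 (use shasum -a 256 on macOS)
sha256sum results-data/bundles/<your-bundle>.json
# Compare it with the hash recorded in the manifest
cat results-data/bundles/submission-manifest.json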
5. Review and merge¶
A maintainer reviews the submission for quality and environment consistency. Once approved and merged, the docs CI workflow automatically rebuilds the results explorer with the new data.
What Makes a Good Submission¶
Complete benchmark suite - run the full query set, not a cherry-picked subset
Stable environment - run on a dedicated machine or instance, not a shared laptop under load
Default configuration - unless you’re specifically benchmarking a tuned configuration, use defaults
Reproducible - include enough environment metadata that someone else could replicate the run
Honest results - don’t hand-optimize queries or cherry-pick favorable runs
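For the stable-environment point, it is worth confirming the machine is idle before a long run. A quick Linux sanity check (equivalents exist on other platforms):
# Load average should be near zero on an idle machine
uptime
# Confirm there is enough free memory for the chosen scale factor
free -h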
Trust Labels¶
Results in the explorer carry trust labels:
| Label | Meaning |
|---|---|
| Maintainer Run | Generated by BenchBox CI or project maintainers |
| Community Submission | Contributed via PR from the community |
| CI | Generated by automated CI pipelines |
| Local | Local/development runs |
Community submissions are labeled “Community” in the explorer to distinguish them from maintainer-curated results.
Quality Expectations¶
Submissions that don’t meet these criteria may be sent back for revision:
Full query coverage - all queries in the benchmark must be included
No synthetic data - results must come from actual benchmark execution
Reasonable timings - query durations should be plausible for the platform and scale factor
Valid metadata - benchmark ID, platform name, and scale factor must match known values
Schema v2 format - only the current schema version is accepted
Running Validation Locally¶
You can validate your bundle before opening a PR:
# Validate a specific bundle
uv run -- python scripts/validate_submission.py path/to/result.json
# Validate all bundles in a directory
uv run -- python scripts/validate_submission.py results-data/bundles/
# Verify the inventory is current before you open the PR
uv run -- python scripts/generate_corpus_inventory.py --check
If you use pre-commit locally, install the hooks once so inventory drift is checked automatically:
pre-commit install
Questions?¶
Open an issue or start a discussion.
Maintainers: see Phase 2 Results Operations Runbook.