Results Commands¶
Commands for exporting, viewing, and comparing benchmark results.
export - Export Results¶
Re-export existing benchmark results in different formats without re-running benchmarks. Useful for sharing results, generating reports, or converting to spreadsheet-friendly formats.
Options¶
RESULT_FILE: Path to result JSON file to export (optional argument)
--format [json|csv|html]: Export format(s); can be specified multiple times (default: json)
--output-dir TEXT: Output directory (default: benchmark_runs/results/)
--last: Export the most recent result file
--benchmark TEXT: Filter by benchmark name when using --last
--platform TEXT: Filter by platform name when using --last
--force: Overwrite existing files without prompting
Supported Export Formats¶
JSON — Complete benchmark results in canonical schema format
Full metadata, metrics, and query results
Suitable for programmatic analysis and archival
Default format during benchmark runs
CSV — Flattened query results for spreadsheet analysis
Query-level details: execution times, status, rows returned
Compatible with Excel, Google Sheets, data analysis tools
Ideal for performance analysis and charting (see the loading sketch after this list)
HTML — Standalone report with formatted tables
Summary metrics and system information
Color-coded query results table
Ready to share with stakeholders
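The JSON and CSV exports are easy to inspect programmatically. The sketch below loads one of each; the file paths are placeholders taken from the examples below, and no specific field or column names are assumed since they come from the canonical schema.
import json
from pathlib import Path

import pandas as pd  # one convenient option for working with the CSV export

# Placeholder paths; point these at files produced by `benchbox export`
json_path = Path("benchmark_runs/results/tpch_sf1_duckdb.json")
csv_path = Path("benchmark_runs/results/tpch_sf1_duckdb.csv")

# JSON export: full metadata, metrics, and query results
result = json.loads(json_path.read_text())
print("Top-level sections:", sorted(result.keys()))

# CSV export: flattened query-level rows, ready for charting or pivoting
df = pd.read_csv(csv_path)
print("Columns:", list(df.columns))
print(df.head())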
Usage Examples¶
# Export most recent result to CSV
benchbox export --last --format csv
# Export specific result file to multiple formats
benchbox export results/tpch_sf1_duckdb.json --format csv --format html
# Export latest TPC-H result to all formats
benchbox export --last --benchmark tpc_h --format json --format csv --format html
# Export latest DuckDB result to HTML
benchbox export --last --platform duckdb --format html
# Export to custom directory
benchbox export --last --format csv --output-dir ./reports/
# Export with specific file and force overwrite
benchbox export benchmark_runs/results/tpcds_sf10.json --format html --force
Common Workflows¶
Share Results with Team:
# Export recent result as HTML report
benchbox export --last --format html --output-dir ./team_reports/
# Share the HTML file via email or documentation
Analyze in Spreadsheet:
# Export to CSV for Excel/Sheets analysis
benchbox export --last --format csv --output-dir ~/Downloads/
# Open CSV in Excel for charting and analysis
Archive Benchmarks:
# Export all formats for comprehensive archival
benchbox export --last --format json --format csv --format html --output-dir ./archive/
Notes¶
The export command loads existing result files from benchmark_runs/results/
Schema version 1.0 is required (all recent benchmarks use this version)
Export preserves all metrics and metadata from original results
Large result files (TPC-DS at scale 100+) may take a few seconds to process
The --force flag skips confirmation prompts when overwriting existing files
Use benchbox results to see available result files before exporting
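For reference, the selection behind --last can be approximated programmatically. This is a sketch only, assuming results live as *.json files under benchmark_runs/results/ and that "most recent" means newest modification time; the command itself may select differently (for example, via result metadata).
from pathlib import Path

results_dir = Path("benchmark_runs/results")
# Gather result files and sort by modification time (assumption: this mirrors
# what --last selects; --benchmark/--platform filtering is omitted here)
candidates = sorted(results_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
if candidates:
    print(f"Most recent result: {candidates[-1]}")
else:
    print("No result files found")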
results - Show Benchmark Results¶
Display exported benchmark results and execution history.
Options¶
--limit INTEGER: Number of results to show (default: 10)
Usage Examples¶
# Show recent results
benchbox results
# Show more results
benchbox results --limit 25
compare - Compare Benchmark Results¶
Compare two or more benchmark result files to analyze performance changes. Displays side-by-side query timings and geometric means, and detects regressions, making it suitable for CI/CD workflows.
Basic Syntax¶
benchbox compare BASELINE.json CURRENT.json [OPTIONS]
Options¶
RESULT_FILES: Two or more result JSON files to compare (required; the first file is the baseline)
--fail-on-regression THRESHOLD: Exit with code 1 if any regression exceeds the threshold (a sketch of the two notations follows this list)
Percentage format: 10%, 5.5%
Decimal format: 0.1, 0.05
--format [text|json|html]: Output format (default: text)
--output FILE: Save comparison output to a file instead of stdout
--show-all-queries: Show all query comparisons (default: only regressions/improvements)
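The two threshold notations describe the same fraction: 10% and 0.1 both mean a 10% slowdown. The sketch below illustrates that equivalence with a hypothetical parse_threshold helper; it is not the command's own parsing code.
# Illustrative only: normalize either notation to a fraction
def parse_threshold(value: str) -> float:
    value = value.strip()
    if value.endswith("%"):
        return float(value[:-1]) / 100.0  # "10%" -> 0.10, "5.5%" -> 0.055
    return float(value)                   # "0.1" -> 0.10, "0.05" -> 0.05

# A query trips the threshold when its slowdown exceeds that fraction
baseline_s, current_s = 2.0, 2.3
change = (current_s - baseline_s) / baseline_s   # 0.15, i.e. 15% slower
print(change > parse_threshold("10%"))           # True: 15% exceeds 10%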
Output Formats¶
Text (default) — Human-readable comparison report
Color-coded indicators for performance changes
Geometric mean calculation across all queries
Per-query breakdown sorted by severity
Suitable for terminal viewing and logs
JSON — Machine-readable comparison data
Full comparison metrics and query-level details
Suitable for programmatic analysis and dashboards
Includes performance_changes, query_comparisons, and summary sections (a parsing sketch follows this list)
HTML — Standalone comparison report
Formatted tables with color-coded severity
Summary statistics and per-query breakdown
Ready to share with stakeholders or archive
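As noted in the JSON format item above, the --format json output is meant for programmatic use. The sketch below assumes a file written with --output comparison.json and that its structure mirrors the dictionary returned by ResultExporter.compare_results (shown in the Python API section at the end of this page); treat the nested keys as an assumption.
import json
from pathlib import Path

# Produced with: benchbox compare baseline.json current.json --format json --output comparison.json
comparison = json.loads(Path("comparison.json").read_text())

print("Sections:", sorted(comparison.keys()))  # expect performance_changes, query_comparisons, summary

# Assumed layout, matching the Python API example later on this page
perf = comparison["performance_changes"]["average_query_time"]
print(f"Average query time changed by {perf['change_percent']:+.2f}%")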
Usage Examples¶
Basic Comparison:
# Compare two result files
benchbox compare baseline.json current.json
# Compare with all queries shown (not just changes)
benchbox compare baseline.json current.json --show-all-queries
CI/CD Integration:
# Fail pipeline if any query regresses more than 10%
benchbox compare baseline.json current.json --fail-on-regression 10%
# Stricter threshold for critical paths
benchbox compare baseline.json current.json --fail-on-regression 5%
# Using decimal notation
benchbox compare baseline.json current.json --fail-on-regression 0.1
Export Comparison Reports:
# Export as JSON for dashboards
benchbox compare baseline.json current.json --format json --output comparison.json
# Generate HTML report for stakeholders
benchbox compare baseline.json current.json --format html --output report.html
# Save text report to file
benchbox compare baseline.json current.json --output comparison.txt
Comparison Output¶
The comparison report includes:
Summary Section:
Total queries compared
Count of improved, regressed, and unchanged queries
Overall assessment (improved/regressed/mixed)
Performance Metrics:
Average query time change
Total execution time change
Per-metric improvement indicators
Geometric Mean:
Baseline and current geometric means (standard benchmark metric)
Percentage change with severity indicator
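The geometric mean is the n-th root of the product of the per-query times, which makes it less sensitive to a single outlier query than a plain average. A minimal sketch of the calculation (illustrative values, not benchbox's internal code):
import math

baseline_times = [1.2, 0.8, 2.5, 0.4]  # example per-query seconds
current_times = [1.1, 0.9, 2.0, 0.5]

def geometric_mean(times):
    # n-th root of the product, computed via logs for numerical stability
    return math.exp(sum(math.log(t) for t in times) / len(times))

base_gm = geometric_mean(baseline_times)
curr_gm = geometric_mean(current_times)
print(f"Geometric mean: {base_gm:.3f}s -> {curr_gm:.3f}s "
      f"({(curr_gm - base_gm) / base_gm * 100:+.1f}%)")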
Per-Query Breakdown:
Query ID, baseline time, current time, percentage change
Severity classification:
CRITICAL: >50% regression
MAJOR: >25% regression
MINOR: >10% regression
SLIGHT: >1% regression
FASTER: Any improvement
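The classification above is a simple threshold ladder. The sketch below expresses it as a hypothetical classify_change helper (positive values mean slower); exact boundary handling is an assumption.
def classify_change(change_percent: float) -> str:
    # Positive change_percent = regression (slower), negative = improvement
    if change_percent < 0:
        return "FASTER"     # any improvement
    if change_percent > 50:
        return "CRITICAL"
    if change_percent > 25:
        return "MAJOR"
    if change_percent > 10:
        return "MINOR"
    if change_percent > 1:
        return "SLIGHT"
    return "UNCHANGED"      # within roughly 1%, treated as noise

print(classify_change(-3.0))   # FASTER
print(classify_change(12.5))   # MINOR
print(classify_change(60.0))   # CRITICAL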
Severity Indicators¶
| Meaning | Threshold |
|---|---|
| Improved (faster) | Any negative change |
| Unchanged | <1% change |
| Minor regression | 1-10% slower |
| Major regression | >10% slower |
Common Workflows¶
Regression Testing in CI/CD:
# 1. Run baseline benchmark (e.g., main branch)
benchbox run --platform duckdb --benchmark tpch --scale 0.1 \
--output ./baseline-results
# 2. Run current benchmark (e.g., feature branch)
benchbox run --platform duckdb --benchmark tpch --scale 0.1 \
--output ./current-results
# 3. Compare and fail on regression
benchbox compare \
baseline-results/results/*.json \
current-results/results/*.json \
--fail-on-regression 10%
Before/After Optimization Analysis:
# Run without tuning
benchbox run --platform snowflake --benchmark tpch --scale 1 \
--tuning notuning --output ./baseline
# Run with tuning
benchbox run --platform snowflake --benchmark tpch --scale 1 \
--tuning tuned --output ./optimized
# Compare results
benchbox compare \
baseline/results/tpch_*.json \
optimized/results/tpch_*.json \
--format html --output tuning-analysis.html
Cross-Platform Comparison:
# Compare DuckDB vs ClickHouse performance
benchbox compare \
duckdb-results/results/tpch_sf1.json \
clickhouse-results/results/tpch_sf1.json \
--show-all-queries
Exit Codes¶
| Code | Meaning |
|---|---|
| 0 | Comparison completed successfully (no regression above threshold) |
| 1 | Regression detected above the --fail-on-regression threshold |
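If you wrap the comparison in a script rather than letting a CI shell step fail directly on the non-zero exit, the return code can be checked as in this sketch (file names are placeholders):
import subprocess
import sys

# A regression beyond the threshold makes `benchbox compare` exit with code 1
proc = subprocess.run(
    ["benchbox", "compare", "baseline.json", "current.json",
     "--fail-on-regression", "10%"]
)
if proc.returncode == 0:
    print("No regression above threshold")
else:
    print("Regression detected; propagating the failure")
    sys.exit(proc.returncode)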
Notes¶
The first result file is always treated as the baseline
Both result files must use schema version 1.0
Results should be from the same benchmark and scale factor for meaningful comparison
Multi-file comparison (>2 files) is planned for a future release
Use benchbox results to find available result files for comparison
Python API¶
For programmatic comparison, use the ResultExporter class:
from pathlib import Path
from benchbox.core.results.exporter import ResultExporter
exporter = ResultExporter()
# Compare two result files
comparison = exporter.compare_results(
    Path("baseline.json"),
    Path("current.json")
)
# Check overall performance
perf = comparison['performance_changes']['average_query_time']
print(f"Average query time: {perf['change_percent']:.2f}% change")
if perf['improved']:
    print("Performance improved!")
# Export as HTML report
report_path = exporter.export_comparison_report(comparison)
print(f"Report saved to: {report_path}")
See Result Analysis API for complete API documentation.