Results Commands

Tags: reference, cli, validation

Commands for exporting, viewing, and comparing benchmark results.

export - Export Results

Re-export existing benchmark results in different formats without re-running benchmarks. Useful for sharing results, generating reports, or converting to spreadsheet-friendly formats.

Options

  • RESULT_FILE: Path to result JSON file to export (optional argument)

  • --format [json|csv|html]: Export format(s); may be specified multiple times (default: json)

  • --output-dir TEXT: Output directory (default: benchmark_runs/results/)

  • --last: Export most recent result file

  • --benchmark TEXT: Filter by benchmark name when using --last

  • --platform TEXT: Filter by platform name when using --last

  • --force: Overwrite existing files without prompting

Supported Export Formats

JSON — Complete benchmark results in canonical schema format

  • Full metadata, metrics, and query results

  • Suitable for programmatic analysis and archival

  • Default format during benchmark runs

CSV — Flattened query results for spreadsheet analysis

  • Query-level details: execution times, status, rows returned

  • Compatible with Excel, Google Sheets, data analysis tools

  • Ideal for performance analysis and charting

HTML — Standalone report with formatted tables

  • Summary metrics and system information

  • Color-coded query results table

  • Ready to share with stakeholders
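
For analysis beyond a spreadsheet, the CSV export loads cleanly into pandas. The sketch below is illustrative only: the file path and the column names (query_id, execution_time_seconds, status) are assumptions, so check the header row of your exported file for the actual names.

import pandas as pd

# Load a CSV produced by `benchbox export --format csv`
# (file name is hypothetical; use the path printed by the export command)
df = pd.read_csv("benchmark_runs/results/tpch_sf1_duckdb.csv")

# Column names are assumptions; adjust to match the actual CSV header
ok = df[df["status"] == "success"]
print(ok["execution_time_seconds"].describe())

# Five slowest queries
print(ok.nlargest(5, "execution_time_seconds")[["query_id", "execution_time_seconds"]])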

Usage Examples

# Export most recent result to CSV
benchbox export --last --format csv

# Export specific result file to multiple formats
benchbox export results/tpch_sf1_duckdb.json --format csv --format html

# Export latest TPC-H result to all formats
benchbox export --last --benchmark tpc_h --format json --format csv --format html

# Export latest DuckDB result to HTML
benchbox export --last --platform duckdb --format html

# Export to custom directory
benchbox export --last --format csv --output-dir ./reports/

# Export with specific file and force overwrite
benchbox export benchmark_runs/results/tpcds_sf10.json --format html --force

Common Workflows

Share Results with Team:

# Export recent result as HTML report
benchbox export --last --format html --output-dir ./team_reports/
# Share the HTML file via email or documentation

Analyze in Spreadsheet:

# Export to CSV for Excel/Sheets analysis
benchbox export --last --format csv --output-dir ~/Downloads/
# Open CSV in Excel for charting and analysis

Archive Benchmarks:

# Export all formats for comprehensive archival
benchbox export --last --format json --format csv --format html --output-dir ./archive/

Notes

  • The export command loads existing result files from benchmark_runs/results/

  • Schema version 1.0 is required (all recent benchmarks use this version)

  • Export preserves all metrics and metadata from original results

  • Large result files (TPC-DS at scale 100+) may take a few seconds to process

  • The --force flag skips confirmation prompts when overwriting existing files

  • Use benchbox results to see available result files before exporting
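
If you prefer to script against the results directory directly, locating the newest result file takes only a few lines with pathlib. A minimal sketch, assuming the default benchmark_runs/results/ location, that mirrors what --last does:

from pathlib import Path

results_dir = Path("benchmark_runs/results")
# Sort result files by modification time; the last entry is the most recent run
result_files = sorted(results_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
if result_files:
    print(f"Most recent result: {result_files[-1]}")
else:
    print("No result files found")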


results - Show Benchmark Results

Display exported benchmark results and execution history.

Options

  • --limit INTEGER: Number of results to show (default: 10)

Usage Examples

# Show recent results
benchbox results

# Show more results
benchbox results --limit 25

compare - Compare Benchmark Results

Compare two or more benchmark result files to analyze performance changes. Displays side-by-side query timing comparisons, geometric means, and regression detection suitable for CI/CD workflows.

Basic Syntax

benchbox compare BASELINE.json CURRENT.json [OPTIONS]

Options

  • RESULT_FILES: Two result JSON files to compare; the first file is the baseline (required)

  • --fail-on-regression THRESHOLD: Exit with code 1 if any regression exceeds the threshold, given in either notation below (see the sketch after this list)

    • Percentage format: 10%, 5.5%

    • Decimal format: 0.1, 0.05

  • --format [text|json|html]: Output format (default: text)

  • --output FILE: Save comparison output to file instead of stdout

  • --show-all-queries: Show all query comparisons (default: only regressions/improvements)
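
The percentage and decimal notations describe the same fractional threshold: 10% and 0.1 are equivalent. An illustrative normalization, not the CLI's actual parsing code:

def parse_threshold(value: str) -> float:
    """Normalize '10%', '5.5%', '0.1', or '0.05' to a fraction."""
    value = value.strip()
    if value.endswith("%"):
        return float(value[:-1]) / 100.0
    return float(value)

print(parse_threshold("10%"))   # 0.1
print(parse_threshold("0.05"))  # 0.05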

Output Formats

Text (default) — Human-readable comparison report

  • Color-coded indicators for performance changes

  • Geometric mean calculation across all queries

  • Per-query breakdown sorted by severity

  • Suitable for terminal viewing and logs

JSON — Machine-readable comparison data

  • Full comparison metrics and query-level details

  • Suitable for programmatic analysis and dashboards

  • Includes performance_changes, query_comparisons, and summary sections

HTML — Standalone comparison report

  • Formatted tables with color-coded severity

  • Summary statistics and per-query breakdown

  • Ready to share with stakeholders or archive
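
The JSON comparison output is the easiest format to feed into scripts and dashboards. A minimal sketch of reading a saved comparison file: the top-level sections match the list above, but the nested field names (average_query_time, change_percent, improved) are taken from the Python API example at the end of this page and should be verified against your own output.

import json
from pathlib import Path

# Load a comparison saved with:
#   benchbox compare baseline.json current.json --format json --output comparison.json
comparison = json.loads(Path("comparison.json").read_text())

summary = comparison["summary"]
changes = comparison["performance_changes"]

# Field names assumed from the Python API example below
avg = changes["average_query_time"]
print(f"Average query time change: {avg['change_percent']:.2f}%")
print("Improved" if avg["improved"] else "Regressed or mixed")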

Usage Examples

Basic Comparison:

# Compare two result files
benchbox compare baseline.json current.json

# Compare with all queries shown (not just changes)
benchbox compare baseline.json current.json --show-all-queries

CI/CD Integration:

# Fail pipeline if any query regresses more than 10%
benchbox compare baseline.json current.json --fail-on-regression 10%

# Stricter threshold for critical paths
benchbox compare baseline.json current.json --fail-on-regression 5%

# Using decimal notation
benchbox compare baseline.json current.json --fail-on-regression 0.1

Export Comparison Reports:

# Export as JSON for dashboards
benchbox compare baseline.json current.json --format json --output comparison.json

# Generate HTML report for stakeholders
benchbox compare baseline.json current.json --format html --output report.html

# Save text report to file
benchbox compare baseline.json current.json --output comparison.txt

Comparison Output

The comparison report includes:

Summary Section:

  • Total queries compared

  • Count of improved, regressed, and unchanged queries

  • Overall assessment (improved/regressed/mixed)

Performance Metrics:

  • Average query time change

  • Total execution time change

  • Per-metric improvement indicators

Geometric Mean:

  • Baseline and current geometric means (standard benchmark metric)

  • Percentage change with severity indicator

Per-Query Breakdown:

  • Query ID, baseline time, current time, percentage change

  • Severity classification:

    • CRITICAL: >50% regression

    • MAJOR: >25% regression

    • MINOR: >10% regression

    • SLIGHT: >1% regression

    • FASTER: Any improvement
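
The geometric mean is used because it weights every query equally and is less distorted by a single slow outlier than the arithmetic mean. A small illustration of the calculation (not benchbox's internal code), using made-up timings:

import math

def geometric_mean(times):
    """Geometric mean of per-query execution times (seconds)."""
    return math.exp(sum(math.log(t) for t in times) / len(times))

baseline = [1.2, 0.8, 3.5, 0.4]   # hypothetical per-query timings
current = [1.1, 0.9, 3.0, 0.5]

gm_base = geometric_mean(baseline)
gm_curr = geometric_mean(current)
print(f"Geometric mean change: {(gm_curr - gm_base) / gm_base * 100:+.1f}%")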

Severity Indicators

Indicator   Meaning             Threshold
🟢          Improved (faster)   Any negative change
            Unchanged           <1% change
🟡          Minor regression    1-10% slower
🔴          Major regression    >10% slower

Common Workflows

Regression Testing in CI/CD:

# 1. Run baseline benchmark (e.g., main branch)
benchbox run --platform duckdb --benchmark tpch --scale 0.1 \
  --output ./baseline-results

# 2. Run current benchmark (e.g., feature branch)
benchbox run --platform duckdb --benchmark tpch --scale 0.1 \
  --output ./current-results

# 3. Compare and fail on regression
benchbox compare \
  baseline-results/results/*.json \
  current-results/results/*.json \
  --fail-on-regression 10%

Before/After Optimization Analysis:

# Run without tuning
benchbox run --platform snowflake --benchmark tpch --scale 1 \
  --tuning notuning --output ./baseline

# Run with tuning
benchbox run --platform snowflake --benchmark tpch --scale 1 \
  --tuning tuned --output ./optimized

# Compare results
benchbox compare \
  baseline/results/tpch_*.json \
  optimized/results/tpch_*.json \
  --format html --output tuning-analysis.html

Cross-Platform Comparison:

# Compare DuckDB vs ClickHouse performance
benchbox compare \
  duckdb-results/results/tpch_sf1.json \
  clickhouse-results/results/tpch_sf1.json \
  --show-all-queries

Exit Codes

Code   Meaning
0      Comparison completed successfully (no regression above threshold)
1      Regression detected above --fail-on-regression threshold, or error occurred
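
Because the regression check is reported through the exit code, compare is straightforward to wrap in scripts or CI steps. A minimal sketch using Python's subprocess module, assuming benchbox is on PATH and the two result files exist:

import subprocess
import sys

# Run the comparison; exit code 1 means a regression above the threshold (or an error)
result = subprocess.run(
    ["benchbox", "compare", "baseline.json", "current.json", "--fail-on-regression", "10%"]
)
if result.returncode != 0:
    sys.exit("Regression above threshold detected, or comparison failed")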

Notes

  • The first result file is always treated as the baseline

  • Both result files must use schema version 1.0

  • Results should be from the same benchmark and scale factor for meaningful comparison

  • Multi-file comparison (>2 files) is planned for a future release

  • Use benchbox results to find available result files for comparison

Python API

For programmatic comparison, use the ResultExporter class:

from pathlib import Path
from benchbox.core.results.exporter import ResultExporter

exporter = ResultExporter()

# Compare two result files
comparison = exporter.compare_results(
    Path("baseline.json"),
    Path("current.json")
)

# Check overall performance
perf = comparison['performance_changes']['average_query_time']
print(f"Average query time: {perf['change_percent']:.2f}% change")

if perf['improved']:
    print("Performance improved!")

# Export as HTML report
report_path = exporter.export_comparison_report(comparison)
print(f"Report saved to: {report_path}")

See Result Analysis API for complete API documentation.