Result Export Formats¶
BenchBox exports benchmark results in multiple formats for analysis, visualization, and integration with external tools.
Export Commands¶
Basic Export¶
# Run benchmark and export results
benchbox run --platform duckdb --benchmark tpch --scale 0.1
# Results are automatically saved to benchmark_runs/results/
ls benchmark_runs/results/
# tpch_duckdb_sf0.1_20251212_143021.json
Export to Other Formats¶
# Export most recent result to CSV
benchbox export --last --format csv
# Export to multiple formats
benchbox export --last --format csv --format html
# Export a specific result file
benchbox export benchmark_runs/results/tpch_duckdb_sf0.1_20251212_143021.json --format csv --format html
Custom Output Directory¶
# Local directory
benchbox run --platform duckdb --benchmark tpch --output ./my_results/
# Cloud storage
benchbox run --platform snowflake --benchmark tpch --output s3://bucket/results/
JSON Format (Schema v1.1)¶
The JSON export is the canonical format containing complete benchmark details.
Schema Structure¶
{
  "_schema": {
    "version": "1.1",
    "generator": "benchbox-exporter",
    "generated_at": "2025-12-12T14:30:21.123456Z"
  },
  "benchmark": {
    "name": "tpch",
    "scale_factor": 0.1,
    "version": "3.0.1"
  },
  "platform": {
    "name": "duckdb",
    "version": "0.10.0",
    "dialect": "duckdb"
  },
  "execution": {
    "timestamp": "2025-12-12T14:30:21.123456Z",
    "duration_ms": 45230,
    "status": "completed"
  },
  "phases": {
    "setup": { ... },
    "power_test": { ... },
    "throughput_test": { ... }
  },
  "metrics": {
    "geometric_mean_time": 1.234,
    "power_at_size": 89.5,
    "throughput_at_size": 156.2,
    "qph_at_size": 1250.5
  }
}
Field Reference¶
Schema Block¶
| Field | Type | Description |
|---|---|---|
| `version` | string | Schema version (currently "1.1") |
| `generator` | string | Tool that generated the export |
| `generated_at` | string | ISO 8601 timestamp |
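A loader can check this block before touching the rest of the document. A minimal sketch (field names taken from the example above; treating a missing `_schema` block as a pre-1.1 export is an assumption):

```python
import json

def check_schema(result: dict) -> str:
    """Return the schema version, refusing unknown major versions."""
    schema = result.get("_schema", {})
    version = schema.get("version", "1.0")  # assume pre-1.1 exports omit the block
    if version.split(".")[0] != "1":
        raise ValueError(f"Unsupported schema version: {version}")
    return version

doc = json.loads('{"_schema": {"version": "1.1", "generator": "benchbox-exporter"}}')
print(check_schema(doc))  # 1.1
```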
Benchmark Block¶
| Field | Type | Description |
|---|---|---|
| `name` | string | Benchmark identifier (tpch, tpcds, ssb, etc.) |
| `scale_factor` | float | Data scale factor |
| `version` | string | TPC specification version |
|  | string | Tuning mode (tuned, notuning, auto) |
Platform Block¶
| Field | Type | Description |
|---|---|---|
| `name` | string | Platform identifier |
| `version` | string | Platform/driver version |
| `dialect` | string | SQL dialect used |
|  | object | Anonymized connection details |
Execution Block¶
| Field | Type | Description |
|---|---|---|
| `timestamp` | string | Run start time (ISO 8601) |
| `duration_ms` | int | Total execution time in milliseconds |
| `status` | string | completed, failed, partial |
|  | array | List of executed phases |
Phases Block¶
Setup Phase:
{
  "setup": {
    "data_generation": {
      "duration_ms": 5230,
      "status": "completed",
      "tables_generated": 8,
      "total_rows_generated": 150000,
      "total_data_size_bytes": 45000000
    },
    "schema_creation": {
      "duration_ms": 120,
      "tables_created": 8,
      "constraints_applied": 16,
      "indexes_created": 8
    },
    "data_loading": {
      "duration_ms": 3500,
      "total_rows_loaded": 150000,
      "tables_loaded": 8,
      "per_table_stats": { ... }
    }
  }
}
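The per-phase stats lend themselves to derived figures; for example, load throughput can be computed from the `data_loading` entry. A minimal sketch with the values from the example above:

```python
# Derive load throughput (rows/sec) from the data_loading stats.
setup = {
    "data_loading": {"duration_ms": 3500, "total_rows_loaded": 150000},
}
loading = setup["data_loading"]
rows_per_sec = loading["total_rows_loaded"] / (loading["duration_ms"] / 1000)
print(f"{rows_per_sec:,.0f} rows/sec")
```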
Power Test Phase:
{
  "power_test": {
    "start_time": "2025-12-12T14:30:25.000Z",
    "end_time": "2025-12-12T14:31:42.000Z",
    "duration_ms": 77000,
    "geometric_mean_time": 1.234,
    "power_at_size": 89.5,
    "query_executions": [
      {
        "query_id": "Q1",
        "stream_id": "power",
        "execution_order": 1,
        "execution_time_ms": 1520,
        "status": "success",
        "rows_returned": 4,
        "row_count_validation": {
          "expected": 4,
          "actual": 4,
          "status": "valid"
        }
      }
    ]
  }
}
Throughput Test Phase:
{
  "throughput_test": {
    "start_time": "2025-12-12T14:31:45.000Z",
    "end_time": "2025-12-12T14:35:12.000Z",
    "duration_ms": 207000,
    "num_streams": 4,
    "total_queries_executed": 88,
    "throughput_at_size": 156.2,
    "streams": [
      {
        "stream_id": 1,
        "duration_ms": 51000,
        "query_executions": [ ... ]
      }
    ]
  }
}
Metrics Block¶
| Field | Type | Description |
|---|---|---|
| `geometric_mean_time` | float | Geometric mean of query times (seconds) |
| `power_at_size` | float | TPC Power metric |
| `throughput_at_size` | float | TPC Throughput metric |
| `qph_at_size` | float | Queries per hour at scale |
|  | float | Estimated run cost (cloud platforms) |
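For orientation, `geometric_mean_time` is simply the geometric mean of the per-query times. A minimal sketch with made-up timings (not values from a real run):

```python
import math

# Geometric mean of per-query times in seconds, as in the metrics block.
query_times_s = [1.52, 0.892, 1.23, 0.234]  # illustrative values
geo_mean = math.exp(sum(math.log(t) for t in query_times_s) / len(query_times_s))
print(f"geometric_mean_time: {geo_mean:.3f}s")
```

The geometric mean is preferred over the arithmetic mean here because it keeps one slow outlier query from dominating the metric.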
Query Execution Details¶
Each query execution record contains:
{
  "query_id": "Q6",
  "stream_id": "power",
  "execution_order": 6,
  "execution_time_ms": 234,
  "status": "success",
  "rows_returned": 1,
  "row_count_validation": {
    "expected": 1,
    "actual": 1,
    "status": "valid"
  },
  "cost": 0.0023,
  "query_plan": {
    "root_operation": "ProjectionExec",
    "total_operators": 12,
    "estimated_rows": 150000
  },
  "plan_fingerprint": "a1b2c3d4..."
}
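These records make auditing a run straightforward. A small sketch that flags any execution whose status or row-count validation is off (the sample records are illustrative, not from a real export):

```python
# Flag executions that failed or returned an unexpected row count.
executions = [
    {"query_id": "Q1", "status": "success",
     "row_count_validation": {"expected": 4, "actual": 4, "status": "valid"}},
    {"query_id": "Q6", "status": "success",
     "row_count_validation": {"expected": 1, "actual": 2, "status": "invalid"}},
]
problems = [
    q["query_id"]
    for q in executions
    if q["status"] != "success"
    or q.get("row_count_validation", {}).get("status") != "valid"
]
print(problems)  # ['Q6']
```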
CSV Format¶
CSV export provides tabular query-level data for spreadsheet analysis.
Query Results CSV¶
query_id,stream_id,execution_order,execution_time_ms,status,rows_returned,expected_rows,validation_status
Q1,power,1,1520,success,4,4,valid
Q2,power,2,892,success,460,460,valid
Q3,power,3,1230,success,10,10,valid
...
Summary CSV¶
metric,value
benchmark,tpch
scale_factor,0.1
platform,duckdb
geometric_mean_time,1.234
power_at_size,89.5
throughput_at_size,156.2
total_duration_ms,45230
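The two-column layout parses naturally into a dict. A minimal sketch using only the standard library (rows abridged from the example above):

```python
import csv
import io

summary_csv = """metric,value
benchmark,tpch
scale_factor,0.1
power_at_size,89.5
"""

# Each row becomes one metric/value pair; values arrive as strings.
summary = {row["metric"]: row["value"] for row in csv.DictReader(io.StringIO(summary_csv))}
print(summary["power_at_size"])  # 89.5
```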
HTML Format¶
HTML export generates a standalone report with formatted tables.
# Generate HTML report from most recent result
benchbox export --last --format html
# Export a specific result file to HTML
benchbox export benchmark_runs/results/tpch_duckdb_sf0.1_20251212_143021.json --format html
The HTML report includes:
Summary metrics card
Query results table with timing data
Phase duration breakdown
Validation status table
Platform and configuration details
Visualizing Results¶
Use benchbox visualize to generate ASCII charts from any result file:
# Auto-detect latest result and render all applicable charts
benchbox visualize
# Visualize a specific result file
benchbox visualize benchmark_runs/results/tpch_duckdb_sf0.1_20251212_143021.json
# Specific chart type
benchbox visualize benchmark_runs/results/*.json --chart-type performance_bar
# Save plain-text output to file
benchbox visualize benchmark_runs/results/*.json --no-color > charts.txt
See the Visualization Guide for chart types, templates, and customization options.
Loading Results in Python¶
Load JSON Results¶
import json
from pathlib import Path

# Load result file
result_file = Path("benchmark_runs/results/tpch_duckdb_sf0.1_20251212_143021.json")
with result_file.open() as f:
    results = json.load(f)

# Access metrics
print(f"Power at Size: {results['metrics']['power_at_size']}")
print(f"Geometric Mean: {results['metrics']['geometric_mean_time']}s")

# Access query details
for query in results['phases']['power_test']['query_executions']:
    print(f"{query['query_id']}: {query['execution_time_ms']}ms")
Load into Pandas¶
import json
from glob import glob

import pandas as pd

# Find the newest matching result file (open() does not expand globs)
result_file = sorted(glob("benchmark_runs/results/tpch_duckdb_sf0.1_*.json"))[-1]
with open(result_file) as f:
    results = json.load(f)

# Convert queries to DataFrame
queries = results['phases']['power_test']['query_executions']
df = pd.DataFrame(queries)

# Analyze
print(df.describe())
print(df.groupby('query_id')['execution_time_ms'].mean())
Load CSV Results¶
import pandas as pd
# Load query results
df = pd.read_csv("benchmark_runs/results/tpch_duckdb_sf0.1_queries.csv")
# Quick analysis
print(f"Total queries: {len(df)}")
print(f"Mean execution time: {df['execution_time_ms'].mean():.2f}ms")
print(f"Slowest query: {df.loc[df['execution_time_ms'].idxmax(), 'query_id']}")
Visualization Examples¶
CLI Visualization¶
# Render all applicable charts for a result file
benchbox visualize benchmark_runs/results/tpch_duckdb_sf0.1_*.json
# Compare multiple platforms
benchbox visualize duckdb_result.json sqlite_result.json --template head_to_head
# Per-query histogram (auto-splits for large benchmarks)
benchbox visualize tpcds_result.json --chart-type query_histogram
Python API Visualization¶
from benchbox.core.visualization import ResultPlotter
from benchbox.core.visualization.ascii import ASCIIBarChart
from benchbox.core.visualization.ascii.bar_chart import BarData
# Load results from JSON files
plotter = ResultPlotter.from_sources(["results/duckdb.json", "results/sqlite.json"])
# Render a bar chart
bar_data = [BarData(label=r.platform, value=r.total_time_ms or 0) for r in plotter.results]
chart = ASCIIBarChart(data=bar_data, title="Platform Comparison")
print(chart.render())
# Export to plain-text file
from benchbox.core.visualization.exporters import export_ascii
export_ascii(
    ascii_content=chart.render(),
    output_dir="./charts",
    base_name="platform_comparison",
    format="txt",
)
Schema Versioning¶
Current Version: 1.1¶
Version 1.1 includes:
Query plan capture
Cost estimation
Row count validation details
Phase-level error tracking
Version History¶
| Version | Changes |
|---|---|
| 1.1 | Added query plans, cost estimation, validation details |
| 1.0 | Initial canonical schema |
Loading Legacy Results¶
from benchbox.core.results.loader import load_result
# Automatically handles schema versions
result = load_result("old_result.json")
print(f"Schema version: {result.schema_version}")
Anonymization¶
Results are anonymized by default to remove sensitive information:
Connection strings → hashed
Hostnames → generalized
Usernames → removed
API keys/tokens → stripped
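Conceptually, the connection-string hashing works like this minimal sketch; this is an illustration of the idea, not BenchBox's actual implementation:

```python
import hashlib

def anonymize_connection(conn: str) -> str:
    """Replace a connection string with a stable, non-reversible digest."""
    return "sha256:" + hashlib.sha256(conn.encode()).hexdigest()[:16]

a = anonymize_connection("snowflake://alice:secret@acme-prod/db")
b = anonymize_connection("snowflake://alice:secret@acme-prod/db")
assert a == b  # stable across runs, so results from the same target stay comparable
print(a)
```

Because the digest is deterministic, two exports against the same connection can still be grouped together without revealing credentials or hostnames.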
Disable Anonymization¶
Anonymization can be adjusted or disabled programmatically via the Python API:
Anonymization Config¶
from benchbox.core.results.exporter import ResultExporter
from benchbox.core.results.anonymization import AnonymizationConfig
config = AnonymizationConfig(
    anonymize_hostname=True,
    anonymize_username=True,
    anonymize_connection_string=True,
    preserve_platform_version=True,
)
exporter = ResultExporter(anonymize=True, anonymization_config=config)
Integration Examples¶
Export to Data Warehouse¶
import json
import pandas as pd
# Load results
with open("results.json") as f:
    results = json.load(f)

# Flatten to table
queries = []
for q in results['phases']['power_test']['query_executions']:
    queries.append({
        'run_id': results['execution']['timestamp'],
        'benchmark': results['benchmark']['name'],
        'platform': results['platform']['name'],
        'scale_factor': results['benchmark']['scale_factor'],
        **q
    })
df = pd.DataFrame(queries)
# Upload to warehouse
# df.to_sql('benchmark_queries', engine, if_exists='append')
CI/CD Integration¶
# Run benchmark and check threshold
benchbox run --platform duckdb --benchmark tpch --scale 0.01 \
--output ./results/
# Parse results in CI script
python -c "
import glob
import json
import sys

# open() does not expand globs; resolve the newest matching file first
result_file = sorted(glob.glob('results/tpch_duckdb_sf0.01_*.json'))[-1]
with open(result_file) as f:
    results = json.load(f)

power = results['metrics']['power_at_size']
if power < 50:  # Performance threshold
    print(f'FAIL: Power@Size {power} below threshold 50')
    sys.exit(1)
print(f'PASS: Power@Size {power}')
"