Result Export Formats

BenchBox exports benchmark results in multiple formats for analysis, visualization, and integration with external tools.

Export Commands

Basic Export

# Run benchmark and export results
benchbox run --platform duckdb --benchmark tpch --scale 0.1

# Results are automatically saved to benchmark_runs/results/
ls benchmark_runs/results/
# tpch_duckdb_sf0.1_20251212_143021.json

Export to Other Formats

# Export most recent result to CSV
benchbox export --last --format csv

# Export to multiple formats
benchbox export --last --format csv --format html

# Export a specific result file
benchbox export benchmark_runs/results/tpch_duckdb_sf0.1_20251212_143021.json --format csv --format html

Custom Output Directory

# Local directory
benchbox run --platform duckdb --benchmark tpch --output ./my_results/

# Cloud storage
benchbox run --platform snowflake --benchmark tpch --output s3://bucket/results/

JSON Format (Schema v1.1)

The JSON export is the canonical format containing complete benchmark details.

Schema Structure

{
  "_schema": {
    "version": "1.1",
    "generator": "benchbox-exporter",
    "generated_at": "2025-12-12T14:30:21.123456Z"
  },
  "benchmark": {
    "name": "tpch",
    "scale_factor": 0.1,
    "version": "3.0.1"
  },
  "platform": {
    "name": "duckdb",
    "version": "0.10.0",
    "dialect": "duckdb"
  },
  "execution": {
    "timestamp": "2025-12-12T14:30:21.123456Z",
    "duration_ms": 45230,
    "status": "completed"
  },
  "phases": {
    "setup": { ... },
    "power_test": { ... },
    "throughput_test": { ... }
  },
  "metrics": {
    "geometric_mean_time": 1.234,
    "power_at_size": 89.5,
    "throughput_at_size": 156.2,
    "qph_at_size": 1250.5
  }
}

Field Reference

Schema Block

| Field | Type | Description |
| --- | --- | --- |
| version | string | Schema version (currently "1.1") |
| generator | string | Tool that generated the export |
| generated_at | string | ISO 8601 timestamp |
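When consuming exports programmatically, it can be worth validating the _schema block before reading the rest of the document. A minimal sketch (the set of accepted versions here is an assumption, not a BenchBox API):

```python
SUPPORTED_VERSIONS = {"1.0", "1.1"}

def check_schema(results: dict) -> str:
    """Return the export's schema version, rejecting versions we don't know."""
    version = results.get("_schema", {}).get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema version: {version}")
    return version
```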

Benchmark Block

| Field | Type | Description |
| --- | --- | --- |
| name | string | Benchmark identifier (tpch, tpcds, ssb, etc.) |
| scale_factor | float | Data scale factor |
| version | string | TPC specification version |
| tuning | string | Tuning mode (tuned, notuning, auto) |

Platform Block

| Field | Type | Description |
| --- | --- | --- |
| name | string | Platform identifier |
| version | string | Platform/driver version |
| dialect | string | SQL dialect used |
| connection_info | object | Anonymized connection details |

Execution Block

| Field | Type | Description |
| --- | --- | --- |
| timestamp | string | Run start time (ISO 8601) |
| duration_ms | int | Total execution time (milliseconds) |
| status | string | One of completed, failed, partial |
| phases_run | array | List of executed phases |

Phases Block

Setup Phase:

{
  "setup": {
    "data_generation": {
      "duration_ms": 5230,
      "status": "completed",
      "tables_generated": 8,
      "total_rows_generated": 150000,
      "total_data_size_bytes": 45000000
    },
    "schema_creation": {
      "duration_ms": 120,
      "tables_created": 8,
      "constraints_applied": 16,
      "indexes_created": 8
    },
    "data_loading": {
      "duration_ms": 3500,
      "total_rows_loaded": 150000,
      "tables_loaded": 8,
      "per_table_stats": { ... }
    }
  }
}

Power Test Phase:

{
  "power_test": {
    "start_time": "2025-12-12T14:30:25.000Z",
    "end_time": "2025-12-12T14:31:42.000Z",
    "duration_ms": 77000,
    "geometric_mean_time": 1.234,
    "power_at_size": 89.5,
    "query_executions": [
      {
        "query_id": "Q1",
        "stream_id": "power",
        "execution_order": 1,
        "execution_time_ms": 1520,
        "status": "success",
        "rows_returned": 4,
        "row_count_validation": {
          "expected": 4,
          "actual": 4,
          "status": "valid"
        }
      }
    ]
  }
}
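If you need to cross-check geometric_mean_time, it can be recomputed from the per-query records; a sketch assuming the metric is the geometric mean of successful execution times in seconds:

```python
import math

def geometric_mean_seconds(query_executions):
    """Geometric mean of successful query times, converted from ms to seconds."""
    times = [q["execution_time_ms"] / 1000.0
             for q in query_executions if q["status"] == "success"]
    return math.exp(sum(math.log(t) for t in times) / len(times))
```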

Throughput Test Phase:

{
  "throughput_test": {
    "start_time": "2025-12-12T14:31:45.000Z",
    "end_time": "2025-12-12T14:35:12.000Z",
    "duration_ms": 207000,
    "num_streams": 4,
    "total_queries_executed": 88,
    "throughput_at_size": 156.2,
    "streams": [
      {
        "stream_id": 1,
        "duration_ms": 51000,
        "query_executions": [ ... ]
      }
    ]
  }
}
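For reference, the TPC-H specification defines Throughput@Size as (S × 22 × 3600 × SF) / Ts, where S is the number of streams and Ts is the measurement interval in seconds. Whether throughput_at_size follows this formula exactly is an assumption, but the standard calculation can be sketched as:

```python
def tpch_throughput_at_size(num_streams, interval_seconds, scale_factor,
                            queries_per_stream=22):
    """TPC-H Throughput@Size: (S * 22 * 3600 * SF) / Ts."""
    return num_streams * queries_per_stream * 3600.0 * scale_factor / interval_seconds
```

With the example above (4 streams over 207 s at SF 0.1), this gives roughly 153, close to the reported 156.2; exact agreement depends on how the measurement interval is defined.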

Metrics Block

| Field | Type | Description |
| --- | --- | --- |
| geometric_mean_time | float | Geometric mean of query times (seconds) |
| power_at_size | float | TPC Power metric |
| throughput_at_size | float | TPC Throughput metric |
| qph_at_size | float | Queries per hour at scale |
| cost_estimate_usd | float | Estimated run cost (cloud platforms) |
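TPC-H combines the two headline metrics into a composite, QphH@Size, defined as their geometric mean. BenchBox's qph_at_size may use its own definition, but the standard composite can be sketched as:

```python
import math

def qphh_at_size(power_at_size, throughput_at_size):
    """TPC-H composite QphH@Size: geometric mean of Power@Size and Throughput@Size."""
    return math.sqrt(power_at_size * throughput_at_size)
```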

Query Execution Details

Each query execution record contains:

{
  "query_id": "Q6",
  "stream_id": "power",
  "execution_order": 6,
  "execution_time_ms": 234,
  "status": "success",
  "rows_returned": 1,
  "row_count_validation": {
    "expected": 1,
    "actual": 1,
    "status": "valid"
  },
  "cost": 0.0023,
  "query_plan": {
    "root_operation": "ProjectionExec",
    "total_operators": 12,
    "estimated_rows": 150000
  },
  "plan_fingerprint": "a1b2c3d4..."
}
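Because plan_fingerprint is stable for a given plan, it can be used to flag plan changes between two runs; a sketch (field names as documented above):

```python
def changed_plans(old_queries, new_queries):
    """Return query_ids whose plan_fingerprint differs between two runs."""
    old = {q["query_id"]: q.get("plan_fingerprint") for q in old_queries}
    return [q["query_id"] for q in new_queries
            if q["query_id"] in old and q.get("plan_fingerprint") != old[q["query_id"]]]
```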

CSV Format

CSV export provides tabular query-level data for spreadsheet analysis.

Query Results CSV

query_id,stream_id,execution_order,execution_time_ms,status,rows_returned,expected_rows,validation_status
Q1,power,1,1520,success,4,4,valid
Q2,power,2,892,success,460,460,valid
Q3,power,3,1230,success,10,10,valid
...

Summary CSV

metric,value
benchmark,tpch
scale_factor,0.1
platform,duckdb
geometric_mean_time,1.234
power_at_size,89.5
throughput_at_size,156.2
total_duration_ms,45230
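The two-column layout makes the summary trivial to load as a dictionary; a sketch using only the standard library:

```python
import csv

def load_summary(lines):
    """Parse the two-column summary CSV (metric,value) into a dict of strings."""
    return {row["metric"]: row["value"] for row in csv.DictReader(lines)}

# e.g. with open("..._summary.csv", newline="") as f: summary = load_summary(f)
```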

HTML Format

HTML export generates a standalone report with formatted tables.

# Generate HTML report from most recent result
benchbox export --last --format html

# Export a specific result file to HTML
benchbox export benchmark_runs/results/tpch_duckdb_sf0.1_20251212_143021.json --format html

The HTML report includes:

  • Summary metrics card

  • Query results table with timing data

  • Phase duration breakdown

  • Validation status table

  • Platform and configuration details

Visualizing Results

Use benchbox visualize to generate ASCII charts from any result file:

# Auto-detect latest result and render all applicable charts
benchbox visualize

# Visualize a specific result file
benchbox visualize benchmark_runs/results/tpch_duckdb_sf0.1_20251212_143021.json

# Specific chart type
benchbox visualize benchmark_runs/results/*.json --chart-type performance_bar

# Save plain-text output to file
benchbox visualize benchmark_runs/results/*.json --no-color > charts.txt

See the Visualization Guide for chart types, templates, and customization options.

Loading Results in Python

Load JSON Results

import json
from pathlib import Path

# Load result file
result_file = Path("benchmark_runs/results/tpch_duckdb_sf0.1_20251212_143021.json")
with result_file.open() as f:
    results = json.load(f)

# Access metrics
print(f"Power at Size: {results['metrics']['power_at_size']}")
print(f"Geometric Mean: {results['metrics']['geometric_mean_time']}s")

# Access query details
for query in results['phases']['power_test']['query_executions']:
    print(f"{query['query_id']}: {query['execution_time_ms']}ms")

Load into Pandas

import json
from pathlib import Path

import pandas as pd

# Load the most recent matching result (open() does not expand wildcards)
result_files = sorted(Path("benchmark_runs/results").glob("tpch_duckdb_sf0.1_*.json"))
with result_files[-1].open() as f:
    results = json.load(f)

# Convert queries to DataFrame
queries = results['phases']['power_test']['query_executions']
df = pd.DataFrame(queries)

# Analyze
print(df.describe())
print(df.groupby('query_id')['execution_time_ms'].mean())

Load CSV Results

import pandas as pd

# Load query results
df = pd.read_csv("benchmark_runs/results/tpch_duckdb_sf0.1_queries.csv")

# Quick analysis
print(f"Total queries: {len(df)}")
print(f"Mean execution time: {df['execution_time_ms'].mean():.2f}ms")
print(f"Slowest query: {df.loc[df['execution_time_ms'].idxmax(), 'query_id']}")

Visualization Examples

CLI Visualization

# Render all applicable charts for a result file
benchbox visualize benchmark_runs/results/tpch_duckdb_sf0.1_*.json

# Compare multiple platforms
benchbox visualize duckdb_result.json sqlite_result.json --template head_to_head

# Per-query histogram (auto-splits for large benchmarks)
benchbox visualize tpcds_result.json --chart-type query_histogram

Python API Visualization

from benchbox.core.visualization import ResultPlotter
from benchbox.core.visualization.ascii import ASCIIBarChart
from benchbox.core.visualization.ascii.bar_chart import BarData

# Load results from JSON files
plotter = ResultPlotter.from_sources(["results/duckdb.json", "results/sqlite.json"])

# Render a bar chart
bar_data = [BarData(label=r.platform, value=r.total_time_ms or 0) for r in plotter.results]
chart = ASCIIBarChart(data=bar_data, title="Platform Comparison")
print(chart.render())

# Export to plain-text file
from benchbox.core.visualization.exporters import export_ascii

export_ascii(
    ascii_content=chart.render(),
    output_dir="./charts",
    base_name="platform_comparison",
    format="txt",
)

Schema Versioning

Current Version: 1.1

Version 1.1 includes:

  • Query plan capture

  • Cost estimation

  • Row count validation details

  • Phase-level error tracking

Version History

| Version | Changes |
| --- | --- |
| 1.1 | Added query plans, cost estimation, validation details |
| 1.0 | Initial canonical schema |

Loading Legacy Results

from benchbox.core.results.loader import load_result

# Automatically handles schema versions
result = load_result("old_result.json")
print(f"Schema version: {result.schema_version}")

Anonymization

Results are anonymized by default to remove sensitive information:

  • Connection strings → hashed

  • Hostnames → generalized

  • Usernames → removed

  • API keys/tokens → stripped
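The connection-string hashing step can be approximated as follows; this is a sketch of the idea, not BenchBox's actual implementation:

```python
import hashlib

def anonymize_connection_string(conn):
    """Replace a connection string with a short, stable SHA-256 digest."""
    return "sha256:" + hashlib.sha256(conn.encode()).hexdigest()[:16]
```

Because the digest is deterministic, identical connection strings still compare equal across exports without revealing credentials.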

Configuring Anonymization

Anonymization can be configured programmatically via the Python API:

Anonymization Config

from benchbox.core.results.exporter import ResultExporter
from benchbox.core.results.anonymization import AnonymizationConfig

config = AnonymizationConfig(
    anonymize_hostname=True,
    anonymize_username=True,
    anonymize_connection_string=True,
    preserve_platform_version=True
)

exporter = ResultExporter(anonymize=True, anonymization_config=config)

Integration Examples

Export to Data Warehouse

import json
import pandas as pd

# Load results
with open("results.json") as f:
    results = json.load(f)

# Flatten to table
queries = []
for q in results['phases']['power_test']['query_executions']:
    queries.append({
        'run_id': results['execution']['timestamp'],
        'benchmark': results['benchmark']['name'],
        'platform': results['platform']['name'],
        'scale_factor': results['benchmark']['scale_factor'],
        **q
    })

df = pd.DataFrame(queries)

# Upload to warehouse
# df.to_sql('benchmark_queries', engine, if_exists='append')

CI/CD Integration

# Run benchmark and check threshold
benchbox run --platform duckdb --benchmark tpch --scale 0.01 \
  --output ./results/

# Parse results in CI script
python -c "
import glob
import json
import sys

# open() does not expand wildcards; pick the newest matching result
path = sorted(glob.glob('results/tpch_duckdb_sf0.01_*.json'))[-1]
with open(path) as f:
    results = json.load(f)

power = results['metrics']['power_at_size']
if power < 50:  # Performance threshold
    print(f'FAIL: Power@Size {power} below threshold 50')
    sys.exit(1)
print(f'PASS: Power@Size {power}')
"