Benchmark Result Schema (v1.1)

Tags reference validation

The canonical JSON schema describes every benchmark export produced by BenchBox. It applies uniformly to CLI and platform execution flows now that all pipelines emit the lifecycle-aware BenchmarkResults model, replacing the legacy BenchmarkResult payloads.

Top-Level Structure

Key

Description

schema_version

Literal string "1.1". Increment when the shape changes.

benchmark

Identifier, human-readable name, and optional version of the benchmark.

execution

Execution metadata (ID, timestamp, duration, executor/platform, database name, optional metadata).

configuration

Run configuration including scale factor, concurrency level, query subset, and options.

system

Anonymised system profile (OS, architecture, CPU count, memory, Python version, hostname, username).

results

Metrics organized into queries, timing, optional tables, and optional metrics.

validation

status (PASSED, FAILED, PARTIAL, UNKNOWN) and optional structured details.

driver

Driver metadata: package, requested/resolved versions, and auto-install flag.

export

Export audit block with timestamp, exporter tool identifier, schema version, and anonymisation flag.

platform

(Optional) Platform metadata (name, additional metadata, tunings, info). Present for adapter runs.

phases

(Optional) Execution phase breakdown when available.

metadata

(Optional) Auxiliary metadata (e.g., anonymous_machine_id).

Results Payload

results.queries

Key

Description

total

Total number of queries dispatched.

successful

Number of queries that completed with status == "SUCCESS".

failed

Number of queries that failed (derived).

success_rate

Ratio of successful / total, rounded to 4 decimal places (0.0 when total == 0).

details

Ordered list of per-query metrics.

definitions

Optional nested mapping of query definitions (present when BenchmarkResults.query_definitions is populated).

Each entry of details uses the following fields (missing values are omitted). This nested array replaces the legacy top-level query_details payload that existed in the v2/v3 exports.

  • id (query identifier, e.g., Q1)

  • name

  • status

  • sequence (execution order)

  • stream_id (for throughput workloads)

  • iteration (optional) — iteration number for power tests with warm-up/measurement loops

  • run_type (optional)warmup or measurement classification when iterations are used

  • execution_time_ms

  • rows_returned

  • error_message

  • sql_text (truncated to 4096 characters with a /* truncated */ marker)

  • sql_truncated (present only when truncation occurred)

  • resource_usage

  • row_count_validation (optional) — nested object containing per-query validation results:

    • expected — expected number of rows

    • actual — actual number of rows returned

    • status — validation status (PASSED, FAILED, SKIPPED)

    • error (optional) — error message when validation fails

    • warning (optional) — warning message when validation is skipped

Why 4096 characters? BenchBox caps SQL text at 4KB per query to keep JSON exports compact (<5 KB/query) while still preserving useful debugging context. Consumers can detect truncation programmatically via the sql_truncated flag.

results.timing

  • total_ms: Total query execution time in milliseconds.

  • avg_ms: Average query time (milliseconds) across successful queries.

  • min_ms / max_ms: Minimum and maximum execution time (present for CLI results).

  • data_loading_ms: Optional data loading duration (milliseconds).

  • schema_creation_ms: Optional schema creation duration (milliseconds).

results.tables

Populated for platform adapter exports:

  • total_rows_loaded

  • data_size_mb

  • per_table: mapping of table name to row count (or other relevant figures).

results.metrics

Flexible container for benchmark-specific metrics:

  • summary: arbitrary key/value metrics (populated from BenchmarkResults.performance_summary or other custom extras attached during result construction).

  • performance_summary: aggregated figures from platform adapters.

  • performance_characteristics: optional qualitative metrics.

  • tpc: standard TPC metrics such as power_at_size, throughput_at_size, qphh_at_size, and geometric_mean_ms.

Export Block

The exporter populates:

"export": {
  "timestamp": "2025-09-01T12:00:00.000123",
  "tool": "benchbox-exporter",
  "schema_version": "1.1",
  "anonymized": true
}

When anonymisation is enabled an anonymised system profile is merged into system and an anonymous_machine_id is emitted under metadata.

Validation

The validation block has a two-tier structure optimized for size and accessibility:

  • validation.status — overall validation status: PASSED, FAILED, PARTIAL, or UNKNOWN

  • validation.summary — lightweight summary metrics:

    • total_tables_validated — number of tables validated

    • tables_with_data — number of tables with data

    • error_count — number of validation errors

    • warning_count — number of validation warnings

  • validation.details_location — pointer to full validation details (phases.setup.validation.validation_details)

Full validation details (including verified_tables, accessible_tables, and other detailed information) are available at phases.setup.validation.validation_details to avoid duplicating large arrays.

For per-query validation results, see the row_count_validation field in results.queries.details.

For anonymised exports the system profile is scrubbed prior to validation and anonymisation metadata is recorded under export.anonymized and metadata.anonymous_machine_id. Anonymisation therefore never breaks schema validation because the payload is transformed before CanonicalResultValidator runs.

Compatibility Notes

  • Legacy schema_version values ("2.0", "3.0") and their supporting fields (e.g., benchmark_info, performance_metrics, top-level query_results) have been removed.

  • Consumers must migrate to the canonical layout before the first public release.

  • Version 1.1 adds optional iteration and run_type fields to results.queries.details so consumers can distinguish warm-up runs from measurement runs while keeping the payload backward compatible for clients that ignore unknown keys.

  • Future schema adjustments should bump SCHEMA_VERSION and update this document.