Chart Types¶
BenchBox provides chart types optimized for benchmarking narratives, rendered as ASCII art directly in the terminal. Each chart is designed for a specific analytical purpose and can be generated via benchbox visualize or programmatically through the Python API. No external dependencies are required.
Performance Bar Chart¶
Purpose: Compare execution times across platforms, runs, or configurations. The most common chart for “which platform is fastest?” questions.
Key Features:
Automatic best/worst highlighting (green for fastest, red for slowest)
Fractional-block precision (Unicode ▏▎▍▌▋▊▉█ characters)
Automatic sorting by performance (configurable)
Platform labels with execution time annotations
Performance Comparison (ms)
──────────────────────────────────────────────────────────
DuckDB ██████████████████ 142.3
Polars ████████████ 98.7
SQLite ████████████████████████████████████ 285.1
Pandas ██████████████████████████████████████ 320.5
Legend: ■ Best ■ Worst
TPC-H SF1 comparison across 4 platforms showing Polars as fastest (green) and Pandas as slowest (red)
Data Requirements:
`total_time_ms` or `avg_time_ms` per result
At least 1 result (multiple for comparison)
Best For:
Platform shootouts (“DuckDB vs Snowflake”)
Before/after optimization comparisons
Multi-benchmark summaries
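The fractional-block precision comes from quantizing each bar to eighths of a character cell. As a minimal sketch of the technique (not BenchBox's actual renderer):

```python
# Eighth-block suffixes: index = leftover eighths after the full blocks.
PARTIALS = ["", "▏", "▎", "▍", "▌", "▋", "▊", "▉"]

def render_bar(value, max_value, width=40):
    """Render a horizontal bar with sub-character (1/8 cell) precision."""
    eighths = round(value / max_value * width * 8)
    full, rem = divmod(eighths, 8)
    return "█" * full + PARTIALS[rem]

for name, ms in [("DuckDB", 142.3), ("Polars", 98.7), ("SQLite", 285.1)]:
    print(f"{name:<8}{render_bar(ms, 285.1)} {ms}")
```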
Power@Size Bar Chart¶
Purpose: Compare TPC Power@Size scores across platforms or versions. Power@Size is a TPC-defined throughput metric expressed in queries-per-hour at a given scale factor; higher is better.
Key Features:
Automatic best/worst highlighting (highest score = green, lowest = red)
Longer bars indicate better throughput performance (inverted vs execution time)
Silently skipped for non-TPC benchmarks (no Power@Size in the result)
Power@Size Comparison (Power@Size)
──────────────────────────────────────────────────────────
DuckDB 1.5.0-dev ███████████████████████████████████████ 669.3K
DuckDB 1.4.4 ██████████████████████████████████████ 630.9K
DuckDB 1.3.2 ███████████████████████████████████████ 614.9K
DuckDB 1.1.3 ██████████████████████████████████████ 594.7K
DuckDB 1.2.2 ███████████████████████████████████ 500.5K
DuckDB 1.0.0 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 385.8K
TPC-DS SF=10 version comparison showing Power@Size progression from v1.0.0 to v1.5.0-dev
Data Requirements:
`summary.tpc_metrics.power_at_size` in result JSON (present in TPC-H and TPC-DS runs)
At least 1 result (multiple for comparison)
Best For:
TPC-H and TPC-DS version-over-version comparisons
Tracking throughput improvement across platform upgrades
Executive summaries of TPC benchmark results
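BenchBox reads Power@Size from the result JSON rather than recomputing it, but for intuition about the number: TPC-style power metrics divide 3600 × SF by the geometric mean of the timed intervals. A simplified sketch that ignores the refresh functions included in the official TPC-H definition:

```python
from math import prod

def power_at_size(query_times_s, scale_factor):
    """Simplified TPC-style Power@Size: 3600 * SF over the geometric mean
    of query times in seconds. (The official TPC-H metric also folds two
    refresh functions into the geometric mean.)"""
    geomean = prod(query_times_s) ** (1.0 / len(query_times_s))
    return 3600.0 * scale_factor / geomean
```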
Time-Series Line Chart¶
Purpose: Track performance evolution over time, across versions, or across sequential runs. Essential for regression detection and trend analysis.
Key Features:
Multi-line support for comparing platforms over time
Least-squares regression trend overlay
Unique marker per series (*, +, o, x, ^, v)
Automatic legend with series names
Performance Trend (ms)
──────────────────────────────────────────────────────────
│
350 │*
│ *
300 │ *
│ *
250 │ * +
│ * + +
200 │ * +
│ +
150 │
└────────────────────────────────────────────────────
v0.8 v0.9 v1.0 v1.1 v1.2 v1.3
Legend:
─*─ DuckDB
─+─ Polars
DuckDB and Polars performance over 6 releases showing improvement trends
Data Requirements:
At least 2 results with timestamps or execution IDs
`total_time_ms` per result
Best For:
Version-over-version performance tracking
Regression detection in CI/CD
Long-term platform evolution stories
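The trend overlay reduces to an ordinary least-squares fit over evenly spaced points. A self-contained sketch of the math (not the library's internal code):

```python
def linear_trend(ys):
    """Least-squares slope and intercept over points (0, y0), (1, y1), ..."""
    n = len(ys)
    mx = (n - 1) / 2                      # mean of x = 0..n-1
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    sxy = sum((x - mx) * (y - my) for x, y in zip(range(n), ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# A negative slope on execution times means performance is improving.
```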
Cost-Performance Scatter Plot¶
Purpose: Visualize the price/performance tradeoff with Pareto frontier highlighting. Critical for cloud platform ROI analysis.
Key Features:
Pareto frontier highlighting (optimal cost-performance points marked with ◆)
Platform labels positioned at data points
Performance score calculation (queries/hour or inverse latency)
Auto-scaled axes with padding
Cost vs Performance
──────────────────────────────────────────────────────────
Perf │
(qph)│ * Redshift
400 │
│ ◆ DuckDB
300 │ ◆ Polars
│ ──────────────────── Pareto frontier
200 │ * Pandas
│
100 │ * SQLite
│
0 └────────────────────────────────────────────────────
$0 $50 $100 $150 $200 $250
Cost (USD)
Platform comparison showing Pareto frontier highlighting cost-efficient options
Data Requirements:
`cost_summary.total_cost` per result
Performance metric (`avg_time_ms` or `total_time_ms`)
Best For:
Cloud platform cost comparisons
ROI analysis for platform selection
“Bang for your buck” narratives
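The Pareto frontier marks platforms that no other platform beats on both axes at once (cheaper and faster). One way to compute it, sketched independently of BenchBox's implementation:

```python
def pareto_frontier(points):
    """Return names of non-dominated points.
    points: list of (name, cost, perf); lower cost and higher perf are better."""
    frontier = []
    for name, cost, perf in points:
        dominated = any(
            c <= cost and p >= perf and (c < cost or p > perf)
            for _, c, p in points
        )
        if not dominated:
            frontier.append(name)
    return frontier
```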
Query Variance Heatmap¶
Purpose: Show per-query performance patterns across platforms. Reveals which queries each platform handles well or poorly.
Key Features:
Query (rows) × Platform (columns) grid
4-level intensity scale (░ ▒ ▓ █) from fast to slow
Color gradient: green (fast) → yellow (medium) → red (slow)
Automatic sorting by query ID
Query Execution Heatmap (ms)
──────────────────────────────────────────────────────────
DuckDB Polars Pandas
Q1 │ ░░ 12.3 ▒▒ 15.2 ▓▓ 45.8
Q2 │ ░░ 8.5 ░░ 9.1 ▒▒ 22.4
Q3 │ ▒▒ 25.1 ░░ 18.3 ██ 85.2
Q4 │ ░░ 11.2 ▓▓ 40.5 ██120.1
Q5 │ ▒▒ 30.8 ▒▒ 28.9 ▓▓ 52.3
Q6 │ ░░ 5.1 ░░ 4.8 ░░ 8.2
Scale: ░ ▒ ▓ █ (fast → slow)
Range: 4.8 - 120.1 ms
TPC-H per-query heatmap showing performance patterns across platforms
Data Requirements:
At least 2 platforms with per-query `execution_time_ms`
Query results with `query_id` identifiers
Best For:
Identifying problematic queries per platform
TPC-H/TPC-DS per-query analysis
Query optimization targeting
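Mapping a latency onto the intensity scale is a linear bin over the observed range. A hypothetical sketch of the idea (the actual binning in BenchBox may differ):

```python
SHADES = "░▒▓█"  # fast → slow

def shade(value, lo, hi):
    """Pick an intensity character for `value` within the range [lo, hi]."""
    if hi == lo:
        return SHADES[0]
    frac = (value - lo) / (hi - lo)
    return SHADES[min(int(frac * len(SHADES)), len(SHADES) - 1)]
```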
Distribution Box Plot¶
Purpose: Show latency distributions to understand variance, outliers, and consistency beyond simple averages.
Key Features:
Box plot with quartiles (Q1, median, Q3)
Whiskers at min/max with cap characters (╷ ╵)
Outlier markers (o) at actual scaled positions
Statistics summary below plot
Latency Distribution (ms)
──────────────────────────────────────────────────────────
╷ ┌──────┬──────┐ ╷
DuckDB ├─────┤ │ ├───────────┤
╵ └──────┴──────┘ ╵
╷ ┌────┬────────┐ ╷
Polars ├───┤ │ ├───────┤
╵ └────┴────────┘ ╵
╷ ┌──────────┬─────────────┐ ╷
Pandas ├──────────┤ │ ├─────┤ o
╵ └──────────┴─────────────┘ ╵
────────────┼──────────┼──────────┼──────────┼────────┼──
0 100 200 300 400 ms
Statistics:
DuckDB: median=120.5, mean=125.3, std=45.2
Polars: median=85.2, mean=90.1, std=32.8
Pandas: median=210.8, mean=225.4, std=78.5
Latency distribution comparison showing Polars as most consistent (tightest box)
Data Requirements:
Per-query `execution_time_ms` values
Multiple queries per platform for a meaningful distribution
Best For:
Consistency analysis (low variance = predictable performance)
Outlier detection
SLA compliance stories (P95/P99 analysis)
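The box geometry follows from standard quartile statistics with 1.5 × IQR whisker fences. A sketch of the per-row computation using only the standard library:

```python
import statistics

def box_stats(samples):
    """Quartiles, extremes, and 1.5*IQR outliers for one box-plot row."""
    s = sorted(samples)
    q1, median, q3 = statistics.quantiles(s, n=4)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in s if x < lo_fence or x > hi_fence]
    return {"min": s[0], "q1": q1, "median": median,
            "q3": q3, "max": s[-1], "outliers": outliers}
```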
Query Latency Histogram¶
Purpose: Display per-query execution latency with one vertical bar per query. Provides immediate visual insight into which queries are fastest/slowest and the overall latency distribution pattern.
Key Features:
One vertical bar per query showing execution time
Automatic best/worst highlighting (green for fastest, orange for slowest)
Mean reference line for quick comparison
Natural query ID sorting (Q1, Q2, Q10 rather than Q1, Q10, Q2)
Auto-splits large query sets: when a benchmark exceeds 33 queries, multiple charts are generated automatically (e.g., TPC-DS's 99 queries → 3 charts)
Multi-platform grouped bar support for comparisons
Query Latency Histogram (Execution Time (ms))
──────────────────────────────────────────────────────────
150 │ ██
│ ██
100 │ ██ ██ ██
│ ██ ██ ██ ██ ██
50 │ ██ ██ ██ ██ ██ ██ ██
│ ██ ██ ██ ██ ██ ██ ██
0 └────────────────────────────────────────────────
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8
····· Mean: 87.5 ms
Legend: ■ Best ■ Worst
Data Requirements:
Per-query `execution_time_ms` values
Query results with `query_id` identifiers
Aggregates multiple iterations per query (mean latency)
Best For:
Quick identification of slow/fast queries
TPC-DS (99 queries) and ClickBench (43 queries) analysis
Spotting outlier queries that need optimization
Per-query comparison across platforms
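Collapsing multiple iterations into one mean latency per query (the aggregation noted above) can be sketched as:

```python
from collections import defaultdict
from statistics import mean

def mean_latency_per_query(rows):
    """rows: iterable of (query_id, execution_time_ms) across iterations."""
    buckets = defaultdict(list)
    for qid, ms in rows:
        buckets[qid].append(ms)
    return {qid: mean(times) for qid, times in buckets.items()}
```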
Chart Splitting: For large benchmarks, the histogram automatically splits into readable chunks:

| Benchmark | Queries | Charts Generated |
|---|---|---|
| TPC-H | 22 | 1 chart |
| SSB | 13 | 1 chart |
| ClickBench | 43 | 2 charts (1-33, 34-43) |
| TPC-DS | 99 | 3 charts (1-33, 34-66, 67-99) |
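Both the natural ordering and the 33-query split are straightforward to sketch (assumed behavior illustrated here, not BenchBox's exact code):

```python
import re

def natural_key(query_id):
    """Sort Q1, Q2, ..., Q10 numerically instead of lexicographically."""
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", query_id)]

def split_charts(query_ids, max_per_chart=33):
    """Chunk a naturally sorted query set into per-chart groups."""
    ordered = sorted(query_ids, key=natural_key)
    return [ordered[i:i + max_per_chart]
            for i in range(0, len(ordered), max_per_chart)]
```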
Comparison Bar Chart¶
Purpose: Side-by-side paired bars per query showing baseline vs comparison performance. Essential for before/after analysis.
Key Features:
Paired rows per query: baseline bar + comparison bar
Inline percentage change (green for improvement, red for regression)
Scale note showing value per block character
Handles outlier truncation with overflow indicator
SQL vs DataFrame Comparison (ms)
──────────────────────────────────────────────────────────
Q1 SQL ████████████████████████ 240.5
DF ██████████████████ 180.2 -25.1%
Q2 SQL ████████████████ 160.0
DF ████████████████████████ 240.8 +50.5%
Q3 SQL ████████████████████ 200.0
DF ████████████████ 160.0 -20.0%
──────────────────────────────────────────────────────────
each █ ≈ 10.0ms
improvement (negative %) regression (positive %)
Data Requirements:
Two result files (baseline + comparison)
Per-query `execution_time_ms` values
Best For:
Before/after optimization comparisons
SQL vs DataFrame API performance
Version-over-version per-query analysis
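The inline percentages are plain relative change against the baseline; e.g., Q1's 240.5 ms → 180.2 ms works out to -25.1%:

```python
def pct_change(baseline_ms, comparison_ms):
    """Signed change: negative = faster (improvement), positive = slower."""
    return (comparison_ms - baseline_ms) / baseline_ms * 100.0
```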
Diverging Bar Chart¶
Purpose: Percentage change bars centered on zero, showing improvements to the left and regressions to the right. Sorted by magnitude for quick triage.
Key Features:
Left side (green): improvements; right side (red): regressions
Fill intensity based on magnitude (░ weak, ▒ medium, ▓ strong)
Overflow arrows (──►) for extreme outliers beyond clipping threshold
Summary line with improvement/stable/regression counts
Regression / Improvement Distribution
──────────────────────────────────────────────────────────
Faster | Slower
Q6 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ | -57.2%
Q14 ▓▓▓▓▓▓▓▓▓▓▓ | -38.1%
Q3 ░░░░░░░░░░ | -5.2%
Q1 | ░░░░░░░░░ +4.8%
Q17 | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒ +23.4%
Q21 | ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒──► +726.0%
──────────────────────────────────────────────────────────
3 improved 1 stable 2 regressed
Data Requirements:
Two result files (baseline + comparison)
Per-query `execution_time_ms` values
Best For:
Regression triage after upgrades
Identifying queries that improved vs degraded
Quick visual summary of change distribution
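Bucketing the per-query changes into the summary counts needs only a stability threshold. Assuming a ±5% band (the threshold here is an illustration, not a documented default):

```python
def triage(changes_pct, threshold=5.0):
    """Count (improved, stable, regressed) queries from signed % changes."""
    improved = sum(1 for c in changes_pct if c < -threshold)
    regressed = sum(1 for c in changes_pct if c > threshold)
    return improved, len(changes_pct) - improved - regressed, regressed
```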
Summary Box¶
Purpose: Key statistics in a bordered panel providing an at-a-glance overview of benchmark results or a comparison between two runs.
Key Features:
Bordered box with Unicode box-drawing characters
Comparison mode: baseline → comparison with percentage change
Color-coded percentages (green for improvement, red for regression)
Best/worst query callouts
┌──────────────────────────────────────────────┐
│ Benchmark Summary │
├──────────────────────────────────────────────┤
│ Geo Mean: 142.3ms → 98.7ms -30.6% │
│ Total: 3.2s → 2.1s -34.4% │
│ Queries: 22 │
├──────────────────────────────────────────────┤
│ 5 improved 12 stable 5 regressed │
├──────────────────────────────────────────────┤
│ Best: Q6 (-57.2%), Q14 (-38.1%) │
│ Worst: Q21 (+726%), Q17 (+23.4%) │
└──────────────────────────────────────────────┘
Data Requirements:
At least 1 result file (2 for comparison mode)
`total_time_ms` and/or per-query `execution_time_ms`
Best For:
Executive summaries at the top of reports
CI/CD log output (quick pass/fail context)
Pairing with detailed charts for full analysis
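The Geo Mean row uses a geometric rather than arithmetic mean, so a single very slow query cannot dominate the headline number. It is computed as:

```python
import math

def geo_mean(times_ms):
    """Geometric mean of positive query times."""
    return math.exp(sum(math.log(t) for t in times_ms) / len(times_ms))
```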
Chart Selection Guide¶
| Scenario | Recommended Chart |
|---|---|
| “Which platform is fastest?” | Performance Bar |
| “What is the TPC throughput score?” | Power@Size Bar |
| “How has performance changed over time?” | Time-Series Line |
| “Which platform has best price/performance?” | Cost-Performance Scatter |
| “Which queries does each platform struggle with?” | Query Variance Heatmap |
| “Which individual queries are slowest?” | Query Latency Histogram |
| “How consistent is query latency?” | Distribution Box |
| “How did each query change between runs?” | Comparison Bar |
| “Which queries regressed or improved?” | Diverging Bar |
| “Give me a quick summary” | Summary Box |
| “Complete platform evaluation” | Use |
Data Requirements Summary¶
| Chart Type | Required Fields | Minimum Results |
|---|---|---|
| Performance Bar | `total_time_ms` or `avg_time_ms` | 1 |
| Power@Size Bar | `summary.tpc_metrics.power_at_size` | 1 (TPC benchmarks only) |
| Time-Series Line | `total_time_ms`, timestamps or execution IDs | 2 |
| Cost-Performance Scatter | `cost_summary.total_cost`, `avg_time_ms` or `total_time_ms` | 1 (2+ useful) |
| Query Variance Heatmap | per-query `execution_time_ms`, `query_id` | 2 platforms |
| Query Latency Histogram | per-query `execution_time_ms`, `query_id` | 1 |
| Distribution Box | per-query `execution_time_ms` | 1 |
| Comparison Bar | per-query `execution_time_ms` | 2 (baseline + comparison) |
| Diverging Bar | per-query `execution_time_ms` | 2 (baseline + comparison) |
| Summary Box | `total_time_ms` and/or per-query `execution_time_ms` | 1 (2 for comparison) |