BenchBox Experimental¶
Emerging benchmarks for specialized testing and novel workloads.
What Makes a Benchmark “Experimental”?¶
Experimental benchmarks in BenchBox share one or more of these characteristics:
- **Newly developed**: Recently created benchmarks that haven't yet been validated across many platforms or use cases. They may evolve as we learn from real-world usage.
- **Limited adoption**: Benchmarks that address real needs but haven't achieved widespread industry acceptance. They may become standards or remain niche tools.
- **Specialized focus**: Benchmarks targeting emerging workloads (AI/ML, time-series, metadata) that don't fit traditional OLAP categories. The methodology for testing these workloads is still evolving.
- **Research-oriented**: Benchmarks designed to explore database behavior under unusual conditions (skewed data, adversarial queries) rather than measure typical performance.
Why Include Experimental Benchmarks?¶
The database landscape evolves rapidly. Workloads that seemed exotic five years ago are now common:
- **AI/ML integration**: Vector similarity, embedding storage, feature serving
- **Time-series analytics**: IoT data, observability, financial markets
- **Metadata-heavy workloads**: Data catalogs, schema evolution, lineage tracking
- **Adversarial conditions**: Skewed data, optimizer-hostile queries, chaos testing
Experimental benchmarks let BenchBox stay ahead of these trends. Some will prove their worth and graduate to standard benchmarks. Others will inform the design of better benchmarks. All contribute to understanding database performance in emerging scenarios.
Using Experimental Benchmarks¶
When working with experimental benchmarks, keep these considerations in mind:
- **Expect change**: Schemas, queries, and methodologies may evolve. Pin to specific BenchBox versions for reproducible results.
- **Validate relevance**: Check whether the benchmark's assumptions match your use case. An AI primitives benchmark designed for embedding retrieval may not apply to your vector search workload.
- **Contribute feedback**: Experimental benchmarks improve through usage. Report issues, suggest improvements, and share results to help refine them.
- **Interpret cautiously**: Results may be less reliable than established benchmarks. Use them for directional guidance, not definitive platform selection.
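One practical way to act on the "expect change" advice is to record the tool version alongside every result set, so results can always be traced back to the exact BenchBox release that produced them. The sketch below is illustrative only: the `benchbox` package name and the metadata fields are assumptions, not part of any documented BenchBox API.

```python
import json
from importlib.metadata import version, PackageNotFoundError

def run_metadata(package: str = "benchbox") -> dict:
    # Record the benchmark tool's installed version with every result set
    # so runs can be reproduced later. "benchbox" is a hypothetical
    # package name used for illustration.
    try:
        ver = version(package)
    except PackageNotFoundError:
        ver = "unknown"
    return {"tool": package, "tool_version": ver}

# Attach this metadata to any results you publish or archive.
print(json.dumps(run_metadata()))
```

Comparing results across runs is only meaningful when the recorded versions match, since experimental schemas and queries may differ between releases.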
Experimental Benchmarks in BenchBox¶
| Benchmark | Focus | Status |
|---|---|---|
| TPC-HAVOC | Chaos testing, failure injection, recovery performance | Research prototype |
| TPC-H Skew | Data skew effects on query performance | Methodology validation |
| Data Vault | Data Vault 2.0 modeling patterns | Schema finalization |
| AI Primitives | Vector operations, embedding queries, ML serving | Active development |
| Metadata Primitives | Schema operations, catalog queries, lineage | Early stage |
| NYC Taxi | Real-world transportation analytics | Stable, may graduate |
| TSBS DevOps | Time-series database benchmark (monitoring workload) | Adaptation in progress |
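To make "data skew effects" concrete: uniform generators (like stock TPC-H) spread rows evenly across keys, while real workloads often concentrate most rows on a few hot keys. The sketch below draws keys from a Zipf-like distribution to show this concentration; the function and parameter names are illustrative and are not taken from the TPC-H Skew specification.

```python
import collections
import random

def zipf_keys(n_keys: int, n_rows: int, s: float = 1.2, seed: int = 42):
    # Draw n_rows key values from a Zipf-like distribution over
    # 1..n_keys: weight 1/k**s makes low-numbered keys "hot".
    rng = random.Random(seed)
    weights = [1 / (k ** s) for k in range(1, n_keys + 1)]
    return rng.choices(range(1, n_keys + 1), weights=weights, k=n_rows)

rows = zipf_keys(n_keys=100, n_rows=10_000)
# Under a uniform distribution each key would appear ~100 times;
# here a handful of keys dominate the sample.
print(collections.Counter(rows).most_common(3))
```

Joins and aggregations over such hot keys stress hash tables and partitioning very differently than uniform data, which is exactly the behavior skew-oriented benchmarks probe.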
Graduation Criteria¶
Experimental benchmarks may graduate to standard categories when they meet these criteria:
- **Stable specification**: No significant methodology changes for 6+ months
- **Platform coverage**: Tested on 5+ platforms with consistent results
- **Community validation**: External usage and feedback confirming utility
- **Documentation complete**: Full specification, data generation, and analysis guides
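The four criteria above amount to a checklist, which could be encoded as a simple predicate. The record fields and threshold encodings below are illustrative assumptions for this sketch, not a real BenchBox API.

```python
from datetime import date

# Hypothetical status record for one experimental benchmark.
benchmark = {
    "last_spec_change": date(2024, 1, 15),  # stable specification
    "platforms_tested": 6,                  # platform coverage
    "external_reports": 3,                  # community validation
    "docs_complete": True,                  # documentation complete
}

def meets_graduation_criteria(b: dict, today: date) -> bool:
    # Approximate months of spec stability using the mean month length.
    months_stable = (today - b["last_spec_change"]).days / 30.44
    return (
        months_stable >= 6
        and b["platforms_tested"] >= 5
        and b["external_reports"] > 0
        and b["docs_complete"]
    )

print(meets_graduation_criteria(benchmark, date(2024, 12, 1)))
```

A benchmark failing any single criterion (for example, a spec change four months ago) would remain experimental until the clock resets and runs out again.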
See Also¶
- BenchBox Primitives: Stable BenchBox-created benchmarks
- Academic Benchmarks: Research benchmarks with established methodology
- Time Series Benchmark Suite: Original TSBS project