BenchBox 0.1.0 documentation

Data & Dependencies

Tags: contributor, reference

Guides for managing data generation, dependencies, and specialized components.

  • Data Sharing Between Benchmarks
  • Dependency Compatibility Matrix
  • Primitives Query Catalog
  • TPC Binary Auto-Compilation Guide
Copyright © 2025, Joe Harris