Power Run Iterations and Concurrent Query Execution¶
Comprehensive guide to BenchBox’s execution modes for benchmark testing with statistical confidence and throughput analysis.
Overview¶
BenchBox provides two execution modes:
Power Run Iterations: Execute multiple test iterations sequentially to gather statistical confidence and eliminate outliers
Concurrent Query Execution: Execute queries in parallel to test throughput, scalability, and multi-user scenarios
These features are essential for:
Production performance evaluation
Database stress testing
Scalability analysis
Statistical confidence in benchmark results
Regression testing with confidence intervals
Power Run Iterations¶
What are Power Run Iterations?¶
Power Run Iterations execute the same benchmark test multiple times with different query orderings per iteration, providing statistical analysis of performance consistency. This approach:
Eliminates outliers from single-run measurements
Provides confidence intervals for benchmark results
Detects performance variance and consistency issues
Enables statistical comparison between configurations
Follows TPC specifications with proper stream permutations (each iteration uses different stream_id)
Tests different query interaction patterns to stress query optimizers
Key Benefits¶
Statistical Confidence: Get mean, median, standard deviation, and confidence intervals
Performance Consistency: Identify systems with high performance variance
Outlier Detection: Automatically identify and handle anomalous runs
Regression Testing: Compare performance changes with statistical significance
Professional Reporting: Generate publication-quality benchmark reports
Configuration Options¶
# benchbox.yaml
execution:
power_run:
iterations: 5 # Number of test iterations to run
warm_up_iterations: 2 # Warm-up runs (excluded from statistics)
timeout_per_iteration_minutes: 90 # Maximum time per iteration
fail_fast: false # Stop on first failure vs. continue
collect_metrics: true # Collect detailed performance metrics
Usage Examples¶
Basic Power Run Setup¶
from benchbox.utils import PowerRunExecutor, ExecutionConfigHelper
from benchbox import TPCH
# Configure power run settings
config_helper = ExecutionConfigHelper()
config_helper.enable_power_run_iterations(
iterations=5, # 5 test iterations
warm_up_iterations=2 # 2 warm-up iterations
)
# Create benchmark
tpch = TPCH(scale_factor=0.1)
# Execute power runs
power_executor = PowerRunExecutor()
def create_power_test():
from benchbox.core.tpch.power_test import TPCHPowerTest
return TPCHPowerTest(
benchmark=tpch,
connection_string="duckdb:test.db",
scale_factor=0.1
)
result = power_executor.execute_power_runs(create_power_test)
# Statistical analysis
print(f"Average Power@Size: {result.avg_power_at_size:.2f}")
print(f"Std Deviation: {result.power_at_size_stdev:.2f}")
print(f"Confidence: {result.iterations_successful}/{result.iterations_completed}")
Advanced-level Statistical Analysis¶
import numpy as np
from scipy import stats
# Get power@size values from all iterations
power_values = [iter_result.power_at_size for iter_result in result.iteration_results]
# Calculate confidence interval (95%)
confidence_level = 0.95
degrees_freedom = len(power_values) - 1
sample_mean = np.mean(power_values)
sample_standard_error = stats.sem(power_values)
confidence_interval = stats.t.interval(
confidence_level,
degrees_freedom,
sample_mean,
sample_standard_error
)
print(f"95% Confidence Interval: {confidence_interval[0]:.2f} - {confidence_interval[1]:.2f}")
print(f"Coefficient of Variation: {(result.power_at_size_stdev / result.avg_power_at_size) * 100:.1f}%")
Performance Profiles for Power Runs¶
BenchBox provides pre-configured profiles configured for different testing scenarios:
Quick Profile¶
Use Case: Rapid development testing
Iterations: 1 (no statistical analysis)
Warm-up: 0 iterations
Timeout: 30 minutes
Best For: Development, unit testing
Standard Profile¶
Use Case: Balanced performance testing
Iterations: 3 + 1 warm-up
Timeout: 60 minutes per iteration
Best For: Regular performance evaluation
Thorough Profile¶
Use Case: Comprehensive analysis
Iterations: 5 + 2 warm-up
Timeout: 120 minutes per iteration
Best For: Production evaluation, research
Stress Profile¶
Use Case: Maximum statistical confidence
Iterations: 10 + 3 warm-up
Timeout: 180 minutes per iteration
Best For: Official benchmarking, publications
Concurrent Query Execution¶
What is Concurrent Query Execution?¶
Concurrent Query Execution runs multiple query streams simultaneously to test:
Throughput performance under multi-user load
Scalability characteristics as load increases
Resource contention behavior
Multi-threading efficiency of database engines
Real-world performance with concurrent users
Key Benefits¶
Throughput Measurement: Queries per second under concurrent load
Scalability Testing: Performance scaling with concurrent streams
Resource Analysis: CPU, memory, I/O utilization under load
Multi-User Simulation: Real-world concurrent access patterns
Bottleneck Identification: Find system limitations and contention points
Configuration Options¶
execution:
concurrent_queries:
enabled: true # Enable concurrent execution
max_concurrent: 4 # Maximum concurrent streams
query_timeout_seconds: 600 # Individual query timeout
stream_timeout_seconds: 7200 # Total stream timeout
retry_failed_queries: true # Retry failed queries
max_retries: 5 # Maximum retry attempts
Usage Examples¶
Basic Concurrent Execution¶
from benchbox.utils import ConcurrentQueryExecutor, ExecutionConfigHelper
from benchbox import TPCH
# Configure concurrent execution
config_helper = ExecutionConfigHelper()
config_helper.enable_concurrent_queries(max_concurrent=4)
# Create benchmark
tpch = TPCH(scale_factor=0.1)
# Execute concurrent queries
concurrent_executor = ConcurrentQueryExecutor()
def create_throughput_test(stream_id):
from benchbox.core.tpch.throughput_test import TPCHThroughputTest
return TPCHThroughputTest(
benchmark=tpch,
connection_string=f"duckdb:stream_{stream_id}.db",
num_streams=1, # Each executor handles one stream
verbose=False
)
result = concurrent_executor.execute_concurrent_queries(create_throughput_test)
# Throughput analysis
print(f"Throughput: {result.throughput_queries_per_second:.2f} queries/sec")
print(f"Success Rate: {result.queries_successful}/{result.queries_executed}")
print(f"Concurrent Streams: {len(result.stream_results)}")
Scalability Analysis¶
# Test scalability across different concurrency levels
concurrency_levels = [1, 2, 4, 8, 16]
throughput_results = {}
for level in concurrency_levels:
print(f"\nTesting with {level} concurrent streams...")
config_helper.get_concurrent_queries_settings().max_concurrent = level
result = concurrent_executor.execute_concurrent_queries(
create_throughput_test,
num_streams=level
)
throughput_results[level] = {
'throughput': result.throughput_queries_per_second,
'success_rate': result.queries_successful / result.queries_executed,
'avg_duration': result.total_duration / level
}
# Analyze scalability
print(f"\n Scalability Analysis:")
print(f"{'Streams':<8} {'Throughput':<12} {'Success Rate':<12} {'Scalability':<12}")
print("-" * 50)
base_throughput = throughput_results[1]['throughput']
for level, metrics in throughput_results.items():
scalability = metrics['throughput'] / base_throughput
print(f"{level:<8} {metrics['throughput']:<12.2f} "
f"{metrics['success_rate']:<12.1%} {scalability:<12.2f}x")
System Optimization¶
Automatic Resource-Based Configuration¶
BenchBox automatically optimizes execution settings based on available system resources:
import psutil
from benchbox.utils import ExecutionConfigHelper
config_helper = ExecutionConfigHelper()
# Auto-optimize based on system specs
cpu_cores = psutil.cpu_count()
memory_gb = psutil.virtual_memory().total / (1024**3)
config_helper.optimize_for_system(cpu_cores=cpu_cores, memory_gb=memory_gb)
# View configured settings
summary = config_helper.get_execution_summary()
print(f"Optimized for {cpu_cores} cores, {memory_gb:.1f}GB RAM:")
print(f"- Max concurrent streams: {summary['concurrent_queries']['max_streams']}")
print(f"- Power run timeout: {summary['power_run']['settings']['timeout_per_iteration_minutes']} min")
Optimization Rules¶
CPU Core Optimization:
max_concurrent = min(8, max(2, cpu_cores // 4))Prevents over-subscription of CPU resources
Scales appropriately from 2-core to 32+ core systems
Memory Optimization:
< 8GB: Extended timeouts (120min power, 600s queries)
8-16GB: Standard timeouts (60min power, 300s queries)
> 16GB: Reduced timeouts (45min power, 180s queries)
Storage Optimization:
SSD systems: Higher concurrency, shorter timeouts
HDD systems: Lower concurrency, longer timeouts
Network storage: Conservative settings with retries
Comprehensive Testing Workflow¶
Production Evaluation Workflow¶
from benchbox import TPCH
from benchbox.utils import ExecutionConfigHelper, PowerRunExecutor, ConcurrentQueryExecutor
# 1. System Analysis and Optimization
config_helper = ExecutionConfigHelper()
config_helper.apply_performance_profile('thorough') # Comprehensive testing
import psutil
config_helper.optimize_for_system(
cpu_cores=psutil.cpu_count(),
memory_gb=psutil.virtual_memory().total / (1024**3)
)
# 2. Benchmark Setup
tpch = TPCH(scale_factor=1.0) # Production scale
# 3. Power Run Testing (Statistical Confidence)
print("Phase 1: Power Run Analysis")
power_executor = PowerRunExecutor(config_helper.config_manager)
def create_power_test():
from benchbox.core.tpch.power_test import TPCHPowerTest
return TPCHPowerTest(
benchmark=tpch,
connection_string="duckdb:production_test.db",
scale_factor=1.0,
warm_up=True,
validation=True
)
power_result = power_executor.execute_power_runs(create_power_test)
print(f" Single-Stream Performance:")
print(f" Average: {power_result.avg_power_at_size:.2f} Power@Size")
print(f" Std Dev: {power_result.power_at_size_stdev:.2f}")
print(f" Range: {power_result.min_power_at_size:.2f} - {power_result.max_power_at_size:.2f}")
# 4. Concurrent Query Testing (Throughput Analysis)
print("\nPhase 2: Concurrent Throughput Analysis")
concurrent_executor = ConcurrentQueryExecutor(config_helper.config_manager)
def create_throughput_test(stream_id):
from benchbox.core.tpch.throughput_test import TPCHThroughputTest
return TPCHThroughputTest(
benchmark=tpch,
connection_string=f"duckdb:throughput_stream_{stream_id}.db",
num_streams=1,
verbose=False
)
concurrent_result = concurrent_executor.execute_concurrent_queries(create_throughput_test)
print(f" Multi-Stream Performance:")
print(f" Throughput: {concurrent_result.throughput_queries_per_second:.2f} queries/sec")
print(f" Success Rate: {concurrent_result.queries_successful}/{concurrent_result.queries_executed}")
print(f" Concurrent Streams: {len(concurrent_result.stream_results)}")
# 5. Comprehensive Analysis
efficiency = concurrent_result.throughput_queries_per_second / (power_result.avg_power_at_size / 3600)
print(f"\n Performance Analysis:")
print(f" Single-stream efficiency: {power_result.avg_power_at_size:.2f} Power@Size")
print(f" Multi-stream efficiency: {efficiency:.2f}x scaling")
print(f" Performance consistency: ±{power_result.power_at_size_stdev:.1f} Power@Size")
Best Practices¶
Power Run Iterations¶
Iteration Count Selection:
Development: 1-3 iterations
Performance evaluation: 3-5 iterations
Official benchmarking: 5-10 iterations
Warm-up Iterations:
Always include 1-2 warm-up iterations
Warm-up eliminates cold-start effects
Excluded from statistical calculations
Timeout Configuration:
Small scale factors: 30-60 minutes
Large scale factors: 60-180 minutes
Factor in system performance characteristics
Statistical Analysis:
Calculate confidence intervals for comparisons
Use coefficient of variation to assess consistency
Report both mean and median values
Concurrent Query Execution¶
Concurrency Selection:
Start with
CPU cores / 4for initial testingScale up to find appropriate concurrent level
Monitor system resource utilization
Resource Monitoring:
Watch for CPU over-subscription
Monitor memory usage and swapping
Check I/O bottlenecks and disk utilization
Timeout Management:
Set generous timeouts for initial testing
Adjust based on observed query execution times
Include retry logic for transient failures
Result Interpretation:
Linear scaling indicates good parallelization
Sub-linear scaling suggests resource contention
Throughput decrease indicates over-subscription
System Optimization¶
Resource Assessment:
Use system profiling before configuration
Consider available vs. total memory
Account for other system processes
Environment Configuration:
Disable unnecessary background processes
Set appropriate database configuration
Ensure adequate temporary disk space
Validation:
Always validate configurations before execution
Test with smaller scale factors first
Monitor resource usage during execution
TPC Specification Compliance¶
BenchBox ensures compliance with official TPC specifications for query ordering in both power and throughput tests.
Power Run Compliance¶
Each power run iteration uses a different stream permutation to ensure varied query interaction patterns:
# Iteration 0: Uses TPC-H stream 0 permutation [14, 2, 9, 20, 6, 17, 18, 8, 21, 13, 3, 22, 16, 4, 11, 15, 1, 10, 19, 5, 7, 12]
# Iteration 1: Uses TPC-H stream 1 permutation [21, 3, 18, 5, 11, 7, 6, 20, 17, 12, 16, 15, 13, 10, 2, 8, 14, 19, 9, 22, 1, 4]
# Iteration 2: Uses TPC-H stream 2 permutation [6, 17, 14, 16, 19, 10, 9, 2, 15, 8, 5, 22, 12, 7, 13, 18, 1, 4, 20, 3, 11, 21]
config = PowerRunSettings()
config.iterations = 5 # Each iteration uses different stream (0, 1, 2, 3, 4)
Concurrent Query Compliance¶
Each concurrent stream uses a different permutation as mandated by TPC specifications:
# Stream 0: TPC-H permutation[0] or TPC-DS stream 0 permutation
# Stream 1: TPC-H permutation[1] or TPC-DS stream 1 permutation
# Stream 2: TPC-H permutation[2] or TPC-DS stream 2 permutation
config = ConcurrentQueriesSettings()
config.max_concurrent = 4 # Each stream gets different permutation (0, 1, 2, 3)
Why TPC Compliance Matters¶
Valid Comparisons: Results comparable to official TPC publications
Optimizer Testing: Different orderings stress different optimization paths
Statistical Validity: Varied patterns provide meaningful performance statistics
Research Validity: Academic research can rely on specification-compliant results
Verification¶
Check result compliance:
# Verify first query in TPC-H power test uses stream permutation
assert result.query_results[0]['query_id'] == 14 # Stream 0 starts with query 14
assert result.query_results[0]['stream_id'] == 0 # Stream ID recorded
assert result.query_results[0]['position'] == 1 # Position in permutation
For detailed compliance documentation, see: _project/TPC_SPECIFICATION_COMPLIANCE.md
Troubleshooting¶
Common Issues¶
Power Run Failures:
Memory exhaustion: Reduce scale factor or increase timeout
High variability: Check for system interference or thermal throttling
Iteration failures: Enable detailed logging to identify specific issues
Concurrent Execution Problems:
Deadlocks: Reduce concurrency level or enable retries
Resource contention: Lower concurrent stream count
Timeout errors: Increase timeout values or optimize queries
Configuration Issues:
Invalid settings: Use validation before execution
System limitations: Apply auto-optimization for current hardware
Profile conflicts: Reset to default before applying new profiles
Performance Optimization Tips¶
Database Configuration:
Optimize memory allocation (75% of available RAM)
Enable appropriate number of worker threads
Configure temporary storage on fast disks
System Tuning:
Disable CPU frequency scaling during tests
Set high-performance power profile
Minimize background process interference
Test Environment:
Use dedicated systems for benchmarking
Ensure consistent thermal conditions
Minimize network and disk I/O interference
This systematic approach to power run iterations and concurrent query execution provides professional-grade benchmarking capabilities with statistical rigor and real-world relevance.