TPC Test Result Validation System¶
A systematic validation system for TPC benchmark test results that ensures compliance with official TPC specification requirements.
Overview¶
The TPC Test Result Validation System provides:
Comprehensive Validation: Validates all aspects of TPC test results including completeness, query execution, timing, data integrity, and metrics
Compliance Checking: Ensures results meet official TPC specification requirements
Certification Readiness: Validates certification requirements and generates readiness reports
Audit Trail: Tracks all validation activities for reproducibility and compliance auditing
Multi-Benchmark Support: Supports TPC-H, TPC-DS, TPC-DI, and other TPC benchmarks
Detailed Reporting: Generates structured validation reports with issue tracking and metrics
Architecture¶
The validation system consists of several key components:
Core Components¶
TPCResultValidator: Main validation engine that coordinates all validation activities
ValidationReport: Comprehensive report containing all validation results, issues, and metrics
AuditTrail: Tracks all validation events for reproducibility and compliance auditing
Validators¶
CompletenessValidator: Ensures all required tests completed successfully
QueryResultValidator: Validates query execution results and performance
TimingValidator: Validates timing measurements and precision requirements
DataIntegrityValidator: Validates data integrity during maintenance operations
MetricsValidator: Validates metric calculations and statistical validity
ComplianceChecker: Checks overall TPC compliance requirements
CertificationChecker: Validates certification readiness and requirements
Installation¶
The validation system is included in the BenchBox library:
from benchbox.core.tpc_validation import TPCResultValidator, ValidationLevel
Quick Start¶
Basic Usage¶
from benchbox.core.tpc_validation import TPCResultValidator, ValidationLevel
# Create validator
validator = TPCResultValidator()
# Prepare test results (see Test Results Format section)
test_results = {
"benchmark_name": "TPC-H",
"scale_factor": 1.0,
"test_start_time": "2023-01-01T10:00:00Z",
"test_end_time": "2023-01-01T11:00:00Z",
"query_results": {
"1": {
"status": "success",
"execution_time": 5.2,
"row_count": 100,
"results": [{"col1": "value1", "col2": "value2"}]
}
},
"data_generation": {
"generation_time": 120.5,
"generated_tables": ["customer", "orders", "lineitem"]
},
"metrics": {
"avg_query_time": 5.2,
"total_query_time": 5.2
}
}
# Validate results
report = validator.validate(test_results, ValidationLevel.STANDARD)
# Check results
print(f"Overall Result: {report.overall_result.value}")
print(f"Validation Score: {report.metrics.get('validation_score', 0):.1f}%")
print(f"Issues Found: {len(report.issues)}")
Configuration¶
You can customize validation behavior with a configuration dictionary:
config = {
"validators": {
"completeness": {
"required_queries": {"TPC-H": list(range(1, 23))},
"required_tables": ["customer", "orders", "lineitem"]
},
"timing": {
"max_execution_time": 3600,
"precision_threshold": 0.001
},
"certification": {
"required_documentation": ["test_report", "environment_spec"],
"performance_thresholds": {
"avg_query_time": 30.0
}
}
}
}
validator = TPCResultValidator(config)
Test Results Format¶
The validation system expects test results in the following format:
test_results = {
# Required fields
"benchmark_name": "TPC-H", # Benchmark name (TPC-H, TPC-DS, TPC-DI)
"scale_factor": 1.0, # Scale factor used
"test_start_time": "2023-01-01T10:00:00Z", # ISO format timestamp
"test_end_time": "2023-01-01T11:00:00Z", # ISO format timestamp
# Query execution results
"query_results": {
"1": {
"status": "success", # success, failed, timeout
"execution_time": 5.2, # Execution time in seconds
"row_count": 100, # Number of rows returned
"results": [...], # Optional: actual query results
"error": "..." # Optional: error message if failed
}
},
# Data generation information
"data_generation": {
"generation_time": 120.5, # Time to generate data in seconds
"generated_tables": ["customer", "orders", "lineitem"]
},
# Calculated metrics
"metrics": {
"avg_query_time": 5.2,
"total_query_time": 5.2,
"queries_per_second": 0.19
},
# Optional: Maintenance operations (for TPC-DS, TPC-DI)
"maintenance_operations": {
"insert_operation": {
"status": "success",
"start_time": "2023-01-01T10:30:00Z",
"end_time": "2023-01-01T10:35:00Z",
"records_affected": 5000
}
},
# Optional: ETL operations (for TPC-DI)
"etl_operations": {
"extract_customers": {
"status": "success",
"start_time": "2023-01-01T09:00:00Z",
"end_time": "2023-01-01T09:15:00Z",
"records_processed": 150000
}
},
# Reproducibility information
"reproducibility": {
"seed": 12345,
"timestamp": "2023-01-01T10:00:00Z",
"environment": "test_env"
},
# Test isolation
"test_isolation": {
"isolated": True
},
# Documentation
"documentation": {
"test_report": "path/to/test_report.pdf",
"environment_spec": "path/to/env_spec.json"
}
}
Validation Levels¶
The system supports three validation levels:
ValidationLevel.BASIC¶
Basic completeness checks
Query execution validation
Simple timing validation
ValidationLevel.STANDARD¶
All basic validations
Data integrity validation
Metrics validation
Compliance checking
ValidationLevel.CERTIFICATION¶
All standard validations
Certification readiness checks
Enhanced documentation requirements
Performance threshold validation
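The level is passed as the second argument to validate(). A minimal sketch, reusing the test_results dictionary from the Quick Start example above:
from benchbox.core.tpc_validation import TPCResultValidator, ValidationLevel
validator = TPCResultValidator()
# Quick sanity check during development
basic_report = validator.validate(test_results, ValidationLevel.BASIC)
# Default level: adds data integrity, metrics, and compliance checks
standard_report = validator.validate(test_results, ValidationLevel.STANDARD)
# Full certification readiness assessment
certification_report = validator.validate(test_results, ValidationLevel.CERTIFICATION)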
Validation Results¶
ValidationResult Enum¶
PASSED: All validations passed
WARNING: Validations passed with warnings
FAILED: One or more validations failed
SKIPPED: Validation was skipped
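A report's overall_result carries one of these values; a short sketch of branching on it (assuming the report produced in the Quick Start example):
from benchbox.core.tpc_validation import ValidationResult
if report.overall_result == ValidationResult.PASSED:
    print("All validations passed")
elif report.overall_result == ValidationResult.WARNING:
    print("Passed with warnings:")
    for issue in report.get_issues_by_level("WARNING"):
        print(issue)
else:
    # FAILED or SKIPPED
    print(f"Validation did not pass: {report.overall_result.value}")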
ValidationReport¶
The ValidationReport contains:
Overall validation result
Individual validator results
Detailed issues with levels (ERROR, WARNING, INFO)
Calculated metrics
Execution summary
Audit trail
Certification status
Individual Validators¶
CompletenessValidator¶
Validates that all required test components are present:
Required queries executed
Required tables generated
Required maintenance operations completed
Execution metadata present
QueryResultValidator¶
Validates query execution results:
All queries executed successfully
Execution times within acceptable bounds
Row counts reasonable
Result data integrity
Schema validation (if configured)
TimingValidator¶
Validates timing measurements:
Total test time reasonable
Query execution times consistent
Timing precision adequate
Data generation timing reasonable
DataIntegrityValidator¶
Validates data integrity during maintenance operations:
Maintenance operations successful
Referential integrity maintained
Data consistency preserved
Transaction isolation maintained
MetricsValidator¶
Validates calculated metrics:
Required metrics present
Metric values within expected ranges
Calculated metrics match reported values
Statistical validity of results
ComplianceChecker¶
Checks TPC compliance requirements:
Benchmark-specific requirements (TPC-H: 22 queries, TPC-DS: 99 queries)
Scale factor compliance
Test isolation requirements
Reproducibility requirements
CertificationChecker¶
Validates certification readiness:
Performance thresholds met
Documentation complete
Test completeness for certification
Overall readiness assessment
Integration with Existing Benchmarks¶
The validation system integrates smoothly with existing TPC benchmarks:
TPC-H Integration¶
from benchbox import TPCH
from benchbox.core.tpc_validation import TPCResultValidator, ValidationLevel
# Create benchmark
benchmark = TPCH(scale_factor=1.0)
# Run benchmark (collect results)
test_results = run_benchmark_and_collect_results(benchmark)
# Validate results
validator = TPCResultValidator()
report = validator.validate(test_results, ValidationLevel.STANDARD)
TPC-DS Integration¶
from benchbox.tpcds import TPCDSBenchmark
from benchbox.core.tpc_validation import TPCResultValidator, ValidationLevel
# Create benchmark
benchmark = TPCDSBenchmark(scale_factor=1.0)
# Run benchmark (collect results)
test_results = run_benchmark_and_collect_results(benchmark)
# Validate results with TPC-DS specific config
config = {
"validators": {
"completeness": {
"required_queries": {"TPC-DS": list(range(1, 100))},
"required_maintenance_ops": ["insert_sales", "update_inventory"]
}
}
}
validator = TPCResultValidator(config)
report = validator.validate(test_results, ValidationLevel.CERTIFICATION)
Custom Validators¶
You can create custom validators for specific requirements:
from benchbox.core.tpc_validation import BaseValidator, TPCResultValidator, ValidationResult
class CustomBusinessRuleValidator(BaseValidator):
def validate(self, test_results, report):
# Custom validation logic
query_results = test_results.get("query_results", {})
if len(query_results) < 5:
report.add_issue(
"ERROR",
f"Minimum 5 queries required, found {len(query_results)}",
{"query_count": len(query_results)},
self.name
)
return ValidationResult.FAILED
return ValidationResult.PASSED
# Use custom validator
validator = TPCResultValidator()
validator.validators.append(CustomBusinessRuleValidator("custom_business_rules"))
Report Management¶
Saving Reports¶
from pathlib import Path
# Save validation report
validator.save_report(report, Path("validation_report.json"))
# Save to specific directory
report_dir = Path("validation_reports")
report_dir.mkdir(exist_ok=True)
validator.save_report(report, report_dir / "tpch_validation.json")
Loading Reports¶
# Load validation report
loaded_report = validator.load_report(Path("validation_report.json"))
# Access report data
print(f"Report ID: {loaded_report.validation_id}")
print(f"Overall Result: {loaded_report.overall_result.value}")
Report Analysis¶
# Get issues by level
errors = report.get_issues_by_level("ERROR")
warnings = report.get_issues_by_level("WARNING")
# Get issues by validator
timing_issues = report.get_issues_by_validator("timing")
compliance_issues = report.get_issues_by_validator("compliance")
# Access metrics
validation_score = report.metrics.get("validation_score", 0)
total_queries = report.execution_summary.get("total_queries", 0)
success_rate = report.execution_summary.get("success_rate", 0)
Examples¶
Basic Example¶
from benchbox.core.tpc_validation import create_sample_test_results, TPCResultValidator
# Create sample test results
test_results = create_sample_test_results()
# Validate
validator = TPCResultValidator()
report = validator.validate(test_results)
# Print results
print(f"Result: {report.overall_result.value}")
print(f"Issues: {len(report.issues)}")
Certification Example¶
# Configure for certification
config = {
"validators": {
"certification": {
"required_documentation": ["test_report", "environment_spec"],
"performance_thresholds": {
"avg_query_time": 30.0
}
}
}
}
validator = TPCResultValidator(config)
report = validator.validate(test_results, ValidationLevel.CERTIFICATION)
print(f"Certification Status: {report.certification_status}")
Multi-Benchmark Suite¶
# Run validation suite across multiple benchmarks
benchmarks = ["TPC-H", "TPC-DS", "TPC-DI"]
suite_results = {}
for benchmark in benchmarks:
test_results = run_benchmark(benchmark)
report = validator.validate(test_results)
suite_results[benchmark] = report
# Generate compliance summary
compliance_summary = {
benchmark: {
"compliant": len(report.get_issues_by_level("ERROR")) == 0,
"score": report.metrics.get("validation_score", 0)
}
for benchmark, report in suite_results.items()
}
Best Practices¶
1. Configuration Management¶
Use configuration files for complex validation setups
Store benchmark-specific configurations separately
Version control your validation configurations
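For example, a benchmark-specific configuration can live in a version-controlled JSON file and be loaded at validation time (the file path here is illustrative):
import json
from pathlib import Path
from benchbox.core.tpc_validation import TPCResultValidator
# Hypothetical per-benchmark config file kept under version control
config = json.loads(Path("configs/tpch_validation.json").read_text())
validator = TPCResultValidator(config)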
2. Error Handling¶
Always check validation results before proceeding
Handle validation failures gracefully
Log validation issues for debugging
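A minimal sketch of gating on the validation outcome before proceeding (assuming the validator, test_results, and ValidationResult import from earlier examples):
import logging
logger = logging.getLogger(__name__)
report = validator.validate(test_results)
if report.overall_result == ValidationResult.FAILED:
    # Log every error-level issue, then stop the run
    for issue in report.get_issues_by_level("ERROR"):
        logger.error("Validation error: %s", issue)
    raise RuntimeError("TPC result validation failed; see log for details")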
3. Performance Considerations¶
Use appropriate validation levels for your use case
Consider caching validation results for repeated runs
Profile validation performance for large test suites
4. Compliance Tracking¶
Save validation reports for audit trails
Track compliance over time
Generate regular compliance summaries
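For instance, saving each report under a timestamped name keeps a simple audit history (the directory layout here is only a suggestion):
from datetime import datetime, timezone
from pathlib import Path
# Suggested layout: one timestamped report per run
report_dir = Path("validation_reports")
report_dir.mkdir(exist_ok=True)
stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
validator.save_report(report, report_dir / f"tpch_validation_{stamp}.json")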
5. Integration¶
Integrate validation into your CI/CD pipeline
Use validation results to gate releases
Monitor validation trends over time
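In a CI/CD pipeline, the validation outcome can gate the job by mapping it onto the process exit code; a sketch, with the surrounding pipeline wiring and artifact path assumed:
import sys
from pathlib import Path
from benchbox.core.tpc_validation import ValidationResult
report = validator.validate(test_results)
# Keep the report as a build artifact regardless of outcome (path is an assumption)
artifact_dir = Path("artifacts")
artifact_dir.mkdir(exist_ok=True)
validator.save_report(report, artifact_dir / "validation_report.json")
# Fail the job on validation failure
if report.overall_result == ValidationResult.FAILED:
    sys.exit(1)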
Troubleshooting¶
Common Issues¶
Missing Required Fields
Ensure all required fields are present in test results
Check field names and formats match specifications
Timing Validation Failures
Verify timestamp formats are ISO 8601 compliant
Check for reasonable execution times
Ensure timing consistency across measurements
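To confirm a timestamp is ISO 8601 compliant before validating, the standard library can parse it directly (Python 3.11+ accepts a trailing Z; on older versions replace it with +00:00 first, as below):
from datetime import datetime
ts = test_results["test_start_time"]  # e.g. "2023-01-01T10:00:00Z"
try:
    datetime.fromisoformat(ts.replace("Z", "+00:00"))
except ValueError:
    print(f"Timestamp is not ISO 8601 compliant: {ts!r}")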
Query Result Validation Issues
Verify all queries completed successfully
Check row counts and result formats
Validate error handling for failed queries
Compliance Failures
Check benchmark-specific requirements
Verify all required queries are present
Ensure maintenance operations are included where required
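For TPC-H, a quick pre-check is to confirm that all 22 query IDs appear in query_results before running the validator (query IDs as string keys, matching the format above):
# TPC-H requires queries 1-22
expected = {str(i) for i in range(1, 23)}
missing = expected - set(test_results.get("query_results", {}))
if missing:
    print(f"Missing query results: {sorted(missing, key=int)}")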
Debug Mode¶
Enable debug logging for detailed validation information:
import logging
logging.basicConfig(level=logging.DEBUG)
validator = TPCResultValidator()
report = validator.validate(test_results)
API Reference¶
TPCResultValidator¶
__init__(config=None): Create validator with optional configuration
validate(test_results, validation_level=ValidationLevel.STANDARD): Validate test results
save_report(report, output_path): Save validation report to file
load_report(input_path): Load validation report from file
create_default_config(): Create default configuration
ValidationReport¶
add_issue(level, message, details=None, validator_name=""): Add validation issue
get_issues_by_level(level): Get issues by severity level
get_issues_by_validator(validator_name): Get issues by validator
to_dict(): Convert report to dictionary for serialization
BaseValidator¶
validate(test_results, report): Perform validation (abstract method)
_check_required_fields(data, required_fields): Check for required fields
Contributing¶
To contribute to the TPC validation system:
Create new validators by inheriting from BaseValidator
Add benchmark-specific validation rules
Extend the configuration system
Add new validation levels
Improve error reporting and diagnostics
License¶
This validation system is part of the BenchBox library and follows the same license terms.