Additional Utilities API¶
Complete Python API reference for additional BenchBox utilities.
Overview¶
BenchBox provides several focused utility modules for common tasks: scale factor formatting, dependency validation, and system information collection. These utilities enable consistent naming, dependency management, and environment documentation.
Utilities Covered:
Scale Factor Formatting: Consistent naming for files, directories, and schemas
Dependency Validation: Verify dependencies match lock file requirements
System Information: Collect system and hardware information
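A minimal sketch touching all three modules (assuming the project's pyproject.toml and uv.lock sit in the current working directory; each utility is documented in detail below):
from pathlib import Path
from benchbox.utils.scale_factor import format_benchmark_name
from benchbox.utils.dependency_validation import _load_toml, validate_dependency_versions
from benchbox.utils.system_info import get_system_info
# Consistent run identifier for files, schemas, and results
run_name = format_benchmark_name("tpch", 1.0)  # "tpch_sf1"
# Confirm locked dependencies satisfy pyproject.toml (empty list means success)
problems = validate_dependency_versions(
    _load_toml(Path("pyproject.toml")), _load_toml(Path("uv.lock"))
)
# Record the benchmark environment alongside the run
environment = get_system_info().to_dict()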
Scale Factor Utilities¶
Utilities for consistent scale factor formatting across BenchBox.
Overview¶
The scale factor utilities provide standardized formatting for scale factors in filenames, directory names, and schema names. This ensures consistency across the framework.
Formatting Rules:
Values >= 1: No leading zero (sf1, sf10, sf100)
Values < 1: Leading zero + decimal digits (sf01, sf001, sf0001)
Non-integer values >= 1: Remove decimal point (sf15 for 1.5)
Quick Start¶
from benchbox.utils.scale_factor import (
format_scale_factor,
format_benchmark_name,
format_data_directory,
format_schema_name
)
# Format scale factors
print(format_scale_factor(1.0)) # "sf1"
print(format_scale_factor(0.1)) # "sf01"
print(format_scale_factor(0.01)) # "sf001"
print(format_scale_factor(10.0)) # "sf10"
# Format names
print(format_benchmark_name("tpch", 1.0)) # "tpch_sf1"
print(format_data_directory("tpcds", 0.1)) # "tpcds_sf01_data"
print(format_schema_name("ssb", 10.0)) # "ssb_sf10"
API Reference¶
- format_scale_factor(scale_factor)[source]¶
Format scale factor for filenames and identifiers.
Rules:
Values >= 1: No leading zero (sf1, sf10, sf100)
Values < 1: Leading zero + decimal digits (sf01, sf001, sf0001)
A leading zero therefore implies the value is < 1
Examples:
1.0 -> sf1 (no leading zero)
0.1 -> sf01 (leading zero implies < 1)
0.01 -> sf001 (leading zero implies < 1)
0.001 -> sf0001 (leading zero implies < 1)
10 -> sf10 (no leading zero)
1.5 -> sf15 (decimal point removed, no leading zero)
- Parameters:
scale_factor (float) – The scale factor to format
- Returns:
Formatted scale factor string (e.g., “sf1”, “sf01”, “sf001”)
- Return type:
str
Signature:
format_scale_factor(scale_factor: float) -> str
Parameters:
scale_factor (float): Scale factor value
Returns: Formatted scale factor string (e.g., “sf1”, “sf01”, “sf001”)
Examples:
# Integer values >= 1
format_scale_factor(1.0) # "sf1"
format_scale_factor(10.0) # "sf10"
format_scale_factor(100.0) # "sf100"
# Decimal values < 1
format_scale_factor(0.1) # "sf01"
format_scale_factor(0.01) # "sf001"
format_scale_factor(0.001) # "sf0001"
# Non-integer values >= 1
format_scale_factor(1.5) # "sf15"
format_scale_factor(2.25) # "sf225"
- format_benchmark_name(benchmark_name, scale_factor)[source]¶
Format benchmark name with scale factor.
- Parameters:
benchmark_name (str) – Name of the benchmark (e.g., “tpch”, “tpcds”)
scale_factor (float) – Scale factor value
- Returns:
Formatted benchmark name (e.g., “tpch_sf1”, “tpcds_sf01”)
- Return type:
str
Signature:
format_benchmark_name(benchmark_name: str, scale_factor: float) -> str
Parameters:
benchmark_name (str): Benchmark name (e.g., “tpch”, “tpcds”)
scale_factor (float): Scale factor value
Returns: Formatted benchmark name (e.g., “tpch_sf1”, “tpcds_sf01”)
Examples:
format_benchmark_name("tpch", 1.0) # "tpch_sf1"
format_benchmark_name("tpcds", 0.1) # "tpcds_sf01"
format_benchmark_name("ssb", 10.0) # "ssb_sf10"
- format_data_directory(benchmark_name, scale_factor)[source]¶
Format data directory name with scale factor.
- Parameters:
benchmark_name (str) – Name of the benchmark
scale_factor (float) – Scale factor value
- Returns:
Formatted directory name (e.g., “tpch_sf1_data”, “tpcds_sf01_data”)
- Return type:
str
Signature:
format_data_directory(benchmark_name: str, scale_factor: float) -> str
Parameters:
benchmark_name (str): Benchmark name
scale_factor (float): Scale factor value
Returns: Formatted directory name (e.g., “tpch_sf1_data”, “tpcds_sf01_data”)
Examples:
format_data_directory("tpch", 1.0) # "tpch_sf1_data"
format_data_directory("tpcds", 0.1) # "tpcds_sf01_data"
format_data_directory("ssb", 10.0) # "ssb_sf10_data"
- format_schema_name(benchmark_name, scale_factor)[source]¶
Format database schema name with scale factor.
- Parameters:
benchmark_name (str) – Name of the benchmark
scale_factor (float) – Scale factor value
- Returns:
Formatted schema name (e.g., “tpch_sf1”, “tpcds_sf01”)
- Return type:
str
Signature:
format_schema_name(benchmark_name: str, scale_factor: float) -> str
Parameters:
benchmark_name (str): Benchmark name
scale_factor (float): Scale factor value
Returns: Formatted schema name (e.g., “tpch_sf1”, “tpcds_sf01”)
Examples:
format_schema_name("tpch", 1.0) # "tpch_sf1"
format_schema_name("tpcds", 0.1) # "tpcds_sf01"
format_schema_name("ssb", 10.0) # "ssb_sf10"
Usage Examples¶
Consistent File Naming¶
from pathlib import Path
from benchbox.utils.scale_factor import format_data_directory
benchmark = "tpch"
scale_factor = 1.0
# Create data directory with consistent naming
data_dir = Path("data") / format_data_directory(benchmark, scale_factor)
data_dir.mkdir(parents=True, exist_ok=True)
print(f"Data directory: {data_dir}")
# Output: data/tpch_sf1_data
Database Schema Naming¶
from benchbox.utils.scale_factor import format_schema_name
def create_benchmark_schema(conn, benchmark, scale_factor):
"""Create database schema with consistent naming."""
schema_name = format_schema_name(benchmark, scale_factor)
conn.execute(f"CREATE SCHEMA IF NOT EXISTS {schema_name}")
conn.execute(f"USE SCHEMA {schema_name}")
print(f"Created schema: {schema_name}")
create_benchmark_schema(conn, "tpch", 1.0)  # conn: an existing database connection
# Output: Created schema: tpch_sf1
Result File Naming¶
import json
from benchbox.utils.scale_factor import format_benchmark_name
def save_results(results, benchmark, scale_factor):
"""Save results with consistent naming."""
name = format_benchmark_name(benchmark, scale_factor)
filename = f"results_{name}.json"
with open(filename, "w") as f:
json.dump(results, f, indent=2)
print(f"Saved results to: {filename}")
save_results(benchmark_results, "tpcds", 0.1)  # benchmark_results: results dict from a prior run
# Output: Saved results to: results_tpcds_sf01.json
Dependency Validation Utilities¶
Utilities for validating BenchBox dependency definitions.
Overview¶
The dependency validation utilities verify that all declared dependencies in pyproject.toml have corresponding locked versions in uv.lock that satisfy the declared specifiers. This ensures dependency consistency and helps catch dependency issues early.
Key Features:
Validate core dependencies
Validate optional dependencies (extras)
Build compatibility matrix
Python version compatibility checking
CLI tool for CI/CD integration
Quick Start¶
from pathlib import Path
from benchbox.utils.dependency_validation import (
_load_toml,
validate_dependency_versions
)
# Load dependency files
pyproject_data = _load_toml(Path("pyproject.toml"))
lock_data = _load_toml(Path("uv.lock"))
# Validate dependencies
problems = validate_dependency_versions(pyproject_data, lock_data)
if problems:
print("❌ Dependency validation failed:")
for problem in problems:
print(f" - {problem}")
else:
print("✅ All dependencies validated successfully")
API Reference¶
- validate_dependency_versions(pyproject_data, lock_data)[source]¶
Validate that every declared dependency has a satisfying locked version.
Returns a list of problems discovered (empty list indicates success).
Signature:
validate_dependency_versions(
pyproject_data: Mapping[str, object],
lock_data: Mapping[str, object]
) -> list[str]
Parameters:
pyproject_data (Mapping): Parsed pyproject.toml data
lock_data (Mapping): Parsed uv.lock data
Returns: List of problems (empty list indicates success)
Example:
from benchbox.utils.dependency_validation import (
_load_toml,
validate_dependency_versions
)
from pathlib import Path
pyproject = _load_toml(Path("pyproject.toml"))
lock = _load_toml(Path("uv.lock"))
problems = validate_dependency_versions(pyproject, lock)
if not problems:
print("✅ All dependencies valid")
else:
for problem in problems:
print(f"❌ {problem}")
- build_matrix_summary(pyproject_data, lock_data)[source]¶
Return a summary dictionary used for documentation and CLI output.
Signature:
build_matrix_summary(
pyproject_data: Mapping[str, object],
lock_data: Mapping[str, object]
) -> dict[str, object]
Parameters:
pyproject_data (Mapping): Parsed pyproject.toml data
lock_data (Mapping): Parsed uv.lock data
Returns: Summary dictionary with Python compatibility and extras
Example:
from benchbox.utils.dependency_validation import (
_load_toml,
build_matrix_summary
)
from pathlib import Path
pyproject = _load_toml(Path("pyproject.toml"))
lock = _load_toml(Path("uv.lock"))
matrix = build_matrix_summary(pyproject, lock)
print(f"Python requires: {matrix['python_requires']}")
print(f"Optional groups: {list(matrix['optional_dependencies'].keys())}")
CLI Tool¶
The dependency validation utilities include a CLI tool for CI/CD integration:
# Validate dependencies
python -m benchbox.utils.dependency_validation
# Display compatibility matrix
python -m benchbox.utils.dependency_validation --matrix
# Use custom paths
python -m benchbox.utils.dependency_validation \
--pyproject path/to/pyproject.toml \
--lock path/to/uv.lock
Exit Codes:
0: All dependencies validated successfully
1: Validation failed (missing or incompatible dependencies)
Usage Examples¶
CI/CD Integration¶
# ci_check_dependencies.py
import sys
from pathlib import Path
from benchbox.utils.dependency_validation import (
_load_toml,
validate_dependency_versions
)
def check_dependencies():
"""Validate dependencies in CI/CD pipeline."""
try:
pyproject = _load_toml(Path("pyproject.toml"))
lock = _load_toml(Path("uv.lock"))
problems = validate_dependency_versions(pyproject, lock)
if problems:
print("❌ Dependency validation failed:")
for problem in problems:
print(f" {problem}")
return 1
else:
print("✅ All dependencies valid")
return 0
except Exception as e:
print(f"❌ Error: {e}")
return 1
if __name__ == "__main__":
sys.exit(check_dependencies())
Pre-commit Hook¶
#!/bin/bash
# .git/hooks/pre-commit
echo "Validating dependencies..."
python -m benchbox.utils.dependency_validation
if [ $? -ne 0 ]; then
echo "❌ Dependency validation failed. Commit aborted."
exit 1
fi
echo "✅ Dependencies validated"
Documentation Generation¶
from benchbox.utils.dependency_validation import (
_load_toml,
build_matrix_summary
)
from pathlib import Path
def generate_dependency_docs():
"""Generate dependency documentation."""
pyproject = _load_toml(Path("pyproject.toml"))
lock = _load_toml(Path("uv.lock"))
matrix = build_matrix_summary(pyproject, lock)
print("# Dependency Information\n")
print(f"Python: {matrix['python_requires']}\n")
if matrix['optional_dependencies']:
print("## Optional Dependencies\n")
for extra, deps in matrix['optional_dependencies'].items():
print(f"### {extra}")
for dep in deps:
print(f"- {dep}")
print()
generate_dependency_docs()
System Information Utilities¶
Utilities for collecting system and hardware information.
Overview¶
The system information utilities provide standardized access to system, CPU, and memory information. This is useful for documenting benchmark environments and tracking system resources.
Key Features:
System information (OS, architecture, hostname)
CPU information (model, cores, usage)
Memory information (total, available, used)
Python version tracking
Dataclass-based API
Quick Start¶
from benchbox.utils.system_info import get_system_info
# Get system information
info = get_system_info()
print(f"OS: {info.os_name} {info.os_version}")
print(f"CPU: {info.cpu_model} ({info.cpu_cores} cores)")
print(f"Memory: {info.total_memory_gb:.1f} GB total, "
f"{info.available_memory_gb:.1f} GB available")
print(f"Python: {info.python_version}")
API Reference¶
SystemInfo Class¶
- class SystemInfo(os_name, os_version, architecture, cpu_model, cpu_cores, total_memory_gb, available_memory_gb, python_version, hostname)[source]¶
Bases: object
Core system information dataclass.
- os_name: str¶
- os_version: str¶
- architecture: str¶
- cpu_model: str¶
- cpu_cores: int¶
- total_memory_gb: float¶
- available_memory_gb: float¶
- python_version: str¶
- hostname: str¶
- __init__(os_name, os_version, architecture, cpu_model, cpu_cores, total_memory_gb, available_memory_gb, python_version, hostname)¶
Fields:
os_name (str): Operating system name
os_version (str): Operating system version
architecture (str): System architecture
cpu_model (str): CPU model name
cpu_cores (int): Number of CPU cores
total_memory_gb (float): Total memory in GB
available_memory_gb (float): Available memory in GB
python_version (str): Python version
hostname (str): System hostname
- to_dict() -> dict¶
Convert to dictionary for compatibility.
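A short sketch of serializing the collected information with to_dict() (the JSON dump is illustrative):
import json
from benchbox.utils.system_info import get_system_info
info = get_system_info()
env = info.to_dict()               # plain dict of the fields listed above
print(json.dumps(env, indent=2))   # ready for result files or logs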
get_system_info()¶
Signature:
get_system_info() -> SystemInfo
Returns: SystemInfo dataclass with current system information
Example:
from benchbox.utils.system_info import get_system_info
info = get_system_info()
print(f"Running on {info.os_name} {info.os_version}")
print(f"CPU: {info.cpu_model}")
print(f"Cores: {info.cpu_cores}")
print(f"Memory: {info.total_memory_gb:.1f} GB")
get_memory_info()¶
Signature:
get_memory_info() -> dict[str, float]
Returns: Dictionary with memory information
Keys:
total_gb: Total memory in GB
available_gb: Available memory in GB
used_gb: Used memory in GB
percent_used: Memory usage percentage
Example:
from benchbox.utils.system_info import get_memory_info
memory = get_memory_info()
print(f"Memory: {memory['used_gb']:.1f} GB / {memory['total_gb']:.1f} GB "
f"({memory['percent_used']:.1f}%)")
get_cpu_info()¶
Signature:
get_cpu_info() -> dict[str, Any]
Returns: Dictionary with CPU information
Keys:
logical_cores: Number of logical cores
physical_cores: Number of physical cores
current_usage_percent: Current CPU usage percentage
per_core_usage: Per-core usage percentages (list)
model: CPU model name
Example:
from benchbox.utils.system_info import get_cpu_info
cpu = get_cpu_info()
print(f"CPU: {cpu['model']}")
print(f"Cores: {cpu['physical_cores']} physical, {cpu['logical_cores']} logical")
print(f"Usage: {cpu['current_usage_percent']:.1f}%")
Usage Examples¶
Benchmark Environment Documentation¶
import json
from benchbox.utils.system_info import get_system_info
def document_environment(benchmark_results):
"""Add system information to benchmark results."""
info = get_system_info()
benchmark_results["environment"] = {
"os": f"{info.os_name} {info.os_version}",
"architecture": info.architecture,
"cpu": info.cpu_model,
"cpu_cores": info.cpu_cores,
"memory_gb": info.total_memory_gb,
"python_version": info.python_version,
"hostname": info.hostname
}
return benchmark_results
results = run_benchmark()  # run_benchmark(): placeholder for your benchmark entry point
results = document_environment(results)
with open("results.json", "w") as f:
json.dump(results, f, indent=2)
Resource Monitoring¶
import time
from benchbox.utils.system_info import get_memory_info, get_cpu_info
def monitor_resources(duration_seconds=60):
"""Monitor system resources during benchmark."""
samples = []
end_time = time.time() + duration_seconds
while time.time() < end_time:
memory = get_memory_info()
cpu = get_cpu_info()
samples.append({
"timestamp": time.time(),
"memory_used_gb": memory["used_gb"],
"memory_percent": memory["percent_used"],
"cpu_percent": cpu["current_usage_percent"]
})
time.sleep(1)
return samples
# Monitor during benchmark
samples = monitor_resources(duration_seconds=30)
# Calculate statistics
avg_memory = sum(s["memory_used_gb"] for s in samples) / len(samples)
avg_cpu = sum(s["cpu_percent"] for s in samples) / len(samples)
print(f"Average memory: {avg_memory:.1f} GB")
print(f"Average CPU: {avg_cpu:.1f}%")
System Requirements Check¶
from benchbox.utils.system_info import get_system_info, get_memory_info
def check_system_requirements(min_memory_gb=8, min_cores=4):
"""Check if system meets benchmark requirements."""
info = get_system_info()
memory = get_memory_info()
issues = []
if info.total_memory_gb < min_memory_gb:
issues.append(
f"Insufficient memory: {info.total_memory_gb:.1f} GB "
f"(minimum: {min_memory_gb} GB)"
)
if info.cpu_cores < min_cores:
issues.append(
f"Insufficient CPU cores: {info.cpu_cores} "
f"(minimum: {min_cores})"
)
if memory["available_gb"] < min_memory_gb * 0.8:
issues.append(
f"Low available memory: {memory['available_gb']:.1f} GB"
)
if issues:
print("⚠️ System requirements not met:")
for issue in issues:
print(f" - {issue}")
return False
else:
print("✅ System requirements met")
return True
check_system_requirements(min_memory_gb=16, min_cores=8)
Best Practices¶
Scale Factor Utilities¶
Use Consistent Formatting: Always use utility functions for naming
# Good: Consistent naming
from benchbox.utils.scale_factor import format_data_directory
data_dir = format_data_directory("tpch", 1.0)  # "tpch_sf1_data"
# Avoid: Manual formatting
data_dir = f"tpch_{1.0}_data"  # Inconsistent
Apply to All Artifacts: Use for files, directories, schemas, results
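For instance, one scale factor can drive every artifact name (a minimal sketch using the formatting functions documented above):
from benchbox.utils.scale_factor import (
    format_benchmark_name,
    format_data_directory,
    format_schema_name,
)
benchmark, sf = "tpcds", 0.1
data_dir = format_data_directory(benchmark, sf)   # "tpcds_sf01_data"
schema = format_schema_name(benchmark, sf)        # "tpcds_sf01"
results_file = f"results_{format_benchmark_name(benchmark, sf)}.json"  # "results_tpcds_sf01.json"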
Dependency Validation¶
Validate in CI/CD: Run validation in continuous integration
# .github/workflows/test.yml
- name: Validate dependencies
  run: python -m benchbox.utils.dependency_validation
Run Before Releases: Ensure dependencies are valid before releasing
System Information¶
Document Benchmarks: Always include system info in benchmark results
info = get_system_info()
results["environment"] = info.to_dict()
Check Requirements: Verify system meets requirements before benchmarking
See Also¶
Data Validation Utilities API - Data validation utilities
Performance Monitoring Utilities API - Performance monitoring utilities
Utility Functions API - Core utilities (dialect translation)
/DEVELOPMENT - Development guide