Additional Utilities API

Tags: reference, python-api

Complete Python API reference for additional BenchBox utilities.

Overview

BenchBox provides several focused utility modules for common tasks: scale factor formatting, dependency validation, and system information collection. These utilities enable consistent naming, dependency management, and environment documentation.

Utilities Covered:

  • Scale Factor Formatting: Consistent naming for files, directories, and schemas

  • Dependency Validation: Verify dependencies match lock file requirements

  • System Information: Collect system and hardware information

Scale Factor Utilities

Utilities for consistent scale factor formatting across BenchBox.

Overview

The scale factor utilities provide standardized formatting for scale factors in filenames, directory names, and schema names. This ensures consistency across the framework.

Formatting Rules (see the sketch after this list):

  • Values >= 1: No leading zero (sf1, sf10, sf100)

  • Values < 1: Leading zero + decimal digits (sf01, sf001, sf0001)

  • Non-integer values >= 1: Remove decimal point (sf15 for 1.5)
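
Taken together, these rules can be expressed in a few lines of Python. The following is an illustrative sketch only, not the actual format_scale_factor implementation, which may handle additional edge cases:

def format_scale_factor_sketch(scale_factor: float) -> str:
    """Illustrative reimplementation of the formatting rules above (not the real API)."""
    if float(scale_factor).is_integer():
        # Integral values drop the fractional part entirely: 1.0 -> "sf1", 10.0 -> "sf10"
        return f"sf{int(scale_factor)}"
    # Non-integral values keep every digit and drop the decimal point, so values < 1
    # naturally keep their leading zero: 0.01 -> "sf001", 1.5 -> "sf15"
    return "sf" + str(scale_factor).replace(".", "")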

Quick Start

from benchbox.utils.scale_factor import (
    format_scale_factor,
    format_benchmark_name,
    format_data_directory,
    format_schema_name
)

# Format scale factors
print(format_scale_factor(1.0))    # "sf1"
print(format_scale_factor(0.1))    # "sf01"
print(format_scale_factor(0.01))   # "sf001"
print(format_scale_factor(10.0))   # "sf10"

# Format names
print(format_benchmark_name("tpch", 1.0))      # "tpch_sf1"
print(format_data_directory("tpcds", 0.1))     # "tpcds_sf01_data"
print(format_schema_name("ssb", 10.0))         # "ssb_sf10"

API Reference

format_scale_factor(scale_factor)[source]

Format scale factor for filenames and identifiers.

Rules:

  • Values >= 1: No leading zero (sf1, sf10, sf100)

  • Values < 1: Leading zero + decimal digits (sf01, sf001, sf0001)

  • Leading zero implies the value is < 1

Examples:

  • 1.0 -> sf1 (no leading zero)

  • 0.1 -> sf01 (leading zero implies < 1)

  • 0.01 -> sf001 (leading zero implies < 1)

  • 0.001 -> sf0001 (leading zero implies < 1)

  • 10 -> sf10 (no leading zero)

  • 1.5 -> sf15 (remove decimal point, no leading zero)

Signature:

format_scale_factor(scale_factor: float) -> str

Parameters:

  • scale_factor (float): The scale factor to format

Returns: Formatted scale factor string (e.g., “sf1”, “sf01”, “sf001”)

Examples:

# Integer values >= 1
format_scale_factor(1.0)    # "sf1"
format_scale_factor(10.0)   # "sf10"
format_scale_factor(100.0)  # "sf100"

# Decimal values < 1
format_scale_factor(0.1)    # "sf01"
format_scale_factor(0.01)   # "sf001"
format_scale_factor(0.001)  # "sf0001"

# Non-integer values >= 1
format_scale_factor(1.5)    # "sf15"
format_scale_factor(2.25)   # "sf225"

format_benchmark_name(benchmark_name, scale_factor)[source]

Format benchmark name with scale factor.

Signature:

format_benchmark_name(benchmark_name: str, scale_factor: float) -> str

Parameters:

  • benchmark_name (str): Name of the benchmark (e.g., “tpch”, “tpcds”)

  • scale_factor (float): Scale factor value

Returns: Formatted benchmark name (e.g., “tpch_sf1”, “tpcds_sf01”)

Examples:

format_benchmark_name("tpch", 1.0)    # "tpch_sf1"
format_benchmark_name("tpcds", 0.1)   # "tpcds_sf01"
format_benchmark_name("ssb", 10.0)    # "ssb_sf10"

format_data_directory(benchmark_name, scale_factor)[source]

Format data directory name with scale factor.

Signature:

format_data_directory(benchmark_name: str, scale_factor: float) -> str

Parameters:

  • benchmark_name (str): Name of the benchmark

  • scale_factor (float): Scale factor value

Returns: Formatted directory name (e.g., “tpch_sf1_data”, “tpcds_sf01_data”)

Examples:

format_data_directory("tpch", 1.0)    # "tpch_sf1_data"
format_data_directory("tpcds", 0.1)   # "tpcds_sf01_data"
format_data_directory("ssb", 10.0)    # "ssb_sf10_data"

format_schema_name(benchmark_name, scale_factor)[source]

Format database schema name with scale factor.

Signature:

format_schema_name(benchmark_name: str, scale_factor: float) -> str

Parameters:

  • benchmark_name (str): Name of the benchmark

  • scale_factor (float): Scale factor value

Returns: Formatted schema name (e.g., “tpch_sf1”, “tpcds_sf01”)

Examples:

format_schema_name("tpch", 1.0)    # "tpch_sf1"
format_schema_name("tpcds", 0.1)   # "tpcds_sf01"
format_schema_name("ssb", 10.0)    # "ssb_sf10"

Usage Examples

Consistent File Naming

from pathlib import Path
from benchbox.utils.scale_factor import format_data_directory

benchmark = "tpch"
scale_factor = 1.0

# Create data directory with consistent naming
data_dir = Path("data") / format_data_directory(benchmark, scale_factor)
data_dir.mkdir(parents=True, exist_ok=True)

print(f"Data directory: {data_dir}")
# Output: data/tpch_sf1_data

Database Schema Naming

from benchbox.utils.scale_factor import format_schema_name

def create_benchmark_schema(conn, benchmark, scale_factor):
    """Create database schema with consistent naming."""
    schema_name = format_schema_name(benchmark, scale_factor)

    conn.execute(f"CREATE SCHEMA IF NOT EXISTS {schema_name}")
    conn.execute(f"USE SCHEMA {schema_name}")

    print(f"Created schema: {schema_name}")

# Assumes `conn` is an existing database connection
create_benchmark_schema(conn, "tpch", 1.0)
# Output: Created schema: tpch_sf1

Result File Naming

import json
from benchbox.utils.scale_factor import format_benchmark_name

def save_results(results, benchmark, scale_factor):
    """Save results with consistent naming."""
    name = format_benchmark_name(benchmark, scale_factor)
    filename = f"results_{name}.json"

    with open(filename, "w") as f:
        json.dump(results, f, indent=2)

    print(f"Saved results to: {filename}")

# Assumes `benchmark_results` is a dict of results from a prior run
save_results(benchmark_results, "tpcds", 0.1)
# Output: Saved results to: results_tpcds_sf01.json

Dependency Validation Utilities

Utilities for validating BenchBox dependency definitions.

Overview

The dependency validation utilities verify that every dependency declared in pyproject.toml has a corresponding locked version in uv.lock that satisfies the declared specifier. This keeps the dependency set consistent and helps catch problems early.

Key Features:

  • Validate core dependencies

  • Validate optional dependencies (extras)

  • Build compatibility matrix

  • Python version compatibility checking

  • CLI tool for CI/CD integration
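
At its core, the validation compares each declared specifier against the version pinned in the lock file. A minimal sketch of that kind of check, using the packaging library purely for illustration (the actual implementation may differ, and the specifier shown is hypothetical):

from packaging.requirements import Requirement
from packaging.version import Version

def satisfies(declared: str, locked_version: str) -> bool:
    """Return True if a locked version satisfies a declared requirement string."""
    requirement = Requirement(declared)   # e.g. "duckdb>=0.9,<2.0" (hypothetical specifier)
    return Version(locked_version) in requirement.specifier

print(satisfies("duckdb>=0.9,<2.0", "1.1.0"))   # True
print(satisfies("duckdb>=0.9,<2.0", "0.8.1"))   # False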

Quick Start

from pathlib import Path
from benchbox.utils.dependency_validation import (
    _load_toml,
    validate_dependency_versions
)

# Load dependency files
pyproject_data = _load_toml(Path("pyproject.toml"))
lock_data = _load_toml(Path("uv.lock"))

# Validate dependencies
problems = validate_dependency_versions(pyproject_data, lock_data)

if problems:
    print("❌ Dependency validation failed:")
    for problem in problems:
        print(f"  - {problem}")
else:
    print("✅ All dependencies validated successfully")

API Reference

validate_dependency_versions(pyproject_data, lock_data)[source]

Validate that every declared dependency has a satisfying locked version.

Returns a list of problems discovered (empty list indicates success).

Signature:

validate_dependency_versions(
    pyproject_data: Mapping[str, object],
    lock_data: Mapping[str, object]
) -> list[str]

Parameters:

  • pyproject_data (Mapping): Parsed pyproject.toml data

  • lock_data (Mapping): Parsed uv.lock data

Returns: List of problems (empty list indicates success)

Example:

from benchbox.utils.dependency_validation import (
    _load_toml,
    validate_dependency_versions
)
from pathlib import Path

pyproject = _load_toml(Path("pyproject.toml"))
lock = _load_toml(Path("uv.lock"))

problems = validate_dependency_versions(pyproject, lock)

if not problems:
    print("✅ All dependencies valid")
else:
    for problem in problems:
        print(f"❌ {problem}")

build_matrix_summary(pyproject_data, lock_data)[source]

Return a summary dictionary used for documentation and CLI output.

Signature:

build_matrix_summary(
    pyproject_data: Mapping[str, object],
    lock_data: Mapping[str, object]
) -> dict[str, object]

Parameters:

  • pyproject_data (Mapping): Parsed pyproject.toml data

  • lock_data (Mapping): Parsed uv.lock data

Returns: Summary dictionary with Python compatibility and extras

Example:

from benchbox.utils.dependency_validation import (
    _load_toml,
    build_matrix_summary
)
from pathlib import Path

pyproject = _load_toml(Path("pyproject.toml"))
lock = _load_toml(Path("uv.lock"))

matrix = build_matrix_summary(pyproject, lock)

print(f"Python requires: {matrix['python_requires']}")
print(f"Optional groups: {list(matrix['optional_dependencies'].keys())}")

CLI Tool

The dependency validation utilities include a CLI tool for CI/CD integration:

# Validate dependencies
python -m benchbox.utils.dependency_validation

# Display compatibility matrix
python -m benchbox.utils.dependency_validation --matrix

# Use custom paths
python -m benchbox.utils.dependency_validation \
    --pyproject path/to/pyproject.toml \
    --lock path/to/uv.lock

Exit Codes:

  • 0: All dependencies validated successfully

  • 1: Validation failed (missing or incompatible dependencies)

Usage Examples

CI/CD Integration

# ci_check_dependencies.py
import sys
from pathlib import Path
from benchbox.utils.dependency_validation import (
    _load_toml,
    validate_dependency_versions
)

def check_dependencies():
    """Validate dependencies in CI/CD pipeline."""
    try:
        pyproject = _load_toml(Path("pyproject.toml"))
        lock = _load_toml(Path("uv.lock"))

        problems = validate_dependency_versions(pyproject, lock)

        if problems:
            print("❌ Dependency validation failed:")
            for problem in problems:
                print(f"  {problem}")
            return 1
        else:
            print("✅ All dependencies valid")
            return 0

    except Exception as e:
        print(f"❌ Error: {e}")
        return 1

if __name__ == "__main__":
    sys.exit(check_dependencies())

Pre-commit Hook

#!/bin/bash
# .git/hooks/pre-commit

echo "Validating dependencies..."
python -m benchbox.utils.dependency_validation

if [ $? -ne 0 ]; then
    echo "❌ Dependency validation failed. Commit aborted."
    exit 1
fi

echo "✅ Dependencies validated"

Documentation Generation

from benchbox.utils.dependency_validation import (
    _load_toml,
    build_matrix_summary
)
from pathlib import Path

def generate_dependency_docs():
    """Generate dependency documentation."""
    pyproject = _load_toml(Path("pyproject.toml"))
    lock = _load_toml(Path("uv.lock"))

    matrix = build_matrix_summary(pyproject, lock)

    print("# Dependency Information\n")
    print(f"Python: {matrix['python_requires']}\n")

    if matrix['optional_dependencies']:
        print("## Optional Dependencies\n")
        for extra, deps in matrix['optional_dependencies'].items():
            print(f"### {extra}")
            for dep in deps:
                print(f"- {dep}")
            print()

generate_dependency_docs()

System Information Utilities

Utilities for collecting system and hardware information.

Overview

The system information utilities provide standardized access to system, CPU, and memory information. This is useful for documenting benchmark environments and tracking system resources.

Key Features:

  • System information (OS, architecture, hostname)

  • CPU information (model, cores, usage)

  • Memory information (total, available, used)

  • Python version tracking

  • Dataclass-based API

Quick Start

from benchbox.utils.system_info import get_system_info

# Get system information
info = get_system_info()

print(f"OS: {info.os_name} {info.os_version}")
print(f"CPU: {info.cpu_model} ({info.cpu_cores} cores)")
print(f"Memory: {info.total_memory_gb:.1f} GB total, "
      f"{info.available_memory_gb:.1f} GB available")
print(f"Python: {info.python_version}")

API Reference

SystemInfo Class

class SystemInfo(os_name, os_version, architecture, cpu_model, cpu_cores, total_memory_gb, available_memory_gb, python_version, hostname)[source]

Bases: object

Core system information dataclass.

Fields:

  • os_name (str): Operating system name

  • os_version (str): Operating system version

  • architecture (str): System architecture

  • cpu_model (str): CPU model name

  • cpu_cores (int): Number of CPU cores

  • total_memory_gb (float): Total memory in GB

  • available_memory_gb (float): Available memory in GB

  • python_version (str): Python version

  • hostname (str): System hostname

to_dict()[source]

Convert to dictionary for compatibility.

Signature:

to_dict() -> dict
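
Example (the output filename here is arbitrary):

import json
from benchbox.utils.system_info import get_system_info

info = get_system_info()

# Write an environment snapshot alongside benchmark artifacts
with open("system_info.json", "w") as f:
    json.dump(info.to_dict(), f, indent=2)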

get_system_info()[source]

Get current system information.

Signature:

get_system_info() -> SystemInfo

Returns: SystemInfo dataclass with current system information

Example:

from benchbox.utils.system_info import get_system_info

info = get_system_info()
print(f"Running on {info.os_name} {info.os_version}")
print(f"CPU: {info.cpu_model}")
print(f"Cores: {info.cpu_cores}")
print(f"Memory: {info.total_memory_gb:.1f} GB")

get_memory_info()[source]

Get current memory usage information.

Signature:

get_memory_info() -> dict[str, float]

Returns: Dictionary with memory information

Keys:

  • total_gb: Total memory in GB

  • available_gb: Available memory in GB

  • used_gb: Used memory in GB

  • percent_used: Memory usage percentage

Example:

from benchbox.utils.system_info import get_memory_info

memory = get_memory_info()
print(f"Memory: {memory['used_gb']:.1f} GB / {memory['total_gb']:.1f} GB "
      f"({memory['percent_used']:.1f}%)")

get_cpu_info()[source]

Get CPU information and current usage.

Signature:

get_cpu_info() -> dict[str, Any]

Returns: Dictionary with CPU information

Keys:

  • logical_cores: Number of logical cores

  • physical_cores: Number of physical cores

  • current_usage_percent: Current CPU usage percentage

  • per_core_usage: Per-core usage percentages (list)

  • model: CPU model name

Example:

from benchbox.utils.system_info import get_cpu_info

cpu = get_cpu_info()
print(f"CPU: {cpu['model']}")
print(f"Cores: {cpu['physical_cores']} physical, {cpu['logical_cores']} logical")
print(f"Usage: {cpu['current_usage_percent']:.1f}%")

Usage Examples

Benchmark Environment Documentation

import json
from benchbox.utils.system_info import get_system_info

def document_environment(benchmark_results):
    """Add system information to benchmark results."""
    info = get_system_info()

    benchmark_results["environment"] = {
        "os": f"{info.os_name} {info.os_version}",
        "architecture": info.architecture,
        "cpu": info.cpu_model,
        "cpu_cores": info.cpu_cores,
        "memory_gb": info.total_memory_gb,
        "python_version": info.python_version,
        "hostname": info.hostname
    }

    return benchmark_results

# run_benchmark() is a placeholder for your own benchmark routine
results = run_benchmark()
results = document_environment(results)

with open("results.json", "w") as f:
    json.dump(results, f, indent=2)

Resource Monitoring

import time
from benchbox.utils.system_info import get_memory_info, get_cpu_info

def monitor_resources(duration_seconds=60):
    """Monitor system resources during benchmark."""
    samples = []
    end_time = time.time() + duration_seconds

    while time.time() < end_time:
        memory = get_memory_info()
        cpu = get_cpu_info()

        samples.append({
            "timestamp": time.time(),
            "memory_used_gb": memory["used_gb"],
            "memory_percent": memory["percent_used"],
            "cpu_percent": cpu["current_usage_percent"]
        })

        time.sleep(1)

    return samples

# Monitor during benchmark
samples = monitor_resources(duration_seconds=30)

# Calculate statistics
avg_memory = sum(s["memory_used_gb"] for s in samples) / len(samples)
avg_cpu = sum(s["cpu_percent"] for s in samples) / len(samples)

print(f"Average memory: {avg_memory:.1f} GB")
print(f"Average CPU: {avg_cpu:.1f}%")

System Requirements Check

from benchbox.utils.system_info import get_system_info, get_memory_info

def check_system_requirements(min_memory_gb=8, min_cores=4):
    """Check if system meets benchmark requirements."""
    info = get_system_info()
    memory = get_memory_info()

    issues = []

    if info.total_memory_gb < min_memory_gb:
        issues.append(
            f"Insufficient memory: {info.total_memory_gb:.1f} GB "
            f"(minimum: {min_memory_gb} GB)"
        )

    if info.cpu_cores < min_cores:
        issues.append(
            f"Insufficient CPU cores: {info.cpu_cores} "
            f"(minimum: {min_cores})"
        )

    if memory["available_gb"] < min_memory_gb * 0.8:
        issues.append(
            f"Low available memory: {memory['available_gb']:.1f} GB"
        )

    if issues:
        print("⚠️  System requirements not met:")
        for issue in issues:
            print(f"  - {issue}")
        return False
    else:
        print("✅ System requirements met")
        return True

check_system_requirements(min_memory_gb=16, min_cores=8)

Best Practices

Scale Factor Utilities

  1. Use Consistent Formatting: Always use utility functions for naming

    # Good: Consistent naming
    from benchbox.utils.scale_factor import format_data_directory
    data_dir = format_data_directory("tpch", 1.0)  # "tpch_sf1_data"
    
    # Avoid: Manual formatting
    data_dir = f"tpch_{1.0}_data"  # Inconsistent
    
  2. Apply to All Artifacts: Use for files, directories, schemas, results

Dependency Validation

  1. Validate in CI/CD: Run validation in continuous integration

    # .github/workflows/test.yml
    - name: Validate dependencies
      run: python -m benchbox.utils.dependency_validation
    
  2. Run Before Releases: Ensure dependencies are valid before releasing

System Information

  1. Document Benchmarks: Always include system info in benchmark results

    info = get_system_info()
    results["environment"] = info.to_dict()
    
  2. Check Requirements: Verify system meets requirements before benchmarking

See Also