Runtime Module Architecture Overview
This guide summarizes the organization of the runtime-focused modules after the modularization refactor. It highlights the primary entry points, extension boundaries, and the locations of shared utilities.
CLI package
- benchbox/cli/app.py: constructs the root Click application and registers commands.
- benchbox/cli/commands/: one file per public command (run, profile, benchmarks, etc.). Each module only exports a Click command function and related helper types.
- benchbox/cli/presentation/: light-weight presentation helpers (currently system.py).
- benchbox/cli/shared.py: shared console helpers (console, set_quiet_output, silence_output).
- benchbox/cli/main.py: thin compatibility layer that re-exports the new modules for legacy imports and tests.
Extension notes
- New commands should live in benchbox/cli/commands/ and only import shared utilities from benchbox.cli.shared or other command modules.
- Presentation-only helpers should live under benchbox/cli/presentation/ to keep command modules focused on business logic.
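The command-per-module layout can be sketched as follows. This is a minimal illustration that assumes the Click library is installed. The hello command, its --name option, and the build_app and invoke helpers are hypothetical names chosen for the example; they stand in for a module under benchbox/cli/commands/ and for the registration step in benchbox/cli/app.py, and are not BenchBox's actual code.

```python
import click
from click.testing import CliRunner


# Hypothetical command: in the layout above it would sit in its own file
# under the commands package and export only this Click command.
@click.command(name="hello")
@click.option("--name", default="world", help="Who to greet.")
def hello(name: str) -> None:
    """Print a greeting."""
    click.echo(f"hello {name}")


# Hypothetical app factory: builds the root group and registers commands,
# which is the role the app module plays in the layout above.
def build_app() -> click.Group:
    @click.group()
    def cli() -> None:
        """Root command group."""

    cli.add_command(hello)
    return cli


def invoke(args: list[str]) -> str:
    """Run the app in-process with Click's test runner and return its output."""
    result = CliRunner().invoke(build_app(), args)
    return result.output
```

Keeping registration in one place means a new command only needs a new module plus one add_command call, and the in-process runner makes each command testable without spawning a subprocess.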
TPC-DS package
- benchbox/core/tpcds/benchmark/: config.py, results.py, phases.py, and runner.py implement the benchmark orchestration. __init__.py re-exports TPCDSBenchmark and data classes.
- benchbox/core/tpcds/generator/: mixin-based split between manager.py, runner.py, streaming.py, and filesystem.py to keep responsibilities focused (execution, streaming chunking, file handling).
- benchbox/core/tpcds/schema/: models.py, tables.py, and registry.py provide a declarative schema catalog that mirrors the official specification.
Extension notes
- New benchmark behaviors should extend the relevant mixin (runner.py for orchestration, streaming.py for table generation variations, filesystem.py for file/mount logic).
- Schema changes should modify the declarative tables in schema/tables.py with supporting helpers in schema/registry.py.
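One way to picture a declarative schema catalog split across models, tables, and a registry is the sketch below. Every name in it (Column, Table, TABLES, get_table) is hypothetical, and the two-column date_dim entry is an abbreviated illustration rather than the full table definition; the actual BenchBox classes may look quite different.

```python
from dataclasses import dataclass


# Role of a models module: plain, immutable descriptions of schema objects.
@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str
    nullable: bool = True


@dataclass(frozen=True)
class Table:
    name: str
    columns: tuple[Column, ...]
    primary_key: tuple[str, ...] = ()

    def create_sql(self) -> str:
        """Render a simple CREATE TABLE statement from the declaration."""
        cols = ", ".join(
            f"{c.name} {c.sql_type}" + ("" if c.nullable else " NOT NULL")
            for c in self.columns
        )
        return f"CREATE TABLE {self.name} ({cols})"


# Role of a tables module: data only, no logic (abbreviated example entry).
TABLES = (
    Table(
        name="date_dim",
        columns=(
            Column("d_date_sk", "INTEGER", nullable=False),
            Column("d_date", "DATE"),
        ),
        primary_key=("d_date_sk",),
    ),
)

# Role of a registry module: lookup helpers built on top of the declarations.
_REGISTRY = {table.name: table for table in TABLES}


def get_table(name: str) -> Table:
    """Return the declared table, or raise KeyError with a clear message."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown table: {name}") from None
```

With this shape, a schema change is an edit to the declarations, and anything derived from them (DDL, lookups, validation) follows without touching the helper code.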
TPC-DI package
- benchbox/core/tpcdi/etl/pipeline.py: contains TPCDIETLPipeline with clear dependencies on ETL sub-modules.
- benchbox/core/tpcdi/etl/results.py: shared dataclasses for ETL pipeline phases.
- benchbox/core/tpcdi/generator/: modular generator broken into data.py (orchestrator), dimensions.py, facts.py, manifest.py, and monitoring.py for dedicated responsibilities.
Extension notes
- New pipeline behaviors should either extend pipeline.py or live in the specialized modules under benchbox/core/tpcdi/etl/ (e.g., batch, incremental_loader).
- Generator extensions inherit from the mixins in benchbox/core/tpcdi/generator/ to keep orchestration and domain-specific logic separated.
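The separation between an orchestrator and domain mixins can be sketched as below. All class and method names (DimensionMixin, FactMixin, DataGenerator, AuditedFactMixin, AuditedGenerator) are hypothetical and the generated values are placeholders; the sketch only illustrates the inheritance pattern, not the real generator code.

```python
class DimensionMixin:
    """Domain logic for dimension data (hypothetical placeholder output)."""

    def generate_dimensions(self) -> list[str]:
        return [f"dim_{i}" for i in range(self.dimension_count)]


class FactMixin:
    """Domain logic for fact data (hypothetical placeholder output)."""

    def generate_facts(self) -> list[str]:
        return [f"fact_{i}" for i in range(self.fact_count)]


class DataGenerator(DimensionMixin, FactMixin):
    """Orchestrator: owns configuration and sequencing, nothing else."""

    def __init__(self, dimension_count: int = 2, fact_count: int = 1) -> None:
        self.dimension_count = dimension_count
        self.fact_count = fact_count

    def generate(self) -> list[str]:
        # Dimensions first, then facts; the mixins supply the details.
        return self.generate_dimensions() + self.generate_facts()


class AuditedFactMixin(FactMixin):
    """Extension: refines one mixin without touching the orchestrator."""

    def generate_facts(self) -> list[str]:
        return [name + "_audited" for name in super().generate_facts()]


class AuditedGenerator(AuditedFactMixin, DataGenerator):
    """Listing the refined mixin first lets it take precedence in the MRO."""
```

Here DataGenerator().generate() yields the base output, while AuditedGenerator().generate() picks up the refined fact logic and leaves the sequencing in generate untouched.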
ClickHouse platform
- benchbox/platforms/clickhouse/__init__.py: re-exports a modularized adapter.
- adapter.py: core ClickHouseAdapter that now inherits from individual mixins.
- Mixins: metadata.py, setup.py, diagnostics.py, workload.py, and tuning.py encapsulate discrete responsibilities; _dependencies.py centralizes optional imports; client.py contains ClickHouseLocalClient.
Extension notes
- Connection/setup changes should live in setup.py; diagnostics and metadata reporting belong in diagnostics.py/metadata.py.
- Workload-specific adaptations (schema creation, load pipeline, query execution) should extend workload.py.
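Centralizing optional imports in one module, as _dependencies.py is described as doing, commonly follows a pattern like the sketch below. The helper names optional_import and require are hypothetical, and the sketch makes no claim about which packages BenchBox actually treats as optional.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require(module: Optional[ModuleType], module_name: str, feature: str) -> ModuleType:
    """Fail with an actionable message at the point a feature is used."""
    if module is None:
        raise RuntimeError(
            f"{feature} requires the optional package '{module_name}'"
        )
    return module
```

The rest of the package then imports names from this one module instead of repeating try/except blocks, so a missing driver produces a clear error when the feature is used rather than at import time.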
Validation utilities
- benchbox/core/validation/shared/: new shared utilities (logging.py) host logic reused by DataValidator and future validation engines.
- benchbox/core/validation/data.py now delegates to the shared logging helper for consistent reporting.
Extension notes
- Additional shared helpers should be placed in benchbox/core/validation/shared/ to keep engines thin and consistently reusable.
File size guardrails
A new automated test (tests/system/test_module_size_thresholds.py) watches the size of runtime-critical modules (CLI commands, benchmark orchestration, generators, and platform adapters). The guardrail enforces a default limit of 1,200 lines per module, with an allowlist for modules that are intentionally larger. See the test file for configuration details.
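A line-count guardrail of this kind can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the contents of the real test: the function names are hypothetical, and it assumes the allowlist is a set of relative paths that are exempt from the check, which may not match the actual configuration.

```python
from pathlib import Path

DEFAULT_LIMIT = 1200  # default threshold stated in the text


def count_lines(path: Path) -> int:
    """Count the lines in a text file."""
    with path.open("r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def find_oversized(
    root: Path,
    limit: int = DEFAULT_LIMIT,
    allowlist: frozenset[str] = frozenset(),
) -> list[tuple[str, int]]:
    """Return (relative path, line count) for Python files over the limit.

    Assumption: allowlist entries are POSIX-style paths relative to root,
    and allowlisted files are skipped entirely.
    """
    offenders = []
    for path in sorted(root.rglob("*.py")):
        relative = path.relative_to(root).as_posix()
        if relative in allowlist:
            continue
        lines = count_lines(path)
        if lines > limit:
            offenders.append((relative, lines))
    return offenders
```

A test built on this helper would assert that find_oversized returns an empty list for each watched package, so a failure message names the offending modules and their sizes.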