Cloud Sorted Ingestion¶
BenchBox supports an opt-in sorted-ingestion strategy for cloud data warehouse runs so you can compare:
Deep sorted ingestion
Platform-native tuning (for example clustering or Z-ORDER)
Baseline runs with no sorted-ingestion rewrite
Strategy Controls¶
Use unified tuning platform_optimizations:
platform_optimizations:
sorted_ingestion_mode: force # off | auto | force
sorted_ingestion_method: ctas # auto | ctas | z_order | liquid_clustering | vacuum_sort
sorted_ingestion_mode:
off: disable sorted ingestionauto: enable only when the platform capability matrix supports a methodforce: require sorted ingestion, fail if unsupported
sorted_ingestion_method:
auto: choose platform defaultctas: use CTAS rewrite flow where availablez_order: Databricks Z-ORDER pathliquid_clustering: Databricks liquid clustering pathvacuum_sort: Redshift vacuum sort path
CLI Overrides¶
You can override strategy at runtime:
benchbox run \
--platform snowflake \
--benchmark tpch \
--scale 1 \
--tuning ./unified_tuning.yaml \
--sorted-ingestion-mode force \
--sorted-ingestion-method ctas
Comparison Workflow¶
Run baseline:
benchbox run --platform snowflake --benchmark tpch --scale 1 --tuning notuning
Run platform-native tuning:
benchbox run --platform snowflake --benchmark tpch --scale 1 --tuning ./snowflake-native.yaml
Run deep-sorted ingestion:
benchbox run --platform snowflake --benchmark tpch --scale 1 --tuning ./snowflake-sorted.yaml
Compare result files:
benchbox compare <baseline.json> <native.json> <sorted.json>
Platform Notes¶
Snowflake: supports
ctassorted-ingestion path when mode is enabled.Databricks: supports
ctas,z_order, andliquid_clusteringstrategy methods.Redshift: supports
ctasandvacuum_sort.Amazon Athena and Azure Synapse Analytics: support
ctasmethod.BigQuery: forcing sorted ingestion currently returns an actionable error for CTAS-hook execution path.
ClickHouse Cloud: post-load sorted ingestion is unsupported; use table
ORDER BYat create time.
Result Metadata¶
BenchBox writes sorted-ingestion metadata into execution_metadata.sorted_ingestion:
Configured mode/method
Resolved mode/method
Platform capability details
Applied table list/count
Total sorted-ingestion apply time