Cloud Sorted Ingestion

BenchBox supports an opt-in sorted-ingestion strategy for cloud data warehouse runs so you can compare:

  • Deep sorted ingestion

  • Platform-native tuning (for example clustering or Z-ORDER)

  • Baseline runs with no sorted-ingestion rewrite

Strategy Controls

Use unified tuning platform_optimizations:

platform_optimizations:
  sorted_ingestion_mode: force   # off | auto | force
  sorted_ingestion_method: ctas  # auto | ctas | z_order | liquid_clustering | vacuum_sort

sorted_ingestion_mode:

  • off: disable sorted ingestion

  • auto: enable only when the platform capability matrix supports a method

  • force: require sorted ingestion, fail if unsupported

sorted_ingestion_method:

  • auto: choose platform default

  • ctas: use CTAS rewrite flow where available

  • z_order: Databricks Z-ORDER path

  • liquid_clustering: Databricks liquid clustering path

  • vacuum_sort: Redshift vacuum sort path

CLI Overrides

You can override strategy at runtime:

benchbox run \
  --platform snowflake \
  --benchmark tpch \
  --scale 1 \
  --tuning ./unified_tuning.yaml \
  --sorted-ingestion-mode force \
  --sorted-ingestion-method ctas

Comparison Workflow

  1. Run baseline:

benchbox run --platform snowflake --benchmark tpch --scale 1 --tuning notuning
  1. Run platform-native tuning:

benchbox run --platform snowflake --benchmark tpch --scale 1 --tuning ./snowflake-native.yaml
  1. Run deep-sorted ingestion:

benchbox run --platform snowflake --benchmark tpch --scale 1 --tuning ./snowflake-sorted.yaml
  1. Compare result files:

benchbox compare <baseline.json> <native.json> <sorted.json>

Platform Notes

  • Snowflake: supports ctas sorted-ingestion path when mode is enabled.

  • Databricks: supports ctas, z_order, and liquid_clustering strategy methods.

  • Redshift: supports ctas and vacuum_sort.

  • Amazon Athena and Azure Synapse Analytics: support ctas method.

  • BigQuery: forcing sorted ingestion currently returns an actionable error for CTAS-hook execution path.

  • ClickHouse Cloud: post-load sorted ingestion is unsupported; use table ORDER BY at create time.

Result Metadata

BenchBox writes sorted-ingestion metadata into execution_metadata.sorted_ingestion:

  • Configured mode/method

  • Resolved mode/method

  • Platform capability details

  • Applied table list/count

  • Total sorted-ingestion apply time