Databend Platform
Databend is a cloud-native, Rust-based data warehouse with Snowflake-compatible SQL, compute/storage separation, and object storage backends (S3, GCS, Azure Blob, MinIO). BenchBox connects via the databend-driver Python package (DB-API 2.0 compliant) and uses the Snowflake dialect as a SQLGlot translation proxy for SQL compatibility.
Databend supports two deployment modes: Databend Cloud (managed service) and self-hosted (user-managed cluster with object storage). Its vectorized query engine, written in Rust, is optimized for analytical workloads with automatic micro-partitioning and clustering key support.
Features
Cloud-native architecture - Compute/storage separation on object storage (S3, MinIO, GCS, Azure Blob)
Snowflake-compatible SQL - Uses the Snowflake dialect as a SQLGlot translation proxy (Databend claims ~100% Snowflake SQL compatibility)
Full TPC-H support - All 22 queries with row count validation
Full TPC-DS support - All 99 queries with row count validation
Vectorized Rust engine - High-performance analytical query execution
DB-API 2.0 driver - Standard Python database connectivity via `databend-driver`
Clustering keys - `CLUSTER BY` clause for optimizing data layout
Automatic statistics - No explicit `ANALYZE` needed; statistics are collected by the storage engine
Result cache control - Configurable query result cache (disabled by default for benchmarking)
Dual deployment - Databend Cloud (managed) or self-hosted with MinIO/S3
Quick Start

```bash
# Install the databend-driver dependency
uv add databend-driver

# Or install via the Databend extra
uv add benchbox --extra databend

# Configure connection (Databend Cloud)
export DATABEND_HOST=tenant--warehouse.gw.databend.com
export DATABEND_USER=benchbox
export DATABEND_PASSWORD=your_password

# Run TPC-H benchmark
benchbox run --platform databend --benchmark tpch --scale 0.01
```
Self-Hosted Quick Start

```bash
# Start Databend with Docker (requires MinIO or S3-compatible storage)
docker run -p 8000:8000 datafuselabs/databend:latest

# Configure for self-hosted
export DATABEND_HOST=localhost
export DATABEND_PORT=8000

# Disable SSL for local development
benchbox run --platform databend --benchmark tpch --scale 0.01 \
  --platform-option ssl=false
```
DSN-Based Connection

```bash
# Self-hosted via DSN (quoted so the shell does not treat "?" as a glob character)
benchbox run --platform databend --benchmark tpch --scale 0.01 \
  --platform-option "dsn=databend+http://benchbox:benchbox@localhost:8000/benchbox?sslmode=disable"
```
Configuration Options

| Option | CLI Argument | Environment Variable | Default | Description |
|---|---|---|---|---|
| | | `DATABEND_HOST` | - | Databend host (required unless DSN is set) |
| | | `DATABEND_PORT` | | Connection port |
| | | `DATABEND_USER` | | Database username |
| | | `DATABEND_PASSWORD` | - | Database password |
| `database` | | | | Target database name |
| `dsn` | | | - | Full DSN string (overrides individual params) |
| | | `DATABEND_WAREHOUSE` | - | Databend Cloud warehouse name |
| `ssl` | `--databend-no-ssl` | - | | SSL/TLS for connections (disable with flag) |
| `disable_result_cache` | | - | `true` | Disable query result cache for benchmarking |
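A minimal sketch of how the connection parameters in this table could be resolved, assuming an explicitly passed option takes precedence over its environment variable and that a DSN short-circuits all individual parameters. The option keys (`host`, `port`, `user`, `password`, `warehouse`) and the function name `resolve_connection` are hypothetical; this is an illustration, not the adapter's code.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical option keys paired with the environment variables listed in the table.
ENV_VARS = {
    "host": "DATABEND_HOST",
    "port": "DATABEND_PORT",
    "user": "DATABEND_USER",
    "password": "DATABEND_PASSWORD",
    "warehouse": "DATABEND_WAREHOUSE",
}


def resolve_connection(
    options: Mapping[str, str], env: Optional[Mapping[str, str]] = None
) -> Dict[str, Optional[str]]:
    """Resolve settings: explicit option first, then environment variable (assumed order)."""
    env = os.environ if env is None else env
    dsn = options.get("dsn")
    if dsn:
        # A DSN overrides every individual connection parameter.
        return {"dsn": dsn}
    resolved = {key: options.get(key, env.get(var)) for key, var in ENV_VARS.items()}
    if not resolved["host"]:
        raise ValueError("Databend configuration is incomplete: set a host or a DSN")
    return resolved
```

Under these assumptions, `resolve_connection({}, env={"DATABEND_HOST": "localhost", "DATABEND_PORT": "8000"})` picks up the host and port from the environment and leaves the unset values as `None`.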
DSN Format

The DSN (Data Source Name) follows the format:

```text
databend+https://user:password@host:port/database?warehouse=name
databend+http://user:password@host:port/database?sslmode=disable
```

`databend+https://` is used when SSL is enabled (default for cloud)
`databend+http://` is used when SSL is disabled (typical for self-hosted) and should include `sslmode=disable`
Special characters in the username and password are automatically URL-encoded
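URL-encoding matters when a password contains characters such as `@`, `:` or `/`, which would otherwise be read as URL delimiters. A minimal sketch of assembling a DSN in the format above with the standard library's `urllib.parse`; `build_dsn` is a hypothetical helper, not part of BenchBox or `databend-driver`.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_dsn(
    user: str,
    password: str,
    host: str,
    port: int,
    database: str,
    ssl: bool = True,
    warehouse: Optional[str] = None,
) -> str:
    """Build a databend+http(s) DSN, percent-encoding the credentials."""
    scheme = "databend+https" if ssl else "databend+http"
    params = {}
    if warehouse:
        params["warehouse"] = warehouse
    if not ssl:
        params["sslmode"] = "disable"
    # safe="" makes quote() encode '/', ':' and '@' as well.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    dsn = f"{scheme}://{userinfo}@{host}:{port}/{database}"
    if params:
        dsn += "?" + urlencode(params)
    return dsn
```

For example, `build_dsn("benchbox", "p@ss:w/rd", "localhost", 8000, "benchbox", ssl=False)` yields `databend+http://benchbox:p%40ss%3Aw%2Frd@localhost:8000/benchbox?sslmode=disable`.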
Data Loading
BenchBox loads data into Databend using batch INSERT statements via the databend-driver Python client. The adapter handles both TPC pipe-delimited (.tbl) and standard CSV formats automatically.
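A minimal sketch of handling both input formats, assuming the format is chosen from the file extension and that `.tbl` rows end with the trailing `|` that the TPC data generators emit. `read_rows` is a hypothetical helper; the adapter's actual detection logic may differ.

```python
import csv
from pathlib import PurePath
from typing import Iterable, Iterator, List


def read_rows(filename: str, lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield rows from a TPC pipe-delimited .tbl file or a standard CSV file."""
    # Chunked generator output such as "lineitem.tbl.1" still counts as .tbl.
    is_tbl = ".tbl" in PurePath(filename).suffixes
    if is_tbl:
        # TPC data is unquoted, so '"' is treated as an ordinary character.
        reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    else:
        reader = csv.reader(lines)
    for row in reader:
        if is_tbl and row and row[-1] == "":
            # Drop the empty field produced by the trailing '|' on every .tbl line.
            row = row[:-1]
        yield row
```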
Loading Process
Database creation - `CREATE DATABASE IF NOT EXISTS` ensures the target database exists
Schema creation - Tables are created with Snowflake-compatible DDL, optimized for Databend
Type conversion - `CHAR(n)` is converted to `VARCHAR(n)` (Databend preference)
Batch inserts - Data is loaded in batches of 500 rows using `INSERT INTO ... VALUES` statements
Constraint handling - Primary keys and foreign keys are removed (Databend does not enforce them)
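A minimal sketch of the batching step, assuming rows arrive as lists of strings (as a `.tbl` or CSV reader would produce) and that Databend casts the quoted literals to each column's type on insert. `insert_statements` and `sql_literal` are hypothetical names, and the escaping only doubles single quotes; a real loader would also need to follow any other escaping rules Databend applies to string literals (for example backslashes).

```python
from typing import Iterable, Iterator, List, Optional, Sequence

BATCH_SIZE = 500  # rows per INSERT, matching the batch size described above

Row = Sequence[Optional[str]]


def sql_literal(value: Optional[str]) -> str:
    """Render a value as a SQL string literal; None becomes NULL."""
    if value is None:
        return "NULL"
    # Standard SQL escaping only: double any embedded single quote.
    return "'" + value.replace("'", "''") + "'"


def _render(table: str, batch: List[Row]) -> str:
    values = ", ".join("(" + ", ".join(sql_literal(v) for v in row) + ")" for row in batch)
    return f"INSERT INTO {table} VALUES {values}"


def insert_statements(
    table: str, rows: Iterable[Row], batch_size: int = BATCH_SIZE
) -> Iterator[str]:
    """Yield one multi-row INSERT INTO ... VALUES statement per batch."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield _render(table, batch)
            batch = []
    if batch:
        # Flush the final, possibly shorter, batch.
        yield _render(table, batch)
```

Each yielded statement would then be executed through a DB-API cursor, for example `cursor.execute(stmt)`.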
Type Mappings

| Source Type | Databend Type |
|---|---|
| `CHAR(n)` | `VARCHAR(n)` |
Large-Scale Loading
For datasets larger than SF 10, Databend supports high-throughput loading via:
COPY INTO from staged S3/MinIO files (for cloud and self-hosted with object storage)
Streaming Load via HTTP API for parallel ingestion
The current adapter uses INSERT batching for broad compatibility across both deployment modes.
Usage Examples
Basic Benchmarks

```bash
# TPC-H at scale factor 1
benchbox run --platform databend --benchmark tpch --scale 1.0

# TPC-DS at scale factor 10
benchbox run --platform databend --benchmark tpcds --scale 10.0

# Run specific queries only
benchbox run --platform databend --benchmark tpch --queries Q1,Q6,Q17
```
Databend Cloud Configuration

```bash
export DATABEND_HOST=tenant--warehouse.gw.databend.com
export DATABEND_USER=benchbox
export DATABEND_PASSWORD=your_cloud_password
export DATABEND_WAREHOUSE=my_warehouse

benchbox run --platform databend --benchmark tpch --scale 10.0
```
Self-Hosted Configuration

```bash
export DATABEND_HOST=localhost
export DATABEND_PORT=8000

# Disable SSL for local Databend
benchbox run --platform databend --benchmark tpch --scale 1.0 \
  --platform-option ssl=false
```
Custom Database Name

```bash
benchbox run --platform databend --benchmark tpch --scale 1.0 \
  --platform-option database=my_benchmarks
```
Dry Run (Preview)

```bash
# Preview execution plan without running
benchbox run --platform databend --benchmark tpch --scale 1.0 --dry-run ./preview
```
Architecture
Adapter Structure
The Databend adapter is a single-file adapter with all functionality in one class:

| Module | Class | Responsibility |
|---|---|---|
| | `DatabendAdapter` | Connection, schema, loading, queries, tuning |
| | | Config builder with credential loading |
Connection Model

```text
BenchBox CLI
    |
    v
DatabendAdapter
    |
    +-- databend-driver (DB-API 2.0)
    |     - BlockingDatabendClient
    |     - DSN-based connection (databend+http:// or databend+https://)
    |
    +-- Databend Cloud (HTTPS, port 443)
    |     - tenant--warehouse.gw.databend.com
    |     - Warehouse-scoped compute
    |
    +-- Self-hosted (HTTP, port 8000)
          - Direct connection to Databend server
          - Object storage backend (MinIO/S3)
```
SQL Translation Strategy
Databend claims ~100% Snowflake SQL compatibility. BenchBox leverages this by:
Translating benchmark SQL from the DuckDB dialect to the Snowflake dialect via SQLGlot
Sending the Snowflake-compatible SQL directly to Databend
Applying edge-case optimizations in `_optimize_table_definition()`:
`CHAR(n)` to `VARCHAR(n)` conversion
Primary key constraint removal
Foreign key constraint removal
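A minimal, regex-based sketch of the three DDL rewrites listed above, assuming each column or table-constraint definition is supplied as a separate string. `optimize_table_definition` is a hypothetical stand-in, not the adapter's `_optimize_table_definition()`, which may parse DDL differently.

```python
import re
from typing import List

# The word boundary in \bCHAR keeps VARCHAR(n) from being rewritten.
CHAR_TYPE = re.compile(r"\bCHAR\s*\((\d+)\)", re.IGNORECASE)
INLINE_PK = re.compile(r"\s+PRIMARY\s+KEY\b", re.IGNORECASE)
INLINE_FK = re.compile(r"\s+REFERENCES\s+\w+\s*\([^)]*\)", re.IGNORECASE)


def optimize_table_definition(table: str, definitions: List[str]) -> str:
    """Rewrite CHAR(n) to VARCHAR(n) and strip key constraints from a CREATE TABLE."""
    kept = []
    for item in definitions:
        text = item.strip()
        # Table-level constraints are dropped entirely (this also drops any
        # other named CONSTRAINT); Databend does not enforce keys.
        if text.upper().startswith(("PRIMARY KEY", "FOREIGN KEY", "CONSTRAINT")):
            continue
        text = INLINE_PK.sub("", text)  # inline "... PRIMARY KEY"
        text = INLINE_FK.sub("", text)  # inline "... REFERENCES t(col)"
        text = CHAR_TYPE.sub(r"VARCHAR(\1)", text)
        kept.append(text)
    return f"CREATE TABLE {table} (\n  " + ",\n  ".join(kept) + "\n)"
```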
Platform Information
At runtime, BenchBox captures platform metadata:

```json
{
  "platform_type": "databend",
  "platform_name": "Databend",
  "configuration": {
    "host": "tenant--warehouse.gw.databend.com",
    "database": "benchbox_tpch_sf1"
  },
  "warehouse": "my_warehouse",
  "platform_version": "v1.x.x",
  "client_library_version": "0.x.x"
}
```
Tuning and Optimization
Automatic Benchmark Configuration
The adapter automatically applies these optimizations when running benchmarks:
Query result cache: Disabled by default (`enable_query_result_cache = 0`) for accurate timing
Vectorized engine: Pre-optimized for analytical workloads (no additional tuning needed)
Automatic statistics: Databend collects statistics via its storage engine; no explicit `ANALYZE` is required
Supported Tuning Types

| Tuning Type | Support | Notes |
|---|---|---|
| Clustering | Yes | `CLUSTER BY` clustering keys |
| Partitioning | No | Automatic micro-partitioning (not user-configurable) |
| Distribution | No | Managed by storage engine |
Clustering Keys
Databend supports `CLUSTER BY` to optimize data layout for frequently queried columns, similar to Snowflake clustering keys:

```sql
-- Applied automatically via tuning configuration
ALTER TABLE lineitem CLUSTER BY (l_shipdate, l_orderkey);
```

Clustering keys can be specified at table creation time (via `generate_tuning_clause`) or applied after creation (via `apply_table_tunings`).
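A minimal sketch of the two paths, assuming the tuning configuration supplies clustering keys as a list of column names. `cluster_by_clause`, `create_with_clustering` and `alter_cluster_by` are hypothetical helpers, not the adapter's `generate_tuning_clause` or `apply_table_tunings`.

```python
from typing import Sequence


def cluster_by_clause(columns: Sequence[str]) -> str:
    """Return a CLUSTER BY clause, or an empty string when no keys are configured."""
    if not columns:
        return ""
    return "CLUSTER BY (" + ", ".join(columns) + ")"


def create_with_clustering(create_sql: str, columns: Sequence[str]) -> str:
    """Creation-time path: append the clause after the column list."""
    clause = cluster_by_clause(columns)
    return f"{create_sql} {clause}" if clause else create_sql


def alter_cluster_by(table: str, columns: Sequence[str]) -> str:
    """Post-creation path: set clustering keys on an existing table."""
    clause = cluster_by_clause(columns)
    if not clause:
        raise ValueError("at least one clustering column is required")
    return f"ALTER TABLE {table} {clause}"
```

For example, `alter_cluster_by("lineitem", ["l_shipdate", "l_orderkey"])` produces the `ALTER TABLE` statement shown above (without the trailing semicolon).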
Custom Platform Settings

```bash
benchbox run --platform databend --benchmark tpch --scale 10.0 \
  --tuning tuned
```
Troubleshooting
Connection Refused
Error: Databend configuration is incomplete
Solutions:
Verify Databend is running and accessible on the configured host and port
For cloud: set `DATABEND_HOST=tenant--warehouse.gw.databend.com`
For self-hosted: set `DATABEND_HOST=localhost` and `DATABEND_PORT=8000`
Provide a DSN, individual parameters, or environment variables
Missing databend-driver Dependency
Error: Missing dependencies for databend platform: databend-driver
Solutions:
Install databend-driver: `uv add databend-driver`
Or install the Databend extra: `uv add benchbox --extra databend`
Minimum version required: `>=0.28.0`
SSL/TLS Errors
Error: SSL handshake failed
Solutions:
For self-hosted without TLS: use `--platform-option ssl=false` or `--databend-no-ssl`
For cloud: SSL is required; verify the certificate chain
Check that the port matches the SSL setting (443 for SSL, 8000 for non-SSL)
Schema Creation Failures
Error: Schema creation failed
Solutions:
Check that the user has `CREATE TABLE` and `CREATE DATABASE` permissions
Verify the database exists or the user has the `CREATE DATABASE` privilege
If tables already exist, the adapter will automatically drop and recreate them
Review Databend query logs for detailed error messages
Slow Data Loading
Solutions:
For large datasets (SF 10+), consider using Databend's native `COPY INTO` from staged files
Ensure adequate network bandwidth between the BenchBox host and Databend
For self-hosted: verify object storage (MinIO/S3) performance
Consider running BenchBox on a machine co-located with the Databend cluster
Query Result Cache Interference
Warning: Query results may be cached
Solutions:
Verify `disable_result_cache` is `true` (the default): `--platform-option disable_result_cache=true`
The adapter sets `enable_query_result_cache = 0` at session level
For Databend Cloud: check that warehouse-level caching is not interfering
See Also
Platform Comparison Matrix - Compare all platforms
Platform Selection Guide - Choosing the right platform
TPC-H Benchmark - TPC-H benchmark guide
TPC-DS Benchmark - TPC-DS benchmark guide
Deployment Modes Guide - Platform deployment architecture