StarRocks Platform¶
StarRocks is an open-source MPP (Massively Parallel Processing) columnar OLAP database, hosted under the Linux Foundation. It is designed for sub-second analytics on large-scale datasets. BenchBox connects via the MySQL protocol using PyMySQL (port 9030) and supports Stream Load via the BE HTTP API (port 8040) for high-throughput data ingestion. SQLGlot provides official starrocks dialect support for SQL translation.
StarRocks is used in production by Airbnb, Alibaba, Coinbase, Pinterest, and Tencent, among others. It delivers competitive performance against ClickHouse, Trino, and Druid on analytical workloads.
Features¶
MySQL protocol connectivity - Standard MySQL wire protocol via PyMySQL (port 9030)
Stream Load HTTP API - High-throughput data ingestion via BE HTTP port 8040
Full TPC-H support - All 22 queries with row count validation
Full TPC-DS support - All 99 queries with row count validation
Official SQLGlot dialect - Native
starrocksdialect for accurate SQL translationColumnar storage - Duplicate Key model optimized for OLAP workloads
Distributed hash partitioning - Automatic data distribution across nodes
Tuning support - Partitioning, sorting, and distribution tuning at table creation time
Quick Start¶
# Install PyMySQL dependency
uv add pymysql
# Or install via the StarRocks extra
uv add benchbox --extra starrocks
# Configure connection (StarRocks must be running)
export STARROCKS_HOST=localhost
export STARROCKS_PORT=9030
# Run TPC-H benchmark
benchbox run --platform starrocks --benchmark tpch --scale 0.01
Docker Quick Start¶
# Start StarRocks with Docker (FE + BE)
docker run -p 9030:9030 -p 8030:8030 -p 8040:8040 \
starrocks/allin1-ubuntu:latest
# Verify connectivity
mysql -h 127.0.0.1 -P 9030 -u root -e "SELECT 1"
# Run benchmark
benchbox run --platform starrocks --benchmark tpch --scale 1.0
Configuration Options¶
Option |
Environment Variable |
Default |
Description |
|---|---|---|---|
|
|
|
StarRocks FE hostname |
|
|
|
MySQL protocol port (FE) |
|
|
|
Database username |
|
|
|
Database password |
|
|
auto-generated |
Target database name |
|
|
|
BE HTTP port for Stream Load |
|
- |
|
Query timeout in seconds |
|
- |
|
Disable query cache for accurate benchmarking |
|
- |
|
Enable strict data validation |
|
- |
|
Deployment mode identifier |
Data Loading¶
BenchBox loads data into StarRocks using batch INSERT statements via the MySQL protocol. The adapter handles both TPC pipe-delimited (.tbl) and standard CSV formats automatically.
Loading Process¶
Schema creation - Tables are created with StarRocks-optimized DDL (Duplicate Key model, hash distribution)
Type conversion - DuckDB/standard SQL types are mapped to StarRocks equivalents (e.g.,
HUGEINTtoLARGEINT,TEXTtoVARCHAR(65533))Batch inserts - Data is loaded in batches of 1,000 rows using parameterized
INSERT INTO ... VALUESstatementsConstraint handling - Foreign keys are removed (StarRocks does not enforce them); primary keys are converted to Duplicate Key model
Stream Load (HTTP API)¶
StarRocks provides a high-throughput Stream Load API on the BE HTTP port (default 8040). While the current adapter uses batch INSERT for broad compatibility, the Stream Load endpoint is available for custom ingestion workflows:
Endpoint:
http://<be_host>:8040/api/<database>/<table>/_stream_loadProtocol: HTTP PUT with CSV or JSON payload
Throughput: Significantly higher than row-by-row INSERT for large datasets
Type Mappings¶
Source Type |
StarRocks Type |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Usage Examples¶
Basic Benchmarks¶
# TPC-H at scale factor 1
benchbox run --platform starrocks --benchmark tpch --scale 1.0
# TPC-DS at scale factor 10
benchbox run --platform starrocks --benchmark tpcds --scale 10.0
# Run specific queries only
benchbox run --platform starrocks --benchmark tpch --queries Q1,Q6,Q17
Environment Variable Configuration¶
export STARROCKS_HOST=starrocks-fe.example.com
export STARROCKS_PORT=9030
export STARROCKS_USER=benchbox
export STARROCKS_PASSWORD=secret
export STARROCKS_DATABASE=benchmark_db
export STARROCKS_HTTP_PORT=8040
benchbox run --platform starrocks --benchmark tpch --scale 10.0
Custom Database Name¶
benchbox run --platform starrocks --benchmark tpch --scale 1.0 \
--platform-option database=my_benchmarks
Dry Run (Preview)¶
# Preview execution plan without running
benchbox run --platform starrocks --benchmark tpch --scale 1.0 --dry-run ./preview
Architecture¶
Adapter Structure¶
The StarRocks adapter is composed of four mixins plus the main adapter class:
Module |
Class |
Responsibility |
|---|---|---|
|
|
Main adapter entry point, data integrity validation |
|
|
Connection management, database lifecycle (create/drop) |
|
|
Schema creation, data loading, query execution |
|
|
Benchmark configuration, session settings, table tuning |
|
|
Platform info, CLI arguments, dialect selection |
Connection Model¶
BenchBox CLI
|
v
StarRocksAdapter
|
+-- MySQL Protocol (PyMySQL) --> FE (port 9030)
| - Schema DDL
| - Query execution
| - Batch INSERT loading
|
+-- HTTP API (Stream Load) --> BE (port 8040)
- High-throughput data ingestion
Platform Information¶
At runtime, BenchBox captures platform metadata:
{
"platform_type": "starrocks",
"platform_name": "StarRocks",
"deployment_mode": "self-hosted",
"configuration": {
"host": "localhost",
"port": 9030,
"database": "benchbox_tpch_sf1",
"http_port": 8040
},
"platform_version": "3.x.x",
"client_library_version": "1.x.x"
}
Tuning and Optimization¶
Automatic Benchmark Configuration¶
The adapter automatically applies session-level optimizations when running benchmarks:
Query timeout: Set via
query_timeout(default: 300 seconds)Query cache: Disabled by default (
enable_query_cache = false) for accurate timingPlanner timeout: Extended for complex OLAP queries (
new_planner_optimize_timeout = 30000)Profiling: Disabled during benchmarks (
enable_profile = false) to reduce overhead
Supported Tuning Types¶
Tuning Type |
Support |
Notes |
|---|---|---|
Partitioning |
Yes |
|
Sorting |
Yes |
Sort keys via data model key ordering |
Distribution |
Yes |
|
Table Model¶
StarRocks uses the Duplicate Key model by default, which is optimal for OLAP workloads. The adapter automatically:
Removes
PRIMARY KEYandFOREIGN KEYconstraints from source DDLAdds
DUPLICATE KEY(<first_column>)to table definitionsAdds
DISTRIBUTED BY HASH(<first_column>) BUCKETS 8for data distribution
Custom Platform Settings¶
Additional StarRocks session variables can be applied through tuning configuration:
benchbox run --platform starrocks --benchmark tpch --scale 10.0 \
--tuning tuned
Troubleshooting¶
Connection Refused¶
Error: Failed to connect to StarRocks
Solutions:
Verify StarRocks FE is running and accessible on the configured host and port
Check that port 9030 (MySQL protocol) is open and not blocked by a firewall
For Docker deployments, ensure port mapping is correct (
-p 9030:9030)Test connectivity directly:
mysql -h <host> -P 9030 -u root
Missing PyMySQL Dependency¶
Error: Missing dependencies for starrocks platform: pymysql
Solutions:
Install PyMySQL:
uv add pymysqlOr install the StarRocks extra:
uv add benchbox --extra starrocks
Schema Creation Failures¶
Error: Schema creation failed
Solutions:
Check that the StarRocks user has
CREATE TABLEandCREATE DATABASEpermissionsVerify the target database exists or the user has
CREATE DATABASEprivilegeReview StarRocks FE logs for detailed error messages
Ensure sufficient disk space and memory on BE nodes
Slow Data Loading¶
Solutions:
Increase
batch_sizefor larger datasets (default: 1,000 rows per batch)Consider using Stream Load directly for datasets larger than SF 10
Ensure BE nodes have adequate memory and disk I/O capacity
Check network latency between BenchBox host and StarRocks cluster
Query Timeout¶
Error: Query execution exceeded timeout
Solutions:
Increase the timeout:
--platform-option max_execution_time=600Check StarRocks resource utilization (CPU, memory, disk I/O)
Verify data distribution is balanced across BE nodes
Review the query plan for performance bottlenecks
See Also¶
Platform Comparison Matrix - Compare all platforms
Platform Selection Guide - Choosing the right platform
TPC-H Benchmark - TPC-H benchmark guide
TPC-DS Benchmark - TPC-DS benchmark guide
Deployment Modes Guide - Platform deployment architecture