ClickHouse Local Mode¶
BenchBox supports ClickHouse in two deployment modes:
Server Mode: Connects to an external ClickHouse server (default)
Local Mode: Uses chDB to run the ClickHouse engine in-process (new)
Overview¶
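ClickHouse Local Mode uses chDB, the official embedded ClickHouse engine, to run ClickHouse queries directly in Python without requiring a separate ClickHouse server installation. This page uses "local mode" and "embedded mode" interchangeably; both refer to runs started with --mode=local.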
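To illustrate what in-process execution means, here is a minimal sketch that calls chDB directly from Python. It is not BenchBox's adapter code; the helper name run_in_process is hypothetical, and the only chDB call used is chdb.query(sql, output_format).

```python
def run_in_process(sql: str, output_format: str = "CSV"):
    """Run one SQL statement with chDB's in-process engine.

    Returns the result as text, or None when chDB is not installed.
    (Hypothetical helper for illustration; not part of BenchBox.)
    """
    try:
        import chdb  # optional dependency: pip install chdb
    except ImportError:
        return None
    # No server, host, or port is involved: the engine runs inside this process.
    result = chdb.query(sql, output_format)
    return result.bytes().decode("utf-8")


if __name__ == "__main__":
    print(run_in_process("SELECT version() AS clickhouse_version, 1 + 1 AS two"))
```

The function returns None instead of raising when chDB is missing, so a script can fall back to server mode.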
Key Benefits¶
Zero Server Setup: No ClickHouse server installation required
Native Performance: In-process execution eliminates IPC overhead
Development Friendly: Perfect for testing, development, and quick analysis
Same SQL Compatibility: Full ClickHouse SQL dialect support
Easy Installation: Single pip install chdb command
Installation¶
Prerequisites¶
Python 3.10+
Supported platforms: macOS and Linux (x86_64 and ARM64)
Install chDB¶
# Install chDB for embedded mode support
pip install chdb
# Verify installation
python -c "import chdb; print(chdb.__version__)"
Install BenchBox with ClickHouse Support¶
# Install BenchBox (if not already installed)
uv add benchbox
# Or with pip
pip install benchbox
Usage¶
Basic Usage¶
# Run TPC-H benchmark in embedded mode
benchbox run tpch --platform=clickhouse --mode=local --scale-factor=0.01
# Run with custom data path
benchbox run tpch --platform=clickhouse --mode=local --data-path=/tmp/benchmark_data
# Compare with server mode
benchbox run tpch --platform=clickhouse --mode=server --host=localhost --port=9000
CLI Arguments¶
Mode Selection¶
--mode=local - Use embedded ClickHouse via chDB
--mode=server - Use ClickHouse server (default for backward compatibility)
Embedded Mode Specific Arguments¶
--data-path=PATH - Optional data path for file operations
Server Mode Arguments (not used in embedded mode)¶
--host=HOST - ClickHouse server host
--port=PORT - ClickHouse server port
--user=USER - Username for server authentication
--password=PASS - Password for server authentication
--secure - Use TLS connection
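When benchmark runs are scripted, it can help to assemble the command line in one place and catch flag combinations that do not match the selected mode. The sketch below uses only the flags listed above; the function name build_benchbox_command is hypothetical, and rejecting mismatched flags is a choice made here for illustration, not documented BenchBox behavior.

```python
from typing import List, Optional


def build_benchbox_command(
    benchmark: str,
    mode: str = "server",
    scale_factor: Optional[float] = None,
    data_path: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    user: Optional[str] = None,
    password: Optional[str] = None,
    secure: bool = False,
) -> List[str]:
    """Build an argument list for `benchbox run` (hypothetical helper)."""
    if mode not in ("local", "server"):
        raise ValueError(f"unknown mode: {mode!r}")

    cmd = ["benchbox", "run", benchmark, "--platform=clickhouse", f"--mode={mode}"]
    if scale_factor is not None:
        cmd.append(f"--scale-factor={scale_factor}")

    server_values = {"host": host, "port": port, "user": user, "password": password}
    if mode == "local":
        # Server arguments are documented as unused in embedded mode,
        # so this sketch treats them as a mistake.
        stray = [name for name, value in server_values.items() if value is not None]
        if secure:
            stray.append("secure")
        if stray:
            raise ValueError(f"server-only arguments in local mode: {stray}")
        if data_path is not None:
            cmd.append(f"--data-path={data_path}")
    else:
        # Assumption: --data-path is listed as embedded-mode specific.
        if data_path is not None:
            raise ValueError("--data-path is documented for local mode only")
        cmd.extend(
            f"--{name}={value}"
            for name, value in server_values.items()
            if value is not None
        )
        if secure:
            cmd.append("--secure")
    return cmd
```

The returned list can be passed to subprocess.run without shell quoting.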
Performance Characteristics¶
Embedded Mode¶
Memory Usage: Lower baseline memory (~50-200MB)
Startup Time: No network connection setup required
Query Execution: Columnar engine for analytical workloads
Scalability: Suited for small to medium datasets (< 10GB)
Concurrency: Single-process, sequential query execution
Server Mode¶
Memory Usage: Higher baseline (server overhead)
Startup Time: Network connection overhead
Query Execution: Same columnar engine, distributed architecture available
Scalability: Designed for large datasets (TB+)
Concurrency: Multi-client support, parallel query execution
When to Use Each Mode¶
Use Embedded Mode When:¶
Development & Testing: Quick benchmark development and validation
CI/CD Pipelines: Automated testing without infrastructure setup
Data Analysis: Interactive data exploration and analysis
Prototyping: Rapid benchmark prototyping and iteration
Small to Medium Data: Datasets under 10GB
Single-User Scenarios: Personal analysis and development
Use Server Mode When:¶
Production Benchmarking: Large-scale production environment testing
Large Datasets: Working with multi-TB datasets
Multi-User Access: Shared benchmark environments
Enterprise Deployments: Integration with existing ClickHouse infrastructure
Performance Testing: Maximum throughput and scalability testing
Cluster Configurations: Testing distributed ClickHouse setups
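The guidance above can be condensed into a small decision helper. This is only a sketch of the rule of thumb on this page (under 10GB and single-user favors embedded mode); the function name and parameters are hypothetical, and the threshold is a guideline rather than a hard limit.

```python
LOCAL_MAX_GB = 10.0  # guideline from this page, not an enforced limit


def choose_mode(dataset_gb: float, multi_user: bool = False, distributed: bool = False) -> str:
    """Suggest a value for --mode based on the criteria listed above."""
    if multi_user or distributed:
        # Shared environments and cluster tests need a real server.
        return "server"
    return "local" if dataset_gb < LOCAL_MAX_GB else "server"
```

For example, choose_mode(0.5) suggests "local", while choose_mode(2000) or choose_mode(0.5, multi_user=True) suggests "server".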
Examples¶
TPC-H Benchmark¶
# Small scale for development
benchbox run tpch --platform=clickhouse --mode=local --scale-factor=0.01
# Medium scale for testing
benchbox run tpch --platform=clickhouse --mode=local --scale-factor=1.0
ClickBench Benchmark¶
# Run ClickBench analytical queries
benchbox run clickbench --platform=clickhouse --mode=local
Custom Data Directory¶
# Use specific directory for generated data
benchbox run tpch \
--platform=clickhouse \
--mode=local \
--scale-factor=0.1 \
--data-path=/path/to/benchmark/data
Troubleshooting¶
Common Issues and Solutions¶
1. chDB Not Installed¶
Error: ClickHouse local mode requires chDB but it is not installed.
Solution:
pip install chdb
2. Platform Not Supported¶
Error: chDB installation failed or not compatible with your platform
Solution:
Ensure you’re on macOS or Linux (x86_64/ARM64)
Try upgrading pip:
pip install --upgrade pip
Check Python version (3.10+ required, as listed under Prerequisites):
python --version
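The checks above can also be run together before starting a benchmark. The following sketch tests the requirements from the Prerequisites section (Python 3.10+, macOS or Linux, x86_64 or ARM64) and whether chDB is importable. The function name and the exact machine strings accepted are assumptions made for illustration.

```python
import importlib.util
import platform
import sys

MIN_PYTHON = (3, 10)
SUPPORTED_SYSTEMS = {"Darwin", "Linux"}  # macOS reports "Darwin"
SUPPORTED_MACHINES = {"x86_64", "amd64", "arm64", "aarch64"}


def check_local_mode_prerequisites(version_info=None, system=None, machine=None,
                                   chdb_installed=None):
    """Return a list of problems; an empty list means all checks passed.

    Arguments default to the running interpreter and host, and can be
    overridden to test the logic.
    """
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if chdb_installed is None:
        chdb_installed = importlib.util.find_spec("chdb") is not None

    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.10 or newer is required")
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"unsupported operating system: {system}")
    if machine.lower() not in SUPPORTED_MACHINES:
        problems.append(f"unsupported CPU architecture: {machine}")
    if not chdb_installed:
        problems.append("chDB is not installed; run: pip install chdb")
    return problems


if __name__ == "__main__":
    for problem in check_local_mode_prerequisites():
        print("Problem:", problem)
```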
3. Memory Issues with Large Datasets¶
Error: Memory limit exceeded or system running out of memory
Solution:
Use smaller scale factors for testing
Switch to server mode for large datasets
Monitor system memory usage
4. Query Performance Issues¶
Symptom: Queries run slower than expected in embedded mode
Solution:
Embedded mode is optimized for small-medium datasets
For large datasets or maximum performance, use server mode
Consider data partitioning or smaller scale factors
Getting Help¶
Check Installation: Verify chDB is properly installed
python -c "import chdb; print('chDB version:', chdb.__version__)"
Verbose Output: Run with verbose logging
benchbox run tpch --platform=clickhouse --mode=local --verbose
Compare Modes: Test both modes to isolate issues
# Test embedded mode
benchbox run tpch --platform=clickhouse --mode=local --scale-factor=0.01
# Test server mode (if available)
benchbox run tpch --platform=clickhouse --mode=server --scale-factor=0.01
Advanced Usage¶
Performance Tuning¶
While embedded mode has fewer tuning options than server mode, you can optimize performance:
# Use appropriate scale factors
benchbox run tpch --platform=clickhouse --mode=local --scale-factor=0.1
# Monitor memory usage during execution (Linux; on macOS use: top -pid <PID>)
top -p "$(pgrep -d, -f benchbox)"
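As an alternative to watching top interactively, peak memory can be recorded after a run finishes. The sketch below wraps a command with the standard library's resource module, which is available on the supported platforms (macOS and Linux). The function name is hypothetical, and the benchbox command in the usage example is only an illustration.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(cmd):
    """Run cmd to completion and return (exit_code, peak_rss_mb).

    The figure is the largest resident set size among child processes
    this Python process has waited for so far, so run one command per
    script invocation for a clean reading.
    """
    completed = subprocess.run(cmd, check=False)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, peak / divisor


if __name__ == "__main__":
    code, peak_mb = run_and_report_peak_memory([
        "benchbox", "run", "tpch",
        "--platform=clickhouse", "--mode=local", "--scale-factor=0.1",
    ])
    print(f"exit code {code}, peak memory about {peak_mb:.0f} MB")
```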
Integration with Other Tools¶
# Export results for analysis
benchbox run tpch --platform=clickhouse --mode=local --output=json > results.json
# Run multiple benchmarks
for benchmark in tpch tpcds ssb; do
echo "Running $benchmark..."
benchbox run $benchmark --platform=clickhouse --mode=local --scale-factor=0.01
done
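The same loop can be written in Python when exit codes and wall-clock times need to be collected. This is a sketch under the assumption that benchbox is on the PATH; the function name run_benchmarks and the runner parameter (there to make the function testable without BenchBox) are hypothetical.

```python
import subprocess
import time


def run_benchmarks(benchmarks, scale_factor=0.01, runner=subprocess.run):
    """Run each benchmark in local mode; return exit code and duration per name."""
    results = {}
    for name in benchmarks:
        cmd = [
            "benchbox", "run", name,
            "--platform=clickhouse", "--mode=local",
            f"--scale-factor={scale_factor}",
        ]
        started = time.perf_counter()
        completed = runner(cmd, check=False)  # keep going if one benchmark fails
        results[name] = {
            "returncode": completed.returncode,
            "seconds": time.perf_counter() - started,
        }
    return results


if __name__ == "__main__":
    for name, info in run_benchmarks(["tpch", "tpcds", "ssb"]).items():
        print(f"{name}: exit {info['returncode']} in {info['seconds']:.1f}s")
```

The measured time includes process startup and data generation, so it is a coarse wall-clock figure, not a per-query timing.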
Technical Details¶
Architecture¶
chDB Integration: Uses official ClickHouse local engine
Connection Management: Persistent connection maintains table state
Query Execution: Direct SQL execution without network overhead
Result Processing: Native Python data type conversion
Error Handling: Comprehensive error messages with resolution guidance
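The point about a persistent connection keeping table state can be illustrated with chDB's session API. This is a sketch, not BenchBox's implementation: whether BenchBox uses chdb.session.Session internally is an assumption, and the database and table names are made up for the example.

```python
def session_demo():
    """Create a table, insert rows, and query it within one chDB session.

    Returns the query result as text, or None when chDB is not installed.
    """
    try:
        from chdb import session as chs
    except ImportError:
        return None

    # Without a path argument the session stores its data in a temporary directory.
    sess = chs.Session()
    sess.query("CREATE DATABASE IF NOT EXISTS demo ENGINE = Atomic")
    sess.query("CREATE TABLE IF NOT EXISTS demo.numbers (x Int32) ENGINE = Log")
    sess.query("INSERT INTO demo.numbers VALUES (1), (2), (3)")
    # The table created above is still visible to later statements
    # because they all go through the same session object.
    result = sess.query("SELECT sum(x) FROM demo.numbers", "CSV")
    return result.bytes().decode("utf-8").strip()


if __name__ == "__main__":
    print(session_demo())
```

By contrast, the stateless chdb.query function does not carry tables from one call to the next, which is why a session (or an equivalent persistent connection) is needed when loading benchmark tables first and querying them afterwards.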
File Formats¶
Embedded mode currently supports the following file formats:
CSV, TSV (tab-separated)
Parquet (future enhancement)
JSON (future enhancement)
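For checking a generated CSV file outside BenchBox, chDB can read it with ClickHouse's file() table function. The sketch below is independent of how BenchBox loads data; the helper names are hypothetical, and the sample file is written only so the example is self-contained.

```python
import csv
import os
import tempfile


def write_sample_csv(directory):
    """Write a three-row CSV file with a header line and return its path."""
    path = os.path.join(directory, "sample.csv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "name"])
        writer.writerows([[1, "a"], [2, "b"], [3, "c"]])
    return path


def count_rows_in_csv(path):
    """Count data rows via chDB, or return None when chDB is not installed.

    The path is interpolated into SQL, so it must not contain single quotes.
    """
    try:
        import chdb
    except ImportError:
        return None
    # CSVWithNames tells ClickHouse that the first line holds column names.
    sql = f"SELECT count() FROM file('{path}', 'CSVWithNames')"
    return int(chdb.query(sql, "CSV").bytes().decode("utf-8").strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(count_rows_in_csv(write_sample_csv(tmp)))
```

For tab-separated files, the corresponding ClickHouse format names are TSV and TSVWithNames.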
Limitations¶
Single Process: No multi-process parallelism
Memory Bounds: Limited by available system memory
No Clustering: Single-node execution only
No Replication: No built-in data redundancy
Migration Guide¶
From Server to Embedded Mode¶
# Old server mode command
benchbox run tpch --platform=clickhouse --host=localhost --port=9000
# New embedded mode equivalent
benchbox run tpch --platform=clickhouse --mode=local
From Embedded to Server Mode¶
# Current embedded mode command
benchbox run tpch --platform=clickhouse --mode=local
# Server mode equivalent (requires ClickHouse server)
benchbox run tpch --platform=clickhouse --mode=server --host=localhost --port=9000
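For scripts that store server-mode invocations, the conversion shown above can be automated by dropping the server-only flags and setting the mode. This sketch handles only the --flag=value spelling used on this page; the function name to_local_mode is hypothetical.

```python
SERVER_ONLY_FLAGS = ("--host", "--port", "--user", "--password", "--secure")


def to_local_mode(args):
    """Return a copy of args with server-only flags removed and --mode=local set."""
    converted = []
    for arg in args:
        flag = arg.split("=", 1)[0]
        if flag in SERVER_ONLY_FLAGS or flag == "--mode":
            continue  # dropped: not used in embedded mode, or replaced below
        converted.append(arg)
    converted.append("--mode=local")
    return converted
```

For example, the old server-mode command ["benchbox", "run", "tpch", "--platform=clickhouse", "--host=localhost", "--port=9000"] becomes ["benchbox", "run", "tpch", "--platform=clickhouse", "--mode=local"].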
Contributing¶
To contribute to ClickHouse local mode support:
Testing: Run the embedded mode test suite
pytest tests/unit/platforms/test_clickhouse_local.py -v
Development: Set up development environment
pip install -e ".[dev]"
pip install chdb
Bug Reports: Include system information and chDB version
python -c "import chdb, platform; print(f'chDB: {chdb.__version__}, Platform: {platform.platform()}')"