Troubleshooting Guide

Tags: intermediate, guide

This guide helps diagnose and resolve common issues when running BenchBox benchmarks.

Quick Diagnosis

Error Type Matrix

| Error Message Contains | Likely Cause | Jump To |
|---|---|---|
| “connection refused” | Platform not running | Connection Issues |
| “authentication failed” | Invalid credentials | Authentication Failures |
| “catalog not found” | Presto/Trino config | Catalog Not Found (Presto/Trino) |
| “permission denied” | Access rights | Permission Errors |
| “timeout” | Slow query/network | Query Timeouts |
| “out of memory” | Scale too large | Memory Issues |
| “ImportError” | Missing package | Missing Platform Dependencies |
| “file not found” | Data loading | Data Loading Issues |
| “command not found: benchbox” | PATH issue | Installation Issues |

Installation Issues

command not found: benchbox

Problem: The benchbox command is not available in your shell after installation.

Solution:

  1. Check your PATH: Ensure that the Python scripts directory is in your system’s PATH. You can find the directory by running:

    python -m site --user-base
    

    Add the bin subdirectory of that path to your PATH.
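
    For example, on bash or zsh:

    export PATH="$(python -m site --user-base)/bin:$PATH"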

  2. Reactivate your virtual environment: If you installed BenchBox in a virtual environment, make sure it’s activated:

    source .venv/bin/activate
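
In either case, confirm that the shell can now resolve the command:

which benchbox
benchbox --version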
    

Missing Platform Dependencies

Problem: You get an ImportError when trying to use a specific database platform.

Solution:

Install the required dependencies for that platform:

# Specific platforms
pip install "benchbox[snowflake]"
pip install "benchbox[databricks]"
pip install "benchbox[bigquery]"

# All cloud platforms
pip install "benchbox[cloud]"

# DataFrame platforms
pip install "benchbox[dataframe]"

Check installation status:

benchbox platforms list  # Shows available/unavailable
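
You can also test a driver import directly; for example, the Snowflake connector (assuming the snowflake extra installs snowflake-connector-python):

python -c "import snowflake.connector"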

Shell Reports “no matches found”

Problem: Shells such as zsh treat square brackets as glob patterns, producing errors like zsh: no matches found: benchbox[cloud].

Solution: Use modern uv syntax (no quotes needed):

uv add benchbox --extra cloud
uv add benchbox --extra cloud --extra clickhouse

Or quote the pip-compatible syntax:

uv pip install "benchbox[cloud]"
python -m pip install "benchbox[cloud,clickhouse]"
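
Alternatively, zsh can bypass glob expansion for a single command with its built-in noglob modifier:

noglob pip install benchbox[cloud]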

Connection Issues

Connection Refused

Symptoms:

ConnectionRefusedError: [Errno 111] Connection refused
OperationalError: could not connect to server

Diagnosis:

# Check if service is running
curl -s http://localhost:3473/health  # Firebolt Core
curl -s http://localhost:8080         # Trino/Presto

# Docker platforms
docker ps | grep -E 'trino|presto|clickhouse|firebolt'

Solutions:

  1. Start the platform:

    # Trino
    docker run -d -p 8080:8080 trinodb/trino
    
    # Firebolt Core
    docker run -d -p 3473:3473 ghcr.io/firebolt-db/firebolt-core:preview-rc
    
    # ClickHouse
    docker run -d -p 9000:9000 clickhouse/clickhouse-server
    
  2. Check port availability:

    lsof -i :8080  # Check if port is in use
    netstat -an | grep 8080
    
  3. Verify host/port in config:

    benchbox run --platform trino --benchmark tpch \
      --platform-option host=localhost \
      --platform-option port=8080
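
Note that a freshly started container may take a few seconds to accept connections. A small polling loop avoids spurious “connection refused” errors; this sketch targets Trino’s /v1/info endpoint (adjust the URL for other platforms):

# Wait up to 30 seconds for the service to become ready
for i in $(seq 1 30); do
  curl -sf http://localhost:8080/v1/info >/dev/null && break
  sleep 1
done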
    

Network Timeout

Symptoms:

TimeoutError: Connection timed out
socket.timeout: timed out

Solutions:

  1. Cloud platforms - check firewall:

    # AWS Security Groups
    aws ec2 describe-security-groups --group-ids sg-xxx
    
    # Test connectivity
    nc -zv your-cluster.redshift.amazonaws.com 5439
    
  2. Increase connection timeout:

    benchbox run --platform snowflake --benchmark tpch \
      --platform-option connect_timeout=60
    

Authentication Failures

Invalid Credentials

Symptoms:

AuthenticationError: Invalid credentials
401 Unauthorized
Access Denied

Platform-Specific Solutions:

Snowflake

# Verify credentials work
snowsql -a $SNOWFLAKE_ACCOUNT -u $SNOWFLAKE_USER

# Check account format (should be account_locator.region)
echo $SNOWFLAKE_ACCOUNT
# Correct: xy12345.us-east-1 or xy12345.us-east-1.aws

Databricks

# Test token validity
curl -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  https://your-workspace.cloud.databricks.com/api/2.0/clusters/list

# Regenerate token if expired (90 days default)
# User Settings > Developer > Access Tokens

BigQuery

# Test service account
gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
gcloud auth application-default print-access-token

# Verify project access
gcloud projects describe $BIGQUERY_PROJECT

Redshift

# Test connection
psql -h $REDSHIFT_HOST -p 5439 -U $REDSHIFT_USER -d dev

# For IAM auth, verify role
aws sts get-caller-identity

Token Expired

Symptoms:

Token has expired
Session expired

Solutions:

  1. Regenerate tokens:

    • Databricks: User Settings > Access Tokens > Generate New

    • Snowflake: Tokens don’t expire; verify the password instead

    • BigQuery: gcloud auth application-default login

  2. Use refresh tokens where supported:

    # BigQuery - auto-refresh with ADC
    gcloud auth application-default login
    

Catalog Not Found (Presto/Trino)

Symptoms:

ConfigurationError: Catalog 'memory' not found
CatalogNotFoundError: Catalog does not exist

Cause: The default catalog, memory, rarely exists on production servers.

Solutions:

  1. List available catalogs:

    # Trino/Presto
    benchbox platforms check --platform trino \
      --platform-option host=localhost
    
  2. Specify the correct catalog:

    benchbox run --platform trino --benchmark tpch \
      --platform-option catalog=hive     # Or iceberg, delta, etc.
    
  3. Common catalog names:

    • hive - Hive Metastore

    • iceberg - Apache Iceberg

    • delta - Delta Lake

    • tpch - TPC-H connector (built-in)

    • mysql, postgresql - Database connectors
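
If the Trino CLI is installed, you can also list catalogs directly (assuming a local server on port 8080):

trino --server http://localhost:8080 --execute "SHOW CATALOGS"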

Permission Errors

Insufficient Privileges

Symptoms:

PermissionDenied: User does not have permission
AccessDenied: Access Denied

Solutions by Platform:

Snowflake

-- Grant required permissions
GRANT USAGE ON WAREHOUSE compute_wh TO ROLE benchbox_role;
GRANT CREATE DATABASE ON ACCOUNT TO ROLE benchbox_role;
GRANT USAGE ON DATABASE benchbox TO ROLE benchbox_role;
GRANT CREATE TABLE ON SCHEMA benchbox.public TO ROLE benchbox_role;
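
To confirm the grants took effect, inspect them with snowsql (role name carried over from the example above):

snowsql -q "SHOW GRANTS TO ROLE benchbox_role;"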

Databricks

-- Unity Catalog permissions
GRANT USE CATALOG ON CATALOG benchmarks TO `user@company.com`;
GRANT CREATE SCHEMA ON CATALOG benchmarks TO `user@company.com`;
GRANT USE SCHEMA ON SCHEMA benchmarks.default TO `user@company.com`;

BigQuery

# Grant via gcloud
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="user:user@company.com" \
  --role="roles/bigquery.dataEditor"

Redshift

-- Grant schema access
GRANT CREATE ON DATABASE dev TO benchbox_user;
GRANT ALL ON SCHEMA public TO benchbox_user;

Query Timeouts

Symptoms:

QueryTimeoutError: Query exceeded time limit
Statement timeout

Solutions:

  1. Increase query timeout:

    # Global
    benchbox run --platform snowflake --benchmark tpch \
      --platform-option query_timeout=3600  # 1 hour
    
    # Redshift
    benchbox run --platform redshift --benchmark tpch \
      --platform-option statement_timeout=3600000  # ms
    
  2. Use larger compute resources:

    # Snowflake - larger warehouse
    benchbox run --platform snowflake --benchmark tpch --scale 10 \
      --platform-option warehouse=LARGE_WH
    
    # Databricks - larger SQL warehouse
    benchbox run --platform databricks --benchmark tpch --scale 10 \
      --platform-option http_path=/sql/1.0/warehouses/large_wh_id
    
  3. Reduce scale factor for testing:

    # Start small
    benchbox run --platform snowflake --benchmark tpch --scale 0.1
    

Memory Issues

Out of Memory

Symptoms:

MemoryError: Unable to allocate
OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space

Solutions by Platform:

DuckDB

# Limit memory and enable spilling
benchbox run --platform duckdb --benchmark tpch --scale 10 \
  --platform-option memory_limit=8GB \
  --platform-option temp_directory=/fast/ssd/tmp
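
Before settling on a memory_limit, check the physical RAM actually available; the command differs by OS:

# Linux
free -h

# macOS (prints total RAM in bytes)
sysctl -n hw.memsize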

Polars

# Enable streaming for large datasets
benchbox run --platform polars-df --benchmark tpch --scale 10 \
  --platform-option streaming=true

Spark

# Increase executor memory
benchbox run --platform spark --benchmark tpch --scale 10 \
  --platform-option executor_memory=8g \
  --platform-option driver_memory=4g

Cloud Platforms

# Use larger compute tiers
benchbox run --platform snowflake --benchmark tpch --scale 100 \
  --platform-option warehouse=X_LARGE_WH

Scale Factor Recommendations

| Platform | Max Recommended SF | Notes |
|---|---|---|
| DuckDB | 10-100 | Depends on RAM |
| SQLite | 0.1-1.0 | Not for OLAP |
| Polars | 10-100 | Enable streaming |
| Snowflake | 1000+ | Scale warehouse |
| Databricks | 1000+ | Scale cluster |
| BigQuery | 1000+ | Serverless |

Data Generation Issues

dbgen or dsdgen not found

Problem: TPC-H or TPC-DS data generation fails with an error indicating that dbgen or dsdgen is not found.

Solution:

BenchBox attempts to compile these tools automatically, but if that fails, you may need to compile them manually.

  1. Navigate to the tools directory:

    # For TPC-H
    cd _sources/tpc-h/dbgen
    
    # For TPC-DS
    cd _sources/tpc-ds/tools
    
  2. Compile the tools:

    make
    

    If you encounter compilation errors, you may need to install a C compiler and other build tools (build-essential on Debian/Ubuntu, Xcode Command Line Tools on macOS).
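
    For example:

    # Debian/Ubuntu
    sudo apt-get install build-essential

    # macOS
    xcode-select --install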

Slow Data Generation

Problem: Data generation is taking a very long time.

Solution:

  • Use a smaller scale factor: For testing and development, use a small scale factor like 0.01.

  • Run power-only cycles first: benchbox run --phases generate,load,power lets you warm caches before expanding to throughput tests.

  • Check disk throughput: Write data to a fast local volume before copying it to network storage. Use the --output flag to point at SSD-backed paths.

  • Persist generated data: Reuse existing datasets with --force turned off (default) so future runs skip regeneration.
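
Putting these together, a quick development run might look like this (the --output path is illustrative; point it at fast local storage):

benchbox run --platform duckdb --benchmark tpch --scale 0.01 \
  --phases generate,load,power \
  --output /mnt/fast-ssd/benchbox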

Data Loading Issues

File Not Found

Symptoms:

FileNotFoundError: Data file not found
No such file or directory

Solutions:

  1. Generate data first:

    # Explicit generation
    benchbox run --platform duckdb --benchmark tpch --scale 0.1 \
      --phases generate
    
  2. Check data directory:

    ls -la ~/.cache/benchbox/tpch/sf0.1/
    
  3. Force regeneration:

    benchbox run --platform duckdb --benchmark tpch --scale 0.1 \
      --force datagen
    

Upload Failures

Symptoms:

UploadError: Failed to upload file
S3 upload failed
Storage access denied

Solutions:

Cloud Storage Staging

# Verify storage access
aws s3 ls s3://your-bucket/benchbox/

# Test write access
aws s3 cp test.txt s3://your-bucket/benchbox/

# Configure staging
benchbox run --platform redshift --benchmark tpch --scale 10 \
  --staging-root s3://your-bucket/benchbox/

Snowflake Stages

# List stages
snowsql -q "SHOW STAGES;"

# Create a named stage (the user stage @~ always exists and needs no creation)
snowsql -q "CREATE STAGE IF NOT EXISTS benchbox_stage;"

Databricks Volumes

# Check volume permissions
databricks volumes list /Volumes/catalog/schema/

# Create volume
databricks volumes create catalog.schema.benchbox_data

Platform-Specific Issues

Snowflake

Warehouse Suspended

# Resume warehouse
snowsql -q "ALTER WAREHOUSE BENCHMARK_WH RESUME;"

# Set auto-resume
snowsql -q "ALTER WAREHOUSE BENCHMARK_WH SET AUTO_RESUME = TRUE;"

Databricks

Cluster Not Running

# Start cluster via API
curl -X POST "https://workspace.cloud.databricks.com/api/2.0/clusters/start" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"cluster_id": "your-cluster-id"}'

# Or use SQL Warehouse (always-on option available)

BigQuery

Quota Exceeded

# Check quotas
gcloud compute project-info describe --project $PROJECT_ID

# Request increase via Console
# BigQuery > Quotas > Request Increase

General Tips

  • Increase verbosity: The -v, -vv, or --verbose flags provide progressively more detailed output to help you diagnose issues.

  • Check the logs: BenchBox creates log files in the output directory. These can contain valuable information for troubleshooting.

  • Start small: When testing a new setup, start with a small scale factor (e.g., 0.01) to quickly verify that everything is working correctly.

Getting Help

Diagnostic Information

When reporting issues, include:

# System info
benchbox --version
python --version
uname -a

# Platform availability
benchbox platforms list

# Full error with traceback
benchbox run --platform <platform> --benchmark tpch --scale 0.01 \
  --verbose 2>&1 | tee benchmark_error.log

Resources