Redshift Platform¶
Amazon Redshift is a fully managed petabyte-scale data warehouse service in AWS. BenchBox provides comprehensive support for Redshift benchmarking with S3-based data loading via COPY command.
Features¶
Columnar storage - Optimized for analytics
Massively parallel processing - Distributed query execution
RA3 nodes - Managed storage with S3 backing
Concurrency scaling - Automatic burst capacity
Redshift Spectrum - Query S3 directly
Prerequisites¶
Amazon Redshift cluster (provisioned or Serverless)
IAM role with S3 access for COPY command
S3 bucket for data staging
VPC security group allowing inbound connections
Installation¶
# Install Redshift connector
pip install redshift_connector boto3
# Or via BenchBox extras
pip install "benchbox[redshift]"
Configuration¶
Environment Variables (Recommended)¶
export REDSHIFT_HOST=your-cluster.abc123xyz.us-east-1.redshift.amazonaws.com
export REDSHIFT_USER=admin
export REDSHIFT_PASSWORD=your_password
export REDSHIFT_DATABASE=dev
export REDSHIFT_IAM_ROLE=arn:aws:iam::123456789:role/RedshiftS3Access
Interactive Setup¶
benchbox platforms setup --platform redshift
CLI Options¶
benchbox run --platform redshift --benchmark tpch --scale 1.0 \
--platform-option host=your-cluster.abc123xyz.us-east-1.redshift.amazonaws.com \
--platform-option user=admin \
--platform-option password=your_password \
--platform-option database=dev
Platform Options¶
Option |
Default |
Description |
|---|---|---|
|
(env) |
Cluster endpoint |
|
(env) |
Database username |
|
(env) |
Database password |
|
dev |
Database name |
|
5439 |
Connection port |
|
(env) |
IAM role ARN for COPY |
|
(auto) |
AWS region |
|
true |
Enable SSL connection |
Authentication Methods¶
Standard Authentication¶
# Username/password auth
benchbox run --platform redshift --benchmark tpch --scale 1.0 \
--platform-option host=cluster.abc123.us-east-1.redshift.amazonaws.com \
--platform-option user=admin \
--platform-option password=secure_password
IAM Authentication¶
# IAM database authentication
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
benchbox run --platform redshift --benchmark tpch \
--platform-option host=cluster.abc123.us-east-1.redshift.amazonaws.com \
--platform-option user=iam_user \
--platform-option iam_auth=true
Redshift Serverless¶
# Serverless workgroup
benchbox run --platform redshift --benchmark tpch \
--platform-option workgroup_name=default \
--platform-option database=dev
Usage Examples¶
Basic Benchmark¶
# TPC-H at scale factor 1
benchbox run --platform redshift --benchmark tpch --scale 1.0
With S3 Staging¶
# Configure S3 staging for COPY command
benchbox run --platform redshift --benchmark tpch --scale 100.0 \
--staging-root s3://your-bucket/benchbox/ \
--platform-option iam_role=arn:aws:iam::123456789:role/RedshiftS3Access
With Tuning¶
# Apply distribution and sort keys
benchbox run --platform redshift --benchmark tpch --scale 10.0 \
--tuning tuned
Python API¶
from benchbox import TPCH
from benchbox.platforms.redshift import RedshiftAdapter
adapter = RedshiftAdapter(
host="cluster.abc123.us-east-1.redshift.amazonaws.com",
user="admin",
password="secure_password",
database="dev",
iam_role="arn:aws:iam::123456789:role/RedshiftS3Access",
)
benchmark = TPCH(scale_factor=1.0)
benchmark.generate_data()
adapter.load_benchmark(benchmark)
results = adapter.run_benchmark(benchmark)
Node Types¶
RA3 Nodes (Recommended)¶
Node Type |
vCPU |
Memory |
Storage |
|---|---|---|---|
ra3.xlplus |
4 |
32 GB |
Managed |
ra3.4xlarge |
12 |
96 GB |
Managed |
ra3.16xlarge |
48 |
384 GB |
Managed |
DC2 Nodes (Dense Compute)¶
Node Type |
vCPU |
Memory |
Storage |
|---|---|---|---|
dc2.large |
2 |
15 GB |
160 GB |
dc2.8xlarge |
32 |
244 GB |
2.56 TB |
Serverless¶
# Use serverless for variable workloads
benchbox run --platform redshift --benchmark tpch \
--platform-option workgroup_name=benchbox-wg
Performance Features¶
Distribution Keys¶
BenchBox applies distribution keys with --tuning tuned:
-- Key distribution for large tables
CREATE TABLE lineitem (...)
DISTKEY (l_orderkey)
SORTKEY (l_shipdate);
-- All distribution for small tables
CREATE TABLE nation (...)
DISTSTYLE ALL;
Sort Keys¶
-- Compound sort key
CREATE TABLE orders (...)
COMPOUND SORTKEY (o_orderdate, o_custkey);
-- Interleaved sort key (for multiple filter columns)
CREATE TABLE lineitem (...)
INTERLEAVED SORTKEY (l_shipdate, l_receiptdate);
WLM Configuration¶
BenchBox uses dedicated queue for benchmark queries:
# Configure WLM queue
benchbox run --platform redshift --benchmark tpch \
--platform-option query_group=benchbox \
--platform-option concurrency_scaling=on
Data Loading¶
COPY from S3 (Recommended)¶
# Configure S3 staging
benchbox run --platform redshift --benchmark tpch --scale 10.0 \
--staging-root s3://bucket/benchbox/ \
--platform-option iam_role=arn:aws:iam::123:role/RedshiftS3
IAM Role Setup¶
# Create IAM role for COPY
aws iam create-role --role-name RedshiftS3Access \
--assume-role-policy-document file://trust-policy.json
# Attach S3 read policy
aws iam attach-role-policy --role-name RedshiftS3Access \
--policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
# Associate with cluster
aws redshift modify-cluster-iam-roles \
--cluster-identifier my-cluster \
--add-iam-roles arn:aws:iam::123456789:role/RedshiftS3Access
Direct Insert (Small Datasets)¶
For datasets under 1GB, direct INSERT is used automatically:
benchbox run --platform redshift --benchmark tpch --scale 0.01
Cost Optimization¶
Concurrency Scaling¶
# Enable concurrency scaling for burst capacity
benchbox run --platform redshift --benchmark tpch \
--platform-option concurrency_scaling=auto
Pause/Resume¶
# Pause cluster when not in use
aws redshift pause-cluster --cluster-identifier my-cluster
# Resume before benchmark
aws redshift resume-cluster --cluster-identifier my-cluster
Serverless RPU¶
# Set max RPU to control costs
benchbox run --platform redshift --benchmark tpch \
--platform-option workgroup_name=benchbox \
--platform-option max_rpu=128
Troubleshooting¶
Connection Refused¶
# Check cluster status
aws redshift describe-clusters --cluster-identifier my-cluster
# Verify security group allows your IP
aws ec2 describe-security-groups --group-ids sg-xxx
# Add your IP to security group
aws ec2 authorize-security-group-ingress \
--group-id sg-xxx \
--protocol tcp \
--port 5439 \
--cidr YOUR_IP/32
COPY Failed¶
# Check COPY errors
SELECT * FROM stl_load_errors ORDER BY starttime DESC LIMIT 10;
# Verify IAM role is attached
aws redshift describe-clusters --cluster-identifier my-cluster \
--query 'Clusters[0].IamRoles'
# Test S3 access
SELECT * FROM svl_s3list
WHERE bucket = 'your-bucket';
Permission Denied¶
-- Grant required permissions
GRANT CREATE ON DATABASE dev TO benchbox_user;
GRANT CREATE ON SCHEMA public TO benchbox_user;
GRANT ALL ON ALL TABLES IN SCHEMA public TO benchbox_user;
Query Timeout¶
# Increase statement timeout
benchbox run --platform redshift --benchmark tpch \
--platform-option statement_timeout=3600000 # 1 hour in ms
Disk Space Exceeded¶
# Check disk usage
SELECT owner, host, diskno, used, capacity
FROM stv_partitions
ORDER BY used DESC;
# Vacuum to reclaim space
VACUUM FULL lineitem;