Redshift Platform

Tags intermediate guide redshift cloud-platform

Amazon Redshift is a fully managed petabyte-scale data warehouse service in AWS. BenchBox provides comprehensive support for Redshift benchmarking with S3-based data loading via COPY command.

Features

  • Columnar storage - Optimized for analytics

  • Massively parallel processing - Distributed query execution

  • RA3 nodes - Managed storage with S3 backing

  • Concurrency scaling - Automatic burst capacity

  • Redshift Spectrum - Query S3 directly

Prerequisites

  • Amazon Redshift cluster (provisioned or Serverless)

  • IAM role with S3 access for COPY command

  • S3 bucket for data staging

  • VPC security group allowing inbound connections

Installation

# Install Redshift connector
pip install redshift_connector boto3

# Or via BenchBox extras
pip install "benchbox[redshift]"

Configuration

Interactive Setup

benchbox platforms setup --platform redshift

CLI Options

benchbox run --platform redshift --benchmark tpch --scale 1.0 \
  --platform-option host=your-cluster.abc123xyz.us-east-1.redshift.amazonaws.com \
  --platform-option user=admin \
  --platform-option password=your_password \
  --platform-option database=dev

Platform Options

Option

Default

Description

host

(env)

Cluster endpoint

user

(env)

Database username

password

(env)

Database password

database

dev

Database name

port

5439

Connection port

iam_role

(env)

IAM role ARN for COPY

region

(auto)

AWS region

ssl

true

Enable SSL connection

Authentication Methods

Standard Authentication

# Username/password auth
benchbox run --platform redshift --benchmark tpch --scale 1.0 \
  --platform-option host=cluster.abc123.us-east-1.redshift.amazonaws.com \
  --platform-option user=admin \
  --platform-option password=secure_password

IAM Authentication

# IAM database authentication
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

benchbox run --platform redshift --benchmark tpch \
  --platform-option host=cluster.abc123.us-east-1.redshift.amazonaws.com \
  --platform-option user=iam_user \
  --platform-option iam_auth=true

Redshift Serverless

# Serverless workgroup
benchbox run --platform redshift --benchmark tpch \
  --platform-option workgroup_name=default \
  --platform-option database=dev

Usage Examples

Basic Benchmark

# TPC-H at scale factor 1
benchbox run --platform redshift --benchmark tpch --scale 1.0

With S3 Staging

# Configure S3 staging for COPY command
benchbox run --platform redshift --benchmark tpch --scale 100.0 \
  --staging-root s3://your-bucket/benchbox/ \
  --platform-option iam_role=arn:aws:iam::123456789:role/RedshiftS3Access

With Tuning

# Apply distribution and sort keys
benchbox run --platform redshift --benchmark tpch --scale 10.0 \
  --tuning tuned

Python API

from benchbox import TPCH
from benchbox.platforms.redshift import RedshiftAdapter

adapter = RedshiftAdapter(
    host="cluster.abc123.us-east-1.redshift.amazonaws.com",
    user="admin",
    password="secure_password",
    database="dev",
    iam_role="arn:aws:iam::123456789:role/RedshiftS3Access",
)

benchmark = TPCH(scale_factor=1.0)
benchmark.generate_data()
adapter.load_benchmark(benchmark)
results = adapter.run_benchmark(benchmark)

Node Types

DC2 Nodes (Dense Compute)

Node Type

vCPU

Memory

Storage

dc2.large

2

15 GB

160 GB

dc2.8xlarge

32

244 GB

2.56 TB

Serverless

# Use serverless for variable workloads
benchbox run --platform redshift --benchmark tpch \
  --platform-option workgroup_name=benchbox-wg

Performance Features

Distribution Keys

BenchBox applies distribution keys with --tuning tuned:

-- Key distribution for large tables
CREATE TABLE lineitem (...)
DISTKEY (l_orderkey)
SORTKEY (l_shipdate);

-- All distribution for small tables
CREATE TABLE nation (...)
DISTSTYLE ALL;

Sort Keys

-- Compound sort key
CREATE TABLE orders (...)
COMPOUND SORTKEY (o_orderdate, o_custkey);

-- Interleaved sort key (for multiple filter columns)
CREATE TABLE lineitem (...)
INTERLEAVED SORTKEY (l_shipdate, l_receiptdate);

WLM Configuration

BenchBox uses dedicated queue for benchmark queries:

# Configure WLM queue
benchbox run --platform redshift --benchmark tpch \
  --platform-option query_group=benchbox \
  --platform-option concurrency_scaling=on

Data Loading

IAM Role Setup

# Create IAM role for COPY
aws iam create-role --role-name RedshiftS3Access \
  --assume-role-policy-document file://trust-policy.json

# Attach S3 read policy
aws iam attach-role-policy --role-name RedshiftS3Access \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# Associate with cluster
aws redshift modify-cluster-iam-roles \
  --cluster-identifier my-cluster \
  --add-iam-roles arn:aws:iam::123456789:role/RedshiftS3Access

Direct Insert (Small Datasets)

For datasets under 1GB, direct INSERT is used automatically:

benchbox run --platform redshift --benchmark tpch --scale 0.01

Cost Optimization

Concurrency Scaling

# Enable concurrency scaling for burst capacity
benchbox run --platform redshift --benchmark tpch \
  --platform-option concurrency_scaling=auto

Pause/Resume

# Pause cluster when not in use
aws redshift pause-cluster --cluster-identifier my-cluster

# Resume before benchmark
aws redshift resume-cluster --cluster-identifier my-cluster

Serverless RPU

# Set max RPU to control costs
benchbox run --platform redshift --benchmark tpch \
  --platform-option workgroup_name=benchbox \
  --platform-option max_rpu=128

Troubleshooting

Connection Refused

# Check cluster status
aws redshift describe-clusters --cluster-identifier my-cluster

# Verify security group allows your IP
aws ec2 describe-security-groups --group-ids sg-xxx

# Add your IP to security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxx \
  --protocol tcp \
  --port 5439 \
  --cidr YOUR_IP/32

COPY Failed

# Check COPY errors
SELECT * FROM stl_load_errors ORDER BY starttime DESC LIMIT 10;

# Verify IAM role is attached
aws redshift describe-clusters --cluster-identifier my-cluster \
  --query 'Clusters[0].IamRoles'

# Test S3 access
SELECT * FROM svl_s3list
WHERE bucket = 'your-bucket';

Permission Denied

-- Grant required permissions
GRANT CREATE ON DATABASE dev TO benchbox_user;
GRANT CREATE ON SCHEMA public TO benchbox_user;
GRANT ALL ON ALL TABLES IN SCHEMA public TO benchbox_user;

Query Timeout

# Increase statement timeout
benchbox run --platform redshift --benchmark tpch \
  --platform-option statement_timeout=3600000  # 1 hour in ms

Disk Space Exceeded

# Check disk usage
SELECT owner, host, diskno, used, capacity
FROM stv_partitions
ORDER BY used DESC;

# Vacuum to reclaim space
VACUUM FULL lineitem;