Trino Platform¶


Trino (formerly PrestoSQL) is a distributed SQL query engine designed for interactive analytics against data sources of all sizes. It’s widely used by companies like Netflix, Airbnb, and Lyft for data lake analytics.

Features¶

Distributed execution - Query data across multiple workers
Federated queries - Join data from multiple sources (S3, Hive, Iceberg, Delta)
Session properties - Fine-grained query optimization
Table formats - Support for Iceberg, Delta Lake, and Hive
Starburst compatible - Works with Starburst Enterprise

Important Notes¶

Trino vs PrestoDB: This adapter supports Trino only, NOT PrestoDB (the original project, started at Facebook, now Meta). Trino began as the PrestoSQL fork in 2019, and the two projects have diverged significantly since then:

Different Python drivers (trino vs presto-python-client)
Different HTTP headers (X-Trino-* vs X-Presto-*)
Diverging SQL syntax and functions
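
The header difference is the easiest one to see in code. The following is a minimal sketch, not part of the adapter: `client_headers` is a hypothetical helper that builds the user, catalog, and schema headers for either engine, using only the `X-Trino-*` and `X-Presto-*` prefixes named above.

```python
# Hypothetical helper for illustration; the real drivers build these headers
# internally, so you normally never write this yourself.
HEADER_PREFIXES = {
    "trino": "X-Trino-",
    "prestodb": "X-Presto-",
}


def client_headers(engine, user, catalog, schema):
    """Return the per-request protocol headers for the given engine."""
    if engine not in HEADER_PREFIXES:
        raise ValueError(f"unknown engine: {engine!r}")
    prefix = HEADER_PREFIXES[engine]
    return {
        f"{prefix}User": user,
        f"{prefix}Catalog": catalog,
        f"{prefix}Schema": schema,
    }


print(client_headers("trino", "analyst", "hive", "benchmark"))
print(client_headers("prestodb", "analyst", "hive", "benchmark"))
```

A client that sends one family of headers to the other engine is typically rejected, which is one reason the two adapters are kept separate.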

For PrestoDB, use the Presto adapter. For AWS-managed Trino, use the Athena adapter.

Installation¶

# Install Trino Python driver
pip install trino

# Or install with Kerberos authentication support
pip install "trino[kerberos]"

Configuration¶

Environment Variables¶

export TRINO_HOST=localhost
export TRINO_PORT=8080
export TRINO_USER=trino
export TRINO_CATALOG=memory
export TRINO_SCHEMA=default
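
To illustrate how these variables map onto connection settings, here is a minimal sketch that reads them with the defaults shown in this guide. The function name `trino_settings` and the returned key names are assumptions made for the example; how BenchBox itself resolves the variables is not shown here.

```python
import os

# Variable names come from the export lines above; the fallback values are the
# defaults listed in this guide.
ENV_DEFAULTS = {
    "TRINO_HOST": "localhost",
    "TRINO_PORT": "8080",
    "TRINO_USER": "trino",
    "TRINO_CATALOG": "memory",
    "TRINO_SCHEMA": "default",
}


def trino_settings(env=None):
    """Resolve connection settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = {name: env.get(name, default) for name, default in ENV_DEFAULTS.items()}
    return {
        "host": raw["TRINO_HOST"],
        "port": int(raw["TRINO_PORT"]),  # ports arrive as strings from the shell
        "username": raw["TRINO_USER"],
        "catalog": raw["TRINO_CATALOG"],
        "schema": raw["TRINO_SCHEMA"],
    }


print(trino_settings())
```

Passing a plain dict instead of `os.environ` makes the helper easy to check without touching the real environment.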

CLI Options¶

benchbox run --platform trino --benchmark tpch --scale 1.0 \
  --platform-option host=trino-coordinator.example.com \
  --platform-option port=8080 \
  --platform-option catalog=hive \
  --platform-option schema=benchmark
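
When benchmarks are launched from a script, the repeated `--platform-option key=value` pairs can be generated from a dictionary. The sketch below assumes only the flags shown in the command above; `benchbox_command` is a hypothetical helper, not a BenchBox API.

```python
import shlex


def benchbox_command(benchmark, scale, options):
    """Build the argument list for a `benchbox run` call against Trino."""
    args = [
        "benchbox", "run",
        "--platform", "trino",
        "--benchmark", benchmark,
        "--scale", str(scale),
    ]
    # Each option becomes its own --platform-option key=value pair.
    for key, value in options.items():
        args += ["--platform-option", f"{key}={value}"]
    return args


cmd = benchbox_command(
    "tpch",
    1.0,
    {
        "host": "trino-coordinator.example.com",
        "port": 8080,
        "catalog": "hive",
        "schema": "benchmark",
    },
)
print(shlex.join(cmd))  # shell-safe rendering of the same command
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.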

Platform Options¶

Option	Default	Description
`host`	localhost	Trino coordinator hostname
`port`	8080	Trino coordinator port
`catalog`	memory	Default catalog (hive, iceberg, delta, memory)
`schema`	default	Default schema
`username`	trino	Trino user for query attribution
`password`	(none)	Password for LDAP/basic auth
`http_scheme`	http/https	Auto-detected based on auth
`verify_ssl`	true	Verify SSL certificates
`staging_root`	(none)	S3/GCS path for data staging
`table_format`	hive	Table format (hive, iceberg, delta)

Usage Examples¶

Basic Benchmark Run¶

# Run TPC-H with Hive catalog
benchbox run --platform trino --benchmark tpch --scale 1.0 \
  --platform-option host=trino.example.com \
  --platform-option catalog=hive \
  --platform-option schema=tpch_sf1

Iceberg Tables¶

# Run with Iceberg table format
benchbox run --platform trino --benchmark tpch --scale 1.0 \
  --platform-option catalog=iceberg \
  --platform-option table_format=iceberg \
  --platform-option staging_root=s3://bucket/staging/
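
A malformed `staging_root` is easier to catch before a run starts. The following sketch validates the value with the standard library. The accepted schemes (`s3` and `gs`) are an assumption based on the "S3/GCS" wording in the options table, and `check_staging_root` is a hypothetical helper rather than something BenchBox provides.

```python
from urllib.parse import urlparse

# Assumption: only these object-store schemes are accepted.
ALLOWED_SCHEMES = {"s3", "gs"}


def check_staging_root(uri):
    """Validate a staging URI and return it with a trailing slash."""
    parsed = urlparse(uri)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported staging scheme in {uri!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name in {uri!r}")
    return uri if uri.endswith("/") else uri + "/"


print(check_staging_root("s3://bucket/staging"))
```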

Python API¶

from benchbox import TPCH
from benchbox.platforms.trino import TrinoAdapter

# Initialize adapter
adapter = TrinoAdapter(
    host="trino-coordinator.example.com",
    port=8080,
    catalog="hive",
    schema="benchmark",
    username="analyst",
)

# Load and run benchmark
benchmark = TPCH(scale_factor=1.0)
adapter.load_benchmark(benchmark)
results = adapter.run_benchmark(benchmark)

Deployment Modes¶

Local Development (Memory Catalog)¶

# Start Trino locally (Docker)
docker run -d -p 8080:8080 --name trino trinodb/trino

# Run benchmark with memory catalog
benchbox run --platform trino --benchmark tpch --scale 0.1 \
  --platform-option catalog=memory

Alternatively, install Trino with Homebrew and start it as a service:

brew install trino
brew services start trino
# or run it manually
trino-server run

BenchBox detects when localhost:8080 refuses connections and reports an error explaining that Trino needs to be started (including a reminder to run brew services start trino). The same message suggests pointing BenchBox at a remote Trino cluster via --platform-option host=<host> --platform-option port=<port>.
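
A check of this kind can be approximated with a plain TCP probe. The sketch below is not BenchBox's implementation, which is not shown in this guide; `coordinator_reachable` and `connection_hint` are hypothetical names, and the message text is only modeled on the description above.

```python
import socket


def coordinator_reachable(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def connection_hint(host="localhost", port=8080):
    """Return None when reachable, otherwise a message with next steps."""
    if coordinator_reachable(host, port):
        return None
    return (
        f"Cannot reach Trino at {host}:{port}. Start Trino (for example with "
        "`brew services start trino`) or point BenchBox at a remote cluster via "
        "--platform-option host=<host> --platform-option port=<port>."
    )
```

Calling `connection_hint()` before a run returns `None` when the coordinator port accepts connections. An open port does not prove that Trino has finished starting, so a probe like this only rules out the most common failure.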

Production (Hive/Iceberg)¶

# Run with Hive Metastore
benchbox run --platform trino --benchmark tpch --scale 10.0 \
  --platform-option host=trino.production.com \
  --platform-option catalog=hive \
  --platform-option staging_root=s3://data-lake/staging/

Starburst Enterprise¶

The Trino adapter is fully compatible with Starburst Enterprise:

benchbox run --platform trino --benchmark tpch \
  --platform-option host=starburst.example.com \
  --platform-option http_scheme=https \
  --platform-option verify_ssl=true

Performance Tuning¶

Session Properties¶

Common session properties for optimization:

adapter = TrinoAdapter(
    host="trino.example.com",
    session_properties={
        "query_max_memory": "8GB",
        "query_max_memory_per_node": "2GB",
        "join_distribution_type": "AUTOMATIC",
    }
)
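
The same properties can be expressed as `SET SESSION` statements, which is useful for reproducing a run by hand in the Trino CLI. This is a minimal sketch: `set_session_statements` is a hypothetical helper, and how the adapter itself applies `session_properties` is not shown in this guide.

```python
def set_session_statements(properties):
    """Render a dict of session properties as Trino SET SESSION statements."""
    statements = []
    for name, value in properties.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            literal = "true" if value else "false"
        elif isinstance(value, (int, float)):
            literal = str(value)
        else:
            # String values are single-quoted, with embedded quotes doubled.
            literal = "'" + str(value).replace("'", "''") + "'"
        statements.append(f"SET SESSION {name} = {literal}")
    return statements


for statement in set_session_statements(
    {
        "query_max_memory": "8GB",
        "query_max_memory_per_node": "2GB",
        "join_distribution_type": "AUTOMATIC",
    }
):
    print(statement)
```

Session properties apply only to the session that sets them, so statements like these must run on the same connection as the benchmark queries.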

Data Format Recommendations¶

Parquet: Best compression and performance for analytics
ORC: Good alternative with predicate pushdown
Iceberg: A table format (layered over data files such as Parquet or ORC) recommended when you need updates and time travel

Query Plan Analysis¶

benchbox run --platform trino --benchmark tpch \
  --show-query-plans

Trino provides detailed EXPLAIN output including:

Distributed query plan fragments
Data exchange patterns
Join strategies
Partition pruning
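
To inspect a single query outside BenchBox, the statement can be wrapped in Trino's `EXPLAIN` syntax. The sketch below builds such a statement. `explain_query` is a hypothetical helper, and whether `--show-query-plans` issues the same form of `EXPLAIN` is an assumption.

```python
PLAN_TYPES = {"LOGICAL", "DISTRIBUTED"}
PLAN_FORMATS = {"TEXT", "GRAPHVIZ", "JSON"}


def explain_query(sql, plan_type="DISTRIBUTED", output_format="TEXT"):
    """Wrap a SQL statement in EXPLAIN (TYPE ..., FORMAT ...)."""
    plan_type = plan_type.upper()
    output_format = output_format.upper()
    if plan_type not in PLAN_TYPES:
        raise ValueError(f"unsupported plan type: {plan_type}")
    if output_format not in PLAN_FORMATS:
        raise ValueError(f"unsupported format: {output_format}")
    body = sql.strip().rstrip(";").rstrip()  # drop a trailing semicolon
    return f"EXPLAIN (TYPE {plan_type}, FORMAT {output_format}) {body}"


print(explain_query("SELECT count(*) FROM lineitem;"))
```

The distributed plan is the one that shows fragments and exchanges; the logical plan is shorter and often enough to confirm a join order.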

Limitations¶

No embedded execution - Requires a running Trino server (a single-node Docker container is enough for local development)
Infrastructure overhead - Coordinator and workers needed
Cold start - First queries may be slower

Troubleshooting¶

Connection Timeout¶

# Verify Trino is accessible
curl http://trino-host:8080/v1/info
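
The same check can be scripted. This sketch requests `/v1/info` with the standard library and reports whether the coordinator looks ready. The function names are hypothetical, and the `starting` field is assumed from typical `/v1/info` responses, so confirm it against your Trino version.

```python
import json
import urllib.request


def fetch_info(host, port=8080, scheme="http", timeout=5.0):
    """Return the parsed /v1/info document, or None if the request fails."""
    url = f"{scheme}://{host}:{port}/v1/info"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (OSError, ValueError):  # network errors (URLError) or invalid JSON
        return None


def is_ready(info):
    """Treat the server as ready only when it reports starting == false."""
    return info is not None and info.get("starting") is False
```

For example, `is_ready(fetch_info("trino-host"))` returns `False` both when the host is unreachable and when the coordinator is still initializing, so a retry loop can treat the two cases the same way.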

Catalog Not Found¶

# List available catalogs
trino --server trino-host:8080 --execute "SHOW CATALOGS"

Memory Errors¶

-- List running queries to see what is competing for memory
SELECT * FROM system.runtime.queries WHERE state = 'RUNNING';