PrestoDB Platform Guide

Tags intermediate guide presto sql-platform

PrestoDB is Meta’s distributed SQL query engine (the original Presto project). BenchBox provides a dedicated PrestoDB adapter that intentionally diverges from the Trino adapter to respect protocol, driver, and dialect differences between the two forks.

Overview

Type: Distributed SQL query engine
Common Use Cases: Legacy Presto clusters, Meta ecosystem deployments, Presto-compatible managed services
BenchBox Extra: uv add benchbox --extra presto
Driver-only install: python -m pip install presto-python-client

Why a Separate Adapter (Not Shared With Trino)

  • Driver API: PrestoDB uses presto-python-client (prestodb import) while Trino uses the trino package. Cursor semantics and auth helpers differ.

  • Protocol Headers: PrestoDB sends X-Presto-* headers; Trino sends X-Trino-*. Mixing them causes authentication failures.

  • Dialect: SQLGlot exposes a dedicated presto dialect. Trino inherits from it but overrides functions (LOG base order, TRIM handling, JSON behavior, LISTAGG rules).

  • System Metadata: System schemas diverge (e.g., runtime tables, connector properties), so platform info queries cannot be shared.

The adapter follows the recommended path of keeping PrestoDB and Trino isolated for clarity and maintainability.

SQL Dialect Notes (SQLGlot)

  • PrestoDB uses the presto dialect identifier (adapter.get_target_dialect() returns "presto").

  • Function differences vs Trino include:

    • LOG argument order (base-first), distinct from Trino’s override

    • TRIM, JSON_QUERY, and LISTAGG parsing

    • Array and aggregation transforms (e.g., ArraySum, GroupConcat, Merge) use Presto-specific rewrites

  • Use this adapter when you need PrestoDB-specific query rendering; use the Trino adapter for Trino/Starburst workloads.

Quick Start

Install the Presto extra (includes presto-python-client and cloudpathlib):

uv add benchbox --extra presto
from benchbox.platforms.presto import PrestoAdapter
from benchbox import TPCH

adapter = PrestoAdapter(
    host="presto-coordinator.example.com",
    port=8080,
    catalog="hive",
    schema="default",
    username="presto",
    # password="secret",  # Optional basic auth
)

benchmark = TPCH(scale_factor=1.0)
results = benchmark.run_with_platform(adapter)

print(f"Completed in {results.duration_seconds:.2f}s")

CLI Usage

# Minimal PrestoDB run (assumes presto-python-client installed)
benchbox run --platform presto --benchmark tpch --scale 1.0 \
  --host presto-coordinator.example.com \
  --catalog hive --schema default --username presto

Configuration Highlights

  • Connection: host, port (default 8080), catalog, schema

  • Authentication: Basic auth via presto-python-client (--username/--password); Kerberos can be layered later if needed

  • Protocol: http_scheme auto-selects https when a password is provided; SSL verification is configurable via verify_ssl or ssl_cert_path

  • Dialect: Hard-coded to "presto" to avoid Trino behavior

  • Session Properties: Can be set via session_properties mapping (applied after connect with SET SESSION)

  • Data Loading: Uses batch INSERT statements with small batches tuned for PrestoDB; supports memory or Hive catalogs

When to Choose PrestoDB vs Trino vs Athena

  • PrestoDB: Existing Presto deployments, Meta ecosystem, or environments pinned to Presto-only features.

  • Trino: Recommended default for modern clusters and Starburst Enterprise.

  • Athena: AWS-managed Presto/Trino lineage with serverless execution.