pg_duckdb Platform¶
CLI name:
pg-duckdb— usebenchbox run --platform pg-duckdb
pg_duckdb is a PostgreSQL extension that embeds DuckDB’s columnar-vectorized analytics engine inside PostgreSQL, accelerating OLAP queries without requiring SQL changes or data migration. BenchBox provides first-class pg_duckdb support for benchmarking DuckDB-accelerated PostgreSQL workloads.
Features¶
DuckDB vectorized execution - Analytical queries routed through DuckDB’s high-performance engine
No SQL changes needed - Existing PostgreSQL queries work transparently
Configurable execution - Force DuckDB execution or let PostgreSQL decide per query
Parallel table scanning - Configurable thread count for PostgreSQL heap table scans
PostgreSQL compatible - Uses PostgreSQL COPY for efficient bulk loading
MotherDuck integration - Optional cloud offload for hybrid local+cloud queries
Two deployment modes - Self-hosted and MotherDuck
Deployment Modes¶
pg_duckdb supports two deployment modes, selectable via the colon syntax:
# Self-hosted (default) - PostgreSQL with pg_duckdb extension
benchbox run --platform pg-duckdb --benchmark tpch --scale 1.0
# MotherDuck mode - hybrid local+cloud queries
benchbox run --platform pg-duckdb:motherduck --benchmark tpch --scale 1.0
Self-Hosted Mode (Default)¶
Connect to a PostgreSQL server with pg_duckdb extension installed:
pg_duckdb 1.0+ extension must be installed on the server
Full database management (CREATE/DROP database) available
DuckDB execution engine handles analytical queries on heap tables
MotherDuck Mode¶
Connect pg_duckdb to MotherDuck for hybrid queries:
Requires MOTHERDUCK_TOKEN for authentication
Enables queries that join local PostgreSQL tables with MotherDuck cloud data
Sign up at motherduck.com
MotherDuck Configuration:
# Set MotherDuck token
export MOTHERDUCK_TOKEN=your-token-here
# Run benchmark with MotherDuck hybrid mode
benchbox run --platform pg-duckdb:motherduck --benchmark tpch --scale 1.0
Installation¶
Python Dependencies¶
pg_duckdb uses the same Python driver as PostgreSQL:
uv add psycopg2-binary
Server Requirements¶
pg_duckdb must be installed on the PostgreSQL server. See the pg_duckdb GitHub repository for installation instructions.
Docker (recommended for testing):
docker run -d --name pg-duckdb \
-e POSTGRES_PASSWORD=benchbox \
-p 5432:5432 \
pgduckdb/pgduckdb:17-v1.1.1
# Verify extension
psql -h localhost -U postgres -c "CREATE EXTENSION pg_duckdb;"
Quick Start¶
# Basic TPC-H benchmark
benchbox run --platform pg-duckdb --benchmark tpch --scale 0.01
# With custom connection
benchbox run --platform pg-duckdb --benchmark tpch --scale 1.0 \
--platform-option host=pgduckdb.example.com \
--platform-option password=secret
# Disable force execution (let PostgreSQL choose when to use DuckDB)
benchbox run --platform pg-duckdb --benchmark tpch --scale 1.0 \
--platform-option force_execution=false
# Configure parallel scan threads
benchbox run --platform pg-duckdb --benchmark tpch --scale 1.0 \
--platform-option postgres_scan_threads=8
Configuration Options¶
Option |
Default |
Description |
|---|---|---|
|
|
PostgreSQL server hostname |
|
|
PostgreSQL server port |
|
auto-generated |
Database name |
|
|
PostgreSQL username |
|
(none) |
PostgreSQL password |
|
|
PostgreSQL schema name |
|
|
Force DuckDB execution for all queries |
|
|
Threads for parallel PostgreSQL table scanning |
How It Works¶
pg_duckdb accelerates queries by routing them through DuckDB’s vectorized execution engine:
Schema creation - Standard PostgreSQL DDL (CREATE TABLE) is used
Data loading - PostgreSQL COPY loads data into standard heap tables
Query execution - With
force_execution=true, queries are executed by DuckDB’s engineResults - Results are returned through the standard PostgreSQL wire protocol
Performance Characteristics¶
Up to ~4x faster than PostgreSQL with full indexes on TPC-H
Up to 1500x faster on specific TPC-DS queries vs unindexed PostgreSQL
Row-oriented storage - Data remains in PostgreSQL heap format (no columnar compression)
Best for analytical scans, aggregations, and joins
Less benefit for point lookups and index-dependent OLTP queries
Comparison with Other Platforms¶
Aspect |
pg_duckdb |
Native DuckDB |
PostgreSQL |
|---|---|---|---|
Storage |
PostgreSQL heap |
DuckDB columnar |
PostgreSQL heap |
Execution |
DuckDB vectorized |
DuckDB vectorized |
PostgreSQL row-based |
SQL dialect |
PostgreSQL |
DuckDB |
PostgreSQL |
Data loading |
COPY (psycopg2) |
Direct file read |
COPY (psycopg2) |
Compression |
No (heap tables) |
Yes (columnar) |
No (heap tables) |
Best for |
Adding analytics to existing PG |
Standalone analytics |
Transactional + basic analytics |
Known Limitations¶
Extension conflicts - pg_duckdb cannot coexist with pg_mooncake (shared libduckdb.so). Use separate PostgreSQL instances for each.
Row-oriented storage - Queries operate on PostgreSQL’s row-oriented heap tables, missing DuckDB’s columnar compression benefits.
Server installation required - pg_duckdb must be pre-installed on the PostgreSQL server; BenchBox cannot install it remotely.
GUC permissions - Thread count tuning may require superuser or pg_duckdb role privileges.
Recommended Benchmarks¶
TPC-H - Standard OLAP benchmark; shows pg_duckdb’s analytical acceleration
TPC-DS - Complex OLAP benchmark; demonstrates vectorized execution benefits
ClickBench - Real-world analytical queries; good for comparison with columnar stores