pg_mooncake Platform¶
CLI name:
pg-mooncake— usebenchbox run --platform pg-mooncake
pg_mooncake is a PostgreSQL extension that adds native columnstore tables with DuckDB-powered vectorized execution. Data is stored in Parquet format with Iceberg metadata, providing 5-20x columnar compression and top-10 ClickBench performance. BenchBox provides first-class pg_mooncake support for benchmarking columnstore PostgreSQL workloads.
Features¶
Columnstore tables -
CREATE TABLE ... USING columnstorefor Parquet-based columnar storageDuckDB execution - Vectorized query engine for analytical workloads
Columnar compression - 5-20x compression ratios vs PostgreSQL heap tables
Object storage - Optional S3/GCS backend for Parquet data
PostgreSQL compatible - Uses standard PostgreSQL wire protocol and COPY loading
Top-10 ClickBench - Competitive with dedicated OLAP engines on real-world analytical queries
Quick Start¶
# Basic TPC-H benchmark with columnstore tables
benchbox run --platform pg-mooncake --benchmark tpch --scale 0.01
# With custom connection
benchbox run --platform pg-mooncake --benchmark tpch --scale 1.0 \
--platform-option host=mooncake.example.com \
--platform-option password=secret
# With S3 storage backend
benchbox run --platform pg-mooncake --benchmark tpch --scale 1.0 \
--platform-option storage_mode=s3 \
--platform-option mooncake_bucket=s3://my-bucket/mooncake-data
Installation¶
Python Dependencies¶
pg_mooncake uses the same Python driver as PostgreSQL:
uv add psycopg2-binary
Server Requirements¶
pg_mooncake must be installed on the PostgreSQL server. The recommended approach is Docker.
Docker (recommended):
docker run -d --name pg-mooncake \
-e POSTGRES_PASSWORD=benchbox \
-p 5432:5432 \
mooncakelabs/pg_mooncake:latest
# Verify extension
psql -h localhost -U postgres -c "CREATE EXTENSION pg_mooncake;"
See the pg_mooncake GitHub repository for additional installation methods.
How It Works¶
pg_mooncake transforms PostgreSQL into a hybrid analytical database:
Schema creation - BenchBox adds
USING columnstoreto allCREATE TABLEstatementsData loading - PostgreSQL COPY loads data, which pg_mooncake stores as Parquet files
Query execution - Queries on columnstore tables are routed through DuckDB’s vectorized engine
Results - Results are returned through the standard PostgreSQL wire protocol
Storage Architecture¶
PostgreSQL Instance
├── pg_mooncake extension
│ ├── Columnstore Access Method (USING columnstore)
│ ├── pg_moonlink (WAL-based ingestion)
│ └── DuckDB execution engine
└── Storage Backend
├── Local disk (default) → Parquet files
└── Object storage (S3/GCS) → Parquet files with Iceberg metadata
Configuration Options¶
Option |
Default |
Description |
|---|---|---|
|
|
PostgreSQL server hostname |
|
|
PostgreSQL server port |
|
auto-generated |
Database name |
|
|
PostgreSQL username |
|
(none) |
PostgreSQL password |
|
|
PostgreSQL schema name |
|
|
Storage backend: |
|
(none) |
S3/GCS bucket URL (required when |
S3 Storage Configuration¶
# Set S3 bucket via environment variable
export MOONCAKE_S3_BUCKET=s3://my-bucket/mooncake-data
# Or via platform option
benchbox run --platform pg-mooncake --benchmark tpch \
--platform-option storage_mode=s3 \
--platform-option mooncake_bucket=s3://my-bucket/data
S3 mode requires standard AWS credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).
Comparison with Other Platforms¶
Aspect |
pg_mooncake |
pg_duckdb |
Native DuckDB |
PostgreSQL |
|---|---|---|---|---|
Storage |
Columnstore (Parquet) |
PostgreSQL heap |
DuckDB columnar |
PostgreSQL heap |
Execution |
DuckDB vectorized |
DuckDB vectorized |
DuckDB vectorized |
PostgreSQL row-based |
Compression |
5-20x (columnar) |
No (heap) |
Yes (columnar) |
No (heap) |
DDL changes |
|
None |
N/A |
None |
Workload |
OLAP columnstore |
OLAP acceleration |
Standalone analytics |
General purpose |
ClickBench rank |
Top 10 |
Not ranked |
Top 5 |
Not ranked |
Known Limitations¶
Extension conflicts - pg_mooncake cannot coexist with standalone pg_duckdb (shared libduckdb.so). Use separate PostgreSQL instances for each.
Data type support - Not all PostgreSQL data types are supported on columnstore tables. The extension is actively evolving.
No constraints - Columnstore tables don’t support PRIMARY KEY, FOREIGN KEY, or other constraints.
No indexes - B-tree and other indexes are not applicable to columnstore tables.
Server installation required - pg_mooncake must be pre-installed on the PostgreSQL server; Docker is the recommended setup method.
Recommended Benchmarks¶
TPC-H - Standard OLAP benchmark; demonstrates columnstore compression and DuckDB execution benefits
ClickBench - Real-world analytical queries; pg_mooncake is a top-10 performer
TPC-DS - Complex OLAP; note some data types may not be fully supported yet