Apache Iceberg Guide¶
Iceberg is an open table format with true multi-engine support. This guide covers benchmarking with Iceberg across different query engines, configuring catalogs, and running multi-engine workflows.
Overview¶
Apache Iceberg was created at Netflix to solve a specific problem: running the same analytics across multiple query engines without locking into one vendor. Unlike Hive tables, which depend on the Hive metastore, Iceberg tables work identically across Spark, Trino, Flink, Athena, Snowflake, and more.
For benchmarking, Iceberg’s engine independence enables fair comparisons. We can load TPC-H data once, store it in Iceberg format, and run the same queries on Spark, Trino, and Athena. The data is identical across engines, isolating engine performance from data format differences.
We added Iceberg support to BenchBox for users building multi-engine lakehouses and those running on platforms where Iceberg is the native format (Athena, Starburst, Snowflake).
Multi-Engine Support¶
Supported Engines¶
The same Iceberg table is readable by:
Apache Spark
Trino and Presto
Apache Flink
Amazon Athena
Snowflake
BigQuery
DuckDB (via extension)
This multi-engine support comes from a carefully designed metadata layer that doesn’t depend on any specific query engine’s implementation.
Why Engine Independence Matters¶
For benchmarking, engine independence means:
Load data once, query from multiple engines
Fair comparisons (identical data across engines)
Production-like conditions for multi-engine architectures
No vendor lock-in in your benchmark methodology
Iceberg Architecture¶
Metadata Files¶
iceberg_table/
├── metadata/
│   ├── v1.metadata.json
│   ├── v2.metadata.json       (current version; the catalog points here)
│   ├── snap-xxxxx.avro        (manifest list)
│   └── xxxxx-m0.avro          (manifest file)
└── data/
    ├── partition=2024-01/
    │   └── data-xxxxx.parquet
    └── partition=2024-02/
        └── data-xxxxx.parquet
Manifest Lists and Files¶
Metadata files: Track table state, schema, and partitioning
Manifest lists: Point to manifest files per snapshot
Manifest files: List data files and their statistics
Data Files¶
Data files hold the actual table data and are typically Parquet, though ORC and Avro are also supported. Iceberg manages only the metadata layer on top of these files.
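You can inspect this layout directly through Iceberg's metadata tables. A minimal sketch in Spark SQL, assuming a catalog named local and an already-loaded TPC-H lineitem table (both names are illustrative):
-- List each data file with its format, row count, and size
SELECT file_path, file_format, record_count, file_size_in_bytes
FROM local.tpch.lineitem.files;
-- List snapshots; each snapshot references one manifest list
SELECT snapshot_id, committed_at, operation
FROM local.tpch.lineitem.snapshots;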
Iceberg vs Delta Lake¶
| Aspect | Iceberg | Delta Lake |
|---|---|---|
| Engine lock-in | None | Databricks-optimized |
| Catalog options | REST, Hive, Glue, Nessie | Unity Catalog, Hive |
| Partition evolution | Yes (metadata-only) | Limited |
| Hidden partitioning | Yes | No |
| Community | Apache project | Linux Foundation |
Both formats provide ACID transactions and time travel. The key difference is Iceberg’s broader engine support and more flexible partitioning.
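Both formats also expose time travel through plain SQL. For Iceberg, a minimal Spark SQL sketch (Spark 3.3+ with the Iceberg runtime), where the events table and the snapshot ID are illustrative:
-- Read the table as it existed at a point in time
SELECT count(*) FROM events TIMESTAMP AS OF '2024-01-15 00:00:00';
-- Read a specific snapshot by ID (IDs are listed in the snapshots metadata table)
SELECT count(*) FROM events VERSION AS OF 1234567890123456789;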
Performance Considerations¶
Manifest Overhead¶
Iceberg tracks files through a hierarchy of manifests:
1. Read the current metadata file
2. Read the manifest list (points to manifests)
3. Read the relevant manifest files
4. Filter to the needed data files based on partition pruning
5. Read the data files
Overhead characteristics:
Grows with file count and table history
Mitigated by manifest caching (engine-dependent)
Partition pruning reduces manifest reads
For TPC-H benchmarks with standard data loads, manifest overhead is typically under 50ms. For tables with millions of files, consider compaction.
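To gauge this overhead on your own tables, count the manifests that the current snapshot references. A sketch in Spark SQL, again with illustrative catalog and table names:
-- Manifest count and the number of data files they track
SELECT count(*) AS manifest_count,
       sum(added_data_files_count + existing_data_files_count) AS tracked_data_files
FROM local.tpch.lineitem.manifests;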
Mitigation Strategies¶
Use partition pruning to reduce manifest reads
Enable manifest caching where available
Compact manifests for tables with many small files
Use expire_snapshots to clean up historical metadata (see the sketch below)
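The last two items map to built-in maintenance procedures in the Iceberg Spark runtime. A minimal sketch, assuming a Spark catalog named local and a tpch.lineitem table (names are illustrative):
-- Rewrite many small manifests into fewer, larger ones
CALL local.system.rewrite_manifests('tpch.lineitem');
-- Expire snapshots older than a cutoff and delete files no longer referenced
CALL local.system.expire_snapshots(
  table => 'tpch.lineitem',
  older_than => TIMESTAMP '2024-01-01 00:00:00'
);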
Advanced Partitioning¶
Partition Evolution¶
Iceberg allows changing partition schemes without rewriting data:
-- Original table partitioned by date
CREATE TABLE events (...) PARTITIONED BY (event_date);
-- Add hour-level partitioning for new data (no rewrite)
ALTER TABLE events ADD PARTITION FIELD hour(event_time);
Benchmark implications:
Partition evolution is metadata-only (fast)
Queries automatically use appropriate partitioning per file
Historical data keeps its original partitioning
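One way to observe this mixed layout is to group the table's data files by the partition spec they were written with. A sketch in Spark SQL against the files metadata table (the spec_id column is available in recent Iceberg versions; names are illustrative):
-- Files written before the change keep the original spec; new files use the new one
SELECT spec_id, count(*) AS file_count
FROM local.db.events.files
GROUP BY spec_id;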
Partition Transforms¶
Iceberg supports several partition transforms:
| Transform | Example | Use Case |
|---|---|---|
| identity | category | Exact value partitioning |
| year | year(event_time) | Annual aggregations |
| month | month(event_time) | Monthly aggregations |
| day | day(event_time) | Daily partitions |
| hour | hour(event_time) | Hourly partitions |
| bucket | bucket(16, user_id) | Distribute by hash |
| truncate | truncate(10, name) | Prefix partitioning |
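In Spark SQL DDL these transforms go in the PARTITIONED BY clause (the time-based transforms are commonly written in their plural form, e.g. days). A minimal sketch with an illustrative events table:
-- Hidden partitioning: queries filter on event_time and user_id directly;
-- Iceberg maps the predicates to partitions for pruning
CREATE TABLE local.db.events (
  id BIGINT,
  user_id BIGINT,
  event_time TIMESTAMP,
  name STRING
) USING iceberg
PARTITIONED BY (days(event_time), bucket(16, user_id));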
Catalog Configuration¶
REST Catalog¶
The REST catalog is the most portable option, working across engines:
benchbox run --platform spark --format iceberg --catalog rest
AWS Glue¶
For AWS-native workflows with Athena and EMR:
benchbox run --platform spark --format iceberg --catalog glue
Hive Metastore¶
For Spark/Hadoop ecosystems:
benchbox run --platform spark --format iceberg --catalog hive
Nessie¶
For git-like versioning (experimental):
benchbox run --platform spark --format iceberg --catalog nessie
| Catalog | Use Case | BenchBox Support |
|---|---|---|
| REST Catalog | Standard, portable | Default |
| Hive Metastore | Spark/Hadoop ecosystems | Supported |
| AWS Glue | AWS-native (Athena, EMR) | Supported |
| Nessie | Git-like versioning | Experimental |
Multi-Engine Workflows¶
Write with One Engine, Read with Another¶
A key Iceberg benefit: load data with Spark, query with Trino (or vice versa).
BenchBox multi-engine benchmark:
# 1. Generate and load data with Spark
benchbox run --platform spark --benchmark tpch --scale 10 \
--format iceberg --phases load
# 2. Run queries with Trino (same Iceberg tables)
benchbox run --platform trino --benchmark tpch --scale 10 \
--format iceberg --phases power
# 3. Run queries with Athena (same tables via Glue)
benchbox run --platform athena --benchmark tpch --scale 10 \
--format iceberg --phases power
Ensuring Consistent Results¶
BenchBox validates query results against reference answers, catching:
Type handling differences between engines
NULL behavior variations
Ordering differences
# Compare results from different engines
benchbox compare-results spark-results.json trino-results.json --validate
Common Multi-Engine Issues¶
1. Schema evolution lag
Some engines support new Iceberg schema features before others. If you add a column with Spark, older Trino versions might not see it.
2. Partition pruning variations
Partition pruning implementation differs by engine. One engine might prune more aggressively than another.
3. Statistics availability
Column-level statistics may not be shared across all engines. This affects query optimization.
4. Caching effects
Engine-specific caching can skew comparisons. BenchBox uses cold cache between runs to mitigate this.
When to Use Iceberg¶
Best-Fit Scenarios¶
| Scenario | Why Iceberg |
|---|---|
| Multi-engine comparison | Same data, fair comparison |
| Lakehouse architectures | Production-like conditions |
| AWS-native platforms | Athena, EMR native support |
| Cross-platform portability | Engine-independent format |
When to Use Parquet Instead¶
| Scenario | Why Parquet |
|---|---|
| Single-engine benchmarks | Simpler, no catalog needed |
| Local development | No external catalog required |
| Quick comparisons | Less setup overhead |
| DuckDB-only | Native Parquet is faster |
See Also¶
Table Format Guides: Overview of all formats
Format Conversion Reference: CLI commands for format conversion
Delta Lake Guide: Alternative table format
Parquet Deep Dive: Foundation format for Iceberg