Onehouse Quanton Platform¶
Onehouse Quanton is a serverless, managed Spark compute runtime that Onehouse positions as delivering 2-3x better price-performance than AWS EMR and Databricks, with multi-table-format support for Apache Hudi, Apache Iceberg, and Delta Lake.
Features¶
Multi-format native - Hudi, Iceberg, and Delta Lake support
Serverless Spark - No cluster management overhead
Apache XTable - Cross-format metadata translation
Cost-efficient - No per-cluster fees, pay-as-you-go compute
Open standards - 100% Spark SQL compatible
Prerequisites¶
Onehouse account (onehouse.ai)
AWS account with S3 access for data staging
Onehouse API key
Installation¶
# Install required dependencies
pip install requests boto3
# Or via BenchBox extras
pip install "benchbox[quanton]"
Configuration¶
Environment Variables (Recommended)¶
export ONEHOUSE_API_KEY=your-onehouse-api-key
CLI Options¶
benchbox run --platform quanton --benchmark tpch --scale 1.0 \
--platform-option api_key=your-api-key \
--platform-option s3_staging_dir=s3://your-bucket/benchbox-data \
--platform-option table_format=iceberg
Platform Options¶
| Option | Default | Description |
|---|---|---|
| api_key | (env) | Onehouse API key |
| s3_staging_dir | (required) | S3 path for data staging (e.g., s3://bucket/path) |
| region | us-east-1 | AWS region for cluster deployment |
| database | benchbox | Database name for benchmarks |
| table_format | iceberg | Table format: iceberg, hudi, or delta |
| cluster_size | small | Cluster size: small, medium, large, xlarge |
| record_key | (auto) | Hudi record key field (required for hudi format) |
| precombine_field | (optional) | Hudi precombine field for ordering during updates |
| hudi_table_type | COPY_ON_WRITE | Hudi table type: COPY_ON_WRITE or MERGE_ON_READ |
Table Format Selection¶
Quanton supports three open table formats, each with distinct strengths:
| Format | Best For | Key Features |
|---|---|---|
| Iceberg | Enterprise data lakes, multi-engine access | Schema evolution, hidden partitioning, partition evolution |
| Hudi | Streaming workloads, record-level updates | Record-level ACID, incremental processing, efficient upserts |
| Delta | Databricks interop, time travel queries | ACID transactions, unified batch/streaming |
Iceberg (Default)¶
benchbox run --platform quanton --benchmark tpch --scale 1.0 \
--platform-option s3_staging_dir=s3://bucket/data \
--platform-option table_format=iceberg
Hudi¶
# Hudi requires record_key for write operations
benchbox run --platform quanton --benchmark tpch --scale 1.0 \
--platform-option s3_staging_dir=s3://bucket/data \
--platform-option table_format=hudi \
--platform-option record_key=l_orderkey \
--platform-option precombine_field=l_shipdate
Delta Lake¶
benchbox run --platform quanton --benchmark tpch --scale 1.0 \
--platform-option s3_staging_dir=s3://bucket/data \
--platform-option table_format=delta
Usage Examples¶
Basic Benchmark¶
# TPC-H with Iceberg (default)
benchbox run --platform quanton --benchmark tpch --scale 1.0 \
--platform-option s3_staging_dir=s3://my-bucket/benchbox
Production Benchmark¶
# Larger scale with medium cluster
benchbox run --platform quanton --benchmark tpch --scale 10.0 \
--platform-option s3_staging_dir=s3://my-bucket/benchbox \
--platform-option cluster_size=medium \
--platform-option table_format=iceberg
Cross-Format Comparison¶
# Compare performance across table formats
for format in iceberg hudi delta; do
benchbox run --platform quanton --benchmark tpch --scale 1.0 \
--platform-option s3_staging_dir=s3://my-bucket/benchbox \
--platform-option table_format=$format \
--output results/quanton_${format}.json
done
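To compare the three runs, the result files can be inspected directly. The layout of the result JSON depends on the BenchBox version, so this sketch only prints each file's top-level structure rather than assuming specific field names:
import json
for fmt in ("iceberg", "hudi", "delta"):
    with open(f"results/quanton_{fmt}.json") as fh:
        data = json.load(fh)
    # Print the top-level keys; field names vary by BenchBox version
    print(fmt, list(data) if isinstance(data, dict) else f"{len(data)} entries")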
Python API¶
from benchbox import TPCH
from benchbox.platforms.onehouse import QuantonAdapter
# Configure the Quanton adapter
adapter = QuantonAdapter(
    api_key="your-onehouse-api-key",
    s3_staging_dir="s3://my-bucket/benchbox-data",
    region="us-east-1",
    table_format="iceberg",
    cluster_size="small",
)

# Generate TPC-H data, stage and load it, then run the query suite
benchmark = TPCH(scale_factor=1.0)
benchmark.generate_data()
adapter.load_benchmark(benchmark)
results = adapter.run_benchmark(benchmark)
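Because api_key falls back to the ONEHOUSE_API_KEY environment variable (see the options table above), the key can also be read from the environment instead of being hard-coded. A minimal sketch, assuming region and cluster_size fall back to their defaults (us-east-1, small) when omitted:
import os

adapter = QuantonAdapter(
    api_key=os.environ["ONEHOUSE_API_KEY"],  # assumes the variable is exported
    s3_staging_dir="s3://my-bucket/benchbox-data",
    table_format="iceberg",
)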
Cluster Sizing¶
| Size | Workers | Recommended Scale | Use Case |
|---|---|---|---|
| Small | 1-2 | SF 0.01-1.0 | Development, testing |
| Medium | 2-5 | SF 1.0-10.0 | Standard benchmarks |
| Large | 5-10 | SF 10.0-100.0 | Production workloads |
| XLarge | 10+ | SF 100.0+ | Large-scale analytics |
Hudi-Specific Configuration¶
When using the Hudi table format, additional configuration is required:
Record Key¶
The record key uniquely identifies each record for ACID operations:
# TPC-H lineitem: use composite key
--platform-option record_key=l_orderkey,l_linenumber
# TPC-H orders: use primary key
--platform-option record_key=o_orderkey
Precombine Field¶
The precombine field orders records during deduplication:
# Use date field for ordering
--platform-option precombine_field=l_shipdate
Table Type¶
Choose COPY_ON_WRITE for faster reads or MERGE_ON_READ for faster writes:
# Analytics workload (default)
--platform-option hudi_table_type=COPY_ON_WRITE
# Write-heavy workload
--platform-option hudi_table_type=MERGE_ON_READ
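Under the hood, these three options correspond to Hudi's standard Spark writer configuration keys. The following PySpark sketch is illustrative only: Quanton's runtime sets these for you, running it locally would require a SparkSession with the Hudi bundle on the classpath, and the table name, sample rows, and S3 path are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-options-sketch").getOrCreate()
df = spark.createDataFrame(
    [(1, 1, "1995-01-10"), (1, 2, "1995-01-12")],
    ["l_orderkey", "l_linenumber", "l_shipdate"],
)
hudi_options = {
    "hoodie.table.name": "lineitem",
    # record_key=l_orderkey,l_linenumber
    "hoodie.datasource.write.recordkey.field": "l_orderkey,l_linenumber",
    # precombine_field=l_shipdate
    "hoodie.datasource.write.precombine.field": "l_shipdate",
    # hudi_table_type=COPY_ON_WRITE
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
}
df.write.format("hudi").options(**hudi_options).mode("overwrite").save(
    "s3://bucket/data/lineitem"
)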
Cost Optimization¶
Serverless Benefits¶
No idle cluster costs
Pay only for compute time used
Automatic scaling based on workload
Cluster Auto-Stop¶
Clusters automatically terminate after an idle timeout (default: 15 minutes).
S3 Data Reuse¶
Data staged to S3 is reused across runs:
# First run uploads data
benchbox run --platform quanton --benchmark tpch --scale 1.0 \
--platform-option s3_staging_dir=s3://bucket/data
# Subsequent runs skip upload
benchbox run --platform quanton --benchmark tpch --scale 1.0 \
--platform-option s3_staging_dir=s3://bucket/data
Troubleshooting¶
Authentication Failed¶
# Verify API key is valid
curl -H "Authorization: Bearer $ONEHOUSE_API_KEY" \
https://api.onehouse.ai/v1/health
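The same check can be made from Python with the requests library (installed as a BenchBox dependency). This is a sketch against the health endpoint shown above; a 200 status indicates the key was accepted:
import os
import requests

resp = requests.get(
    "https://api.onehouse.ai/v1/health",
    headers={"Authorization": f"Bearer {os.environ['ONEHOUSE_API_KEY']}"},
    timeout=10,
)
print(resp.status_code)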
S3 Access Denied¶
Ensure your AWS credentials have access to the S3 staging bucket:
# Test S3 access
aws s3 ls s3://your-bucket/benchbox-data/
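Or with boto3 (also a BenchBox dependency); the bucket and prefix here are placeholders for your staging path. An AccessDenied error code means your credentials lack permission on the bucket:
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
try:
    # List a single key under the staging prefix to confirm read access
    s3.list_objects_v2(Bucket="your-bucket", Prefix="benchbox-data/", MaxKeys=1)
    print("S3 staging path is readable")
except ClientError as err:
    print("S3 access failed:", err.response["Error"]["Code"])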
Job Timeout¶
For large-scale benchmarks, increase the timeout:
benchbox run --platform quanton --benchmark tpch --scale 100.0 \
--platform-option timeout_minutes=120
Hudi Write Failures¶
Ensure record_key is specified when using the Hudi format:
# Error: No record_key configured
# Fix: Add record_key parameter
--platform-option record_key=primary_key_column