# Microsoft Fabric
BenchBox supports Microsoft Fabric Warehouse for SQL-based analytical benchmarking with Entra ID authentication and OneLake integration.
## Overview
| Property | Value |
|---|---|
| Platform Name | `fabric` |
| SQL Dialect | T-SQL (SQL Server subset) |
| Authentication | Entra ID only (service principal, default credential, interactive) |
| Data Loading | `COPY INTO` via OneLake staging |
| Status | Production-ready for Warehouse items |
## Important Limitations

This adapter supports **only** Fabric Warehouse items:
| Fabric Item | Support Status | Reason |
|---|---|---|
| Warehouse | Supported | Full T-SQL DDL/DML support |
| Lakehouse | Not supported | SQL analytics endpoint is read-only |
| KQL Database | Not supported | Uses Kusto Query Language, not T-SQL |
| Mirrored Database | Not supported | Different architecture |
For Lakehouse benchmarking, Spark integration via the Livy API would be required; this is not currently implemented.
## Installation

```bash
# Install BenchBox with Fabric dependencies
uv add benchbox --extra fabric

# Or with pip
pip install "benchbox[fabric]"
```
### Required Dependencies

- `pyodbc` - ODBC driver connectivity
- `azure-identity` - Entra ID authentication
- `azure-storage-file-datalake` - OneLake staging
- ODBC Driver 18 for SQL Server - system-level driver (required)
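Before a run, the Python-level dependencies above can be checked without importing them. This is a minimal sketch; the `missing_fabric_deps` helper is illustrative, not part of BenchBox:

```python
import importlib.util

# Import names for the packages listed above (note the dashed PyPI name
# azure-storage-file-datalake imports as azure.storage.filedatalake).
REQUIRED_MODULES = ("pyodbc", "azure.identity", "azure.storage.filedatalake")

def missing_fabric_deps(modules=REQUIRED_MODULES):
    """Return the modules that cannot be imported in this environment."""
    missing = []
    for name in modules:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:  # parent package itself is absent
            missing.append(name)
    return missing
```

An empty result means all Python dependencies are importable; the system ODBC driver still needs to be checked separately (see below).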
### System Requirements

**macOS:**

```bash
brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew install unixodbc msodbcsql18
```

**Ubuntu/Debian:**

```bash
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18
```

**Windows:** Download the driver from the Microsoft Download Center.
## Quick Start

### CLI Usage

```bash
# Run TPC-H on Fabric Warehouse
benchbox run --platform fabric --benchmark tpch --scale 0.1 \
  --platform-option workspace=your-workspace-guid \
  --platform-option warehouse=your_warehouse_name

# With service principal authentication
benchbox run --platform fabric --benchmark tpch --scale 1 \
  --platform-option workspace=your-workspace-guid \
  --platform-option warehouse=your_warehouse_name \
  --platform-option auth_method=service_principal \
  --platform-option tenant_id=your-tenant-id \
  --platform-option client_id=your-client-id \
  --platform-option client_secret=your-secret
```
### Programmatic Usage

```python
from benchbox import TPCH
from benchbox.platforms.fabric_warehouse import FabricWarehouseAdapter

# Initialize adapter
adapter = FabricWarehouseAdapter(
    workspace="your-workspace-guid",
    warehouse="your_warehouse_name",
    auth_method="default_credential",  # Uses Azure CLI or managed identity
)

# Initialize benchmark
benchmark = TPCH(scale_factor=0.1)

# Generate data
benchmark.generate_data()

# Load data into Fabric
adapter.load_benchmark_data(benchmark)

# Run queries
results = adapter.run_benchmark(benchmark)
```
## Authentication Methods

### Default Credential (Recommended)

Uses the Azure CLI, a managed identity, or environment variables:

```python
adapter = FabricWarehouseAdapter(
    workspace="your-workspace-guid",
    warehouse="your_warehouse_name",
    auth_method="default_credential",
)
```
**Prerequisites** (any one of):

- Azure CLI logged in: `az login`
- Running on Azure with a managed identity
- Environment variables set: `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_TENANT_ID`
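For the environment-variable path, a quick preflight check can confirm that all three variables are set before the adapter tries to authenticate. The helper below is hypothetical, not a BenchBox API:

```python
import os

def entra_env_ready(env=os.environ):
    """True if all service-principal environment variables are set and non-empty."""
    required = ("AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID")
    return all(env.get(name) for name in required)
```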
### Service Principal

For automated/CI environments:

```python
adapter = FabricWarehouseAdapter(
    workspace="your-workspace-guid",
    warehouse="your_warehouse_name",
    auth_method="service_principal",
    tenant_id="your-tenant-id",
    client_id="your-client-id",
    client_secret="your-client-secret",
)
```
**Required permissions:**

- Fabric workspace member (Contributor or Admin role)
- OneLake data access for staging
### Interactive Browser

For development/testing:

```python
adapter = FabricWarehouseAdapter(
    workspace="your-workspace-guid",
    warehouse="your_warehouse_name",
    auth_method="interactive",
)
```

Opens a browser window for Entra ID login.
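The three `auth_method` values map naturally onto `azure-identity` credential classes. The sketch below records that correspondence as plain strings so it stays dependency-free; the function is illustrative, not BenchBox's internal API:

```python
def credential_class_for(auth_method):
    """Name of the azure.identity class matching an auth_method value."""
    mapping = {
        "default_credential": "DefaultAzureCredential",
        "service_principal": "ClientSecretCredential",
        "interactive": "InteractiveBrowserCredential",
    }
    try:
        return mapping[auth_method]
    except KeyError:
        raise ValueError(f"unknown auth_method: {auth_method!r}")
```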
## Configuration Options

| Option | Type | Default | Description |
|---|---|---|---|
| `workspace` | str | Required | Fabric workspace GUID |
| `warehouse` | str | Required | Warehouse item name |
| `auth_method` | str | `default_credential` | Authentication method |
| `tenant_id` | str | - | Azure tenant ID (for service principal) |
| `client_id` | str | - | Service principal client ID |
| `client_secret` | str | - | Service principal client secret |
|  | str |  | OneLake staging path |
|  | str | `dbo` | Default schema |
|  | int | 30 | Connection timeout (seconds) |
|  | int | 0 | Query timeout (0 = no limit) |
| `disable_result_cache` | bool | True | Disable result caching |
|  | str | `ODBC Driver 18 for SQL Server` | ODBC driver name |
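Since connectivity goes through pyodbc, the driver, warehouse, and timeout options ultimately end up in a keyword/value ODBC connection string. The sketch below shows one plausible assembly; the helper and the exact keyword choices are assumptions for illustration, not BenchBox's actual code:

```python
def build_odbc_connection_string(server, warehouse,
                                 driver="ODBC Driver 18 for SQL Server",
                                 connection_timeout=30):
    """Assemble a pyodbc-style connection string (illustrative only)."""
    parts = {
        "Driver": "{" + driver + "}",   # driver names are brace-wrapped
        "Server": server,
        "Database": warehouse,
        "Encrypt": "yes",               # Fabric endpoints require TLS
        "Connection Timeout": str(connection_timeout),
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())
```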
## OneLake Data Staging

Fabric Warehouse uses OneLake for data staging during bulk loads:

```
https://onelake.dfs.fabric.microsoft.com/{workspace}/{warehouse}.Warehouse/Files/benchbox-staging/
```
### How Data Loading Works

1. **Upload to OneLake:** data files are uploaded to the staging path
2. **COPY INTO:** a SQL command loads the data from the OneLake path
3. **Cleanup:** staging files are removed after the load
```sql
-- Example COPY INTO (generated by adapter)
COPY INTO dbo.lineitem
FROM 'https://onelake.dfs.fabric.microsoft.com/{workspace}/{warehouse}.Warehouse/Files/benchbox-staging/lineitem.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = '|',
    FIRSTROW = 2
);
```
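The staging URL and the COPY INTO statement above both follow fixed templates, so an adapter can render them from the workspace, warehouse, and file name alone. A simplified sketch (the helper names are hypothetical):

```python
def onelake_staging_url(workspace, warehouse, filename):
    """OneLake staging URL following the layout shown above."""
    return (
        f"https://onelake.dfs.fabric.microsoft.com/{workspace}/"
        f"{warehouse}.Warehouse/Files/benchbox-staging/{filename}"
    )

def copy_into_sql(table, url, field_terminator="|", first_row=2):
    """Render a COPY INTO statement for a CSV file staged in OneLake."""
    return (
        f"COPY INTO {table}\n"
        f"FROM '{url}'\n"
        "WITH (\n"
        "    FILE_TYPE = 'CSV',\n"
        f"    FIELDTERMINATOR = '{field_terminator}',\n"
        f"    FIRSTROW = {first_row}\n"
        ");"
    )
```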
## SQL Dialect Notes
Fabric uses T-SQL with some limitations compared to SQL Server:
### Supported

- Standard T-SQL queries
- CTEs (common table expressions)
- Window functions
- Aggregate functions
- Joins (including hash, merge, and nested loop)
### Not Supported

- **Clustered indexes** - Fabric uses automatic indexing
- **Distribution keys** - distribution is managed automatically
- **Partitioning** - handled internally by Fabric
- **V-Order optimization** - Spark-only feature (Lakehouse)
- **Some T-SQL features** - cursor types, certain stored procedures
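Because Fabric rejects distribution clauses, DDL written for platforms like Synapse has to be cleaned before it runs. The sketch below is a simplified illustration handling only the single-option `WITH (DISTRIBUTION = ...)` form, not BenchBox's actual translation layer:

```python
import re

def strip_synapse_options(ddl):
    """Remove a Synapse-style WITH (DISTRIBUTION = ...) clause from DDL."""
    return re.sub(
        r"\s*WITH\s*\(\s*DISTRIBUTION\s*=\s*\w+(\(\w+\))?\s*\)",
        "",
        ddl,
        flags=re.IGNORECASE,
    ).strip()
```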
### Query Translation

BenchBox automatically translates queries for Fabric:

```python
from benchbox import TPCH

# TPC-H queries are translated from the DuckDB dialect
benchmark = TPCH(scale_factor=1.0)
query = benchmark.get_query(1)  # Returns a T-SQL compatible query
```
## Performance Considerations

### Result Cache

By default, BenchBox disables Fabric's result cache for accurate benchmarking:

```python
adapter = FabricWarehouseAdapter(
    workspace="...",
    warehouse="...",
    disable_result_cache=True,  # Default
)
```

To enable it (for production-like testing):

```python
adapter = FabricWarehouseAdapter(
    workspace="...",
    warehouse="...",
    disable_result_cache=False,
)
```
### Capacity Considerations

Fabric bills by Capacity Units (CUs):

- **F2** (2 CUs): development/testing
- **F4-F64** (4-64 CUs): production workloads
- **F128+**: large-scale analytics

Recommendations:

- Use F4 or larger for TPC-H at scale factor > 1
- Use F16 or larger for TPC-DS
- Consider pausing the capacity between benchmark runs
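The sizing rules of thumb above can be encoded in a small helper; the thresholds come from this guide, but the function itself is illustrative, not part of BenchBox:

```python
def recommended_capacity(benchmark, scale_factor):
    """Minimum Fabric SKU suggested by this guide's recommendations."""
    if benchmark == "tpcds":
        return "F16"        # TPC-DS wants F16 or larger
    if benchmark == "tpch" and scale_factor > 1:
        return "F4"         # TPC-H beyond SF 1 wants F4 or larger
    return "F2"             # development/testing
```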
### Cost Optimization

```bash
# Pause Fabric capacity between runs (via Azure CLI)
az fabric capacity suspend --capacity-name my-capacity

# Resume when ready to benchmark
az fabric capacity resume --capacity-name my-capacity
```
## Troubleshooting

### Connection Errors

**"Login timeout expired"**

```bash
# Check ODBC driver installation
odbcinst -q -d

# Verify workspace/warehouse names
az fabric workspace list
```
**"Cannot open server"**

- Verify the workspace GUID is correct
- Check network connectivity to `*.datawarehouse.fabric.microsoft.com`
- Ensure same-region deployment (cross-region connections are not supported)
### Authentication Errors

**"AADSTS700016: Application not found"**

- Verify the service principal exists in the correct tenant
- Check that the client ID is correct

**"AADSTS7000215: Invalid client secret"**

- Regenerate the client secret
- Check for expired credentials
### Data Loading Errors

**"Access to OneLake denied"**

- Verify the service principal has OneLake access
- Check that workspace permissions include data access

**"COPY INTO failed"**

- Verify the staging path exists
- Check that the file format matches the data
- Ensure the column count matches the table schema
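The last check, column count against the table schema, can be done locally before uploading to OneLake. A minimal sketch (the helper is hypothetical):

```python
def rows_with_bad_column_count(lines, expected, delimiter="|"):
    """Return 1-based indices of CSV lines whose field count != expected."""
    bad = []
    for i, line in enumerate(lines, start=1):
        if len(line.rstrip("\n").split(delimiter)) != expected:
            bad.append(i)
    return bad
```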
## Differences from Azure Synapse

| Feature | Fabric Warehouse | Azure Synapse |
|---|---|---|
| Authentication | Entra ID only | SQL + Entra ID |
| Storage | OneLake | Azure Blob/ADLS |
| Distribution | Automatic | User-specified |
| Indexing | Automatic | Manual |
| File Format | Delta Lake | Multiple |
| Cross-region | Not supported | Supported |
| Pricing | Capacity Units | DWU/cDWU |