Microsoft Fabric

Tags: intermediate, guide, fabric, cloud-platform

BenchBox supports Microsoft Fabric Warehouse for read/write SQL benchmarking, plus separate Fabric Spark and Fabric Lakehouse SQL adapters for Lakehouse workflows.

Overview

| Property | Value |
| --- | --- |
| Platform Name | fabric |
| SQL Dialect | T-SQL (SQL Server subset) |
| Authentication | Entra ID only (service principal, default credential, interactive) |
| Data Loading | COPY INTO via OneLake staging |
| Status | Production-ready for Warehouse items |

Important Limitations (Warehouse Adapter)

This adapter ONLY supports Microsoft Fabric Warehouse items:

| Fabric Item | Support Status | Reason |
| --- | --- | --- |
| Warehouse | Supported | Full T-SQL DDL/DML support |
| Lakehouse | Supported via fabric-lakehouse | SQL Analytics Endpoint is READ-ONLY |
| KQL Database | Not Supported | Kusto Query Language, not T-SQL |
| Mirrored Database | Not Supported | Different architecture |

For Lakehouse benchmarking, use the fabric-spark adapter for the load phase and the read-only SQL Analytics Endpoint (fabric-lakehouse) for query-only phases.

Installation

# Install BenchBox with Fabric dependencies
uv add benchbox --extra fabric

# Or with pip
pip install "benchbox[fabric]"

Required Dependencies

  • pyodbc - ODBC driver connectivity

  • azure-identity - Entra ID authentication

  • azure-storage-file-datalake - OneLake staging

  • ODBC Driver 18 for SQL Server - System driver (required)

System Requirements

macOS:

brew install unixodbc msodbcsql18

Ubuntu/Debian:

curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18

Windows: Download from Microsoft Download Center.

Quick Start

CLI Usage

# Run TPC-H on Fabric Warehouse
benchbox run --platform fabric --benchmark tpch --scale 0.1 \
  --platform-option workspace=your-workspace-guid \
  --platform-option warehouse=your_warehouse_name

# With service principal authentication
benchbox run --platform fabric --benchmark tpch --scale 1 \
  --platform-option workspace=your-workspace-guid \
  --platform-option warehouse=your_warehouse_name \
  --platform-option auth_method=service_principal \
  --platform-option tenant_id=your-tenant-id \
  --platform-option client_id=your-client-id \
  --platform-option client_secret=your-secret

Programmatic Usage

from benchbox.platforms.fabric_warehouse import FabricWarehouseAdapter
from benchbox import TPCH

# Initialize adapter
adapter = FabricWarehouseAdapter(
    workspace="your-workspace-guid",
    warehouse="your_warehouse_name",
    auth_method="default_credential"  # Uses Azure CLI or managed identity
)

# Initialize benchmark
benchmark = TPCH(scale_factor=0.1)

# Generate data
benchmark.generate_data()

# Load data into Fabric
adapter.load_benchmark_data(benchmark)

# Run queries
results = adapter.run_benchmark(benchmark)

Authentication Methods

Service Principal

For automated/CI environments:

adapter = FabricWarehouseAdapter(
    workspace="your-workspace-guid",
    warehouse="your_warehouse_name",
    auth_method="service_principal",
    tenant_id="your-tenant-id",
    client_id="your-client-id",
    client_secret="your-client-secret"
)

Required Permissions:

  • Fabric workspace member (Contributor or Admin)

  • OneLake data access for staging

Interactive Browser

For development/testing:

adapter = FabricWarehouseAdapter(
    workspace="your-workspace-guid",
    warehouse="your_warehouse_name",
    auth_method="interactive"
)

Opens browser for Entra ID login.
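Under the hood, token-based authentication with pyodbc is typically done by passing the Entra ID access token through the SQL_COPT_SS_ACCESS_TOKEN connection attribute, which expects a length-prefixed UTF-16-LE blob. The sketch below illustrates that mechanism; `pack_access_token` and `connect_with_token` are illustrative names, not BenchBox internals.

```python
import struct

# Connection attribute defined by the Microsoft ODBC driver for passing
# an Entra ID (Azure AD) access token instead of a username/password.
SQL_COPT_SS_ACCESS_TOKEN = 1256

def pack_access_token(token: str) -> bytes:
    """Encode a bearer token as the length-prefixed UTF-16-LE struct
    the ODBC driver expects."""
    raw = token.encode("utf-16-le")
    return struct.pack("<i", len(raw)) + raw

def connect_with_token(server: str, warehouse: str, token: str):
    # Requires a live Fabric Warehouse endpoint, e.g.
    # <guid>.datawarehouse.fabric.microsoft.com; illustration only.
    import pyodbc
    conn_str = (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server};"
        f"Database={warehouse};"
        "Encrypt=yes;TrustServerCertificate=no;"
    )
    return pyodbc.connect(
        conn_str,
        attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_access_token(token)},
    )
```

The token itself would come from azure-identity (for example, `DefaultAzureCredential().get_token(...)`), matching the auth methods listed above.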

Configuration Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| workspace | str | Required | Fabric workspace GUID |
| warehouse | str | Required | Warehouse item name |
| auth_method | str | default_credential | Authentication method |
| tenant_id | str | - | Azure tenant ID (for service principal) |
| client_id | str | - | Service principal client ID |
| client_secret | str | - | Service principal client secret |
| staging_path | str | benchbox-staging | OneLake staging path |
| schema | str | dbo | Default schema |
| connect_timeout | int | 30 | Connection timeout (seconds) |
| query_timeout | int | 0 | Query timeout (0 = no limit) |
| disable_result_cache | bool | True | Disable result caching |
| driver | str | ODBC Driver 18 for SQL Server | ODBC driver name |

OneLake Data Staging

Microsoft Fabric Warehouse uses OneLake for data staging during bulk loads:

https://onelake.dfs.fabric.microsoft.com/{workspace}/{warehouse}.Warehouse/Files/benchbox-staging/

How Data Loading Works

  1. Upload to OneLake: Data files uploaded to staging path

  2. COPY INTO: SQL command loads from OneLake path

  3. Cleanup: Staging files removed after load

-- Example COPY INTO (generated by adapter)
COPY INTO dbo.lineitem
FROM 'https://onelake.dfs.fabric.microsoft.com/{workspace}/{warehouse}.Warehouse/Files/benchbox-staging/lineitem.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = '|',
    FIRSTROW = 2
);
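The upload step (1) can be sketched with the azure-storage-file-datalake package, where the workspace acts as the OneLake file system and files land under the warehouse item's Files area. This is an assumed sketch, not BenchBox's implementation; `staging_file_path` and `upload_to_onelake` are illustrative names.

```python
from pathlib import PurePosixPath

def staging_file_path(warehouse: str, filename: str,
                      staging_path: str = "benchbox-staging") -> str:
    # Mirrors the staging layout shown above:
    # {warehouse}.Warehouse/Files/benchbox-staging/{filename}
    return str(PurePosixPath(f"{warehouse}.Warehouse") / "Files" / staging_path / filename)

def upload_to_onelake(workspace: str, warehouse: str, local_file: str) -> None:
    # Requires network access and OneLake permissions; illustration only.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient
    service = DataLakeServiceClient(
        account_url="https://onelake.dfs.fabric.microsoft.com",
        credential=DefaultAzureCredential(),
    )
    fs = service.get_file_system_client(workspace)  # workspace = file system
    file_client = fs.get_file_client(
        staging_file_path(warehouse, PurePosixPath(local_file).name)
    )
    with open(local_file, "rb") as f:
        file_client.upload_data(f, overwrite=True)
```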

SQL Dialect Notes

Fabric uses T-SQL with some limitations compared to SQL Server:

Supported

  • Standard T-SQL queries

  • CTEs (Common Table Expressions)

  • Window functions

  • Aggregate functions

  • Joins (including hash, merge, nested loop)

Not Supported

  • Clustered indexes - Fabric uses automatic indexing

  • Distribution keys - Automatic distribution management

  • Partitioning - Fabric handles internally

  • V-Order optimization - Spark-only feature (Lakehouse)

  • Some T-SQL features - Cursor types, certain stored procedures

Query Translation

BenchBox automatically translates queries for Fabric:

from benchbox import TPCH

# TPC-H queries translated from DuckDB dialect
benchmark = TPCH(scale_factor=1.0)
query = benchmark.get_query(1)  # Returns T-SQL compatible query
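As an illustration of the kind of rewrite involved (this is not BenchBox's actual translator), DuckDB-style date/interval arithmetic has to become T-SQL DATEADD; the regex below handles only that one pattern, for TPC-H Q1's date predicate:

```python
import re

def translate_interval(sql: str) -> str:
    """Rewrite DuckDB `date 'D' - interval 'N' day` into T-SQL
    `DATEADD(day, -N, 'D')`. Toy example for one pattern only."""
    pattern = r"date '([\d-]+)' - interval '(\d+)' day"
    return re.sub(pattern, r"DATEADD(day, -\2, '\1')", sql)

# DuckDB-dialect predicate from TPC-H Q1:
q = "l_shipdate <= date '1998-12-01' - interval '90' day"
print(translate_interval(q))
# l_shipdate <= DATEADD(day, -90, '1998-12-01')
```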

Performance Considerations

Result Cache

By default, BenchBox disables Fabric’s result cache for accurate benchmarking:

adapter = FabricWarehouseAdapter(
    workspace="...",
    warehouse="...",
    disable_result_cache=True  # Default
)

To enable (for production-like testing):

adapter = FabricWarehouseAdapter(
    workspace="...",
    warehouse="...",
    disable_result_cache=False
)

Capacity Considerations

Fabric uses Capacity Units (CUs) for billing:

  • F2: 2 CUs - Development/testing

  • F4-F64: 4-64 CUs - Production workloads

  • F128+: Large-scale analytics

Recommendations:

  • Use F4+ for TPC-H scale factor > 1

  • Use F16+ for TPC-DS

  • Consider pausing capacity between benchmark runs

Cost Optimization

# Pause Fabric capacity between runs (requires the Azure CLI "fabric" extension)
az fabric capacity suspend --capacity-name my-capacity --resource-group my-rg

# Resume when ready to benchmark
az fabric capacity resume --capacity-name my-capacity --resource-group my-rg

Troubleshooting

Connection Errors

“Login timeout expired”

# Check ODBC driver installation
odbcinst -q -d

# Verify the workspace GUID and warehouse name in the Fabric portal,
# and confirm the capacity backing the workspace is running (not paused)

“Cannot open server”

  • Verify workspace GUID is correct

  • Check network connectivity to *.datawarehouse.fabric.microsoft.com

  • Ensure same-region deployment (cross-region not supported)

Authentication Errors

“AADSTS700016: Application not found”

  • Verify service principal exists in correct tenant

  • Check client ID is correct

“AADSTS7000215: Invalid client secret”

  • Regenerate client secret

  • Check for expired credentials

Data Loading Errors

“Access to OneLake denied”

  • Verify service principal has OneLake access

  • Check workspace permissions include data access

“COPY INTO failed”

  • Verify staging path exists

  • Check file format matches data

  • Ensure column count matches table schema
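The column-count mismatch can be caught before issuing COPY INTO by inspecting the first data row of the staged file. A hedged sketch (`csv_column_count` is not part of BenchBox):

```python
import csv

def csv_column_count(path: str, delimiter: str = "|") -> int:
    """Return the number of fields in the first row of a delimited file.
    Compare against the target table's column count before COPY INTO."""
    with open(path, newline="") as f:
        return len(next(csv.reader(f, delimiter=delimiter)))
```

For TPC-H lineitem (16 columns), a pre-flight check would be `csv_column_count("lineitem.csv") == 16`.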

Differences from Azure Synapse Analytics

| Feature | Microsoft Fabric Warehouse | Azure Synapse Analytics |
| --- | --- | --- |
| Authentication | Entra ID only | SQL + Entra ID |
| Storage | OneLake | Azure Blob/ADLS |
| Distribution | Automatic | User-specified |
| Indexing | Automatic | Manual |
| File Format | Delta Lake | Multiple |
| Cross-region | Not supported | Supported |
| Pricing | Capacity Units | DWU/cDWU |