TSBS DevOps Benchmark¶

Tags intermediate concept tsbs-devops custom-benchmark

Overview¶

The Time Series Benchmark Suite (TSBS) DevOps benchmark simulates infrastructure monitoring workloads typical of DevOps and observability platforms. Based on the official TSBS implementation by Timescale, this benchmark generates realistic time-series data representing CPU, memory, disk, and network metrics from a fleet of monitored hosts.

The benchmark is ideal for evaluating time-series databases, OLAP systems handling temporal data, and infrastructure monitoring solutions.

Key Features¶

Realistic metrics - CPU, memory, disk I/O, and network statistics
Host metadata - Tags for region, datacenter, service, team
Diurnal patterns - Realistic daily usage patterns
Configurable scale - From 10 hosts to thousands
18 DevOps queries - Common monitoring and alerting patterns
Multiple dialects - Standard SQL, ClickHouse, TimescaleDB, InfluxDB support

Data Model¶

The TSBS DevOps benchmark uses a dimensional model with a tags table and four metric tables:

Tables¶

Table	Purpose	Rows per SF=1
tags	Host metadata and dimensions	100
cpu	CPU usage metrics per timestamp	~864,000
mem	Memory metrics per timestamp	~864,000
disk	Disk I/O metrics per device	~1,728,000
net	Network metrics per interface	~1,728,000

tags Table¶

Column	Type	Description
`hostname`	VARCHAR	Unique host identifier (PK)
`region`	VARCHAR	Cloud region (us-east-1, eu-west-1, etc.)
`datacenter`	VARCHAR	Datacenter name
`rack`	VARCHAR	Rack identifier
`os`	VARCHAR	Operating system (linux, windows)
`arch`	VARCHAR	CPU architecture (x86_64, arm64)
`team`	VARCHAR	Team owner
`service`	VARCHAR	Service name
`service_version`	VARCHAR	Service version
`service_environment`	VARCHAR	Environment (prod, staging, dev)

cpu Table¶

Column	Type	Description
`time`	TIMESTAMP	Measurement timestamp (PK)
`hostname`	VARCHAR	Host identifier (PK)
`usage_user`	DOUBLE	CPU % in user space
`usage_system`	DOUBLE	CPU % in kernel space
`usage_idle`	DOUBLE	CPU % idle
`usage_nice`	DOUBLE	CPU % nice priority
`usage_iowait`	DOUBLE	CPU % waiting for I/O
`usage_irq`	DOUBLE	CPU % hardware interrupts
`usage_softirq`	DOUBLE	CPU % software interrupts
`usage_steal`	DOUBLE	CPU % stolen by hypervisor
`usage_guest`	DOUBLE	CPU % running guest VMs
`usage_guest_nice`	DOUBLE	CPU % guest nice priority

mem Table¶

Column	Type	Description
`time`	TIMESTAMP	Measurement timestamp (PK)
`hostname`	VARCHAR	Host identifier (PK)
`total`	BIGINT	Total memory bytes
`available`	BIGINT	Available memory bytes
`used`	BIGINT	Used memory bytes
`free`	BIGINT	Free memory bytes
`cached`	BIGINT	Cached memory bytes
`buffered`	BIGINT	Buffered memory bytes
`used_percent`	DOUBLE	Memory usage percent
`available_percent`	DOUBLE	Available memory percent

disk Table¶

Column	Type	Description
`time`	TIMESTAMP	Measurement timestamp (PK)
`hostname`	VARCHAR	Host identifier (PK)
`device`	VARCHAR	Disk device name (PK)
`reads_completed`	BIGINT	Total read operations
`writes_completed`	BIGINT	Total write operations
`read_time_ms`	BIGINT	Read time in milliseconds
`write_time_ms`	BIGINT	Write time in milliseconds
`io_in_progress`	INTEGER	Current I/O operations

net Table¶

Column	Type	Description
`time`	TIMESTAMP	Measurement timestamp (PK)
`hostname`	VARCHAR	Host identifier (PK)
`interface`	VARCHAR	Network interface name (PK)
`bytes_recv`	BIGINT	Bytes received
`bytes_sent`	BIGINT	Bytes sent
`packets_recv`	BIGINT	Packets received
`packets_sent`	BIGINT	Packets sent
`err_in`	BIGINT	Receive errors
`err_out`	BIGINT	Send errors
`drop_in`	BIGINT	Dropped incoming packets
`drop_out`	BIGINT	Dropped outgoing packets

Query Categories¶

The benchmark includes 18 queries organized into categories:

Single Host Queries¶

Metrics for individual hosts over time ranges:

single-host-12-hr: CPU usage for one host over 12 hours
single-host-1-hr: Detailed CPU for one host over 1 hour

Aggregation Queries¶

Cross-host aggregations:

cpu-max-all-1-hr: Maximum CPU across all hosts (1 hour)
cpu-max-all-8-hr: Maximum CPU across all hosts (8 hours)

GroupBy Queries¶

Time-bucketed aggregations:

double-groupby-1-hr: CPU grouped by host and minute
double-groupby-5-min: Fine-grained CPU grouping

Threshold Queries¶

Alert-style threshold filters:

high-cpu-1-hr: Hosts with CPU > 90%
high-cpu-12-hr: Sustained high CPU hosts
low-memory-hosts: Hosts with available memory < 10%
net-errors: Hosts with network errors

Memory Queries¶

Memory-specific analytics:

mem-by-host-1-hr: Memory statistics per host

Disk Queries¶

Disk I/O analytics:

disk-iops-1-hr: Read/write operations per host
disk-latency: Average disk latency analysis

Network Queries¶

Network throughput analytics:

net-throughput-1-hr: Bytes sent/received per host

Combined Queries¶

Cross-metric correlation:

resource-utilization: Combined CPU and memory per host

Lastpoint Queries¶

Most recent values (common in dashboards):

lastpoint: Most recent metrics per host

Tag-filtered Queries¶

Filtering by host metadata:

by-region: Metrics filtered by cloud region
by-service: Metrics grouped by service

Usage Examples¶

Basic Benchmark Setup¶

from benchbox import TSBSDevOps

# Initialize TSBS DevOps benchmark (SF=1 = 100 hosts, 1 day)
tsbs = TSBSDevOps(scale_factor=1.0, output_dir="tsbs_data")

# Generate time-series data
data_files = tsbs.generate_data()

# Get all queries
queries = tsbs.get_queries()
print(f"Generated {len(queries)} TSBS queries")

# Get specific query
cpu_query = tsbs.get_query("cpu-max-all-1-hr")
print(cpu_query)

Custom Configuration¶

# Configure specific hosts and duration
tsbs_custom = TSBSDevOps(
    scale_factor=0.5,
    output_dir="tsbs_custom",
    num_hosts=50,           # Override: 50 hosts
    duration_days=7,        # Override: 7 days of data
    interval_seconds=60,    # 1-minute intervals
)
data_files = tsbs_custom.generate_data()

DuckDB Integration¶

import duckdb
from benchbox import TSBSDevOps

# Initialize and generate data
tsbs = TSBSDevOps(scale_factor=0.1, output_dir="tsbs_small")
data_files = tsbs.generate_data()

# Create DuckDB connection and schema
conn = duckdb.connect("tsbs.duckdb")
schema_sql = tsbs.get_create_tables_sql(dialect="duckdb")

for stmt in schema_sql.split(";"):
    if stmt.strip():
        conn.execute(stmt)

# Load data
for table_name, file_path in tsbs.tables.items():
    conn.execute(f"""
        INSERT INTO {table_name}
        SELECT * FROM read_csv('{file_path}', header=true, auto_detect=true)
    """)

# Run queries
for query_id in ["cpu-max-all-1-hr", "high-cpu-1-hr", "lastpoint"]:
    query_sql = tsbs.get_query(query_id)
    result = conn.execute(query_sql).fetchall()
    print(f"{query_id}: {len(result)} rows")

conn.close()

TimescaleDB Integration¶

from benchbox import TSBSDevOps

tsbs = TSBSDevOps(scale_factor=1.0)

# Get TimescaleDB-optimized schema with hypertables
schema_sql = tsbs.get_create_tables_sql(
    dialect="timescale",
    time_partitioning=True,
)
print(schema_sql)
# Includes: SELECT create_hypertable('cpu', 'time', ...)

ClickHouse Integration¶

from benchbox import TSBSDevOps

tsbs = TSBSDevOps(scale_factor=1.0)

# Get ClickHouse-optimized schema
schema_sql = tsbs.get_create_tables_sql(
    dialect="clickhouse",
    time_partitioning=True,
)
print(schema_sql)
# Includes: ENGINE = MergeTree() ORDER BY (...) PARTITION BY toYYYYMMDD(time)

InfluxDB Integration¶

InfluxDB 3.x uses FlightSQL for SQL queries and Line Protocol for data ingestion. BenchBox handles this automatically via the InfluxDB adapter.

from benchbox.platforms.influxdb import InfluxDBAdapter
from benchbox import TSBSDevOps

# Initialize TSBS DevOps benchmark
tsbs = TSBSDevOps(scale_factor=0.1, output_dir="tsbs_influx")
data_files = tsbs.generate_data()

# Create InfluxDB adapter (Core/OSS mode)
adapter = InfluxDBAdapter(
    mode="core",
    host="localhost",
    port=8086,
    token="your-influxdb-token",
    database="benchmarks",
    ssl=False,
)

# Create connection
conn = adapter.create_connection()

# InfluxDB auto-creates schema from Line Protocol writes
# Load data (converts CSV to Line Protocol)
row_counts, load_time, metadata = adapter.load_data(tsbs, conn, tsbs.output_dir)
print(f"Loaded {metadata['total_rows']:,} rows in {load_time:.2f}s")

# Get InfluxDB-compatible queries (uses DataFusion SQL)
for query_id in ["cpu-max-all-1-hr", "high-cpu-1-hr", "lastpoint"]:
    query_sql = tsbs.get_query(query_id, dialect="influxdb")
    exec_time, row_count, _ = adapter.execute_query(conn, query_sql, query_id)
    print(f"{query_id}: {row_count} rows in {exec_time:.3f}s")

adapter.close_connection(conn)

InfluxDB Cloud mode:

# InfluxDB Cloud (Serverless/Dedicated/Clustered)
adapter = InfluxDBAdapter(
    mode="cloud",
    host="us-east-1-1.aws.cloud2.influxdata.com",
    token="your-cloud-token",
    org="your-org",
    database="benchmarks",
)

Key InfluxDB Considerations:

Line Protocol: Data is loaded via InfluxDB’s native Line Protocol format for optimal ingest performance
Schema auto-creation: Tables (measurements) are auto-created on first write
SQL via FlightSQL: Queries use standard SQL (powered by Apache DataFusion)
Tags vs Fields: hostname becomes a tag (indexed), metrics become fields
No DELETE: InfluxDB Core doesn’t support deletes; use retention policies instead

Scale Factor Guidelines¶

Scale Factor	Hosts	Duration	CPU Rows	Total Rows	Use Case
0.01	10	1 day	~86K	~430K	Quick testing
0.1	10	1 day	~86K	~430K	Development
1.0	100	1 day	~864K	~5M	Standard benchmark
10.0	1000	10 days	~86M	~500M	Performance testing
100.0	1000	100 days	~864M	~5B	Large scale testing

Data Generation Patterns¶

The generator creates realistic data with:

Diurnal CPU patterns: Higher usage during business hours (9am-5pm)
Memory growth: Gradual memory increase with periodic GC drops
Disk I/O bursts: 5% chance of 10x burst per interval
Network errors: Rare errors (~0.1%) and drops (~0.2%)
Tag distributions: 75% Linux, 50% production, balanced regions

Performance Characteristics¶

Query Performance Patterns¶

Single Host Queries:

Bottleneck: Time range filtering
Optimization: Index on (hostname, time)
Typical performance: Fast (milliseconds)

Aggregation Queries:

Bottleneck: Full scan of time range
Optimization: Columnar storage, vectorized execution
Typical performance: Medium (seconds)

GroupBy Queries:

Bottleneck: Hash aggregation memory
Optimization: Pre-aggregation, materialized views
Typical performance: Medium to slow

Threshold Queries:

Bottleneck: Filtering efficiency
Optimization: Bloom filters, sparse indexes
Typical performance: Fast with good indexes

Lastpoint Queries:

Bottleneck: Finding max timestamp per group
Optimization: Specialized last-value indexes
Typical performance: Critical for dashboards

Best Practices¶

Data Generation¶

Match your monitoring interval - Use realistic intervals (10s, 30s, 60s)
Scale hosts appropriately - Test with expected fleet size
Consider retention - Duration affects storage testing

Query Optimization¶

Partition by time - Essential for time-series databases
Index on hostname - For single-host query performance
Pre-aggregate - Materialized views for dashboards

Time-Series Database Tips¶

Use native types - TIMESTAMPTZ, DateTime64
Enable compression - Time-series compresses well
Consider downsampling - For long-term storage

External Resources¶

TSBS GitHub Repository - Original implementation
TimescaleDB Documentation - Time-series optimization
InfluxDB Line Protocol - Time-series data format