TSBS DevOps Benchmark¶
CLI name: `tsbs_devops`. Run with `benchbox run --benchmark tsbs_devops`.
Overview¶
The Time Series Benchmark Suite (TSBS) DevOps benchmark simulates infrastructure monitoring workloads typical of DevOps and observability platforms. Based on the official TSBS implementation by Timescale, this benchmark generates realistic time-series data representing CPU, memory, disk, and network metrics from a fleet of monitored hosts.
The benchmark is ideal for evaluating time-series databases, OLAP systems handling temporal data, and infrastructure monitoring solutions.
Key Features¶
Realistic metrics - CPU, memory, disk I/O, and network statistics
Host metadata - Tags for region, datacenter, service, team
Diurnal patterns - Realistic daily usage patterns
Configurable scale - From 10 hosts to thousands
18 DevOps queries - Common monitoring and alerting patterns
Multiple dialects - Standard SQL, ClickHouse, TimescaleDB, InfluxDB support
Data Model¶
The TSBS DevOps benchmark uses a dimensional model with a tags table and four metric tables:
Tables¶
| Table | Purpose | Rows at SF=1 |
|---|---|---|
| `tags` | Host metadata and dimensions | 100 |
| `cpu` | CPU usage metrics per timestamp | ~1,728,000 |
| `mem` | Memory metrics per timestamp | ~1,728,000 |
| `disk` | Disk I/O metrics per device | ~3,456,000 |
| `net` | Network metrics per interface | ~3,456,000 |
cpu Table¶
| Column | Type | Description |
|---|---|---|
| `time` | TIMESTAMP | Measurement timestamp (PK) |
| `hostname` | VARCHAR | Host identifier (PK) |
| `usage_user` | DOUBLE | CPU % in user space |
| `usage_system` | DOUBLE | CPU % in kernel space |
| `usage_idle` | DOUBLE | CPU % idle |
| `usage_nice` | DOUBLE | CPU % at nice priority |
| `usage_iowait` | DOUBLE | CPU % waiting for I/O |
| `usage_irq` | DOUBLE | CPU % servicing hardware interrupts |
| `usage_softirq` | DOUBLE | CPU % servicing software interrupts |
| `usage_steal` | DOUBLE | CPU % stolen by hypervisor |
| `usage_guest` | DOUBLE | CPU % running guest VMs |
| `usage_guest_nice` | DOUBLE | CPU % running guest VMs at nice priority |
mem Table¶
Column |
Type |
Description |
|---|---|---|
|
TIMESTAMP |
Measurement timestamp (PK) |
|
VARCHAR |
Host identifier (PK) |
|
BIGINT |
Total memory bytes |
|
BIGINT |
Available memory bytes |
|
BIGINT |
Used memory bytes |
|
BIGINT |
Free memory bytes |
|
BIGINT |
Cached memory bytes |
|
BIGINT |
Buffered memory bytes |
|
DOUBLE |
Memory usage percent |
|
DOUBLE |
Available memory percent |
disk Table¶
Column |
Type |
Description |
|---|---|---|
|
TIMESTAMP |
Measurement timestamp (PK) |
|
VARCHAR |
Host identifier (PK) |
|
VARCHAR |
Disk device name (PK) |
|
BIGINT |
Total read operations (cumulative counter) |
|
BIGINT |
Merged read operations |
|
BIGINT |
Sectors read (cumulative counter) |
|
BIGINT |
Read time in milliseconds (cumulative counter) |
|
BIGINT |
Total write operations (cumulative counter) |
|
BIGINT |
Merged write operations |
|
BIGINT |
Sectors written (cumulative counter) |
|
BIGINT |
Write time in milliseconds (cumulative counter) |
|
INTEGER |
Current I/O operations |
|
BIGINT |
Total I/O time (cumulative counter) |
|
BIGINT |
Weighted I/O time (cumulative counter) |
Note: Columns flagged as cumulative counter accumulate across samples (see
`generator.py:_generate_disk_metrics`). Compute the delta between consecutive samples per `(hostname, device)` when reporting rates.
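The counter-to-rate conversion described in the note can be sketched in plain Python. This is an illustration, not the library's implementation; the function name and tuple layout are assumptions:

```python
def counter_deltas(samples):
    """Turn cumulative counter samples into per-interval deltas.

    `samples`: iterable of (hostname, device, timestamp, counter) tuples,
    sorted by timestamp within each (hostname, device) series.
    """
    last = {}  # (hostname, device) -> previous counter value
    deltas = []
    for host, dev, ts, counter in samples:
        key = (host, dev)
        if key in last:
            deltas.append((host, dev, ts, counter - last[key]))
        last[key] = counter
    return deltas

reads = [("host_0", "sda", 0, 100), ("host_0", "sda", 10, 150), ("host_0", "sda", 20, 210)]
print(counter_deltas(reads))  # [('host_0', 'sda', 10, 50), ('host_0', 'sda', 20, 60)]
```

In SQL the same shape is typically expressed with a `LAG(...) OVER (PARTITION BY hostname, device ORDER BY time)` window function.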
net Table¶
| Column | Type | Description |
|---|---|---|
| `time` | TIMESTAMP | Measurement timestamp (PK) |
| `hostname` | VARCHAR | Host identifier (PK) |
| `interface` | VARCHAR | Network interface name (PK) |
| `bytes_recv` | BIGINT | Bytes received (cumulative counter) |
| `bytes_sent` | BIGINT | Bytes sent (cumulative counter) |
| `packets_recv` | BIGINT | Packets received (cumulative counter) |
| `packets_sent` | BIGINT | Packets sent (cumulative counter) |
| `err_in` | BIGINT | Receive errors in this sample window |
| `err_out` | BIGINT | Send errors in this sample window |
| `drop_in` | BIGINT | Dropped incoming packets in this sample window |
| `drop_out` | BIGINT | Dropped outgoing packets in this sample window |
Note: `bytes_*` and `packets_*` columns are cumulative counters (see `generator.py:_generate_net_metrics`); compute the delta between consecutive samples per `(hostname, interface)` when reporting rates. Error and drop columns are per-sample gauges (generated as rare 0/1 events), so they can be summed over a window rather than differenced.
Query Categories¶
The benchmark includes 18 queries organized into categories:
Single Host Queries¶
Metrics for individual hosts over time ranges:
single-host-12-hr: CPU usage for one host over 12 hours
single-host-1-hr: Detailed CPU for one host over 1 hour
Aggregation Queries¶
Cross-host aggregations:
cpu-max-all-1-hr: Maximum CPU across all hosts (1 hour)
cpu-max-all-8-hr: Maximum CPU across all hosts (8 hours)
GroupBy Queries¶
Time-bucketed aggregations:
double-groupby-1-hr: CPU grouped by host and minute
double-groupby-5-min: Fine-grained CPU grouping
Threshold Queries¶
Alert-style threshold filters:
high-cpu-1-hr: Hosts with CPU > 90%
high-cpu-12-hr: Sustained high-CPU hosts
low-memory-hosts: Hosts with available memory < 10%
net-errors: Hosts with network errors
Memory Queries¶
Memory-specific analytics:
mem-by-host-1-hr: Memory statistics per host
Disk Queries¶
Disk I/O analytics:
disk-iops-1-hr: Read/write operations per host
disk-latency: Average disk latency analysis
Network Queries¶
Network throughput analytics:
net-throughput-1-hr: Bytes sent/received per host
Combined Queries¶
Cross-metric correlation:
resource-utilization: Combined CPU and memory per host
Lastpoint Queries¶
Most recent values (common in dashboards):
lastpoint: Most recent metrics per host
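The lastpoint shape (most recent row per group) is usually written with a window function. This sketch uses Python's built-in sqlite3 purely to show the query pattern; it is not one of the benchmark's target engines, and the tiny `cpu` table here is made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpu (time INTEGER, hostname TEXT, usage_user REAL)")
conn.executemany(
    "INSERT INTO cpu VALUES (?, ?, ?)",
    [(1, "host_0", 10.0), (2, "host_0", 55.0), (1, "host_1", 30.0)],
)

# Rank rows per hostname by descending time, then keep only rank 1,
# i.e. the most recent sample for each host.
rows = conn.execute("""
    SELECT time, hostname, usage_user FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY hostname ORDER BY time DESC
        ) AS rn
        FROM cpu
    ) WHERE rn = 1
    ORDER BY hostname
""").fetchall()
print(rows)  # [(2, 'host_0', 55.0), (1, 'host_1', 30.0)]
```

Window functions require SQLite 3.25+, which ships with all currently supported CPython releases.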
Tag-filtered Queries¶
Filtering by host metadata:
by-region: Metrics filtered by cloud region
by-service: Metrics grouped by service
Usage Examples¶
Basic Benchmark Setup¶
```python
from benchbox import TSBSDevOps

# Initialize TSBS DevOps benchmark (SF=1 = 100 hosts, 2 days)
tsbs = TSBSDevOps(scale_factor=1.0, output_dir="tsbs_data")

# Generate time-series data
data_files = tsbs.generate_data()

# Get all queries
queries = tsbs.get_queries()
print(f"Generated {len(queries)} TSBS queries")

# Get a specific query
cpu_query = tsbs.get_query("cpu-max-all-1-hr")
print(cpu_query)
```
Custom Configuration¶
```python
# Configure specific hosts and duration
tsbs_custom = TSBSDevOps(
    scale_factor=0.5,
    output_dir="tsbs_custom",
    num_hosts=50,         # Override: 50 hosts
    duration_days=7,      # Override: 7 days of data
    interval_seconds=60,  # 1-minute intervals
)
data_files = tsbs_custom.generate_data()
```
DuckDB Integration¶
```python
import duckdb

from benchbox import TSBSDevOps

# Initialize and generate data
tsbs = TSBSDevOps(scale_factor=0.1, output_dir="tsbs_small")
data_files = tsbs.generate_data()

# Create DuckDB connection and schema
conn = duckdb.connect("tsbs.duckdb")
schema_sql = tsbs.get_create_tables_sql(dialect="duckdb")
for stmt in schema_sql.split(";"):
    if stmt.strip():
        conn.execute(stmt)

# Load data
for table_name, file_path in tsbs.tables.items():
    conn.execute(f"""
        INSERT INTO {table_name}
        SELECT * FROM read_csv('{file_path}', header=true, auto_detect=true)
    """)

# Run queries
for query_id in ["cpu-max-all-1-hr", "high-cpu-1-hr", "lastpoint"]:
    query_sql = tsbs.get_query(query_id)
    result = conn.execute(query_sql).fetchall()
    print(f"{query_id}: {len(result)} rows")

conn.close()
```
TimescaleDB Integration¶
```python
from benchbox import TSBSDevOps

tsbs = TSBSDevOps(scale_factor=1.0)

# Get TimescaleDB-optimized schema with hypertables
schema_sql = tsbs.get_create_tables_sql(
    dialect="timescale",
    time_partitioning=True,
)
print(schema_sql)
# Includes: SELECT create_hypertable('cpu', 'time', ...)
```
ClickHouse Integration¶
```python
from benchbox import TSBSDevOps

tsbs = TSBSDevOps(scale_factor=1.0)

# Get ClickHouse-optimized schema
schema_sql = tsbs.get_create_tables_sql(
    dialect="clickhouse",
    time_partitioning=True,
)
print(schema_sql)
# Includes: ENGINE = MergeTree() ORDER BY (...) PARTITION BY toYYYYMMDD(time)
```
InfluxDB Integration¶
InfluxDB 3.x uses FlightSQL for SQL queries and Line Protocol for data ingestion. BenchBox handles this automatically via the InfluxDB adapter.
```python
from benchbox import TSBSDevOps
from benchbox.platforms.influxdb import InfluxDBAdapter

# Initialize TSBS DevOps benchmark
tsbs = TSBSDevOps(scale_factor=0.1, output_dir="tsbs_influx")
data_files = tsbs.generate_data()

# Create InfluxDB adapter (Core/OSS mode)
adapter = InfluxDBAdapter(
    mode="core",
    host="localhost",
    port=8086,
    token="your-influxdb-token",
    database="benchmarks",
    ssl=False,
)

# Create connection
conn = adapter.create_connection()

# InfluxDB auto-creates schema from Line Protocol writes.
# Load data (converts CSV to Line Protocol).
row_counts, load_time, metadata = adapter.load_data(tsbs, conn, tsbs.output_dir)
print(f"Loaded {metadata['total_rows']:,} rows in {load_time:.2f}s")

# Get InfluxDB-compatible queries (uses DataFusion SQL)
for query_id in ["cpu-max-all-1-hr", "high-cpu-1-hr", "lastpoint"]:
    query_sql = tsbs.get_query(query_id, dialect="influxdb")
    exec_time, row_count, _ = adapter.execute_query(conn, query_sql, query_id)
    print(f"{query_id}: {row_count} rows in {exec_time:.3f}s")

adapter.close_connection(conn)
```
InfluxDB Cloud mode:
```python
# InfluxDB Cloud (Serverless/Dedicated/Clustered)
adapter = InfluxDBAdapter(
    mode="cloud",
    host="us-east-1-1.aws.cloud2.influxdata.com",
    token="your-cloud-token",
    org="your-org",
    database="benchmarks",
)
```
Key InfluxDB Considerations:
Line Protocol: Data is loaded via InfluxDB’s native Line Protocol format for optimal ingest performance
Schema auto-creation: Tables (measurements) are auto-created on first write
SQL via FlightSQL: Queries use standard SQL (powered by Apache DataFusion)
Tags vs Fields: hostname becomes a tag (indexed), metrics become fields
No DELETE: InfluxDB Core doesn’t support deletes; use retention policies instead
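Line Protocol itself is simple text: measurement name, comma-separated tags, a space, comma-separated fields, a space, and a timestamp. The adapter handles the conversion internally; this standalone formatter is a simplified sketch (it omits the escaping of spaces/commas and the string-field quoting that real Line Protocol requires):

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one data point as InfluxDB Line Protocol (simplified: no escaping)."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

line = to_line_protocol(
    "cpu",
    {"hostname": "host_0"},
    {"usage_system": 8.1, "usage_user": 42.5},
    1546300800000000000,  # 2019-01-01T00:00:00 UTC in nanoseconds
)
print(line)
# cpu,hostname=host_0 usage_system=8.1,usage_user=42.5 1546300800000000000
```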
CLI Options (--benchmark-option)¶
Configure TSBS DevOps data generation via --benchmark-option KEY=VALUE:
| Option | Default | Description |
|---|---|---|
| `num_hosts` | - | Number of simulated hosts |
| `duration_days` | - | Duration in days for data generation |
| `interval_seconds` | - | Measurement interval in seconds |
| `start_time` | - | Start time in ISO format (e.g. `2019-01-01T00:00:00`) |
| | - | Random seed for reproducibility |
| | - | Force data regeneration |
Accepts hyphenated aliases (e.g. num-hosts, duration-days).
```shell
# Custom host count and interval
benchbox run --platform duckdb --benchmark tsbs_devops --scale 1 \
  --benchmark-option num_hosts=100 \
  --benchmark-option interval_seconds=30 \
  --benchmark-option start_time=2019-01-01T00:00:00
```
Scale Factor Guidelines¶
Scale Factor |
Hosts |
Duration |
CPU Rows |
Total Rows |
Data Size |
Use Case |
|---|---|---|---|---|---|---|
0.01 |
10 |
2 days |
~173K |
~1M |
~93 MB |
Quick testing |
0.1 |
10 |
2 days |
~173K |
~1M |
~93 MB |
Development |
1.0 |
100 |
2 days |
~1.7M |
~10M |
~0.93 GB |
Standard benchmark |
10.0 |
1,000 |
2 days |
~17.3M |
~100M |
~9.3 GB |
Performance testing |
100.0 |
10,000 |
2 days |
~173M |
~1B |
~93 GB |
Large scale testing |
Note: Duration is fixed at 2 days; only host count scales with SF. If both hosts and duration scaled, total rows would grow as SF² (quadratic) instead of SF (linear), making large scale factors disproportionately expensive. SF=0.01 and SF=0.1 produce identical output due to the 10-host minimum floor.
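The row counts above follow directly from hosts × samples. The table implies a 10-second sampling interval (1,728,000 cpu rows at SF=1 works out to 17,280 samples per host over 2 days) and two devices/interfaces per host for the ~2x disk and net counts; both figures are inferred from the table, not documented defaults. A quick sanity check under those assumptions:

```python
HOSTS = 100      # SF=1
DAYS = 2
INTERVAL_S = 10  # inferred from the table's row counts, not a documented default

samples_per_host = DAYS * 86_400 // INTERVAL_S
cpu_rows = HOSTS * samples_per_host
disk_rows = cpu_rows * 2  # two devices per host matches the ~3,456,000 figure

print(samples_per_host, cpu_rows, disk_rows)  # 17280 1728000 3456000
```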
Data Generation Patterns¶
The generator creates realistic data with:
Diurnal CPU patterns: Higher usage during business hours (9am-5pm)
Memory growth: Gradual memory increase with periodic GC drops
Disk I/O bursts: 5% chance of 10x burst per interval
Network errors: Rare errors (~0.1%) and drops (~0.2%)
Tag distributions: 75% Linux, 50% production, balanced regions
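A diurnal baseline like the one described above can be sketched as a clipped sinusoid that peaks mid-business-day. The real generator (generator.py) layers noise and bursts on top, so treat this as an illustration with an invented function name, not the library's implementation:

```python
import math

def diurnal_cpu(hour, base=20.0, amplitude=30.0):
    """Baseline CPU % that peaks around 13:00 and flattens to `base` overnight."""
    # Shift the sine so its peak lands at hour 13 (mid business day);
    # clip the negative half so off-hours sit at the flat baseline.
    phase = (hour - 7) / 24 * 2 * math.pi
    return base + amplitude * max(0.0, math.sin(phase))

for h in (1, 9, 13, 23):
    print(h, round(diurnal_cpu(h), 1))
```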
Performance Characteristics¶
Query Performance Patterns¶
Single Host Queries:
Bottleneck: Time range filtering
Optimization: Index on (hostname, time)
Typical performance: Fast (milliseconds)
Aggregation Queries:
Bottleneck: Full scan of time range
Optimization: Columnar storage, vectorized execution
Typical performance: Medium (seconds)
GroupBy Queries:
Bottleneck: Hash aggregation memory
Optimization: Pre-aggregation, materialized views
Typical performance: Medium to slow
Threshold Queries:
Bottleneck: Filtering efficiency
Optimization: Bloom filters, sparse indexes
Typical performance: Fast with good indexes
Lastpoint Queries:
Bottleneck: Finding max timestamp per group
Optimization: Specialized last-value indexes
Typical performance: must be fast; latency-critical for dashboards
Best Practices¶
Data Generation¶
Match your monitoring interval - Use realistic intervals (10s, 30s, 60s)
Scale hosts appropriately - Test with expected fleet size
Consider retention - Duration affects storage testing
Query Optimization¶
Partition by time - Essential for time-series databases
Index on hostname - For single-host query performance
Pre-aggregate - Materialized views for dashboards
Time-Series Database Tips¶
Use native types - TIMESTAMPTZ, DateTime64
Enable compression - Time-series compresses well
Consider downsampling - For long-term storage
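At its core, downsampling is just bucketed aggregation over time. Production systems use continuous aggregates or materialized views for this; the following minimal sketch shows the idea in plain Python:

```python
def downsample(points, bucket_s=300):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = {}
    for ts, value in points:
        bucket = ts - ts % bucket_s  # floor timestamp to its bucket start
        buckets.setdefault(bucket, []).append(value)
    return {b: sum(vs) / len(vs) for b, vs in sorted(buckets.items())}

raw = [(0, 10.0), (100, 20.0), (310, 30.0)]
print(downsample(raw))  # {0: 15.0, 300: 30.0}
```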
External Resources¶
TSBS GitHub Repository - Original implementation
TimescaleDB Documentation - Time-series optimization
InfluxDB Line Protocol - Time-series data format