TSBS DevOps Benchmark

Tags: intermediate, concept, tsbs-devops, custom-benchmark

Overview

The Time Series Benchmark Suite (TSBS) DevOps benchmark simulates infrastructure monitoring workloads typical of DevOps and observability platforms. Based on the official TSBS implementation by Timescale, this benchmark generates realistic time-series data representing CPU, memory, disk, and network metrics from a fleet of monitored hosts.

The benchmark is ideal for evaluating time-series databases, OLAP systems handling temporal data, and infrastructure monitoring solutions.

Key Features

  • Realistic metrics - CPU, memory, disk I/O, and network statistics

  • Host metadata - Tags for region, datacenter, service, team

  • Diurnal patterns - Realistic daily usage patterns

  • Configurable scale - From 10 hosts to thousands

  • 18 DevOps queries - Common monitoring and alerting patterns

  • Multiple dialects - Standard SQL, ClickHouse, TimescaleDB, and InfluxDB

Data Model

The TSBS DevOps benchmark uses a dimensional model with a tags table and four metric tables:

Tables

| Table | Purpose                         | Rows at SF=1 |
|-------|---------------------------------|--------------|
| tags  | Host metadata and dimensions    | 100          |
| cpu   | CPU usage metrics per timestamp | ~864,000     |
| mem   | Memory metrics per timestamp    | ~864,000     |
| disk  | Disk I/O metrics per device     | ~1,728,000   |
| net   | Network metrics per interface   | ~1,728,000   |

tags Table

| Column              | Type    | Description                               |
|---------------------|---------|-------------------------------------------|
| hostname            | VARCHAR | Unique host identifier (PK)               |
| region              | VARCHAR | Cloud region (us-east-1, eu-west-1, etc.) |
| datacenter          | VARCHAR | Datacenter name                           |
| rack                | VARCHAR | Rack identifier                           |
| os                  | VARCHAR | Operating system (linux, windows)         |
| arch                | VARCHAR | CPU architecture (x86_64, arm64)          |
| team                | VARCHAR | Team owner                                |
| service             | VARCHAR | Service name                              |
| service_version     | VARCHAR | Service version                           |
| service_environment | VARCHAR | Environment (prod, staging, dev)          |

cpu Table

| Column           | Type      | Description                |
|------------------|-----------|----------------------------|
| time             | TIMESTAMP | Measurement timestamp (PK) |
| hostname         | VARCHAR   | Host identifier (PK)       |
| usage_user       | DOUBLE    | CPU % in user space        |
| usage_system     | DOUBLE    | CPU % in kernel space      |
| usage_idle       | DOUBLE    | CPU % idle                 |
| usage_nice       | DOUBLE    | CPU % nice priority        |
| usage_iowait     | DOUBLE    | CPU % waiting for I/O      |
| usage_irq        | DOUBLE    | CPU % hardware interrupts  |
| usage_softirq    | DOUBLE    | CPU % software interrupts  |
| usage_steal      | DOUBLE    | CPU % stolen by hypervisor |
| usage_guest      | DOUBLE    | CPU % running guest VMs    |
| usage_guest_nice | DOUBLE    | CPU % guest nice priority  |
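
For orientation, the cpu schema has roughly the following shape. This is an illustrative generic-SQL sketch, not the exact DDL BenchBox emits; use get_create_tables_sql() for the dialect-specific version:

# Illustrative sketch; the real DDL comes from get_create_tables_sql()
cpu_ddl_sketch = """
CREATE TABLE cpu (
    time             TIMESTAMP NOT NULL,
    hostname         VARCHAR NOT NULL,
    usage_user       DOUBLE,
    usage_system     DOUBLE,
    usage_idle       DOUBLE,
    usage_nice       DOUBLE,
    usage_iowait     DOUBLE,
    usage_irq        DOUBLE,
    usage_softirq    DOUBLE,
    usage_steal      DOUBLE,
    usage_guest      DOUBLE,
    usage_guest_nice DOUBLE,
    PRIMARY KEY (time, hostname)
)
"""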

mem Table

| Column            | Type      | Description                |
|-------------------|-----------|----------------------------|
| time              | TIMESTAMP | Measurement timestamp (PK) |
| hostname          | VARCHAR   | Host identifier (PK)       |
| total             | BIGINT    | Total memory bytes         |
| available         | BIGINT    | Available memory bytes     |
| used              | BIGINT    | Used memory bytes          |
| free              | BIGINT    | Free memory bytes          |
| cached            | BIGINT    | Cached memory bytes        |
| buffered          | BIGINT    | Buffered memory bytes      |
| used_percent      | DOUBLE    | Memory usage percent       |
| available_percent | DOUBLE    | Available memory percent   |

disk Table

| Column           | Type      | Description                |
|------------------|-----------|----------------------------|
| time             | TIMESTAMP | Measurement timestamp (PK) |
| hostname         | VARCHAR   | Host identifier (PK)       |
| device           | VARCHAR   | Disk device name (PK)      |
| reads_completed  | BIGINT    | Total read operations      |
| writes_completed | BIGINT    | Total write operations     |
| read_time_ms     | BIGINT    | Read time in milliseconds  |
| write_time_ms    | BIGINT    | Write time in milliseconds |
| io_in_progress   | INTEGER   | Current I/O operations     |

net Table

| Column       | Type      | Description                 |
|--------------|-----------|-----------------------------|
| time         | TIMESTAMP | Measurement timestamp (PK)  |
| hostname     | VARCHAR   | Host identifier (PK)        |
| interface    | VARCHAR   | Network interface name (PK) |
| bytes_recv   | BIGINT    | Bytes received              |
| bytes_sent   | BIGINT    | Bytes sent                  |
| packets_recv | BIGINT    | Packets received            |
| packets_sent | BIGINT    | Packets sent                |
| err_in       | BIGINT    | Receive errors              |
| err_out      | BIGINT    | Send errors                 |
| drop_in      | BIGINT    | Dropped incoming packets    |
| drop_out     | BIGINT    | Dropped outgoing packets    |

Query Categories

The benchmark includes 18 queries organized into categories:

Single Host Queries

Metrics for individual hosts over time ranges:

  • single-host-12-hr: CPU usage for one host over 12 hours

  • single-host-1-hr: Detailed CPU for one host over 1 hour

Aggregation Queries

Cross-host aggregations:

  • cpu-max-all-1-hr: Maximum CPU across all hosts (1 hour)

  • cpu-max-all-8-hr: Maximum CPU across all hosts (8 hours)

GroupBy Queries

Time-bucketed aggregations:

  • double-groupby-1-hr: CPU grouped by host and minute

  • double-groupby-5-min: Fine-grained CPU grouping

Threshold Queries

Alert-style threshold filters:

  • high-cpu-1-hr: Hosts with CPU > 90%

  • high-cpu-12-hr: Sustained high CPU hosts

  • low-memory-hosts: Hosts with available memory < 10%

  • net-errors: Hosts with network errors
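
These queries reduce to a time-range scan plus a simple predicate. For example, high-cpu-1-hr boils down to roughly the following (an illustrative shape, not the exact SQL BenchBox generates):

high_cpu_sql = """
SELECT time, hostname, usage_user
FROM cpu
WHERE usage_user > 90.0
  AND time >= now() - INTERVAL 1 HOUR  -- interval syntax varies by engine
ORDER BY time
"""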

Memory Queries

Memory-specific analytics:

  • mem-by-host-1-hr: Memory statistics per host

Disk Queries

Disk I/O analytics:

  • disk-iops-1-hr: Read/write operations per host

  • disk-latency: Average disk latency analysis

Network Queries

Network throughput analytics:

  • net-throughput-1-hr: Bytes sent/received per host

Combined Queries

Cross-metric correlation:

  • resource-utilization: Combined CPU and memory per host

Lastpoint Queries

Most recent values (common in dashboards):

  • lastpoint: Most recent metrics per host
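
A lastpoint query must locate the newest row per host, typically by joining against the per-host maximum timestamp. A representative shape (illustrative; not necessarily the exact SQL BenchBox generates):

lastpoint_sql = """
SELECT c.*
FROM cpu AS c
JOIN (
    SELECT hostname, max(time) AS max_time
    FROM cpu
    GROUP BY hostname
) AS latest
  ON c.hostname = latest.hostname
 AND c.time = latest.max_time
"""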

Tag-filtered Queries

Filtering by host metadata:

  • by-region: Metrics filtered by cloud region

  • by-service: Metrics grouped by service

Usage Examples

Basic Benchmark Setup

from benchbox import TSBSDevOps

# Initialize TSBS DevOps benchmark (SF=1: 100 hosts, 1 day of data)
tsbs = TSBSDevOps(scale_factor=1.0, output_dir="tsbs_data")

# Generate time-series data
data_files = tsbs.generate_data()

# Get all queries
queries = tsbs.get_queries()
print(f"Generated {len(queries)} TSBS queries")

# Get specific query
cpu_query = tsbs.get_query("cpu-max-all-1-hr")
print(cpu_query)

Custom Configuration

# Configure specific hosts and duration
tsbs_custom = TSBSDevOps(
    scale_factor=0.5,
    output_dir="tsbs_custom",
    num_hosts=50,           # Override: 50 hosts
    duration_days=7,        # Override: 7 days of data
    interval_seconds=60,    # 1-minute intervals
)
data_files = tsbs_custom.generate_data()

DuckDB Integration

import duckdb
from benchbox import TSBSDevOps

# Initialize and generate data
tsbs = TSBSDevOps(scale_factor=0.1, output_dir="tsbs_small")
data_files = tsbs.generate_data()

# Create DuckDB connection and schema
conn = duckdb.connect("tsbs.duckdb")
schema_sql = tsbs.get_create_tables_sql(dialect="duckdb")

for stmt in schema_sql.split(";"):
    if stmt.strip():
        conn.execute(stmt)

# Load data
for table_name, file_path in tsbs.tables.items():
    conn.execute(f"""
        INSERT INTO {table_name}
        SELECT * FROM read_csv('{file_path}', header=true, auto_detect=true)
    """)

# Run queries
for query_id in ["cpu-max-all-1-hr", "high-cpu-1-hr", "lastpoint"]:
    query_sql = tsbs.get_query(query_id)
    result = conn.execute(query_sql).fetchall()
    print(f"{query_id}: {len(result)} rows")

conn.close()

TimescaleDB Integration

from benchbox import TSBSDevOps

tsbs = TSBSDevOps(scale_factor=1.0)

# Get TimescaleDB-optimized schema with hypertables
schema_sql = tsbs.get_create_tables_sql(
    dialect="timescale",
    time_partitioning=True,
)
print(schema_sql)
# Includes: SELECT create_hypertable('cpu', 'time', ...)

ClickHouse Integration

from benchbox import TSBSDevOps

tsbs = TSBSDevOps(scale_factor=1.0)

# Get ClickHouse-optimized schema
schema_sql = tsbs.get_create_tables_sql(
    dialect="clickhouse",
    time_partitioning=True,
)
print(schema_sql)
# Includes: ENGINE = MergeTree() ORDER BY (...) PARTITION BY toYYYYMMDD(time)

InfluxDB Integration

InfluxDB 3.x uses FlightSQL for SQL queries and Line Protocol for data ingestion. BenchBox handles this automatically via the InfluxDB adapter.

from benchbox.platforms.influxdb import InfluxDBAdapter
from benchbox import TSBSDevOps

# Initialize TSBS DevOps benchmark
tsbs = TSBSDevOps(scale_factor=0.1, output_dir="tsbs_influx")
data_files = tsbs.generate_data()

# Create InfluxDB adapter (Core/OSS mode)
adapter = InfluxDBAdapter(
    mode="core",
    host="localhost",
    port=8086,
    token="your-influxdb-token",
    database="benchmarks",
    ssl=False,
)

# Create connection
conn = adapter.create_connection()

# InfluxDB auto-creates schema from Line Protocol writes
# Load data (converts CSV to Line Protocol)
row_counts, load_time, metadata = adapter.load_data(tsbs, conn, tsbs.output_dir)
print(f"Loaded {metadata['total_rows']:,} rows in {load_time:.2f}s")

# Get InfluxDB-compatible queries (uses DataFusion SQL)
for query_id in ["cpu-max-all-1-hr", "high-cpu-1-hr", "lastpoint"]:
    query_sql = tsbs.get_query(query_id, dialect="influxdb")
    exec_time, row_count, _ = adapter.execute_query(conn, query_sql, query_id)
    print(f"{query_id}: {row_count} rows in {exec_time:.3f}s")

adapter.close_connection(conn)

InfluxDB Cloud mode:

# InfluxDB Cloud (Serverless/Dedicated/Clustered)
adapter = InfluxDBAdapter(
    mode="cloud",
    host="us-east-1-1.aws.cloud2.influxdata.com",
    token="your-cloud-token",
    org="your-org",
    database="benchmarks",
)

Key InfluxDB Considerations:

  • Line Protocol: Data is loaded via InfluxDB’s native Line Protocol format for optimal ingest performance

  • Schema auto-creation: Tables (measurements) are auto-created on first write

  • SQL via FlightSQL: Queries use standard SQL (powered by Apache DataFusion)

  • Tags vs Fields: hostname becomes a tag (indexed), metrics become fields

  • No DELETE: InfluxDB Core doesn’t support deletes; use retention policies instead
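
For context, a single cpu sample rendered as Line Protocol has roughly this shape (an assumed tag/field split consistent with the notes above; the adapter's exact output may differ):

# measurement,tag_set field_set timestamp_ns
line = (
    "cpu,hostname=host_0,region=us-east-1 "
    "usage_user=58.3,usage_system=2.1,usage_idle=35.2 "
    "1700000000000000000"
)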

Scale Factor Guidelines

| Scale Factor | Hosts | Duration | CPU Rows | Total Rows | Use Case            |
|--------------|-------|----------|----------|------------|---------------------|
| 0.01         | 10    | 1 day    | ~86K     | ~430K      | Quick testing       |
| 0.1          | 10    | 1 day    | ~86K     | ~430K      | Development         |
| 1.0          | 100   | 1 day    | ~864K    | ~5M        | Standard benchmark  |
| 10.0         | 1,000 | 10 days  | ~86M     | ~500M      | Performance testing |
| 100.0        | 1,000 | 100 days | ~864M    | ~5B        | Large-scale testing |
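
Row counts follow directly from hosts x duration x sampling rate, with disk and net carrying twice the cpu row count (one row per device or interface, two of each per host). A quick sanity check for SF=1, assuming the 10-second interval implied by the table:

hosts, days, interval_s = 100, 1, 10
samples_per_day = 86_400 // interval_s       # 8,640 samples per host per day
cpu_rows = hosts * days * samples_per_day    # 864,000 -- matches the table
# cpu and mem at 1x, disk and net at 2x each
total_rows = 2 * cpu_rows + 2 * (2 * cpu_rows)
print(cpu_rows, total_rows)                  # 864000 5184000 (~5M)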

Data Generation Patterns

The generator creates realistic data with:

  • Diurnal CPU patterns: Higher usage during business hours (9am-5pm)

  • Memory growth: Gradual memory increase with periodic GC drops

  • Disk I/O bursts: 5% chance of 10x burst per interval

  • Network errors: Rare errors (~0.1%) and drops (~0.2%)

  • Tag distributions: 75% Linux, 50% production, balanced regions
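
A minimal sketch of how such a diurnal series can be produced (illustrative only; not BenchBox's actual generator):

import math
import random

def cpu_usage(hour_of_day: float, base: float = 30.0) -> float:
    """Synthetic CPU %: daily sinusoid peaking around 14:00, plus Gaussian noise."""
    diurnal = 25.0 * math.sin((hour_of_day - 8.0) / 24.0 * 2.0 * math.pi)
    noise = random.gauss(0.0, 5.0)
    return min(100.0, max(0.0, base + diurnal + noise))

# One host, one day of samples at 10-second intervals
samples = [cpu_usage((t % 86_400) / 3600.0) for t in range(0, 86_400, 10)]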

Performance Characteristics

Query Performance Patterns

Single Host Queries:

  • Bottleneck: Time range filtering

  • Optimization: Index on (hostname, time)

  • Typical performance: Fast (milliseconds)

Aggregation Queries:

  • Bottleneck: Full scan of time range

  • Optimization: Columnar storage, vectorized execution

  • Typical performance: Medium (seconds)

GroupBy Queries:

  • Bottleneck: Hash aggregation memory

  • Optimization: Pre-aggregation, materialized views

  • Typical performance: Medium to slow

Threshold Queries:

  • Bottleneck: Filtering efficiency

  • Optimization: Bloom filters, sparse indexes

  • Typical performance: Fast with good indexes

Lastpoint Queries:

  • Bottleneck: Finding max timestamp per group

  • Optimization: Specialized last-value indexes

  • Typical performance: Fast when optimized; latency here is critical for dashboards

Best Practices

Data Generation

  1. Match your monitoring interval - Use realistic intervals (10s, 30s, 60s)

  2. Scale hosts appropriately - Test with expected fleet size

  3. Consider retention - Duration affects storage testing

Query Optimization

  1. Partition by time - Essential for time-series databases

  2. Index on hostname - For single-host query performance

  3. Pre-aggregate - Materialized views for dashboards
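
As a concrete instance of points 1 and 2, using the DuckDB database from the earlier example (the index statement is generic; time partitioning itself is declared differently per engine):

import duckdb

conn = duckdb.connect("tsbs.duckdb")

# Secondary index for single-host time-range lookups
conn.execute("CREATE INDEX IF NOT EXISTS idx_cpu_host_time ON cpu (hostname, time)")

# Time partitioning is engine-specific, e.g.:
#   TimescaleDB: SELECT create_hypertable('cpu', 'time');
#   ClickHouse:  PARTITION BY toYYYYMMDD(time) in the CREATE TABLE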

Time-Series Database Tips

  1. Use native types - TIMESTAMPTZ, DateTime64

  2. Enable compression - Time-series compresses well

  3. Consider downsampling - For long-term storage
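
For point 3, downsampling is a time-bucketed aggregate. An illustrative pass that rolls 10-second cpu samples up to 1-minute averages (the bucketing function varies by engine: date_trunc, time_bucket, toStartOfMinute):

downsample_sql = """
CREATE TABLE cpu_1min AS
SELECT
    date_trunc('minute', time) AS bucket,
    hostname,
    avg(usage_user) AS avg_usage_user,
    avg(usage_system) AS avg_usage_system
FROM cpu
GROUP BY 1, 2
"""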

External Resources