Pytest xdist Safety¶
BenchBox defaults to pytest-xdist for local runs, but some parts of the test suite also create native thread pools, parse large module trees, or spawn subprocess-heavy workloads. That combination is enough to make a developer machine unresponsive if worker counts are not capped early enough.
This page records the debugging outcome and the guardrails that must remain in place when contributors change pytest configuration, xdist behavior, or concurrency-heavy tests.
Executive Summary¶
Two root causes were identified and fixed:
- **Worker oversubscription (found first):** `-n 8` with uncapped workers created ~3.9 GB RSS and 181 threads on 114 tests. Fixed by an early-loaded plugin that caps macOS to 2 workers.
- **`setproctitle` → `launchservicesd` storm (found second, the real beachball driver):** Even with workers capped to 2, the macOS beachball persisted during large suite runs. The root cause was `setproctitle()`: xdist calls it twice per test (~200 calls/second) to update worker process titles. On macOS this triggers `launchservicesd` to continuously rebuild its process registry, consuming 200%+ CPU and ~900 MB RSS. Fixed by patching `worker_title` to a no-op on macOS via stack walking in `pytest_configure`.
Investigation Timeline¶
Phase 1: Worker Oversubscription (Original Finding)¶
A minimal reproducer with 114 tests showed that -n 8 created too many workers:
uv run -- python -m pytest -vv -n 8 \
tests/unit/test_release_sync.py \
tests/unit/test_marker_strategy.py \
tests/unit/platforms/dataframe/test_unified_frame.py
This was solved by capping workers to 2 on macOS via
_benchbox_pytest_xdist_safety.py.
Phase 2: Persistent Beachball (New Finding)¶
After the worker cap, the beachball still occurred during real test runs
(12,900+ tests). Systematic monitoring revealed launchservicesd as the #1
CPU consumer at 200%+ CPU, not pytest.
Phase 2 Controlled Experiments¶
| Experiment | launchservicesd CPU | Result |
|---|---|---|
| Baseline (no tests) | 0% | Normal |
| Pure CPU load (no pytest) | 0% | Not CPU-related |
| Serial pytest (…) | 0% | Not file I/O |
| 2 concurrent serial pytests | 0% | Not concurrent Python |
| Parallel collection only (…) | 0% | Not collection |
| Parallel execution (…) | 228% | xdist-specific |
| Parallel with output | 211% | Not terminal rendering |
| Parallel with … | 0% | ROOT CAUSE |
| Parallel with … | 0% | FIX CONFIRMED |
Root Cause: setproctitle → launchservicesd¶
The setproctitle package (installed as a transitive dependency via mutmut)
provides a C extension that modifies process titles via macOS kernel APIs.
xdist’s remote.py calls setproctitle() twice per test execution:
# xdist/remote.py lines 224-230
worker_title("[pytest-xdist running] %s" % item.nodeid) # before test
worker_title("[pytest-xdist idle]") # after test
At ~100+ tests/second per worker, this generates ~200+ kernel-level process
title updates per second. On macOS, each update notifies launchservicesd
which rebuilds its process registry. The daemon cannot keep up and enters a
sustained high-CPU loop that:
- Consumes 200%+ CPU (2 full cores)
- Grows to ~900 MB RSS
- Triggers `runningboardd` at ~50% CPU
- Starves `WindowServer` and the terminal app of responsive CPU/memory
- Causes the macOS beachball
Why the Initial Investigation Missed This¶
The original 114-test reproducer was too small to trigger the effect. At low
test counts, setproctitle calls are infrequent enough that launchservicesd
handles them without stress. The threshold appears to be around several hundred
tests in parallel before launchservicesd CPU becomes dominant.
Current Protections¶
1. Early xdist cap¶
_benchbox_pytest_xdist_safety.py
is loaded from pytest.ini using:
-p _benchbox_pytest_xdist_safety
That plugin:
- rewrites the last effective `-n` or `--numprocesses` argument before xdist startup
- caps macOS local runs to `2` workers by default
- prints a clear message when it rewrites a request
- honors `BENCHBOX_MAX_XDIST_WORKERS` for explicit local experiments
This must remain an early hook. Re-implementing the cap in
pytest_configure is not equivalent.
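The shape of such an early cap can be sketched with pytest's `pytest_load_initial_conftests` hook, which runs before pytest-xdist reads the requested worker count. This is a minimal illustration, not the actual plugin: `cap_numprocesses` is a hypothetical helper, and its simplified parsing ignores joined spellings like `-n8` or `--numprocesses=8`.

```python
# Minimal sketch of an early xdist worker cap (hypothetical implementation).
import os
import sys

MAX_WORKERS = int(os.environ.get("BENCHBOX_MAX_XDIST_WORKERS", "2"))

def cap_numprocesses(args, max_workers=MAX_WORKERS):
    """Rewrite every -n/--numprocesses value above the cap.

    pytest lets the last occurrence win, so capping each occurrence
    preserves that behavior while bounding the final value.
    """
    args = list(args)
    for i, arg in enumerate(args):
        if arg in ("-n", "--numprocesses") and i + 1 < len(args):
            value = args[i + 1]
            if value == "auto" or (value.isdigit() and int(value) > max_workers):
                print(f"[xdist-safety] capping {arg} {value} to {max_workers}")
                args[i + 1] = str(max_workers)
    return args

def pytest_load_initial_conftests(early_config, parser, args):
    """Early hook: runs before xdist inspects the command line."""
    if sys.platform == "darwin":
        args[:] = cap_numprocesses(args)
```

Because the hook mutates `args` in place before xdist's own option handling, the rewritten value is what xdist actually sees at startup.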
2. setproctitle suppression on macOS¶
tests/conftest.py patches worker_title to a
no-op on macOS in xdist worker processes. The implementation uses stack walking
because xdist’s remote.py executes in an execnet __channelexec__
namespace: import xdist.remote loads a different module object than the one
actually executing. The stack walk only patches frames whose __name__ is
"__channelexec__" and that have a callable worker_title, so unrelated
frames with a worker_title name are ignored.
This must remain in pytest_configure (not earlier) because it needs
config.workerinput to detect worker processes.
3. Native thread limits¶
tests/conftest.py sets native-library thread caps
before importing heavy modules:
- `POLARS_MAX_THREADS=2`
- `OMP_NUM_THREADS=1`
- `MKL_NUM_THREADS=1`
- `OPENBLAS_NUM_THREADS=1`
- `NUMEXPR_NUM_THREADS=1`
Those are still necessary. The xdist cap prevents runaway worker counts, while these limits prevent each worker from using the whole machine by itself.
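Setting the caps before any heavy import might look like the sketch below. Whether to use `setdefault` (shell overrides win) or unconditional assignment is a design choice; the real `tests/conftest.py` may differ.

```python
# Top of tests/conftest.py (sketch): these must be set before numpy,
# polars, etc. are imported, because native libraries read the thread
# environment variables once at load time.
import os

_THREAD_CAPS = {
    "POLARS_MAX_THREADS": "2",
    "OMP_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
    "OPENBLAS_NUM_THREADS": "1",
    "NUMEXPR_NUM_THREADS": "1",
}

for _var, _value in _THREAD_CAPS.items():
    # setdefault keeps any explicit override from the invoking shell
    os.environ.setdefault(_var, _value)
```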
4. DuckDB thread control¶
DuckDB does not reliably obey the generic thread environment variables above.
BenchBox therefore patches duckdb.connect() in tests/conftest.py so test
connections default to config={"threads": "2"} unless a test explicitly
overrides that value.
Do not replace this with `config.option.duckdb_threads` or another controller-side-only setting. That path was already shown to be ineffective.
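The shape of such a patch can be sketched with a generic wrapper. The names are hypothetical; the real code patches `duckdb.connect()` directly in `tests/conftest.py`.

```python
# Sketch: wrap a duckdb.connect-style callable so the config dict
# defaults to a small thread count unless the caller sets one.
def cap_threads_connect(original_connect, default_threads="2"):
    def connect(*args, **kwargs):
        config = dict(kwargs.pop("config", None) or {})
        config.setdefault("threads", default_threads)  # explicit values win
        return original_connect(*args, config=config, **kwargs)
    return connect

# Applied in tests/conftest.py roughly as:
#   import duckdb
#   duckdb.connect = cap_threads_connect(duckdb.connect)
```

Because `setdefault` is used, a test that passes `config={"threads": "8"}` still gets its explicit value.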
5. Regression coverage¶
tests/unit/test_xdist_safety.py
asserts that:
- macOS defaults cap to `2`
- the environment override works
- `-n auto` is capped
- explicit `-n 8`-style requests are rewritten
- the last effective `-n` wins when pytest sees multiple `-n` occurrences
- `setproctitle` is suppressed on macOS workers (direct stack-walk test)
- unrelated frames with a `worker_title` global are left untouched
If you change xdist startup behavior, update those tests in the same change.
Before/After Validation¶
The reproducer above was run across -n 0/1/2/4/8 before and after the final
fix. The critical signal is that requested -n 4 and -n 8 now collapse to
the same resource profile as safe -n 2.
| Requested | Before fix | After fix |
|---|---|---|
| `-n 0` | … | … |
| `-n 1` | … | … |
| `-n 2` | … | … |
| `-n 4` | … | … |
| `-n 8` | … | … |
Direct sanity checks after the fix:
- `pytest -n 4 ...` reports `created: 2/2 workers`
- `pytest -n 8 ...` reports `created: 2/2 workers`
Contributor Checklist¶
If you touch pytest config, xdist startup, import patterns, or concurrency-heavy tests, do all of the following:
- Keep the early plugin in `pytest.ini`, or replace it with something equally early.
- Do not move the worker cap into `pytest_configure`.
- Keep native thread limits at the top of `tests/conftest.py`.
- Keep the DuckDB thread patch unless you have measured proof of a better replacement.
- Keep the `setproctitle` suppression in `pytest_configure`. If `setproctitle` is re-enabled, verify that `launchservicesd` stays below 10% CPU during a 12,000+ test parallel run.
- Re-run the reproducer matrix at `-n 0/1/2/4/8`.
- Collect resource data, not just pass/fail. At minimum record wall time, peak RSS, thread count, process count, `launchservicesd` CPU, and whether a requested `-n 4` or `-n 8` actually created only two workers on macOS.
Useful Commands¶
Run the regression tests:
uv run -- python -m pytest -n 0 tests/unit/test_xdist_safety.py -vv
Run the reproducer serially:
uv run -- python -m pytest -vv -n 0 \
tests/unit/test_release_sync.py \
tests/unit/test_marker_strategy.py \
tests/unit/platforms/dataframe/test_unified_frame.py
Run the reproducer with a capped explicit request:
uv run -- python -m pytest -vv -n 8 \
tests/unit/test_release_sync.py \
tests/unit/test_marker_strategy.py \
tests/unit/platforms/dataframe/test_unified_frame.py
Temporarily override the cap for a deliberate experiment:
BENCHBOX_MAX_XDIST_WORKERS=3 uv run -- python -m pytest -vv -n 3 <target>
That override is for measurement only. Do not use it as evidence that the default should be raised unless you also capture the same before/after resource metrics documented above.