# Dependency Inventory

Evidence-backed inventory of every dependency declared in `pyproject.toml`.
Built for the `audit-imported-dependencies-for-elimination` TODO and intended
to be the entry point when a contributor needs to know why a dep is in the
manifest, who uses it, and whether it can be dropped.

This document complements (does not replace) `dependency-compatibility.md`
(version caps on kept deps) and `dependency-audit-raw.md` (raw `pyproject.toml`
extract). Together:
| Document | Question it answers |
|---|---|
| | Does this dep belong in the manifest at all? |
| | Is the version cap on this kept dep correct? |
| | What does |
**Process note.** This audit is inventory + flag only. No `pyproject.toml`, `uv.lock`, or `benchbox/` source change accompanies it. Each elimination recommendation lands as its own follow-up TODO, so each removal carries its own test surface and reviewer.
## Methodology
- **Source of truth for declared deps.** Parsed `pyproject.toml` with `tomllib` (`_project/scripts/dependency_audit/parse_deps.py`). `uv pip list` was deliberately avoided because it includes transitives and masks which extras group owns a package.
- **Source of truth for import sites.** Walked every `.py` file under `benchbox/`, `scripts/`, `tests/`, `docs/conf.py`, and `docs/_static/` with `ast.parse` (`_project/scripts/dependency_audit/scan_imports.py`). For `from X import a, b` we record both `X` and the synthesized `X.a`, `X.b`, so namespace packages (`google.cloud.*`, `azure.*`, `databricks.*`) are matched correctly. `_project/scripts/` was scanned separately: imports there belong to internal tooling, not the shipped wheel, so packages used only there are called out and reviewers can decide whether the dep belongs in the public manifest at all.
- **Transitive reach is read from `uv tree`.** A package that has no direct import sites but is required by another live dep is annotated `transitive-via-X` rather than flagged unused.
- **Plugin-style deps verified.** `pytest-*`, `ruff`, `ty`, `tox`, `mutmut`, `codespell`, `sphinx-*`, `furo`, `pygments`, `roman-numerals`, and `myst-parser` are CLI tools or build-time plugins. They are correctly absent from `import` lines and were verified live by checking the relevant config surface (`pytest.ini`, `tox.ini`, `docs/conf.py`).
- **Ongoing check.** `make audit-deps` runs `_project/scripts/dependency_audit/check_deps.py` in CI (`.github/workflows/lint.yml`) and fails if any declared package has zero import sites and is not in one of the two allowlists in `_project/scripts/dependency_audit/`. To add a justified exception, append an entry with a `reason:` to the appropriate allowlist file.
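The import-site walk and the zero-site check described above can be sketched with the stdlib `ast` module. This is a minimal sketch, not the contents of `scan_imports.py` or `check_deps.py`: the function names, the per-file counting, and the in-memory allowlist set are assumptions made for illustration.

```python
import ast
from collections import Counter


def collect_imports(source: str) -> set[str]:
    """Return every absolute module name an import in `source` may refer to.

    For `from X import a, b` both `X` and the synthesized `X.a`, `X.b` are
    recorded, so `from google.cloud import storage` still matches the
    namespace module `google.cloud.storage`.
    """
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are first-party, so they are skipped.
            names.add(node.module)
            names.update(f"{node.module}.{alias.name}" for alias in node.names)
    return names


def matches(import_name: str, found: set[str]) -> bool:
    """True if `import_name` itself or any of its submodules was imported."""
    return any(n == import_name or n.startswith(import_name + ".") for n in found)


def count_sites(sources: dict[str, str], import_names: dict[str, list[str]]) -> Counter:
    """Count, per declared package, how many files import one of its modules."""
    counts: Counter = Counter()
    for text in sources.values():
        found = collect_imports(text)
        for package, modules in import_names.items():
            if any(matches(m, found) for m in modules):
                counts[package] += 1
    return counts


def unjustified_zero_site(import_names, counts, allowlist) -> list[str]:
    """Declared packages with zero import sites that no allowlist entry covers."""
    return sorted(p for p in import_names if counts[p] == 0 and p not in allowlist)


# Toy input, not real repository data.
sources = {"example.py": "import click\nfrom google.cloud import storage\n"}
import_names = {
    "click": ["click"],
    "google-cloud-storage": ["google.cloud.storage"],
    "jsonschema": ["jsonschema"],
}
counts = count_sites(sources, import_names)
print(unjustified_zero_site(import_names, counts, allowlist={"ruff"}))  # ['jsonschema']
```

Counting files rather than individual import statements is a choice made for this sketch; the audit's "Import sites" column may be counted differently.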
## Package → import-name map
Most package names match their top-level import name. The cases below do not - keep this map in sync when adding or auditing deps.
| Package | Top-level import(s) |
|---|---|
## Inventory
Categories: C = Core, CL = CLI/runtime support, SQL = SQL platform adapter, CS = Cloud storage, DF = DataFrame engine, BM = Benchmark data/format, DEV = Dev/test/lint/CI, DOC = Docs build, MCP = MCP server, TF = Table format, CSP = Cloud Spark adapter.

Status legend: KEEP = live import sites or verified plugin/CLI use; retained. FLAG-UNUSED = no import sites in tracked source, no transitive role, no plugin/CLI use. FLAG-UNUSED-RUNTIME = imported only by tests, not by runtime code. FLAG-REDUNDANT = duplicates a dep already pulled in by another extras group with overlapping consumers. FLAG-DEAD-EXTRA = entire extras group or declaration appears unused.
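The mechanical part of the legend, together with the `transitive-via-X` annotation from the methodology, can be read as a small decision function. This is a simplified, hypothetical sketch: `Evidence` and `classify` are illustrative names, and the judgment-based statuses (FLAG-REDUNDANT, FLAG-DEAD-EXTRA, FLAG-UNUSED-RUNTIME) are left to the reviewer rather than encoded here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    import_sites: int              # direct import sites in tracked source
    plugin_or_cli: bool            # verified as a CLI tool or entry-point plugin
    transitive_via: Optional[str]  # live dep that already requires this one, if any


def classify(ev: Evidence) -> str:
    """Simplified reading of the status legend (hypothetical helper)."""
    if ev.import_sites > 0 or ev.plugin_or_cli:
        return "KEEP"
    if ev.transitive_via:
        # Not flagged unused; annotated so a reviewer can weigh KEEP vs FLAG-REDUNDANT.
        return f"transitive-via-{ev.transitive_via}"
    return "FLAG-UNUSED"


print(classify(Evidence(import_sites=0, plugin_or_cli=False, transitive_via="chdb")))
```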
| Package | Cat | Owner module(s) | Import sites | Status |
|---|---|---|---|---|
| | SQL | (transitive of | 0 | FLAG-REDUNDANT - installed transitively by |
| | CL | | 92 | KEEP |
| | DEV | | 0 (runtime); 1 (tooling) | FLAG-REDUNDANT - declared as a core dep but only used by internal TODO tooling under |
| | C | | 42 | KEEP |
| | CL | | 8 | KEEP |
| | CL | | 39 | KEEP |
| | C | | 120 | KEEP |
| | CL | | 6 | KEEP |
| | C | | 23 | KEEP |
| | CL | | 146 | KEEP |
| | C | | 16 | KEEP |
| | C | | 20 | KEEP |
| | C | | 9 | KEEP - guarded by |
| | C | | 9 | KEEP |
| | CSP | | 9 | KEEP |
| | CSP | | 3 | KEEP |
| | CS | | 32 | KEEP |
| | SQL | | 5 | KEEP |
| | SQL | | 1 | KEEP - HTTP protocol used by ClickHouse Cloud adapter |
| | SQL | | 4 | KEEP - TCP protocol used by ClickHouse server adapter |
| | CS | | 16 | KEEP |
| | DF | | 8 | KEEP |
| | SQL | | 6 | KEEP |
| | CSP | | 2 | KEEP |
| | CSP | | 14 | KEEP |
| | SQL | | 3 | KEEP |
| | DF | | 105 | KEEP |
| | TF | | 9 | KEEP |
| | TF | | 38 | KEEP |
| | SQL | | 80 | KEEP |
| | SQL | | 4 | KEEP |
| | SQL | | 8 | KEEP |
| | CSP | | 2 | KEEP |
| | CS | | 7 | KEEP |
| | SQL | | 1 | KEEP |
| | MCP | | 18 | KEEP |
| | DF | | 2 | KEEP |
| | DF/BM | | 182 | KEEP |
| | DF | | 60 | KEEP |
| | SQL | | 2 | KEEP |
| | SQL | | 18 | KEEP |
| | SQL | | 3 | KEEP |
| | TF | | 16 | KEEP |
| | SQL | | 7 | KEEP |
| | SQL | | 3 | KEEP |
| | DF/CSP | | 102 | KEEP |
| | SQL | | 1 | KEEP |
| | CS | | 22 | KEEP - also reaches transitively via boto3 / google-cloud-* / snowflake; explicit declaration in |
| | SQL | | 2 | KEEP |
| | SQL | | 4 | KEEP |
| | CSP | | 2 | KEEP |
| | SQL | | 3 | KEEP |
| | TF | | 2 | KEEP |
| | DEV | | 965 | KEEP |
| | DEV | (CLI plugin via | 0 | KEEP - pytest plugin loaded by entry point; verified used in |
| | DEV | (CLI plugin via | 0 | KEEP - |
| | DEV | (CLI plugin via | 0 | KEEP - pytest plugin (timeout config in |
| | DEV | (CLI plugin via | 0 | KEEP - |
| | DEV | (CLI tool) | 0 | KEEP - |
| | DEV | (CLI tool) | 0 | KEEP - |
| | DEV | (CLI tool) | 0 | KEEP - |
| | DEV | (CLI tool) | 0 | KEEP - |
| | DEV | (CLI tool) | 0 | KEEP - invoked via |
| | DEV | | 1 | FLAG-UNUSED-RUNTIME - only test imports; verify whether |
| | DEV | (also in dev for tests) | covered above | KEEP |
| | DEV | tests use | covered above | KEEP - extras pin S3/GCS/Azure providers for live tests |
| | DEV | tests use | covered above | KEEP - extras pin SQL-SQLite catalog for tests |
| | DEV | | 0 (runtime); 1 (tooling) | KEEP - used by TODO tooling. (Distinct from |
| | DOC | | 2 | KEEP - drives docs build via |
| | DOC | (config in | 0 | KEEP - registered in |
| | DOC | (config in | 0 | KEEP |
| | DOC | (config in | 1 | KEEP |
| | DOC | (config in | 0 | KEEP - markdown parser for docs |
| | DOC | ( | 0 | KEEP - active Sphinx theme |
| | DOC | | 4 | KEEP - custom code-block style |
| | DOC | (transitive of Sphinx 9) | 0 | KEEP - Sphinx 9 needs |
| | DOC | | 1 | KEEP - Sphinx blog extension; configured in |
| | DOC | - | 0 | FLAG-DEAD-EXTRA - declared in |
| | DOC | - | 0 | FLAG-DEAD-EXTRA - declared in |
(Total declared package names: 76. KEEP: 70. FLAG-*: 6.)
## Elimination candidates (FLAG findings)
### F1 - `chdb-core` should be moved out of core dependencies

- **Where:** `pyproject.toml` line 49 (`[project] dependencies`).
- **Evidence:** `chdb-core` has zero direct `import chdb_core` sites in `benchbox/`, `tests/`, `scripts/`, or `docs/`. `uv tree` confirms `chdb-core` is depended on by `chdb` v4.1.6, which itself is declared only in optional groups (`extras:all`, `extras:clickhouse-local`, `dep-group:dev`). As declared today, every minimal install pulls `chdb-core` (tens of MB, bundled C++ binary) even when the user did not request the chdb extra.
- **Recommended action:** Remove `chdb-core>=26.1.0` from `[project] dependencies`. It will continue to be installed transitively whenever `chdb` is selected.
- **Risk:** If a code path imports `chdb_core` directly (none found), it would break. Verification command: `grep -r "chdb_core" benchbox tests scripts`.
### F2 - `jsonschema` should not be a core runtime dependency

- **Where:** `pyproject.toml` line 42.
- **Evidence:** Zero import sites in `benchbox/`, `tests/`, `scripts/`, or `docs/`. The single import site is `_project/scripts/validate_todo.py:18`, internal TODO management tooling that is not part of the shipped wheel (`_project/` is excluded from packaging via the `tool.setuptools.packages.find` convention and is not in any package data).
- **Recommended action:** Either move `jsonschema` to a tooling-only `[dependency-groups]` entry (e.g. add it to `dev`), or extract `_project/scripts` into its own pinned environment under `_project/scripts/pyproject.toml`.
- **Risk:** Low. `jsonschema` is an unconditional install today; removing it from core would shrink the default install set by ~2 MB plus its (relatively small) transitive surface (`attrs`, `referencing`, `rpds-py`, `jsonschema-specifications`).
### F3 - `lxml` is declared in `dep-group:dev` but only one test imports it

- **Where:** `pyproject.toml` line 482.
- **Evidence:** `lxml` has one import site: `tests/unit/core/tpcdi/test_etl_sources.py:22`. `grep -r "^import lxml\|^from lxml\|lxml\." benchbox/` returns no matches; TPC-DI ETL uses `xml.etree` (stdlib), not `lxml`.
- **Recommended action:** Either remove `lxml>=5.0.0` from `dep-group:dev` and rewrite the single test to use stdlib `xml.etree`, or keep the dep and document the test rationale inline. Removal is the lower-surface option.
- **Risk:** Low. The single test (`test_etl_sources.py`) would need to be rewritten to drop the lxml-specific assertion path.
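If the test only parses a document and queries elements, stdlib `xml.etree.ElementTree` covers that. A minimal sketch with a made-up XML fixture and a hypothetical `customer_names` helper; the real test's payload and assertions are not reproduced in this audit. lxml-only features such as full XPath 1.0 or schema validation have no direct stdlib equivalent, so any assertion relying on them would need a different check.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixture; not the payload used by test_etl_sources.py.
SAMPLE = """<Customers>
  <Customer id="1"><Name>Ada</Name></Customer>
  <Customer id="2"><Name>Grace</Name></Customer>
</Customers>"""


def customer_names(xml_text: str) -> dict[str, str]:
    """Map each Customer's id attribute to its Name text using only the stdlib."""
    root = ET.fromstring(xml_text)
    # findall/findtext accept the limited ElementPath syntax, not full XPath.
    return {c.get("id"): c.findtext("Name") for c in root.findall("Customer")}


print(customer_names(SAMPLE))  # {'1': 'Ada', '2': 'Grace'}
```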
### F4 - `requests` is reachable transitively but explicitly declared in three extras

- **Where:** `pyproject.toml` lines 172 (`extras:questdb`), 291 (`extras:cloud-spark-azure`), 322 (`extras:cloud-spark`).
- **Evidence:** 22 import sites use `requests` directly. `uv tree` shows `requests` is reachable transitively from `boto3`, `google-cloud-*`, `snowflake-connector-python`, and `azure-identity`, so installs that pull any of these get `requests` for free. The explicit declarations are nonetheless valuable: they protect Azure / QuestDB / generic Spark installs that do not pull a transitive provider.
- **Recommended action:** Keep as declared. Document the rationale here so future audits don’t churn on it. (No follow-up TODO needed.)
- **Risk:** Removing the explicit declarations would reintroduce silent fragility: minimal QuestDB or Azure-only installs would lose `requests`.
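The F4 argument is a reachability question over the dependency graph that `uv tree` prints. A minimal breadth-first sketch over a hand-written graph; `reaches` is a hypothetical helper, and the package names and edges below are placeholders, not parsed from `uv tree`.

```python
from collections import deque


def reaches(graph: dict[str, list[str]], roots: list[str], target: str) -> bool:
    """Breadth-first search: is `target` installed if every package in `roots` is?"""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        package = queue.popleft()
        if package == target:
            return True
        for dep in graph.get(package, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return False


# Hypothetical graph: "provider-sdk" stands in for a transitive provider,
# "db-client" for a minimal extra that has no path to requests.
GRAPH = {
    "provider-sdk": ["auth-lib", "urllib3"],
    "auth-lib": ["requests"],
    "requests": ["urllib3", "idna"],
    "db-client": ["urllib3"],
}

print(reaches(GRAPH, ["provider-sdk"], "requests"))  # True
print(reaches(GRAPH, ["db-client"], "requests"))     # False
```

With the explicit declaration, `requests` is itself one of the roots, so `reaches(GRAPH, ["db-client", "requests"], "requests")` is true regardless of which providers are present; that is the protection F4 keeps.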
### F5 - `sphinx-rtd-theme` is dead

- **Where:** `pyproject.toml` line 487 (`[dependency-groups] dev`).
- **Evidence:** `docs/conf.py:130` sets `html_theme = "furo"`. `sphinx_rtd_theme` does not appear in any Python or RST file across the repo. The only references are in `pyproject.toml`, this audit, and `dependency-compatibility.md`.
- **Recommended action:** Remove `sphinx-rtd-theme>=3.1.0` from `[dependency-groups] dev`.
- **Risk:** None. The theme switch to furo happened earlier; this dep was not cleaned up.
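Config-surface checks like the one behind F5 (and the plugin verification in the methodology) can be done without executing `docs/conf.py`, by reading its top-level literal assignments with `ast`. A minimal sketch; `top_level_literal` and `referenced_in_conf` are hypothetical helpers, and `CONF` is a toy stand-in for the real file. It only inspects `html_theme` and `extensions`, so references made any other way would be missed.

```python
import ast


def top_level_literal(source: str, name: str):
    """Return the value of the last top-level `name = <literal>` assignment, else None."""
    value = None
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == name for t in node.targets
        ):
            try:
                value = ast.literal_eval(node.value)
            except (ValueError, TypeError):
                value = None  # not a plain literal, e.g. a function call
    return value


def referenced_in_conf(source: str, module: str) -> bool:
    """True if `module` is the active html_theme or listed in extensions."""
    extensions = top_level_literal(source, "extensions") or []
    return top_level_literal(source, "html_theme") == module or module in extensions


CONF = 'extensions = ["myst_parser"]\nhtml_theme = "furo"\n'
print(referenced_in_conf(CONF, "sphinx_rtd_theme"))  # False
```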
## Used-but-undeclared findings

Walking `benchbox/`, `scripts/`, `tests/`, `docs/conf.py`, and `docs/_static/`
turned up the following top-level imports for which no `pyproject.toml`
declaration exists. Each is classified below.
| Top-level module | Classification | Notes |
|---|---|---|
| | declared (dev) | |
| | declared (dev) | |
| | transitive-reach | Pulled in by |
| | transitive-reach | Pulled in by |
| | transitive-reach | Pulled in by |
| | guarded GPU | NVIDIA RAPIDS / CUDA stack. All are guarded behind |
| | guarded optional | Optional Dask SQL backend; guarded import. |
| | guarded optional | Apache Arrow Flight SQL client; guarded import in InfluxDB / Doris paths. |
| | first-party alias | Not a third-party package - |
| | guarded optional | Alternate InfluxDB client; guarded fallback in |
| | guarded optional | LakeSail Spark distribution; guarded import in |
| | extras-included | Pulled in via |
| | declared (extras) | NLP / ML stacks for |
| | transitive-reach | Pulled in by |
| | first-party | Lives at |
| | first-party | Local scripts / packages - not third-party. |
Status after `declare-undeclared-runtime-imports`:

- `pillow` and `ansi2html` declared in `[dependency-groups] dev` (used by `scripts/capture_chart_images.py` only).
- `sentence_transformers`, `spacy`, `textblob`, `torch` declared in `extras:ai-primitives`.
- `PIL` in `benchbox/` or `tests/`: zero import sites found; no declaration needed in runtime extras.
## Consolidation proposals (no action required by this audit)

- **`dataframe-*` aliases.** The plain-name extras (`pandas`, `polars`, `modin`, `dask`, `pyspark`, `cudf`) duplicate the `dataframe-*` extras one-to-one. Cleanup is a breaking rename and is explicitly deferred per this TODO’s `deferred[]`. Surfaced here for traceability.
- **`databricks-connect` extras alias.** Marked deprecated in `pyproject.toml:304` in favor of `cloud-spark-databricks`. Removal is a breaking change and is deferred.
- **`extras:all` vs per-extra duplication.** Every package in `extras:all` is also present in at least one focused extra. This is intentional and documented in `dependency-compatibility.md`.
## Summary

| Bucket | Count |
|---|---|
| Total declared packages | 76 |
| KEEP | 70 |
| FLAG-* (elimination candidates) | 6 |
| Used-but-undeclared (audit candidates) | 0 (all resolved: |
Each FLAG-* finding has a corresponding follow-up TODO under
`_project/TODO/main/planning/`. This audit performs no removals; the
follow-ups carry their own verification, `must_preserve`, and reviewer.