Disk Checker Output Reconciliation Design
Problem
The disk_common, disk_macos, and disk_linux checkers each report a single total_recoverable_mb figure computed across multiple lists (space_hogs, old_files, large_files). The check_health CLI formatter prints up to 10 items from each list followed by ... and N more, then prints the grand total. The displayed items don’t reconcile against the total: a 746.7 MB total can show only ~190 MB worth of items, with the rest hidden in the truncated tail and in lists that the user may not even realise are part of the sum.
A real example from a Linux host:
[OK] disk_common: Disk analysis: 746.7 MB recoverable
Space Hogs:
- /var/log/journal 112.0 MB
- /var/log/btmp.1 16.8 MB
... (10 items shown, summing to 187.0 MB)
... and 8 more
Total recoverable: 746.7 MB
The 10 shown items add up to 187 MB. The list is sorted descending, so each of the 8 hidden space_hogs is at most 4.1 MB. Most of the 746.7 MB total must therefore sit in the old_files and/or large_files sections, which may themselves be truncated. At minimum, the output lacks per-section subtotals that would let the user decompose the grand total at a glance.
The output is not wrong — every value in metrics is correct — it’s the formatter that hides the breakdown.
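The size of the gap in the example above can be bounded with a few lines of arithmetic. This is only a worked calculation on the numbers quoted in the example; the variable names are illustrative.

```python
total_mb = 746.7       # grand total printed by the formatter
shown_mb = 187.0       # sum of the 10 displayed space_hogs
hidden_count = 8       # items behind "... and 8 more"
hidden_cap_mb = 4.1    # each hidden item is at most this large (sorted descending)

# Upper bound on what the hidden space_hogs can contribute.
max_hidden_mb = hidden_count * hidden_cap_mb

# Lower bound on what must come from old_files and/or large_files.
min_elsewhere_mb = total_mb - shown_mb - max_hidden_mb

print(f"{max_hidden_mb:.1f}")     # 32.8
print(f"{min_elsewhere_mb:.1f}")  # 526.9
```

So at least roughly 527 MB of the total comes from sections other than space_hogs, and none of it is attributed in the current output.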
Scope
In scope:
- Change _output_metrics() in apps/checkers/management/commands/check_health.py so that:
  - Every non-empty section (space_hogs, old_files, large_files) shows a subtotal in its header.
  - When 2+ sections are non-empty, the section with the largest subtotal is shown in full; the others are still truncated to the first 10 items.
  - Truncated sections show a trailer that includes the byte weight of the omitted items: ... and 8 more  (13.5 MB).
Out of scope:
- Changes to checker logic or the metrics dict shape. The data is already correct.
- Changes to the JSON output path, run_check, preflight, or the dashboard formatter.
- Status-threshold tuning (warning_threshold = 5000.0, critical_threshold = 20000.0 in MB stay as-is).
- Aligning disk_linux and disk_macos with disk_common by adding large_files scans. Worth doing, since both currently emit only space_hogs and old_files while disk_common also walks ~ for large files, but it’s a behavioural change with new walk targets and perf implications, not a formatter fix. Deferred to a follow-up.
- DB-size scanning (mysql, mariadb, mongo, meili) as a potential future signal source. Worth considering as a future checker (or extension to disk checkers), out of scope here.
Approach
Algorithm
In _output_metrics(), replace the current for key in ("space_hogs", "old_files", "large_files"): block with logic that:
- Builds an ordered list of (key, items, subtotal_mb) tuples for each non-empty list, in the existing iteration order (space_hogs, old_files, large_files).
- If there are 2+ entries, determines the “largest” one as the entry with the highest subtotal. max() over equal subtotals returns the first encountered, which is the natural order, so no explicit tie-breaker is needed.
- For each entry, prints:
  - A header line including the subtotal, item count, and shown count:
    - Space Hogs: 200.5 MB (18 items, top 10 shown) when truncated.
    - Space Hogs: 412.0 MB (24 items, all shown) when full.
  - The item lines, formatted exactly as today (- {path}  {size_mb:.1f} MB[ ({age_days}d old)]).
  - If truncated: a trailer ... and N more  (X.X MB) where X.X is the sum of the omitted items’ sizes.
The grand-total line Total recoverable: 746.7 MB is unchanged.
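The steps above can be sketched as a standalone function. This is a minimal sketch, not the actual _output_metrics() body, and it is pulled out into a function purely so it can run on its own; the design keeps the logic inline. The name render_sections, the LIMIT constant, the titles "Old Files" and "Large Files", and the item shape (a dict with path, size_mb, and optional age_days) are assumptions. It returns lines instead of writing to the command's output, and it assumes each list is already sorted by size descending.

```python
from typing import Any

LIMIT = 10  # existing per-section truncation limit
SECTION_KEYS = ("space_hogs", "old_files", "large_files")  # existing order
# Hypothetical display titles; only "Space Hogs" appears in the design.
SECTION_TITLES = {
    "space_hogs": "Space Hogs",
    "old_files": "Old Files",
    "large_files": "Large Files",
}


def render_sections(metrics: dict[str, Any]) -> list[str]:
    """Return the section lines for one disk checker's metrics dict."""
    # Step 1: (key, items, subtotal_mb) per non-empty list, in order.
    entries = []
    for key in SECTION_KEYS:
        items = metrics.get(key) or []
        if items:
            entries.append((key, items, sum(i["size_mb"] for i in items)))

    # Step 2: with 2+ entries, the largest subtotal is shown in full.
    # max() returns the first maximal entry, so ties keep iteration order.
    largest = max(entries, key=lambda e: e[2])[0] if len(entries) >= 2 else None

    lines = []
    for key, items, subtotal in entries:
        # Step 3: header, item lines, optional trailer.
        shown = items if key == largest else items[:LIMIT]
        omitted = items[len(shown):]
        note = f"top {len(shown)} shown" if omitted else "all shown"
        title = SECTION_TITLES[key]
        lines.append(f"{title}: {subtotal:.1f} MB ({len(items)} items, {note})")
        for item in shown:
            age_days = item.get("age_days")
            age = f" ({age_days}d old)" if age_days is not None else ""
            lines.append(f"- {item['path']}  {item['size_mb']:.1f} MB{age}")
        if omitted:
            omitted_mb = sum(i["size_mb"] for i in omitted)
            lines.append(f"... and {len(omitted)} more  ({omitted_mb:.1f} MB)")
    return lines
```

With 12 space_hogs of 1.0 MB each and 11 old_files of 5.0 MB each, old_files has the larger subtotal and prints in full, while space_hogs prints its first 10 items followed by a trailer carrying the remaining 2.0 MB.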
Display rule
- 2+ non-empty sections → largest gets full output (every item, no trailer); others get top 10 + trailer.
- Exactly 1 non-empty section → that section gets top 10 + trailer, just like the non-largest sections in the multi-section case. We don’t promote it to “full output” because the user already has the signal that this single section is the cause; expanding to potentially hundreds of items only bloats the screen.
- 0 non-empty sections → nothing prints (same as today).
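The three cases reduce to one small decision. The sketch below is illustrative only: full_section_key is a hypothetical name, and the input is assumed to be a list of (key, subtotal_mb) pairs for the non-empty sections, in the existing iteration order.

```python
from typing import Optional


def full_section_key(subtotals: list[tuple[str, float]]) -> Optional[str]:
    """Return the key of the section to print in full, or None."""
    if len(subtotals) < 2:
        # 0 sections: nothing prints. 1 section: top 10 + trailer, not promoted.
        return None
    # max() keeps the first of several equal subtotals, so ties resolve
    # in iteration order without an explicit tie-breaker.
    return max(subtotals, key=lambda pair: pair[1])[0]


print(full_section_key([]))                                             # None
print(full_section_key([("space_hogs", 200.5)]))                        # None
print(full_section_key([("space_hogs", 200.5), ("old_files", 412.0)]))  # old_files
print(full_section_key([("space_hogs", 50.0), ("old_files", 50.0)]))    # space_hogs
```

The last call shows the tie case: with equal subtotals, the earlier section in iteration order is the one expanded.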
Why this shape
- Subtotals always shown. The user can reconcile any subset of sections against the grand total without scrolling.
- Trailer carries weight. ... and 8 more  (13.5 MB) makes truncation honest: you can see the omitted weight without re-running with a flag.
- Largest-in-full is bounded. Only one section can be the largest, so the worst case is “one section dumps everything” rather than “every section dumps everything”.
- No new flags. No --verbose or --full; the output reads correctly by default. Compact runs over many checkers stay reasonable because most non-disk checkers contribute a single line, and each disk checker contributes at most one fully-expanded section.
Edge cases
- All sections empty. No section header lines print. Total may still print if total_recoverable_mb is set. Same as today.
- Section has ≤ 10 items. All items already fit; header reads (N items, all shown), no trailer. Applies regardless of largest-section status.
- disk_macos/disk_linux only emit space_hogs and old_files (no large_files). The iteration over the three keys naturally skips empty/missing keys. The largest-of-two rule still works.
- Float rounding. Subtotals are computed with regular float sums; rounding drift between subtotals and the grand total is possible at the 0.1 MB level. Acceptable: this matches existing rounding behaviour, and the underlying size_mb values are already rounded at scan time.
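The float-rounding case can be illustrated with made-up numbers. The sizes below are invented, and whether total_recoverable_mb is actually computed from unrounded values is an assumption; the point is only that summing values already rounded to 0.1 MB can differ from rounding an exact sum once.

```python
# Made-up raw sizes in MB, before any rounding at scan time.
raw_sizes = [1.04, 1.04, 1.04]

# Each item is rounded to 0.1 MB when scanned; the formatter sums those.
rounded_items = [round(size, 1) for size in raw_sizes]
subtotal_from_items = sum(rounded_items)

# Assumed alternative: a total built from raw sizes and rounded once.
total_from_raw = round(sum(raw_sizes), 1)

print(f"{subtotal_from_items:.1f}")  # 3.0
print(f"{total_from_raw:.1f}")       # 3.1
```

A 0.1 MB mismatch of this kind between a section subtotal and the grand total is the drift the design accepts.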
Testing
apps/checkers/_tests/test_commands.py already has test_metrics_space_hogs covering the truncation and the “... and 2 more” string. The new test cases needed:
- Single non-empty section with > 10 items → header shows (N items, top 10 shown), items truncated to 10, trailer carries the omitted weight.
- Two non-empty sections, the second being the largest → first gets truncated header + trailer, second gets (N items, all shown) and full item list.
- Three non-empty sections (the disk_common shape), large_files largest → only large_files shown in full.
- Truncated trailer math: 18 items where the bottom 8 sum to a known value → trailer reads that value.
- Section with exactly 10 items → header (10 items, all shown), no trailer.
- Empty metrics dict → no header lines, no trailer.
The existing test_metrics_space_hogs will need updating because the “... and 2 more” string is now “... and 2 more  (X.X MB)”.
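The trailer-math and exactly-10-items cases could look roughly like the following. This is a sketch against a hypothetical pure helper, trailer_line, so that it runs on its own; the real tests in test_commands.py would presumably drive the check_health command and assert on its captured output instead. The fixture paths and sizes are invented.

```python
from typing import Optional

LIMIT = 10  # per-section truncation limit


def trailer_line(items: list[dict], limit: int = LIMIT) -> Optional[str]:
    """Hypothetical helper: trailer for the items beyond `limit`, or None."""
    omitted = items[limit:]
    if not omitted:
        return None
    omitted_mb = sum(item["size_mb"] for item in omitted)
    # Two spaces before the parenthesis, per the implementation notes.
    return f"... and {len(omitted)} more  ({omitted_mb:.1f} MB)"


def test_trailer_carries_omitted_weight():
    # 18 items sorted descending: ten of 20.0 MB, then eight of 1.5 MB.
    items = [{"path": f"/a/{i}", "size_mb": 20.0} for i in range(10)]
    items += [{"path": f"/b/{i}", "size_mb": 1.5} for i in range(8)]
    assert trailer_line(items) == "... and 8 more  (12.0 MB)"


def test_exactly_ten_items_has_no_trailer():
    items = [{"path": f"/a/{i}", "size_mb": 2.0} for i in range(10)]
    assert trailer_line(items) is None
```

Choosing omitted sizes that sum to an exactly representable value (8 × 1.5 = 12.0) keeps the expected string free of float-formatting surprises.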
Notes for implementation
- Keep the change inside _output_metrics. Don’t extract a helper unless the diff genuinely needs it: the method is already structured around per-section blocks, and one extra preprocessing step (compute subtotals, pick the largest) is enough.
- Don’t change the order of keys iterated. Today’s order is the contract for _build_recommendations and for any operator who reads the output regularly.
- The trailer format ... and N more  (X.X MB) uses two spaces before the parenthesis to match the existing two-space gap between path and size in item lines (- /var/log/journal  112.0 MB). This is intentional visual alignment.