Unified Health Check — Design
Date: 2026-03-29
Problem
There are two separate “installation check” surfaces that overlap but cover different things:
- `check_installation()` in `bin/cli/install_menu.sh`: checks uv, `.venv`, pre-commit, aliases, Django
- `bin/check_system.sh`: checks uv, Python, `.env`, `.venv`, disk space, writability, Django preflight
Neither covers Redis, Celery broker, database migrations, Docker container health, or systemd service status. They share no code.
Goal
Unify both into a single `bin/lib/health_check.sh` library that auto-detects the deployment mode and runs the relevant checks. Both `bin/check_system.sh` and the CLI's `check_installation()` call the same library.
Auto-detection
The library detects deployment mode by examining what’s present:
Priority (first match wins):
1. Docker: `docker compose` containers are running for this project
2. systemd: a `server-monitoring.service` unit exists in systemd
3. prod: `.venv` exists and `.env` sets `DJANGO_ENV=prod`
4. dev: fallback
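The priority list above can be sketched as a `detect_mode()` function. This is a minimal sketch, assuming the project root is passed as the first argument. The probe helpers `_hc_docker_running` and `_hc_systemd_unit_exists` are hypothetical names, kept separate so bats tests can redefine them. The Docker probe relies on `docker compose ps --status running -q` printing the IDs of running containers, and the systemd probe relies on `systemctl cat` failing for an unknown unit.

```shell
# Sketch of detect_mode() for bin/lib/health_check.sh (assumed shape, not final).

# Succeeds if docker compose reports at least one running container for the
# compose project in the given directory.
_hc_docker_running() {
    local root="$1" ids
    command -v docker >/dev/null 2>&1 || return 1
    ids=$(cd "$root" && docker compose ps --status running -q 2>/dev/null) || return 1
    [[ -n "$ids" ]]
}

# Succeeds if systemd has a unit file for server-monitoring.service.
# `systemctl cat` exits non-zero when the unit is unknown.
_hc_systemd_unit_exists() {
    command -v systemctl >/dev/null 2>&1 || return 1
    systemctl cat server-monitoring.service >/dev/null 2>&1
}

# Prints docker, systemd, prod, or dev; the first match wins.
detect_mode() {
    local root="${1:-.}"
    if _hc_docker_running "$root"; then
        echo docker
    elif _hc_systemd_unit_exists; then
        echo systemd
    elif [[ -d "$root/.venv" && -f "$root/.env" ]] &&
        grep -Eq '^DJANGO_ENV="?prod"?$' "$root/.env"; then
        echo prod
    else
        echo dev
    fi
}
```

Anchoring the `.env` match at the start of the line means a commented-out `# DJANGO_ENV=prod` does not select prod mode.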
Check Groups
| Group | dev | prod | docker | systemd |
|---|---|---|---|---|
| Core (Python, uv, `.env`, `.venv`, disk, writable) | Yes | Yes | Skip | Skip |
| Django (check, migrations) | Yes | Yes | Skip | Skip |
| Dev (pre-commit, aliases) | Yes | No | No | No |
| Docker (daemon, compose, containers) | No | No | Yes | No |
| systemd (units, Redis, socket) | No | No | No | Yes |
Check Inventory
Core checks (dev + prod):
- Python 3.10+ installed (with pyenv shim detection)
- uv installed
- `.env` file exists
- `.venv` directory exists
- Project directory writable
- Disk space > 1GB free
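A sketch of two core checks, assuming a convention the design does not specify: each check prints one message line and returns 0 for ok, 1 for warn, and 2 for err. The function names are hypothetical, the optional interpreter argument exists only to make the check testable, and the sketch reads "1GB" as 1 GiB (1048576 KiB). Treating low disk space as an error rather than a warning is also an assumption.

```shell
# Check: Python 3.10+ is available. Optional $1 overrides the interpreter.
# Returns 0 (ok) or 2 (err) and prints a one-line message.
check_python() {
    local py="${1:-python3}" path
    path=$(command -v "$py" 2>/dev/null) || { echo "$py not found on PATH"; return 2; }
    if "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)' 2>/dev/null; then
        echo "Python 3.10+ at $path"
        return 0
    fi
    # A pyenv shim can exist even when the selected version does not provide
    # the command, so a failing shim usually points at pyenv configuration.
    if [[ "$path" == */.pyenv/shims/* ]]; then
        echo "pyenv shim at $path did not run Python 3.10+ (check 'pyenv version')"
    else
        echo "$path did not run Python 3.10+"
    fi
    return 2
}

# Check: more than min_kb KiB free on the filesystem holding dir (default 1 GiB).
check_disk_space() {
    local dir="${1:-.}" min_kb="${2:-1048576}" avail_kb
    # df -P prints a header plus one line per filesystem; column 4 is available KiB.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }') || avail_kb=""
    if [[ ! "$avail_kb" =~ ^[0-9]+$ ]]; then
        echo "could not read free space for $dir"
        return 1
    fi
    if (( avail_kb <= min_kb )); then
        echo "only $((avail_kb / 1024)) MiB free in $dir"
        return 2
    fi
    echo "$((avail_kb / 1024)) MiB free in $dir"
    return 0
}
```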
Django checks (dev + prod, requires .venv):
- `manage.py check` passes
- No pending migrations (`manage.py migrate --check`)
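The Django checks could wrap `manage.py` as follows, under the same assumed 0/1/2 return convention. The layout (`manage.py` at the project root, interpreter at `.venv/bin/python`) and the function names are assumptions; `migrate --check` requires Django 3.1 or later.

```shell
# Django checks (hypothetical names). Assumed layout: <root>/manage.py and a
# virtualenv interpreter at <root>/.venv/bin/python.

# Runs `python manage.py <args>` from the project root, discarding output.
_hc_manage() {
    local root="$1"
    shift
    (cd "$root" && ./.venv/bin/python manage.py "$@") >/dev/null 2>&1
}

check_django_system() {
    local root="${1:-.}"
    [[ -x "$root/.venv/bin/python" ]] || { echo "no .venv, skipped"; return 1; }
    if _hc_manage "$root" check; then
        echo "manage.py check passed"; return 0
    fi
    echo "manage.py check failed"; return 2
}

check_migrations() {
    local root="${1:-.}"
    [[ -x "$root/.venv/bin/python" ]] || { echo "no .venv, skipped"; return 1; }
    # `migrate --check` exits non-zero if unapplied migrations exist;
    # it also fails when the database is unreachable.
    if _hc_manage "$root" migrate --check; then
        echo "no pending migrations"; return 0
    fi
    echo "pending migrations or database unreachable"; return 2
}
```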
Dev checks (dev only):
- Pre-commit hooks installed
- Shell aliases configured
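For the pre-commit item, one option is to look for the hook script that `pre-commit install` writes to `.git/hooks/pre-commit`. The function name and reporting a missing hook as a warning are assumptions, and a repository that sets `core.hooksPath` would need a different lookup. No alias check is sketched, since the design does not say where the aliases are configured.

```shell
# Dev check (hypothetical name): pre-commit hook installed in <root>/.git/hooks.
# Prints one line; returns 0 (ok) or 1 (warn).
check_precommit_hook() {
    local root="${1:-.}"
    local hook="$root/.git/hooks/pre-commit"
    # The generated hook script mentions pre-commit; a plain sample hook or an
    # empty file does not count as installed.
    if [[ -f "$hook" ]] && grep -q 'pre-commit' "$hook"; then
        echo "pre-commit hook installed"
        return 0
    fi
    echo "pre-commit hook missing (run: pre-commit install)"
    return 1
}
```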
Docker checks (docker only):
- Docker daemon running
- `docker compose` v2 available
- All 3 containers running (redis, web, celery)
- No containers in restart loop
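The Docker group maps onto `docker info`, `docker compose version`, and `docker compose ps` filtered with `--status`. This sketch assumes the compose service names are `redis`, `web`, and `celery`, that it runs from the directory holding the compose file, and uses a hypothetical `DOCKER` variable so tests can substitute a stub. Treating any container currently in the `restarting` state as a restart loop is a simplification; a stricter version could compare restart counts over time.

```shell
# Docker-mode checks (hypothetical names). Each prints one line and returns
# 0 (ok) or 2 (err). DOCKER lets tests point at a stub; defaults to the real CLI.

check_docker_daemon() {
    if "${DOCKER:-docker}" info >/dev/null 2>&1; then
        echo "Docker daemon reachable"; return 0
    fi
    echo "Docker daemon not reachable"; return 2
}

check_compose_v2() {
    # Compose v2 ships as the `docker compose` plugin (v1 was `docker-compose`).
    if "${DOCKER:-docker}" compose version >/dev/null 2>&1; then
        echo "docker compose v2 available"; return 0
    fi
    echo "docker compose v2 not available"; return 2
}

check_containers_running() {
    local running svc missing=()
    running=$("${DOCKER:-docker}" compose ps --status running --services 2>/dev/null) ||
        { echo "docker compose ps failed"; return 2; }
    for svc in redis web celery; do
        grep -qx "$svc" <<<"$running" || missing+=("$svc")
    done
    if (( ${#missing[@]} > 0 )); then
        echo "not running: ${missing[*]}"; return 2
    fi
    echo "redis, web, celery running"; return 0
}

check_no_restart_loop() {
    local restarting
    restarting=$("${DOCKER:-docker}" compose ps --status restarting --services 2>/dev/null) ||
        { echo "docker compose ps failed"; return 2; }
    if [[ -n "$restarting" ]]; then
        echo "restarting: ${restarting//$'\n'/ }"; return 2
    fi
    echo "no containers restarting"; return 0
}
```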
systemd checks (systemd only):
- `server-monitoring.service` is active
- `server-monitoring-celery.service` is active
- Redis service is active
- Gunicorn socket exists (`/run/server-monitoring/gunicorn.sock`)
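For the systemd group, `systemctl is-active --quiet` exits 0 only when the unit is active, and the bash test `[[ -S path ]]` checks that a path is a socket. The Redis unit name differs by distribution (`redis-server.service` on Debian and Ubuntu, `redis.service` on Fedora and RHEL), so this sketch takes the unit name as an argument. The function names and the `SYSTEMCTL` override are hypothetical test hooks.

```shell
# systemd-mode checks (hypothetical names). Each prints one line and returns
# 0 (ok) or 2 (err). SYSTEMCTL lets tests point at a stub.

# Usage: check_unit_active <unit>
check_unit_active() {
    local unit="$1"
    if "${SYSTEMCTL:-systemctl}" is-active --quiet "$unit" 2>/dev/null; then
        echo "$unit is active"; return 0
    fi
    echo "$unit is not active"; return 2
}

# Usage: check_gunicorn_socket [path]; defaults to the path in this design.
check_gunicorn_socket() {
    local sock="${1:-/run/server-monitoring/gunicorn.sock}"
    if [[ -S "$sock" ]]; then
        echo "socket present at $sock"; return 0
    fi
    echo "no socket at $sock"; return 2
}
```

A group function would then call `check_unit_active` once each for `server-monitoring.service`, `server-monitoring-celery.service`, and the local Redis unit, followed by `check_gunicorn_socket`.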
Output
- Default: human-readable `OK`/`WARN`/`ERR` lines with a summary line. Exit code 1 if any check reports an error.
- `--json`: an array of `{"check": "name", "status": "ok|warn|err", "message": "..."}` objects.
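One way to produce both formats from the same results is to record each check into parallel arrays and render them at the end. The names below (`hc_record`, `hc_print_human`, `hc_print_json`, `hc_exit_code`) are hypothetical. The JSON escaping covers backslashes, double quotes, newlines, tabs, and carriage returns, which is enough for one-line shell messages but not for every control character; if `jq` is guaranteed to be installed, building each object with `jq -n --arg` would be a sturdier encoder.

```shell
# Result store (hypothetical names): one entry per check, in run order.
HC_NAMES=()
HC_STATUSES=()
HC_MESSAGES=()

# hc_record <name> <ok|warn|err> <message>
hc_record() {
    HC_NAMES+=("$1")
    HC_STATUSES+=("$2")
    HC_MESSAGES+=("$3")
}

# Escapes backslash, double quote, newline, tab, and CR for a JSON string.
_hc_json_escape() {
    local s="$1" out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case "$c" in
            '\') out+='\\' ;;
            '"') out+='\"' ;;
            $'\n') out+='\n' ;;
            $'\t') out+='\t' ;;
            $'\r') out+='\r' ;;
            *) out+="$c" ;;
        esac
    done
    printf '%s' "$out"
}

# Default output: one OK/WARN/ERR line per check, then a summary line.
hc_print_human() {
    local i label ok=0 warn=0 err=0
    for i in "${!HC_NAMES[@]}"; do
        case "${HC_STATUSES[i]}" in
            ok)   label=OK;   ok=$((ok + 1)) ;;
            warn) label=WARN; warn=$((warn + 1)) ;;
            *)    label=ERR;  err=$((err + 1)) ;;
        esac
        printf '%-4s %s: %s\n' "$label" "${HC_NAMES[i]}" "${HC_MESSAGES[i]}"
    done
    printf '%d ok, %d warnings, %d errors\n' "$ok" "$warn" "$err"
}

# --json output: an array of {"check", "status", "message"} objects.
hc_print_json() {
    local i sep=""
    printf '['
    for i in "${!HC_NAMES[@]}"; do
        printf '%s{"check": "%s", "status": "%s", "message": "%s"}' "$sep" \
            "$(_hc_json_escape "${HC_NAMES[i]}")" \
            "${HC_STATUSES[i]}" \
            "$(_hc_json_escape "${HC_MESSAGES[i]}")"
        sep=", "
    done
    printf ']\n'
}

# Exit status for the caller: 1 if any check reported err, else 0.
hc_exit_code() {
    local s
    for s in "${HC_STATUSES[@]}"; do
        if [[ "$s" == err ]]; then return 1; fi
    done
    return 0
}
```

With this split, the `--json` flag only changes which printer runs; the exit status comes from `hc_exit_code` either way.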
File Changes
New:
- `bin/lib/health_check.sh`: unified library with `detect_mode()`, the check groups, `run_all_checks()`, and JSON support
- `bin/tests/lib/test_health_check.bats`: unit tests for `detect_mode()` and the check functions
Modified:
- `bin/check_system.sh`: slim down to flag parsing plus a call to `run_all_checks`
- `bin/cli/install_menu.sh`: `check_installation()` calls `run_all_checks`
- `bin/tests/test_check_system.bats`: add a `--json` flag test
Unaffected: `install.sh`, deploy scripts, other CLI modules.
Approach
Grouped functions (selected): `run_core_checks()`, `run_django_checks()`, `run_dev_checks()`, `run_docker_checks()`, `run_systemd_checks()`, with `run_all_checks()` as the orchestrator that detects the deployment mode and runs the matching groups.
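A sketch of the orchestrator, following the Check Groups table: dev runs core, Django, and dev checks; prod runs core and Django; docker and systemd run only their own group. `hc_run` is a hypothetical wrapper that turns an assumed 0/1/other return convention into `OK`/`WARN`/`ERR` lines (printing directly for brevity; a `--json` run would record results instead), and the group functions named above are expected to be defined elsewhere in the library. Capturing the status with `|| rc=$?` keeps the wrapper safe to source from a script running under `set -e`.

```shell
HC_ERRORS=0  # number of checks that reported err in the current run

# hc_run <check-name> <command> [args...]
# Runs one check, maps its exit code (0 ok, 1 warn, else err) to a status,
# and prints a human-readable line.
hc_run() {
    local name="$1" msg rc=0
    shift
    # `|| rc=$?` captures the status without tripping `set -e` in the caller.
    msg=$("$@" 2>&1) || rc=$?
    case "$rc" in
        0) printf 'OK   %s: %s\n' "$name" "$msg" ;;
        1) printf 'WARN %s: %s\n' "$name" "$msg" ;;
        *) printf 'ERR  %s: %s\n' "$name" "$msg"; HC_ERRORS=$((HC_ERRORS + 1)) ;;
    esac
}

# run_all_checks [mode]: runs the groups for the mode (default: detect_mode).
# Returns 1 if any check reported err, so callers can use it as their exit code.
run_all_checks() {
    local mode="${1:-}"
    [[ -n "$mode" ]] || mode=$(detect_mode)
    HC_ERRORS=0
    echo "Mode: $mode"
    case "$mode" in
        dev)     run_core_checks; run_django_checks; run_dev_checks ;;
        prod)    run_core_checks; run_django_checks ;;
        docker)  run_docker_checks ;;
        systemd) run_systemd_checks ;;
        *)       echo "ERR  mode: unknown mode '$mode'"; return 1 ;;
    esac
    (( HC_ERRORS == 0 ))
}
```

Accepting an explicit mode argument also gives the bats tests a way to exercise each branch without Docker or systemd present.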
Rejected:
- Registry pattern — over-engineered for ~15-20 checks
- Django management command — can’t check pre-Django prerequisites from Python