System Status Command Design
Purpose
A new system_status management command that gives operators a single-glance view of system configuration and flags inconsistencies across all config sources (.env, .env.sample, settings.py, database state, filesystem, shell scripts).
Unlike preflight, which checks health, this command checks configuration coherence.
Command Interface
python manage.py system_status # Dashboard + issues
python manage.py system_status --json # Full JSON for CI
python manage.py system_status --checks-only # Skip dashboard, issues only
python manage.py system_status --verbose # Include passing checks
Integration: exposed as the status subcommand of bin/cli.sh.
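The flags above map onto simple boolean options. The following is a minimal sketch using plain argparse, which is also what a Django command's add_arguments hook receives. The helper name build_parser and the attribute name as_json are assumptions made for illustration, not part of this design.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="system_status")
    parser.add_argument(
        "--json", action="store_true", dest="as_json",
        help="Emit the full report as JSON (for CI).",
    )
    parser.add_argument(
        "--checks-only", action="store_true",
        help="Skip the dashboard and print issues only.",
    )
    parser.add_argument(
        "--verbose", action="store_true",
        help="Also list checks that passed.",
    )
    return parser


if __name__ == "__main__":
    # With no flags, the command would render the dashboard plus issues.
    args = build_parser().parse_args(["--checks-only", "--verbose"])
    print(args.as_json, args.checks_only, args.verbose)
```

With no flags set, all three attributes are False, which corresponds to the default "dashboard + issues" mode.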
Dashboard Output
System Profile
═══ System Profile ══════════════════════════
Role: agent → hub at https://monitoring-hub.example.com
Environment: production (DEBUG=off)
Deploy: bare (systemd)
Database: SQLite @ /var/lib/sm/db.sqlite3
Celery: redis://localhost:6379/0 (eager=off)
Metrics: statsd (localhost:8125, prefix=pipeline)
Logging: /var/log/sm/
Instance ID: node-west-1
Role derivation:
- HUB_URL set, CLUSTER_ENABLED=0 → agent → hub at <url>
- CLUSTER_ENABLED=1, no HUB_URL → hub (accepting cluster payloads)
- Neither → standalone
- Both → conflict (flagged as error)
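The four role rules can be sketched as a single function. This is an illustration only: the function name derive_role and the convention that CLUSTER_ENABLED is the literal string "1" when enabled are assumptions, and the real command may read these values from Django settings rather than the environment.

```python
from __future__ import annotations

import os
from typing import Mapping


def derive_role(env: Mapping[str, str] | None = None) -> str:
    """Return the role label for the dashboard (hypothetical helper)."""
    env = os.environ if env is None else env
    hub_url = env.get("HUB_URL", "").strip()
    # Assumption: CLUSTER_ENABLED is written as "1" or "0".
    cluster_enabled = env.get("CLUSTER_ENABLED", "0") == "1"

    if hub_url and cluster_enabled:
        return "conflict"  # both set: flagged as an error elsewhere
    if hub_url:
        return f"agent → hub at {hub_url}"
    if cluster_enabled:
        return "hub (accepting cluster payloads)"
    return "standalone"
```

Checking the conflict case first keeps the remaining branches mutually exclusive.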
Pipeline State
═══ Pipeline State ══════════════════════════
Channels: slack (active), email (inactive)
Intelligence: local (active)
Last run: 2026-04-05 08:12 UTC — notified (OK)
Pipeline Definitions
═══ Pipeline Definitions ════════════════════
full-pipeline (active)
alerts: webhook → checkers: cpu,memory,disk → intelligence: local → notify: slack
health-only (active)
checkers: cpu,memory,disk,network → notify: email
legacy-monitor (inactive)
alerts: webhook → checkers: cpu → notify: email
Each definition shows its stage chain with configured drivers/checkers/providers/channels. Inactive definitions are dimmed in terminal output.
Consistency Checks
Env File Consistency
| Check | Sources | Severity |
|---|---|---|
| Keys in .env.sample missing from .env | .env vs .env.sample | WARN |
| Keys in .env not in .env.sample (unknown) | .env vs .env.sample | WARN |
| Keys in settings.py missing from .env.sample | settings.py vs .env.sample | WARN |
| Keys in .env.sample never referenced in code | .env.sample vs settings.py + bin/ | WARN |
| Commented-out sample keys that are set in .env | .env vs .env.sample | INFO |
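The three checks that compare .env against .env.sample reduce to set operations on key names. The sketch below assumes simple KEY=value lines and treats a line such as "# KEY=value" as a commented-out key. The function names are hypothetical, and treating commented-out sample keys as known (rather than unknown) is an assumption about how the WARN and INFO rows interact.

```python
from __future__ import annotations

import re

# Matches "KEY=..." and "# KEY=..." lines; group 1 is set for commented keys.
_KEY_LINE = re.compile(r"^\s*(#\s*)?([A-Za-z_][A-Za-z0-9_]*)\s*=")


def parse_env_keys(text: str) -> tuple[set[str], set[str]]:
    """Return (active keys, commented-out keys) found in env-file text."""
    active: set[str] = set()
    commented: set[str] = set()
    for line in text.splitlines():
        match = _KEY_LINE.match(line)
        if match:
            (commented if match.group(1) else active).add(match.group(2))
    return active, commented


def compare_env(env_text: str, sample_text: str) -> dict[str, list[str]]:
    """Compute the key differences between .env and .env.sample."""
    env_keys, _ = parse_env_keys(env_text)
    sample_keys, sample_commented = parse_env_keys(sample_text)
    return {
        "missing_from_env": sorted(sample_keys - env_keys),
        "unknown_in_env": sorted(env_keys - sample_keys - sample_commented),
        "commented_but_set": sorted(sample_commented & env_keys),
    }
```

A prose comment such as "# plain comment" is skipped because no "=" follows the first word. A comment of the form "# note=..." would be misread as a key, which a real parser may need to handle.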
Cluster Profile Coherence
| Check | Logic | Severity |
|---|---|---|
| Agent+hub conflict | HUB_URL set AND CLUSTER_ENABLED=1 | ERROR |
| Agent without secret | HUB_URL set, WEBHOOK_SECRET_CLUSTER empty | WARN |
| Agent without instance ID | HUB_URL set, INSTANCE_ID empty | WARN |
| Hub without secret | CLUSTER_ENABLED=1, WEBHOOK_SECRET_CLUSTER empty | ERROR |
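As an illustration, the cluster table could translate into a module-level run() along these lines. The env parameter, the hint texts, and the "1"/"0" convention for CLUSTER_ENABLED are assumptions. CheckResult is redeclared here only to keep the sketch self-contained.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class CheckResult:
    level: str
    message: str
    hint: str = ""
    category: str = ""


def run(env: Mapping[str, str] | None = None) -> list[CheckResult]:
    """Evaluate the cluster coherence rules against an environment mapping."""
    env = os.environ if env is None else env
    hub_url = env.get("HUB_URL", "").strip()
    cluster = env.get("CLUSTER_ENABLED", "0") == "1"  # assumed "1"/"0" convention
    secret = env.get("WEBHOOK_SECRET_CLUSTER", "").strip()
    instance_id = env.get("INSTANCE_ID", "").strip()

    def result(level: str, message: str, hint: str) -> CheckResult:
        return CheckResult(level, message, hint, category="cluster")

    results: list[CheckResult] = []
    if hub_url and cluster:
        results.append(result("error", "Agent+hub conflict",
                              "Set either HUB_URL or CLUSTER_ENABLED=1, not both."))
    if hub_url and not secret:
        results.append(result("warn", "Agent without secret",
                              "Set WEBHOOK_SECRET_CLUSTER."))
    if hub_url and not instance_id:
        results.append(result("warn", "Agent without instance ID",
                              "Set INSTANCE_ID."))
    if cluster and not secret:
        results.append(result("error", "Hub without secret",
                              "Set WEBHOOK_SECRET_CLUSTER."))
    return results
```

Accepting a mapping instead of reading os.environ directly makes the rules easy to exercise in tests without patching the process environment.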
Environment vs Runtime State
| Check | Logic | Severity |
|---|---|---|
| Debug on in production | DJANGO_ENV=prod, DJANGO_DEBUG=1 | ERROR |
| No allowed hosts in production | DJANGO_ENV=prod, DJANGO_ALLOWED_HOSTS empty | ERROR |
| Celery eager in production | DJANGO_ENV=prod, CELERY_TASK_ALWAYS_EAGER=1 | WARN |
| StatsD configured but backend=logging | STATSD_HOST set, METRICS_BACKEND=logging | INFO |
| Metrics backend=statsd but no host | METRICS_BACKEND=statsd, STATSD_HOST empty | WARN |
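Because each row pairs a condition with a severity, one option is to keep the rules as data and evaluate them in a loop. The sketch below is table-driven. The names RULES and evaluate, and the assumption that boolean settings are written as "1" or "0", are illustrative only.

```python
from __future__ import annotations

from typing import Callable, Mapping

Env = Mapping[str, str]


def _is_prod(env: Env) -> bool:
    return env.get("DJANGO_ENV") == "prod"


def _flag(env: Env, key: str) -> bool:
    # Assumption: boolean settings are written as "1" / "0", as in the table.
    return env.get(key, "0") == "1"


# (severity, message, predicate) triples, in the same order as the table.
RULES: list[tuple[str, str, Callable[[Env], bool]]] = [
    ("error", "Debug on in production",
     lambda e: _is_prod(e) and _flag(e, "DJANGO_DEBUG")),
    ("error", "No allowed hosts in production",
     lambda e: _is_prod(e) and not e.get("DJANGO_ALLOWED_HOSTS", "").strip()),
    ("warn", "Celery eager in production",
     lambda e: _is_prod(e) and _flag(e, "CELERY_TASK_ALWAYS_EAGER")),
    ("info", "StatsD configured but backend=logging",
     lambda e: bool(e.get("STATSD_HOST")) and e.get("METRICS_BACKEND") == "logging"),
    ("warn", "Metrics backend=statsd but no host",
     lambda e: e.get("METRICS_BACKEND") == "statsd" and not e.get("STATSD_HOST")),
]


def evaluate(env: Env) -> list[tuple[str, str]]:
    """Return (severity, message) for every rule whose predicate holds."""
    return [(level, message) for level, message, check in RULES if check(env)]
```

Keeping rules as data means adding a row to the table is a one-line change, at the cost of slightly less readable predicates.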
Database vs Config State
| Check | Logic | Severity |
|---|---|---|
| Active pipelines but Celery eager | Active PipelineDefinition + eager mode | WARN |
| Notification channels without credentials | Active channel missing webhook/config in DB | WARN |
| No active notification channels | Zero active channels | WARN |
| No active pipeline definitions | Zero active definitions | INFO |
| Intelligence provider active, fallback disabled | Active provider + FALLBACK_ENABLED=0 | INFO |
Installation State
| Check | Logic | Severity |
|---|---|---|
| Aliases not installed (dev) | DJANGO_ENV=dev, bin/aliases.sh missing | WARN |
| Pre-commit hooks not installed (dev) | DJANGO_ENV=dev, .git/hooks/pre-commit missing | WARN |
| Cron not configured (prod) | DJANGO_ENV=prod, no crontab entries | WARN |
| Logs directory not writable | LOGS_DIR not writable | ERROR |
| Database file not writable | SQLite path not writable | ERROR |
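The two writability rows could be implemented with os.access, as in the sketch below. The helper names are hypothetical. Treating a not-yet-created database file as writable when its parent directory is writable is an assumption about the desired behavior.

```python
import os


def is_writable_dir(path: str) -> bool:
    """True if path is an existing directory the process can write into."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


def is_writable_file(path: str) -> bool:
    """True if the file is writable, or could be created in its parent directory."""
    if os.path.exists(path):
        return os.path.isfile(path) and os.access(path, os.W_OK)
    # Assumption: a missing file counts as writable if its directory is.
    return is_writable_dir(os.path.dirname(path) or ".")
```

Note that os.access reports permissions for the current user, so results can differ between an operator's shell and the service account that runs the application.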
Architecture
File Layout
apps/checkers/management/commands/system_status.py   # Command
apps/checkers/status/                                # Status modules
    __init__.py
    dashboard.py             # Profile dashboard renderer
    env_checks.py            # .env vs .env.sample vs settings.py
    cluster_checks.py        # Cluster profile coherence
    runtime_checks.py        # Environment vs runtime state
    database_checks.py       # DB state vs config
    installation_checks.py   # Installation state checks
Why Not Django System Checks?
The consistency checks compare across sources (env vs sample vs settings vs database vs filesystem). Django’s @register() framework is designed for single-source validation. These checks need to:
- Parse .env.sample as a raw file
- Scan settings.py source for os.environ.get references
- Compare key sets across multiple files
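One way to scan settings.py for os.environ.get references is to walk its syntax tree rather than grep the text, which ignores commented-out code. This is a sketch, not the planned implementation: the function name is hypothetical, and it only recognizes the literal form os.environ.get("KEY", ...), not os.getenv or subscript access.

```python
from __future__ import annotations

import ast


def referenced_env_keys(source: str) -> set[str]:
    """Collect string keys passed as the first argument to os.environ.get(...)."""
    keys: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        is_environ_get = (
            isinstance(func, ast.Attribute)
            and func.attr == "get"
            and isinstance(func.value, ast.Attribute)
            and func.value.attr == "environ"
            and isinstance(func.value.value, ast.Name)
            and func.value.value.id == "os"
        )
        first = node.args[0]
        if is_environ_get and isinstance(first, ast.Constant) and isinstance(first.value, str):
            keys.add(first.value)
    return keys
```

The resulting set can then be compared with the keys parsed from .env.sample to find settings that are read in code but not documented in the sample file.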
CheckResult Dataclass
from dataclasses import dataclass


@dataclass
class CheckResult:
    level: str  # "ok", "info", "warn", "error"
    message: str
    hint: str = ""
    category: str = ""  # "env", "cluster", "runtime", "database", "installation"
Each module exposes run() -> list[CheckResult].
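Given that contract, the command can treat every module uniformly. The sketch below shows one possible aggregation step. The names collect and visible, and the choice to sort most severe first, are assumptions. CheckResult is repeated so the example runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class CheckResult:
    level: str
    message: str
    hint: str = ""
    category: str = ""


# Assumed display order: most severe first.
SEVERITY_ORDER = {"error": 0, "warn": 1, "info": 2, "ok": 3}


def collect(runners: Iterable[Callable[[], list[CheckResult]]]) -> list[CheckResult]:
    """Call each module's run() and merge the results, most severe first."""
    results: list[CheckResult] = []
    for run in runners:
        results.extend(run())
    # sorted() is stable, so results keep module order within a severity.
    return sorted(results, key=lambda r: SEVERITY_ORDER[r.level])


def visible(results: list[CheckResult], verbose: bool = False) -> list[CheckResult]:
    """Hide passing checks unless verbose output was requested."""
    return [r for r in results if verbose or r.level != "ok"]
```

In the command itself, the runners would be the run functions of the five status modules, and verbose would come from the --verbose flag.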
JSON Output
{
  "profile": {
    "role": "agent",
    "hub_url": "https://monitoring-hub.example.com",
    "environment": "production",
    "debug": false,
    "deploy_method": "bare",
    "database": "sqlite:///var/lib/sm/db.sqlite3",
    "celery_broker": "redis://localhost:6379/0",
    "celery_eager": false,
    "metrics_backend": "statsd",
    "instance_id": "node-west-1",
    "logs_dir": "/var/log/sm/"
  },
  "pipeline": {
    "channels": [{"name": "slack", "active": true}],
    "intelligence": [{"name": "local", "active": true}],
    "last_run": {"timestamp": "2026-04-05T08:12:00Z", "status": "notified"}
  },
  "definitions": [
    {
      "name": "full-pipeline",
      "active": true,
      "stages": [
        {"stage": "alerts", "drivers": ["webhook"]},
        {"stage": "checkers", "checkers": ["cpu", "memory", "disk"]},
        {"stage": "intelligence", "providers": ["local"]},
        {"stage": "notify", "channels": ["slack"]}
      ]
    }
  ],
  "checks": [
    {"level": "error", "category": "cluster", "message": "...", "hint": "..."}
  ],
  "summary": {"passed": 12, "warnings": 2, "errors": 1}
}
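The summary block can be derived from the check results by counting levels. A minimal sketch follows, with checks represented as plain dictionaries for brevity. The function names are hypothetical. Leaving "info" results out of the summary and including every check in the checks array are assumptions, since the example above does not settle either point.

```python
from __future__ import annotations

import json
from collections import Counter


def summarize(checks: list[dict]) -> dict[str, int]:
    """Count results per level using the summary keys from the example."""
    counts = Counter(check["level"] for check in checks)
    return {
        "passed": counts["ok"],
        "warnings": counts["warn"],
        "errors": counts["error"],
    }


def build_report(profile: dict, checks: list[dict]) -> str:
    """Serialize a partial report (profile, checks, summary) as JSON."""
    payload = {
        "profile": profile,
        "checks": checks,
        "summary": summarize(checks),
    }
    return json.dumps(payload, indent=2)
```

A CI job can parse this output and fail the build when summary.errors is greater than zero.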
Testing
- Each status module: apps/checkers/_tests/status/test_<module>.py
- Mock .env/.env.sample content and Django settings
- Dashboard rendering tested with known config states
- JSON output tested for schema correctness
- 100% branch coverage required
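A test that supplies its own .env and .env.sample content might look like the sketch below, written in pytest style using the built-in tmp_path fixture. The read_keys helper is a stand-in for the module under test and is purely illustrative. Real tests would import from apps/checkers/status instead.

```python
from __future__ import annotations

from pathlib import Path


def read_keys(path: Path) -> set[str]:
    """Stand-in for the code under test: collect names from KEY=value lines."""
    keys: set[str] = set()
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.add(line.split("=", 1)[0].strip())
    return keys


def test_reports_key_missing_from_env(tmp_path: Path) -> None:
    # Arrange: write controlled file contents into a temporary directory.
    (tmp_path / ".env").write_text("DJANGO_ENV=prod\n")
    (tmp_path / ".env.sample").write_text("DJANGO_ENV=dev\nINSTANCE_ID=\n")

    # Act: compare the key sets of the two files.
    missing = read_keys(tmp_path / ".env.sample") - read_keys(tmp_path / ".env")

    # Assert: only the key absent from .env is reported.
    assert missing == {"INSTANCE_ID"}
```

Writing real files into a temporary directory avoids patching open() and keeps each test independent of the repository's actual .env.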
CLI Integration
Add status subcommand to bin/cli.sh alongside existing health.