Cluster Setup Installer Integration & Docs
Date: 2026-04-02 Status: Approved
Problem
Setting up a cluster agent or hub requires manually editing .env and adding cron entries. The installer (bin/install.sh) has no awareness of cluster roles, and the docs in Deployment.md lack verification steps, security guidance, and troubleshooting.
Goal
- Add cluster role prompts to bin/install.sh so agents and hubs can be configured during initial setup
- Add push_to_hub scheduling to bin/setup_cron.sh
- Expand the Multi-Instance section in docs/Deployment.md
Design
Installer changes (bin/install.sh)
After the existing post-install prompts (health check, cron, aliases, systemd), add a cluster role section. It runs for both bare-metal and Docker installs, and it runs before the Docker handoff because the cluster variables are just .env entries.
Flow
Configure this instance for multi-instance (cluster) mode? [y/N]
→ N: skip, standalone mode (default, zero changes)
→ Y: continue:
Select cluster role:
1) agent — run checkers locally, push results to a hub
2) hub — accept alerts from remote agents
3) both — agent + hub (push to another hub while accepting agents)
[agent or both]
HUB_URL (e.g. https://monitoring-hub.example.com): <required>
INSTANCE_ID (default: <hostname>): <optional>
[hub or both]
(no extra prompts — just enables CLUSTER_ENABLED=1)
[agent, hub, or both]
WEBHOOK_SECRET_CLUSTER: <required, shared secret>
Write to .env, run push_to_hub --dry-run for agents to verify.
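The input rules in the flow above (HUB_URL is required for the agent and both roles, INSTANCE_ID falls back to the hostname) can be sketched as follows. This is an illustration in Python, not the installer's shell code. The function names `validate_hub_url` and `resolve_instance_id` are hypothetical, and rejecting anything other than an http(s) URL is an assumption that the flow itself does not state.

```python
import socket
from urllib.parse import urlparse


def validate_hub_url(raw: str) -> str:
    """Return the trimmed HUB_URL, or raise ValueError if it is unusable."""
    url = raw.strip()
    if not url:
        raise ValueError("HUB_URL is required for the agent and both roles")
    parsed = urlparse(url)
    # Assumption: only http(s) URLs with a host are meaningful hub addresses.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"HUB_URL is not a valid http(s) URL: {url!r}")
    return url


def resolve_instance_id(raw: str) -> str:
    """Use the user's value if one was given, otherwise the local hostname."""
    value = raw.strip()
    return value or socket.gethostname()
```

An empty answer at the INSTANCE_ID prompt therefore yields the hostname, while an empty answer at the HUB_URL prompt is an error that the installer would presumably re-prompt on.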
.env writes
| Variable | Agent | Hub | Both |
|---|---|---|---|
| HUB_URL | user value | skip | user value |
| CLUSTER_ENABLED | skip | 1 | 1 |
| INSTANCE_ID | user value or hostname | skip | user value or hostname |
| WEBHOOK_SECRET_CLUSTER | user value | user value | user value |
All writes use dotenv_set (overwrites empty values from .env.sample).
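The table above amounts to a small lookup from role to variables, followed by an overwrite-or-append write per variable. The sketch below expresses both in Python as a reference for reviewing or testing the shell implementation. `cluster_env_for_role` and `upsert_env` are hypothetical names, and `upsert_env` only approximates the behaviour described for `dotenv_set`; the real helper's handling of comments or quoting is not specified here.

```python
def cluster_env_for_role(role, secret, hub_url="", instance_id=""):
    """Map a cluster role to the .env variables the installer would write."""
    if role not in ("agent", "hub", "both"):
        raise ValueError(f"unknown cluster role: {role!r}")
    env = {}
    if role in ("agent", "both"):
        env["HUB_URL"] = hub_url
        env["INSTANCE_ID"] = instance_id
    if role in ("hub", "both"):
        env["CLUSTER_ENABLED"] = "1"
    # The shared secret is written for every role.
    env["WEBHOOK_SECRET_CLUSTER"] = secret
    return env


def upsert_env(text, key, value):
    """Replace an existing KEY=... line (even an empty one) or append a new one."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(f"{key}="):
            lines[i] = f"{key}={value}"
            break
    else:
        # Key not present yet: add it at the end of the file.
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

Variables marked "skip" in the table are simply absent from the returned dict, so an agent-only install never touches CLUSTER_ENABLED.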
Verification
For agent/both roles, run push_to_hub --dry-run after writing .env. This validates that the payload builds correctly without actually POSTing. Show the output so the user can confirm checkers are detected.
Cron changes (bin/setup_cron.sh)
After the auto-update prompt, check if HUB_URL is set in .env. If so, offer to schedule push_to_hub:
HUB_URL detected — schedule automatic push to hub? [Y/n]
If yes, add a cron entry on the same schedule:
*/5 * * * * cd /path/to/project && uv run python manage.py push_to_hub --json >> push.log 2>&1 # server-maintanence cluster push
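Re-running the cron setup should not stack duplicate entries. One way to get that, sketched below in Python, is to treat the trailing comment as a marker: drop any line that carries it, then append a fresh one. Using the comment for de-duplication is an assumption, as are the names `push_cron_line` and `add_push_entry`; the actual script may manage its entries differently.

```python
import shlex

# Trailing comment taken from the cron entry above; used here as a marker.
MARKER = "# server-maintanence cluster push"


def push_cron_line(project_dir, schedule="*/5 * * * *"):
    """Build the crontab line that runs push_to_hub from the project directory."""
    return (
        f"{schedule} cd {shlex.quote(project_dir)} && "
        f"uv run python manage.py push_to_hub --json >> push.log 2>&1 {MARKER}"
    )


def add_push_entry(crontab_text, project_dir):
    """Return crontab text containing exactly one cluster-push entry."""
    # Remove any previous cluster-push line so repeated runs stay idempotent.
    kept = [ln for ln in crontab_text.splitlines() if MARKER not in ln]
    kept.append(push_cron_line(project_dir))
    return "\n".join(kept) + "\n"
```

The functions only transform text; reading the current crontab and installing the result (for example via `crontab -l` and `crontab -`) is left to the caller.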
Docs changes (docs/Deployment.md)
Expand the Multi-Instance (Cluster) section with:
Verification — commands to confirm agent and hub are working:
# Agent: dry-run to verify payload
uv run python manage.py push_to_hub --dry-run
# Agent: single push to verify connectivity
uv run python manage.py push_to_hub
# Hub: verify cluster driver is registered
uv run python manage.py shell -c "from apps.alerts.drivers import DRIVER_REGISTRY; print('cluster' in DRIVER_REGISTRY)"
Security notes:
- Always use HTTPS for HUB_URL in production
- WEBHOOK_SECRET_CLUSTER must match on agent and hub
- HMAC-SHA256 signature is sent via X-Cluster-Signature header
- Without a shared secret, payloads are accepted unsigned (dev only)
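For readers who want to see what the signing scheme in these notes involves, here is a minimal sketch using Python's standard `hmac` module. It assumes the signature is the hex digest of the raw request body with no prefix; the actual wire format of X-Cluster-Signature is defined by the cluster driver and may differ. `sign_payload` and `verify_payload` are hypothetical names.

```python
import hashlib
import hmac
from typing import Optional


def sign_payload(body: bytes, secret: str) -> str:
    """Return the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_payload(body: bytes, secret: str, signature: Optional[str]) -> bool:
    """Check a received signature; unsigned payloads pass only without a secret."""
    if not secret:
        # No shared secret configured: payload accepted unsigned (dev only).
        return True
    if not signature:
        return False
    expected = sign_payload(body, secret)
    # compare_digest runs in constant time, so it does not leak where
    # the two values first differ.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

An agent would send `sign_payload(body, secret)` in the header, and the hub would call `verify_payload` on the bytes it received. A secret mismatch makes verification fail, which corresponds to the 403 case in the troubleshooting table.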
Troubleshooting:
| Symptom | Cause | Fix |
|---|---|---|
| push_to_hub → “HUB_URL not configured” | Missing .env entry | Set HUB_URL in .env |
| push_to_hub → connection refused | Hub not running or wrong URL | Verify hub is accessible, check URL |
| push_to_hub → 403 Forbidden | Signature mismatch | Ensure WEBHOOK_SECRET_CLUSTER matches on both sides |
| push_to_hub → 404 Not Found | Cluster driver not registered on hub | Set CLUSTER_ENABLED=1 on hub, restart |
| Alerts arrive but no notifications | Pipeline not configured on hub | Run setup_instance on hub to create pipeline |
File Changes
| File | Change |
|---|---|
| bin/install.sh | Add cluster role prompts after existing post-install section |
| bin/setup_cron.sh | Add push_to_hub cron option when HUB_URL is set |
| docs/Deployment.md | Expand cluster section with verification, security, troubleshooting |
Non-Goals
- Changing the cluster driver or push_to_hub command
- Adding a standalone bin/setup_agent.sh script
- Auto-generating WEBHOOK_SECRET_CLUSTER (user should use their own shared secret)