Deployment
Production deployment guide for Server Monitoring. Choose Docker Compose for quick deploys or bare metal with systemd for full control.
[toc]
Prerequisites
- Python 3.10+
- uv (Python package manager)
- Redis (message broker for Celery)
- Nginx (reverse proxy, optional but recommended)
Environment Variables
Create /etc/server-monitoring/env (systemd) or .env (Docker) with these values:
| Variable | Default | Required | Purpose |
|---|---|---|---|
| DJANGO_SECRET_KEY | — | Yes | Cryptographic signing key |
| DJANGO_DEBUG | 1 | Yes (set 0) | Disable debug mode in production |
| DJANGO_ALLOWED_HOSTS | — | Yes | Comma-separated hostnames (e.g. monitoring.example.com) |
| CELERY_BROKER_URL | redis://localhost:6379/0 | No | Redis broker URL |
| ENABLE_CELERY_ORCHESTRATION | 0 | No | Enable async pipeline via Celery |
| API_KEY_AUTH_ENABLED | 0 | No | Require API keys for endpoints |
| RATE_LIMIT_ENABLED | 0 | No | Enable rate limiting middleware |
| WEBHOOK_SECRET_<DRIVER> | — | No | Signature verification per driver (e.g. WEBHOOK_SECRET_GRAFANA) |
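Flag-style variables above take 0/1 values, and DJANGO_ALLOWED_HOSTS is comma-separated. A minimal sketch of parsing such values (the helper names `env_flag` and `env_csv` are hypothetical, not the project's actual config code):

```python
import os

def env_flag(name: str, default: str = "0") -> bool:
    """Read a 0/1 environment flag as a boolean (hypothetical helper)."""
    return os.environ.get(name, default).strip() == "1"

def env_csv(name: str, default: str = "") -> list[str]:
    """Read a comma-separated variable such as DJANGO_ALLOWED_HOSTS."""
    raw = os.environ.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]

os.environ["DJANGO_ALLOWED_HOSTS"] = "monitoring.example.com, backup.example.com"
os.environ["RATE_LIMIT_ENABLED"] = "1"
print(env_csv("DJANGO_ALLOWED_HOSTS"))  # ['monitoring.example.com', 'backup.example.com']
print(env_flag("RATE_LIMIT_ENABLED"))   # True
```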
Minimal production .env:
DJANGO_SECRET_KEY=your-random-secret-key-here
DJANGO_DEBUG=0
DJANGO_ALLOWED_HOSTS=monitoring.example.com
ENABLE_CELERY_ORCHESTRATION=1
Generate a secret key:
python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"
Option 1: Docker Compose
The fastest way to get a production stack running. Includes Django (gunicorn), Celery worker, and Redis.
Quick start: Run `./bin/install.sh` and select docker mode to automate the steps below (`.env` setup, build, start, and health verification).
1.1 Clone and configure
git clone git@github.com:ikidnapmyself/server-monitoring.git
cd server-monitoring
cp .env.sample .env
Edit .env with the production values from the table above. The Docker Compose file reads config from .env and automatically overrides CELERY_BROKER_URL to use the internal redis service hostname — you do not need to change that value in .env for Docker deployments.
1.2 Start the stack
docker compose -f deploy/docker/docker-compose.yml up -d
This starts three services:
| Service | What it does |
|---|---|
| redis | Message broker for Celery |
| web | Django app served by gunicorn on port 8000 |
| celery | Celery worker processing pipeline tasks |
1.3 Verify
# Check all services are running
docker compose -f deploy/docker/docker-compose.yml ps
# Check logs
docker compose -f deploy/docker/docker-compose.yml logs web
docker compose -f deploy/docker/docker-compose.yml logs celery
# Test health endpoint
curl http://localhost:8000/alerts/webhook/
1.4 Run migrations manually (if needed)
Migrations run automatically on container start. To run them manually:
docker compose -f deploy/docker/docker-compose.yml exec web python manage.py migrate
1.5 Create an API key
docker compose -f deploy/docker/docker-compose.yml exec web python manage.py shell -c "
from config.models import APIKey
key = APIKey.objects.create(name='my-service')
print(f'API Key: {key._raw_key}')
print('Save this key — it cannot be retrieved again.')
"
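With API_KEY_AUTH_ENABLED=1, clients must send this key with each request. A minimal client sketch; the `Authorization: Api-Key <key>` header scheme is an assumption, so verify the expected header name against your API key middleware:

```python
import json
import urllib.request

def build_webhook_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated webhook POST.
    The "Authorization: Api-Key <key>" scheme is an assumption, not confirmed API."""
    return urllib.request.Request(
        url=f"{base_url}/alerts/webhook/",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Api-Key {api_key}",
        },
        method="POST",
    )

req = build_webhook_request("https://monitoring.example.com", "abc123", {"status": "firing"})
print(req.get_full_url())  # https://monitoring.example.com/alerts/webhook/
```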
Option 2: Bare Metal / VPS with systemd
For full control on a Linux server.
2.1 Install Redis
# Ubuntu/Debian
sudo apt install redis-server
# On Debian/Ubuntu the service is usually named redis-server
# On RHEL/Fedora/Arch it's redis
sudo systemctl enable --now redis-server
# Verify
redis-cli ping # Should return PONG
2.2 Clone and install
sudo mkdir -p /opt/server-monitoring
sudo chown www-data:www-data /opt/server-monitoring
sudo -u www-data git clone git@github.com:ikidnapmyself/server-monitoring.git /opt/server-monitoring
cd /opt/server-monitoring
# Install uv and dependencies as www-data
sudo -u www-data sh -c 'curl -LsSf https://astral.sh/uv/install.sh | sh'
sudo -u www-data uv sync --frozen --no-dev --extra prod
2.3 Configure environment
sudo mkdir -p /etc/server-monitoring
sudo tee /etc/server-monitoring/env << 'EOF'
DJANGO_SECRET_KEY=your-random-secret-key-here
DJANGO_DEBUG=0
DJANGO_ALLOWED_HOSTS=monitoring.example.com
CELERY_BROKER_URL=redis://localhost:6379/0
ENABLE_CELERY_ORCHESTRATION=1
EOF
sudo chown root:www-data /etc/server-monitoring/env
sudo chmod 640 /etc/server-monitoring/env
2.4 Run migrations and collect static files
cd /opt/server-monitoring
set -a; source /etc/server-monitoring/env; set +a
uv run python manage.py migrate --noinput
uv run python manage.py collectstatic --noinput
2.5 Install systemd units
sudo cp deploy/systemd/server-monitoring.service /etc/systemd/system/
sudo cp deploy/systemd/server-monitoring-celery.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now server-monitoring server-monitoring-celery
Automated: Run `sudo ./bin/install.sh deploy` to automate steps 2.4-2.6 (migrations, static files, unit installation, and service startup with health verification). Or use `sudo ./bin/install.sh` in prod mode when selecting the systemd deployment option.
Security note: Running the installer with `sudo` executes all shell code as root. Review the deploy module (`bin/install/deploy.sh`) before running and ensure the repository has not been tampered with. Prefer running only `install.sh deploy` with `sudo` rather than the full installer to minimize the root-privileged surface.
2.6 Verify
sudo systemctl status server-monitoring
sudo systemctl status server-monitoring-celery
# Test via unix socket
curl --unix-socket /run/server-monitoring/gunicorn.sock http://localhost/alerts/webhook/
Nginx Reverse Proxy
A sample config is provided at deploy/docker/nginx.conf. Two values must be adjusted per deployment:
| Setting | Docker | systemd |
|---|---|---|
| upstream | server web:8000; | server unix:/run/server-monitoring/gunicorn.sock; |
| location /static/ alias | /app/staticfiles/ (shared volume) | /opt/server-monitoring/staticfiles/ |
Docker setup
Nginx runs on the host (or as another container) and proxies to the web service. If Nginx runs as a separate container, it needs access to the same staticfiles volume or network.
systemd setup
Change both the upstream and the static files path:
upstream django {
server unix:/run/server-monitoring/gunicorn.sock;
}
location /static/ {
alias /opt/server-monitoring/staticfiles/;
}
Install on the host
sudo apt install nginx
sudo cp deploy/docker/nginx.conf /etc/nginx/sites-available/server-monitoring
sudo ln -s /etc/nginx/sites-available/server-monitoring /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
SSL with Let’s Encrypt
sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d monitoring.example.com
Certbot will modify the Nginx config to add SSL. The commented SSL block in deploy/docker/nginx.conf shows the manual configuration if you prefer.
Webhook Ingestion
External monitoring tools (Grafana, AlertManager, PagerDuty, etc.) send alerts via webhook:
POST /alerts/webhook/ # Auto-detect driver from payload
POST /alerts/webhook/<driver>/ # Driver-specific endpoint
Sync vs Async
The behavior depends on ENABLE_CELERY_ORCHESTRATION:
| Setting | Behavior | Response |
|---|---|---|
| 0 (default) | Pipeline runs synchronously in the request | 200 OK with results |
| 1 | Pipeline queued to Celery worker | 202 Accepted with pipeline ID |
Automatic fallback
When ENABLE_CELERY_ORCHESTRATION=1 but the Redis broker is unreachable, the webhook view automatically falls back to synchronous processing. No alerts are lost.
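The fallback pattern can be sketched as follows. The `enqueue` callable and the pipeline stand-ins are hypothetical, not the project's real task names:

```python
# Sketch of sync-with-fallback webhook handling: try the broker first,
# fall back to in-request processing if it is unreachable.
def handle_webhook(alert: dict, celery_enabled: bool, enqueue) -> dict:
    if celery_enabled:
        try:
            task_id = enqueue(alert)      # e.g. task.delay(...) in Celery
            return {"status": 202, "pipeline_id": task_id}
        except ConnectionError:
            pass                          # broker down: fall through to sync path
    result = f"processed {alert['name']}" # synchronous pipeline stand-in
    return {"status": 200, "result": result}

def broker_down(alert):
    raise ConnectionError("redis unreachable")

print(handle_webhook({"name": "cpu-high"}, True, broker_down))
# {'status': 200, 'result': 'processed cpu-high'}
```

Either way the caller gets a response and the alert is processed, which is why no alerts are lost when Redis goes down.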
Signature verification
Set WEBHOOK_SECRET_<DRIVER> environment variables to enable HMAC signature verification:
WEBHOOK_SECRET_GRAFANA=your-grafana-webhook-secret
WEBHOOK_SECRET_ALERTMANAGER=your-alertmanager-secret
Requests with invalid signatures receive 403 Forbidden.
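The check itself is a standard HMAC-SHA256 over the raw request body. A sketch of the verification side (the hex digest encoding and the exact header handling are assumptions; the real driver implementations may differ):

```python
import hashlib
import hmac

def verify_signature(raw_body: bytes, secret: str, received_sig: str) -> bool:
    """Return True if received_sig matches HMAC-SHA256(secret, raw_body).
    Hex encoding of the digest is an assumption about the wire format."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information during comparison
    return hmac.compare_digest(expected, received_sig)

body = b'{"alerts": []}'
secret = "your-grafana-webhook-secret"
good = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_signature(body, secret, good))   # True  -> request accepted
print(verify_signature(body, secret, "bad"))  # False -> 403 Forbidden
```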
Monitoring the Deployment
System preflight
uv run python manage.py preflight # All system checks, grouped
uv run python manage.py preflight --json # JSON output for CI
Health checks
uv run python manage.py check_health # CPU, memory, disk, network, process
uv run python manage.py check_health --list
Pipeline history
uv run python manage.py monitor_pipeline --limit 10
Celery worker health
celery -A config inspect ping # Check if workers are responding
celery -A config inspect active # Show active tasks
For Docker:
docker compose -f deploy/docker/docker-compose.yml exec celery celery -A config inspect ping
Multi-Instance (Cluster)
Deploy multiple instances across servers: agents monitor locally and push alerts to a hub that runs the full pipeline (intelligence + notifications).
Architecture
Agent (server-1) ──POST──┐
Agent (server-2) ──POST──┤──▶ Hub ──▶ intelligence ──▶ notify
Agent (server-3) ──POST──┘ (receives cluster alerts)
All instances run the same codebase. Role is determined by environment variables.
Agent setup
On each server you want to monitor:
- Install the project (`./bin/install.sh`, select "agent" when prompted for cluster role)
- Add to `.env`:
HUB_URL=https://monitoring-hub.example.com
WEBHOOK_SECRET_CLUSTER=your-shared-secret
INSTANCE_ID=web-server-01
- Schedule the push command via cron:
# Every 5 minutes
*/5 * * * * cd /opt/server-monitoring && uv run python manage.py push_to_hub --json >> push.log 2>&1
Or run manually:
uv run python manage.py push_to_hub # Push all checker results
uv run python manage.py push_to_hub --dry-run # Preview without sending
uv run python manage.py push_to_hub --checkers cpu,memory # Specific checkers
Tip: The installer and `bin/install.sh cron` can configure all of the above interactively. Manual `.env` editing is only needed if you skipped the prompts.
Hub setup
On the central monitoring server:
- Install the project (`./bin/install.sh`, select "hub" when prompted for cluster role)
- Add to `.env`:
CLUSTER_ENABLED=1
WEBHOOK_SECRET_CLUSTER=your-shared-secret
The hub accepts cluster payloads at POST /alerts/webhook/cluster/ and processes them through the full pipeline. Each alert carries instance_id and hostname labels for per-server filtering.
Standalone (default)
Existing installs with neither HUB_URL nor CLUSTER_ENABLED set continue to work as standalone instances with no changes.
Verification
After setting up an agent or hub, verify the configuration:
Agent verification:
# Dry-run: builds payload, shows what would be sent (no network call)
uv run python manage.py push_to_hub --dry-run
# Single push: sends one payload to the hub and reports the result
uv run python manage.py push_to_hub
# Push specific checkers only
uv run python manage.py push_to_hub --checkers cpu,memory --dry-run
Hub verification:
# Confirm the cluster driver is registered
uv run python manage.py shell -c "from apps.alerts.drivers import DRIVER_REGISTRY; print('cluster' in DRIVER_REGISTRY)"
# Expected output: True
# Check Django system checks pass
uv run python manage.py check
Security
- Always use HTTPS for `HUB_URL` in production. Payloads contain server metrics and alert details.
- `WEBHOOK_SECRET_CLUSTER` must be identical on agents and hub. It is used to compute an HMAC-SHA256 signature sent via the `X-Cluster-Signature` header.
- Without a shared secret, payloads are accepted unsigned; this is acceptable for local development but not for production.
- The shared secret is never transmitted in the payload; only the signature is sent.
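The agent-side signing described above can be sketched as follows. The HMAC-SHA256 algorithm and the `X-Cluster-Signature` header come from this guide; the hex digest encoding and the payload fields beyond `instance_id`/`hostname` are assumptions:

```python
import hashlib
import hmac
import json

def sign_cluster_payload(payload: dict, secret: str) -> dict:
    """Build headers for a cluster push: HMAC-SHA256 over the JSON body,
    sent via X-Cluster-Signature. Hex encoding is an assumption."""
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return {"Content-Type": "application/json", "X-Cluster-Signature": signature}

payload = {"instance_id": "web-server-01", "hostname": "web1.example.com", "alerts": []}
headers = sign_cluster_payload(payload, "your-shared-secret")
print(sorted(headers))  # ['Content-Type', 'X-Cluster-Signature']
```

Note that only the signature travels over the wire; the hub recomputes it from the body and its own copy of the secret.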
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| push_to_hub exits with "HUB_URL not configured" | HUB_URL missing from .env | Add HUB_URL=https://your-hub.example.com to .env |
| push_to_hub exits with connection refused | Hub not running or wrong URL | Verify hub is accessible: curl -s $HUB_URL/alerts/webhook/cluster/ |
| push_to_hub returns 403 Forbidden | HMAC signature mismatch | Ensure WEBHOOK_SECRET_CLUSTER is identical on agent and hub |
| push_to_hub returns 404 Not Found | Cluster driver not registered | Set CLUSTER_ENABLED=1 in hub .env and restart |
| Alerts arrive on hub but no notifications fire | Pipeline not configured | Run uv run python manage.py setup_instance on the hub |
| push_to_hub --dry-run shows 0 alerts | No checkers returned results | Run uv run python manage.py check_health to verify checkers work |