Deployment

Production deployment guide for Server Monitoring. Choose Docker Compose for quick deploys or bare metal with systemd for full control.

[toc]


Prerequisites

  • Python 3.10+
  • uv
  • Redis (message broker for Celery)
  • Nginx (reverse proxy, optional but recommended)

Environment Variables

Create /etc/server-monitoring/env (systemd) or .env (Docker) with these values:

| Variable | Default | Required | Purpose |
|---|---|---|---|
| DJANGO_SECRET_KEY | (none) | Yes | Cryptographic signing key |
| DJANGO_DEBUG | 1 | Yes (set 0) | Disable debug mode in production |
| DJANGO_ALLOWED_HOSTS | (none) | Yes | Comma-separated hostnames (e.g. monitoring.example.com) |
| CELERY_BROKER_URL | redis://localhost:6379/0 | No | Redis broker URL |
| ENABLE_CELERY_ORCHESTRATION | 0 | No | Enable async pipeline via Celery |
| API_KEY_AUTH_ENABLED | 0 | No | Require API keys for endpoints |
| RATE_LIMIT_ENABLED | 0 | No | Enable rate limiting middleware |
| WEBHOOK_SECRET_<DRIVER> | (none) | No | Per-driver signature verification (e.g. WEBHOOK_SECRET_GRAFANA) |

Minimal production .env:

DJANGO_SECRET_KEY=your-random-secret-key-here
DJANGO_DEBUG=0
DJANGO_ALLOWED_HOSTS=monitoring.example.com
ENABLE_CELERY_ORCHESTRATION=1
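Flags such as DJANGO_DEBUG and ENABLE_CELERY_ORCHESTRATION are plain "0"/"1" strings in the environment. As a rough illustration only (not the project's actual settings code), a Django settings module typically coerces them along these lines:

```python
import os

def env_flag(name: str, default: str = "0") -> bool:
    # Treat "1", "true", "yes" (any case) as enabled; everything else as disabled.
    return os.environ.get(name, default).strip().lower() in {"1", "true", "yes"}

DEBUG = env_flag("DJANGO_DEBUG")                       # off unless explicitly enabled
CELERY_ON = env_flag("ENABLE_CELERY_ORCHESTRATION")
```

This is why the values in the table above are written as 0/1 rather than true/false.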

Generate a secret key:

python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"

Option 1: Docker Compose

The fastest way to get a production stack running. Includes Django (gunicorn), Celery worker, and Redis.

Quick start: Run ./bin/install.sh and select docker mode to automate the steps below (.env setup, build, start, and health verification).

1.1 Clone and configure

git clone git@github.com:ikidnapmyself/server-monitoring.git
cd server-monitoring
cp .env.sample .env

Edit .env with the production values from the table above. The Docker Compose file reads config from .env and automatically overrides CELERY_BROKER_URL to use the internal redis service hostname — you do not need to change that value in .env for Docker deployments.

1.2 Start the stack

docker compose -f deploy/docker/docker-compose.yml up -d

This starts three services:

| Service | What it does |
|---|---|
| redis | Message broker for Celery |
| web | Django app served by gunicorn on port 8000 |
| celery | Celery worker processing pipeline tasks |

1.3 Verify

# Check all services are running
docker compose -f deploy/docker/docker-compose.yml ps

# Check logs
docker compose -f deploy/docker/docker-compose.yml logs web
docker compose -f deploy/docker/docker-compose.yml logs celery

# Test health endpoint
curl http://localhost:8000/alerts/webhook/

1.4 Run migrations manually (if needed)

Migrations run automatically on container start. To run them manually:

docker compose -f deploy/docker/docker-compose.yml exec web python manage.py migrate

1.5 Create an API key

docker compose -f deploy/docker/docker-compose.yml exec web python manage.py shell -c "
from config.models import APIKey
key = APIKey.objects.create(name='my-service')
print(f'API Key: {key._raw_key}')
print('Save this key — it cannot be retrieved again.')
"
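The _raw_key attribute exists only on the freshly created object: the database stores a hash, so the plaintext cannot be recovered later. The underlying pattern looks roughly like the sketch below (illustrative only, not the project's actual model code):

```python
import hashlib
import secrets

class APIKeyStore:
    """Illustrative store: persist only a SHA-256 hash, hand back the raw key once."""

    def __init__(self):
        self._hashes = {}  # name -> hex digest of the key

    def create(self, name: str) -> str:
        raw = secrets.token_urlsafe(32)
        self._hashes[name] = hashlib.sha256(raw.encode()).hexdigest()
        return raw  # shown once; only the hash is kept

    def verify(self, name: str, candidate: str) -> bool:
        digest = hashlib.sha256(candidate.encode()).hexdigest()
        # constant-time comparison to avoid timing side channels
        return secrets.compare_digest(self._hashes.get(name, ""), digest)
```

Losing the raw key therefore means creating a new one; there is no recovery path by design.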

Option 2: Bare Metal / VPS with systemd

For full control on a Linux server.

2.1 Install Redis

# Ubuntu/Debian
sudo apt install redis-server
# On Debian/Ubuntu the service is usually named redis-server
# On RHEL/Fedora/Arch it's redis
sudo systemctl enable --now redis-server

# Verify
redis-cli ping   # Should return PONG

2.2 Clone and install

sudo mkdir -p /opt/server-monitoring
sudo chown www-data:www-data /opt/server-monitoring
sudo -u www-data git clone git@github.com:ikidnapmyself/server-monitoring.git /opt/server-monitoring
cd /opt/server-monitoring

# Install uv and dependencies as www-data
sudo -u www-data sh -c 'curl -LsSf https://astral.sh/uv/install.sh | sh'
sudo -u www-data uv sync --frozen --no-dev --extra prod

2.3 Configure environment

sudo mkdir -p /etc/server-monitoring
sudo tee /etc/server-monitoring/env << 'EOF'
DJANGO_SECRET_KEY=your-random-secret-key-here
DJANGO_DEBUG=0
DJANGO_ALLOWED_HOSTS=monitoring.example.com
CELERY_BROKER_URL=redis://localhost:6379/0
ENABLE_CELERY_ORCHESTRATION=1
EOF
sudo chown root:www-data /etc/server-monitoring/env
sudo chmod 640 /etc/server-monitoring/env

2.4 Run migrations and collect static files

cd /opt/server-monitoring
set -a; source /etc/server-monitoring/env; set +a

uv run python manage.py migrate --noinput
uv run python manage.py collectstatic --noinput

2.5 Install systemd units

sudo cp deploy/systemd/server-monitoring.service /etc/systemd/system/
sudo cp deploy/systemd/server-monitoring-celery.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now server-monitoring server-monitoring-celery

Automated: Run sudo ./bin/install.sh deploy to automate steps 2.4-2.6 (migrations, static files, unit installation, and service startup with health verification). Alternatively, run sudo ./bin/install.sh in prod mode and select the systemd deployment option.

Security note: Running the installer with sudo executes all shell code as root. Review the deploy module (bin/install/deploy.sh) before running and ensure the repository has not been tampered with. Prefer running only install.sh deploy with sudo rather than the full installer to minimize the root-privileged surface.

2.6 Verify

sudo systemctl status server-monitoring
sudo systemctl status server-monitoring-celery

# Test via unix socket
curl --unix-socket /run/server-monitoring/gunicorn.sock http://localhost/alerts/webhook/

Nginx Reverse Proxy

A sample config is provided at deploy/docker/nginx.conf. Two values must be adjusted per deployment:

| Setting | Docker | systemd |
|---|---|---|
| upstream | server web:8000; | server unix:/run/server-monitoring/gunicorn.sock; |
| location /static/ | alias /app/staticfiles/ (shared volume) | alias /opt/server-monitoring/staticfiles/ |

Docker setup

Nginx runs on the host (or as another container) and proxies to the web service. If Nginx runs as a separate container, it needs access to the same staticfiles volume or network.

systemd setup

Change both the upstream and the static files path:

upstream django {
    server unix:/run/server-monitoring/gunicorn.sock;
}

location /static/ {
    alias /opt/server-monitoring/staticfiles/;
}

Install on the host

sudo apt install nginx
sudo cp deploy/docker/nginx.conf /etc/nginx/sites-available/server-monitoring
sudo ln -s /etc/nginx/sites-available/server-monitoring /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

SSL with Let’s Encrypt

sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d monitoring.example.com

Certbot will modify the Nginx config to add SSL. The commented SSL block in deploy/docker/nginx.conf shows the manual configuration if you prefer.


Webhook Ingestion

External monitoring tools (Grafana, AlertManager, PagerDuty, etc.) send alerts via webhook:

POST /alerts/webhook/              # Auto-detect driver from payload
POST /alerts/webhook/<driver>/     # Driver-specific endpoint

Sync vs Async

The behavior depends on ENABLE_CELERY_ORCHESTRATION:

| Setting | Behavior | Response |
|---|---|---|
| 0 (default) | Pipeline runs synchronously in the request | 200 OK with results |
| 1 | Pipeline queued to a Celery worker | 202 Accepted with pipeline ID |

Automatic fallback

When ENABLE_CELERY_ORCHESTRATION=1 but the Redis broker is unreachable, the webhook view automatically falls back to synchronous processing. No alerts are lost.
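Conceptually, the fallback behaves like the sketch below. The helper names (queue_task, run_sync, BrokerUnreachable) are hypothetical stand-ins; the project's real code paths differ:

```python
class BrokerUnreachable(Exception):
    """Stands in for the connection error Celery raises when Redis is down."""

def handle_webhook(payload, queue_task, run_sync, celery_enabled: bool):
    # queue_task: hypothetical "send the pipeline to Celery" callable.
    # run_sync:   hypothetical "run the pipeline in-request" callable.
    if celery_enabled:
        try:
            pipeline_id = queue_task(payload)
            return 202, {"pipeline_id": pipeline_id}   # queued to a worker
        except BrokerUnreachable:
            pass                                       # broker down: fall through
    return 200, {"results": run_sync(payload)}         # synchronous processing
```

Either way the request succeeds; only the status code tells the caller which path was taken.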

Signature verification

Set WEBHOOK_SECRET_<DRIVER> environment variables to enable HMAC signature verification:

WEBHOOK_SECRET_GRAFANA=your-grafana-webhook-secret
WEBHOOK_SECRET_ALERTMANAGER=your-alertmanager-secret

Requests with invalid signatures receive 403 Forbidden.
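Verification follows the standard HMAC pattern (the cluster driver, for instance, uses HMAC-SHA256 over the request body, sent in the X-Cluster-Signature header). A minimal sketch, assuming the signature is a hex digest of the raw body:

```python
import hashlib
import hmac

def sign(body: bytes, secret: str) -> str:
    # HMAC-SHA256 over the raw request body, hex-encoded.
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

def verify(body: bytes, secret: str, received_sig: str) -> bool:
    # compare_digest avoids timing side channels on the comparison.
    return hmac.compare_digest(sign(body, secret), received_sig)
```

The sender computes the same digest with the shared secret, so any change to the body or a mismatched secret fails verification.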


Monitoring the Deployment

System preflight

uv run python manage.py preflight          # All system checks, grouped
uv run python manage.py preflight --json   # JSON output for CI

Health checks

uv run python manage.py check_health       # CPU, memory, disk, network, process
uv run python manage.py check_health --list

Pipeline history

uv run python manage.py monitor_pipeline --limit 10

Celery worker health

celery -A config inspect ping              # Check if workers are responding
celery -A config inspect active            # Show active tasks

For Docker:

docker compose -f deploy/docker/docker-compose.yml exec celery celery -A config inspect ping

Multi-Instance (Cluster)

Deploy multiple instances across servers: agents monitor locally and push alerts to a hub that runs the full pipeline (intelligence + notifications).

Architecture

Agent (server-1)  ──POST──┐
Agent (server-2)  ──POST──┤──▶  Hub  ──▶  intelligence ──▶ notify
Agent (server-3)  ──POST──┘     (receives cluster alerts)

All instances run the same codebase. Role is determined by environment variables.

Agent setup

On each server you want to monitor:

  1. Install the project (run ./bin/install.sh and select “agent” when prompted for the cluster role)
  2. Add to .env:
HUB_URL=https://monitoring-hub.example.com
WEBHOOK_SECRET_CLUSTER=your-shared-secret
INSTANCE_ID=web-server-01
  3. Schedule the push command via cron:
# Every 5 minutes
*/5 * * * * cd /opt/server-monitoring && uv run python manage.py push_to_hub --json >> push.log 2>&1

Or run manually:

uv run python manage.py push_to_hub              # Push all checker results
uv run python manage.py push_to_hub --dry-run    # Preview without sending
uv run python manage.py push_to_hub --checkers cpu,memory  # Specific checkers

Tip: The installer and bin/install.sh cron can configure all of the above interactively. Manual .env editing is only needed if you skipped the prompts.

Hub setup

On the central monitoring server:

  1. Install the project (run ./bin/install.sh and select “hub” when prompted for the cluster role)
  2. Add to .env:
CLUSTER_ENABLED=1
WEBHOOK_SECRET_CLUSTER=your-shared-secret

The hub accepts cluster payloads at POST /alerts/webhook/cluster/ and processes them through the full pipeline. Each alert carries instance_id and hostname labels for per-server filtering.
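A cluster payload carrying those labels might look like the following (a hypothetical schema for illustration; the actual field names are defined by the cluster driver):

```python
import json
import socket

def build_cluster_payload(instance_id: str, alerts: list) -> bytes:
    # Attach instance_id and hostname so the hub can filter alerts per server.
    doc = {
        "instance_id": instance_id,
        "hostname": socket.gethostname(),
        "alerts": alerts,
    }
    return json.dumps(doc).encode()
```

The resulting bytes are what would be signed with WEBHOOK_SECRET_CLUSTER and POSTed to /alerts/webhook/cluster/.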

Standalone (default)

Existing installs with neither HUB_URL nor CLUSTER_ENABLED set continue to work as standalone instances with no changes.

Verification

After setting up an agent or hub, verify the configuration:

Agent verification:

# Dry-run: builds payload, shows what would be sent (no network call)
uv run python manage.py push_to_hub --dry-run

# Single push: sends one payload to the hub and reports the result
uv run python manage.py push_to_hub

# Push specific checkers only
uv run python manage.py push_to_hub --checkers cpu,memory --dry-run

Hub verification:

# Confirm the cluster driver is registered
uv run python manage.py shell -c "from apps.alerts.drivers import DRIVER_REGISTRY; print('cluster' in DRIVER_REGISTRY)"
# Expected output: True

# Check Django system checks pass
uv run python manage.py check

Security

  • Always use HTTPS for HUB_URL in production. Payloads contain server metrics and alert details.
  • WEBHOOK_SECRET_CLUSTER must be identical on agents and hub. It is used to compute an HMAC-SHA256 signature sent via the X-Cluster-Signature header.
  • Without a shared secret, payloads are accepted unsigned — acceptable for local development but not for production.
  • The shared secret is never transmitted in the payload; only the signature is sent.

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| push_to_hub exits with “HUB_URL not configured” | HUB_URL missing from .env | Add HUB_URL=https://your-hub.example.com to .env |
| push_to_hub exits with connection refused | Hub not running or wrong URL | Verify the hub is accessible: curl -s $HUB_URL/alerts/webhook/cluster/ |
| push_to_hub returns 403 Forbidden | HMAC signature mismatch | Ensure WEBHOOK_SECRET_CLUSTER is identical on agent and hub |
| push_to_hub returns 404 Not Found | Cluster driver not registered | Set CLUSTER_ENABLED=1 in the hub .env and restart |
| Alerts arrive on hub but no notifications fire | Pipeline not configured | Run uv run python manage.py setup_instance on the hub |
| push_to_hub --dry-run shows 0 alerts | No checkers returned results | Run uv run python manage.py check_health to verify checkers work |
