Deployment

Production deployment guide for Server Monitoring. Choose Docker Compose for quick deploys or bare metal with systemd for full control.

[toc]


Prerequisites

  • Python 3.10+
  • uv
  • Redis (message broker for Celery)
  • Nginx (reverse proxy, optional but recommended)

Environment Variables

Create /etc/server-monitoring/env (systemd) or .env (Docker) with these values:

| Variable | Default | Required | Purpose |
|---|---|---|---|
| DJANGO_SECRET_KEY | (none) | Yes | Cryptographic signing key |
| DJANGO_DEBUG | 1 | Yes (set 0) | Disable debug mode in production |
| DJANGO_ALLOWED_HOSTS | (none) | Yes | Comma-separated hostnames (e.g. monitoring.example.com) |
| CELERY_BROKER_URL | redis://localhost:6379/0 | No | Redis broker URL |
| ENABLE_CELERY_ORCHESTRATION | 0 | No | Enable async pipeline via Celery |
| API_KEY_AUTH_ENABLED | 0 | No | Require API keys for endpoints |
| RATE_LIMIT_ENABLED | 0 | No | Enable rate limiting middleware |
| WEBHOOK_SECRET_<DRIVER> | (none) | No | Per-driver signature verification (e.g. WEBHOOK_SECRET_GRAFANA) |

Minimal production .env:

DJANGO_SECRET_KEY=your-random-secret-key-here
DJANGO_DEBUG=0
DJANGO_ALLOWED_HOSTS=monitoring.example.com
ENABLE_CELERY_ORCHESTRATION=1
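Flags such as DJANGO_DEBUG and ENABLE_CELERY_ORCHESTRATION are plain "0"/"1" strings in the environment. As a rough illustration only (not the project's actual settings code), a Django settings module typically coerces them along these lines:

```python
import os

def env_flag(name: str, default: str = "0") -> bool:
    # Treat "1", "true", "yes" (any case) as enabled; everything else as disabled.
    return os.environ.get(name, default).strip().lower() in {"1", "true", "yes"}

DEBUG = env_flag("DJANGO_DEBUG")                       # off unless explicitly enabled
CELERY_ON = env_flag("ENABLE_CELERY_ORCHESTRATION")
```

This is why the values in the table above are written as 0/1 rather than true/false.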

Generate a secret key:

python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"

Option 1: Docker Compose

The fastest way to get a production stack running. Includes Django (gunicorn), Celery worker, and Redis.

Quick start: Run ./bin/install.sh and select docker mode to automate the steps below (.env setup, build, start, and health verification).

1.1 Clone and configure

git clone git@github.com:ikidnapmyself/server-monitoring.git
cd server-monitoring
cp .env.sample .env

Edit .env with the production values from the table above. The Docker Compose file reads config from .env and automatically overrides CELERY_BROKER_URL to use the internal redis service hostname — you do not need to change that value in .env for Docker deployments.

1.2 Start the stack

docker compose -f deploy/docker/docker-compose.yml up -d

This starts three services:

| Service | What it does |
|---|---|
| redis | Message broker for Celery |
| web | Django app served by gunicorn on port 8000 |
| celery | Celery worker processing pipeline tasks |

1.3 Verify

# Check all services are running
docker compose -f deploy/docker/docker-compose.yml ps

# Check logs
docker compose -f deploy/docker/docker-compose.yml logs web
docker compose -f deploy/docker/docker-compose.yml logs celery

# Test health endpoint
curl http://localhost:8000/alerts/webhook/

1.4 Run migrations manually (if needed)

Migrations run automatically on container start. To run them manually:

docker compose -f deploy/docker/docker-compose.yml exec web python manage.py migrate

1.5 Create an API key

docker compose -f deploy/docker/docker-compose.yml exec web python manage.py shell -c "
from config.models import APIKey
key = APIKey.objects.create(name='my-service')
print(f'API Key: {key._raw_key}')
print('Save this key — it cannot be retrieved again.')
"
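The _raw_key attribute exists only on the freshly created object: the database stores a hash, so the plaintext cannot be recovered later. The underlying pattern looks roughly like the sketch below (illustrative only, not the project's actual model code):

```python
import hashlib
import secrets

class APIKeyStore:
    """Illustrative store: persist only a SHA-256 hash, hand back the raw key once."""

    def __init__(self):
        self._hashes = {}  # name -> hex digest of the key

    def create(self, name: str) -> str:
        raw = secrets.token_urlsafe(32)
        self._hashes[name] = hashlib.sha256(raw.encode()).hexdigest()
        return raw  # shown once; only the hash is kept

    def verify(self, name: str, candidate: str) -> bool:
        digest = hashlib.sha256(candidate.encode()).hexdigest()
        # constant-time comparison to avoid timing side channels
        return secrets.compare_digest(self._hashes.get(name, ""), digest)
```

Losing the raw key therefore means creating a new one; there is no recovery path by design.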

Option 2: Bare Metal / VPS with systemd

For full control on a Linux server.

2.1 Install Redis

# Ubuntu/Debian
sudo apt install redis-server
# On Debian/Ubuntu the service is usually named redis-server
# On RHEL/Fedora/Arch it's redis
sudo systemctl enable --now redis-server

# Verify
redis-cli ping   # Should return PONG

2.2 Clone and install

sudo mkdir -p /opt/server-monitoring
sudo chown www-data:www-data /opt/server-monitoring
sudo -u www-data git clone git@github.com:ikidnapmyself/server-monitoring.git /opt/server-monitoring
cd /opt/server-monitoring

# Install uv and dependencies as www-data
sudo -u www-data sh -c 'curl -LsSf https://astral.sh/uv/install.sh | sh'
sudo -u www-data uv sync --frozen --no-dev --extra prod

2.3 Configure environment

sudo mkdir -p /etc/server-monitoring
sudo tee /etc/server-monitoring/env << 'EOF'
DJANGO_SECRET_KEY=your-random-secret-key-here
DJANGO_DEBUG=0
DJANGO_ALLOWED_HOSTS=monitoring.example.com
CELERY_BROKER_URL=redis://localhost:6379/0
ENABLE_CELERY_ORCHESTRATION=1
EOF
sudo chown root:www-data /etc/server-monitoring/env
sudo chmod 640 /etc/server-monitoring/env

2.4 Run migrations and collect static files

cd /opt/server-monitoring
set -a; source /etc/server-monitoring/env; set +a

uv run python manage.py migrate --noinput
uv run python manage.py collectstatic --noinput

2.5 Install systemd units

sudo cp deploy/systemd/server-monitoring.service /etc/systemd/system/
sudo cp deploy/systemd/server-monitoring-celery.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now server-monitoring server-monitoring-celery

Automated: Run sudo ./bin/install.sh deploy to automate steps 2.4-2.6 (migrations, static files, unit installation, and service startup with health verification). Alternatively, run sudo ./bin/install.sh in prod mode and select the systemd deployment option.

Security note: Running the installer with sudo executes all shell code as root. Review the deploy module (bin/install/deploy.sh) before running and ensure the repository has not been tampered with. Prefer running only install.sh deploy with sudo rather than the full installer to minimize the root-privileged surface.

2.6 Verify

sudo systemctl status server-monitoring
sudo systemctl status server-monitoring-celery

# Test via unix socket
curl --unix-socket /run/server-monitoring/gunicorn.sock http://localhost/alerts/webhook/

Nginx Reverse Proxy

A sample config is provided at deploy/docker/nginx.conf. Two values must be adjusted per deployment:

| Setting | Docker | systemd |
|---|---|---|
| upstream | server web:8000; | server unix:/run/server-monitoring/gunicorn.sock; |
| location /static/ | alias /app/staticfiles/ (shared volume) | alias /opt/server-monitoring/staticfiles/ |

Docker setup

Nginx runs on the host (or as another container) and proxies to the web service. If Nginx runs as a separate container, it needs access to the same staticfiles volume or network.

systemd setup

Change both the upstream and the static files path:

upstream django {
    server unix:/run/server-monitoring/gunicorn.sock;
}

location /static/ {
    alias /opt/server-monitoring/staticfiles/;
}

Install on the host

sudo apt install nginx
sudo cp deploy/docker/nginx.conf /etc/nginx/sites-available/server-monitoring
sudo ln -s /etc/nginx/sites-available/server-monitoring /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

SSL with Let’s Encrypt

sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d monitoring.example.com

Certbot will modify the Nginx config to add SSL. The commented SSL block in deploy/docker/nginx.conf shows the manual configuration if you prefer.


Webhook Ingestion

External monitoring tools (Grafana, AlertManager, PagerDuty, etc.) send alerts via webhook:

POST /alerts/webhook/              # Auto-detect driver from payload
POST /alerts/webhook/<driver>/     # Driver-specific endpoint

Sync vs Async

The behavior depends on ENABLE_CELERY_ORCHESTRATION:

| Setting | Behavior | Response |
|---|---|---|
| 0 (default) | Pipeline runs synchronously in the request | 200 OK with results |
| 1 | Pipeline queued to a Celery worker | 202 Accepted with pipeline ID |

Automatic fallback

When ENABLE_CELERY_ORCHESTRATION=1 but the Redis broker is unreachable, the webhook view automatically falls back to synchronous processing. No alerts are lost.
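Conceptually, the fallback behaves like the sketch below. The helper names (queue_task, run_sync, BrokerUnreachable) are hypothetical stand-ins; the project's real code paths differ:

```python
class BrokerUnreachable(Exception):
    """Stands in for the connection error Celery raises when Redis is down."""

def handle_webhook(payload, queue_task, run_sync, celery_enabled: bool):
    # queue_task: hypothetical "send the pipeline to Celery" callable.
    # run_sync:   hypothetical "run the pipeline in-request" callable.
    if celery_enabled:
        try:
            pipeline_id = queue_task(payload)
            return 202, {"pipeline_id": pipeline_id}   # queued to a worker
        except BrokerUnreachable:
            pass                                       # broker down: fall through
    return 200, {"results": run_sync(payload)}         # synchronous processing
```

Either way the request succeeds; only the status code tells the caller which path was taken.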

Signature verification

Set WEBHOOK_SECRET_<DRIVER> environment variables to enable HMAC signature verification:

WEBHOOK_SECRET_GRAFANA=your-grafana-webhook-secret
WEBHOOK_SECRET_ALERTMANAGER=your-alertmanager-secret

Requests with invalid signatures receive 403 Forbidden.
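Verification follows the standard HMAC pattern (the cluster driver, for instance, uses HMAC-SHA256 over the request body, sent in the X-Cluster-Signature header). A minimal sketch, assuming the signature is a hex digest of the raw body:

```python
import hashlib
import hmac

def sign(body: bytes, secret: str) -> str:
    # HMAC-SHA256 over the raw request body, hex-encoded.
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

def verify(body: bytes, secret: str, received_sig: str) -> bool:
    # compare_digest avoids timing side channels on the comparison.
    return hmac.compare_digest(sign(body, secret), received_sig)
```

The sender computes the same digest with the shared secret, so any change to the body or a mismatched secret fails verification.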


Monitoring the Deployment

System preflight

uv run python manage.py preflight          # All system checks, grouped
uv run python manage.py preflight --json   # JSON output for CI

Health checks

uv run python manage.py check_health       # CPU, memory, disk, network, process
uv run python manage.py check_health --list

Pipeline history

uv run python manage.py monitor_pipeline --limit 10

Celery worker health

celery -A config inspect ping              # Check if workers are responding
celery -A config inspect active            # Show active tasks

For Docker:

docker compose -f deploy/docker/docker-compose.yml exec celery celery -A config inspect ping

Multi-Instance (Cluster)

Deploy multiple instances across servers: agents monitor locally and push alerts to a hub that runs the full pipeline (intelligence + notifications).

Architecture

Agent (server-1)  ──POST──┐
Agent (server-2)  ──POST──┤──▶  Hub  ──▶  intelligence ──▶ notify
Agent (server-3)  ──POST──┘     (receives cluster alerts)

All instances run the same codebase. Role is determined by environment variables.

Agent setup

On each server you want to monitor:

  1. Install the project (run ./bin/install.sh and select “agent” when prompted for the cluster role)
  2. Add to .env:
HUB_URL=https://monitoring-hub.example.com
WEBHOOK_SECRET_CLUSTER=your-shared-secret
INSTANCE_ID=web-server-01
  3. Schedule the push command via cron:
# Every 5 minutes
*/5 * * * * cd /opt/server-monitoring && uv run python manage.py push_to_hub --json >> push.log 2>&1

Or run manually:

uv run python manage.py push_to_hub              # Push all checker results
uv run python manage.py push_to_hub --dry-run    # Preview without sending
uv run python manage.py push_to_hub --checkers cpu,memory  # Specific checkers

Tip: The installer and bin/install.sh cron can configure all of the above interactively. Manual .env editing is only needed if you skipped the prompts.

Hub setup

On the central monitoring server:

  1. Install the project (run ./bin/install.sh and select “hub” when prompted for the cluster role)
  2. Add to .env:
CLUSTER_ENABLED=1
WEBHOOK_SECRET_CLUSTER=your-shared-secret

The hub accepts cluster payloads at POST /alerts/webhook/cluster/ and processes them through the full pipeline. Each alert carries instance_id and hostname labels for per-server filtering.
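A cluster payload carrying those labels might look like the following (a hypothetical schema for illustration; the actual field names are defined by the cluster driver):

```python
import json
import socket

def build_cluster_payload(instance_id: str, alerts: list) -> bytes:
    # Attach instance_id and hostname so the hub can filter alerts per server.
    doc = {
        "instance_id": instance_id,
        "hostname": socket.gethostname(),
        "alerts": alerts,
    }
    return json.dumps(doc).encode()
```

The resulting bytes are what would be signed with WEBHOOK_SECRET_CLUSTER and POSTed to /alerts/webhook/cluster/.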

Standalone (default)

Existing installs with neither HUB_URL nor CLUSTER_ENABLED set continue to work as standalone instances with no changes.

Verification

After setting up an agent or hub, verify the configuration:

Agent verification:

# Dry-run: builds payload, shows what would be sent (no network call)
uv run python manage.py push_to_hub --dry-run

# Single push: sends one payload to the hub and reports the result
uv run python manage.py push_to_hub

# Push specific checkers only
uv run python manage.py push_to_hub --checkers cpu,memory --dry-run

Hub verification:

# Confirm the cluster driver is registered
uv run python manage.py shell -c "from apps.alerts.drivers import DRIVER_REGISTRY; print('cluster' in DRIVER_REGISTRY)"
# Expected output: True

# Check Django system checks pass
uv run python manage.py check

Security

  • Always use HTTPS for HUB_URL in production. Payloads contain server metrics and alert details.
  • WEBHOOK_SECRET_CLUSTER must be identical on agents and hub. It is used to compute an HMAC-SHA256 signature sent via the X-Cluster-Signature header.
  • Without a shared secret, payloads are accepted unsigned — acceptable for local development but not for production.
  • The shared secret is never transmitted in the payload; only the signature is sent.

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| push_to_hub exits with “HUB_URL not configured” | HUB_URL missing from .env | Add HUB_URL=https://your-hub.example.com to .env |
| push_to_hub exits with connection refused | Hub not running or wrong URL | Verify the hub is accessible: curl -s $HUB_URL/alerts/webhook/cluster/ |
| push_to_hub returns 403 Forbidden | HMAC signature mismatch | Ensure WEBHOOK_SECRET_CLUSTER is identical on agent and hub |
| push_to_hub returns 404 Not Found | Cluster driver not registered | Set CLUSTER_ENABLED=1 in the hub .env and restart |
| Alerts arrive on hub but no notifications fire | Pipeline not configured | Run uv run python manage.py setup_instance on the hub |
| push_to_hub --dry-run shows 0 alerts | No checkers returned results | Run uv run python manage.py check_health to verify checkers work |
