Security
This document describes the security posture, configuration, and guidelines for the server monitoring system.
Secret Management
Django Secret Key
The DJANGO_SECRET_KEY environment variable is required. The application raises a RuntimeError at startup if it is not set (config/settings.py:35).
# Generate a production-grade key
python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"
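The fail-fast behaviour described above can be pictured with a minimal sketch. The helper name `require_secret_key` and the error message are hypothetical; the actual check at config/settings.py:35 is not reproduced here.

```python
import os


def require_secret_key(environ=os.environ):
    """Return DJANGO_SECRET_KEY, or fail fast if it is missing or empty."""
    # Hypothetical helper: name and message are illustrative only.
    key = environ.get("DJANGO_SECRET_KEY", "")
    if not key:
        raise RuntimeError("DJANGO_SECRET_KEY environment variable is required")
    return key
```

Failing at startup rather than falling back to a default key avoids silently signing sessions with a predictable value.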
Environment Variables
Secrets are loaded from environment variables via python-dotenv (config/env.py). Key rules:
- Never commit `.env` files; only `.env.sample` is tracked
- Existing shell environment variables always take precedence (`override=False`)
- `.env.dev` is loaded only when `DJANGO_ENV=dev`
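The precedence rule can be illustrated without python-dotenv itself. The sketch below is a standard-library stand-in, not the code in config/env.py: `load_env_file` is a hypothetical helper that mimics what `override=False` means by only setting variables that are not already present.

```python
import os


def load_env_file(lines, environ=os.environ):
    """Apply KEY=VALUE lines without overriding variables that are already set."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        # setdefault mirrors override=False: an existing shell value wins.
        environ.setdefault(key.strip(), value.strip())
    return environ
```

With this behaviour, a value exported in the shell (for example by a deployment platform) cannot be silently replaced by a stale entry in a local file.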
Security-sensitive variables:
| Variable | Purpose | Required |
|---|---|---|
| `DJANGO_SECRET_KEY` | Django cryptographic signing | Yes (enforced) |
| `DJANGO_DB_PASSWORD` | Database credentials | When using MySQL/PostgreSQL |
| `CELERY_BROKER_URL` | Redis connection (may contain password) | When using Celery |
DB-Stored Secrets
API keys and credentials for notification channels and intelligence providers are stored in database JSON fields:
- `NotificationChannel.config`: webhook URLs, SMTP passwords, API keys
- `IntelligenceProvider.config`: AI provider API keys
In production deployments:
- Use secret references rather than raw values where possible
- Restrict database access to the Django application user
- Consider encrypting sensitive fields at the application layer for high-security environments
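One way to picture "secret references" is a config value that names an environment variable instead of holding the secret. The `env:` prefix and the `resolve_secret_refs` helper below are assumptions made for illustration; the project may use a different reference scheme.

```python
import os

ENV_REF_PREFIX = "env:"  # hypothetical reference scheme, not defined by the project


def resolve_secret_refs(config, environ=os.environ):
    """Return a copy of config with "env:NAME" string values read from environ."""
    resolved = {}
    for field, value in config.items():
        if isinstance(value, str) and value.startswith(ENV_REF_PREFIX):
            name = value[len(ENV_REF_PREFIX):]
            if name not in environ:
                raise KeyError(f"secret reference {value!r} is not set")
            resolved[field] = environ[name]
        else:
            resolved[field] = value
    return resolved
```

Under this scheme the database row would store `"env:SMTP_PASSWORD"` rather than the password, so a database dump alone does not expose the credential.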
Django Security Configuration
Middleware Stack
The following security middleware is enabled (config/settings.py:64-72):
| Middleware | Protection |
|---|---|
| `SecurityMiddleware` | HTTPS redirects, HSTS, content type sniffing |
| `CsrfViewMiddleware` | Cross-site request forgery protection |
| `AuthenticationMiddleware` | Session-based authentication |
| `XFrameOptionsMiddleware` | Clickjacking protection |
Password Validation
All four Django password validators are enabled:
- `UserAttributeSimilarityValidator`
- `MinimumLengthValidator`
- `CommonPasswordValidator`
- `NumericPasswordValidator`
HTTPS Hardening (Production)
The following settings should be enabled in production environments via environment variables or a production settings module:
SECURE_SSL_REDIRECT = True
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True
SECURE_HSTS_SECONDS = 31536000 # 1 year
SECURE_HSTS_INCLUDE_SUBDOMAINS = True
SECURE_HSTS_PRELOAD = True
SECURE_CONTENT_TYPE_NOSNIFF = True
Debug Mode
DJANGO_DEBUG defaults to 1 (enabled) for local development. Always set DJANGO_DEBUG=0 in production.
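A minimal sketch of driving these settings from the environment follows. Tying the HTTPS settings to `DJANGO_DEBUG=0` is an assumption made here for illustration, and `env_flag` and `hardening_settings` are hypothetical helpers rather than names from config/settings.py.

```python
import os


def env_flag(name, default, environ=os.environ):
    """Interpret "1"/"0"-style environment values as booleans."""
    return environ.get(name, default) == "1"


def hardening_settings(environ=os.environ):
    """Return DEBUG plus the HTTPS settings listed above when debug is off."""
    debug = env_flag("DJANGO_DEBUG", "1", environ)  # defaults to enabled
    settings = {"DEBUG": debug}
    if not debug:
        settings.update(
            SECURE_SSL_REDIRECT=True,
            SESSION_COOKIE_SECURE=True,
            CSRF_COOKIE_SECURE=True,
            SECURE_HSTS_SECONDS=31536000,  # 1 year
            SECURE_HSTS_INCLUDE_SUBDOMAINS=True,
            SECURE_HSTS_PRELOAD=True,
            SECURE_CONTENT_TYPE_NOSNIFF=True,
        )
    return settings
```

In a settings module the returned dictionary could be merged with `globals().update(hardening_settings())`, although a dedicated production settings module is an equally valid choice.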
Webhook Security
CSRF Exemption
The webhook endpoint (POST /alerts/webhook/) is CSRF-exempted because external alerting systems (Grafana, Alertmanager, PagerDuty, etc.) cannot include Django CSRF tokens. This is standard practice for webhook receivers.
Payload Validation
Each alert driver validates incoming payloads structurally via validate() and parse() methods. Drivers check for required keys and expected payload shapes.
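As an illustration of structural validation, the sketch below shows a driver with `validate()` and `parse()` methods. The class name, the required keys, and the normalized output shape are all assumptions; real drivers define their own.

```python
class ExampleAlertDriver:
    """Hypothetical driver; real drivers define their own required keys."""

    required_keys = ("status", "alerts")  # assumed payload shape for illustration

    def validate(self, payload):
        """Return True only if the payload is a dict with the expected shape."""
        if not isinstance(payload, dict):
            return False
        if any(key not in payload for key in self.required_keys):
            return False
        return isinstance(payload["alerts"], list)

    def parse(self, payload):
        """Extract normalized alerts; reject payloads that fail validate()."""
        if not self.validate(payload):
            raise ValueError("payload does not match the expected shape")
        return [
            {"name": alert.get("name", "unknown"), "status": payload["status"]}
            for alert in payload["alerts"]
            if isinstance(alert, dict)
        ]
```

Structural checks like these reject malformed input early, but they do not prove who sent the payload; that is the role of signature verification.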
API Key Authentication
When API key authentication is enabled, API endpoints require a valid key, except for the exempt paths listed below. Keys are managed via the Django admin.
Setup
- Enable: set `API_KEY_AUTH_ENABLED=1` in your environment
- Create a key via admin (`/admin/config_app/apikey/`) or shell:
from config.models import APIKey
key = APIKey.objects.create(name="my-client")
print(key.key) # 40-char hex token
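How the `APIKey` model generates its token is not shown here. One common way to produce a 40-character hex token, assuming the standard-library `secrets` module is used, is:

```python
import secrets


def generate_api_key():
    """Return a random 40-character hexadecimal token (20 random bytes)."""
    return secrets.token_hex(20)
```

Each byte becomes two hex characters, so 20 random bytes give the 40-character length mentioned above.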
Usage
Include the key in every request via one of:
Authorization: Bearer <key>
X-API-Key: <key>
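On the receiving side, reading the key from either header might be sketched as follows. The function name `extract_api_key` and the choice to check `Authorization` before `X-API-Key` are assumptions, not the project's documented behaviour.

```python
def extract_api_key(headers):
    """Return the API key from Authorization: Bearer or X-API-Key, else None."""
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    key = normalized.get("x-api-key", "").strip()
    return key or None
```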
Endpoint Restrictions
Keys can optionally restrict access to specific path prefixes via the allowed_endpoints JSON field (e.g., ["/alerts/"]). An empty list grants access to all API endpoints.
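The prefix rule can be expressed in a few lines. This is a sketch with a hypothetical function name, assuming a plain string-prefix match on the request path.

```python
def is_path_allowed(path, allowed_endpoints):
    """Return True if a key with this allowed_endpoints list may call path."""
    if not allowed_endpoints:  # empty list: no restriction
        return True
    return any(path.startswith(prefix) for prefix in allowed_endpoints)
```

For example, a key restricted to `["/alerts/"]` could call `/alerts/webhook/` but not `/notify/send/` (the latter path is made up for illustration).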
Exempt Paths
The following paths do not require an API key:
- `/alerts/webhook/`: `GET` health check for the webhook endpoint
- `/intelligence/health/`: `GET` service health status
- `/admin/*`: Django session auth (not API key auth)
- `/static/*`
All other GET and POST requests on API paths (/alerts/, /orchestration/, /notify/, /intelligence/) require a valid key. In particular, data-returning endpoints such as /orchestration/pipelines/, /intelligence/providers/, and /intelligence/recommendations/ are not exempt.
Webhook Signature Verification
Drivers support opt-in HMAC signature verification. When a secret is configured for a driver, incoming webhooks must include a valid signature.
Configuration
Set an environment variable per driver:
| Variable | Driver |
|---|---|
| `WEBHOOK_SECRET_GRAFANA` | Grafana |
| `WEBHOOK_SECRET_PAGERDUTY` | PagerDuty |
| `WEBHOOK_SECRET_NEWRELIC` | New Relic |
| `WEBHOOK_SECRET_GENERIC` | Generic webhook |
Drivers without native signature support (Alertmanager, Datadog, OpsGenie, Zabbix) do not perform verification.
How It Works
- The driver declares its signature header (e.g., `X-Grafana-Signature`)
- On incoming POST, if the env var is set, the middleware computes `HMAC-SHA256(secret, request.body)` and compares with the header value
- Missing or invalid signature → `403 Forbidden`
- No env var configured → verification skipped (opt-in)
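The steps above can be sketched with the standard-library `hmac` module. The function name is hypothetical, and the sketch assumes the header carries a bare hex digest; some providers add a prefix or use a different encoding, which a real driver would need to handle.

```python
import hashlib
import hmac


def verify_signature(secret, body, header_value):
    """Return an HTTP status: 200 if accepted, 403 if missing or invalid."""
    if not secret:  # no env var configured: verification is skipped (opt-in)
        return 200
    if not header_value:
        return 403
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if hmac.compare_digest(expected.encode(), header_value.encode()):
        return 200
    return 403
```

The digest is computed over the raw request body, so verification has to happen before the body is parsed or re-serialized.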
Rate Limiting
Rate limiting is enforced at the application level using the Django cache, with fixed-window counters (one bucket per UTC minute per identity and path prefix).
Configuration
Enable: RATE_LIMIT_ENABLED=1
Default limits (configurable via RATE_LIMITS in settings):
| Path prefix | Limit |
|---|---|
| `/alerts/` | 120 req/min |
| `/orchestration/` | 30 req/min |
| `/notify/` | 30 req/min |
| `/intelligence/` | 20 req/min |
Identity
Limits are tracked per API key name (if authenticated) or per client IP (via X-Forwarded-For or REMOTE_ADDR).
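A fixed-window counter keyed by identity, prefix, and UTC minute can be sketched as below. A plain dictionary stands in for the Django cache, and the key format and function names are assumptions; only the limits come from the table above.

```python
from datetime import datetime, timezone

RATE_LIMITS = {"/alerts/": 120, "/orchestration/": 30, "/notify/": 30, "/intelligence/": 20}


def bucket_key(identity, prefix, now):
    """Build one counter key per identity, path prefix, and UTC minute."""
    minute = now.astimezone(timezone.utc).strftime("%Y%m%d%H%M")
    return f"ratelimit:{identity}:{prefix}:{minute}"


def allow_request(cache, identity, path, now=None):
    """Count the request in its window and report whether it is within the limit."""
    now = now or datetime.now(timezone.utc)
    for prefix, limit in RATE_LIMITS.items():
        if path.startswith(prefix):
            key = bucket_key(identity, prefix, now)
            # A shared cache would use an atomic increment here instead.
            cache[key] = cache.get(key, 0) + 1
            return cache[key] <= limit
    return True  # paths without a configured prefix are not limited
```

Fixed windows are simple and cheap, at the cost of allowing a short burst of up to twice the limit across a minute boundary.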
Cache Backend
Rate limiting requires a shared cache backend (Redis or Memcached) in multi-process deployments. A Django system check (config.W001) warns if rate limiting is enabled with LocMemCache.
Current Limitations
The following are known areas for improvement:
- DB-stored secrets not encrypted — API keys and provider credentials in JSON fields are not encrypted at rest. Consider field-level encryption for high-security environments.
Data Handling
Redacted References
The pipeline stores references to data rather than raw payloads to avoid leaking secrets:
- `normalized_payload_ref`: Reference to normalized inbound payload (no raw secrets)
- `checker_output_ref`: Reference to checker output
- `intelligence_output_ref`: Reference to AI analysis (prompt/response refs, redacted)
- `notify_output_ref`: Reference to notification delivery results
Logging
- Never log raw webhook payloads that may contain credentials
- Never log API keys, tokens, or webhook URLs
- Use structured logging with `trace_id`/`run_id` for correlation without exposing sensitive data
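These rules can be illustrated with a small helper that emits a JSON log line carrying the correlation IDs and masks sensitive fields. The helper name and the set of sensitive field names are assumptions for this sketch.

```python
import json

# Assumed field names; extend to match the fields actually logged.
SENSITIVE_KEYS = {"api_key", "token", "password", "webhook_url", "authorization"}


def safe_log_fields(trace_id, run_id, **fields):
    """Build a JSON log line with correlation IDs and sensitive values masked."""
    record = {"trace_id": trace_id, "run_id": run_id}
    for name, value in fields.items():
        record[name] = "[REDACTED]" if name.lower() in SENSITIVE_KEYS else value
    return json.dumps(record, sort_keys=True)
```

A call such as `safe_log_fields("t-1", "r-1", stage="notify", api_key="abc")` keeps `stage` readable while the key value never reaches the log output.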
Celery Security
- JSON-only serialization: `CELERY_ACCEPT_CONTENT`, `CELERY_TASK_SERIALIZER`, and `CELERY_RESULT_SERIALIZER` are all set to `json`, preventing pickle deserialization attacks
- Broker URL may contain credentials: treat `CELERY_BROKER_URL` as a secret
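In settings form, the JSON-only configuration would typically look like the fragment below. The `mask_broker_url` helper is a hypothetical addition showing one way to keep the broker password out of logs and error messages.

```python
from urllib.parse import urlsplit, urlunsplit

# Settings named in the text above; "json" is the only accepted content type.
CELERY_ACCEPT_CONTENT = ["json"]
CELERY_TASK_SERIALIZER = "json"
CELERY_RESULT_SERIALIZER = "json"


def mask_broker_url(url):
    """Return the URL with any password replaced by "***" (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    host = parts.hostname or ""
    if parts.port:
        host += f":{parts.port}"
    netloc = f"{parts.username or ''}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

The helper is only meant for display purposes; the unmasked value should still be read from the environment when connecting.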
CI Security Checks
Automated Checks (.github/workflows/security.yml)
The security workflow runs automatically on:
- Every push to `main`
- Pull requests that change Python files, `pyproject.toml`, `uv.lock`, Docker config, or the workflow itself
Code Security job:
| Check | Tool | What it does |
|---|---|---|
| Dependency audit | pip-audit | Scans installed packages for known CVEs |
| Security lint | bandit | Static analysis for common Python security issues |
| Secret detection | detect-secrets | Scans for accidentally committed credentials |
Docker Security job:
| Check | Tool | What it does |
|---|---|---|
| Image vulnerability scan | trivy | Scans the Docker image for OS and library CVEs (blocks on CRITICAL) |
| HIGH vulnerability report | trivy | Reports HIGH-severity vulnerabilities (non-blocking) |
Addressing Security Alerts
When a vulnerability is reported (by pip-audit, GitHub Dependabot, or manual audit):
- Identify the package and fix version: check the CVE details for the patched version
- Bump the dependency:
  - Direct dependency: update version in `pyproject.toml`, then `uv lock`
  - Transitive dependency: `uv lock --upgrade-package <package>`
- Verify the fix: `uv sync --extra dev && uv run pip-audit --strict --desc`
- Create a PR: the security workflow triggers automatically for dependency changes
- Merge promptly: security fixes should not wait in review queues
For Docker image vulnerabilities (trivy), rebuild with an updated base image or pin a patched version of the affected OS package.
CI Pipeline (.github/workflows/ci.yml)
- Lint: Black formatting + Ruff linting (catches common issues)
- Type check: mypy with django-stubs (catches type-related bugs)
- Tests: pytest across Python 3.10, 3.11, 3.12
- Django checks: `manage.py check` + migration consistency
Admin Interface
The Django admin (/admin/) is the primary operations surface:
- Protected by Django’s built-in staff/superuser authentication
- Custom `MonitoringAdminSite` inherits all default auth protections
- Admin actions (acknowledge, resolve, retry) require authenticated staff access
Reporting Vulnerabilities
If you discover a security vulnerability, please report it responsibly by opening a private issue or contacting the maintainer directly. Do not disclose vulnerabilities in public issues.