AnalysisRun Audit Logging via BaseProvider.run()
Context
/admin/intelligence/analysisrun/ shows no rows because no production code creates AnalysisRun records. The model exists with a full lifecycle API (mark_started, mark_succeeded, mark_failed) but is never wired into any execution path.
Additionally, BaseProvider exposes two redundant abstract methods — analyze(incident) and get_recommendations() — that both providers already collapse internally (analyze(None) delegates to get_recommendations()). Every call site duplicates the same if/else routing between them.
Goals
- Every provider execution creates an
AnalysisRunaudit record — pipeline, management command, and view - Collapse the two-method provider interface into a single
analyze(incident=None, analysis_type="") - Sensitive provider config (API keys) is redacted before storage
- Audit logging never breaks analysis execution (DB failures caught and logged)
Design
BaseProvider interface
Remove get_recommendations() as an abstract method. Add run() and _redact_config():
class BaseProvider(ABC):
name: str = "base"
description: str = "Base intelligence provider"
@abstractmethod
def analyze(
self,
incident: Any | None = None,
analysis_type: str = "",
) -> list[Recommendation]:
"""Single public analysis method."""
...
def run(
self,
*,
incident: Any | None = None,
analysis_type: str = "",
trace_id: str = "",
pipeline_run_id: str = "",
provider_config: dict | None = None,
) -> list[Recommendation]:
"""Wrap analyze(), manage AnalysisRun lifecycle, return recommendations."""
...
@staticmethod
def _redact_config(config: dict) -> dict:
"""Mask sensitive keys before storage."""
...
run() lifecycle
- Create
AnalysisRunin PENDING state with provider info, incident FK, trace/pipeline IDs, and redacted config - Call
mark_started() - Call
self.analyze(incident, analysis_type) - On success:
mark_succeeded(recommendations=[r.to_dict() for r in recs]) - On exception:
mark_failed(error_message=str(exc)), then re-raise - DB failures (creating/updating AnalysisRun) caught and logged — never break analysis
Key difference from CheckRun: provider exceptions are re-raised after mark_failed() because callers (executor, views) have their own error handling. CheckRun swallows exceptions into an UNKNOWN result because checkers must always return a result.
Config redaction
SENSITIVE_PATTERNS = {"key", "secret", "token", "password", "api"}
@staticmethod
def _redact_config(config: dict) -> dict:
return {
k: "***" if any(p in k.lower() for p in SENSITIVE_PATTERNS) else v
for k, v in config.items()
}
Provider changes
LocalRecommendationProvider:
analyze(incident=None, analysis_type="")becomes the single entry pointanalysis_type="memory"→ calls_get_memory_recommendations()analysis_type="disk"→ calls_get_disk_recommendations()incidentprovided → existing incident-type detection (unchanged)- Neither → general system scan (current
get_recommendations()body inlined) get_recommendations()deleted
OpenAIRecommendationProvider:
analyze(incident=None, analysis_type="")— returns[]when no incident (unchanged behavior)analysis_typeignored (OpenAI is incident-driven only)get_recommendations()deleted
Call sites
All become provider.run(...):
| File | Before | After |
|---|---|---|
apps/orchestration/executors.py | provider.analyze(incident) / provider.get_recommendations() | provider.run(incident=incident, trace_id=ctx.trace_id, pipeline_run_id=ctx.run_id, provider_config=provider_config) |
apps/intelligence/management/commands/get_recommendations.py | provider.analyze(incident) / provider._get_memory_recommendations() / provider._get_disk_recommendations() / provider.get_recommendations() | provider.run(incident=incident, analysis_type=...) |
apps/intelligence/views/recommendations.py | provider.analyze(incident) / provider.get_recommendations() | provider.run(incident=incident) |
apps/intelligence/views/memory.py | provider._get_memory_recommendations() | provider.run(analysis_type="memory") |
apps/intelligence/views/disk.py | provider._get_disk_recommendations(path) | provider.run(analysis_type="disk") |
apps/intelligence/providers/local.py | get_local_recommendations() convenience function | Update to call provider.run() |
Testing
- BaseProvider.run(): FakeProvider with ~8 tests mirroring CheckRun pattern — lifecycle states, duration, trace_id, pipeline_run_id, incident FK, DB failure resilience, exception handling, config redaction
- Provider refactor: Update existing local/openai tests —
get_recommendations()→analyze(), verifyanalysis_typerouting in local provider - Call sites: Update existing executor/command/view tests to verify
run()is called
Files touched
| File | Change |
|---|---|
apps/intelligence/providers/base.py | Remove get_recommendations() abstract, add run(), _redact_config() |
apps/intelligence/providers/local.py | Merge get_recommendations() into analyze(), add analysis_type param |
apps/intelligence/providers/openai.py | Remove get_recommendations(), add analysis_type param |
apps/orchestration/executors.py | Replace analyze()/get_recommendations() routing → provider.run() |
apps/intelligence/management/commands/get_recommendations.py | Route all modes through provider.run() |
apps/intelligence/views/recommendations.py | provider.run() |
apps/intelligence/views/memory.py | provider.run(analysis_type="memory") |
apps/intelligence/views/disk.py | provider.run(analysis_type="disk") |
apps/intelligence/_tests/... | New and updated tests |
Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Where to manage AnalysisRun lifecycle | BaseProvider.run() | Mirrors CheckRun pattern, single instrumentation point |
| No incident available | incident=None is fine | FK is already nullable, admin handles it |
| Provider config storage | Redacted | Mask keys matching sensitive patterns, useful for debugging |
| Scope of audit logging | All entry points | Security: every provider execution must have an audit trail |
| Two-method split | Collapse into analyze(incident=None) | get_recommendations() is redundant, both providers already delegate internally |
| Targeted analysis modes | analysis_type hint parameter | Routes through run() for audit logging while preserving diagnostic shortcuts |
| Provider exceptions | Re-raise after mark_failed() | Callers have their own error handling, unlike CheckRun which swallows into UNKNOWN |