Remove `check_and_alert` Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Remove the check_and_alert management command, absorb its unique CLI flags into run_pipeline, and update all references so all executions are tracked through orchestration.

Architecture: The check_and_alert command bypasses orchestration by calling CheckAlertBridge directly. We remove the command, add its flags (--checkers, --hostname, --label, --no-incidents, --warning-threshold, --critical-threshold) to run_pipeline, and wire them through the payload so CheckExecutor passes them to CheckAlertBridge. All doc/script references get updated.

Tech Stack: Django management commands, Python, shell scripts (bash)

Task 1: Add checker flags to `run_pipeline` command

Files:

Modify: apps/orchestration/management/commands/run_pipeline.py

Step 1: Add new arguments to add_arguments

In run_pipeline.py, after the --json argument (around line 103), add:

        parser.add_argument(
            "--checkers",
            nargs="+",
            help="Specific checkers to run (e.g., cpu memory disk). Only used with --checks-only.",
        )
        parser.add_argument(
            "--hostname",
            type=str,
            help="Override hostname in alert labels.",
        )
        parser.add_argument(
            "--label",
            action="append",
            dest="labels",
            metavar="KEY=VALUE",
            help="Additional label for alerts (can be repeated).",
        )
        parser.add_argument(
            "--no-incidents",
            action="store_true",
            help="Skip automatic incident creation.",
        )
        parser.add_argument(
            "--warning-threshold",
            type=float,
            help="Override warning threshold for all checkers.",
        )
        parser.add_argument(
            "--critical-threshold",
            type=float,
            help="Override critical threshold for all checkers.",
        )

Step 2: Wire flags into payload in _get_payload

In the _get_payload method, update the return dict (around line 210) to include the new flags:

        # Parse labels
        labels = {}
        if options.get("labels"):
            for label in options["labels"]:
                if "=" not in label:
                    raise CommandError(f"Invalid label format: {label}. Use KEY=VALUE.")
                key, value = label.split("=", 1)
                labels[key] = value

        # Build checker configs from threshold overrides
        checker_names = options.get("checkers")
        checker_configs = {}
        if checker_names:
            for name in checker_names:
                config = {}
                if options.get("warning_threshold") is not None:
                    config["warning_threshold"] = options["warning_threshold"]
                if options.get("critical_threshold") is not None:
                    config["critical_threshold"] = options["critical_threshold"]
                if config:
                    checker_configs[name] = config
        else:
            # Apply thresholds to all checkers if no specific checkers selected
            if options.get("warning_threshold") is not None or options.get("critical_threshold") is not None:
                checker_configs["__all__"] = {}
                if options.get("warning_threshold") is not None:
                    checker_configs["__all__"]["warning_threshold"] = options["warning_threshold"]
                if options.get("critical_threshold") is not None:
                    checker_configs["__all__"]["critical_threshold"] = options["critical_threshold"]

        return {
            "payload": inner_payload,
            "driver": options["source"] if options["source"] != "cli" else None,
            "checker_names": checker_names,
            "checker_configs": checker_configs if checker_configs else None,
            "labels": labels if labels else None,
            "hostname": options.get("hostname"),
            "no_incidents": options.get("no_incidents", False),
        }

Step 3: Run existing tests to confirm nothing breaks

Run: uv run pytest apps/orchestration/_tests/test_run_pipeline_command.py -v Expected: All existing tests PASS

Step 4: Commit

git add apps/orchestration/management/commands/run_pipeline.py
git commit -m "feat: add checker flags to run_pipeline command"

Task 2: Wire `hostname` and `no_incidents` through `CheckExecutor`

Files:

Modify: apps/orchestration/executors.py

Step 1: Write failing test

Add to apps/orchestration/_tests/test_executors.py:

def test_check_executor_passes_hostname_and_no_incidents(self):
    """CheckExecutor passes hostname and no_incidents to CheckAlertBridge."""
    mock_bridge = mock.Mock()
    mock_bridge.run_checks_and_alert.return_value = mock.Mock(
        checks_run=1,
        errors=[],
        check_results=[],
    )

    ctx = StageContext(
        trace_id="t",
        run_id="r",
        payload={
            "hostname": "web-01",
            "no_incidents": True,
            "checker_names": ["cpu"],
        },
    )

    with patch(
        "apps.alerts.check_integration.CheckAlertBridge",
        return_value=mock_bridge,
    ) as mock_cls:
        executor = CheckExecutor()
        executor.execute(ctx)

    mock_cls.assert_called_once_with(
        hostname="web-01",
        auto_create_incidents=False,
    )

Step 2: Run test to verify it fails

Run: uv run pytest apps/orchestration/_tests/test_executors.py::CheckExecutorTest::test_check_executor_passes_hostname_and_no_incidents -v Expected: FAIL — CheckAlertBridge() is called with no args

Step 3: Modify CheckExecutor.execute in executors.py

Replace lines 110-113 (the CheckAlertBridge instantiation block):

            from apps.alerts.check_integration import CheckAlertBridge

            hostname = payload.get("hostname")
            no_incidents = payload.get("no_incidents", False)

            bridge_kwargs = {}
            if hostname:
                bridge_kwargs["hostname"] = hostname
            bridge_kwargs["auto_create_incidents"] = not no_incidents

            bridge = CheckAlertBridge(**bridge_kwargs)

Step 4: Run test to verify it passes

Run: uv run pytest apps/orchestration/_tests/test_executors.py -v Expected: All PASS

Step 5: Commit

git add apps/orchestration/executors.py apps/orchestration/_tests/test_executors.py
git commit -m "feat: wire hostname and no_incidents through CheckExecutor"

Task 3: Write tests for new `run_pipeline` flags

Files:

Modify: apps/orchestration/_tests/test_run_pipeline_command.py

Step 1: Write tests for the new flags

Add these tests to RunPipelineCommandTest:

@mock.patch("apps.orchestration.management.commands.run_pipeline.PipelineOrchestrator")
def test_checks_only_with_checkers_flag(self, mock_orchestrator):
    """--checks-only with --checkers passes checker_names in payload."""
    mock_result = mock.Mock()
    mock_result.status = "COMPLETED"
    mock_result.trace_id = "t"
    mock_result.run_id = "r"
    mock_result.total_duration_ms = 1.0
    mock_result.ingest = None
    mock_result.check = {"checks_run": 1, "checks_passed": 1, "checks_failed": 0, "duration_ms": 1}
    mock_result.analyze = None
    mock_result.notify = None
    mock_result.errors = []
    mock_result.to_dict.return_value = {"status": "COMPLETED"}
    mock_orchestrator.return_value.run_pipeline.return_value = mock_result

    out = io.StringIO()
    call_command("run_pipeline", "--checks-only", "--checkers", "cpu", "memory", stdout=out)

    call_args = mock_orchestrator.return_value.run_pipeline.call_args
    payload = call_args[1]["payload"] if "payload" in call_args[1] else call_args[0][0]
    self.assertEqual(payload["checker_names"], ["cpu", "memory"])

@mock.patch("apps.orchestration.management.commands.run_pipeline.PipelineOrchestrator")
def test_checks_only_with_hostname_flag(self, mock_orchestrator):
    """--hostname is passed through the payload."""
    mock_result = mock.Mock()
    mock_result.status = "COMPLETED"
    mock_result.trace_id = "t"
    mock_result.run_id = "r"
    mock_result.total_duration_ms = 1.0
    mock_result.ingest = None
    mock_result.check = {"checks_run": 1, "checks_passed": 1, "checks_failed": 0, "duration_ms": 1}
    mock_result.analyze = None
    mock_result.notify = None
    mock_result.errors = []
    mock_result.to_dict.return_value = {"status": "COMPLETED"}
    mock_orchestrator.return_value.run_pipeline.return_value = mock_result

    out = io.StringIO()
    call_command("run_pipeline", "--checks-only", "--hostname", "web-01", stdout=out)

    call_args = mock_orchestrator.return_value.run_pipeline.call_args
    payload = call_args[1]["payload"] if "payload" in call_args[1] else call_args[0][0]
    self.assertEqual(payload["hostname"], "web-01")

@mock.patch("apps.orchestration.management.commands.run_pipeline.PipelineOrchestrator")
def test_checks_only_with_labels(self, mock_orchestrator):
    """--label KEY=VALUE flags are parsed and passed in payload."""
    mock_result = mock.Mock()
    mock_result.status = "COMPLETED"
    mock_result.trace_id = "t"
    mock_result.run_id = "r"
    mock_result.total_duration_ms = 1.0
    mock_result.ingest = None
    mock_result.check = {"checks_run": 1, "checks_passed": 1, "checks_failed": 0, "duration_ms": 1}
    mock_result.analyze = None
    mock_result.notify = None
    mock_result.errors = []
    mock_result.to_dict.return_value = {"status": "COMPLETED"}
    mock_orchestrator.return_value.run_pipeline.return_value = mock_result

    out = io.StringIO()
    call_command(
        "run_pipeline", "--checks-only",
        "--label", "env=production", "--label", "team=sre",
        stdout=out,
    )

    call_args = mock_orchestrator.return_value.run_pipeline.call_args
    payload = call_args[1]["payload"] if "payload" in call_args[1] else call_args[0][0]
    self.assertEqual(payload["labels"], {"env": "production", "team": "sre"})

def test_invalid_label_format_raises_error(self):
    """--label without = raises CommandError."""
    out = io.StringIO()
    with self.assertRaises(CommandError) as ctx:
        call_command("run_pipeline", "--checks-only", "--label", "badlabel", stdout=out)
    self.assertIn("KEY=VALUE", str(ctx.exception))

@mock.patch("apps.orchestration.management.commands.run_pipeline.PipelineOrchestrator")
def test_checks_only_with_no_incidents(self, mock_orchestrator):
    """--no-incidents flag is passed in payload."""
    mock_result = mock.Mock()
    mock_result.status = "COMPLETED"
    mock_result.trace_id = "t"
    mock_result.run_id = "r"
    mock_result.total_duration_ms = 1.0
    mock_result.ingest = None
    mock_result.check = {"checks_run": 1, "checks_passed": 1, "checks_failed": 0, "duration_ms": 1}
    mock_result.analyze = None
    mock_result.notify = None
    mock_result.errors = []
    mock_result.to_dict.return_value = {"status": "COMPLETED"}
    mock_orchestrator.return_value.run_pipeline.return_value = mock_result

    out = io.StringIO()
    call_command("run_pipeline", "--checks-only", "--no-incidents", stdout=out)

    call_args = mock_orchestrator.return_value.run_pipeline.call_args
    payload = call_args[1]["payload"] if "payload" in call_args[1] else call_args[0][0]
    self.assertTrue(payload["no_incidents"])

@mock.patch("apps.orchestration.management.commands.run_pipeline.PipelineOrchestrator")
def test_checks_only_with_threshold_overrides(self, mock_orchestrator):
    """--warning-threshold and --critical-threshold are passed as checker_configs."""
    mock_result = mock.Mock()
    mock_result.status = "COMPLETED"
    mock_result.trace_id = "t"
    mock_result.run_id = "r"
    mock_result.total_duration_ms = 1.0
    mock_result.ingest = None
    mock_result.check = {"checks_run": 1, "checks_passed": 1, "checks_failed": 0, "duration_ms": 1}
    mock_result.analyze = None
    mock_result.notify = None
    mock_result.errors = []
    mock_result.to_dict.return_value = {"status": "COMPLETED"}
    mock_orchestrator.return_value.run_pipeline.return_value = mock_result

    out = io.StringIO()
    call_command(
        "run_pipeline", "--checks-only",
        "--warning-threshold", "60", "--critical-threshold", "80",
        stdout=out,
    )

    call_args = mock_orchestrator.return_value.run_pipeline.call_args
    payload = call_args[1]["payload"] if "payload" in call_args[1] else call_args[0][0]
    self.assertIn("__all__", payload["checker_configs"])
    self.assertEqual(payload["checker_configs"]["__all__"]["warning_threshold"], 60.0)
    self.assertEqual(payload["checker_configs"]["__all__"]["critical_threshold"], 80.0)

Step 2: Run tests to verify they pass

Run: uv run pytest apps/orchestration/_tests/test_run_pipeline_command.py -v -k "test_checks_only_with_checkers_flag or test_checks_only_with_hostname_flag or test_checks_only_with_labels or test_invalid_label_format or test_checks_only_with_no_incidents or test_checks_only_with_threshold" Expected: All PASS (since Task 1 already added the flags)

Step 3: Commit

git add apps/orchestration/_tests/test_run_pipeline_command.py
git commit -m "test: add tests for new run_pipeline checker flags"

Task 4: Delete `check_and_alert` command

Files:

Delete: apps/alerts/management/commands/check_and_alert.py

Step 1: Delete the file

rm apps/alerts/management/commands/check_and_alert.py

Step 2: Run full test suite to verify nothing imports the deleted command

Run: uv run pytest -v Expected: All PASS (no test imports check_and_alert — tests only exist for CheckAlertBridge)

Step 3: Commit

git add -u apps/alerts/management/commands/check_and_alert.py
git commit -m "refactor: remove check_and_alert command (absorbed into run_pipeline)"

Task 5: Update `bin/setup_cron.sh`

Files:

Modify: bin/setup_cron.sh

Step 1: Replace the cron command on line 96

Change:

CRON_CMD="cd $PROJECT_DIR && $UV_PATH run python manage.py check_and_alert --json >> $PROJECT_DIR/cron.log 2>&1"

To:

CRON_CMD="cd $PROJECT_DIR && $UV_PATH run python manage.py run_pipeline --checks-only --json >> $PROJECT_DIR/cron.log 2>&1"

Step 2: Commit

git add bin/setup_cron.sh
git commit -m "chore: update cron to use run_pipeline --checks-only"

Task 6: Update `bin/setup_aliases.sh`

Files:

Modify: bin/setup_aliases.sh

Step 1: Replace alias on line 99

Change:

alias ${prefix}-check-and-alert='cd "${PROJECT_DIR}" && uv run python manage.py check_and_alert'

To:

alias ${prefix}-check-and-alert='cd "${PROJECT_DIR}" && uv run python manage.py run_pipeline --checks-only'

Step 2: Commit

git add bin/setup_aliases.sh
git commit -m "chore: update check-and-alert alias to use run_pipeline"

Task 7: Update `bin/cli.sh`

Files:

Modify: bin/cli.sh

Step 1: Rewrite check_and_alert_menu() (lines 350-386)

Replace the entire function with one that calls run_pipeline --checks-only with appropriate flags:

check_and_alert_menu() {
    show_banner
    echo -e "${BOLD}═══ Run Checks Pipeline ═══${NC}"
    echo ""
    echo "Run health checks through the orchestrated pipeline"
    echo ""
    echo -e "${BOLD}Available options:${NC}"
    echo "  --checkers NAME...     Specific checkers to run"
    echo "  --hostname=HOST        Override hostname in labels"
    echo "  --no-incidents         Skip incident creation"
    echo ""

    local options=(
        "Run all checks"
        "Run specific checkers"
        "Run all checks (dry run)"
        "Run all checks (JSON output)"
        "Back"
    )

    select opt in "${options[@]}"; do
        case $REPLY in
            1)
                confirm_and_run "uv run python manage.py run_pipeline --checks-only"
                ;;
            2)
                read -p "Enter checker names (space-separated): " checker_names
                if [ -n "$checker_names" ]; then
                    confirm_and_run "uv run python manage.py run_pipeline --checks-only --checkers $checker_names"
                else
                    echo -e "${RED}Checker names required${NC}"
                fi
                ;;
            3)
                confirm_and_run "uv run python manage.py run_pipeline --checks-only --dry-run"
                ;;
            4)
                confirm_and_run "uv run python manage.py run_pipeline --checks-only --json"
                ;;
            5)
                return
                ;;
            *)
                echo -e "${RED}Invalid option${NC}"
                ;;
        esac
        break
    done
}

Step 2: Update the menu item text on line 284

Change:

        "check_and_alert - Run checker and create alert"

To:

        "check_and_alert - Run checks pipeline"

Step 3: Commit

git add bin/cli.sh
git commit -m "chore: update cli.sh to use run_pipeline --checks-only"

Task 8: Update `bin/README.md`

Files:

Modify: bin/README.md

Step 1: Update references

On line 15, change:

| `sm-check-and-alert` | `check_and_alert` | alerts | Run checks and create alerts/incidents |

To:

| `sm-check-and-alert` | `run_pipeline --checks-only` | orchestration | Run checks through orchestrated pipeline |

On line 29, change:

- [`apps/alerts/README.md`](../apps/alerts/README.md) — `check_and_alert` (9 flags)

To:

- [`apps/orchestration/README.md`](../apps/orchestration/README.md) — `run_pipeline` (includes checker flags)

On line 130, change:

- Writes crontab entry for `check_and_alert --json`

To:

- Writes crontab entry for `run_pipeline --checks-only --json`

Step 2: Commit

git add bin/README.md
git commit -m "docs: update bin/README.md references from check_and_alert to run_pipeline"

Task 9: Update `apps/alerts/README.md`

Files:

Modify: apps/alerts/README.md

Step 1: Replace the check_and_alert CLI reference section (lines 342-446)

Replace the entire section with a redirect:

### Checks via Pipeline

The health-check-to-alert workflow is now handled by the orchestrated pipeline. Use:

```bash
# Run all checks through the pipeline
uv run python manage.py run_pipeline --checks-only

# Run specific checkers
uv run python manage.py run_pipeline --checks-only --checkers cpu memory disk

# Dry run
uv run python manage.py run_pipeline --checks-only --dry-run --json

# Custom labels and hostname
uv run python manage.py run_pipeline --checks-only --label env=production --hostname web-01

# Skip incidents
uv run python manage.py run_pipeline --checks-only --no-incidents

# Threshold overrides
uv run python manage.py run_pipeline --checks-only --warning-threshold 60 --critical-threshold 80

See apps/orchestration/README.md for full run_pipeline documentation.

**Step 2: Update the cron examples section (lines 516-522)**

Replace:

*/5 * * * * cd /path/to/project && python manage.py check_and_alert –json » /var/log/health-checks.log 2>&1

- - - - cd /path/to/project && python manage.py check_and_alert –checkers cpu memory –json
        With:
        
        */5 * * * * cd /path/to/project && uv run python manage.py run_pipeline –checks-only –json » /var/log/health-checks.log 2>&1
- - - - cd /path/to/project && uv run python manage.py run_pipeline –checks-only –checkers cpu memory –json ```

Step 3: Commit

git add apps/alerts/README.md
git commit -m "docs: update alerts README to reference run_pipeline --checks-only"

Task 10: Update remaining docs

Files:

Modify: README.md
Modify: docs/Architecture.md
Modify: docs/Installation.md
Modify: docs/Setup-Guide.md
Modify: apps/checkers/README.md

Step 1: Update README.md line 66

Change:

uv run python manage.py check_and_alert

To:

uv run python manage.py run_pipeline --checks-only

Step 2: Update docs/Architecture.md line 79

Change:

| `check_and_alert` | alerts | Run checks and create alerts from results. Flags: `--dry-run`, `--no-incidents`, `--checkers` |

To:

| `run_pipeline --checks-only` | orchestration | Run checks through pipeline. Additional flags: `--checkers`, `--no-incidents`, `--hostname`, `--label`, `--warning-threshold`, `--critical-threshold` |

Step 3: Update docs/Installation.md lines 81 and 248

Change both occurrences of:

uv run python manage.py check_and_alert --json

To:

uv run python manage.py run_pipeline --checks-only --json

Step 4: Update docs/Setup-Guide.md

Lines 69 and 255 — change:

2) Local crontab  (check_and_alert via cron)

To:

2) Local crontab  (run_pipeline --checks-only via cron)

Lines 223 — change:

uv run python manage.py check_and_alert --json

To:

uv run python manage.py run_pipeline --checks-only --json

Step 5: Update apps/checkers/README.md line 65

Change:

> **Note:** Check runs are audit records and cannot be added/edited manually. They are created automatically when checks are run via management commands or the CheckAlertBridge.

To:

> **Note:** Check runs are audit records and cannot be added/edited manually. They are created automatically when checks are run via management commands or the orchestration pipeline.

Step 6: Commit

git add README.md docs/Architecture.md docs/Installation.md docs/Setup-Guide.md apps/checkers/README.md
git commit -m "docs: update all references from check_and_alert to run_pipeline --checks-only"

Task 11: Update `CLAUDE.md`

Files:

Modify: CLAUDE.md

Step 1: Replace the check_and_alert reference in Essential Commands

The current line:

uv run python manage.py check_health --list

Confirm there’s no check_and_alert reference in CLAUDE.md. If there is, replace with run_pipeline --checks-only.

Step 2: Commit (if changes needed)

git add CLAUDE.md
git commit -m "docs: update CLAUDE.md references"

Task 12: Run full test suite and coverage check

Step 1: Run full tests

Run: uv run pytest -v Expected: All PASS

Step 2: Run coverage

Run: uv run coverage run -m pytest && uv run coverage report Expected: No regression — check_and_alert.py is gone so its uncovered lines disappear.

Step 3: Run pre-commit hooks

Run: uv run pre-commit run --all-files Expected: All PASS

Task 13: Update `apps/alerts/tasks.py` docstring

Files:

Modify: apps/alerts/tasks.py

Step 1: Update the docstring on line 103

Change:

    In this starter implementation, we run all checks via CheckAlertBridge.
    This already has solid logic for executing checkers and creating alerts.

To:

    Runs all checks via CheckAlertBridge, which handles executing checkers
    and creating alerts.

Step 2: Commit

git add apps/alerts/tasks.py
git commit -m "docs: simplify tasks.py docstring"

Remove check_and_alert Implementation Plan

Task 1: Add checker flags to run_pipeline command

Task 2: Wire hostname and no_incidents through CheckExecutor

Task 3: Write tests for new run_pipeline flags

Task 4: Delete check_and_alert command

Task 5: Update bin/setup_cron.sh

Task 6: Update bin/setup_aliases.sh

Task 7: Update bin/cli.sh

Task 8: Update bin/README.md

Task 9: Update apps/alerts/README.md

Task 10: Update remaining docs

Task 11: Update CLAUDE.md

Task 12: Run full test suite and coverage check

Task 13: Update apps/alerts/tasks.py docstring

Remove `check_and_alert` Implementation Plan

Task 1: Add checker flags to `run_pipeline` command

Task 2: Wire `hostname` and `no_incidents` through `CheckExecutor`

Task 3: Write tests for new `run_pipeline` flags

Task 4: Delete `check_and_alert` command

Task 5: Update `bin/setup_cron.sh`

Task 6: Update `bin/setup_aliases.sh`

Task 7: Update `bin/cli.sh`

Task 8: Update `bin/README.md`

Task 9: Update `apps/alerts/README.md`

Task 11: Update `CLAUDE.md`

Task 13: Update `apps/alerts/tasks.py` docstring