Deterministic CI/CD Secret-Exposure Response (Without Breaking Builds)

Problem (showing up a lot right now)

CI/CD supply-chain incidents keep repeating a familiar pattern: a compromised GitHub Action or workflow leaks secrets (PATs, cloud keys, package registry tokens) into logs or exfiltrates them, and teams respond with a mix of panic rotations, ad hoc repo permission changes, and “AI says rotate everything” playbooks.

The hard part isn’t knowing what to do; it’s doing it safely, consistently, and provably across dozens or hundreds of repos without breaking builds or locking out responders.

Proposed community topic

A governed, deterministic “CI/CD secret exposure response” pattern in Autom Mate that:

  • accepts high-signal alerts (from GitHub / secret scanners / SIEM)
  • validates scope and blast radius
  • requires explicit approvals for destructive steps
  • executes rotations and permission changes deterministically
  • produces an audit trail you can hand to security + compliance

End-to-end workflow (trigger → validation → approvals → deterministic execution → logging → exception handling)

1) Trigger

  • Trigger on an inbound Webhook from your detector (e.g., secret scanner, SIEM, or GitHub security alert webhook).
  • Payload includes: repo, workflow name, commit SHA, suspected secret type, detection confidence, and evidence pointers.
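As a rough sketch of the trigger step, the inbound payload can be parsed into a typed record and rejected early if fields are missing. The field names (`repo`, `workflow`, `commit_sha`, `secret_type`, `confidence`, `evidence`) and the 0 to 1 confidence scale are assumptions for illustration; a real detector will have its own schema.

```python
import json
from dataclasses import dataclass

# Hypothetical field names; adjust to whatever your detector actually sends.
REQUIRED_FIELDS = ("repo", "workflow", "commit_sha", "secret_type", "confidence", "evidence")


@dataclass(frozen=True)
class SecretAlert:
    repo: str
    workflow: str
    commit_sha: str
    secret_type: str
    confidence: float
    evidence: tuple


def parse_alert(raw_body: str) -> SecretAlert:
    """Parse a webhook body and fail fast on an incomplete or malformed alert."""
    data = json.loads(raw_body)
    if not isinstance(data, dict):
        raise ValueError("alert payload must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"alert payload missing fields: {missing}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:  # assumed scale: 0.0 to 1.0
        raise ValueError("confidence must be between 0 and 1")
    return SecretAlert(
        repo=str(data["repo"]),
        workflow=str(data["workflow"]),
        commit_sha=str(data["commit_sha"]),
        secret_type=str(data["secret_type"]),
        confidence=confidence,
        evidence=tuple(data["evidence"]),
    )
```

Rejecting incomplete payloads at the door keeps every later step working from the same, fully populated record.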

2) Validation (guardrails before any action)

  • Enrich and validate via REST/HTTP calls:
    • Confirm the repo exists and is in-scope (org allowlist).
    • Confirm the workflow run/commit SHA matches the alert.
    • Classify the secret type (GitHub token vs cloud key vs package registry token).
    • Check whether the secret is actually referenced by active pipelines (to avoid unnecessary outages).
  • If confidence < threshold or scope is ambiguous → route to the “human review only” path.
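The guardrails above can be sketched as a pure function over facts that the enrichment calls have already gathered. The threshold value, the allowlist contents, the secret-type labels, and the path names are all placeholders, not recommendations.

```python
from dataclasses import dataclass

# Illustrative values only; tune them for your own environment.
CONFIDENCE_THRESHOLD = 0.8
ORG_ALLOWLIST = frozenset({"acme"})
KNOWN_SECRET_TYPES = frozenset({"github_token", "cloud_key", "registry_token"})


@dataclass(frozen=True)
class ValidationFacts:
    repo: str                            # "org/name"
    repo_exists: bool                    # result of a repo lookup
    sha_matches_run: bool                # alert SHA equals the workflow run's SHA
    secret_type: str                     # classifier output
    referenced_by_active_pipeline: bool  # secret is used by a live pipeline
    confidence: float


@dataclass(frozen=True)
class Decision:
    path: str            # "automated_runbook" or "human_review"
    downtime_risk: bool  # True when rotation could break an active pipeline


def decide(facts: ValidationFacts) -> Decision:
    """Route to the automated path only when every guardrail passes."""
    org = facts.repo.split("/", 1)[0]
    in_scope = facts.repo_exists and org in ORG_ALLOWLIST
    unambiguous = facts.sha_matches_run and facts.secret_type in KNOWN_SECRET_TYPES
    confident = facts.confidence >= CONFIDENCE_THRESHOLD
    path = "automated_runbook" if (in_scope and unambiguous and confident) else "human_review"
    return Decision(path=path, downtime_risk=facts.referenced_by_active_pipeline)
```

Keeping the decision free of network calls makes it easy to unit-test and to replay later against the recorded facts.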

3) Approvals (two-stage)

  • Approval A (Security): approve the containment plan (what will be revoked/rotated, what will be paused).
  • Approval B (Service Owner): approve downtime-impacting steps (e.g., disabling workflows, rotating prod deploy keys).
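One way to model the two-stage gate is a small check that every step needs Security sign-off and downtime-impacting steps also need the service owner. The role names and the `ApprovalState` type are hypothetical; in practice the approvals would come from whatever approval mechanism your platform provides.

```python
from dataclasses import dataclass, field

ROLES = ("security", "service_owner")  # hypothetical role labels


@dataclass
class ApprovalState:
    approvals: dict = field(default_factory=dict)  # role -> approver identity

    def approve(self, role: str, approver: str) -> None:
        if role not in ROLES:
            raise ValueError(f"unknown approval role: {role}")
        self.approvals[role] = approver


def may_execute(downtime_impacting: bool, state: ApprovalState) -> bool:
    """Approval A gates everything; Approval B additionally gates downtime."""
    if "security" not in state.approvals:
        return False
    if downtime_impacting and "service_owner" not in state.approvals:
        return False
    return True
```

With only Security's approval recorded, a non-disruptive step such as revoking a leaked token could proceed while rotating a prod deploy key would still wait for the owner.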

4) Deterministic execution (no free-form AI actions)

Execute a fixed, pre-approved runbook (all steps parameterized, no improvisation):

  • Containment:
    • Revoke the compromised token(s) where possible; where revocation is not possible, reduce their permissions.
    • Temporarily disable the affected workflow(s) or block the risky trigger path.
  • Rotation:
    • Rotate the specific secret(s) (package registry token, cloud key, etc.).
    • Update secret stores / repo secrets.
  • Recovery:
    • Re-enable workflows after a clean test run.
    • Open/append a ticket with the full action log.
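A minimal sketch of "fixed runbook, parameterized steps": the step order is a constant, the caller supplies only parameters and one handler per step, and anything outside the list is refused. The step names are made up for illustration, and the handlers stand in for the real API calls.

```python
from typing import Callable, Dict, List

# Hypothetical step names mirroring containment -> rotation -> recovery.
RUNBOOK = (
    "revoke_token",
    "disable_workflow",
    "rotate_secret",
    "update_secret_store",
    "verify_clean_run",
    "enable_workflow",
    "append_ticket",
)

Handler = Callable[[Dict[str, str]], None]


def run_runbook(params: Dict[str, str], handlers: Dict[str, Handler]) -> List[str]:
    """Run every step in the fixed order; refuse missing or extra steps."""
    missing = [step for step in RUNBOOK if step not in handlers]
    extra = sorted(set(handlers) - set(RUNBOOK))
    if missing or extra:
        raise ValueError(f"handlers do not match runbook: missing={missing}, extra={extra}")
    executed = []
    for step in RUNBOOK:
        handlers[step](params)  # the same params are passed to every step
        executed.append(step)
    return executed
```

Because the order lives in one constant, changing the runbook is a reviewable code change rather than a decision made during an incident.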

5) Logging & auditability

  • Write a structured “incident action ledger” per run:
    • who approved what
    • exact API calls executed
    • before/after state (where safe)
    • timestamps + correlation IDs
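For the ledger, one option is a JSON Lines record per executed step. The field names below are assumptions, and the `before`/`after` values should hold only non-sensitive state (for example, a workflow's enabled/disabled status), never secret values.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def ledger_entry(
    correlation_id: str,
    step: str,
    approved_by: dict,
    method: str,
    url: str,
    before: Optional[dict] = None,
    after: Optional[dict] = None,
) -> str:
    """Build one JSON line describing a single executed action."""
    entry = {
        "correlation_id": correlation_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "approved_by": approved_by,               # role -> approver identity
        "api_call": {"method": method, "url": url},
        "before": before,                         # omit (None) when unsafe to record
        "after": after,
    }
    return json.dumps(entry, sort_keys=True)
```

Appending these lines to a write-once store gives security and compliance a replayable record of what ran and who signed off.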

6) Exception handling / rollback

  • If rotation breaks builds:
    • auto-create a high-priority ticket and page the owner.
    • roll back only non-security-critical changes (e.g., re-enable workflow with restricted permissions), while keeping revoked tokens revoked.
  • If an API call fails mid-run:
    • stop the run, mark partial completion, and require re-approval to continue.
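The "stop, mark partial, require re-approval" rule can be sketched as a small resumable executor. The `RunState` type and the status strings are hypothetical; the point is that a failed run cannot continue until someone explicitly re-approves it, and completed steps are not repeated.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RunState:
    completed: List[str] = field(default_factory=list)
    status: str = "pending"            # "pending", "partial", or "complete"
    failed_step: Optional[str] = None
    reapproved: bool = False           # set by a human after a partial run


def execute(steps: List[Tuple[str, Callable[[], None]]], state: RunState) -> RunState:
    """Run steps in order; halt on the first failure and wait for re-approval."""
    if state.status == "partial" and not state.reapproved:
        raise PermissionError("run is partial; re-approval required to continue")
    state.reapproved = False  # each approval covers a single resume attempt
    for name, action in steps:
        if name in state.completed:
            continue  # never repeat a step that already succeeded
        try:
            action()
        except Exception:
            state.status = "partial"
            state.failed_step = name
            return state
        state.completed.append(name)
    state.status = "complete"
    state.failed_step = None
    return state
```

Skipping completed steps on resume assumes each step is safe to leave in place; steps that are not idempotent would need their own verification before continuing.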

Two mini examples

Mini example 1: “Compromised GitHub Action leaked secrets into logs”

  • Trigger: alert includes repo + run ID.
  • Validation: confirm the run used a vulnerable action version.
  • Approval: Security approves revoking GitHub tokens + rotating package registry token.
  • Execution: revoke token(s), rotate npm token, update repo secrets, re-run pipeline.
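For the validation step in this example, a simple text scan can list the `uses:` references in a workflow file and flag any that appear on a known-bad list. The list contents and the action name below are invented, and the regex is a heuristic that ignores quoted values and local (`./`) actions; a YAML parser would be more robust.

```python
import re
from typing import List, Set, Tuple

# Matches lines such as "- uses: owner/repo@ref" or "uses: owner/repo/path@ref".
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([\w.-]+/[\w./-]+)@(\S+)", re.MULTILINE)


def vulnerable_uses(
    workflow_yaml: str, known_bad: Set[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return (action, ref) pairs from the workflow that are on the bad list."""
    found = USES_RE.findall(workflow_yaml)
    return [pair for pair in found if pair in known_bad]
```

An empty result is weak evidence only: a mutable tag may have pointed at a malicious commit at the time of the run, so comparing resolved commit SHAs is the stronger check.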

Mini example 2: “Suspicious workflow added to a repo (possible GhostAction-style implant)”

  • Trigger: webhook from repo monitoring tool.
  • Validation: diff workflow YAML, confirm new outbound network destinations.
  • Approval: Service owner approves disabling workflows; Security approves org-wide token review.
  • Execution: disable workflow, quarantine branch, rotate only the secrets that workflow could access.
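To scope "rotate only the secrets that workflow could access", a first approximation is to collect the names referenced through the `secrets` context in the workflow YAML. This is a heuristic sketch: it misses bracket-style lookups, `secrets: inherit` on reusable workflows, and expressions such as `toJSON(secrets)` that expose everything, so any of those should push the incident back to human review.

```python
import re
from typing import Set

# Matches dotted references such as "secrets.NPM_TOKEN" inside expressions.
SECRET_REF = re.compile(r"\bsecrets\.([A-Za-z_][A-Za-z0-9_]*)")


def referenced_secrets(workflow_yaml: str) -> Set[str]:
    """Return the distinct secret names referenced as secrets.NAME."""
    return set(SECRET_REF.findall(workflow_yaml))
```

The resulting set is a lower bound on exposure, which makes it useful for prioritizing rotations rather than for declaring other secrets safe.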

Discussion questions

  • What’s your minimum approval bar for revocation vs rotation vs workflow disablement?
  • Do you prefer “pause everything” containment, or “surgical containment” based on workflow permission scoping?