Stop stale ServiceNow incidents when monitoring resolves first

When monitoring says “resolved” but the ITSM ticket stays open

We keep seeing a classic IT ops failure mode:

  • Monitoring/observability (e.g., PagerDuty, Dynatrace, Site24x7) sends a resolve/clear event.
  • The ServiceNow incident doesn’t transition (or can’t) because of required fields, state rules, or integration mapping.
  • Result: stale incidents, noisy queues, broken metrics, and humans doing “close hygiene” work.

ServiceNow itself has common patterns like auto-close after a resolved window, but integrations often fail to update/close incidents when required fields or inbound rules don’t line up. (servicenow.com)

This is exactly the “AI suggests, nothing executes” gap, except it isn’t even AI; it’s brittle automation.

Why this is risky if you let AI “just close it”

Letting an AI agent directly close incidents in ITSM is dangerous:

  • It may close the wrong ticket (bad correlation).
  • It may violate process (missing resolution code/notes, wrong state transitions).
  • It may hide recurring issues (premature closure).

So the pattern we want is:

  • AI can recommend and summarize.
  • Autom Mate is the deterministic execution + control layer that enforces policy, approvals, and auditability before any state change.

Autom Mate is designed to provide an auditable execution trail, rather than “black box” actions.


End-to-end workflow (governed, deterministic)

1) Trigger

  • Trigger: Monitoring tool sends a RESOLVED webhook (or a “monitor up” event) into Autom Mate.
  • Implementation: Webhook trigger (Autom Mate native). The webhook format is standard in Autom Mate.

2) Validation (context + policy checks)

Autom Mate performs deterministic checks before touching ITSM:

  • Correlation check: Find the ServiceNow incident by correlation ID / external alert ID.
  • State check: Only proceed if incident is in an allowed state (e.g., In Progress, On Hold, Resolved but not Closed).
  • Required fields check: Ensure the fields your instance requires for resolution (for example resolution code and resolution notes, which a default incident table stores as close_code and close_notes) are present or can be populated.
  • Recurrence guard: If the same CI/service has reopened N times in X days, route to Problem instead of closing.
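
As a rough illustration, the four checks above could be expressed as one pure function that returns a decision plus a reason. This is a sketch under assumptions, not Autom Mate's implementation: the Incident shape, ALLOWED_STATES, REQUIRED_FIELDS, and MAX_REOPENS are all hypothetical names, and the state labels and field names should be replaced with whatever your instance uses.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical values; state labels and required fields vary by instance.
ALLOWED_STATES = {"In Progress", "On Hold", "Resolved"}
REQUIRED_FIELDS = ("close_code", "close_notes")
MAX_REOPENS = 3  # the "N" in the recurrence guard; choose your own threshold


@dataclass
class Incident:
    number: str
    state: str
    correlation_id: str
    fields: dict = field(default_factory=dict)
    reopens_in_window: int = 0  # reopens for the same CI/service in the last X days


def validate(incident: Optional[Incident], defaults: dict) -> tuple:
    """Return (decision, reason); decision is 'proceed', 'skip', or 'route_to_problem'."""
    # Correlation check: the lookup by correlation ID found nothing.
    if incident is None:
        return "skip", "no incident matches the correlation ID"
    # State check: never touch incidents outside the allowed states.
    if incident.state not in ALLOWED_STATES:
        return "skip", f"state {incident.state!r} is not eligible"
    # Recurrence guard: repeated reopens go to Problem instead of closing.
    if incident.reopens_in_window >= MAX_REOPENS:
        return "route_to_problem", "recurrence guard tripped"
    # Required fields check: each field must be present or have a default.
    missing = [name for name in REQUIRED_FIELDS
               if not incident.fields.get(name) and not defaults.get(name)]
    if missing:
        return "skip", "cannot populate: " + ", ".join(missing)
    return "proceed", "all checks passed"
```

Returning a reason string alongside the decision keeps every skipped event explainable in the audit trail.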

3) Approval (human or rule-based)

  • Rule-based auto-approval: If the incident is P3/P4, single alert source, no reopen history, and monitoring has been stable for 30 minutes.
  • Human approval: If P1/P2, major incident linked, or correlation confidence is below threshold.
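
The two approval rules above can be written as a small routing function. This is a minimal sketch with assumed names: ApprovalContext and its fields are hypothetical, and CONFIDENCE_THRESHOLD is a placeholder, since the post does not specify a value. Anything that is not clearly low-risk falls through to a human.

```python
from dataclasses import dataclass

STABLE_MINUTES = 30          # stability window from the rule above
CONFIDENCE_THRESHOLD = 0.9   # hypothetical; the post does not give a value


@dataclass
class ApprovalContext:
    priority: int                  # 1..4 for P1..P4
    alert_sources: int             # number of distinct alert sources
    reopen_count: int
    stable_minutes: float          # how long monitoring has stayed healthy
    major_incident_linked: bool
    correlation_confidence: float  # 0.0..1.0


def approval_route(ctx: ApprovalContext) -> str:
    """Return 'auto' only when every low-risk condition holds, else 'human'."""
    needs_human = (
        ctx.priority <= 2
        or ctx.major_incident_linked
        or ctx.correlation_confidence < CONFIDENCE_THRESHOLD
    )
    low_risk = (
        ctx.priority >= 3
        and ctx.alert_sources == 1
        and ctx.reopen_count == 0
        and ctx.stable_minutes >= STABLE_MINUTES
    )
    # Default to a human whenever the event is not unambiguously low-risk.
    return "auto" if low_risk and not needs_human else "human"
```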

Approval can be done via:

  • Microsoft Teams message to the on-call/assignment group with Approve/Reject.
    • Integration label: REST/HTTP/Webhook action (post to Teams via your preferred method) or Autom Mate’s Teams bot channel if you’re using conversational flows.

4) Deterministic execution across systems

Once approved, Autom Mate executes a strict sequence:

  • Update ServiceNow incident with:
    • resolution summary (from monitoring payload)
    • timestamps
    • resolution code/notes
    • set state to Resolved (or Closed if your process allows)
  • Add a work note: “Resolved/closed by governed automation from monitoring RESOLVED event; correlation key = …”

Integration label:

  • ServiceNow update: REST/HTTP/Webhook action (ServiceNow Table API)
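
For reference, the update step might look roughly like the following when done with the ServiceNow Table API and only the Python standard library. Treat it as a sketch under assumptions: the function name and parameters are hypothetical, basic auth is used only for brevity, and the state value "6" and the close_code / close_notes field names reflect a default incident table, so verify them against your instance. The function only builds the request; sending it is a separate, explicit step.

```python
import base64
import json
import urllib.request


def build_resolve_request(instance: str, sys_id: str, user: str, password: str,
                          close_code: str, close_notes: str,
                          correlation_key: str) -> urllib.request.Request:
    """Build (but do not send) a Table API PATCH that resolves one incident."""
    body = {
        "state": "6",  # Resolved on a default incident table (assumption)
        "close_code": close_code,
        "close_notes": close_notes,
        "work_notes": ("Resolved by governed automation from monitoring "
                       f"RESOLVED event; correlation key = {correlation_key}"),
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"https://{instance}.service-now.com/api/now/table/incident/{sys_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would perform the call. Keeping construction and sending apart makes it easy to log the exact request before anything changes in ServiceNow.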

5) Logging / audit

  • Autom Mate logs:
    • inbound webhook payload hash
    • correlation decision
    • approval identity + timestamp
    • exact API calls made + responses

This aligns with Autom Mate’s emphasis on transparent, auditable agent/workflow execution.
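
One way to picture the audit entry is a single JSON line per execution that carries the four items listed above. The sketch below is illustrative only: audit_record and its field names are hypothetical, not Autom Mate's log format. It hashes the raw webhook body with SHA-256 so the payload can be verified later without storing it in the log.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(raw_payload: bytes, correlation: dict, approver: str,
                 api_calls: list) -> str:
    """Return one JSON line covering payload hash, correlation, approval, and calls."""
    record = {
        # Hash of the inbound webhook body exactly as received.
        "payload_sha256": hashlib.sha256(raw_payload).hexdigest(),
        # How the incident was matched (key used, incident number, confidence).
        "correlation": correlation,
        # Who or what approved, and when (UTC, ISO 8601).
        "approval": {"by": approver,
                     "at": datetime.now(timezone.utc).isoformat()},
        # Each entry would hold method, URL, status, and response body.
        "api_calls": api_calls,
    }
    return json.dumps(record, sort_keys=True)
```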

6) Exception handling / rollback

If ServiceNow rejects the update (missing required fields, business rule blocks, etc.):

  • Create a follow-up task (or assign back) with the exact error message.
  • Post to Teams: “Auto-close failed; needs human input: missing resolution code.”
  • Optional rollback: if partial updates happened, revert fields to previous values (store pre-change snapshot in Autom Mate variables / datastore).
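
The optional rollback can be sketched as "snapshot, apply, revert on failure". Everything here is hypothetical: update stands in for whatever call writes a single field to the incident, and a real flow would also have to handle a failure during the revert itself.

```python
from typing import Callable


def snapshot(record: dict, names: list) -> dict:
    """Capture the current values of the fields about to change."""
    return {name: record.get(name) for name in names}


def apply_with_rollback(record: dict, changes: dict,
                        update: Callable[[str, object], None]) -> tuple:
    """Apply changes one field at a time; restore earlier fields if one fails.

    Returns (ok, error_message).
    """
    before = snapshot(record, list(changes))
    applied = []
    try:
        for name, value in changes.items():
            update(name, value)
            applied.append(name)
    except Exception as exc:
        # Revert only what was actually written, in reverse order.
        for name in reversed(applied):
            update(name, before[name])
        return False, str(exc)
    return True, ""
```

The error message returned on failure is the same text that would go into the follow-up task and the Teams notification.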

Two mini examples

Mini example 1: “Resolved event arrives, but incident can’t close”

  • Trigger: Dynatrace sends RESOLVED.
  • Validation: incident found, but resolution_code is mandatory.
  • Autom Mate action:
    • If alert type maps to a known resolution code, populate it.
    • Else request approval + ask resolver to pick from allowed codes in Teams.
  • Execution: update incident deterministically.

(Real-world symptom: resolved messages not closing tickets is a known integration pain point.) (community.dynatrace.com)

Mini example 2: “Auto-close window vs. reopen policy”

  • Trigger: incident has been Resolved for 5 days; ServiceNow auto-close property is inconsistent across groups.
  • Autom Mate scheduled run:
    • Find incidents in Resolved older than X days.
    • If no customer updates and monitoring stable, close with standard notes.
    • If customer commented, route back to assignment group.

(ServiceNow auto-close behavior and configuration issues come up frequently.) (servicenow.com)
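
The scheduled run in mini example 2 boils down to a per-incident decision. A minimal sketch, with assumed names and one assumption beyond the text: when monitoring is not stable, the function returns "wait" rather than closing or routing back, since the post does not say what should happen in that case.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_action(resolved_at: datetime,
                 last_customer_comment_at: Optional[datetime],
                 monitoring_stable: bool,
                 now: datetime,
                 max_age_days: int = 5) -> str:
    """Return 'wait', 'route_back', or 'close' for one Resolved incident."""
    # Not yet older than the window ("X days" above; 5 is a placeholder).
    if now - resolved_at < timedelta(days=max_age_days):
        return "wait"
    # A customer comment after resolution sends it back to the assignment group.
    if last_customer_comment_at is not None and last_customer_comment_at > resolved_at:
        return "route_back"
    # Assumption: unstable monitoring defers the decision to the next run.
    if not monitoring_stable:
        return "wait"
    return "close"
```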


Discussion questions

  • Do you treat “monitoring resolved” as sufficient to resolve/close, or do you require a human confirmation step for certain severities?
  • What correlation key has been most reliable for you (alert ID, CI + signature, service offering, etc.), and where does it break down?