When monitoring says “resolved” but the ITSM ticket stays open
We keep seeing a classic IT ops failure mode:
- Monitoring/observability (e.g., PagerDuty, Dynatrace, Site24x7, etc.) sends a resolve/clear event.
- The ServiceNow incident doesn’t transition (or can’t) because of required fields, state rules, or integration mapping.
- Result: stale incidents, noisy queues, broken metrics, and humans doing “close hygiene” work.
ServiceNow itself has common patterns like auto-close after a resolved window, but integrations often fail to update/close incidents when required fields or inbound rules don’t line up. (servicenow.com)
This is exactly the “AI suggests, nothing executes” gap—except it’s not even AI; it’s brittle automation.
Why this is risky if you let AI “just close it”
Letting an AI agent directly close incidents in ITSM is dangerous:
- It may close the wrong ticket (bad correlation).
- It may violate process (missing resolution code/notes, wrong state transitions).
- It may hide recurring issues (premature closure).
So the pattern we want is:
- AI can recommend and summarize.
- Autom Mate is the deterministic execution + control layer that enforces policy, approvals, and auditability before any state change.
Autom Mate is designed to provide an auditable execution trail, rather than “black box” actions.
End-to-end workflow (governed, deterministic)
1) Trigger
- Trigger: Monitoring tool sends a RESOLVED webhook (or a “monitor up” event) into Autom Mate.
- Implementation: Webhook (Autom Mate native). Webhook format is standard in Autom Mate.
2) Validation (context + policy checks)
Autom Mate performs deterministic checks before touching ITSM:
- Correlation check: Find the ServiceNow incident by correlation ID / external alert ID.
- State check: Only proceed if incident is in an allowed state (e.g., In Progress, On Hold, Resolved but not Closed).
- Required fields check: Ensure resolution_code, resolution_notes, close_notes (whatever your instance requires) are present or can be populated.
- Recurrence guard: If the same CI/service has reopened N times in X days, route to Problem instead of closing.
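The checks above can be sketched as a small pure function. This is only an illustration, not Autom Mate's actual implementation: the function name, the action labels, the default threshold, and the shape of the incident dict are all assumptions, and the state and field names simply mirror the examples in this post.

```python
from typing import Optional, Tuple

# Assumed values: mirror the examples in this post; use what your instance requires.
ALLOWED_STATES = {"In Progress", "On Hold", "Resolved"}
REQUIRED_FIELDS = ("resolution_code", "resolution_notes")


def validate(incident: Optional[dict], reopen_count: int,
             max_reopens: int = 3) -> Tuple[str, str]:
    """Run the checks in order and return (action, reason).

    The first failing check decides the outcome, so the same input
    always yields the same decision.
    """
    # Correlation check: the lookup by correlation ID found nothing.
    if incident is None:
        return ("skip", "no incident matched the correlation key")
    # State check: never touch incidents outside the allowed states.
    state = incident.get("state")
    if state not in ALLOWED_STATES:
        return ("skip", f"state {state!r} is not allowed")
    # Required fields check: ask for input rather than guessing values.
    missing = [f for f in REQUIRED_FIELDS if not incident.get(f)]
    if missing:
        return ("needs_input", "missing: " + ", ".join(missing))
    # Recurrence guard: repeated reopens go to Problem instead of closing.
    if reopen_count >= max_reopens:
        return ("route_to_problem", f"reopened {reopen_count} times")
    return ("proceed", "all checks passed")
```

Returning a reason alongside the action keeps the decision explainable in the audit trail.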
3) Approval (human or rule-based)
- Rule-based auto-approval: If the incident is P3/P4, single alert source, no reopen history, and monitoring has been stable for 30 minutes.
- Human approval: If P1/P2, major incident linked, or correlation confidence is below threshold.
Approval can be done via:
- Microsoft Teams message to the on-call/assignment group with Approve/Reject.
- Integration label: REST/HTTP/Webhook action (post to Teams via your preferred method) or Autom Mate’s Teams bot channel if you’re using conversational flows.
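The routing rule can be written so that anything not explicitly low-risk falls back to a human. A minimal sketch, assuming numeric priorities (1 = P1) and a hypothetical correlation confidence score between 0 and 1; the parameter names and the 0.9 threshold are illustrative only.

```python
def approval_route(priority: int, linked_major_incident: bool, alert_sources: int,
                   reopen_count: int, stable_minutes: int,
                   correlation_confidence: float,
                   min_confidence: float = 0.9) -> str:
    """Return "auto" or "human" for a pending resolve action."""
    # Any high-risk signal forces a human decision.
    if priority <= 2 or linked_major_incident or correlation_confidence < min_confidence:
        return "human"
    # Low-risk path: every condition must hold for auto-approval.
    if (priority in (3, 4) and alert_sources == 1
            and reopen_count == 0 and stable_minutes >= 30):
        return "auto"
    # Anything not explicitly covered fails closed to a human.
    return "human"
```

Failing closed means a new edge case costs an extra approval click instead of an incorrect closure.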
4) Deterministic execution across systems
Once approved, Autom Mate executes a strict sequence:
- Update ServiceNow incident with:
  - resolution summary (from monitoring payload)
  - timestamps
  - resolution code/notes
  - set state to Resolved (or Closed if your process allows)
- Add a work note: “Closed by governed automation from monitoring RESOLVED event; correlation key = …”
Integration label:
- ServiceNow update: REST/HTTP/Webhook action (ServiceNow Table API)
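For reference, here is roughly what the Table API call looks like when built by hand. This sketch only constructs the request and does not send it. The function name and arguments are hypothetical, authentication is omitted, and the field names (close_code, close_notes, work_notes) and the state value 6 for Resolved are assumptions based on a default incident table; check them against your own instance.

```python
import json
import urllib.request


def build_resolve_request(instance_url: str, sys_id: str, summary: str,
                          close_code: str, correlation_key: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH against the incident Table API."""
    body = {
        "state": "6",  # assumption: 6 = Resolved in the default state model
        "close_code": close_code,
        "close_notes": summary,
        "work_notes": ("Closed by governed automation from monitoring "
                       f"RESOLVED event; correlation key = {correlation_key}"),
    }
    return urllib.request.Request(
        url=f"{instance_url.rstrip('/')}/api/now/table/incident/{sys_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        # Add an Authorization header here; credentials are left out on purpose.
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
```

Building the request separately from sending it makes it easy to log the exact call before execution.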
5) Logging / audit
- Autom Mate logs:
  - inbound webhook payload hash
  - correlation decision
  - approval identity + timestamp
  - exact API calls made + responses
This aligns with Autom Mate’s emphasis on transparent, auditable agent/workflow execution.
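To make the logged items concrete, an audit entry could look something like the sketch below. The record layout and every key name are assumptions for illustration, not Autom Mate's actual log schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(raw_payload: bytes, correlation: dict, approver: str,
                 approved_at: datetime, api_calls: list) -> dict:
    """Assemble one JSON-serializable audit entry for a single execution."""
    return {
        # Hash of the raw bytes, so the payload can be verified later
        # without storing its full content in the log.
        "payload_sha256": hashlib.sha256(raw_payload).hexdigest(),
        "correlation": correlation,   # which key matched, and how
        "approved_by": approver,      # a person, or the name of an auto-approval rule
        "approved_at": approved_at.astimezone(timezone.utc).isoformat(),
        "api_calls": api_calls,       # request/response summaries, in order
    }
```

Hashing the raw bytes rather than the parsed JSON avoids mismatches caused by key ordering or whitespace.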
6) Exception handling / rollback
If ServiceNow rejects the update (missing required fields, business rule blocks, etc.):
- Create a follow-up task (or assign back) with the exact error message.
- Post to Teams: “Auto-close failed; needs human input: missing resolution code.”
- Optional rollback: if partial updates happened, revert fields to previous values (store pre-change snapshot in Autom Mate variables / datastore).
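The optional rollback can be sketched as "snapshot first, revert only what was actually changed". Everything here is hypothetical: send stands in for whatever performs the ITSM update and is assumed to raise an exception when the update is rejected.

```python
from typing import Callable, Dict, List, Optional


def run_with_rollback(current: Dict[str, Optional[str]],
                      steps: List[Dict[str, str]],
                      send: Callable[[dict], None]) -> Optional[str]:
    """Apply steps in order; on failure, restore the fields already changed.

    Returns None on success, or the error text to pass on to a human.
    """
    touched = {field for step in steps for field in step}
    before = {field: current.get(field) for field in touched}  # pre-change snapshot
    applied: List[str] = []
    for step in steps:
        try:
            send(step)
        except Exception as exc:
            if applied:
                # Revert only fields that were really written. If this call
                # fails too, the exception propagates to the caller.
                send({field: before[field] for field in applied})
            return str(exc)
        applied.extend(step)
    return None
```

The returned error text is what would go into the follow-up task and the Teams message.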
Two mini examples
Mini example 1: “Resolved event arrives, but incident can’t close”
- Trigger: Dynatrace sends RESOLVED.
- Validation: incident found, but resolution_code is mandatory.
- Autom Mate action:
  - If alert type maps to a known resolution code, populate it.
  - Else request approval + ask resolver to pick from allowed codes in Teams.
- Execution: update incident deterministically.
(Real-world symptom: resolved messages not closing tickets is a known integration pain point.) (community.dynatrace.com)
Mini example 2: “Auto-close window vs. reopen policy”
- Trigger: incident has been Resolved for 5 days; ServiceNow auto-close property is inconsistent across groups.
- Autom Mate scheduled run:
  - Find incidents in Resolved older than X days.
  - If no customer updates and monitoring stable, close with standard notes.
  - If customer commented, route back to assignment group.
(ServiceNow auto-close behavior and configuration issues come up frequently.) (servicenow.com)
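The per-incident decision in that scheduled run might look like the sketch below. The function name, the action labels, and the choice to wait when monitoring is not yet stable are assumptions; the post does not specify that last case.

```python
from datetime import datetime, timedelta
from typing import Optional


def sweep_action(resolved_at: datetime, last_customer_update: Optional[datetime],
                 monitoring_stable: bool, now: datetime, min_age: timedelta) -> str:
    """Decide what the scheduled run does with one Resolved incident."""
    # Too recent: leave it in Resolved until the window has passed.
    if now - resolved_at < min_age:
        return "wait"
    # A customer comment after resolution means someone should look again.
    if last_customer_update is not None and last_customer_update > resolved_at:
        return "route_back"
    if monitoring_stable:
        return "close"
    # Assumption: if monitoring is not stable, do nothing on this run.
    return "wait"
```

Passing now in as an argument keeps the rule deterministic and easy to test against fixed timestamps.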
Discussion questions
- Do you treat “monitoring resolved” as sufficient to resolve/close, or do you require a human confirmation step for certain severities?
- What correlation key has been most reliable for you (alert ID, CI + signature, service offering, etc.), and where does it break down?