Stop “expired cert” outages with governed renewals from ITSM
Certificate expirations are one of those problems everyone knows about… until a weekend outage proves the reminders weren’t enough.
The recurring pattern:
- Monitoring detects TLS failures / handshake errors
- Service desk gets flooded with incidents
- Someone renews the cert manually (or worse: renews the wrong one)
- No consistent approvals, no deterministic runbook, and no audit trail of who changed what, where
This is a good example of why AI suggestions alone are risky: an LLM can recommend “renew the cert,” but letting it directly change production endpoints without policy checks + approvals is how you get accidental outages.
Autom Mate fits as the execution + control layer between ITSM/AI and the systems that actually change things (load balancers, gateways, secret stores, etc.). Autom Mate is built to orchestrate incident/change workflows across ITSM and collaboration tools, including approvals and rollback patterns.
End-to-end workflow (one blueprint)
1) Trigger
- Trigger: A ServiceNow incident is created/updated with category Certificate and CI/service metadata (e.g., “API Gateway – Prod”).
- How it starts:
  - ServiceNow record event → Autom Mate flow starts.
- Integration: ServiceNow (Autom Mate library)
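As a rough sketch of the trigger condition above, the snippet below decides whether an incoming incident event should start the flow. The event shape and the field names (`number`, `category`, `cmdb_ci`) are assumptions for illustration, not a confirmed ServiceNow or Autom Mate schema.

```python
from typing import Any, Mapping

# Hypothetical category value; adjust to whatever your instance uses.
TRIGGER_CATEGORY = "certificate"


def should_start_flow(event: Mapping[str, Any]) -> bool:
    """Return True when an incident event should start the renewal flow."""
    category = str(event.get("category", "")).strip().lower()
    ci = str(event.get("cmdb_ci", "")).strip()
    # Start only for certificate incidents that name an affected CI/service.
    return category == TRIGGER_CATEGORY and bool(ci)


# Illustrative events (made-up data).
events = [
    {"number": "INC001", "category": "Certificate", "cmdb_ci": "API Gateway - Prod"},
    {"number": "INC002", "category": "Network", "cmdb_ci": "Core Switch"},
    {"number": "INC003", "category": "Certificate", "cmdb_ci": ""},
]
started = [e["number"] for e in events if should_start_flow(e)]
print(started)
```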
2) Validation (policy checks)
Autom Mate enriches and validates before any action:
- Pull CI/service owner + environment (prod/non-prod) from the ticket/CMDB fields.
- Validate policy:
  - Is this a standard renewal (same SANs, same key type, same endpoint) or a material change?
  - Is the cert within the renewal window (e.g., < 30 days to expiry) or already expired?
  - Is the requester/assignee allowed to initiate renewal for this service?
- If required fields are missing (endpoint, FQDN, owner), Autom Mate posts a comment and pauses.
Why this matters: ITSM workflows often stall when context is incomplete; automation needs deterministic gates, not “best effort.” (blog.invgate.com)
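The validation gates can be sketched as plain, deterministic functions. This is a minimal illustration, not Autom Mate's implementation: the `CertSpec` type, the ticket dictionary keys, and the `validate` signature are all hypothetical, and the 30-day window is just the example threshold mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = ("endpoint", "fqdn", "owner")  # fields named in the post
RENEWAL_WINDOW = timedelta(days=30)              # example threshold only


@dataclass(frozen=True)
class CertSpec:
    """Hypothetical description of a cert: what must match for a 'standard' renewal."""
    sans: frozenset
    key_type: str
    endpoint: str


def missing_fields(ticket: dict) -> list:
    """List required ticket fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not str(ticket.get(f, "")).strip()]


def renewal_type(current: CertSpec, requested: CertSpec) -> str:
    """'standard' only if SANs, key type, and endpoint are all unchanged."""
    return "standard" if current == requested else "material"


def expiry_state(not_after: datetime, now: datetime) -> str:
    """Classify the cert as 'expired', 'in_window', or 'not_due'."""
    if not_after <= now:
        return "expired"
    if not_after - now < RENEWAL_WINDOW:
        return "in_window"
    return "not_due"


def validate(ticket, current, requested, not_after, now, allowed_requesters):
    """Run the gates in order and stop at the first one that fails."""
    missing = missing_fields(ticket)
    if missing:
        # Mirrors "post a comment and pause" from the list above.
        return {"status": "paused", "comment": "Missing fields: " + ", ".join(missing)}
    if ticket.get("requester") not in allowed_requesters:
        return {"status": "rejected", "comment": "Requester not allowed for this service"}
    return {
        "status": "ok",
        "renewal_type": renewal_type(current, requested),
        "expiry_state": expiry_state(not_after, now),
        "environment": ticket.get("environment", "prod"),
    }


# Illustrative data: a non-prod cert expiring in 14 days, unchanged spec.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
spec = CertSpec(frozenset({"api.example.com"}), "RSA-2048", "gw-nonprod-1")
ticket = {
    "endpoint": "gw-nonprod-1",
    "fqdn": "api.example.com",
    "owner": "platform-team",
    "requester": "alice",
    "environment": "non-prod",
}
result = validate(ticket, spec, spec, now + timedelta(days=14), now, {"alice"})
print(result)
```

Defaulting a missing `environment` to `prod` is a deliberate choice in this sketch: when context is incomplete, the stricter approval path applies.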
3) Approval (human or rule-based)
- If non-prod and renewal is “standard”: auto-approve.
- If prod or renewal is “material change”: require explicit approval.
- Approvals are requested in Microsoft Teams with a structured summary:
  - impacted service, expiry date, proposed action, rollback plan
- Integration: Microsoft Teams (Autom Mate library)
- Integration: ServiceNow approvals (Autom Mate library)
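The approval rule above is small enough to write down as a pure function, which also makes it easy to test. The sketch below is illustrative only: the function names and the plain-text summary layout are assumptions, and it deliberately does not call any Teams or ServiceNow API.

```python
def approval_route(environment: str, renewal_type: str) -> str:
    """Auto-approve only non-prod standard renewals; everything else needs a human."""
    if environment == "non-prod" and renewal_type == "standard":
        return "auto"
    return "explicit"


def approval_summary(service: str, expiry_date: str, proposed_action: str,
                     rollback_plan: str) -> str:
    """Build the structured summary an approver would see (plain text here)."""
    return "\n".join([
        f"Impacted service: {service}",
        f"Expiry date: {expiry_date}",
        f"Proposed action: {proposed_action}",
        f"Rollback plan: {rollback_plan}",
    ])


# Illustrative values (made up for the example).
route = approval_route("prod", "standard")
summary = approval_summary(
    service="API Gateway - Prod",
    expiry_date="2024-06-15",
    proposed_action="Renew cert with same SANs and key type, deploy, reload",
    rollback_plan="Re-deploy last-known-good bundle and reload",
)
print(route)
print(summary)
```

Note that anything not explicitly non-prod and standard falls through to `explicit`, so an unexpected value fails toward human approval rather than away from it.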
4) Deterministic execution across systems
Once approved, Autom Mate executes a fixed runbook (no free-form AI actions):
- Create a Change record (or link to an existing one) and attach the execution plan.
- Execute renewal steps via controlled actions:
  - Call internal PKI / certificate service API to request/renew (REST/HTTP/Webhook action)
  - Deploy cert to target (e.g., load balancer / gateway / web server) (REST/HTTP/Webhook action)
  - Restart/reload service if required (REST/HTTP/Webhook action)
- Post progress back to the incident/change.
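One way to picture a "fixed runbook" is an ordered list of named steps that always run in the same sequence and stop at the first failure. The sketch below assumes that shape; the step functions are stubs standing in for the REST/HTTP actions (no real PKI or load balancer API is called), and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[dict], dict]]


def run_runbook(steps: List[Step], context: dict) -> List[dict]:
    """Run steps in their fixed order; stop at the first failure."""
    log = []
    for name, action in steps:
        try:
            output = action(context)
            log.append({"step": name, "ok": True, "output": output})
        except Exception as exc:
            # Record the failure and stop: later steps must not run.
            log.append({"step": name, "ok": False, "error": str(exc)})
            break
    return log


# Stub actions: placeholders for calls to your internal PKI and targets.
def request_renewal(ctx: dict) -> dict:
    return {"cert_id": "stub-cert-for-" + ctx["fqdn"]}


def deploy_cert(ctx: dict) -> dict:
    return {"deployed_to": ctx["endpoint"]}


def reload_service(ctx: dict) -> dict:
    return {"reloaded": True}


RUNBOOK: List[Step] = [
    ("request_renewal", request_renewal),
    ("deploy_cert", deploy_cert),
    ("reload_service", reload_service),
]

log = run_runbook(RUNBOOK, {"fqdn": "api.example.com", "endpoint": "gw-prod-1"})
for entry in log:
    print(entry)
```

The returned log is the kind of per-step progress that could be posted back to the incident/change.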
Autom Mate is designed to orchestrate change workflows and coordinate execution across tools, including rollback.
5) Logging / audit
Autom Mate writes an audit-friendly trail:
- Who approved in Teams
- What ticket/change initiated the action
- Which endpoints were updated
- API responses + timestamps
- Final verification results
This aligns with the need for visibility and auditability when automations touch sensitive systems.
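To make the trail concrete, here is a minimal sketch of one audit record serialized as a JSON line. The record layout and field names are assumptions chosen to mirror the list above, not Autom Mate's actual log format.

```python
import json
from datetime import datetime, timezone


def audit_entry(ticket, approver, endpoints, api_responses, verification, now=None) -> str:
    """Serialize one audit record as a single JSON line (append-only friendly)."""
    record = {
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "ticket": ticket,                       # ticket/change that initiated the action
        "approved_by": approver,                # who approved in Teams
        "endpoints_updated": list(endpoints),   # which endpoints were touched
        "api_responses": list(api_responses),   # status of each API call
        "verification": verification,           # final verification result
    }
    return json.dumps(record, sort_keys=True)


# Illustrative values (made up for the example).
line = audit_entry(
    ticket="CHG0001",
    approver="alice",
    endpoints=["gw-prod-1"],
    api_responses=[{"step": "deploy_cert", "status": 200}],
    verification="passed",
    now=datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc),
)
print(line)
```

One line per action keeps the trail easy to append, ship to a log store, and query later by ticket or approver.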
6) Exception handling / rollback
If verification fails (e.g., handshake still failing, health checks red):
- Autom Mate triggers rollback:
  - Re-deploy last-known-good cert bundle (from your internal store) (REST/HTTP/Webhook action)
  - Revert config and reload service (REST/HTTP/Webhook action)
- Escalate to on-call in Teams and keep the incident in “Work in Progress / Major Incident” state.
Rollback discipline is repeatedly called out as a key gap in incident/change automation; it must be explicit and rehearsed. (siit.io)
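A verify-then-rollback gate might look like the sketch below. `tls_days_remaining` uses Python's standard `ssl` module to perform a verified handshake and read the certificate's expiry; `verify_or_rollback` and its callback parameters are hypothetical names, and the rollback and escalation actions are passed in as callables rather than tied to any real API.

```python
import socket
import ssl
import time
from typing import Callable


def tls_days_remaining(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Handshake with full verification; return days until the leaf cert expires.

    Raises ssl.SSLError (e.g. expired or untrusted cert) or OSError on failure.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            not_after = tls.getpeercert()["notAfter"]
    return (ssl.cert_time_to_seconds(not_after) - time.time()) / 86400


def verify_or_rollback(verify: Callable[[], None],
                       rollback: Callable[[], None],
                       escalate: Callable[[str], None]) -> str:
    """Run verification; on failure roll back, then escalate with the exact error."""
    try:
        verify()
        return "verified"
    except Exception as exc:
        failure = f"verification failed: {exc}"
    try:
        rollback()
        outcome = "rolled_back"
    except Exception as rb_exc:
        # A failed rollback is escalated too, with both errors attached.
        outcome = "rollback_failed"
        failure += f"; rollback failed: {rb_exc}"
    escalate(failure)
    return outcome


# Illustrative run with stubs instead of real endpoints.
events = []


def failing_verify() -> None:
    raise RuntimeError("handshake still failing")


outcome = verify_or_rollback(
    failing_verify,
    lambda: events.append("rollback"),
    lambda msg: events.append("escalate: " + msg),
)
print(outcome, events)
```

In a real flow, `verify` could wrap `tls_days_remaining` for the affected FQDN plus whatever health checks the service exposes.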
Two mini examples
Mini example 1: “Cert expires in 14 days” (standard renewal)
- Trigger: ServiceNow incident created from monitoring.
- Autom Mate validates it’s non-prod + standard renewal.
- Auto-approves, renews via internal PKI API, deploys, verifies, closes ticket.
Mini example 2: “Cert already expired” (production outage)
- Trigger: Multiple incidents → major incident declared.
- Autom Mate requires approval in Teams (prod + outage).
- Executes renewal + deploy, then runs verification.
- If verification fails, Autom Mate rolls back and escalates with exact failure output.
Why not let AI do this directly?
- AI can misread context (wrong endpoint, wrong environment, wrong cert chain).
- Certificate changes are high-blast-radius.
- You need policy gates + approvals + deterministic execution.
Autom Mate’s role is to keep AI (or humans) from “winging it,” by enforcing a governed workflow every time.
Discussion questions
- For cert renewals, what do you treat as “standard change” vs “normal change” in your org?
- Where do you want the approval to happen: inside ITSM, or in Teams with ITSM synced for audit?