Stop duplicate subscription entitlements when payment webhooks retry

Payment webhooks will retry. If your handler is slow, returns a 5xx, or hits a transient DB outage, you can receive the same event (payment succeeded, invoice paid, charge refunded) multiple times. If your downstream actions aren’t idempotent, you end up with:

  • Double-granted entitlements (two seats, two plan upgrades, two credits)
  • Refunds that don’t revoke access (or revoke twice)
  • Finance ops spending days reconciling “why does the ledger say X but the product says Y?”

This is a classic “AI can spot it, but AI shouldn’t do it” problem: an LLM can help classify messy cases, but the actual financial/product actions must be deterministic, approval-gated where needed, and fully auditable.

Below is a workflow pattern in which AI suggests and Autom Mate executes under control.


End-to-end governed workflow (Autom Mate)

1) Trigger

  • Trigger: Incoming payment provider webhook (e.g., invoice.paid, charge.refunded, payment_failed) via REST/HTTP/Webhook action into Autom Mate.
  • Store the raw payload immediately (immutable) for later audit/replay.

2) Validation (before any side effects)

  • Signature / authenticity check: Validate the webhook signature (provider-specific) using the PYTHON library.
  • Schema validation: Ensure required fields exist (event id, customer id, amount, currency, status) using Condition modules.
  • Idempotency gate (hard stop):
    • Look up event_id (and optionally event_type) in an “event ledger” table.
    • If already processed → exit (return 200 OK) with “duplicate” decision logged.
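    • If not processed → create a “processing” record atomically (for example, an insert guarded by a unique constraint on event_id), so two concurrent retries cannot both pass the gate.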
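
As an illustration of the signature and schema checks above, here is a minimal sketch in Python. It assumes the provider signs the raw request body with HMAC-SHA256 and sends the hex digest in a header; real providers differ (some prepend a timestamp, some use other encodings), so treat the scheme, the function names, and the field names as assumptions and follow your provider's documentation.

```python
import hashlib
import hmac

# Field names are illustrative; map them to your provider's payload.
REQUIRED_FIELDS = ("event_id", "customer_id", "amount", "currency", "status")


def verify_signature(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Return True if signature_hex is the HMAC-SHA256 of raw_body under secret.

    Assumption: hex-encoded HMAC-SHA256 over the unmodified raw body.
    """
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest compares in constant time, so the check does not leak
    # how many leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))


def missing_fields(event: dict) -> list:
    """Return the required fields that are absent or empty in the parsed event."""
    return [name for name in REQUIRED_FIELDS if event.get(name) in (None, "")]
```

Verify against the raw bytes exactly as received; re-serializing the parsed JSON first can change whitespace or key order and make a valid signature fail.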

(Why this matters: payment APIs and webhook docs commonly recommend idempotency keys / dedupe by event id to prevent duplicate transactions and duplicate downstream effects. (cashfree.com))
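
To make the idempotency gate concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name event_ledger, the status values, and the function names are assumptions for illustration; in a real deployment the ledger would live in your own database. The key idea is that the primary key on event_id lets the database decide which delivery wins.

```python
import sqlite3


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and create if needed) the hypothetical event ledger table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS event_ledger ("
        " event_id TEXT PRIMARY KEY,"
        " event_type TEXT NOT NULL,"
        " status TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def claim_event(conn: sqlite3.Connection, event_id: str, event_type: str) -> str:
    """Return 'new', 'replay' (a previously failed run), or 'duplicate'."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO event_ledger (event_id, event_type, status)"
            " VALUES (?, ?, 'processing')",
            (event_id, event_type),
        )
        if cur.rowcount == 1:
            return "new"
        # The row already exists; only a failed run may be claimed again.
        cur = conn.execute(
            "UPDATE event_ledger SET status = 'processing'"
            " WHERE event_id = ? AND status = 'failed'",
            (event_id,),
        )
        return "replay" if cur.rowcount == 1 else "duplicate"


def mark(conn: sqlite3.Connection, event_id: str, status: str) -> None:
    """Record the outcome of a run, e.g. 'processed' or 'failed'."""
    with conn:
        conn.execute(
            "UPDATE event_ledger SET status = ? WHERE event_id = ?",
            (status, event_id),
        )
```

A handler would call claim_event before any side effect and return 200 immediately on "duplicate". Checking with a SELECT and then inserting in a separate step would leave a window in which two concurrent retries both see "not processed".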

3) AI triage (advisory only)

  • Use an LLM step to suggest a classification and next action:
    • “Safe auto-grant” vs “needs review” vs “block/hold”
    • Confidence + rationale
    • Risk flags (amount unusually high, customer recently refunded, multiple retries, etc.)

Important: AI output is never used as the final executor. It only proposes.
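
One way to keep the model advisory is to treat its reply as untrusted input. The sketch below assumes the LLM step is asked to return JSON with classification, confidence, rationale, and risk_flags keys; those key names and the three labels are assumptions for illustration. Anything that does not parse or falls outside the allowed values degrades to "needs_review" rather than to an action.

```python
import json
from dataclasses import dataclass

# Hypothetical labels mirroring the three outcomes listed above.
ALLOWED = {"safe_auto_grant", "needs_review", "block_hold"}


@dataclass(frozen=True)
class Suggestion:
    classification: str
    confidence: float
    rationale: str
    risk_flags: tuple = ()


# Used whenever the model output cannot be trusted as-is.
FALLBACK = Suggestion("needs_review", 0.0, "unparseable or invalid model output")


def parse_suggestion(raw: str) -> Suggestion:
    """Parse the model's reply; never raise, never invent an action."""
    try:
        data = json.loads(raw)
        classification = data["classification"]
        confidence = float(data["confidence"])
        if classification not in ALLOWED or not 0.0 <= confidence <= 1.0:
            return FALLBACK
        return Suggestion(
            classification,
            confidence,
            str(data.get("rationale", "")),
            tuple(str(flag) for flag in data.get("risk_flags", [])),
        )
    except (ValueError, KeyError, TypeError):
        # Covers malformed JSON, missing keys, and wrong value types.
        return FALLBACK
```

The returned Suggestion is only data for the approval step and the audit log; nothing in it triggers a grant or a refund.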

4) Approvals (human or policy-based)

  • Policy-based auto-approval for low-risk cases:
    • e.g., amount < $100, known customer, no prior disputes, event is terminal
  • Human approval required for high-risk cases:
    • e.g., refunds after fulfillment, large upgrades, repeated retries, mismatched currency
  • Approval request sent to a finance/product ops channel via Autom Mate library (if your environment has Teams/Slack installed) or REST/HTTP/Webhook action to your chat/ITSM tool.
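
The approval policy can be written as plain, ordered rules so that every decision maps to a rule id you can log. This is a sketch under assumptions: the Case fields, the $100 limit, the retry limit, and the rule ids are illustrative, and it routes every refund to a human, which is stricter than the "refunds after fulfillment" example above. Note that the AI classification can only escalate a case to review; it can never cause an auto-approval by itself.

```python
from dataclasses import dataclass
from decimal import Decimal

AUTO_APPROVE_LIMIT = Decimal("100")  # illustrative threshold
MAX_RETRIES = 3                      # illustrative threshold


@dataclass(frozen=True)
class Case:
    event_type: str
    amount: Decimal
    known_customer: bool
    prior_disputes: int
    retry_count: int
    currency_matches: bool
    ai_classification: str  # advisory input only


def route(case: Case) -> tuple:
    """Return (decision, rule_id); rules are checked in a fixed order."""
    if case.ai_classification != "safe_auto_grant":
        return ("human_review", "R0_ai_escalated")
    if not case.currency_matches:
        return ("human_review", "R1_currency_mismatch")
    if case.event_type == "charge.refunded":
        return ("human_review", "R2_refund")
    if case.retry_count > MAX_RETRIES:
        return ("human_review", "R3_repeated_retries")
    if case.amount >= AUTO_APPROVE_LIMIT:
        return ("human_review", "R4_amount_over_limit")
    if not case.known_customer or case.prior_disputes > 0:
        return ("human_review", "R5_customer_risk")
    return ("auto_approve", "P1_low_risk")
```

Amounts are held as Decimal rather than float so that threshold comparisons on money are exact.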

5) Deterministic execution (the controlled “do” step)

Once approved (or policy-approved), Autom Mate executes exactly once:

  • Grant / update entitlements in your product system via REST/HTTP/Webhook action
    • Use a deterministic idempotency key like entitlement:{customer_id}:{invoice_id}
  • Post to internal ledger / billing DB via Database library microservice (if available in your deployment) or REST/HTTP/Webhook action to your finance service
  • Update ticket / case (optional) via Autom Mate library (if you use an ITSM tool that’s installed) or REST/HTTP/Webhook action

Autom Mate’s orchestration model is designed for multi-step flows with monitoring and traceability across actions.
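
To show why the deterministic key matters, here is a small sketch. The key format follows the entitlement:{customer_id}:{invoice_id} pattern above; the EntitlementService class is a hypothetical in-memory stand-in for your product system, not an Autom Mate or provider API. A receiver that remembers results by key turns a repeated call into a no-op that returns the original result.

```python
def entitlement_key(customer_id: str, invoice_id: str) -> str:
    """Same inputs always give the same key, so retries collide on purpose."""
    return f"entitlement:{customer_id}:{invoice_id}"


class EntitlementService:
    """Hypothetical in-memory stand-in for the product system."""

    def __init__(self) -> None:
        self._results = {}  # idempotency key -> result of the first call
        self.grants = 0     # number of grants actually performed

    def grant(self, key: str, customer_id: str, plan: str) -> dict:
        if key in self._results:
            # Repeat of an earlier request: return the stored result
            # without granting again.
            return self._results[key]
        self.grants += 1
        result = {"customer_id": customer_id, "plan": plan, "grant_no": self.grants}
        self._results[key] = result
        return result
```

Because the key is derived from business identifiers rather than generated per request, a replayed run produces the same key and cannot double-grant, even if the event ledger check was somehow bypassed.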

6) Logging / audit trail

Log every decision and action:

  • Raw webhook payload hash
  • Signature validation result
  • Idempotency decision (new vs duplicate)
  • AI suggestion + confidence (advisory)
  • Approval identity + timestamp (or policy rule id)
  • Execution steps + responses
  • Autom version executed (for change/audit traceability)

Autom Mate supports execution monitoring and version-aware traceability, which is useful when auditors ask “what logic ran at the time?”
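
As a sketch of what one audit entry might contain, the function below builds a single JSON line from the items listed above. The field names and the function signature are assumptions for illustration, not an Autom Mate log format. It stores a SHA-256 hash of the raw payload so the entry can later be matched against the stored original.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(
    raw_payload: bytes,
    *,
    signature_valid: bool,
    idempotency_decision: str,  # "new" or "duplicate"
    ai_suggestion: dict,        # advisory only
    approved_by: str,           # approver identity or policy rule id
    steps: list,                # executed steps with their responses
    autom_version: str,
) -> str:
    """Return one JSON line describing a single run."""
    record = {
        "payload_sha256": hashlib.sha256(raw_payload).hexdigest(),
        "signature_valid": signature_valid,
        "idempotency_decision": idempotency_decision,
        "ai_suggestion": ai_suggestion,
        "approved_by": approved_by,
        "steps": steps,
        "autom_version": autom_version,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    # sort_keys gives a stable field order, which makes entries easy to diff.
    return json.dumps(record, sort_keys=True)
```

Writing these lines to append-only storage keeps the trail usable as evidence, since earlier entries cannot be edited after the fact.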

7) Exception handling / rollback

  • If entitlement grant succeeds but ledger post fails:
    • Mark the run as “partial”
    • Create a compensating action (e.g., revoke entitlement) only if your policy allows automatic rollback
    • Otherwise route to human approval for rollback
  • If webhook processing fails mid-run:
    • Keep the idempotency record in a “failed” state with error details
    • Allow safe replay (same event id) without double-granting
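
The partial-failure handling above can be sketched as a small function that returns the final run status. The status names, the function names, and the callables passed in are assumptions for illustration. The grant runs first; if the ledger post then fails, the run is either compensated automatically or left as "partial" for a human, depending on policy.

```python
from typing import Callable, Optional


def should_run(status: Optional[str]) -> bool:
    """Only unseen events and previously failed runs may execute."""
    return status is None or status == "failed"


def grant_then_post(
    grant: Callable[[], None],
    post_ledger: Callable[[], None],
    revoke: Callable[[], None],
    auto_rollback_allowed: bool,
) -> str:
    """Run both steps and return the status to store on the ledger record."""
    try:
        grant()
    except Exception:
        return "failed"  # nothing was granted, so a replay is safe
    try:
        post_ledger()
    except Exception:
        if auto_rollback_allowed:
            revoke()     # compensating action permitted by policy
            return "rolled_back"
        return "partial"  # route to human approval for rollback
    return "processed"
```

Replaying a "failed" run is only safe if the grant itself is idempotent, which is what the deterministic key in step 5 provides. A "partial" run is deliberately not replayable here, because a person has to decide whether to roll back or re-post.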

Two mini examples

Example A — Duplicate invoice.paid arrives 3 times

  • Webhook #1: passes signature, not seen before → approved by policy → entitlement granted → ledger posted → mark processed.
  • Webhook #2 and #3: same event_id → Autom Mate exits early, logs “duplicate”, returns 200.

Example B — charge.refunded after fulfillment

  • AI suggests “high risk: refund after fulfillment; needs review.”
  • Autom Mate opens an approval task for finance/product ops.
  • If approved: revoke entitlement + create a credit memo entry (deterministic steps).
  • If rejected: keep entitlement, log decision, attach rationale.

Why AI alone is risky here

  • Webhook payloads can be ambiguous; AI may misread “final” vs “intermediate” states.
  • AI can hallucinate or over-generalize edge cases (partial refunds, multi-invoice customers, proration).
  • Financial/product actions must be idempotent, deterministic, and auditable, especially when retries and replays are expected. (cashfree.com)

Autom Mate’s role is to enforce the guardrails: validations, approvals, deterministic execution, and a complete audit trail.


Discussion questions

  • Where do you draw the line for policy auto-approval vs human approval (amount threshold, customer risk tier, event type)?
  • Do you prefer compensating rollbacks (revoke entitlement) or manual remediation when downstream posting fails?