AI Webhook Processor Template: Starter Guide

Event-driven AI processing template with webhooks


"Webhook in → AI processes → action out" is the backbone shape of a large share of practical AI automation: classify the new support ticket, summarize the merged PR, triage the form submission. This starter guide gives you the production-shaped template (acknowledge fast, verify signatures, process async, stay idempotent), because the naive version, which makes the LLM call inside the webhook handler, fails in four predictable ways.

Why the naive version fails

text
Naive: webhook → [LLM call, 5-30s] → 200 OK

Timeouts: webhook senders expect a response in seconds; an LLM call blows the budget and the sender retries → duplicate processing.

Retry storms: any 5xx triggers sender retries; without idempotency you process N times.

No signature check = anyone who finds the URL feeds garbage (or prompt injection) into your pipeline.

Backpressure: a burst of events (bulk import, incident flood) piles concurrent LLM calls until rate limits or memory give out.

The template

python
# FastAPI + queue worker shape: swap the queue for your stack (SQS/BullMQ/pg-boss)
# already_processed() and queue are placeholders supplied by your storage/queue layer
import hashlib, hmac, json, os
from fastapi import FastAPI, Request, HTTPException

SECRET = os.environ['WEBHOOK_SECRET'].encode()    # shared signing secret, as bytes
app = FastAPI()

@app.post('/webhooks/tickets')
async def receive(request: Request):
    raw = await request.body()
    # 1. Verify signature on the RAW body (parsing first breaks the MAC)
    sig = request.headers.get('x-signature', '')
    expected = hmac.new(SECRET, raw, hashlib.sha256).hexdigest()
    # Compare as bytes: compare_digest rejects non-ASCII str input
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        raise HTTPException(401)
    event = json.loads(raw)
    # 2. Idempotency key from the sender's event ID
    if await already_processed(event['id']):
        return {'status': 'duplicate'}            # 200, so the sender stops retrying
    # 3. Enqueue and ACK immediately: milliseconds, not seconds
    await queue.send(event)
    return {'status': 'queued'}

python
# worker.py: where the AI actually runs
async def handle(event: dict):
    if await already_processed(event['id']):       # check again at execution time
        return
    result = await classify_ticket(event['payload'])   # the LLM step, schema-validated
    await act(result)                                   # route / label / notify
    await mark_processed(event['id'], result)           # provenance row

The four fixes, mapped: ack-then-process kills timeouts; event-ID idempotency (checked at receive AND execute) kills retry duplicates; HMAC on the raw body kills spoofing; the queue gives you backpressure, concurrency caps, and retry-with-backoff for free.
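To make the "HMAC on the raw body" point concrete, here is a minimal round-trip sketch. The `sign` and `verify` helpers, the secret, and the sample body are illustrative assumptions, not part of any sender's actual API; real senders document their own header name and encoding.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, raw_body: bytes) -> str:
    """Sender side: hex HMAC-SHA256 over the exact bytes being sent."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify(secret: bytes, raw_body: bytes, signature: str) -> bool:
    """Receiver side: recompute over the raw bytes, compare in constant time."""
    expected = sign(secret, raw_body)
    return hmac.compare_digest(signature.encode(), expected.encode())


secret = b"example-secret"                       # hypothetical shared secret
raw = b'{"id": "evt_1",   "payload": "hi"}'      # bytes exactly as the sender sent them
sig = sign(secret, raw)

# Parsing and re-serializing keeps the data but changes the bytes (whitespace here),
# so a MAC computed over the re-serialized body no longer matches.
reserialized = json.dumps(json.loads(raw)).encode()
```

With these values, `verify(secret, raw, sig)` succeeds while `verify(secret, reserialized, sig)` fails, which is why the handler reads `request.body()` before calling `json.loads`.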

The AI step itself

Standard discipline applies, condensed:

Structured output with a schema, validated before any action (Zod in TypeScript, Pydantic in Python): webhook payloads are exactly where malformed output must not flow into actions.

Treat payload content as untrusted: it is user-originated text entering a prompt, which makes it a prompt-injection surface. Keep instructions in the system role, pass the payload strictly as data, and gate consequential actions (approval patterns).

Mini-tier model + semaphore concurrency in the worker (async patterns); bursts drain at your pace, not the sender's.

Dead-letter queue for events that fail N times — with the LLM's error attached, so debugging starts with context.
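The points above can be sketched together in one small worker step. This is a hedged illustration using only the standard library: the hand-written `parse_classification` stands in for a Pydantic or Zod schema, and `call_model`, `ALLOWED_LABELS`, `MAX_ATTEMPTS`, `MAX_CONCURRENT`, and the in-memory `dead_letters` list are all hypothetical names chosen for the example.

```python
import asyncio
import json
from dataclasses import dataclass

ALLOWED_LABELS = {"billing", "bug", "other"}   # hypothetical label set
MAX_ATTEMPTS = 3                               # the "N" before dead-lettering
MAX_CONCURRENT = 4                             # cap on in-flight model calls


@dataclass
class Classification:
    label: str
    confidence: float


def parse_classification(raw_output: str) -> Classification:
    """Validate model output against the expected shape; raise ValueError otherwise."""
    data = json.loads(raw_output)              # JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    label, confidence = data.get("label"), data.get("confidence")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError(f"bad confidence: {confidence!r}")
    if not 0 <= confidence <= 1:
        raise ValueError(f"confidence out of range: {confidence!r}")
    return Classification(label, float(confidence))


async def process(event, call_model, semaphore, dead_letters):
    """Run one event with bounded concurrency; dead-letter it after MAX_ATTEMPTS."""
    last_error = ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            async with semaphore:              # at most MAX_CONCURRENT calls at once
                raw_output = await call_model(event["payload"])
            return parse_classification(raw_output)   # validated before any action
        except Exception as exc:               # a real worker would back off here
            last_error = f"attempt {attempt}: {exc}"
    # Keep the last error next to the event so debugging starts with context.
    dead_letters.append({"event": event, "error": last_error})
    return None
```

Create the semaphore once per worker, for example `asyncio.Semaphore(MAX_CONCURRENT)` inside the running event loop, and share it across all `process` calls so a burst drains at the worker's pace.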

Operational notes

Replay capability: store raw events (with TTL) so you can re-run after a prompt fix — pairs with provenance discipline on results.

Graceful shutdown: workers must finish-and-ack or nack-for-redelivery on SIGTERM — the shutdown guide covers it.

Monitor the queue depth, not just errors — a growing backlog is your earliest warning of rate-limit trouble or a poison event.

No-code version: this exact shape is buildable in n8n (Webhook trigger → AI node → actions) for internal-tool-grade loads; graduate to the coded template when volume, latency, or compliance demand it.
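As one illustration of the graceful-shutdown note, the sketch below drains the in-flight event and then stops, leaving anything still queued for redelivery. It assumes an in-process `asyncio.Queue` as a stand-in for a real broker; `worker_loop` and `install_sigterm` are hypothetical names, and with a real queue the ack and nack calls would be the broker's own.

```python
import asyncio
import signal


async def worker_loop(queue: asyncio.Queue, handle, stop: asyncio.Event) -> int:
    """Process events until `stop` is set; never abandon an event mid-flight."""
    done = 0
    while not stop.is_set():
        try:
            event = await asyncio.wait_for(queue.get(), timeout=0.1)
        except asyncio.TimeoutError:
            continue                  # idle tick: re-check the stop flag
        await handle(event)           # finish the in-flight event first
        queue.task_done()             # then ack it
        done += 1
    return done                       # whatever is still queued stays for redelivery


def install_sigterm(stop: asyncio.Event) -> None:
    """Set the stop flag on SIGTERM (Unix event loops only)."""
    asyncio.get_running_loop().add_signal_handler(signal.SIGTERM, stop.set)
```

Call `install_sigterm(stop)` from inside the running loop before starting `worker_loop`. The flag is only checked between events, so a handler that is already running gets to finish and ack.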

FAQ

Why check idempotency twice? Receive-time check stops queue pollution; execute-time check covers the race where two retries both enqueued before the first marked done. Cheap insurance.
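One way to make the execute-time check hold up when two workers pick up the same event at once is to turn it into an atomic claim. This goes a step beyond the template, which shows check-then-mark; the sketch assumes SQLite as the store, and `open_store`, `claim`, and the `processed` table are hypothetical names.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store with one row per claimed event ID."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (event_id TEXT PRIMARY KEY)")
    return conn


def claim(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True only for the first caller to claim this event ID."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed (event_id) VALUES (?)", (event_id,)
        )
    return cur.rowcount == 1   # 0 means the primary key already existed
```

The trade-off: claiming before processing means a crash mid-event leaves the ID claimed, so a real worker would also release or expire stale claims.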

Sync response required by the sender? Some webhooks want a verdict in the response. Options: a fast-path small model with a strict timeout (and a safe default on timeout), or renegotiate to async callback — don't put frontier-model latency inside a webhook SLA.
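The fast-path option could look roughly like this. `call_fast_model`, `SAFE_DEFAULT`, and the two-second budget are assumptions for illustration; pick the default verdict and timeout from what the sender's SLA actually tolerates.

```python
import asyncio

SAFE_DEFAULT = {"verdict": "needs_review"}   # hypothetical fallback verdict


async def verdict_with_deadline(call_fast_model, payload, timeout_s: float = 2.0) -> dict:
    """Return the model's verdict, or the safe default if it misses the deadline."""
    try:
        return await asyncio.wait_for(call_fast_model(payload), timeout=timeout_s)
    except asyncio.TimeoutError:
        return dict(SAFE_DEFAULT)            # copy, so callers cannot mutate the default
```

`asyncio.wait_for` cancels the model call on timeout, so a slow request does not keep running after the response has been sent.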

Batch the LLM calls? If the sender bursts (hundreds of events per minute) and per-event latency doesn't matter, micro-batch in the worker (group 10-20 events per prompt). Fewer, larger calls can be a large cost and throughput win; it is the same batching logic, applied at small scale.
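A minimal sketch of the grouping step, assuming events shaped like the template's (`id` and `payload` keys). `micro_batches` and `format_batch` are hypothetical helpers; the batch is serialized as a JSON data block so the instructions can stay in the system role.

```python
import json
from typing import Dict, Iterable, Iterator, List


def micro_batches(events: Iterable[Dict], size: int = 10) -> Iterator[List[Dict]]:
    """Yield lists of up to `size` events, preserving arrival order."""
    batch: List[Dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:                      # final partial batch
        yield batch


def format_batch(batch: List[Dict]) -> str:
    """Serialize payloads with their event IDs so results can be mapped back."""
    return json.dumps([{"id": e["id"], "text": e["payload"]} for e in batch])
```

Carrying the event ID through the prompt matters: the worker still needs to validate and mark each event individually once the batched response comes back.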

*Last updated: June 2026.*
