AI SaaS Architecture Patterns

Common architecture patterns for AI SaaS applications

AI SaaS products share a recognizable architecture once you've seen a few — and the failures are equally patterned: margin death by token costs, one tenant's data in another's answer, a provider outage taking the product down. This guide catalogs the patterns that hold up, layer by layer.

The reference architecture

Clients (web/API)
   │
API layer ──── auth, rate limits, usage metering (per-tenant!)
   │
Orchestration service ── prompts, routing, tools, guardrails
   │                       │
AI gateway ── providers    ├── Retrieval layer (per-tenant stores)
(fallback, cost routing)   ├── Job queue (async/batch AI work)
                           └── Eval + observability spine

Three structural decisions define the rest:

1. The gateway seam. All model traffic flows through one layer that owns provider routing, fallback, caching, and *per-tenant cost attribution*. Without this seam, multi-provider support, price negotiation, and answering "which customer is burning our margin?" each require a rewrite.
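As a minimal sketch of the gateway seam, the class below tries providers in fallback order and attributes the cost of each successful call to the calling tenant. Everything named here (`Gateway`, `Completion`, the provider callables, and the per-1K-token prices) is a hypothetical stand-in for illustration, not a real provider SDK or real pricing.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical USD prices per 1K tokens; real prices vary by provider and model.
PRICE_PER_1K_TOKENS = {"primary": 0.010, "backup": 0.012}

# A provider is any callable that takes a prompt and returns (text, tokens_used).
Provider = Callable[[str], tuple]


@dataclass
class Completion:
    text: str
    tokens: int
    provider: str


@dataclass
class Gateway:
    providers: dict  # name -> Provider; insertion order is the fallback order
    cost_by_tenant: dict = field(default_factory=lambda: defaultdict(float))

    def complete(self, tenant_id: str, prompt: str) -> Completion:
        last_error: Exception | None = None
        for name, call in self.providers.items():
            try:
                text, tokens = call(prompt)
            except Exception as exc:  # provider failed: fall through to the next one
                last_error = exc
                continue
            # Attribute the spend to the tenant at the single place all traffic passes.
            self.cost_by_tenant[tenant_id] += tokens / 1000 * PRICE_PER_1K_TOKENS[name]
            return Completion(text, tokens, name)
        raise RuntimeError("all providers failed") from last_error


# Usage with fake providers: the primary is down, so the backup serves the call.
def flaky_primary(prompt: str):
    raise TimeoutError("primary down")


def steady_backup(prompt: str):
    return f"echo: {prompt}", 500


gateway = Gateway({"primary": flaky_primary, "backup": steady_backup})
result = gateway.complete("acme", "hello")
```

Because the metering lives inside `complete`, a question like "what did tenant acme cost us this month?" becomes a lookup in `cost_by_tenant` rather than a log-mining exercise. A production gateway would persist that ledger and add caching and timeouts, which are omitted here.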

2. Sync vs async split. Interactive features stream over server-sent events (SSE); everything that can wait goes through a queue, ideally onto provider batch APIs where those are offered at 50% off. The classic startup mistake is running bulk work through the interactive path: at that discount it costs 2× as much and competes with interactive requests for latency. Workers follow the same discipline as webhook processors: idempotent, backpressured, shutdown-safe.
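The three worker properties can be illustrated with the standard library alone. This is a sketch under stated assumptions: the in-memory `processed_ids` set stands in for a durable idempotency store, `handle` is a placeholder for the actual batch AI call, and the job tuples are invented for the example.

```python
import queue
import threading

# Bounded queue: producers block on put() when it is full, which is the backpressure.
jobs: queue.Queue = queue.Queue(maxsize=100)
processed_ids = set()   # stand-in for a durable idempotency store (assumption)
results = []
STOP = object()         # sentinel that asks the worker to exit cleanly


def handle(job_id: str, payload: str) -> None:
    """Process a job at most once, even if the queue redelivers it."""
    if job_id in processed_ids:
        return
    results.append(payload.upper())  # placeholder for the real batch AI call
    processed_ids.add(job_id)


def worker() -> None:
    while True:
        item = jobs.get()
        try:
            if item is STOP:
                return  # everything queued before STOP has already been handled
            handle(*item)
        finally:
            jobs.task_done()


thread = threading.Thread(target=worker)
thread.start()

# "j1" is enqueued twice to simulate a redelivery.
for job in [("j1", "embed doc a"), ("j2", "embed doc b"), ("j1", "embed doc a")]:
    jobs.put(job)

jobs.put(STOP)
thread.join()
```

The duplicate `j1` is skipped, so `results` holds two entries rather than three. In a real deployment the idempotency check and the write would need to be atomic against a shared store, which a plain in-process set does not give you.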

3. The config/registry seam. Prompts, model choices, and parameters resolve from a versioned registry, not code — prompt iteration decouples from deploys, rollback is a pointer flip, and every response logs its config version.
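One way to picture "rollback is a pointer flip" is an append-only version history with a separate pointer to the active version. The sketch below is in-memory and hypothetical throughout: `Registry`, `PromptConfig`, the feature name, and the model names are illustrative, and a real registry would sit in a database or config service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptConfig:
    version: int
    model: str
    template: str
    temperature: float


class Registry:
    def __init__(self) -> None:
        self._versions = {}  # feature -> list of PromptConfig, append-only
        self._active = {}    # feature -> active version number (the "pointer")

    def publish(self, feature: str, model: str, template: str,
                temperature: float = 0.0) -> PromptConfig:
        history = self._versions.setdefault(feature, [])
        config = PromptConfig(len(history) + 1, model, template, temperature)
        history.append(config)
        self._active[feature] = config.version
        return config

    def rollback(self, feature: str, version: int) -> None:
        if not 1 <= version <= len(self._versions.get(feature, [])):
            raise ValueError(f"unknown version {version} for {feature!r}")
        self._active[feature] = version  # history is untouched; only the pointer moves

    def resolve(self, feature: str) -> PromptConfig:
        return self._versions[feature][self._active[feature] - 1]


registry = Registry()
registry.publish("summarize", "mini-model", "Summarize: {text}")
registry.publish("summarize", "frontier-model", "Summarize carefully: {text}")
active_after_publish = registry.resolve("summarize")

registry.rollback("summarize", 1)
active_after_rollback = registry.resolve("summarize")

# Each response would log the resolved version alongside its output.
log_line = {"feature": "summarize", "config_version": active_after_rollback.version}
```

Since request handlers call `resolve` at request time, neither publishing nor rolling back requires a deploy.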

Multi-tenancy: the part you cannot retrofit

Tenant isolation in AI SaaS has three layers, each with a failure story behind it, plus a set of enterprise-tier additions:

  • Retrieval isolation: tenant_id filters on every vector query at minimum (pgvector row-level patterns); separate collections/schemas for enterprise tiers; *test for cross-tenant leakage explicitly* — it's the AI-SaaS equivalent of an auth bypass.
  • Context isolation: prompts assembled per-request from per-tenant config; never cache assembled prompts across tenants; tenant data never enters shared few-shot examples.
  • Cost isolation: per-tenant token metering at the gateway — for billing, for abuse caps, and for noticing that one tenant's workload is 40% of your inference bill (it always is).
  • Enterprise tier additions: tenant-pinned data residency, zero-retention provider routes, DPAs, and bring-your-own-key (their API key, your software) — architecturally cheap *if* the gateway seam exists.
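The retrieval-isolation bullet, including its explicit leakage test, can be sketched with a toy in-memory store. This is an illustration under assumptions, not pgvector: `TenantScopedStore`, `Chunk`, and the two-dimensional embeddings are invented for the example, and a real system would apply the same tenant filter in the database query itself.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    embedding: tuple


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantScopedStore:
    def __init__(self) -> None:
        self._chunks = []

    def add(self, chunk: Chunk) -> None:
        self._chunks.append(chunk)

    def search(self, tenant_id: str, query, k: int = 3):
        # tenant_id is a required argument: there is no unscoped search to call by accident.
        if not tenant_id:
            raise ValueError("tenant_id is required for every query")
        candidates = [c for c in self._chunks if c.tenant_id == tenant_id]
        candidates.sort(key=lambda c: cosine(c.embedding, query), reverse=True)
        return candidates[:k]


store = TenantScopedStore()
store.add(Chunk("acme", "acme pricing notes", (0.6, 0.8)))
store.add(Chunk("globex", "globex secret roadmap", (1.0, 0.0)))

# Leakage test: the query is an exact match for globex's chunk, so an
# unfiltered search would rank it first. Acme must still never see it.
hits = store.search("acme", (1.0, 0.0), k=10)
leaked = [c for c in hits if c.tenant_id != "acme"]
```

The adversarial choice of query is the point of the test: it picks the case where ranking alone would surface the other tenant's data, so a passing result demonstrates the filter rather than luck.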

The margin layer (what makes AI SaaS economics work)

Token costs are a COGS line that classic SaaS doesn't have; the patterns that protect gross margin, in impact order:

  • Tier routing: mini-tier models for the 70% of calls that are classification/extraction/simple-chat; frontier only where evals prove the delta matters.
  • Caching: response caching on repeated questions; cache-friendly prompt structure (stable prefix first) for provider-side prompt-caching discounts.
  • Right-sized context: retrieval top-k discipline and history summarization — context bloat is the silent margin killer.
  • Batching: run batchable workloads (embeddings, enrichment, reports) as batch jobs rather than through the interactive path.
  • Usage-based pricing alignment: if your costs scale per-token but you price per-seat, one power user inverts your margin — meter and tier the product accordingly.
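The arithmetic behind two of these patterns is short enough to write down. All numbers below are hypothetical (a $30 seat, $0.01 per 1K frontier tokens, $0.001 per 1K mini-tier tokens); only the 70% mini-tier share comes from the list above.

```python
def gross_margin(seat_price: float, tokens_used: int, cost_per_1k: float) -> float:
    """Gross margin on one seat, counting token spend as the only cost of goods sold."""
    token_cost = tokens_used / 1000 * cost_per_1k
    return (seat_price - token_cost) / seat_price


def blended_cost_per_1k(mini_share: float, mini_price: float, frontier_price: float) -> float:
    """Average price per 1K tokens when a share of traffic is routed to the mini tier."""
    return mini_share * mini_price + (1 - mini_share) * frontier_price


SEAT_PRICE = 30.0       # hypothetical monthly price per seat, USD
FRONTIER_PRICE = 0.010  # hypothetical USD per 1K tokens
MINI_PRICE = 0.001      # hypothetical USD per 1K tokens

# Per-seat pricing against per-token costs: a typical user versus a power user.
typical_margin = gross_margin(SEAT_PRICE, 300_000, FRONTIER_PRICE)
power_user_margin = gross_margin(SEAT_PRICE, 5_000_000, FRONTIER_PRICE)

# Tier routing: 70% of traffic on the mini tier lowers the blended token price.
routed_price = blended_cost_per_1k(0.70, MINI_PRICE, FRONTIER_PRICE)
power_user_margin_routed = gross_margin(SEAT_PRICE, 5_000_000, routed_price)
```

With these inputs the typical seat earns a margin of about 0.90, while the power user costs $50 against a $30 seat, a margin of about -0.67. Routing 70% of that user's traffic to the mini tier brings the blended price to $0.0037 per 1K tokens and the margin back to roughly 0.38, which is why routing sits at the top of the list and why metering still matters after it.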
The trust layer

  • Guardrails at boundaries: schema validation on all outputs, injection-aware input handling, approval gates on consequential actions.
  • The eval spine: per-feature eval sets gating prompt/model changes; canary releases for anything user-visible.
  • Observability: traces, cost, and quality metrics per tenant per feature. "Which tenants saw degraded answers yesterday?" must be a query, not an investigation.
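The guardrail bullet, schema validation on outputs plus approval gates on consequential actions, might look like the following in its simplest form. The action names, the `requires_approval` flag, and the expected JSON shape are assumptions made for this sketch; a production system would more likely use a schema library than hand-written checks.

```python
import json

ALLOWED_ACTIONS = {"reply", "refund", "escalate"}  # hypothetical action vocabulary
NEEDS_APPROVAL = {"refund"}                        # consequential actions gated on a human


def parse_model_output(raw: str) -> dict:
    """Validate raw model output against the expected shape before anything acts on it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")

    action = data.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unexpected action: {action!r}")
    message = data.get("message")
    if not isinstance(message, str):
        raise ValueError("message must be a string")

    # Unknown keys are dropped: only validated fields cross the boundary.
    return {
        "action": action,
        "message": message,
        "requires_approval": action in NEEDS_APPROVAL,
    }


reply = parse_model_output('{"action": "reply", "message": "Done.", "extra": 1}')
refund = parse_model_output('{"action": "refund", "message": "Refunding $20."}')
```

Rejecting unknown actions outright, rather than passing them through, means an injected instruction that coaxes the model into emitting a new action name fails at the boundary instead of reaching a tool.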
Build order for a new AI SaaS

Week 1: skeleton + gateway seam + one feature streaming end-to-end. Weeks 2-3: registry, eval set, cost metering (before customers, not after). Then features. The seams cost days early and rewrites late; that's the whole pattern catalog in one sentence.


*Last updated: June 2026.*