Django AI Integration: Complete Integration Guide

Adding AI capabilities to Django web applications


Django teams adding AI face one architectural fact before any code: Django's request cycle is traditionally synchronous, and LLM calls take seconds. Every integration decision flows from where you put that wait. The three viable placements: a Celery task (the Django-idiomatic default), an async view streaming tokens (for chat UX), or — only for fast operations — inline in the request. This guide covers all three plus pgvector for RAG on your existing Postgres.

Placement 1: Celery — the Django answer for AI jobs

LLM work that doesn't need a live token stream (summarize on save, classify a submission, generate a draft) belongs in the task queue you likely already run:

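python
# tasks.py
from celery import shared_task
from openai import APIConnectionError, InternalServerError, OpenAI, RateLimitError
# Retry only transient failures (429s, network errors/timeouts, 5xx). retry_backoff applies
# to autoretry_for, so waits grow up to 1s, 2s, 4s between attempts (randomized by jitter).
@shared_task(autoretry_for=(RateLimitError, APIConnectionError, InternalServerError),
             retry_backoff=True, max_retries=3)
def summarize_article(article_id):
    from .models import Article   # local import avoids a circular import with models.py
    article = Article.objects.get(pk=article_id)
    resp = OpenAI().chat.completions.create(   # reads OPENAI_API_KEY from the environment
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': f'Summarize in 3 bullets:\n{article.body}'}],
    )
    article.summary = resp.choices[0].message.content
    article.save(update_fields=['summary'])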

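python
# views.py: the request returns immediately; the UI shows "generating…" and polls or gets a push
from django.http import JsonResponse
from django.views.decorators.http import require_POST
from .tasks import summarize_article
@require_POST   # enqueueing work is a side effect, so don't allow it on GET
def request_summary(request, pk):
    summarize_article.delay(pk)
    return JsonResponse({'status': 'queued'}, status=202)   # 202 Accepted: work is pending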

This keeps gunicorn workers free (a synchronous worker blocked for 8 seconds on OpenAI is a worker not serving requests, and under load that becomes an outage), gives you retries with backoff for free, and leaves an audit trail. Make it the default unless you specifically need streaming.
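Celery's documentation describes what retry_backoff=True schedules for retries triggered through autoretry_for: the first retry waits 1 second, each later one doubles, the delay is capped by retry_backoff_max (600 seconds by default), and with retry_jitter on (also the default) the actual wait is a random value between zero and that delay. The stdlib-only sketch below reproduces that schedule for illustration; it is not Celery's own code, and `celery_style_backoff` is a hypothetical name.

```python
import random


def celery_style_backoff(retries, factor=1, cap=600, jitter=True, rng=random):
    """Delays (seconds) before each retry: exponential backoff with optional full jitter."""
    delays = []
    for n in range(retries):
        delay = min(cap, factor * 2 ** n)  # 1, 2, 4, 8, ... capped at `cap`
        if jitter:
            delay = rng.randint(0, delay)  # full jitter: anywhere from 0 up to the delay
        delays.append(delay)
    return delays


print(celery_style_backoff(3, jitter=False))       # [1, 2, 4]
print(celery_style_backoff(12, jitter=False)[-1])  # 600
```

Jitter is the part that matters for rate limits: without it, every task that hit the same 429 retries in lockstep and trips the limit again.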

Placement 2: Async views + SSE for chat UX

Django has supported async views since 3.1; under an ASGI server (uvicorn or daphne) you can stream tokens without Celery:

python
# views.py (async view; StreamingHttpResponse over an async generator needs Django 4.2+)
import json
from django.http import StreamingHttpResponse
from openai import AsyncOpenAI
client = AsyncOpenAI()   # one shared client; reads OPENAI_API_KEY from the environment
async def chat_stream(request):
    prompt = json.loads(request.body)['prompt']
    async def gen():
        stream = await client.chat.completions.create(
            model='gpt-4o-mini',
            messages=[{'role': 'user', 'content': prompt}],
            stream=True,
        )
        async for chunk in stream:
            # Some chunks carry no choices (e.g. a trailing usage chunk), so check first
            if chunk.choices and (token := chunk.choices[0].delta.content):
                yield f'data: {json.dumps({"token": token})}\n\n'
        yield 'data: [DONE]\n\n'
    resp = StreamingHttpResponse(gen(), content_type='text/event-stream')
    resp['Cache-Control'] = 'no-cache'
    resp['X-Accel-Buffering'] = 'no'   # nginx: don't buffer
    return resp

Requirements that trip people up: run under ASGI (uvicorn project.asgi:application), because streaming through WSGI ties up a worker per connected client. Use AsyncOpenAI, because a sync client in an async view stalls the event loop (see sync vs async LLM calls). Check that every decorator and middleware around the view supports async: sync-only middleware makes Django adapt each request through a thread, and decorators such as @login_required only accept async views in recent Django releases. ORM calls inside async code need await Model.objects.aget(...) or sync_to_async. Client-side fetch/parse code is identical to the FastAPI streaming recipe.
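The framing gen() emits and the client-side parsing are two halves of one format, so it helps to see them together. Below is a stdlib-only sketch, with a hypothetical fake_llm() generator standing in for the OpenAI stream: sse_frames reproduces the view's framing and parse_sse is a minimal reader for it. It ignores SSE features the view doesn't use (multi-line data: fields, event: names, ids); a browser would read the POST response with fetch rather than this function.

```python
import asyncio
import json


async def sse_frames(tokens):
    """Frame an async stream of text deltas as SSE events, the same format gen() emits."""
    async for token in tokens:
        if token:  # skip empty deltas, as the view does
            # json.dumps escapes newlines, so a token can never end an event early
            yield f'data: {json.dumps({"token": token})}\n\n'
    yield 'data: [DONE]\n\n'


def parse_sse(body):
    """Rebuild the full reply from a raw SSE body, stopping at the [DONE] sentinel."""
    parts = []
    for event in body.split('\n\n'):
        if not event.startswith('data: '):
            continue  # trailing empty segment, comments, keep-alives
        payload = event[len('data: '):]
        if payload == '[DONE]':
            break
        parts.append(json.loads(payload)['token'])
    return ''.join(parts)


async def fake_llm():
    """Hypothetical stand-in for the OpenAI stream: yields deltas, including an empty one."""
    for delta in ['Hel', '', 'lo\n', 'world']:
        await asyncio.sleep(0)
        yield delta


async def collect():
    return ''.join([frame async for frame in sse_frames(fake_llm())])


body = asyncio.run(collect())
print(repr(parse_sse(body)))  # 'Hello\nworld'
```

JSON-encoding each token is the design choice doing the work here: a raw token containing a blank line would otherwise terminate the event mid-reply.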

RAG on the Postgres you already have: pgvector

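Django shops rarely need a dedicated vector DB to start: the pgvector extension turns your existing Postgres into one, and the pgvector Python package's Django integration (pgvector.django) supplies model fields, indexes and distance functions that work with migrations and the ORM. Enable the extension once with a migration containing its VectorExtension() operation.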

python
# models.py
from django.db import models
from pgvector.django import HnswIndex, VectorField
class DocChunk(models.Model):
    document = models.ForeignKey('Document', on_delete=models.CASCADE)
    text = models.TextField()
    embedding = VectorField(dimensions=1536)   # match your embedding model's output size
    class Meta:
        indexes = [HnswIndex(name='chunk_emb_idx', fields=['embedding'],
                             opclasses=['vector_cosine_ops'])]   # cosine opclass, to match CosineDistance
# query: query_embedding is the user's question embedded with the same model (1536 floats)
from pgvector.django import CosineDistance
hits = (DocChunk.objects
        .annotate(dist=CosineDistance('embedding', query_embedding))
        .order_by('dist')[:5])   # 5 nearest chunks, smallest distance first
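CosineDistance compiles to pgvector's `<=>` operator, which returns 1 minus cosine similarity, so ordering ascending puts the closest chunks first. A stdlib-only sketch of the same ranking, computed exactly in memory, is below; the toy 3-dimensional vectors stand in for real 1536-dimensional embeddings, and the function names are hypothetical. The HNSW index is approximate, so on a large table its top 5 can occasionally differ from this exact ranking.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest(chunks, query_embedding, k=5):
    """Exact top-k by cosine distance; chunks is a list of (text, embedding) pairs."""
    ranked = sorted(chunks, key=lambda c: cosine_distance(c[1], query_embedding))
    return [text for text, _ in ranked[:k]]


chunks = [
    ('refund policy', [0.9, 0.1, 0.0]),
    ('shipping times', [0.1, 0.9, 0.0]),
    ('return window', [0.8, 0.2, 0.1]),
]
print(nearest(chunks, [1.0, 0.0, 0.0], k=2))  # ['refund policy', 'return window']
```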

Embedding generation belongs in Celery (chunk → embed → save, batched). One transactional database, one backup story, and SQL filters compose with vector search. Scale ceiling and when to graduate to a dedicated store: pgvector guide.
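The chunk, embed, save pipeline can be sketched without Django or network access by injecting the embedding call. This assumes fixed-size character windows with overlap (one simple chunking strategy among many) and a hypothetical `embed_batch` callable; in a real Celery task it would wrap `OpenAI().embeddings.create(model=..., input=batch)`, and the resulting pairs would become `DocChunk` rows saved with `bulk_create`.

```python
def chunk_text(text, size=1000, overlap=200):
    """Split text into overlapping character windows (a simple, assumed strategy)."""
    if overlap >= size:
        raise ValueError('overlap must be smaller than size')
    if not text:
        return []
    step = size - overlap
    # The last window starts before len(text) - overlap, so the tail is always covered
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def batched(items, n):
    """Yield successive lists of at most n items (itertools.batched needs Python 3.12)."""
    for i in range(0, len(items), n):
        yield items[i:i + n]


def embed_document(text, embed_batch, batch_size=64):
    """Return (chunk, embedding) pairs, calling embed_batch once per batch of chunks."""
    rows = []
    for batch in batched(chunk_text(text), batch_size):
        vectors = embed_batch(batch)  # one API round-trip per batch, not per chunk
        rows.extend(zip(batch, vectors))
    return rows


# Usage with a fake embedder that records batch sizes instead of calling an API
calls = []


def fake_embed(batch):
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]


rows = embed_document('x' * 2500, fake_embed, batch_size=2)
print(len(rows), calls)  # 3 [2, 1]
```

Batching matters more than it looks: embedding endpoints accept a list of inputs per request, so a 500-chunk document costs a handful of round-trips instead of 500.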

Django-specific production notes

Settings discipline: load keys from the environment or a secrets manager, never as settings.py literals; pin model names in settings (read from environment variables) so model and prompt-version changes are config, not code deploys.
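A minimal settings.py fragment along those lines, assuming environment variables as the source; the helper and the names `AI_MODELS`, `AI_MODEL_SUMMARIZE` and `AI_MODEL_CHAT` are hypothetical. The OpenAI SDK's `OpenAI()` client reads OPENAI_API_KEY from the environment by default, so the key itself never needs to pass through settings.

```python
# settings.py (fragment): helper and setting names are hypothetical
import os


def env(name, default=None):
    """Read a setting from the environment; fail at startup if it is required and missing."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f'missing required environment variable {name}')
    return value


# Model names are config: change the env var and restart, no code change needed.
AI_MODELS = {
    'summarize': env('AI_MODEL_SUMMARIZE', 'gpt-4o-mini'),
    'chat': env('AI_MODEL_CHAT', 'gpt-4o-mini'),
}
# Call sites then use model=settings.AI_MODELS['summarize'] instead of a literal.
```

Inside a real Django project you might raise django.core.exceptions.ImproperlyConfigured instead of RuntimeError; the point is to fail at startup rather than on the first AI request.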

Cost guardrails: per-user rate limits on AI endpoints (django-ratelimit), and log token usage per request into your DB — you have an ORM, use it for the audit table.
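A sketch of the audit-row side, assuming an audit model with user, endpoint, model and token-count fields (the helper, model and field names here are hypothetical). OpenAI chat completion responses carry `model` plus `usage.prompt_tokens`, `usage.completion_tokens` and `usage.total_tokens`; the helper copies them into a plain dict that a view or task could pass to something like `AIUsage.objects.create(**row)`, or `await AIUsage.objects.acreate(**row)` in an async view. A stub object stands in for the real response so the sketch runs offline.

```python
from types import SimpleNamespace


def usage_row(user_id, endpoint, resp):
    """Flatten a chat-completion response's token usage into a dict for an audit table."""
    usage = resp.usage
    return {
        'user_id': user_id,
        'endpoint': endpoint,
        'model': resp.model,  # the model that actually served the request
        'prompt_tokens': usage.prompt_tokens,
        'completion_tokens': usage.completion_tokens,
        'total_tokens': usage.total_tokens,
    }


# Stub shaped like the SDK's response (only the attributes the helper reads)
resp = SimpleNamespace(
    model='gpt-4o-mini',
    usage=SimpleNamespace(prompt_tokens=120, completion_tokens=45, total_tokens=165),
)
print(usage_row(7, 'summarize', resp)['total_tokens'])  # 165
```

For the streaming view, the API only reports usage if the request passes stream_options={'include_usage': True}; it then arrives on a final chunk whose choices list is empty.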

Admin as AI ops console: register the audit/jobs models — the Django admin is a free dashboard for inspecting failed generations and spend per user.

Cache repeated generations with the framework cache keyed on a hash of (model + template version + input) — Redis backend recommended.
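A minimal sketch of that key, assuming SHA-256 over a JSON-encoded list; the function name and the `ai:` prefix are hypothetical. Encoding the parts as a JSON list rather than concatenating them keeps ('ab', 'c') and ('a', 'bc') from colliding, and the fixed-length hex digest stays well under the 250-character key length Django warns about for memcached compatibility.

```python
import hashlib
import json


def ai_cache_key(model, template_version, user_input):
    """Stable cache key for one (model, prompt template version, input) combination."""
    # A JSON list keeps the field boundaries explicit, unlike plain string concatenation
    payload = json.dumps([model, template_version, user_input], ensure_ascii=False)
    return 'ai:' + hashlib.sha256(payload.encode('utf-8')).hexdigest()


print(len(ai_cache_key('gpt-4o-mini', 'v1', 'hello')))  # 67
```

With Django's cache framework this slots into cache.get_or_set(key, compute, timeout), where compute is a zero-argument callable that makes the LLM call only on a miss; bumping the template version invalidates every cached generation for that prompt at once.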

FAQ

Should I switch to FastAPI for the AI parts? If the AI surface is a small streaming API and the rest is classic Django, a sidecar FastAPI service is clean — but async Django views close most of the gap now; don't split the stack just for one endpoint.

Channels/WebSockets? Only for bidirectional needs (collaborative sessions, voice). One-way token flow is SSE territory — see streaming vs polling.

LangChain inside Django? Fine where it earns its keep (complex RAG/agents — LangChain vs LlamaIndex); for straightforward calls the provider SDK plus Celery is less machinery.

*Last updated: June 2026.*
