
Django AI Integration: Complete Integration Guide

Adding AI capabilities to Django web applications

Django teams adding AI face one architectural fact before any code: Django's request cycle is traditionally synchronous, and LLM calls take seconds. Every integration decision flows from where you put that wait. The three viable placements: a Celery task (the Django-idiomatic default), an async view streaming tokens (for chat UX), or — only for fast operations — inline in the request. This guide covers all three plus pgvector for RAG on your existing Postgres.

Placement 1: Celery — the Django answer for AI jobs

LLM work that doesn't need a live token stream (summarize on save, classify a submission, generate a draft) belongs in the task queue you likely already run:

```python
# tasks.py
from celery import shared_task
from openai import APIConnectionError, OpenAI, RateLimitError


# retry_backoff only takes effect together with autoretry_for; a manual
# self.retry() would wait default_retry_delay instead of backing off.
@shared_task(
    autoretry_for=(RateLimitError, APIConnectionError),  # transient failures only
    retry_backoff=True,
    max_retries=3,
)
def summarize_article(article_id):
    from .models import Article  # local import avoids a models <-> tasks cycle

    article = Article.objects.get(pk=article_id)
    resp = OpenAI().chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user',
                   'content': f'Summarize in 3 bullets:\n{article.body}'}],
    )
    article.summary = resp.choices[0].message.content
    article.save(update_fields=['summary'])
```

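```python
# views.py: the request returns instantly; the UI shows "generating…" and polls or gets a push
from django.http import JsonResponse

from .tasks import summarize_article


def request_summary(request, pk):
    summarize_article.delay(pk)  # enqueue only; the worker makes the LLM call
    return JsonResponse({'status': 'queued'})
```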

This keeps gunicorn workers free (a synchronous worker blocked 8s on OpenAI is a worker not serving requests — under load that's an outage), gives you retries with backoff for free, and leaves an audit trail. Default here unless you specifically need streaming.
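
To make "retries with backoff" concrete, here is a minimal sketch of a capped exponential schedule, roughly the shape Celery applies when retry_backoff is enabled. The function name backoff_delay and its defaults are assumptions for illustration, not Celery's API. Random jitter, which real deployments usually add, is left out so the numbers are reproducible.

```python
def backoff_delay(retries: int, factor: float = 1.0, maximum: float = 600.0) -> float:
    """Seconds to wait before retry number `retries` (0-based), capped at `maximum`."""
    return min(maximum, factor * (2 ** retries))


# Delays before the 1st, 2nd and 3rd retry with the default factor.
schedule = [backoff_delay(n) for n in range(3)]
print(schedule)
```

Doubling the wait on each attempt gives a rate-limited provider time to recover, and the cap keeps a long retry chain from sleeping indefinitely.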

Placement 2: Async views + SSE for chat UX

Django has supported async views since 3.1 and async streaming responses since 4.2; under an ASGI server (uvicorn/daphne) you can stream tokens without Celery:

```python
# views.py
import json

from django.http import StreamingHttpResponse
from openai import AsyncOpenAI

client = AsyncOpenAI()


async def chat_stream(request):
    prompt = json.loads(request.body)['prompt']

    async def gen():
        stream = await client.chat.completions.create(
            model='gpt-4o-mini',
            messages=[{'role': 'user', 'content': prompt}],
            stream=True,
        )
        async for chunk in stream:
            # Some chunks carry no choices or an empty delta; skip those.
            if chunk.choices and (token := chunk.choices[0].delta.content):
                yield f'data: {json.dumps({"token": token})}\n\n'
        yield 'data: [DONE]\n\n'

    # Passing an async generator here needs Django 4.2+ running under ASGI.
    resp = StreamingHttpResponse(gen(), content_type='text/event-stream')
    resp['Cache-Control'] = 'no-cache'
    resp['X-Accel-Buffering'] = 'no'  # nginx: don't buffer
    return resp
```

Requirements that trip people up: run under ASGI (uvicorn project.asgi:application; streaming through WSGI blocks a worker per connected client), and use AsyncOpenAI (a sync client in an async view stalls the event loop; see sync vs async LLM calls). Watch the decorators and middleware around the view too: any sync-only middleware makes Django adapt each request through a thread, which costs performance, and before Django 5.1 @login_required does not support async views. ORM calls inside async code need await Model.objects.aget(...) or sync_to_async. Client-side fetch/parse code is identical to the FastAPI streaming recipe.
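
For a non-browser consumer (a test or another backend service), the event format the view emits can be parsed in a few lines. This is a minimal sketch that assumes the exact framing used above: one `data:` line per event, a JSON object with a `token` key, and a `[DONE]` sentinel. The helper name iter_sse_tokens is hypothetical, and the input here is simulated wire data rather than a live HTTP response.

```python
import json
from typing import Iterable, Iterator


def iter_sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield token strings from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.rstrip('\r\n')
        if not line.startswith('data:'):
            continue  # blank event separators, comments, other SSE fields
        payload = line[len('data:'):].strip()
        if payload == '[DONE]':
            return
        yield json.loads(payload)['token']


# Simulated wire data in the format the streaming view produces.
wire = 'data: {"token": "Hel"}\n\ndata: {"token": "lo"}\n\ndata: [DONE]\n\n'
text = ''.join(iter_sse_tokens(wire.splitlines()))
print(text)
```

Against a real endpoint, the same function can be fed the line iterator of whatever HTTP client is in use, as long as that client does not buffer the whole response first.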

RAG on the Postgres you already have: pgvector

Django shops rarely need a dedicated vector DB to start: pgvector turns your existing Postgres into one, with migrations and ORM integration via the pgvector Python package (imported as pgvector.django):

```python
# models.py
from django.db import models
from pgvector.django import HnswIndex, VectorField

# Enable the extension first: add pgvector.django.VectorExtension() to a migration.


class DocChunk(models.Model):
    document = models.ForeignKey('Document', on_delete=models.CASCADE)
    text = models.TextField()
    embedding = VectorField(dimensions=1536)  # match your embedding model

    class Meta:
        indexes = [
            HnswIndex(name='chunk_emb_idx', fields=['embedding'],
                      opclasses=['vector_cosine_ops']),
        ]


# query (query_embedding is the embedded user question, same dimensions)
from pgvector.django import CosineDistance

hits = (DocChunk.objects
        .annotate(dist=CosineDistance('embedding', query_embedding))
        .order_by('dist')[:5])
```
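
For intuition about what the `dist` annotation orders by: cosine distance is 1 minus cosine similarity, so 0 means same direction and 2 means opposite. The following is a pure-Python sketch of that arithmetic on toy 2-dimensional vectors. The vectors and labels are made up for illustration, and Postgres computes this inside the index rather than in Python.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar in direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


query = [1.0, 0.0]
chunks = {
    'same direction': [2.0, 0.0],
    'orthogonal': [0.0, 3.0],
    'opposite': [-1.0, 0.0],
}
# Ascending distance, like .order_by('dist') in the ORM query.
ranked = sorted(chunks, key=lambda name: cosine_distance(query, chunks[name]))
print(ranked)
```

Vector length does not affect the result, only direction does, which is why the `[2.0, 0.0]` chunk is a perfect match for the `[1.0, 0.0]` query.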

Embedding generation belongs in Celery (chunk → embed → save, batched). One transactional database, one backup story, and SQL filters compose with vector search. Scale ceiling and when to graduate to a dedicated store: pgvector guide.
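
The chunk and batch steps of that pipeline can be sketched without any Django or provider code. This is an illustrative sketch only: the helper names, the character-based windows, and the default sizes are assumptions, and production pipelines often split on tokens or paragraph boundaries instead.

```python
from typing import Iterator, List


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError('overlap must be >= 0 and smaller than size')
    if not text:
        return []
    step = size - overlap
    # Stop once the previous window has already reached the end of the text.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


def batched(items: List[str], n: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most n items (one embeddings request each)."""
    for i in range(0, len(items), n):
        yield items[i:i + n]


chunks = chunk_text('abcdefghij', size=4, overlap=1)
batches = list(batched(chunks, 2))
print(chunks, batches)
```

Inside the Celery task, each batch would go to the embeddings endpoint in a single request, and the resulting rows could be written with one `bulk_create` per batch to keep round trips down.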

Django-specific production notes

  • Settings discipline: keys via environ/secrets manager, never settings.py literals; pin model names in settings so prompt/model changes are config, not deploys.
  • Cost guardrails: per-user rate limits on AI endpoints (django-ratelimit), and log token usage per request into your DB — you have an ORM, use it for the audit table.
  • Admin as AI ops console: register the audit/jobs models — the Django admin is a free dashboard for inspecting failed generations and spend per user.
  • Cache repeated generations with the framework cache keyed on a hash of (model + template version + input) — Redis backend recommended.
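
The cache key from the last bullet can be sketched with the standard library alone. The function name, the `aigen:` prefix, and the example values are hypothetical. Serializing the three parts as a JSON list before hashing is a design choice that avoids collisions plain string concatenation would allow.

```python
import hashlib
import json


def generation_cache_key(model: str, template_version: str, user_input: str) -> str:
    """Deterministic key: same (model, template version, input) gives the same key."""
    payload = json.dumps([model, template_version, user_input], ensure_ascii=False)
    digest = hashlib.sha256(payload.encode('utf-8')).hexdigest()
    return f'aigen:{digest}'


key = generation_cache_key('gpt-4o-mini', 'summary-v2', 'Some article body')
print(key)
```

With Django's cache framework, the key would be used as `cache.get(key)` before calling the model and `cache.set(key, result, timeout=...)` afterwards. Bumping the template version then invalidates old generations without flushing the cache.
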

FAQ

Should I switch to FastAPI for the AI parts? If the AI surface is a small streaming API and the rest is classic Django, a sidecar FastAPI service is clean, but async Django views close most of the gap now; don't split the stack just for one endpoint.

Channels/WebSockets? Only for bidirectional needs (collaborative sessions, voice). One-way token flow is SSE territory; see streaming vs polling.

LangChain inside Django? Fine where it earns its keep (complex RAG or agents; see LangChain vs LlamaIndex). For straightforward calls, the provider SDK plus Celery is less machinery.


    *Last updated: June 2026.*
