Django AI Integration: Complete Integration Guide
Adding AI capabilities to Django web applications
Django teams adding AI face one architectural fact before any code: Django's request cycle is traditionally synchronous, and LLM calls take seconds. Every integration decision flows from where you put that wait. The three viable placements: a Celery task (the Django-idiomatic default), an async view streaming tokens (for chat UX), or — only for fast operations — inline in the request. This guide covers all three plus pgvector for RAG on your existing Postgres.
Placement 1: Celery — the Django answer for AI jobs
LLM work that doesn't need a live token stream (summarize on save, classify a submission, generate a draft) belongs in the task queue you likely already run:
```python
# tasks.py
import openai
from celery import shared_task
from openai import OpenAI

# Transient OpenAI errors worth retrying; any other exception fails the task immediately.
RETRYABLE = (openai.RateLimitError, openai.APIConnectionError, openai.InternalServerError)


# retry_backoff only applies to automatic retries (autoretry_for), not to manual self.retry() calls.
@shared_task(autoretry_for=RETRYABLE, retry_backoff=True, max_retries=3)
def summarize_article(article_id):
    from .models import Article

    article = Article.objects.get(pk=article_id)
    resp = OpenAI().chat.completions.create(  # client reads OPENAI_API_KEY from the environment
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': f'Summarize in 3 bullets:\n{article.body}'}],
    )
    article.summary = resp.choices[0].message.content
    article.save(update_fields=['summary'])
```
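```python
# views.py: the request returns instantly; the UI shows "generating…" and polls or gets a push
from django.http import JsonResponse
from django.views.decorators.http import require_POST

from .tasks import summarize_article


@require_POST  # enqueuing work is a side effect, so don't trigger it on GET
def request_summary(request, pk):
    summarize_article.delay(pk)  # a Celery worker makes the OpenAI call
    return JsonResponse({'status': 'queued'}, status=202)  # 202 Accepted: work is pending
```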
This keeps gunicorn workers free. A synchronous worker blocked for 8 seconds on OpenAI cannot serve other requests, and under load that adds up to an outage. Celery also gives you retries with backoff for free and leaves an audit trail of task runs. Make this the default unless you specifically need streaming.
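To see what Celery's `retry_backoff=True` means in seconds, here is a stdlib-only model of the rule the Celery docs describe for automatic retries. The delay factor is 1 and doubles per retry, capped by `retry_backoff_max` (default 600). With `retry_jitter` (on by default), the actual delay is drawn at random between zero and that value. This is a simplified sketch for reasoning about timing, not Celery's own code, and `backoff_delay` is a hypothetical name.

```python
import random
from typing import Optional


def backoff_delay(retries: int, factor: int = 1, maximum: int = 600,
                  jitter: bool = True, rng: Optional[random.Random] = None) -> int:
    """Delay in seconds before retry number `retries` (0-based), modelled on Celery's rule.

    Exponential: factor * 2**retries, capped at `maximum`. With jitter, the delay is
    drawn uniformly from 0..cap (inclusive), so tasks that hit the same rate limit
    together don't all retry at the same instant.
    """
    cap = min(maximum, factor * (2 ** retries))
    if jitter:
        rng = rng or random.Random()
        return rng.randint(0, cap)
    return cap


# Without jitter, the schedule for max_retries=3 is 1s, 2s, 4s.
schedule = [backoff_delay(n, jitter=False) for n in range(3)]
print(schedule)  # [1, 2, 4]
```

Jitter matters most for rate limits. Many queued summaries failing at once would otherwise retry in lockstep and hit the limit again together.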
Placement 2: Async views + SSE for chat UX
Django has supported async views since 3.1, and since 4.2 StreamingHttpResponse accepts async iterators. Under an ASGI server (uvicorn/daphne) you can stream tokens without Celery:
```python
# views.py
import json

from django.http import StreamingHttpResponse
from openai import AsyncOpenAI

client = AsyncOpenAI()  # async client: awaiting it doesn't block the event loop


async def chat_stream(request):
    prompt = json.loads(request.body)['prompt']

    async def gen():
        stream = await client.chat.completions.create(
            model='gpt-4o-mini',
            messages=[{'role': 'user', 'content': prompt}],
            stream=True,
        )
        async for chunk in stream:
            # Each chunk carries a small text delta; skip chunks with no text.
            if token := (chunk.choices[0].delta.content or ''):
                yield f'data: {json.dumps({"token": token})}\n\n'
        yield 'data: [DONE]\n\n'  # sentinel so the client knows the stream ended

    resp = StreamingHttpResponse(gen(), content_type='text/event-stream')
    resp['Cache-Control'] = 'no-cache'
    resp['X-Accel-Buffering'] = 'no'  # nginx: don't buffer
    return resp
```
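Requirements that trip people up:

- Run under ASGI (uvicorn project.asgi:application). Under WSGI, Django must consume an async iterator in full before sending anything, so nothing streams. A sync generator would instead tie up a worker per connected client.
- Use AsyncOpenAI. A sync client in an async view stalls the event loop (see sync vs async LLM calls).
- Check every decorator and middleware on this path. A decorator without async support (historically including @login_required) breaks an async view, and each sync-only middleware forces Django to run the request through a thread adapter. Confirm each one supports async in your Django version.
- ORM calls inside async code need await Model.objects.aget(...) or sync_to_async.

Client-side fetch/parse code is identical to the FastAPI streaming recipe.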
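One way to keep the streaming view testable is to pull the SSE framing out of the view so that it accepts any async iterable of text deltas. This is a stdlib-only sketch under that assumption. `sse_frames`, `parse_sse`, `fake_deltas`, and `collect` are hypothetical names. The minimal `parse_sse` consumer follows the same `data: {...}` / `data: [DONE]` convention as the view above, and it ignores multi-line SSE fields and `\r\n` separators.

```python
import asyncio
import json
from typing import AsyncIterable, AsyncIterator, Iterable, List, Optional


async def sse_frames(deltas: AsyncIterable[Optional[str]]) -> AsyncIterator[str]:
    """Wrap text deltas in SSE 'data:' frames, then send a [DONE] sentinel."""
    async for token in deltas:
        if token:  # skip None/empty deltas, as the view does
            yield f"data: {json.dumps({'token': token})}\n\n"
    yield "data: [DONE]\n\n"


def parse_sse(body: str) -> List[str]:
    """Minimal client-side parser: collect tokens until the [DONE] sentinel."""
    tokens = []
    for event in body.split("\n\n"):  # SSE events are separated by a blank line
        if not event.startswith("data: "):
            continue
        payload = event[len("data: "):]
        if payload == "[DONE]":
            break
        tokens.append(json.loads(payload)["token"])
    return tokens


async def fake_deltas(parts: Iterable[Optional[str]]) -> AsyncIterator[Optional[str]]:
    """Stand-in for the model stream so the framing can be tested offline."""
    for part in parts:
        await asyncio.sleep(0)  # hand control back to the loop, as a network stream would
        yield part


async def collect(parts: Iterable[Optional[str]]) -> str:
    return "".join([frame async for frame in sse_frames(fake_deltas(parts))])


body = asyncio.run(collect(["Hel", "", "lo", None, " world"]))
print(parse_sse(body))  # ['Hel', 'lo', ' world']
```

In the view, the inner generator would then shrink to passing `(chunk.choices[0].delta.content async for chunk in stream)` into `sse_frames`. Unit tests can drive the framing with a fake stream and no network access.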
RAG on the Postgres you already have: pgvector
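Django shops rarely need a dedicated vector DB to start. pgvector turns your existing Postgres into one, and the pgvector Python package (its pgvector.django module) supplies the ORM field, distance functions, and index classes. Enable the extension with a migration whose operations include pgvector.django.VectorExtension():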
```python
# models.py
from django.db import models
from pgvector.django import HnswIndex, VectorField


class DocChunk(models.Model):
    document = models.ForeignKey('Document', on_delete=models.CASCADE)
    text = models.TextField()
    embedding = VectorField(dimensions=1536)  # match your embedding model

    class Meta:
        indexes = [
            HnswIndex(
                name='chunk_emb_idx',
                fields=['embedding'],
                opclasses=['vector_cosine_ops'],  # must match the distance used in queries
            )
        ]
```
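```python
# Query: the five chunks nearest to the question's embedding
from pgvector.django import CosineDistance

from .models import DocChunk

# query_embedding: a list of floats from the same embedding model used at index time
hits = (DocChunk.objects
        .annotate(dist=CosineDistance('embedding', query_embedding))
        .order_by('dist')[:5])
```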
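pgvector's `CosineDistance` maps to the `<=>` operator, which returns cosine distance (1 minus cosine similarity), so ascending order puts the closest chunks first. This stdlib sketch computes the same ranking by brute force over an in-memory list. An HNSW index approximates this exact ranking at scale, and the brute-force version is handy for sanity-checking results on small fixtures. `cosine_distance` and `top_k` are hypothetical helpers, not pgvector APIs.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def top_k(query: Sequence[float], rows: List[Tuple[str, Sequence[float]]], k: int = 5):
    """Exact nearest neighbours: annotate each row with its distance, sort ascending, slice."""
    scored = [(text, cosine_distance(query, emb)) for text, emb in rows]
    return sorted(scored, key=lambda pair: pair[1])[:k]


rows = [
    ("refund policy", [1.0, 0.0]),
    ("shipping times", [0.0, 1.0]),
    ("returns and refunds", [0.9, 0.1]),
]
# Closest first: 'refund policy', then 'returns and refunds'.
for text, dist in top_k([1.0, 0.0], rows, k=2):
    print(text, round(dist, 3))
```

Because cosine distance ignores vector length, scaling an embedding does not change its rank. That is also why the index's `vector_cosine_ops` opclass has to match the `CosineDistance` used in the query.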
Embedding generation belongs in Celery (chunk → embed → save, batched). The payoff: one transactional database, one backup story, and SQL filters that compose with vector search. For the scale ceiling and when to graduate to a dedicated store, see the pgvector guide.
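The chunk → embed → save pipeline rests on two small pieces of plumbing. Documents are split into overlapping chunks, and chunks are grouped into batches so each embeddings request carries many inputs. This stdlib sketch shows both under simplifying assumptions: sizes are counted in characters rather than tokens, and `chunk_text`, `batched`, and the sizes shown are illustrative. Inside a Celery task, each batch would go out in one embeddings call (the OpenAI embeddings endpoint accepts a list of inputs), and the results would be saved with `bulk_create`.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of `size` characters, each overlapping the previous by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def batched(items: Iterable[str], n: int) -> Iterator[List[str]]:
    """Yield lists of up to n items (itertools.batched on Python 3.12+ is similar but yields tuples)."""
    it = iter(items)
    while batch := list(islice(it, n)):
        yield batch


chunks = chunk_text("abcdefghij", size=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
print(list(batched(chunks, 2)))  # [['abcd', 'defg'], ['ghij']]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side. Batching cuts the number of API round trips per document.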
Django-specific production notes
Load API keys from the environment or a secrets manager, never as settings.py literals. Pin model names in settings so prompt/model changes are config, not deploys.
FAQ
Should I switch to FastAPI for the AI parts? If the AI surface is a small streaming API and the rest is classic Django, a sidecar FastAPI service is clean — but async Django views close most of the gap now; don't split the stack just for one endpoint.
Channels/WebSockets? Only for bidirectional needs (collaborative sessions, voice). One-way token flow is SSE territory — see streaming vs polling.
LangChain inside Django? Fine where it earns its keep (complex RAG/agents — LangChain vs LlamaIndex); for straightforward calls the provider SDK plus Celery is less machinery.
*Last updated: June 2026.*