Streaming AI Responses with Server-Sent Events: Complete Developer Guide 2026
Streaming makes an AI app feel fast: instead of waiting for the full answer, you show tokens as they're generated. Server-Sent Events (SSE) is the simplest transport for this — a one-way stream from server to browser over plain HTTP, perfect for LLM token streams. This guide shows the full path from model to browser.
Why SSE (not WebSockets)
LLM streaming is one-directional (server → client), so you don't need WebSockets' bidirectional complexity. SSE is just an HTTP response with Content-Type: text/event-stream that stays open and emits data: lines. Browsers reconnect automatically via the EventSource API.
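To make the wire format concrete, here is a minimal sketch of how one event could be encoded by hand. The helper name `format_sse` is hypothetical (libraries such as sse-starlette do this for you); the framing itself (optional `event:` line, one `data:` line per payload line, blank line to finish) follows the SSE format.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Event as text for a text/event-stream body."""
    lines = []
    if event is not None:
        # Named events are delivered to addEventListener(name) in the browser.
        lines.append(f"event: {event}")
    # A payload containing newlines must be split across several data: lines.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


print(repr(format_sse("Hello")))                 # 'data: Hello\n\n'
print(repr(format_sse("[DONE]", event="done")))  # 'event: done\ndata: [DONE]\n\n'
```

Events without an `event:` line arrive in the browser as plain `message` events, which is why a token stream needs nothing more than `data:` lines.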
Server (FastAPI)
python
# Install first: pip install fastapi openai sse-starlette uvicorn
from fastapi import FastAPI
from openai import AsyncOpenAI
from sse_starlette.sse import EventSourceResponse

app = FastAPI()
client = AsyncOpenAI()  # async client, so streaming does not block the event loop


@app.get("/chat")
async def chat(q: str):
    async def gen():
        stream = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": q}],
            stream=True,
        )
        async for chunk in stream:
            # Some chunks carry no choices or no text; skip them.
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta.content
            if delta:
                yield {"data": delta}
        # Named event that tells the browser the answer is complete.
        yield {"event": "done", "data": "[DONE]"}

    return EventSourceResponse(gen())
Client (browser)
js
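// `input` is the prompt string; `output` is the element that displays the answer.
const es = new EventSource('/chat?q=' + encodeURIComponent(input));
es.onmessage = (e) => { output.textContent += e.data; };
es.addEventListener('done', () => es.close());
// Without this, EventSource reconnects after an error and re-sends the prompt.
es.onerror = () => es.close();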
That's the whole loop: each token arrives as a message event and is appended to the UI.
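To see what the browser does with the stream, here is a simplified sketch of the parsing step in Python. `parse_sse` is a hypothetical helper that handles only the `event` and `data` fields (it ignores `id` and `retry`), so treat it as an illustration of the format, not a full client.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from the lines of an SSE stream."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if it has data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive ping
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format strips one leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# A hand-written stream shaped like the server's output above.
stream = ["data: Hel\n", "\n", "data: lo\n", "\n",
          "event: done\n", "data: [DONE]\n", "\n"]
answer = ""
for event, data in parse_sse(stream):
    if event == "done":
        break  # same role as es.close() in the browser
    answer += data
print(answer)  # Hello
```

The same loop works for a non-browser client: feed it the response lines from any HTTP library that can iterate over a streaming body.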
Production notes
Disable proxy buffering (for example with the X-Accel-Buffering: no response header) or chunks get held back and the stream stutters.
On Next.js, the Vercel AI SDK's streamText + useChat give you streaming without hand-writing SSE. See Vercel AI SDK vs LangChain.js.
FAQ
SSE or WebSockets? SSE for one-way token streaming: it is simpler and auto-reconnects. WebSockets only if you need bidirectional realtime.
Why does my stream arrive all at once? A buffering proxy. Disable buffering and flush per chunk.
How do I stop billing on disconnect? Detect client disconnect and cancel the upstream completion.
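Cancelling on disconnect can be sketched with plain asyncio. Everything here is an assumption for illustration: `stream_until_disconnect` is a hypothetical wrapper, `fake_model` stands in for the upstream completion, and `is_disconnected` stands in for whatever check your framework offers (in FastAPI/Starlette, `await request.is_disconnected()` is the likely candidate). A real SDK stream may expose a different close method than `aclose`.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def stream_until_disconnect(
    upstream: AsyncIterator[str],
    is_disconnected: Callable[[], Awaitable[bool]],
) -> AsyncIterator[str]:
    """Relay tokens from upstream, stopping as soon as the client is gone."""
    try:
        async for token in upstream:
            if await is_disconnected():
                break  # stop pulling tokens nobody will read
            yield token
    finally:
        # Closing the upstream runs its cleanup code; a real upstream would
        # close its connection to the model API here.
        aclose = getattr(upstream, "aclose", None)
        if aclose is not None:
            await aclose()


closed = []  # records that the fake upstream was shut down


async def fake_model() -> AsyncIterator[str]:
    """Stand-in for a streaming completion (hypothetical)."""
    try:
        for token in ["a", "b", "c", "d"]:
            await asyncio.sleep(0)
            yield token
    finally:
        closed.append(True)


async def demo() -> list:
    sent = []

    async def is_disconnected() -> bool:
        # Pretend the browser goes away after receiving two tokens.
        return len(sent) >= 2

    async for token in stream_until_disconnect(fake_model(), is_disconnected):
        sent.append(token)
    return sent


sent = asyncio.run(demo())
print(sent, closed)  # ['a', 'b'] [True]
```

The point of the `finally` block is that the upstream gets closed on every exit path: a disconnect, an exception, or the consumer simply abandoning the generator.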
Summary
SSE is the path of least resistance for LLM streaming: open a text/event-stream, yield tokens as they arrive, append them in the browser with EventSource. Disable buffering, flush per token, and cancel on disconnect. On Next.js, let the Vercel AI SDK handle it.
*Last updated: June 2026. Verify streaming APIs against the OpenAI and framework docs.*