Real-Time AI Streaming with WebSockets and SSE
Build responsive AI applications with streaming responses
Learn to implement real-time AI response streaming using Server-Sent Events and WebSockets. Build ChatGPT-like streaming UIs with Next.js and FastAPI.
Why Streaming Matters
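Streaming AI responses lets the interface render text as the model generates it instead of waiting for the full completion, which keeps the application responsive.
Server-Sent Events (SSE) Approach
SSE is simpler than WebSockets for one-directional streaming: the server pushes events over a single long-lived HTTP response, and the client only reads.
FastAPI Backend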
python
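from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
from pydantic import BaseModel

app = FastAPI()
client = AsyncOpenAI()  # async client; reads OPENAI_API_KEY from the environment

class StreamRequest(BaseModel):
    prompt: str  # matches the JSON body {"prompt": ...} sent by the frontend

@app.post("/stream")
async def stream_response(body: StreamRequest):
    async def generate():
        stream = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": body.prompt}],
            stream=True,
        )
        async for chunk in stream:
            # Some chunks carry no choices or an empty delta; skip them.
            if chunk.choices and chunk.choices[0].delta.content:
                # Note: a newline inside the content would break this one-line framing.
                yield f"data: {chunk.choices[0].delta.content}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")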
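The endpoint above writes each token as a single `data:` line, which only works while tokens contain no newlines. A minimal sketch of two framing helpers follows, assuming you control both ends of the stream. The names `sse_event` and `sse_json` are hypothetical and not part of FastAPI or the OpenAI client. The first follows the SSE convention of one `data:` field per line of payload; the second sidesteps the problem by JSON-encoding the payload.

```python
import json
from typing import Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Frame one SSE event; each line of `data` gets its own `data:` field."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # split("\n") keeps empty segments, so blank lines in the payload survive.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def sse_json(payload: dict) -> str:
    """Alternative: JSON-encode the payload so it always fits on one data line."""
    return f"data: {json.dumps(payload)}\n\n"
```

With the first helper the client has to join the `data:` lines of one event with `\n` again; with the second it has to `JSON.parse` each payload. Either way, the client-side parser must change to match.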
Next.js Frontend
typescript
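async function streamLLM(prompt: string, onChunk: (text: string) => void) {
  const response = await fetch('/api/stream', {
    method: 'POST',
    body: JSON.stringify({ prompt }),
    headers: { 'Content-Type': 'application/json' }
  });
  if (!response.ok || !response.body) {
    throw new Error(`Stream request failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    // The last element may be an incomplete line; keep it for the next read
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') return;
        onChunk(data);
      }
    }
  }
}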
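Network reads do not respect event boundaries, so a reader has to buffer until a full event has arrived. The same idea can be sketched in Python, for example for a Python client or a test harness consuming the same endpoint. `SSEParser` is a hypothetical helper, not a library class, and it handles only `data:` fields with `\n` line endings.

```python
from typing import List


class SSEParser:
    """Incrementally parse SSE text that may arrive split at arbitrary points."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> List[str]:
        """Add a decoded chunk; return the data payloads of completed events."""
        self._buffer += chunk
        events = []
        # An event is complete only once its terminating blank line has arrived.
        while "\n\n" in self._buffer:
            raw, self._buffer = self._buffer.split("\n\n", 1)
            data_lines = []
            for line in raw.split("\n"):
                if line.startswith("data: "):
                    data_lines.append(line[6:])
                elif line.startswith("data:"):
                    data_lines.append(line[5:])
            if data_lines:
                # Multiple data lines in one event are joined with newlines.
                events.append("\n".join(data_lines))
        return events
```

Feeding `"data: Hel"` returns nothing, because the event is still incomplete. Feeding `"lo\n\n"` afterwards returns `["Hello"]`.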
Next.js App Router Streaming
typescript
// app/api/chat/route.ts
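import OpenAI from 'openai';
// OpenAIStream and StreamingTextResponse are legacy helpers from older
// releases of the `ai` package; check that your installed version exports them.
import { OpenAIStream, StreamingTextResponse } from 'ai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function POST(req: Request) {
  const { messages } = await req.json();
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    stream: true,
    messages
  });
  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}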
Error Handling for Streams
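Implement retry logic and partial response recovery for resilient streaming: when a connection drops mid-response, keep the text already received and retry instead of discarding it.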
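One way to combine retries with partial response recovery is a wrapper around the stream, sketched here under assumptions. `stream_with_retry` and the `open_stream` factory are hypothetical names. The factory receives the text received so far and must return a stream that continues from that point, for example by resending the prompt with the partial answer attached. How it does so depends on your backend.

```python
import asyncio
from typing import AsyncIterator, Callable


async def stream_with_retry(
    open_stream: Callable[[str], AsyncIterator[str]],
    max_retries: int = 3,
    base_delay: float = 0.5,
) -> AsyncIterator[str]:
    """Yield chunks from open_stream, reopening it after a connection failure.

    open_stream(partial) is a hypothetical factory: it receives the text
    received so far and must return a stream that continues from there.
    """
    received = ""
    attempt = 0
    while True:
        try:
            async for chunk in open_stream(received):
                received += chunk  # remember progress for a possible retry
                yield chunk
            return  # stream finished normally
        except ConnectionError:
            attempt += 1
            if attempt > max_retries:
                raise  # give up; the caller still holds the chunks yielded so far
            # Exponential backoff: base_delay, then 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

The caller sees one continuous stream of chunks. Only `ConnectionError` is retried in this sketch, so other exceptions propagate immediately.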