OpenAI API Best Practices: Production Guide
Production patterns for the OpenAI API including retries and rate limiting
OpenAI API Best Practices: Production Guide
The gap between "calls the OpenAI API" and "runs it in production" is a known checklist: timeout/retry behavior, structured outputs, cost instrumentation, and the failure modes you haven't hit yet. This guide is that checklist, with the code details that matter. (Most of it transfers verbatim to any provider — the Claude API comparison covers the deltas.)
Client configuration: the defaults are not production values
python
from openai import OpenAI, AsyncOpenAIclient = AsyncOpenAI(
timeout=30.0, # default is much higher — bound your tail latency
max_retries=3, # SDK retries 429/5xx with backoff automatically
)
Reliability patterns
finish_reason: a length finish means truncation you must handle, not an answer.Structured outputs: use the real feature
Use schema-enforced structured outputs (not "please return JSON" prose, not legacy json-mode-and-pray):
python
from pydantic import BaseModelclass Ticket(BaseModel):
category: str
urgency: str
summary: str
resp = client.chat.completions.parse( # SDK validates against the schema
model='gpt-5-mini',
messages=[{'role': 'user', 'content': f'Triage this ticket: {body}'}],
response_format=Ticket,
)
ticket = resp.choices[0].message.parsed
Schema enforcement guarantees *shape*, not *sense* — semantic validation (does this ID exist? is the math right?) stays on you (validation guide).
Cost engineering (the practices that actually move the bill)
max_tokens per route — runaway generation on a malformed prompt is a real cost incident class.Security and correctness
gpt-5-2026-xx style snapshots where offered) and re-run your eval before moving pins — silent model drift breaks tuned prompts (prompt sensitivity).FAQ
Chat Completions or Responses API? New builds: Responses (it's where new features land, and Assistants API users are migrating to it). Existing Chat Completions code keeps working — migrate opportunistically.
Organization keys vs project keys? Project-scoped keys with per-project budgets/limits — blast-radius control when a key leaks or a service runs away.
Rate limit headroom? Watch the rate-limit headers, alert at sustained >70% of tier, and request raises before launches, not during them.
*Last updated: June 2026. Parameter names and model tiers move — verify against platform.openai.com/docs.*
Also available in 中文.