OpenAI Function Calling & Structured Outputs Complete Guide 2026: Make LLMs Return Stable JSON
Say goodbye to AI format chaos—build reliable AI apps with official structured outputs
One of the most common bugs in AI applications: the LLM returns data in a format you didn't expect.
OpenAI's Structured Outputs feature launched in August 2024, and by 2026 it has become a standard choice for production-grade AI applications.
Why Structured Outputs Are Necessary
Unstable Prompt Approach:
python
# Bad practice: relying on the prompt alone to control the format
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Analyze this review and return JSON with {sentiment, score, category}"
    }]
)
# Problems: the reply is sometimes wrapped in a markdown code block, sometimes
# has extra fields, and sometimes comes back in a completely different format.
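To make the fragility concrete, here is a minimal sketch of the defensive parsing a prompt-only approach tends to require. The helper name `parse_loose_json` and the sample reply are hypothetical, chosen only for illustration, and real replies can fail in ways this sketch does not cover.

```python
import json
import re

EXPECTED_KEYS = {"sentiment", "score", "category"}
FENCE = "`" * 3  # markdown code fence marker
FENCED_REPLY = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def parse_loose_json(raw: str) -> dict:
    """Best-effort parse of a free-form model reply (hypothetical helper)."""
    text = raw.strip()
    # Strip a surrounding markdown code fence if the model added one.
    fenced = FENCED_REPLY.match(text)
    if fenced:
        text = fenced.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # Drop any extra fields the model decided to add.
    return {key: data[key] for key in EXPECTED_KEYS}


# Hypothetical reply: fenced, with one unexpected extra field.
sample_reply = (
    FENCE + "json\n"
    '{"sentiment": "positive", "score": 0.9, "category": "product", "note": "extra"}\n'
    + FENCE
)
cleaned = parse_loose_json(sample_reply)
print(cleaned)
```

Every one of these checks is code you have to write, test, and maintain yourself, which is the burden structured outputs are meant to remove.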
Stable Structured Output Approach:
python
from typing import Literal

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

# Define the expected output structure
class SentimentAnalysis(BaseModel):
    sentiment: Literal["positive", "neutral", "negative"]
    score: float   # Confidence between 0 and 1
    category: str  # Review category

# Use the parse() method so the reply is constrained to the schema
response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Analyze this review: 'This product is amazing!'"
    }],
    response_format=SentimentAnalysis  # Pass the Pydantic model
)

result = response.choices[0].message.parsed
print(result.sentiment)  # e.g. "positive"
print(result.score)      # e.g. 0.95
print(result.category)   # e.g. "product_review"
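Behind the scenes, the Pydantic model is turned into a JSON Schema that constrains the reply. As a rough sketch of what that looks like, the dictionary below is a hand-written approximation using the `json_schema` response format. The schema name `sentiment_analysis` is an arbitrary label chosen here, and the schema the SDK actually generates may differ in details such as titles.

```python
import json

# Hand-written approximation of the schema derived from SentimentAnalysis.
sentiment_schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "score": {"type": "number", "description": "Confidence between 0 and 1"},
        "category": {"type": "string", "description": "Review category"},
    },
    "required": ["sentiment", "score", "category"],
    "additionalProperties": False,
}

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "sentiment_analysis",  # arbitrary label chosen for this sketch
        "strict": True,
        "schema": sentiment_schema,
    },
}

print(json.dumps(response_format, indent=2))
```

A dictionary like this can be passed as `response_format` to `client.chat.completions.create(...)`. In that case the reply arrives as a JSON string in `message.content` rather than as a parsed object, so you would call `json.loads` on it yourself.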
Function Calling: Let AI Invoke Tools
Function Calling enables the LLM to "call" external functions, forming the core mechanism of AI Agents.
2.1 Define Tools
python
import json

from openai import OpenAI

client = OpenAI()

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g., Beijing, Shanghai"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                # Strict mode requires every property to be listed here
                "required": ["city", "unit"],
                "additionalProperties": False
            },
            "strict": True  # Strict mode: arguments must exactly match the schema
        }
    }
]
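Strict mode places requirements on the schema itself: each object must set `additionalProperties` to false and list every one of its properties in `required`. The sketch below is a small, hypothetical pre-flight check for those two rules only (the function name `strict_schema_problems` is made up here, and the API enforces further rules that this does not cover).

```python
def strict_schema_problems(schema: dict, path: str = "parameters") -> list[str]:
    """Collect violations of two strict-mode rules in a JSON Schema object.

    Hypothetical helper: it only checks that every property is listed in
    'required' and that 'additionalProperties' is False, recursing into
    nested objects and array items.
    """
    problems = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in 'required': {missing}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: 'additionalProperties' must be False")
        for name, sub_schema in properties.items():
            problems.extend(strict_schema_problems(sub_schema, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems


# Example schema with a deliberate mistake: 'unit' is not listed in 'required'.
weather_parameters = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
    "additionalProperties": False,
}

problems = strict_schema_problems(weather_parameters)
print(problems)
```

Running a check like this in a unit test catches schema mistakes before they surface as API errors at request time.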
2.2 Execute Tool Call Loop
python
def run_agent(user_message):
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        message = response.choices[0].message
        # If no tool calls, return the final answer
        if not message.tool_calls:
            return message.content
        # Keep the assistant message that requested the tool calls
        messages.append(message)
        for tool_call in message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)
            # Actually execute the function (get_weather is your own implementation)
            if function_name == "get_weather":
                result = get_weather(**function_args)
            else:
                result = {"error": f"Unknown tool: {function_name}"}
            # Return the result to the model
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            })

result = run_agent("What's the weather like in Beijing today?")
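As the number of tools grows, the `if function_name == ...` chain becomes awkward. One common alternative, sketched below under stated assumptions, is a registry that maps tool names to Python callables and always hands a JSON string back to the model, even on failure. `TOOL_REGISTRY`, `execute_tool`, and the stubbed `get_weather` are hypothetical names used only for illustration.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub for illustration; a real version would call a weather service."""
    return {"city": city, "unit": unit, "temperature": None}


# Map tool names (as declared in the tools list) to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def execute_tool(name: str, raw_arguments: str) -> str:
    """Run one tool call and always return a JSON string for the model."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        arguments = json.loads(raw_arguments)
        return json.dumps(handler(**arguments))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})


print(execute_tool("get_weather", '{"city": "Beijing"}'))
print(execute_tool("get_stock_price", "{}"))
```

Inside the loop, the body of the `for` statement would then reduce to a single call such as `execute_tool(tool_call.function.name, tool_call.function.arguments)`, with the returned string used directly as the tool message content. Reporting errors back to the model rather than raising gives it a chance to correct its arguments on the next turn.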
Complex Structured Outputs
3.1 Nested Structures
python
from typing import List, Literal, Optional

from pydantic import BaseModel

class Step(BaseModel):
    step_number: int
    action: str
    expected_result: str

class TroubleshootingGuide(BaseModel):
    problem_summary: str
    root_cause: str
    severity: Literal["low", "medium", "high", "critical"]
    steps: List[Step]
    estimated_time_minutes: int
    requires_restart: bool
    additional_notes: Optional[str] = None

# Usage (problem_description holds the user's issue text, defined elsewhere)
response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Generate a troubleshooting guide for the following issue: {problem_description}"
    }],
    response_format=TroubleshootingGuide
)

guide = response.choices[0].message.parsed
for step in guide.steps:
    print(f"Step {step.step_number}: {step.action}")
3.2 Batch Processing and Concurrency
python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def analyze_single(text: str) -> SentimentAnalysis:
    response = await client.beta.chat.completions.parse(
        model="gpt-4o-mini",  # Use mini for batch processing to save costs
        messages=[{"role": "user", "content": f"Analyze: {text}"}],
        response_format=SentimentAnalysis
    )
    return response.choices[0].message.parsed

async def batch_analyze(texts: list[str]) -> list[SentimentAnalysis]:
    # Concurrent processing with a max concurrency limit of 10
    semaphore = asyncio.Semaphore(10)

    async def process_with_limit(text):
        async with semaphore:
            return await analyze_single(text)

    results = await asyncio.gather(*[process_with_limit(t) for t in texts])
    return results

# Process 1000 reviews in minutes
texts = load_reviews()  # Load review data (your own loader)
results = asyncio.run(batch_analyze(texts))
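The semaphore pattern can be tried without any API calls. The sketch below replaces the model request with a fake coroutine that only records how many calls are in flight, so you can see that the limit holds and that `asyncio.gather` returns results in input order. `bounded_gather` and `fake_analyze` are hypothetical names for this illustration.

```python
import asyncio


async def bounded_gather(items, worker, limit: int = 10):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(run_one(item) for item in items))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_analyze(text: str) -> str:
        # Stand-in for the API call; it only tracks concurrency.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return text.upper()

    texts = [f"review {i}" for i in range(25)]
    results = await bounded_gather(texts, fake_analyze, limit=10)
    return results, peak


demo_results, peak_concurrency = asyncio.run(demo())
print(len(demo_results), peak_concurrency)
```

The right limit depends on your account's rate limits, so treat 10 as a starting point to tune rather than a recommendation.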
Best Practices
4.1 Choosing the Right Model
4.2 Error Handling
python
import logging

logger = logging.getLogger(__name__)

# handle_refusal, fallback_handler, and user_input are application-specific
try:
    response = client.beta.chat.completions.parse(...)
    result = response.choices[0].message.parsed
    if result is None:
        # The model refused the request (e.g., for safety reasons)
        handle_refusal(response.choices[0].message.refusal)
except Exception as e:
    # Log the error and use a fallback
    logger.error(f"Structured output failed: {e}")
    result = fallback_handler(user_input)
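Many failures are transient (timeouts, rate limits), so it is often worth retrying before falling back. The following is a minimal sketch of retry with exponential backoff. `call_with_retry` and `flaky` are hypothetical names, and the simulated failure stands in for a real API error. In practice you would likely narrow `retry_on` to transient errors such as `openai.RateLimitError` or `openai.APITimeoutError`, and note that the SDK client also accepts a `max_retries` option of its own.

```python
import random
import time


def call_with_retry(func, *, attempts: int = 3, base_delay: float = 0.5,
                    retry_on: tuple = (Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and try again.

    The delay doubles on each attempt, plus a little random jitter. The
    final failure is re-raised so the caller can fall back.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))


calls = {"count": 0}


def flaky():
    # Hypothetical stand-in for a parse() call that fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


delays = []
# Record the delays instead of actually sleeping, to keep the demo fast.
outcome = call_with_retry(flaky, retry_on=(TimeoutError,), sleep=delays.append)
print(outcome, calls["count"], len(delays))
```

A refusal is not a transient error, so it should be handled as in the block above rather than retried.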