OpenAI Function Calling & Structured Outputs Complete Guide 2026: Make LLM Return Stable JSON

Say goodbye to AI format chaos—build reliable AI apps with official structured outputs

One of the most common bugs in AI applications: the LLM returns data in a format you didn't expect.

OpenAI launched Structured Outputs in its API in August 2024, and by 2026 it has become a standard building block for production-grade AI applications.

1. Why Structured Outputs Are Necessary

Unstable Prompt Approach:

```python
from openai import OpenAI

client = OpenAI()

# Bad practice: relying on the prompt alone to control the format
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Analyze this review and return JSON with {sentiment, score, category}"
    }]
)
# Problems: the reply is sometimes wrapped in markdown code blocks, sometimes
# has extra fields, and sometimes uses a completely different format
```
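Without Structured Outputs, the usual workaround is to clean up and validate the reply by hand. The stdlib-only sketch below shows how much defensive code that takes. It still only *detects* bad output and cannot prevent it. `extract_json` and `REQUIRED_KEYS` are hypothetical names, not part of any SDK.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built here so the example stays readable
REQUIRED_KEYS = {"sentiment", "score", "category"}  # hypothetical: keys the prompt asked for


def extract_json(reply: str) -> dict:
    """Best-effort parse of a prompt-formatted reply; raises ValueError on bad output."""
    text = reply.strip()
    # Strip a surrounding markdown fence (optionally tagged "json") if present
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Reply is not a JSON object")
    keys = set(data)
    missing = REQUIRED_KEYS - keys
    extra = keys - REQUIRED_KEYS
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, extra={sorted(extra)}")
    return data


# A fenced reply is unwrapped and parsed; chatty or extra-field replies raise ValueError
print(extract_json(FENCE + 'json\n{"sentiment": "positive", "score": 0.9, "category": "product"}\n' + FENCE))
```

Every branch here is a guess about how the model might misbehave. Schema-constrained output removes the need for most of it.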

Stable Structured Output Approach:

```python
from typing import Literal

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

# Define the expected output structure
class SentimentAnalysis(BaseModel):
    sentiment: Literal["positive", "neutral", "negative"]
    score: float  # Confidence between 0 and 1
    category: str  # Review category

# parse() turns the Pydantic model into a strict JSON Schema; the reply is
# guaranteed to match it unless the model refuses or runs out of tokens
response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Analyze this review: 'This product is amazing!'"
    }],
    response_format=SentimentAnalysis  # Pass the Pydantic model class
)

result = response.choices[0].message.parsed
print(result.sentiment)  # e.g. "positive"
print(result.score)      # e.g. 0.95
print(result.category)   # e.g. "product_review"
```
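Without Pydantic, you can pass the schema yourself. The Chat Completions API accepts `response_format={"type": "json_schema", "json_schema": {...}}` with `strict: True`. The sketch below builds that payload for the same three fields as `SentimentAnalysis`. The helper name `sentiment_response_format` and the schema name `"sentiment_analysis"` are illustrative choices. In strict mode, every property must appear in `required`, and `additionalProperties` must be `false`.

```python
import json


def sentiment_response_format() -> dict:
    """Build a strict json_schema response_format equivalent to SentimentAnalysis."""
    schema = {
        "type": "object",
        "properties": {
            "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
            "score": {"type": "number", "description": "Confidence between 0 and 1"},
            "category": {"type": "string", "description": "Review category"},
        },
        # Strict mode: every property is required and no extra keys are allowed
        "required": ["sentiment", "score", "category"],
        "additionalProperties": False,
    }
    return {
        "type": "json_schema",
        "json_schema": {"name": "sentiment_analysis", "strict": True, "schema": schema},
    }


response_format = sentiment_response_format()
print(json.dumps(response_format, indent=2))
```

Pass this as `response_format=` to `client.chat.completions.create(...)`. Then decode `response.choices[0].message.content` with `json.loads`, because `create()` returns the raw JSON string rather than a parsed object.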

2. Function Calling: Let AI Invoke Tools

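Function Calling lets the LLM "call" external functions, and it is the core mechanism behind AI Agents. The model never runs anything itself. It returns a function name plus JSON-encoded arguments, your code executes the function, and you send the result back so the model can continue.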
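Each tool call the model returns carries three things: an `id`, a function `name`, and an `arguments` string holding JSON. Your code must decode that string before calling anything. The sketch below imitates that shape with `SimpleNamespace` objects so it runs offline. `TOOL_REGISTRY`, `dispatch_tool_call`, and the `get_weather` stub are hypothetical names, not SDK functions.

```python
import json
from types import SimpleNamespace
from typing import Optional


def get_weather(city: str, unit: Optional[str] = None) -> dict:
    # Hypothetical stub with fixed sample values, standing in for a real weather lookup
    return {"city": city, "temperature": 22, "unit": unit or "celsius"}


# Registry mapping tool names (as declared to the model) to Python callables
TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(tool_call) -> dict:
    """Run one tool call and build the "tool" message to send back to the model."""
    func = TOOL_REGISTRY.get(tool_call.function.name)
    if func is None:
        output = {"error": f"Unknown tool: {tool_call.function.name}"}
    else:
        # arguments arrives as a JSON string, not a dict
        args = json.loads(tool_call.function.arguments)
        output = func(**args)
    return {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(output)}


# Simulated tool call with the same attribute layout as the SDK's objects
fake_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Beijing", "unit": null}'),
)
print(dispatch_tool_call(fake_call))
```

A registry dict keeps the dispatch in one place. It also lets you return an error message to the model for unknown tools instead of crashing.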

2.1 Define Tools

```python
import json

from openai import OpenAI

client = OpenAI()

# Define the available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g., Beijing, Shanghai"
                    },
                    "unit": {
                        # Strict mode requires every property in "required",
                        # so an optional field is expressed as nullable instead
                        "type": ["string", "null"],
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city", "unit"],
                "additionalProperties": False
            },
            "strict": True  # Strict mode: arguments must exactly match the schema
        }
    }
]
```
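Strict mode rejects schemas that break its rules, and you only find out when a request fails. A small pre-flight check can catch the two most common mistakes locally:

- an object whose `required` list omits one of its `properties`;
- an object without `additionalProperties: false`.

`check_strict_schema` is a hypothetical helper covering only those two rules, including nested objects and array items. It is not OpenAI's full list of schema restrictions.

```python
def check_strict_schema(schema: dict, path: str = "$") -> list[str]:
    """Return strict-mode violations found in a JSON Schema (subset of the rules)."""
    problems = []
    types = schema.get("type")
    type_list = types if isinstance(types, list) else [types]
    if "object" in type_list:
        props = schema.get("properties", {})
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {missing}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, sub in props.items():
            problems.extend(check_strict_schema(sub, f"{path}.{name}"))
    if "array" in type_list and isinstance(schema.get("items"), dict):
        problems.extend(check_strict_schema(schema["items"], f"{path}[]"))
    return problems


# A parameters schema where "unit" is left optional, which strict mode does not allow
params = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
    "additionalProperties": False,
}
print(check_strict_schema(params))
```

Running it against each tool's `parameters` before deployment turns a runtime API error into a failing unit test.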

2.2 Execute Tool Call Loop

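```python
# Hypothetical stub with fixed sample values: replace with a real weather API call
def get_weather(city, unit=None):
    return {"city": city, "temperature": 22, "unit": unit or "celsius"}

# Map tool names to the Python functions that implement them
available_functions = {"get_weather": get_weather}

def run_agent(user_message, max_turns=5):
    messages = [{"role": "user", "content": user_message}]

    # Cap the number of round trips so a misbehaving model cannot loop forever
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )

        message = response.choices[0].message

        # If there are no tool calls, this is the final answer
        if not message.tool_calls:
            return message.content

        # Keep the assistant's tool-call message in the history
        messages.append(message)

        for tool_call in message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)

            # Actually execute the function (the model only requested it)
            function = available_functions.get(function_name)
            if function is None:
                result = {"error": f"Unknown tool: {function_name}"}
            else:
                result = function(**function_args)

            # Return the result to the model, linked by tool_call_id
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            })

    raise RuntimeError("Agent did not finish within max_turns")

result = run_agent("What's the weather like in Beijing today?")
print(result)
```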
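To watch the message history evolve without calling the API, you can run the same loop against a scripted stand-in for the model. The pieces of this sketch are all illustrative:

- `ScriptedModel` is a hypothetical fake that first requests `get_weather`, then answers in plain text.
- Messages are simplified plain dicts rather than SDK objects.
- `max_turns` is the same safety cap used above.

```python
import json
from types import SimpleNamespace


def get_weather(city, unit=None):
    # Hypothetical stub with fixed sample values
    return {"city": city, "temperature": 22, "unit": unit or "celsius"}


class ScriptedModel:
    """Hypothetical fake: asks for one tool call, then answers from the tool result."""

    def complete(self, messages):
        tool_msgs = [m for m in messages if m["role"] == "tool"]
        if not tool_msgs:
            call = SimpleNamespace(
                id="call_1",
                function=SimpleNamespace(name="get_weather", arguments='{"city": "Beijing", "unit": null}'),
            )
            return SimpleNamespace(content=None, tool_calls=[call])
        data = json.loads(tool_msgs[-1]["content"])
        return SimpleNamespace(content=f"It is {data['temperature']} degrees in {data['city']}.", tool_calls=None)


def run_agent(model, user_message, max_turns=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        message = model.complete(messages)
        if not message.tool_calls:
            return message.content, messages
        # Record the assistant turn (simplified: the real API stores full tool_call objects)
        messages.append({"role": "assistant", "tool_calls": [c.id for c in message.tool_calls]})
        for call in message.tool_calls:
            result = get_weather(**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
    raise RuntimeError("Agent did not finish within max_turns")


answer, history = run_agent(ScriptedModel(), "What's the weather like in Beijing today?")
print(answer)
print([m["role"] for m in history])
```

The history grows as user, assistant (tool request), then tool (result). The model needs that whole sequence on the next turn to produce its final answer.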

3. Complex Structured Outputs

3.1 Nested Structures

```python
from typing import List, Literal, Optional

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Step(BaseModel):
    step_number: int
    action: str
    expected_result: str

class TroubleshootingGuide(BaseModel):
    problem_summary: str
    root_cause: str
    severity: Literal["low", "medium", "high", "critical"]
    steps: List[Step]
    estimated_time_minutes: int
    requires_restart: bool
    additional_notes: Optional[str] = None

# Usage (problem_description is a placeholder for your own input)
problem_description = "The web server returns 502 errors after a deploy"

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Generate a troubleshooting guide for the following issue: {problem_description}"
    }],
    response_format=TroubleshootingGuide
)

guide = response.choices[0].message.parsed
for step in guide.steps:
    print(f"Step {step.step_number}: {step.action}")
```
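For intuition, here is roughly the strict JSON Schema a nested model like `TroubleshootingGuide` corresponds to, written by hand with the standard library. It is an approximation: the schema the SDK actually derives from Pydantic uses `$defs`/`$ref` for `Step` and may differ in details. Two points to notice:

- `List[Step]` becomes an array whose `items` is itself a strict object.
- `Optional[str]` stays in `required` but accepts `null`.

The sample reply is made up.

```python
import json

STEP_SCHEMA = {
    "type": "object",
    "properties": {
        "step_number": {"type": "integer"},
        "action": {"type": "string"},
        "expected_result": {"type": "string"},
    },
    "required": ["step_number", "action", "expected_result"],
    "additionalProperties": False,
}

GUIDE_SCHEMA = {
    "type": "object",
    "properties": {
        "problem_summary": {"type": "string"},
        "root_cause": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
        "steps": {"type": "array", "items": STEP_SCHEMA},  # List[Step]
        "estimated_time_minutes": {"type": "integer"},
        "requires_restart": {"type": "boolean"},
        "additional_notes": {"type": ["string", "null"]},  # Optional[str]
    },
    # Strict mode: even the optional field is listed; it may simply be null
    "required": [
        "problem_summary", "root_cause", "severity", "steps",
        "estimated_time_minutes", "requires_restart", "additional_notes",
    ],
    "additionalProperties": False,
}

# A made-up reply that conforms to the schema, decoded with the standard library
reply = json.loads("""{
  "problem_summary": "Service returns 502", "root_cause": "Upstream crashed",
  "severity": "high", "estimated_time_minutes": 15, "requires_restart": true,
  "additional_notes": null,
  "steps": [{"step_number": 1, "action": "Check logs", "expected_result": "Find the error"}]
}""")
for step in reply["steps"]:
    print(f"Step {step['step_number']}: {step['action']}")
```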

3.2 Batch Processing and Concurrency

```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()

# Reuses the SentimentAnalysis model defined earlier
async def analyze_single(text: str) -> SentimentAnalysis:
    response = await client.beta.chat.completions.parse(
        model="gpt-4o-mini",  # Use mini for batch processing to save costs
        messages=[{"role": "user", "content": f"Analyze: {text}"}],
        response_format=SentimentAnalysis
    )
    return response.choices[0].message.parsed

async def batch_analyze(texts: list[str]) -> list[SentimentAnalysis]:
    # Run requests concurrently, with at most 10 in flight at a time
    semaphore = asyncio.Semaphore(10)

    async def process_with_limit(text):
        async with semaphore:
            return await analyze_single(text)

    results = await asyncio.gather(*[process_with_limit(t) for t in texts])
    return results

# 1,000 reviews can finish in minutes; actual throughput depends on your rate limits
texts = load_reviews()  # Your own data loader (not part of the OpenAI SDK)
results = asyncio.run(batch_analyze(texts))
```
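You can check the semaphore pattern offline by swapping the API call for a short `asyncio.sleep`. This sketch records how many calls are in flight at once; `fake_analyze` is a hypothetical stand-in for `analyze_single`. As a variation on the code above, it also passes `return_exceptions=True` to `asyncio.gather`. With that flag, one failed item comes back as an exception object instead of aborting the whole batch.

```python
import asyncio


async def batch_with_limit(texts, worker, limit=10):
    """Run worker(text) for every text with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def process_with_limit(text):
        async with semaphore:
            return await worker(text)

    # return_exceptions=True: a failing item yields its exception, the others still finish
    return await asyncio.gather(*(process_with_limit(t) for t in texts), return_exceptions=True)


in_flight = 0
peak = 0


async def fake_analyze(text):
    # Hypothetical stand-in for analyze_single(): sleeps instead of calling the API
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    try:
        await asyncio.sleep(0.01)
        if text == "bad":
            raise ValueError("simulated API error")
        return text.upper()
    finally:
        in_flight -= 1


texts = [f"review {i}" for i in range(50)] + ["bad"]
results = asyncio.run(batch_with_limit(texts, fake_analyze, limit=10))
errors = [r for r in results if isinstance(r, Exception)]
print(f"peak concurrency: {peak}, results: {len(results)}, errors: {len(errors)}")
```

`asyncio.gather` returns results in input order, so `results[i]` still lines up with `texts[i]` even when some items fail.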

4. Best Practices

4.1 Choosing the Right Model

| Scenario | Recommended model |
|---|---|
| Accuracy first | gpt-4o |
| Speed/cost first | gpt-4o-mini |
| Batch processing | gpt-4o-mini + concurrency |
| Local deployment / privacy | Ollama + Qwen (supports tool calling) |

4.2 Error Handling

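```python
import logging

import openai

logger = logging.getLogger(__name__)

# handle_refusal(), fallback_handler() and user_input come from your own application
try:
    response = client.beta.chat.completions.parse(...)
    message = response.choices[0].message

    if message.refusal:
        # Model refused the request (e.g., safety filters); message.parsed is None
        result = handle_refusal(message.refusal)
    else:
        result = message.parsed

except openai.LengthFinishReasonError as e:
    # The reply hit the token limit before the JSON was complete
    logger.error(f"Structured output truncated: {e}")
    result = fallback_handler(user_input)
except Exception as e:
    # Log any other error (network, API, validation) and use a fallback
    logger.error(f"Structured output failed: {e}")
    result = fallback_handler(user_input)
```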
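Transient failures such as timeouts and rate limits are often worth retrying before you fall back. The helper below is an SDK-agnostic sketch: it retries any callable with exponential backoff, then calls a fallback. `retry_with_fallback`, its parameters, and the `flaky` example are hypothetical names. The OpenAI Python client already has its own `max_retries` setting for some error types. In practice you may only need this extra layer for failures the client does not retry, such as your own validation checks.

```python
import logging
import time

logger = logging.getLogger(__name__)


def retry_with_fallback(call, fallback, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Try call() up to `attempts` times with exponential backoff, then return fallback()."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:  # In real code, catch only errors worth retrying
            logger.warning("Attempt %d/%d failed: %s", attempt + 1, attempts, exc)
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # Defaults give 0.5s, then 1s
    return fallback()


# Usage with a hypothetical flaky call that succeeds on the third try
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return {"sentiment": "positive"}


outcome = retry_with_fallback(flaky, fallback=lambda: None, sleep=lambda seconds: None)
print(outcome)
```

The `sleep` parameter makes the helper testable without real delays. In production, leave it at the default `time.sleep`.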
