OpenAI Assistants API Complete Tutorial 2026: Building Persistent AI Assistants

Thread Management, Code Interpreter, File Search — Full Hands-On with the Assistants API


Why Choose the Assistants API?

| Feature | Chat API | Assistants API |
| --- | --- | --- |
| Conversation memory | Must manage yourself | Automatic (Threads) |
| Code execution | Not supported | Code Interpreter |
| File search | Not supported | File Search (RAG) |
| Asynchronous processing | Not supported | Runs state machine |

Core Concepts

Assistant: Defines the AI persona; created once, reused many times.

Thread: A single conversation that stores all message history.

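Run: One execution of an Assistant on a Thread. A run normally moves through queued → in_progress → completed, but it can also pause at requires_action (waiting for function-tool output) or end as failed, cancelled, expired, or incomplete.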
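
The run lifecycle described above can be made concrete with a hand-written polling loop. This is a minimal sketch rather than the SDK's own implementation: `wait_for_run`, `TERMINAL_STATUSES`, and the `interval` and `timeout` parameters are hypothetical names, and `client` is assumed to be an `OpenAI()` instance whose `client.beta.threads.runs.retrieve` call returns an object with a `status` field.

```python
import time

# Statuses after which a run will not change again.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def wait_for_run(client, thread_id, run_id, interval=1.0, timeout=120.0):
    """Poll a run until it finishes, needs tool output, or the timeout passes.

    Hypothetical helper; `client` is assumed to be an openai.OpenAI instance.
    """
    deadline = time.monotonic() + timeout
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        # requires_action is not terminal, but the caller has to act before
        # the run can continue, so control is handed back here as well.
        if run.status in TERMINAL_STATUSES or run.status == "requires_action":
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run.status} after {timeout}s")
        time.sleep(interval)
```

Returning on requires_action as well as on the terminal statuses keeps the loop from spinning until the run expires when a function tool is waiting for output.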

Complete Example

python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create an assistant
assistant = client.beta.assistants.create(
    name="Data Analysis Assistant",
    instructions="You are a professional data analyst.",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}, {"type": "file_search"}]
)

# Create a thread and send a message
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user",
    content="Analyze the sales data and find the fastest-growing category"
)

# Run and get results
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)
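if run.status == 'completed':
    # Messages are listed newest first, so data[0] is the assistant's reply
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    print(messages.data[0].content[0].text.value)
else:
    print(f"Run ended with status: {run.status}")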
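
The example reads `content[0].text.value`, which assumes the first content block of the reply is text. With Code Interpreter enabled, a reply may also carry non-text blocks such as a generated chart, so a slightly more defensive reader can help. The sketch below is illustrative only: `message_text` is a hypothetical helper, and it assumes each content block exposes a `type` attribute, with text blocks carrying `text.value` as in the example.

```python
def message_text(message):
    """Join every text block of a message, skipping non-text blocks."""
    parts = []
    for block in message.content:
        # Text blocks have type "text"; images use other types such as "image_file".
        if getattr(block, "type", None) == "text":
            parts.append(block.text.value)
    return "\n".join(parts)
```

With the objects from the example, `print(message_text(messages.data[0]))` would print the text of the latest reply.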

File Search (Built-in RAG)

python
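# Upload the document; the context manager closes the file handle afterwards
with open("docs.pdf", "rb") as f:
    file = client.files.create(file=f, purpose="assistants")
# Recent openai-python releases expose vector stores at client.vector_stores;
# older releases use client.beta.vector_stores instead.
vs = client.vector_stores.create(name="Knowledge Base")
# create_and_poll waits until the file has been chunked and indexed
client.vector_stores.files.create_and_poll(vector_store_id=vs.id, file_id=file.id)
# Attach the vector store to the existing assistant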
client.beta.assistants.update(
    assistant_id=assistant.id,
    tool_resources={"file_search": {"vector_store_ids": [vs.id]}}
)
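
Replies grounded in File Search typically carry citation annotations alongside the text. The following is a rough sketch of turning those markers into numbered footnotes. `text_with_citations` is a hypothetical helper, and it assumes the `.text` object of a content block has `value` and `annotations`, with file citations exposing `type == "file_citation"`, `text` (the marker string), and `file_citation.file_id`.

```python
def text_with_citations(text):
    """Replace File Search citation markers with numbered footnotes.

    `text` is assumed to be the `.text` object of a message content block.
    Returns the rewritten string and a list of footnote lines.
    """
    value = text.value
    footnotes = []
    for annotation in text.annotations:
        if getattr(annotation, "type", None) != "file_citation":
            continue  # ignore other annotation kinds, e.g. file paths
        number = len(footnotes) + 1
        # annotation.text is the marker string embedded in the reply
        value = value.replace(annotation.text, f" [{number}]")
        footnotes.append(f"[{number}] file {annotation.file_citation.file_id}")
    return value, footnotes
```

The footnotes list only file IDs; mapping them to file names would take an extra lookup per file, which is left out here.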

Streaming Output

python
with client.beta.threads.runs.stream(thread_id=thread.id, assistant_id=assistant.id) as stream:
    for text in stream.text_deltas:
        print(text, end="", flush=True)
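
Printing deltas as they arrive discards the text once it is on screen, while an application often also needs the full reply after the stream ends (to store or log it, for example). A small sketch of doing both follows; `print_and_collect` is a hypothetical helper that works on any iterable of strings, including `stream.text_deltas` from the snippet above.

```python
import sys


def print_and_collect(text_deltas, out=None):
    """Echo each streamed chunk immediately and return the full text."""
    out = sys.stdout if out is None else out
    chunks = []
    for delta in text_deltas:
        out.write(delta)
        out.flush()  # show partial output without waiting for a newline
        chunks.append(delta)
    return "".join(chunks)
```

Inside the `with` block, `reply = print_and_collect(stream.text_deltas)` would replace the `for` loop.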

Production Considerations

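Set max_completion_tokens (and, if needed, max_prompt_tokens) when creating a run to cap costs; a run that reaches the cap ends with status incomplete.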

Runs can take tens of seconds; use polling (such as create_and_poll) or streaming to track them.

Regularly clean up long-inactive threads.
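
Two of the points above, capping tokens and cleaning up idle threads, can be sketched as small helpers. Both function names are hypothetical, `client` is assumed to be an `OpenAI()` instance, and the cleanup assumes the application keeps its own record of when each thread was last used; this sketch does not rely on any thread-listing endpoint.

```python
import time


def run_with_budget(client, thread_id, assistant_id, max_completion_tokens=1000):
    """Start a run with a completion-token cap and report why it stopped early."""
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread_id,
        assistant_id=assistant_id,
        max_completion_tokens=max_completion_tokens,
    )
    if run.status == "incomplete":
        # incomplete_details.reason names the limit that was hit
        reason = getattr(run.incomplete_details, "reason", "unknown")
        print(f"Run stopped early: {reason}")
    return run


def delete_stale_threads(client, last_used, max_idle_seconds, now=None):
    """Delete threads idle for longer than max_idle_seconds.

    `last_used` is the application's own dict of thread_id -> last-activity
    Unix time (an assumption of this sketch). Returns the deleted IDs.
    """
    now = time.time() if now is None else now
    deleted = []
    for thread_id, timestamp in list(last_used.items()):
        if now - timestamp > max_idle_seconds:
            client.beta.threads.delete(thread_id)
            del last_used[thread_id]
            deleted.append(thread_id)
    return deleted
```

A scheduled job could call `delete_stale_threads(client, last_used, max_idle_seconds=30 * 24 * 3600)` once a day; the 30-day window is only an example value.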
