← Back to tutorials

OpenAI Assistants API v2 2026: Files, Code Interpreter, and Threads

Build persistent AI assistants with built-in RAG, code execution, function calling

OpenAI Assistants API: Threads, Files, Code Interpreter — and the Migration You Now Need

Important status update first: OpenAI has deprecated the Assistants API in favor of the Responses API — after the Responses API reached feature parity, OpenAI announced the Assistants API's sunset (with a migration window into 2026). If you're starting fresh: build on the Responses API. If you run Assistants in production: this guide covers what you have, what maps to what, and how to migrate without losing the features you adopted it for (threads, file search, code interpreter).

What the Assistants API gave you (and why people used it)

The pitch was *managed state*: instead of resending conversation history each call, you created an Assistant (instructions + model + tools), opened a Thread per user, appended Messages, and triggered Runs. OpenAI persisted the conversation, auto-truncated context, and hosted two killer tools:

  • File Search — upload documents into vector stores; the assistant retrieves relevant chunks automatically (managed RAG, no pipeline to build)
  • Code Interpreter — a sandboxed Python runtime for data analysis, file processing, and chart generation
  • That convenience is exactly what the Responses API now provides with a cleaner design.

    The migration map

    Assistants conceptResponses API equivalent

    Assistant object (instructions/model/tools)Prompt/config in your code (or stored prompts); tools declared per request Thread + Messagesprevious_response_id chaining (server-side state) or pass messages yourself Run / run pollingGone — a response *is* the result; no poll loop File Search + vector storesSame file_search tool, same vector stores — carries over Code InterpreterSame tool under Responses Function callingSame concept, simpler wire format

    The big win: the awkward create-run-then-poll loop disappears. Where Assistants made you create a run and poll its status, Responses returns (or streams) the answer directly:

    python
    from openai import OpenAI
    client = OpenAI()

    Managed RAG, Responses-style

    resp = client.responses.create( model='gpt-5', input='What does our refund policy say about partial refunds?', tools=[{'type': 'file_search', 'vector_store_ids': [VECTOR_STORE_ID]}], ) print(resp.output_text)

    Multi-turn: chain by ID instead of managing a thread object

    followup = client.responses.create( model='gpt-5', previous_response_id=resp.id, input='And for digital goods specifically?', )

    python
    

    Code Interpreter, Responses-style

    resp = client.responses.create( model='gpt-5', input='Analyze the attached CSV: monthly revenue trend + one chart.', tools=[{'type': 'code_interpreter', 'container': {'type': 'auto', 'file_ids': [file_id]}}], )

    Migration plan that works in practice

  • Inventory which Assistants features you actually use. Most apps use threads + one tool; the exotic corners (run steps inspection, per-run model overrides) are rarer and need individual mapping.
  • Vector stores carry over — your uploaded/indexed files don't need re-ingestion; point file_search at the same store IDs.
  • Replace thread lifecycle with response chaining — store the latest response_id per user where you previously stored a thread ID. (Decide your retention policy: chaining stores state with OpenAI; passing full messages yourself keeps state in your DB.)
  • Delete the polling code. Streaming replaces it; your latency improves for free.
  • Run both paths behind a flag for one release cycle, diff outputs on real traffic, then cut over before the sunset date.
  • Consult OpenAI's official migration guide for the current sunset timeline — dates have shifted before.

    Bigger-picture lesson

    The Assistants→Responses transition is a reminder that managed-state APIs are vendor surfaces, not standards — worth abstracting behind your own service layer so the next migration is a module change, not an app rewrite. If you're rebuilding anyway, it's a natural moment to evaluate alternatives: self-managed RAG (pgvector + semantic search) for control, framework orchestration (LangGraph) for complex agents, or the Claude API's equivalents if you're diversifying providers.

    FAQ

    Do I have to migrate immediately? No — there's a deprecation window with continued support. But new feature work lands on Responses only, so the cost of waiting compounds.

    Is Responses cheaper? Same per-token model pricing; you save the wasted polling calls and gain prompt-caching-friendly patterns. Tool usage (file search storage, code interpreter sessions) bills similarly — verify rates on the pricing page.

    What about the ChatGPT "GPTs" I built? Different product surface (consumer ChatGPT), unaffected by this API migration.


    *Last updated: June 2026. Verify the current deprecation timeline and Responses API parameters against OpenAI's official docs — this transition has moving dates.*

    Also available in 中文.