OpenAI Batch vs Standard API: Side-by-Side Comparison
Cost and throughput tradeoffs in OpenAI API modes — comparing batch processing across openai and python
OpenAI Batch vs Standard API: Side-by-Side Comparison
One sentence: if the work doesn't need an answer within the hour, the Batch API does the same inference at half the price — the trade is synchronous seconds versus a completion window of up to 24 hours. For embeddings backfills, nightly classification, dataset generation, and bulk summarization, it's the easiest 50% cost cut in AI engineering.
At a glance
Two non-obvious wins beyond price: batch quotas are separate from your interactive rate limits (a backfill won't starve your production traffic), and the file-based flow forces idempotent, resumable job design that ad-hoc scripts usually lack.
The complete flow
python
import json, time
from openai import OpenAIclient = OpenAI()
1. Build the JSONL — one request per line, custom_id is how you join results back
with open('batch.jsonl', 'w') as f:
for doc in documents:
f.write(json.dumps({
'custom_id': doc['id'],
'method': 'POST',
'url': '/v1/chat/completions',
'body': {
'model': 'gpt-4o-mini',
'messages': [{'role': 'user', 'content': f'Classify sentiment: {doc["text"]}'}],
'max_tokens': 8,
},
}) + '\n')2. Upload + create the batch
batch_file = client.files.create(file=open('batch.jsonl', 'rb'), purpose='batch')
batch = client.batches.create(
input_file_id=batch_file.id,
endpoint='/v1/chat/completions',
completion_window='24h',
)3. Poll (or just check back later — webhooks aren't provided)
while (batch := client.batches.retrieve(batch.id)).status not in ('completed', 'failed', 'expired'):
time.sleep(60)4. Download and join on custom_id
results = {}
for line in client.files.content(batch.output_file_id).text.splitlines():
r = json.loads(line)
results[r['custom_id']] = r['response']['body']['choices'][0]['message']['content']
Details that bite in practice:
custom_id is mandatory and your only join key — results come back in arbitrary order. Use your own stable IDs.error_file_id. Always check it and design the job to resubmit only the failed custom_ids.expired can happen under load — unfinished requests are returned (and you're only billed for completed ones); resubmit the remainder.When each wins
Standard: anything user-facing, agent/tool loops where call N+1 depends on call N, anything needing streaming.
Batch: nightly/offline classification and tagging, embeddings for a whole corpus, synthetic-data and eval-set generation, re-summarizing an archive, scheduled report generation. The mental shift: if your code path is "cron job → loop over rows → call the API", that loop should almost certainly be a batch file instead — and the async-with-semaphore pattern (sync vs async calls) is only the right tool when you need those results *today*.
The hybrid that serious pipelines converge on: batch for the bulk backfill, standard API for the trickle of new items that can't wait for the next nightly run.
Same idea at other providers
This pattern is industry-standard now: Anthropic's Message Batches API offers the same 50% discount with a similar submit-and-poll flow, and Gemini's batch mode likewise. If you're multi-provider, an abstraction like LiteLLM helps keep batch plumbing uniform — see LiteLLM vs Portkey for LLM gateways. Pricing and caps move — verify current numbers on each provider's pricing page before committing a budget forecast.
FAQ
Is prompt caching still relevant in batches? Provider-dependent; with a shared system prompt across 50K requests it matters — structure prompts stable-prefix-first regardless (same principle as in our KV cache deep dive).
Can I cancel a running batch? Yes — client.batches.cancel(id); completed requests are billed, the rest stop.
How fast in practice? Often well under the 24h window — but it's a window, not an SLA. If "usually 2 hours, occasionally 23" breaks your pipeline, you need the standard API.
*Last updated: June 2026. Verify current pricing and limits against OpenAI's docs.*
Also available in 中文.