OpenAI Batch vs Standard API: Side-by-Side Comparison

Cost and throughput tradeoffs in OpenAI API modes — comparing batch processing across openai and python

By AI Skill Navigation Editorial TeamPublished June 12, 2026

OpenAI Batch vs Standard API: Side-by-Side Comparison

One sentence: if the work doesn't need an answer within the hour, the Batch API does the same inference at half the price — the trade is synchronous seconds versus a completion window of up to 24 hours. For embeddings backfills, nightly classification, dataset generation, and bulk summarization, it's the easiest 50% cost cut in AI engineering.

At a glance

Standard APIBatch API

LatencySeconds, synchronousUp to 24h window (often much faster, not guaranteed) PriceList price50% off inputs and outputs (per OpenAI's pricing page) InterfaceOne HTTPS request per callUpload JSONL file of requests → poll → download results Rate limitsYour tier's RPM/TPMSeparate, much higher per-batch quotas (per-model caps apply) Streaming✅❌ Scale per submission1 requestUp to 50,000 requests / 200 MB per file

Two non-obvious wins beyond price: batch quotas are separate from your interactive rate limits (a backfill won't starve your production traffic), and the file-based flow forces idempotent, resumable job design that ad-hoc scripts usually lack.

The complete flow

python
import json, time
from openai import OpenAI
client = OpenAI()
1. Build the JSONL — one request per line, custom_id is how you join results back
with open('batch.jsonl', 'w') as f:
    for doc in documents:
        f.write(json.dumps({
            'custom_id': doc['id'],
            'method': 'POST',
            'url': '/v1/chat/completions',
            'body': {
                'model': 'gpt-4o-mini',
                'messages': [{'role': 'user', 'content': f'Classify sentiment: {doc["text"]}'}],
                'max_tokens': 8,
            },
        }) + '\n')
2. Upload + create the batch
batch_file = client.files.create(file=open('batch.jsonl', 'rb'), purpose='batch')
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint='/v1/chat/completions',
    completion_window='24h',
)
3. Poll (or just check back later — webhooks aren't provided)
while (batch := client.batches.retrieve(batch.id)).status not in ('completed', 'failed', 'expired'):
    time.sleep(60)
4. Download and join on custom_id
results = {}
for line in client.files.content(batch.output_file_id).text.splitlines():
    r = json.loads(line)
    results[r['custom_id']] = r['response']['body']['choices'][0]['message']['content']

Details that bite in practice:

custom_id is mandatory and your only join key — results come back in arbitrary order. Use your own stable IDs.

Partial failure is normal: per-line errors land in a separate error_file_id. Always check it and design the job to resubmit only the failed custom_ids.

expired can happen under load — unfinished requests are returned (and you're only billed for completed ones); resubmit the remainder.

Supported endpoints include chat completions, embeddings, and responses — embeddings backfills are arguably the killer app.

When each wins

Standard: anything user-facing, agent/tool loops where call N+1 depends on call N, anything needing streaming.

Batch: nightly/offline classification and tagging, embeddings for a whole corpus, synthetic-data and eval-set generation, re-summarizing an archive, scheduled report generation. The mental shift: if your code path is "cron job → loop over rows → call the API", that loop should almost certainly be a batch file instead — and the async-with-semaphore pattern (sync vs async calls) is only the right tool when you need those results *today*.

The hybrid that serious pipelines converge on: batch for the bulk backfill, standard API for the trickle of new items that can't wait for the next nightly run.

Same idea at other providers

This pattern is industry-standard now: Anthropic's Message Batches API offers the same 50% discount with a similar submit-and-poll flow, and Gemini's batch mode likewise. If you're multi-provider, an abstraction like LiteLLM helps keep batch plumbing uniform — see LiteLLM vs Portkey for LLM gateways. Pricing and caps move — verify current numbers on each provider's pricing page before committing a budget forecast.

FAQ

Is prompt caching still relevant in batches? Provider-dependent; with a shared system prompt across 50K requests it matters — structure prompts stable-prefix-first regardless (same principle as in our KV cache deep dive).

Can I cancel a running batch? Yes — client.batches.cancel(id); completed requests are billed, the rest stop.

How fast in practice? Often well under the 24h window — but it's a window, not an SLA. If "usually 2 hours, occasionally 23" breaks your pipeline, you need the standard API.

*Last updated: June 2026. Verify current pricing and limits against OpenAI's docs.*

Also available in 中文.

OpenAI Batch vs Standard API: Side-by-Side Comparison

OpenAI Batch vs Standard API: Side-by-Side Comparison

At a glance

The complete flow

1. Build the JSONL — one request per line, custom_id is how you join results back

2. Upload + create the batch

3. Poll (or just check back later — webhooks aren't provided)

4. Download and join on custom_id

When each wins

Same idea at other providers

FAQ

Documentation

Getting Started

Learn more