LLM Intent Classification: Practical Tutorial

Classifying user intent for routing in AI applications

Intent classification routes a user's message to the right handler — "billing question" vs "bug report" vs "feature request" — and is the front door of most chatbots and agents. LLMs make this fast to build and robust to phrasing, but the trick is doing it reliably and cheaply. This guide shows the practical pattern.

The pattern: constrained classification

Give a small, cheap model the fixed list of intents and force it to return exactly one of them as structured output rather than free-form text.

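```python
import json
from typing import Literal

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()  # reads OPENAI_API_KEY from the environment


class Intent(BaseModel):
    intent: Literal["billing", "bug_report", "feature_request", "other"]
    confidence: float


# JSON mode requires the word "JSON" to appear in the messages, and the model
# only knows the labels and keys if the prompt spells them out.
SYSTEM_PROMPT = (
    "Classify the user message into exactly one intent: "
    "billing, bug_report, feature_request, or other. "
    'Reply with a JSON object with the keys "intent" (one of the labels) '
    'and "confidence" (a number from 0 to 1).'
)


def classify(text: str) -> Intent:
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
    )
    # Pydantic raises ValidationError if the label is not in the Literal.
    return Intent(**json.loads(r.choices[0].message.content))
```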

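JSON mode on its own does not enforce your schema, so in the code above the Pydantic model is what rejects a label outside the list. For a stronger guarantee, use a strict schema so the output is always one of your intents (see structured outputs / Pydantic AI vs Instructor). The mechanism behind forced selection is function/tool calling.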
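As a hedged illustration of the strict-schema idea, the sketch below builds a `response_format` payload for schema-constrained output by hand, using only the standard library. The helper name `intent_response_format` and the schema name `intent` are assumptions made for this example. The payload shape (`"type": "json_schema"`, `"strict": True`, every property listed as required, `"additionalProperties": False`) reflects the OpenAI structured outputs format as understood at the time of writing and should be verified against the current docs.

```python
import json

INTENTS = ["billing", "bug_report", "feature_request", "other"]


def intent_response_format(labels: list[str]) -> dict:
    """Build a strict JSON-schema response_format for a fixed label set.

    Hypothetical helper for illustration: the result is meant to be passed
    as the `response_format` argument of a chat completion request.
    """
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "intent",  # assumed schema name
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    # The enum restricts the label to the fixed list.
                    "intent": {"type": "string", "enum": list(labels)},
                    "confidence": {"type": "number"},
                },
                "required": ["intent", "confidence"],
                "additionalProperties": False,
            },
        },
    }


if __name__ == "__main__":
    # Print the payload so it can be inspected before wiring it into a request.
    print(json.dumps(intent_response_format(INTENTS), indent=2))
```

With a payload like this, the enum lives in the request itself, so the label constraint no longer depends on the prompt wording alone.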

Getting it reliable and cheap

  • Use a small model (gpt-4o-mini / Haiku) — classification rarely needs a frontier model. See GPT-4o mini vs Claude Haiku.
  • Constrain the label set with an enum/Literal; add an explicit "other" for the unknown bucket.
  • Few-shot the ambiguous cases in the system prompt.
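  • Return confidence and route low-confidence messages to a fallback or human. Treat the model's self-reported score as a rough signal and pick the threshold from labeled examples.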
  • Consider embeddings for very high volume: classify by nearest-neighbor to labeled examples (cheaper than an LLM call) — see semantic search.
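The few-shot and confidence bullets above can be sketched without any API call. The example messages, the 0.7 threshold, the `fallback` handler name, and the function names `build_system_prompt` and `route` are all assumptions for illustration; in practice the threshold would be tuned on labeled traffic.

```python
INTENTS = ("billing", "bug_report", "feature_request", "other")

# Hypothetical few-shot pairs covering phrasings that are easy to misroute.
FEW_SHOT = [
    ("I was charged twice and then the app crashed at checkout", "billing"),
    ("It would be great if exports stopped failing on large files", "bug_report"),
]


def build_system_prompt() -> str:
    """Return a system prompt that lists the labels, then the tricky examples."""
    lines = [
        "Classify the user message into exactly one intent: "
        + ", ".join(INTENTS)
        + ".",
        "Examples:",
    ]
    for text, label in FEW_SHOT:
        lines.append(f'- "{text}" -> {label}')
    return "\n".join(lines)


def route(intent: str, confidence: float, threshold: float = 0.7) -> str:
    """Pick a handler name; unknown labels and low confidence go to a fallback."""
    if intent not in INTENTS or confidence < threshold:
        return "fallback"
    return intent


if __name__ == "__main__":
    print(build_system_prompt())
    print(route("billing", 0.95))
    print(route("billing", 0.40))
```

Keeping the routing rule in plain code, outside the model call, means the threshold can be changed or audited without touching the prompt.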

When to fine-tune

If volume is huge and intents are stable, a fine-tuned mini model or an embedding classifier cuts cost and latency further. Start with prompted classification; graduate to fine-tuning only when volume justifies it.
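The embedding classifier mentioned above amounts to nearest-neighbor lookup against labeled examples. The sketch below shows that logic under loud assumptions: `embed` is a toy bag-of-words stand-in so the example runs offline (a real system would call an embedding model there), and the labeled examples, the 0.2 similarity floor, and the name `classify_nn` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled examples; a real system would keep many per intent.
LABELED = [
    ("I was charged twice this month", "billing"),
    ("please refund my last invoice", "billing"),
    ("the app crashes when I open settings", "bug_report"),
    ("export button throws an error", "bug_report"),
    ("please add dark mode", "feature_request"),
    ("it would be nice to have calendar sync", "feature_request"),
]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Embed the labeled set once; only the query is embedded per request.
INDEX = [(embed(text), label) for text, label in LABELED]


def classify_nn(text: str, min_similarity: float = 0.2) -> tuple[str, float]:
    """Return (label, similarity) of the nearest example, or "other" if none is close."""
    query = embed(text)
    best_label, best_score = "other", 0.0
    for vector, label in INDEX:
        score = cosine(query, vector)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return "other", best_score
    return best_label, best_score


if __name__ == "__main__":
    print(classify_nn("why was I charged twice"))
    print(classify_nn("hello there"))
```

The similarity floor plays the same role as the confidence threshold in the prompted version: anything that is not close to a known example falls into the "other" bucket.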

FAQ

Which model? A small one: classification is easy; save frontier models for the task itself.

How to guarantee a valid label? Enum/Literal + structured output, with an "other" catch-all.

Embeddings or LLM? LLM for flexibility; embeddings for high-volume, cost-sensitive routing.

How to handle uncertainty? Return confidence and route low-confidence cases to fallback/human.

Summary

LLM intent classification = a cheap model + a constrained label set + structured output, with confidence-based fallback. Use Literal/enums to guarantee valid labels, few-shot the tricky cases, and move to embeddings or a fine-tuned model when volume demands. It's the reliable front door for chatbots and agents.

*Last updated: June 2026. Verify APIs against the OpenAI docs.*
