Advanced Prompt Engineering: Chain-of-Thought, Few-Shot & Structured Outputs in 2025
Master LLM prompting techniques that reliably produce high-quality, structured outputs
Advanced Prompt Engineering: Chain-of-Thought, Few-Shot & Structured Outputs in 2025
Master LLM prompting techniques that reliably produce high-quality, structured outputs
Prompt engineering has evolved from simple instructions to sophisticated techniques that dramatically improve LLM reliability and output quality. This guide covers chain-of-thought prompting, few-shot examples, self-consistency, ReAct (Reasoning + Acting), structured output extraction with Instructor and Pydantic, system prompt design, and building a prompt testing and versioning discipline.
Advanced Prompt Engineering: Patterns & Techniques
Why Prompting Still Matters
Even with fine-tuned models and RAG, prompt engineering determines the quality of outputs. A well-designed prompt can turn a mediocre model into a high-performing one for specific tasks. Understanding prompting techniques is foundational for any AI engineer.
Core Prompting Techniques
Zero-Shot vs. Few-Shot
Zero-shot: just describe the task. "Classify this email as spam or not spam."Few-shot: provide 3-5 examples demonstrating the task. Show input-output pairs that represent the range of cases. Critical: examples should be diverse, correctly labeled, and representative of your actual data.
When to use few-shot: complex classification with nuanced boundaries, output format is non-standard, the task benefits from implicit style guidance.
Chain-of-Thought (CoT) Prompting
Force the model to reason step-by-step before answering. Append "Let's think step by step" or show examples with explicit reasoning chains. Dramatically improves: math problems, logical reasoning, multi-step analysis.Standard CoT: provide a question + reasoning chain + answer as a few-shot example. Model learns to generate reasoning before answers.
Zero-shot CoT: just add "Let's think step by step" to the prompt. Simple and effective for many reasoning tasks.
Self-Consistency
Run the same prompt N times (temperature > 0) and take the majority vote answer. Improves accuracy by 5-15% for reasoning tasks. Cost: N× more API calls. Use for high-stakes decisions where accuracy justifies cost.ReAct (Reasoning + Acting)
Interleave reasoning and action. Format: Thought → Action → Observation → Thought → Action → Observation → Final Answer. Each thought explains why the next action is needed. Each observation incorporates tool results into reasoning.This is the foundation of modern AI agents. LLMs can effectively plan and use tools when prompted in ReAct format.
Structured Output Extraction
Instructor Library
Instructor wraps OpenAI and Anthropic clients to enable Pydantic model outputs. Define a Pydantic class with field types and validation, pass as response_model parameter. Model automatically structures output to match the schema with retries on validation failure.Example: extract Person with name (str), age (int), email (EmailStr) from unstructured text. Instructor handles the prompt engineering and retry logic automatically.
JSON Mode and Function Calling
OpenAI JSON mode: add response_format={"type": "json_object"} to ensure valid JSON output. Combined with a schema description in the prompt, produces reliably structured data.Function calling: define functions with JSON Schema, model decides when to call and with what arguments. More reliable than raw JSON mode for tool use scenarios.
System Prompt Design
Components of an Effective System Prompt
Role: "You are an expert financial analyst specializing in technology sector equity research." Context: relevant background the model needs. Constraints: what the model should NOT do (avoid speculation, don't discuss competitors, stay under 200 words). Output format: exactly how the response should be structured. Tone: formal/casual, technical/accessible.Prompt Injection Prevention
For production systems handling user input: clearly separate system instructions from user content using XML tags or structured delimiters. Never interpolate user input directly into system prompts. Instruct the model to ignore attempts to override its instructions.System prompt defense: "SYSTEM INSTRUCTIONS (DO NOT IGNORE): [your instructions]. USER INPUT FOLLOWS: [sanitized user input]"
Prompt Testing and Versioning
Prompt Testing Framework
Treat prompts like code: version control (store in Git), test cases (minimum 20-50 examples per prompt), evaluation metrics (accuracy, format compliance, latency), regression testing (new prompt version must match or beat baseline).Promptfoo: open-source prompt testing tool. Define test cases with inputs and expected outputs. Run against multiple prompts and models. Compare results in a table.
A/B Testing Prompts
Run prompt A and prompt B in parallel for a random sample of production traffic. Measure target metric (user satisfaction, task completion rate, accuracy). Roll out winning prompt with confidence.Advanced Techniques
Least-to-Most Prompting
For complex problems, first ask the model to decompose into sub-problems ("What sub-questions need to be answered to solve this?"), then solve each sequentially, building on previous answers.Directional Stimulus Prompting
Include a hint or keyword that steers generation: "Provide a critique focused on [specific aspect]" produces more targeted analysis than a general critique request.Emotional Context
Adding emotional context (e.g., "This is very important to my career") has been shown experimentally to improve LLM output quality for certain tasks—models trained on human text respond to social cues.Prompt Optimization with DSPy
DSPy (Declarative Self-improving Language Programs) automatically optimizes prompts given a metric. Define your task with input/output signatures, provide a few training examples, specify an evaluation metric. DSPy's compiler searches for prompts that maximize your metric, often outperforming hand-crafted prompts.
Advanced prompting is part art, part science—systematic testing and iteration consistently yield better results than intuition alone.
相关工具
相关教程
Build complex multi-step AI workflows with state management using LangGraph
Chain-of-thought, tree-of-thoughts, self-consistency, and systematic evaluation methods
Deploy Llama 3 with 20x higher throughput than naive serving