Fine-Tuning GPT-4 and Claude: When to Choose Fine-Tuning vs RAG (2026 Edition)

Make the right architecture decision: fine-tuning or RAG for your LLM application


Advanced, about 22 minutes

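A comprehensive guide to choosing between fine-tuning and RAG for LLM applications. Covers GPT-4o mini fine-tuning, LoRA training with Hugging Face, a cost comparison, and a use-case decision framework.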

fine-tuning rag gpt-4 lora llm ai engineering

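Fine-Tuning vs RAG: A 2026 Decision Guide

One of the most common architecture decisions in AI applications is whether to fine-tune a model or use RAG. Here is a framework with real examples.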

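The Core Difference

RAG: retrieves relevant documents at query time and injects them into the context.
Fine-tuning: embeds knowledge and behavior directly into the model weights.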
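
To make the distinction concrete, the sketch below builds the chat request for each approach. It is an illustration only: the function names, the model IDs, and the sample document are placeholders that do not come from the tutorial. With RAG the knowledge travels inside the prompt; with fine-tuning the prompt stays short and the behavior is expected to come from the trained weights.

```python
def rag_request(question, retrieved_docs, model="gpt-4o-mini"):
    """RAG: retrieved text is injected into the prompt at query time."""
    context = "\n\n".join(retrieved_docs)
    return {
        "model": model,  # unchanged base model
        "messages": [
            {"role": "system", "content": "Answer using only the context provided."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    }


def fine_tuned_request(question, fine_tuned_model):
    """Fine-tuning: no context is attached; behavior comes from the trained weights."""
    return {
        "model": fine_tuned_model,  # placeholder ID of a fine-tuned model
        "messages": [{"role": "user", "content": question}],
    }


docs = ["Refunds are available within 30 days of purchase."]  # sample document
rag = rag_request("What is the refund policy?", docs)
ft = fine_tuned_request("What is the refund policy?", "ft:placeholder-model-id")
print(len(rag["messages"]), len(ft["messages"]))
```

Either dictionary could be passed as keyword arguments to a chat completion call; the only structural difference is where the knowledge sits.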

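Decision Framework

Use RAG when:
✅ Knowledge changes frequently (daily/weekly)
✅ You need to cite sources
✅ The data is confidential (you do not want it embedded in model weights)
✅ You need to implement and iterate quickly
✅ The knowledge base is large (over 1M tokens)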

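Use fine-tuning when:
✅ You need a consistent, specific output format/style
✅ Knowledge is stable (legal statutes, product catalogs)
✅ You need shorter prompts (fewer examples required)
✅ You want to remove/reduce the model's default behaviors
✅ Latency is critical (no retrieval step)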
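
The two checklists can be turned into a small helper for design reviews. The sketch below is one possible encoding, not a rule from the tutorial: the field names are invented for illustration, and the tie-breaking choices (default to RAG when nothing is checked, suggest a hybrid when both lists have hits) are assumptions.

```python
from dataclasses import dataclass

# Field names are hypothetical labels for the checklist items above.
RAG_SIGNALS = (
    "knowledge_changes_often",
    "needs_citations",
    "data_is_confidential",
    "needs_fast_iteration",
    "knowledge_base_over_1m_tokens",
)
FINE_TUNING_SIGNALS = (
    "needs_consistent_format",
    "knowledge_is_stable",
    "needs_shorter_prompts",
    "needs_changed_default_behavior",
    "latency_critical",
)


@dataclass
class Requirements:
    knowledge_changes_often: bool = False
    needs_citations: bool = False
    data_is_confidential: bool = False
    needs_fast_iteration: bool = False
    knowledge_base_over_1m_tokens: bool = False
    needs_consistent_format: bool = False
    knowledge_is_stable: bool = False
    needs_shorter_prompts: bool = False
    needs_changed_default_behavior: bool = False
    latency_critical: bool = False


def recommend(req):
    """Count the checked items on each list and pick an approach."""
    rag_hits = sum(getattr(req, name) for name in RAG_SIGNALS)
    ft_hits = sum(getattr(req, name) for name in FINE_TUNING_SIGNALS)
    if rag_hits and ft_hits:
        return "hybrid"  # assumption: mixed signals suggest combining both
    if ft_hits:
        return "fine-tuning"
    return "rag"  # assumption: RAG is the default starting point


print(recommend(Requirements(needs_citations=True)))
print(recommend(Requirements(needs_consistent_format=True, latency_critical=True)))
print(recommend(Requirements(knowledge_changes_often=True, needs_consistent_format=True)))
```

A simple count ignores how much each item matters in practice, so treat the result as a conversation starter rather than a verdict.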

Fine-Tuning GPT-4o Mini

python
from openai import OpenAI
import json
client = OpenAI()
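# Prepare the training data (JSONL format)
training_examples = [
    {
        'messages': [
            {'role': 'system', 'content': 'You are a customer support agent at TechCorp. Always be concise.'},
            {'role': 'user', 'content': 'How do I reset my password?'},
            {'role': 'assistant', 'content': 'Go to Settings > Security > Reset Password. Enter your email and check your inbox for the reset link. The link is valid for 24 hours.'}
        ]
    },
    {
        'messages': [
            {'role': 'system', 'content': 'You are a customer support agent at TechCorp. Always be concise.'},
            {'role': 'user', 'content': 'What is your refund policy?'},
            {'role': 'assistant', 'content': 'We offer a 30-day refund on all plans. Send your order number to support@techcorp.com. Refunds are processed within 3-5 business days.'}
        ]
    }
    # ... at least 50-100 examples
]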
# Save the training data, one JSON object per line
with open('training_data.jsonl', 'w', encoding='utf-8') as f:
    for example in training_examples:
        f.write(json.dumps(example) + '\n')
# Upload the training file
with open('training_data.jsonl', 'rb') as f:
    response = client.files.create(file=f, purpose='fine-tune')
    file_id = response.id
print(f'File uploaded: {file_id}')
# Create the fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=file_id,
    model='gpt-4o-mini',
    hyperparameters={
        'n_epochs': 3,
        'batch_size': 4,
        'learning_rate_multiplier': 1.8
    },
    suffix='customer-support-v1'
)
print(f'Job created: {job.id}')
# Monitor training
import time
while True:
    status = client.fine_tuning.jobs.retrieve(job.id)
    print(f'Status: {status.status}')
    # Stop polling on any terminal state so a cancelled job does not loop forever
    if status.status in ['succeeded', 'failed', 'cancelled']:
        break
    time.sleep(30)

if status.status == 'succeeded':
    print(f'Model ID: {status.fine_tuned_model}')
    # Use the fine-tuned model
    response = client.chat.completions.create(
        model=status.fine_tuned_model,
        messages=[{'role': 'user', 'content': 'How do I cancel my subscription?'}]
    )
    print(response.choices[0].message.content)
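
A malformed training file is a common reason for a fine-tuning job to fail, so it can help to check the JSONL locally before uploading. The validator below is a minimal sketch that covers only the simple system/user/assistant shape used in this tutorial; the function name and the specific checks are assumptions for illustration and do not reproduce the service's own validation.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_jsonl_lines(lines):
    """Return (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing non-empty 'messages' list"))
            continue
        for message in messages:
            if not isinstance(message, dict) or message.get("role") not in ALLOWED_ROLES:
                problems.append((number, "message with missing or unknown role"))
                break
            if not isinstance(message.get("content"), str):
                problems.append((number, "message content is not a string"))
                break
        else:
            # Runs only when every message passed the checks above
            if messages[-1]["role"] != "assistant":
                problems.append((number, "last message is not from the assistant"))
    return problems


sample = [
    json.dumps({"messages": [
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security > Reset Password."},
    ]}),
    json.dumps({"messages": [{"role": "user", "content": "No reply here"}]}),
    "{not json}",
]
print(validate_jsonl_lines(sample))
```

To check a file on disk, pass the open file object directly, for example `validate_jsonl_lines(open('training_data.jsonl', encoding='utf-8'))`.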

LoRA Fine-Tuning with Hugging Face

python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer
from datasets import Dataset
import torch
# Load the base model
MODEL = 'meta-llama/Meta-Llama-3.1-8B-Instruct'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,
    device_map='auto'
)
# LoRA configuration
lora_config = LoraConfig(
    r=16,                       # rank: higher means more trainable parameters
    lora_alpha=32,              # scaling factor
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    lora_dropout=0.05,
    bias='none',
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Approximate output for this config: trainable params ~13.6M || all params ~8.0B || trainable% ~0.17
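# Prepare the dataset
def format_prompt(example):
    # Each example holds three messages in order: system, user, assistant
    system, user, assistant = (m['content'] for m in example['messages'])
    # Llama 3 chat format: each header is followed by a blank line
    return {'text': (
        '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n'
        f'{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n'
        f'{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'
        f'{assistant}<|eot_id|>'
    )}
# training_examples is the list of {'messages': [...]} dicts from the GPT-4o mini section
dataset = Dataset.from_list(training_examples).map(format_prompt)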
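# Train
# Note: these keyword arguments match older TRL releases; newer releases are understood to
# move dataset_text_field into SFTConfig and rename tokenizer to processing_class.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field='text',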
    args=TrainingArguments(
        output_dir='./lora-model',
        num_train_epochs=3,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        bf16=True,
        save_strategy='epoch'
    )
)
trainer.train()
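# Save the LoRA adapter (merging it into the base weights is a separate, optional step)
model.save_pretrained('./lora-adapter')
# At inference time, load the base model + adapter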
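
The number of trainable parameters in the LoRA configuration above can be checked by hand. For a linear layer with `d_in` inputs and `d_out` outputs, LoRA adds two matrices of shapes `r x d_in` and `d_out x r`, which is `r * (d_in + d_out)` parameters. The sketch below applies that formula; the layer dimensions (hidden size 4096, key/value projection width 1024, 32 layers) are assumptions taken from the published Llama 3.1 8B architecture, not from this tutorial.

```python
def lora_params(d_in, d_out, r):
    """LoRA adds A (r x d_in) and B (d_out x r) per adapted linear layer."""
    return r * (d_in + d_out)


# Assumed Llama 3.1 8B dimensions (grouped-query attention narrows k_proj and v_proj)
HIDDEN = 4096
KV_DIM = 1024
NUM_LAYERS = 32
RANK = 16  # matches r=16 in the LoRA configuration

shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in shapes.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")
```

Under these assumptions the count comes to roughly 13.6 million, a fraction of a percent of an 8B-parameter base model. Doubling `r` or adding the MLP projections to `target_modules` raises it proportionally, which is a quick way to sanity-check whatever `print_trainable_parameters()` reports.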

Cost Comparison

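| Approach | Setup cost | Cost per query | Update cost |
|---|---|---|---|
| RAG (Pinecone) | $200-500 | $0.003-0.01 | $0-50 |
| Fine-tuned GPT-4o-mini | $50-200 | $0.001-0.005 | $50-200 |
| LoRA (Llama 3) | GPU rental $50-200 | $0 (self-hosted) | $50-200 |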
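
Which option is cheaper depends on query volume and on how often the knowledge has to be refreshed. The sketch below totals setup, per-query, and update costs for two scenarios. It uses the midpoints of the ranges in the table, which is an assumption made purely for illustration; the query volume and update counts are likewise hypothetical.

```python
def total_cost(setup, per_query, update_cost, queries, updates):
    """Total spend = one-time setup + per-query cost + cost of each knowledge update."""
    return setup + per_query * queries + update_cost * updates


# Midpoints of the table's ranges (illustrative assumption, in USD)
RAG = {"setup": 350.0, "per_query": 0.0065, "update_cost": 25.0}
FINE_TUNED = {"setup": 125.0, "per_query": 0.003, "update_cost": 125.0}

QUERIES = 100_000  # hypothetical volume

for updates in (0, 12):
    rag_total = total_cost(queries=QUERIES, updates=updates, **RAG)
    ft_total = total_cost(queries=QUERIES, updates=updates, **FINE_TUNED)
    cheaper = "RAG" if rag_total < ft_total else "fine-tuning"
    print(f"updates={updates}: RAG ${rag_total:,.2f}, fine-tuned ${ft_total:,.2f}, cheaper: {cheaper}")
```

With these midpoints, fine-tuning is cheaper when the knowledge never changes (about $425 against $1,000), while twelve updates tip the balance toward RAG (about $1,300 against $1,925). That matches the decision framework: frequent updates favor RAG.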

Hybrid Approach (2026 Best Practice)

python
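from openai import OpenAI

class HybridRAGFinetuned:
    """
    Fine-tuned model for style/format + RAG for dynamic knowledge.
    Best of both: consistent output style + up-to-date information.
    """

    def __init__(self, fine_tuned_model_id: str, vector_store):
        self.client = OpenAI()
        self.model = fine_tuned_model_id  # fine-tuned for output style
        # vector_store: any object exposing search(query, k) that returns text chunks
        self.vector_store = vector_store  # for dynamic knowledge

    def query(self, question: str) -> str:
        # Retrieve relevant context (RAG)
        chunks = self.vector_store.search(question, k=5)
        context = '\n\n'.join(chunks)

        # Use the fine-tuned model (consistent output format)
        response = self.client.chat.completions.create(
            model=self.model,  # the fine-tuned model
            messages=[
                {'role': 'system', 'content': 'You are a support agent at TechCorp.'},
                {'role': 'user', 'content': f'Context: {context}\n\nQuestion: {question}'},
            ],
        )
        return response.choices[0].message.content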
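
The hybrid class relies on a vector store with a `search(question, k)` method but does not define one. The class below is a hypothetical in-memory stand-in for local experiments: it scores documents with bag-of-words cosine similarity instead of learned embeddings, and its `add` method is an invented helper. A production setup would use an embedding model and a real vector database.

```python
import math
import re
from collections import Counter


def _vectorize(text):
    """Bag-of-words term counts; a real system would use an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Hypothetical in-memory stand-in exposing search(query, k)."""

    def __init__(self):
        self._docs = []  # list of (text, vector) pairs

    def add(self, text):
        self._docs.append((text, _vectorize(text)))

    def search(self, query, k=5):
        """Return the k stored texts most similar to the query, best first."""
        query_vector = _vectorize(query)
        ranked = sorted(self._docs, key=lambda doc: _cosine(query_vector, doc[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = VectorStore()
store.add("Refunds are available within 30 days of purchase.")
store.add("Reset your password under Settings > Security.")
store.add("Subscriptions can be cancelled from the billing page.")
print(store.search("How do I reset my password?", k=1))
```

Word overlap misses paraphrases such as "forgot my login", which is the main reason real deployments use embeddings; the interface, however, stays the same.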

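Conclusion

For most applications in 2026, start with RAG: it is faster to implement and to update. Add fine-tuning when you need a consistent output format or tone, or once you have identified specific behaviors that need reinforcing. A hybrid approach gives you the best of both.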
