LLM Fine-Tuning in 2025: When to Fine-Tune vs. RAG vs. Prompting (With Cost Analysis)
Senior AI engineers explain the decision framework for choosing between fine-tuning, RAG, and prompt engineering
Decision framework and technical guide for LLM customization — comparing fine-tuning vs. RAG vs. prompting for different use cases, with real cost analysis and step-by-step fine-tuning with OpenAI and LoRA.
LLM Fine-Tuning vs. RAG vs. Prompting: The Decision Framework
The Core Question
When should you fine-tune? Most teams default to fine-tuning when prompt engineering would have worked, wasting time and money. This guide gives you the decision framework.
Decision Tree
Do you need real-time/current information?
├── Yes → RAG (not fine-tuning)
└── No ↓
Is it a style/format/tone issue?
├── Yes → Prompt engineering first
└── No ↓
Do you have 500+ labeled examples?
├── No → More prompt engineering, generate synthetic data
└── Yes ↓
Is inference cost/speed critical?
├── Yes → Fine-tuning (smaller model)
└── Maybe → Fine-tuning for consistency
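The tree above can also be read as a short function. This is a minimal sketch rather than part of the original framework: the name `choose_approach` and its parameters are hypothetical, and the 500-example threshold is copied from the tree.

```python
def choose_approach(
    needs_current_info: bool,
    is_style_issue: bool,
    labeled_examples: int,
    cost_or_speed_critical: bool,
) -> str:
    """Walk the decision tree top to bottom and return its recommendation."""
    if needs_current_info:
        return "RAG (not fine-tuning)"
    if is_style_issue:
        return "Prompt engineering first"
    if labeled_examples < 500:
        return "More prompt engineering, generate synthetic data"
    if cost_or_speed_critical:
        return "Fine-tuning (smaller model)"
    return "Fine-tuning for consistency"


if __name__ == "__main__":
    # A static-knowledge task with 2,000 labeled examples and a tight latency budget.
    print(choose_approach(False, False, 2000, True))
```

The order of the checks matters: the cheaper options (RAG, prompting) are ruled out before fine-tuning is considered at all.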
When Fine-Tuning Wins
When RAG Wins
When Prompting Wins
Fine-Tuning with OpenAI API
Data Preparation
python
# Format: JSONL with messages structure
import json

training_data = [
    {
        "messages": [
            {"role": "system", "content": "You are a customer support agent for Acme Corp."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "To reset your password, go to Settings > Security > Reset Password. You'll receive an email within 2 minutes."}
        ]
    }
]

# Write one JSON object per line
with open("training.jsonl", "w") as f:
    for item in training_data:
        f.write(json.dumps(item) + "\n")
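Before uploading, it can help to check the file locally, since a malformed line will otherwise only surface once the job is validated. The sketch below is an illustration, not an official validator: `validate_example` and `validate_jsonl` are hypothetical helpers, and the checks (three text-only roles, assistant message last) mirror the example above rather than the full API schema.

```python
import json

# Roles used in the example above. Assumption: text-only chats without tool calls.
ALLOWED_ROLES = ("system", "user", "assistant")


def validate_example(example) -> list[str]:
    """Return the problems found in one training example (empty list if none)."""
    if not isinstance(example, dict):
        return ["example is not a JSON object"]
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not a JSON object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i}: empty or non-string content")
    last = messages[-1]
    if not (isinstance(last, dict) and last.get("role") == "assistant"):
        problems.append("last message should be the assistant's reply")
    return problems


def validate_jsonl(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to problems; an empty dict means the file passed."""
    report = {}
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                example = json.loads(line)
            except json.JSONDecodeError as exc:
                report[line_no] = [f"invalid JSON: {exc.msg}"]
                continue
            problems = validate_example(example)
            if problems:
                report[line_no] = problems
    return report
```

Calling `validate_jsonl("training.jsonl")` on the file written above should return an empty dict; any other result lists the offending line numbers.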
Training
python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload training file (the context manager closes the file handle afterwards)
with open("training.jsonl", "rb") as f:
    file = client.files.create(file=f, purpose="fine-tune")

# Create fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=file.id,
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={"n_epochs": 3},
)

print(f"Job ID: {job.id}")
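Creating the job only queues it, so the script still has to wait for training to finish. One way to do that, sketched here under assumptions, is to poll the job until it reaches a terminal status. `wait_for_job` is a hypothetical helper, not an SDK function; it relies on `client.fine_tuning.jobs.retrieve` and the job's `status` field, and the 30-second interval is an arbitrary choice.

```python
import time

# Statuses after which the job will not change any further.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id: str, poll_seconds: float = 30.0, sleep=time.sleep):
    """Poll a fine-tuning job until it reaches a terminal status, then return it.

    `client` is an OpenAI client (or any object exposing
    `fine_tuning.jobs.retrieve`); `sleep` is injectable so the loop can be
    exercised without real waiting.
    """
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
```

With the client from the training script, `finished = wait_for_job(client, job.id)` blocks until training ends. If `finished.status` is `"succeeded"`, `finished.fine_tuned_model` holds the model name to pass as `model` to `client.chat.completions.create`.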
Cost Estimation
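gpt-4o-mini fine-tuning:
Training: $0.003/1K tokens
Inference: $0.0003/1K input + $0.0012/1K output
1,000 training examples × 500 tokens avg = 500K tokens = $1.50 per epoch (about $4.50 for the 3 epochs configured above, since training tokens are billed once per epoch)
Vs. GPT-4o inference at $0.005/1K: measured against the $0.0003/1K input rate, that is more than 10x cheaper inference post-fine-tune (about 16.7x on input tokens)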
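The arithmetic above can be reproduced with a small calculator. This is a sketch using only the prices quoted in this section, which may be out of date; the function names are hypothetical, and it assumes training is billed per token per epoch.

```python
def training_cost(examples: int, avg_tokens: int, price_per_1k: float, epochs: int = 1) -> float:
    """Training cost in dollars: every token is billed once per epoch."""
    return examples * avg_tokens * epochs / 1000 * price_per_1k


def inference_cost(input_tokens: int, output_tokens: int,
                   input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Inference cost in dollars for one request or a batch of requests."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


if __name__ == "__main__":
    # Prices quoted in this section for fine-tuned gpt-4o-mini.
    print(f"1 epoch:  ${training_cost(1000, 500, 0.003):.2f}")            # $1.50
    print(f"3 epochs: ${training_cost(1000, 500, 0.003, epochs=3):.2f}")  # $4.50
    # Ratio of the quoted GPT-4o rate to the fine-tuned input rate.
    print(f"input price ratio: {0.005 / 0.0003:.1f}x")                    # 16.7x
```

Plugging in a realistic monthly token volume through `inference_cost` shows how quickly the one-off training cost is recovered.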
LoRA Fine-Tuning with Open Source Models
When to Use LoRA
python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

lora_config = LoraConfig(
    r=16,                                 # Rank
    lora_alpha=32,                        # Scale
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 6,815,744 || all params: 8,037,076,992 || trainable%: 0.0848
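The trainable-parameter count can be checked by hand, because each LoRA adapter adds two small matrices per targeted layer: A with shape r × in_features and B with shape out_features × r. The sketch below assumes the Llama-3.1-8B shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128); the helper `lora_params` is hypothetical, and the figure reported by `print_trainable_parameters()` will differ for other models, ranks, or target modules.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# Assumed Llama-3.1-8B shapes (grouped-query attention, so v_proj is narrower).
HIDDEN = 4096        # model width; q_proj maps 4096 -> 4096
KV_DIM = 8 * 128     # 8 key/value heads x 128 dims; v_proj maps 4096 -> 1024
LAYERS = 32
RANK = 16            # matches r=16 in the LoraConfig above

per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
trainable = per_layer * LAYERS

print(f"{trainable:,}")  # 6,815,744
```

Doubling `r` doubles the adapter size, and adding `k_proj` or `o_proj` to `target_modules` adds one more `lora_params` term per layer.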
Evaluation Best Practices