LLM Fine-Tuning Practical Guide 2026: From Data Preparation to Deployment, a Complete Model Customization Workflow
When Fine-Tuning Is Worth It and When Prompt Engineering Is Enough
Many people ask, "Can I fine-tune a model to make it understand my business better?" The answer is usually "Yes, but you probably don't need to."
First, let's clarify when you should fine-tune and when prompt engineering is sufficient.
1. Fine-Tuning vs Prompt Engineering: How to Choose
When Fine-Tuning Is Needed
When Fine-Tuning Is Not Needed
2. Efficient Fine-Tuning: Unsloth + LoRA
One of the most popular fine-tuning approaches in 2026 combines Unsloth (training acceleration) with LoRA (parameter-efficient fine-tuning).
2.1 Environment Setup
bash
# Recommended environment: NVIDIA GPU 16GB+, or Google Colab A100
pip install unsloth transformers datasets trl accelerate
# Or use Unsloth's one-click install
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
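As a rough sanity check on the 16GB recommendation, the memory needed just to hold model weights can be estimated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the function name and the nominal 7-billion-parameter size are illustrative assumptions, and it ignores activations, gradients, optimizer state, and quantization overhead, all of which add to real usage.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


# Assumed size for illustration: a nominal 7-billion-parameter model.
params = 7e9
fp16_gb = weight_memory_gb(params, 16)  # weights stored in 16-bit precision
int4_gb = weight_memory_gb(params, 4)   # weights stored in 4-bit precision

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{int4_gb:.1f} GB")
```

Under these assumptions, 16-bit weights alone come close to filling a 16GB card, which is why lower-precision loading matters on smaller GPUs.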
2.2 Data Preparation (The Most Critical Step)
Fine-tuning quality = data quality, not data quantity.
python
# Data format: ShareGPT format (recommended)
training_data = [
    {
        "conversations": [
            {"from": "human", "value": "User question"},
            # ShareGPT conventionally labels the model's turn "gpt"
            {"from": "gpt", "value": "Expected model response"},
        ]
    },
    # ... more samples
]
Minimum data needed:
- Format/style fine-tuning: 100-500 samples
- Domain knowledge injection: 500-2000 samples
- Complete behavior change: 2000+ samples
Data quality checklist
✅ Is every sample high quality? (Fewer, better samples beat more, weaker ones)
✅ Is the data distribution balanced? (Don't overrepresent one type of question)
✅ Are there any contradictory samples? (Different answers to the same type of question)
✅ Is there any data leakage? (Don't put test-set samples in the training set)
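Parts of this checklist can be automated. The sketch below is one possible approach, not a complete validator: it assumes ShareGPT-style samples with a "human" speaker label, and it only catches exact duplicates, identical questions with different answers, and training questions that also appear in the test set. The function names and the sample data are made up for illustration; near-duplicates and "same type of question" conflicts still need human review.

```python
import json
from collections import defaultdict


def first_turn(sample, speaker):
    """Return the first message from `speaker` in a ShareGPT-style sample, or ""."""
    for turn in sample["conversations"]:
        if turn["from"] == speaker:
            return turn["value"].strip()
    return ""


def check_dataset(train, test, human="human"):
    """Report exact duplicates, contradictory answers, and train/test leakage."""
    seen, duplicates = set(), []
    answers_by_question = defaultdict(set)
    for index, sample in enumerate(train):
        key = json.dumps(sample, sort_keys=True, ensure_ascii=False)
        if key in seen:
            duplicates.append(index)  # index of the repeated sample
        seen.add(key)
        question = first_turn(sample, human).lower()
        # Every turn that is not from the human counts as the model's answer.
        answer = " ".join(
            turn["value"].strip()
            for turn in sample["conversations"]
            if turn["from"] != human
        )
        answers_by_question[question].add(answer)
    contradictions = sorted(q for q, a in answers_by_question.items() if len(a) > 1)
    test_questions = {first_turn(sample, human).lower() for sample in test}
    leaked = sorted(q for q in answers_by_question if q in test_questions)
    return {"duplicates": duplicates, "contradictions": contradictions, "leaked": leaked}


def make_sample(question, answer):
    """Build one ShareGPT-style sample (helper for the made-up data below)."""
    return {"conversations": [{"from": "human", "value": question},
                              {"from": "gpt", "value": answer}]}


# Made-up samples, for illustration only.
train_samples = [
    make_sample("What is the refund window?", "30 days."),
    make_sample("What is the refund window?", "30 days."),  # exact duplicate
    make_sample("What is the refund window?", "14 days."),  # contradicts the first
    make_sample("How do I reset my password?", "Use the reset link."),
]
test_samples = [make_sample("How do I reset my password?", "Use the reset link.")]

report = check_dataset(train_samples, test_samples)
print(report)
```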
2.3 Unsloth Fine-Tuning Code
python
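from unsloth import FastLanguageModel, is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import Dataset

# Load base model (choose the size that fits your needs)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",  # 7B is good for beginners
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization to save memory
)

# Add LoRA adapter
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: higher adds capacity but uses more memory
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
)

# Prepare dataset: render each conversation into the "text" field that
# SFTTrainer reads, using the model's own chat template
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant", "assistant": "assistant"}

def to_text(example):
    messages = [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in example["conversations"]
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = Dataset.from_list(training_data).map(to_text)

# Start training
# Note: newer TRL releases move dataset_text_field and max_seq_length into SFTConfig
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=3,  # Number of epochs; too many can cause overfitting
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),  # fall back to fp16 only without bf16 support
        bf16=is_bfloat16_supported(),
        output_dir="./output",
        save_strategy="epoch",
    ),
)

trainer.train()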
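To see what the batch settings above imply, it can help to work out the effective batch size and the approximate number of optimizer steps. The sketch below is plain arithmetic under stated assumptions: the function name is hypothetical, the 400-sample dataset size is an assumed figure chosen for illustration, and the exact step count reported by a trainer may differ slightly depending on how it handles a final partial batch.

```python
import math


def training_schedule(num_samples, per_device_batch_size, grad_accum_steps,
                      num_epochs, num_devices=1):
    """Return (effective batch size, steps per epoch, total optimizer steps)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * num_epochs


# Batch size 2, gradient accumulation 4, and 3 epochs come from the training
# arguments above; the 400 samples and the single GPU are assumptions.
effective_batch, steps_per_epoch, total_steps = training_schedule(
    num_samples=400,
    per_device_batch_size=2,
    grad_accum_steps=4,
    num_epochs=3,
)
warmup_fraction = 5 / total_steps  # warmup_steps=5 from the training arguments

print(f"effective batch size: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"warmup covers {warmup_fraction:.1%} of training")
```

With a dataset this small, the whole run is only a few hundred optimizer steps, so each additional epoch is a large relative increase in exposure to the same samples.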
3. Evaluating Fine-Tuning Results
3.1 Quantitative Evaluation
python
from evaluate import load

# For generation quality evaluation
# model_outputs and reference_outputs are equal-length lists of strings
rouge = load("rouge")
results = rouge.compute(
    predictions=model_outputs,
    references=reference_outputs,
)
print(results)  # ROUGE-1, ROUGE-2, ROUGE-L scores

# For specific tasks
# Classification accuracy, F1 score, etc.
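For classification-style tasks, accuracy and F1 can be computed directly from the predicted and expected labels. The following is a minimal dependency-free sketch; the function names and the label values are invented for illustration, and an established metrics library would normally be the more robust choice.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the expected label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_label(y_true, y_pred):
    """F1 score for each label, from true/false positive and false negative counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[t] += 1  # expected label was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        denominator = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denominator if denominator else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Made-up labels, for illustration only.
expected = ["refund", "refund", "billing", "billing", "other"]
predicted = ["refund", "billing", "billing", "billing", "other"]

print(f"accuracy: {accuracy(expected, predicted):.3f}")
print(f"macro F1: {macro_f1(expected, predicted):.3f}")
```

Macro F1 weights every label equally, so it exposes weak performance on rare classes that plain accuracy can hide.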
3.2 Qualitative Evaluation (More Important)
Create a human evaluation set (50-100 typical questions) and compare the model's answers before and after fine-tuning.
Score each dimension (accuracy/format compliance/relevance) to see if fine-tuning improved performance.
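One way to tabulate such scores is to average each dimension per model and look at the difference. The sketch below assumes a 1-5 rating scale, a "base" versus "fine_tuned" comparison, and a flat list of rating records; all of these, along with the placeholder scores, are illustrative assumptions rather than real results.

```python
from statistics import mean

DIMENSIONS = ("accuracy", "format_compliance", "relevance")


def summarize(ratings):
    """Average each scoring dimension per model from a list of rating records."""
    summary = {}
    for model_name in sorted({row["model"] for row in ratings}):
        rows = [row for row in ratings if row["model"] == model_name]
        summary[model_name] = {dim: mean(row[dim] for row in rows) for dim in DIMENSIONS}
    return summary


# Placeholder ratings on an assumed 1-5 scale, not real evaluation results.
ratings = [
    {"question_id": 1, "model": "base", "accuracy": 3, "format_compliance": 2, "relevance": 4},
    {"question_id": 1, "model": "fine_tuned", "accuracy": 4, "format_compliance": 5, "relevance": 4},
    {"question_id": 2, "model": "base", "accuracy": 4, "format_compliance": 2, "relevance": 3},
    {"question_id": 2, "model": "fine_tuned", "accuracy": 4, "format_compliance": 4, "relevance": 5},
]

summary = summarize(ratings)
for dim in DIMENSIONS:
    base_score = summary["base"][dim]
    tuned_score = summary["fine_tuned"][dim]
    print(f"{dim}: base={base_score:.2f} fine_tuned={tuned_score:.2f} "
          f"delta={tuned_score - base_score:+.2f}")
```

Keeping the per-question records, rather than only the averages, makes it possible to go back and read the specific answers where the fine-tuned model got worse.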
4. Deploying the Fine-Tuned Model
4.1 Saving and Loading
python
# Save LoRA weights (very small, usually < 100MB)
model.save_pretrained("my-finetuned-model")
tokenizer.save_pretrained("my-finetuned-model")

# Merge weights (optional, for deployment)
model.save_pretrained_merged(
    "merged-model",
    tokenizer,
    save_method="merged_16bit",
)
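The small size of the saved LoRA weights follows from how the adapter is built: for each targeted projection with input width d_in and output width d_out, LoRA adds two low-rank matrices with r × d_in and d_out × r entries. The sketch below counts those parameters for rank 16 on the four attention projections. The layer shapes used here (hidden size 3584, key/value width 512, 28 layers) are assumptions for illustration and should be checked against the base model's config.

```python
def lora_params(in_features, out_features, rank):
    """Parameters added by one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


# Assumed shapes for illustration; verify against the base model's config.
HIDDEN, KV_WIDTH, NUM_LAYERS, RANK = 3584, 512, 28, 16

# (input width, output width) for each targeted projection
module_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_WIDTH),
    "v_proj": (HIDDEN, KV_WIDTH),
    "o_proj": (HIDDEN, HIDDEN),
}

per_layer = sum(lora_params(i, o, RANK) for i, o in module_shapes.values())
total = per_layer * NUM_LAYERS
size_mb_fp32 = total * 4 / 1e6  # 4 bytes per parameter at 32-bit precision

print(f"{total:,} trainable parameters, ~{size_mb_fp32:.0f} MB at 32-bit precision")
```

Because the count scales linearly with the rank, doubling r doubles the adapter size while the frozen base weights stay untouched.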
4.2 Deployment Options
Local Inference (Ollama):
bash
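# Convert to GGUF format (recent llama.cpp checkouts ship convert_hf_to_gguf.py in place of convert.py)
python llama.cpp/convert_hf_to_gguf.py merged-model --outtype f16 --outfile my-model.gguf
# Import into Ollama (the Modelfile must point at the GGUF file, e.g. FROM ./my-model.gguf)
ollama create my-model -f Modelfile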
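The `ollama create` command reads a Modelfile, which at minimum needs a `FROM` line pointing at the GGUF file. The helper below is a small sketch that writes one; the function name, the `./my-model.gguf` path, and the optional system prompt are hypothetical placeholders to adapt to your own files.

```python
from pathlib import Path


def write_modelfile(gguf_path, dest="Modelfile", system_prompt=None):
    """Write a minimal Ollama Modelfile and return its text."""
    lines = [f"FROM {gguf_path}"]
    if system_prompt:
        # Optional default system prompt baked into the Ollama model
        lines.append(f'SYSTEM """{system_prompt}"""')
    text = "\n".join(lines) + "\n"
    Path(dest).write_text(text, encoding="utf-8")
    return text


# Hypothetical path: point this at the GGUF file produced by the conversion step.
print(write_modelfile("./my-model.gguf"))
```

After the file exists, `ollama create my-model -f Modelfile` registers the model under the name `my-model`.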
Cloud API (Together AI / Fireworks AI): Both platforms support uploading custom models and provide OpenAI-compatible APIs, suitable for production deployment.
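Because these APIs are OpenAI-compatible, a request is an HTTP POST to a `/chat/completions` endpoint with a bearer token. The sketch below builds such a request with the standard library but does not send it. The base URL, API key, and model ID are placeholders, not real provider values; take the actual ones from your provider's documentation.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, user_message):
    """Build (but do not send) a POST request to an OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace with the provider's base URL, your key, and your model ID.
request = build_chat_request(
    "https://api.example.com/v1", "YOUR_API_KEY", "my-finetuned-model", "Hello"
)
print(request.get_method(), request.full_url)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)["choices"][0]["message"]["content"]
```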