Fine-Tuning LLMs with LoRA and QLoRA: Complete Practical Guide 2025

Hugging Face PEFT, dataset preparation, training and deployment on consumer hardware

高级约 40 分钟

Fine-Tuning LLMs with LoRA and QLoRA: Complete Practical Guide 2025

Hugging Face PEFT, dataset preparation, training and deployment on consumer hardware

Step-by-step guide to fine-tuning large language models using LoRA and QLoRA with Hugging Face PEFT library, covering dataset preparation, training, evaluation, and deployment.

fine-tuningLoRAQLoRALLMPEFTHugging Face

Fine-tuning LLMs is now accessible with consumer hardware thanks to LoRA and QLoRA. LoRA injects trainable rank decomposition matrices (r=16 typically) into frozen base model weights, reducing trainable parameters by 99%+. QLoRA adds 4-bit quantization via bitsandbytes, enabling 7B model fine-tuning on a single RTX 3090. Setup: load model with BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4"), apply prepare_model_for_kbit_training(), then add LoraConfig targeting attention projection layers. Format training data with proper chat template. Use SFTTrainer from trl with gradient_accumulation_steps. After training, merge LoRA weights with base model using merge_and_unload(). Hardware requirements: QLoRA reduces 7B from 160GB to 12GB VRAM.

Getting Started

Learn how to get started with this application.

Learn more

Installation Guide

Fine-Tuning LLMs with LoRA and QLoRA: Complete Practical Guide 2025

Documentation

Getting Started

Learn more