Fine-Tuning Large Models (PEFT)
- PEFT (Parameter-Efficient Fine-Tuning)
- I. Core PEFT Methods
  - 1. LoRA (Low-Rank Adaptation)
  - 2. Adapter
  - 3. Prefix Tuning
  - 4. Prompt Tuning
  - 5. QLoRA (Quantized LoRA)
- II. PEFT vs. Full-Parameter Fine-Tuning
- III. Example Code for Fine-Tuning a Large Model
- IV. Loading a Fine-Tuned Model
  - 1. LoRA
  - 2. Prefix Tuning
Overview of Large-Model Fine-Tuning Methods
PEFT (Parameter-Efficient Fine-Tuning)
PEFT (parameter-efficient fine-tuning) is a family of techniques for drastically reducing the cost of fine-tuning large models. The core idea is to fine-tune only a small number of parameters rather than the entire model. A systematic breakdown follows.
I. Core PEFT Methods
1. LoRA (Low-Rank Adaptation)
- Principle:
  - Add a low-rank update alongside the original weights (W = W₀ + BA) and train only B and A, keeping W₀ frozen (a from-scratch sketch follows the code example below).
- Typical use cases: text generation, dialogue systems
- Code example (the rank r is usually 4~64, cutting the trainable parameter count by 90%+):
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,                                  # rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # modules to adapt
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, config)     # original model + LoRA
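For intuition, here is a minimal from-scratch sketch of the same idea. It is illustrative only, not the peft implementation; the layer sizes, zero/random initialization, and alpha/r scaling follow common LoRA conventions rather than anything prescribed in this document.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base weight W0 plus a trainable low-rank update B @ A."""
    def __init__(self, in_features, out_features, r=8, lora_alpha=32):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                     # W0 stays frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # low-rank factor A
        self.B = nn.Parameter(torch.zeros(out_features, r))        # low-rank factor B, zero-initialized
        self.scaling = lora_alpha / r

    def forward(self, x):
        # y = x W0ᵀ + scaling · x (BA)ᵀ, i.e. effectively W = W0 + BA
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
Because B starts at zero, the adapted layer initially behaves exactly like the frozen layer, and only r·(in_features + out_features) new parameters are trained.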
2. Adapter
- Principle:
  - Insert small bottleneck feed-forward networks between Transformer sub-layers and train only these adapter layers.
  - Adapters account for roughly 0.5%~5% of the total parameters.
- Typical use case: multi-task learning
- Structure example (a code sketch follows):
Transformer Layer → Adapter (Down → ReLU → Up) → Residual → LayerNorm
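A minimal PyTorch sketch of such a bottleneck adapter; the bottleneck size of 64 is an illustrative choice, not a value taken from this document.
import torch.nn as nn

class Adapter(nn.Module):
    """Down-projection → ReLU → up-projection, wrapped in a residual connection."""
    def __init__(self, hidden_size, bottleneck_size=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states):
        # Only these two small projections are trained; the surrounding Transformer stays frozen.
        return hidden_states + self.up(self.act(self.down(hidden_states)))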
3. Prefix Tuning
- Principle:
  - Prepend learnable "virtual tokens" (a prefix) to the input to steer the model's generation.
  - The original model parameters are not modified at all.
- Typical use case: generation tasks (e.g., GPT)
- Code example (a peft-based version follows):
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pretrained model and tokenizer
model_name = "gpt2"  # replace with the model you want to use
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Define the Prefix Tuning module
class PrefixTuning(nn.Module):
    def __init__(self, num_virtual_tokens, hidden_size):
        super(PrefixTuning, self).__init__()
        self.prefix_embeddings = nn.Embedding(num_virtual_tokens, hidden_size)
        nn.init.normal_(self.prefix_embeddings.weight, mean=0, std=0.02)

    def forward(self, input_ids, attention_mask):
        # This simplified module only reserves prefix positions (filled with pad tokens) and extends the
        # attention mask; a full implementation would feed self.prefix_embeddings into the model itself.
        batch_size = input_ids.shape[0]
        prefix = self.prefix_embeddings.weight.unsqueeze(0).repeat(batch_size, 1, 1)
        new_input_ids = torch.cat(
            [torch.full((batch_size, prefix.shape[1]), tokenizer.pad_token_id).to(input_ids.device), input_ids],
            dim=1)
        new_attention_mask = torch.cat(
            [torch.ones((batch_size, prefix.shape[1])).to(attention_mask.device), attention_mask],
            dim=1)
        return new_input_ids, new_attention_mask
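The hand-written module above only illustrates the idea. The peft library ships a ready-made Prefix Tuning implementation that injects the learned prefix into each attention layer's keys and values; a minimal sketch, with an illustrative number of virtual tokens:
from transformers import AutoModelForCausalLM
from peft import PrefixTuningConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
peft_config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,  # length of the learned prefix
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the prefix parameters are trainable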
4. Prompt Tuning
- Principle:
  - Add learnable prompt tokens at the input layer only.
  - The original parameters are not modified. It is a simplified form of Prefix Tuning (no MLP reparameterization), and as model scale grows its performance approaches full fine-tuning.
- Structure example (a soft-prompt version using peft follows):
prompt = "请回答以下问题:"  # "Please answer the following question:"
prompt_ids = tokenizer.encode(prompt, return_tensors="pt").to(input_ids.device)
new_input_ids = torch.cat([prompt_ids.repeat(batch_size, 1), input_ids], dim=1)
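The snippet above prepends a hard text prompt to show where the prompt sits; actual prompt tuning learns continuous (soft) prompt embeddings instead. A minimal sketch using the peft library; the initialization text and token count are illustrative:
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,   # initialize the soft prompt from real text
    prompt_tuning_init_text="Answer the following question:",
    num_virtual_tokens=8,
    tokenizer_name_or_path="gpt2",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the prompt embeddings are trainable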
5. QLoRA (Quantized LoRA)
- Principle: quantize the base model to 4-bit and fine-tune it with LoRA, cutting GPU memory requirements by roughly 70%.
- Code example (a fuller recipe follows):
import torch
from transformers import AutoModel, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModel.from_pretrained("Llama-3-8B", quantization_config=bnb_config)
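A slightly fuller QLoRA sketch that combines 4-bit loading with a LoRA adapter; the model id and hyperparameters are placeholders, not values from this document:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization used by QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B",
                                             quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)  # prepare the quantized model for training
lora_config = LoraConfig(r=8, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()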
II. PEFT vs. Full-Parameter Fine-Tuning
| Metric | PEFT | Full-parameter fine-tuning |
| --- | --- | --- |
| GPU memory usage | Very low (a 70B model can be fine-tuned on a single GPU) | Very high (multiple GPUs required) |
| Training speed | Fast (only a small number of parameters are updated) | Slow |
| Quality | Close to full fine-tuning | Best, but the gap is usually <5% |
| Deployment convenience | Adapter must be merged | Deploy directly |
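On the deployment row: a LoRA adapter can be merged back into the base weights so the result deploys like an ordinary model. A minimal sketch; the paths are placeholders:
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("base_model_path")       # placeholder path
model = PeftModel.from_pretrained(base_model, "fine_tuned_adapter_path")   # placeholder path
merged = model.merge_and_unload()        # fold the LoRA deltas into the base weights
merged.save_pretrained("merged_model")   # now deployable without the peft dependency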
III. Example Code for Fine-Tuning a Large Model
Note: when model.save_pretrained("fine_tuned_internvl3") is called on a PEFT model (e.g., after LoRA or another adapter method), the saved checkpoint normally contains only the trainable adapter weights, not the base model's original weights.
import math
import os
import pandas as pd
import numpy as np
import torch
import torchvision.transforms as T
from decord import VideoReader, cpu
from PIL import Image
from torchvision.transforms.functional import InterpolationMode
from transformers import AutoModel, AutoTokenizer, AutoConfig, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model

# Load the model (split_model is a device-map helper assumed to be defined elsewhere)
path = 'InternVL3'
device_map = split_model(path)
model = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, load_in_8bit=True, low_cpu_mem_usage=True,
    use_flash_attn=True, trust_remote_code=True, device_map=device_map).eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# Configure LoRA
lora_config = LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1, bias="none", task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Load the dataset (CustomDataset is assumed to be defined elsewhere; see the sketch below)
data_path = 'data'
df = pd.read_parquet(data_path)
dataset = CustomDataset(df, tokenizer)

# Training arguments
training_args = TrainingArguments(
    output_dir='./results', num_train_epochs=3, per_device_train_batch_size=4,
    gradient_accumulation_steps=4, save_steps=10_000, save_total_limit=2,
    evaluation_strategy="no", logging_steps=10, fp16=True,
)

# Create the Trainer and train
trainer = Trainer(model=model, args=training_args, train_dataset=dataset)
trainer.train()

# Save the fine-tuned adapter weights (base weights are not included)
model.save_pretrained("fine_tuned_internvl3")
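The script above references a CustomDataset that is not shown. The sketch below is a hypothetical, text-only stand-in so the example is self-contained: the column names "prompt" and "response", the max length, and the label masking are all assumptions, and a real InternVL3 dataset would additionally prepare image tensors.
import torch
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    """Hypothetical dataset: each DataFrame row holds a 'prompt' and a 'response' string."""
    def __init__(self, df, tokenizer, max_length=512):
        self.df = df
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        enc = self.tokenizer(row["prompt"] + row["response"], truncation=True,
                             max_length=self.max_length, padding="max_length",
                             return_tensors="pt")
        input_ids = enc["input_ids"].squeeze(0)
        attention_mask = enc["attention_mask"].squeeze(0)
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100  # ignore padding positions in the loss
        return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}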
IV. Loading a Fine-Tuned Model
1. LoRA
- Example code (a usage sketch follows):
from transformers import AutoModel
from peft import PeftModel

# Load the base pretrained model
base_model_path = "base_model_path"        # replace with the path of the base pretrained model
base_model = AutoModel.from_pretrained(base_model_path)

# Load the fine-tuned adapter on top of it
adapter_path = "fine_tuned_adapter_path"   # replace with the path where the adapter was saved
model = PeftModel.from_pretrained(base_model, adapter_path)
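If the base model is a causal language model, the adapted model can be used for generation directly. A hypothetical usage sketch; the paths and prompt are placeholders:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("base_model_path")       # placeholder path
model = PeftModel.from_pretrained(base_model, "fine_tuned_adapter_path")   # placeholder path
tokenizer = AutoTokenizer.from_pretrained("base_model_path")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))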
2. Prefix Tuning
- Example code:
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the Prefix Tuning module (same structure as used during training)
class PrefixTuning(nn.Module):
    def __init__(self, num_virtual_tokens, hidden_size):
        super(PrefixTuning, self).__init__()
        self.prefix_embeddings = nn.Embedding(num_virtual_tokens, hidden_size)

    def forward(self, input_ids, attention_mask):
        batch_size = input_ids.shape[0]
        prefix = self.prefix_embeddings.weight.unsqueeze(0).repeat(batch_size, 1, 1)
        new_input_ids = torch.cat(
            [torch.full((batch_size, prefix.shape[1]), tokenizer.pad_token_id).to(input_ids.device), input_ids],
            dim=1)
        new_attention_mask = torch.cat(
            [torch.ones((batch_size, prefix.shape[1])).to(attention_mask.device), attention_mask],
            dim=1)
        return new_input_ids, new_attention_mask

# Load the base pretrained model and tokenizer
model_name = "gpt2"  # replace with the actual model name
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Initialize the Prefix Tuning module
num_virtual_tokens = 10  # replace with the actual number of virtual tokens
hidden_size = model.config.hidden_size
prefix_tuning = PrefixTuning(num_virtual_tokens, hidden_size)

# Load the trained Prefix Tuning parameters
try:
    prefix_tuning.load_state_dict(torch.load("path/to/prefix_tuning_weights.pth"))
except FileNotFoundError:
    print("Error: Prefix Tuning weight file not found; please check the path.")
    exit(1)

# Move the model and the Prefix Tuning module to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
prefix_tuning.to(device)

# Input text
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text, return_tensors='pt').to(device)
attention_mask = torch.ones_like(input_ids).to(device)

# Preprocess the input with Prefix Tuning
new_input_ids, new_attention_mask = prefix_tuning(input_ids, attention_mask)

# Run inference
with torch.no_grad():
    outputs = model(new_input_ids, attention_mask=new_attention_mask)
    logits = outputs.logits

# Decode the per-position greedy predictions into text
generated_ids = torch.argmax(logits, dim=-1)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)