Picking up where we left off: in the previous posts we walked through the full LLM inference pipeline by reading the source code, from the input prompt, through chat-template wrapping, token-ID conversion, and embedding lookup; through the model's forward pass and the extraction of the last token's tensor from the output; and finally through post-processing, including repeated Softmax computations, top-p sampling, and decoding back into text, with an analysis of the model architecture woven in along the way. With that, we have a reasonably clear picture of how an LLM works, so now we move on to the next topic: exploring the inference flow of an LLM with an external LoRA attached.
Speaking of LoRA, most readers are familiar with it and can even recite its basic idea and implementation approach: it lowers training cost because it reduces the number of trainable parameters; the parameters can be reduced because the weight matrices contain redundant vectors, so low-rank matrices can stand in for them; and it can be attached to designated pretrained weight layers without changing the original model's weights. But how exactly is it attached? Which weight layers can it be attached to? How is each original weight matched up with its corresponding LoRA adapter? Before writing this post I honestly had no clear idea: it was all armchair theory, and I would have fallen apart the moment I had to do it for real.
To understand how LoRA is actually implemented in code, rather than just knowing its principle and advantages, we will once again build on the MiniMind project and explore the concrete implementation from the code.
1. LoRA
Since this post is about LoRA, let's start with some armchair theory and share my own modest understanding of how it works; if anything is inaccurate, inappropriate, or simply wrong, corrections are very welcome.
First, a model's pretrained weights are really just matrices of vectors made up of scalars, and LoRA is no different: it is also a matrix. Talking about matrices inevitably brings in linear algebra. In college linear algebra we learned the term "maximal linearly independent set": the vectors inside a matrix are not necessarily independent of one another, and some of them may be expressible as linear combinations of a few others. If we can find a set of vectors in the matrix such that none of them can be expressed in terms of the others, that set is a maximal linearly independent set of the matrix, and the number of vectors in it is the matrix's rank; conversely, that set can represent the whole matrix. This is the core idea behind LoRA, and a tiny illustration follows below.
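To make the rank idea concrete, here is a tiny illustrative sketch (not from the MiniMind code) that builds a matrix out of only two independent vectors and checks its rank with NumPy:
import numpy as np

# Illustrative only: an 8x8 matrix whose rows are all linear combinations
# of just 2 independent vectors, so its rank is at most 2 even though it
# "looks like" it holds 64 independent numbers.
basis = np.random.randn(2, 8)    # 2 independent row vectors
coeffs = np.random.randn(8, 2)   # mixing coefficients for each of the 8 rows
M = coeffs @ basis               # shape (8, 8)

print(M.shape)                   # (8, 8)
print(np.linalg.matrix_rank(M))  # 2 -- the size of a maximal linearly independent set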
Now back to LoRA itself. LoRA consists of two low-rank matrices, A and B, that act as a low-rank stand-in attached to the original high-dimensional weight matrix, together with a hyperparameter called rank, which sets the smaller dimension shared by A and B, much like the rank of the original weight matrix. The reasoning is the same as above: for an original weight matrix of shape (k, k), the vectors (information) inside it are very likely redundant, so there is a maximal linearly independent set of some size n (n < k) that can stand in for it. Since the layer's input and output dimensions must stay unchanged, the substitute can be written as the product of an (n, k) matrix and a (k, n) matrix, and the trainable parameters for that layer drop from k² to 2·k·n; for 2·k·n to be smaller than k², we only need n < k/2. For typical weight layers the dimensions run into the thousands, while rank is usually set to a small value such as 8 or 16, far below half of k, so the number of trainable parameters drops dramatically; and by the maximal-linearly-independent-set intuition above, the low-rank substitute does not deviate too much from the original weights.
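As a quick sanity check on the arithmetic, here is a small sketch using the numbers we will actually meet later in MiniMind (a 512×512 attention weight layer and the default rank of 16):
k, n = 512, 16                    # k: size of a square weight layer, n: the LoRA rank

full_params = k * k               # trainable parameters if we tuned the whole matrix
lora_params = 2 * k * n           # parameters of the (n, k) A matrix plus the (k, n) B matrix

print(full_params)                # 262144
print(lora_params)                # 16384
print(lora_params / full_params)  # 0.0625 -> roughly 6% of the original layer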
2. Implementation
Above we covered the principle behind LoRA and the simple math underneath it. Now, using the MiniMind project as the basis, let's explore its concrete implementation through the code. Setting up the environment and downloading the pretrained model weights are left to the reader and will not be shown here.
2.1 Code Walkthrough
Let's first take a quick look at how the code is organized. In eval_model.py, LoRA shows up inside the init_model(args) function, where two functions are called: apply_lora() and load_lora(), which add the LoRA layers and load the LoRA weights respectively. Let's look at each of them in turn.
# The init_model() function from eval_model.py
def init_model(args):
    tokenizer = AutoTokenizer.from_pretrained('./model/minimind_tokenizer')
    if args.load == 0:
        moe_path = '_moe' if args.use_moe else ''
        modes = {0: 'pretrain', 1: 'full_sft', 2: 'rlhf', 3: 'reason'}
        # ckp = f'./{args.out_dir}/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'
        ckp = f'../Weights/MiniMind2-PyTorch/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'

        model = MiniMindLM(LMConfig(
            dim=args.dim,
            n_layers=args.n_layers,
            max_seq_len=args.max_seq_len,
            use_moe=args.use_moe
        ))

        state_dict = torch.load(ckp, map_location=args.device)
        model.load_state_dict({k: v for k, v in state_dict.items() if 'mask' not in k}, strict=True)

        if args.lora_name != 'None':
            apply_lora(model)
            load_lora(model, f'./{args.out_dir}/lora/{args.lora_name}_{args.dim}.pth')
    else:
        transformers_model_path = '../Weights/MiniMind2'
        tokenizer = AutoTokenizer.from_pretrained(transformers_model_path)
        model = AutoModelForCausalLM.from_pretrained(transformers_model_path, trust_remote_code=True)

    print(f'MiniMind模型参数量: {sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6:.2f}M(illion)')
    return model.eval().to(args.device), tokenizer
2.1.1 apply_lora()
# model/model_lora.py
def apply_lora(model, rank=16):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and module.weight.shape[0] == module.weight.shape[1]:
            lora = LoRA(module.weight.shape[0], module.weight.shape[1], rank=rank).to(model.device)
            setattr(module, "lora", lora)
            original_forward = module.forward

            # Explicit binding: default arguments capture this layer's own forward and LoRA branch
            def forward_with_lora(x, layer1=original_forward, layer2=lora):
                return layer1(x) + layer2(x)

            module.forward = forward_with_lora
The purpose of this API is to add the LoRA structure into the model object. The code is straightforward; the function receives the model object and then:
1) It iterates over all modules in the model (including submodules) via named_modules(), obtaining each module's name and the module object itself;
2) For every layer that is an nn.Linear and whose weight matrix is square, a LoRA layer is attached;
3) Next comes the definition of the LoRA layer itself, which we cover in detail in 2.1.2;
4) setattr() adds the lora module to that layer as a new attribute;
5) The layer's original forward is saved as original_forward;
6) forward_with_lora() runs the original forward and the newly attached LoRA layer in parallel and sums their outputs, as sketched below.
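Before moving on, here is a minimal standalone sketch (independent of MiniMind) of the same forward-patching trick, showing why the default arguments in forward_with_lora matter: they bind each layer's own original forward and its own branch at definition time. The zero-initialized extra branch stands in for a freshly created LoRA layer:
import torch
import torch.nn as nn

layer = nn.Linear(512, 512, bias=False)   # stands in for a square weight layer
extra = nn.Linear(512, 512, bias=False)   # stands in for the LoRA branch
nn.init.zeros_(extra.weight)              # zero branch, so the output is unchanged at first

original_forward = layer.forward

# Default arguments are evaluated once, at definition time, so each patched layer
# keeps a reference to its own original forward and its own extra branch.
def forward_with_extra(x, layer1=original_forward, layer2=extra):
    return layer1(x) + layer2(x)

layer.forward = forward_with_extra        # layer(x) now runs both branches and sums them

x = torch.randn(2, 512)
print(torch.allclose(layer(x), original_forward(x)))  # True, because the extra branch is all zeros
Without that default-argument binding, every layer patched inside the apply_lora() loop would close over the loop variables and end up calling whichever original_forward and lora happened to be assigned last.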
2.1.2 class LoRA(nn.Module)
# model/model_lora.py
# Definition of the LoRA network structure
class LoRA(nn.Module):
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.rank = rank  # the LoRA rank, which controls the size of the low-rank matrices
        self.A = nn.Linear(in_features, rank, bias=False)   # low-rank matrix A
        self.B = nn.Linear(rank, out_features, bias=False)  # low-rank matrix B
        # Gaussian initialization for A
        self.A.weight.data.normal_(mean=0.0, std=0.02)
        # Zero initialization for B
        self.B.weight.data.zero_()

    def forward(self, x):
        return self.B(self.A(x))
This is a standard LoRA layer defined with PyTorch. First, the sublayers it contains:
1) Two fully connected layers, A and B, are defined;
2) The rank size is specified; this is the rank hyperparameter from Section 1;
3) A and B are then initialized differently: A with a Gaussian, B with all zeros.
Next, the LoRA layer's inference flow:
1) In forward(), the input tensor first passes through matrix A, the result is then fed into matrix B, and the output of the LoRA layer is returned; a quick check follows below.
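A quick check of the class above: because B starts at zero, a freshly created LoRA block maps any input to zeros, which is exactly why attaching it does not change the base model's behavior before training.
import torch

# Quick check: B is zero-initialized, so B(A(x)) is all zeros and the
# freshly created LoRA block leaves the base layer's output untouched.
lora = LoRA(in_features=512, out_features=512, rank=16)
x = torch.randn(4, 512)
out = lora(x)

print(out.shape)                  # torch.Size([4, 512])
print(torch.count_nonzero(out))   # tensor(0)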
2.1.3 load_lora()
# model/model_lora.py
def load_lora(model, path):
    state_dict = torch.load(path, map_location=model.device)
    for name, module in model.named_modules():
        if hasattr(module, 'lora'):
            lora_state = {k.replace(f'{name}.lora.', ''): v for k, v in state_dict.items() if f'{name}.lora.' in k}
            module.lora.load_state_dict(lora_state)
The purpose of this API is to read a trained LoRA weight file and load it onto the corresponding layers of the model. The steps are simple as well; the function receives the LoRA-augmented model object and the path to a local LoRA weight file:
1) torch.load() loads the weight file into memory; state_dict is a plain dictionary mapping parameter names to tensors;
2) The model is traversed to obtain each layer's name and module object;
3) hasattr() checks whether a submodule has a lora attribute;
4) For submodules that do, the matching weight tensors are looked up in state_dict by the layer's name;
5) load_state_dict() copies those tensors into the lora layer.
With that, we obtain a model object that both contains the LoRA structure and has the LoRA weights loaded; the key matching in step 4 is illustrated right below.
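To make the key matching in step 4 concrete, here is a tiny illustrative example; the key names follow the module naming we will see later, but the values are just placeholder strings instead of real tensors:
# Illustrative only: what the key stripping inside load_lora() does for one module.
state_dict = {
    'layers.0.attention.wq.lora.A.weight': '<tensor of shape (16, 512)>',
    'layers.0.attention.wq.lora.B.weight': '<tensor of shape (512, 16)>',
    'layers.0.attention.wo.lora.A.weight': '<tensor of shape (16, 512)>',
}

name = 'layers.0.attention.wq'    # the module currently visited by named_modules()
lora_state = {k.replace(f'{name}.lora.', ''): v
              for k, v in state_dict.items() if f'{name}.lora.' in k}

print(lora_state)
# {'A.weight': '<tensor of shape (16, 512)>', 'B.weight': '<tensor of shape (512, 16)>'}
# These relative keys are exactly what module.lora.load_state_dict() expects.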
2.2 Hands-on Walkthrough
As before, create a Jupyter Notebook file in the same directory as eval_model.py, and let's go through the LoRA-attachment process ourselves.
2.2.1 Imports
Code:
import argparse
import random
import time
import numpy as np
import torch
import warnings
from transformers import AutoTokenizer, AutoModelForCausalLM
from model.model import MiniMindLM
from model.LMConfig import LMConfig
from model.model_lora import *

warnings.filterwarnings('ignore')
2.2.2 Define the Hyperparameters
Code:
class ARG():
    def __init__(self):
        self.lora_name = 'None'
        self.out_dir = 'out'
        self.temperature = 0.85
        self.top_p = 0.85
        self.device = 'cpu'
        self.dim = 512
        self.n_layers = 8
        self.max_seq_len = 8192
        self.use_moe = False
        self.history_cnt = 0
        self.stream = True
        self.load = 0
        self.model_mode = 1
2.2.3 Define the Model-Initialization API
Code:
def init_model(args):
    if args.load == 0:
        moe_path = '_moe' if args.use_moe else ''
        modes = {0: 'pretrain', 1: 'full_sft', 2: 'rlhf', 3: 'reason'}
        # ckp = f'./{args.out_dir}/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'
        ckp = f'../Weights/MiniMind2-PyTorch/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'

        model = MiniMindLM(LMConfig(
            dim=args.dim,
            n_layers=args.n_layers,
            max_seq_len=args.max_seq_len,
            use_moe=args.use_moe
        ))

        state_dict = torch.load(ckp, map_location=args.device)
        model.load_state_dict({k: v for k, v in state_dict.items() if 'mask' not in k}, strict=True)
    else:
        transformers_model_path = '../Weights/MiniMind2'
        tokenizer = AutoTokenizer.from_pretrained(transformers_model_path)
        model = AutoModelForCausalLM.from_pretrained(transformers_model_path, trust_remote_code=True)

    print(f'MiniMind模型参数量: {sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6:.2f}M(illion)')
    return model
2.2.4 Initialize the Hyperparameters and the Model
From the traversal of the model's submodules below, we can clearly see the model's structure, and that the original model object contains no LoRA layers.
Code:
args = ARG()
model = init_model(args)
# Iterate over all modules in the model (including submodules) and get their names and module objects
for name, module in model.named_modules():
    print(f"Module Name: {name}")
    print(f"Module Type: {type(module)}")
    print(hasattr(module, 'lora'))
    if hasattr(module, 'weight') and module.weight is not None:
        print(f" Weight Shape: {module.weight.shape}")
    if hasattr(module, 'bias') and module.bias is not None:
        print(f" Bias Shape: {module.bias.shape}")
    print("-" * 40)
Output:
MiniMind模型参数量: 25.83M(illion)
Module Name:
Module Type: <class 'model.model.MiniMindLM'>
False
----------------------------------------
Module Name: tok_embeddings
Module Type: <class 'torch.nn.modules.sparse.Embedding'>
False
 Weight Shape: torch.Size([6400, 512])
----------------------------------------
Module Name: dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers
Module Type: <class 'torch.nn.modules.container.ModuleList'>
False
----------------------------------------
Module Name: layers.0
Module Type: <class 'model.model.MiniMindBlock'>
False
----------------------------------------
Module Name: layers.0.attention
Module Type: <class 'model.model.Attention'>
False
----------------------------------------
Module Name: layers.0.attention.wq
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.wk
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wv
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wo
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.attn_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention.resid_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.ffn_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.feed_forward
Module Type: <class 'model.model.FeedForward'>
False
----------------------------------------
Module Name: layers.0.feed_forward.w1
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.w2
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 1408])
----------------------------------------
Module Name: layers.0.feed_forward.w3
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.1
Module Type: <class 'model.model.MiniMindBlock'>
False
----------------------------------------
Module Name: layers.1.attention
Module Type: <class 'model.model.Attention'>
False
----------------------------------------
Module Name: layers.1.attention.wq
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.1.attention.wk
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.1.attention.wv
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.1.attention.wo
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.1.attention.attn_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.1.attention.resid_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.1.attention_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.1.ffn_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.1.feed_forward
Module Type: <class 'model.model.FeedForward'>
False
----------------------------------------
Module Name: layers.1.feed_forward.w1
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.1.feed_forward.w2
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 1408])
----------------------------------------
Module Name: layers.1.feed_forward.w3
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.1.feed_forward.dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
.....
2.2.5 Define the LoRA Layer
Code:
# Definition of the LoRA network structure
class LoRA(nn.Module):
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.rank = rank  # the LoRA rank, which controls the size of the low-rank matrices
        self.A = nn.Linear(in_features, rank, bias=False)   # low-rank matrix A
        self.B = nn.Linear(rank, out_features, bias=False)  # low-rank matrix B
        # Gaussian initialization for A
        self.A.weight.data.normal_(mean=0.0, std=0.02)
        # Zero initialization for B
        self.B.weight.data.zero_()

    def forward(self, x):
        return self.B(self.A(x))
2.2.6 Define the LoRA-Attachment API
Code:
def apply_lora(model, rank=16):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and module.weight.shape[0] == module.weight.shape[1]:
            lora = LoRA(module.weight.shape[0], module.weight.shape[1], rank=rank).to(model.device)
            setattr(module, "lora", lora)
            original_forward = module.forward

            # Explicit binding: default arguments capture this layer's own forward and LoRA branch
            def forward_with_lora(x, layer1=original_forward, layer2=lora):
                return layer1(x) + layer2(x)

            module.forward = forward_with_lora
2.2.7 Attach the LoRA Layers
After attaching the LoRA layers and traversing the model's submodules again, we can see that LoRA layers were added to the attention Query (wq) and output-projection (wo) layers, because those are nn.Linear modules whose weights are square 512×512 matrices.
Code:
apply_lora(model)
# Iterate over all modules in the model (including submodules) and get their names and module objects
for name, module in model.named_modules():
    print(f"Module Name: {name}")
    print(f"Module Type: {type(module)}")
    print(hasattr(module, 'lora'))
    if hasattr(module, 'weight') and module.weight is not None:
        print(f" Weight Shape: {module.weight.shape}")
    if hasattr(module, 'bias') and module.bias is not None:
        print(f" Bias Shape: {module.bias.shape}")
    print("-" * 40)
Output:
Module Name:
Module Type: <class 'model.model.MiniMindLM'>
False
----------------------------------------
Module Name: tok_embeddings
Module Type: <class 'torch.nn.modules.sparse.Embedding'>
False
 Weight Shape: torch.Size([6400, 512])
----------------------------------------
Module Name: dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers
Module Type: <class 'torch.nn.modules.container.ModuleList'>
False
----------------------------------------
Module Name: layers.0
Module Type: <class 'model.model.MiniMindBlock'>
False
----------------------------------------
Module Name: layers.0.attention
Module Type: <class 'model.model.Attention'>
False
----------------------------------------
Module Name: layers.0.attention.wq
Module Type: <class 'torch.nn.modules.linear.Linear'>
True
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.wq.lora
Module Type: <class '__main__.LoRA'>
False
----------------------------------------
Module Name: layers.0.attention.wq.lora.A
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([16, 512])
----------------------------------------
Module Name: layers.0.attention.wq.lora.B
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 16])
----------------------------------------
Module Name: layers.0.attention.wk
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wv
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wo
Module Type: <class 'torch.nn.modules.linear.Linear'>
True
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.wo.lora
Module Type: <class '__main__.LoRA'>
False
----------------------------------------
Module Name: layers.0.attention.wo.lora.A
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([16, 512])
----------------------------------------
Module Name: layers.0.attention.wo.lora.B
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 16])
----------------------------------------
Module Name: layers.0.attention.attn_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention.resid_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.ffn_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.feed_forward
Module Type: <class 'model.model.FeedForward'>
False
----------------------------------------
Module Name: layers.0.feed_forward.w1
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.w2
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 1408])
----------------------------------------
Module Name: layers.0.feed_forward.w3
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
......
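As an optional check at this point (not part of eval_model.py), we can count how many parameters the LoRA layers added relative to the whole model; the LoRA parameters appear under names containing 'lora' because setattr() registered each LoRA block as a submodule of its nn.Linear layer:
# Optional check: how many parameters did apply_lora() add?
total_params = sum(p.numel() for p in model.parameters())
lora_params = sum(p.numel() for n, p in model.named_parameters() if 'lora' in n)

print(f'total: {total_params}, lora: {lora_params}, ratio: {lora_params / total_params:.4%}')
# Each block contributes 2 patched layers (wq, wo) * (2 * 512 * 16) = 32768 LoRA parameters.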
2.2.8 Load the LoRA Weights
Since I do not have a trained, matching LoRA weight file locally, this step is left for later; at this point the LoRA layers in the model still hold their initialization values. A small check we can still run is shown below.
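Even without a trained checkpoint, we can confirm that the untrained LoRA branches currently contribute nothing, since every B matrix starts at zero; a small check one could run, looking at just the first patched layer:
import torch

# Because every B matrix is zero-initialized, each untrained LoRA branch currently
# maps any input to zeros, so the patched forward still reproduces the base model.
for name, module in model.named_modules():
    if hasattr(module, 'lora'):
        x = torch.randn(1, module.weight.shape[1])               # input sized to this layer
        print(name, torch.count_nonzero(module.lora(x)).item())  # prints 0
        break                                                    # one layer is enough to illustrate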
3. Summary
At this point we have obtained a model object with LoRA layers attached. In this post we reviewed what LoRA is, how it works, and the math behind it, and in code we defined the LoRA layer, added a lora attribute to the selected layers, and redefined those layers' forward pass via explicit binding. Both in theory and in practice, our understanding of LoRA has deepened. If this post helped you, please give it a like; if you want to keep following this series and learn more hands-on LLM practice, consider following me. I will keep learning and updating this series. Thanks!