Picking up where we left off: in the previous posts we walked through the full LLM inference pipeline by reading the source code, from the input prompt, through chat-template wrapping, token-ID conversion, and embedding lookup; through the model's forward pass and the extraction of the last token's tensor from the output; and finally through post-processing, including repeated Softmax computations, top-p sampling, and decoding back into text, with an analysis of the model architecture woven in along the way. With that, we have a reasonably clear picture of how an LLM works, so now we move on to the next topic: exploring the inference flow of an LLM with an external LoRA attached.
Speaking of LoRA, most readers are familiar with it and can even recite its basic idea and implementation approach: it lowers training cost because it reduces the number of trainable parameters; the parameters can be reduced because the weight matrices contain redundant vectors, so low-rank matrices can stand in for them; and it can be attached to designated pretrained weight layers without changing the original model's weights. But how exactly is it attached? Which weight layers can it be attached to? How is each original weight matched up with its corresponding LoRA adapter? Before writing this post I honestly had no clear idea: it was all armchair theory, and I would have fallen apart the moment I had to do it for real.
To understand how LoRA is actually implemented in code, rather than just knowing its principle and advantages, we will once again build on the MiniMind project and explore the concrete implementation from the code.
1. LoRA
Since this post is about LoRA, let's start with some armchair theory and share my own modest understanding of how it works; if anything is inaccurate, inappropriate, or simply wrong, corrections are very welcome.
First, a model's pretrained weights are really just matrices of vectors made up of scalars, and LoRA is no different: it is also a matrix. Talking about matrices inevitably brings in linear algebra. In college linear algebra we learned the term "maximal linearly independent set": the vectors inside a matrix are not necessarily independent of one another, and some of them may be expressible as linear combinations of a few others. If we can find a set of vectors in the matrix such that none of them can be expressed in terms of the others, that set is a maximal linearly independent set of the matrix, and the number of vectors in it is the matrix's rank; conversely, that set can represent the whole matrix. This is the core idea behind LoRA, and a tiny illustration follows below.
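To make the rank idea concrete, here is a tiny illustrative sketch (not from the MiniMind code) that builds a matrix out of only two independent vectors and checks its rank with NumPy:
import numpy as np

# Illustrative only: an 8x8 matrix whose rows are all linear combinations
# of just 2 independent vectors, so its rank is at most 2 even though it
# "looks like" it holds 64 independent numbers.
basis = np.random.randn(2, 8)    # 2 independent row vectors
coeffs = np.random.randn(8, 2)   # mixing coefficients for each of the 8 rows
M = coeffs @ basis               # shape (8, 8)

print(M.shape)                   # (8, 8)
print(np.linalg.matrix_rank(M))  # 2 -- the size of a maximal linearly independent set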
Now back to LoRA itself. LoRA consists of two low-rank matrices, A and B, that act as a low-rank stand-in attached to the original high-dimensional weight matrix, together with a hyperparameter called rank, which sets the smaller dimension shared by A and B, much like the rank of the original weight matrix. The reasoning is the same as above: for an original weight matrix of shape (k, k), the vectors (information) inside it are very likely redundant, so there is a maximal linearly independent set of some size n (n < k) that can stand in for it. Since the layer's input and output dimensions must stay unchanged, the substitute can be written as the product of an (n, k) matrix and a (k, n) matrix, and the trainable parameters for that layer drop from k² to 2·k·n; for 2·k·n to be smaller than k², we only need n < k/2. For typical weight layers the dimensions run into the thousands, while rank is usually set to a small value such as 8 or 16, far below half of k, so the number of trainable parameters drops dramatically; and by the maximal-linearly-independent-set intuition above, the low-rank substitute does not deviate too much from the original weights.
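As a quick sanity check on the arithmetic, here is a small sketch using the numbers we will actually meet later in MiniMind (a 512×512 attention weight layer and the default rank of 16):
k, n = 512, 16                    # k: size of a square weight layer, n: the LoRA rank

full_params = k * k               # trainable parameters if we tuned the whole matrix
lora_params = 2 * k * n           # parameters of the (n, k) A matrix plus the (k, n) B matrix

print(full_params)                # 262144
print(lora_params)                # 16384
print(lora_params / full_params)  # 0.0625 -> roughly 6% of the original layer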
2. Implementation
Above we covered the principle behind LoRA and the simple math underneath it. Now, using the MiniMind project as the basis, let's explore its concrete implementation through the code. Setting up the environment and downloading the pretrained model weights are left to the reader and will not be shown here.
2.1 Code Walkthrough
Let's first take a quick look at how the code is organized. In eval_model.py, LoRA shows up inside the init_model(args) function, where two functions are called: apply_lora() and load_lora(), which add the LoRA layers and load the LoRA weights respectively. Let's look at each of them in turn.
# The init_model() function from eval_model.py
def init_model(args):
    tokenizer = AutoTokenizer.from_pretrained('./model/minimind_tokenizer')
    if args.load == 0:
        moe_path = '_moe' if args.use_moe else ''
        modes = {0: 'pretrain', 1: 'full_sft', 2: 'rlhf', 3: 'reason'}
        # ckp = f'./{args.out_dir}/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'
        ckp = f'../Weights/MiniMind2-PyTorch/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'

        model = MiniMindLM(LMConfig(
            dim=args.dim,
            n_layers=args.n_layers,
            max_seq_len=args.max_seq_len,
            use_moe=args.use_moe
        ))

        state_dict = torch.load(ckp, map_location=args.device)
        model.load_state_dict({k: v for k, v in state_dict.items() if 'mask' not in k}, strict=True)

        if args.lora_name != 'None':
            apply_lora(model)
            load_lora(model, f'./{args.out_dir}/lora/{args.lora_name}_{args.dim}.pth')
    else:
        transformers_model_path = '../Weights/MiniMind2'
        tokenizer = AutoTokenizer.from_pretrained(transformers_model_path)
        model = AutoModelForCausalLM.from_pretrained(transformers_model_path, trust_remote_code=True)

    print(f'MiniMind模型参数量: {sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6:.2f}M(illion)')
    return model.eval().to(args.device), tokenizer
2.1.1 apply_lora()
# model/model_lora.py
def apply_lora(model, rank=16):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and module.weight.shape[0] == module.weight.shape[1]:
            lora = LoRA(module.weight.shape[0], module.weight.shape[1], rank=rank).to(model.device)
            setattr(module, "lora", lora)
            original_forward = module.forward

            # Explicit binding: default arguments capture this layer's own forward and LoRA branch
            def forward_with_lora(x, layer1=original_forward, layer2=lora):
                return layer1(x) + layer2(x)

            module.forward = forward_with_lora
The purpose of this API is to add the LoRA structure into the model object. The code is straightforward; the function receives the model object and then:
1) It iterates over all modules in the model (including submodules) via named_modules(), obtaining each module's name and the module object itself;
2) For every layer that is an nn.Linear and whose weight matrix is square, a LoRA layer is attached;
3) Next comes the definition of the LoRA layer itself, which we cover in detail in 2.1.2;
4) setattr() adds the lora module to that layer as a new attribute;
5) The layer's original forward is saved as original_forward;
6) forward_with_lora() runs the original forward and the newly attached LoRA layer in parallel and sums their outputs, as sketched below.
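Before moving on, here is a minimal standalone sketch (independent of MiniMind) of the same forward-patching trick, showing why the default arguments in forward_with_lora matter: they bind each layer's own original forward and its own branch at definition time. The zero-initialized extra branch stands in for a freshly created LoRA layer:
import torch
import torch.nn as nn

layer = nn.Linear(512, 512, bias=False)   # stands in for a square weight layer
extra = nn.Linear(512, 512, bias=False)   # stands in for the LoRA branch
nn.init.zeros_(extra.weight)              # zero branch, so the output is unchanged at first

original_forward = layer.forward

# Default arguments are evaluated once, at definition time, so each patched layer
# keeps a reference to its own original forward and its own extra branch.
def forward_with_extra(x, layer1=original_forward, layer2=extra):
    return layer1(x) + layer2(x)

layer.forward = forward_with_extra        # layer(x) now runs both branches and sums them

x = torch.randn(2, 512)
print(torch.allclose(layer(x), original_forward(x)))  # True, because the extra branch is all zeros
Without that default-argument binding, every layer patched inside the apply_lora() loop would close over the loop variables and end up calling whichever original_forward and lora happened to be assigned last.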
2.1.2 class LoRA(nn.Module)
# model/model_lora.py
# Definition of the LoRA network structure
class LoRA(nn.Module):
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.rank = rank  # the LoRA rank, which controls the size of the low-rank matrices
        self.A = nn.Linear(in_features, rank, bias=False)   # low-rank matrix A
        self.B = nn.Linear(rank, out_features, bias=False)  # low-rank matrix B
        # Gaussian initialization for A
        self.A.weight.data.normal_(mean=0.0, std=0.02)
        # Zero initialization for B
        self.B.weight.data.zero_()

    def forward(self, x):
        return self.B(self.A(x))
This is a standard LoRA layer defined with PyTorch. First, the sublayers it contains:
1) Two fully connected layers, A and B, are defined;
2) The rank size is specified; this is the rank hyperparameter from Section 1;
3) A and B are then initialized differently: A with a Gaussian, B with all zeros.
Next, the LoRA layer's inference flow:
1) In forward(), the input tensor first passes through matrix A, the result is then fed into matrix B, and the output of the LoRA layer is returned; a quick check follows below.
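A quick check of the class above: because B starts at zero, a freshly created LoRA block maps any input to zeros, which is exactly why attaching it does not change the base model's behavior before training.
import torch

# Quick check: B is zero-initialized, so B(A(x)) is all zeros and the
# freshly created LoRA block leaves the base layer's output untouched.
lora = LoRA(in_features=512, out_features=512, rank=16)
x = torch.randn(4, 512)
out = lora(x)

print(out.shape)                  # torch.Size([4, 512])
print(torch.count_nonzero(out))   # tensor(0)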
2.1.3 load_lora()
# model/model_lora.py
def load_lora(model, path):
    state_dict = torch.load(path, map_location=model.device)
    for name, module in model.named_modules():
        if hasattr(module, 'lora'):
            lora_state = {k.replace(f'{name}.lora.', ''): v for k, v in state_dict.items() if f'{name}.lora.' in k}
            module.lora.load_state_dict(lora_state)
The purpose of this API is to read a trained LoRA weight file and load it onto the corresponding layers of the model. The steps are simple as well; the function receives the LoRA-augmented model object and the path to a local LoRA weight file:
1) torch.load() loads the weight file into memory; state_dict is a plain dictionary mapping parameter names to tensors;
2) The model is traversed to obtain each layer's name and module object;
3) hasattr() checks whether a submodule has a lora attribute;
4) For submodules that do, the matching weight tensors are looked up in state_dict by the layer's name;
5) load_state_dict() copies those tensors into the lora layer.
With that, we obtain a model object that both contains the LoRA structure and has the LoRA weights loaded; the key matching in step 4 is illustrated right below.
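To make the key matching in step 4 concrete, here is a tiny illustrative example; the key names follow the module naming we will see later, but the values are just placeholder strings instead of real tensors:
# Illustrative only: what the key stripping inside load_lora() does for one module.
state_dict = {
    'layers.0.attention.wq.lora.A.weight': '<tensor of shape (16, 512)>',
    'layers.0.attention.wq.lora.B.weight': '<tensor of shape (512, 16)>',
    'layers.0.attention.wo.lora.A.weight': '<tensor of shape (16, 512)>',
}

name = 'layers.0.attention.wq'    # the module currently visited by named_modules()
lora_state = {k.replace(f'{name}.lora.', ''): v
              for k, v in state_dict.items() if f'{name}.lora.' in k}

print(lora_state)
# {'A.weight': '<tensor of shape (16, 512)>', 'B.weight': '<tensor of shape (512, 16)>'}
# These relative keys are exactly what module.lora.load_state_dict() expects.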
2.2 Hands-on Walkthrough
As before, create a Jupyter Notebook file in the same directory as eval_model.py, and let's go through the LoRA-attachment process ourselves.
2.2.1 Imports
Code:
import argparse
import random
import time
import numpy as np
import torch
import warnings
from transformers import AutoTokenizer, AutoModelForCausalLM
from model.model import MiniMindLM
from model.LMConfig import LMConfig
from model.model_lora import *

warnings.filterwarnings('ignore')
2.2.2 Define the Hyperparameters
Code:
class ARG():
    def __init__(self):
        self.lora_name = 'None'
        self.out_dir = 'out'
        self.temperature = 0.85
        self.top_p = 0.85
        self.device = 'cpu'
        self.dim = 512
        self.n_layers = 8
        self.max_seq_len = 8192
        self.use_moe = False
        self.history_cnt = 0
        self.stream = True
        self.load = 0
        self.model_mode = 1
2.2.3 Define the Model-Initialization API
Code:
def init_model(args):
    if args.load == 0:
        moe_path = '_moe' if args.use_moe else ''
        modes = {0: 'pretrain', 1: 'full_sft', 2: 'rlhf', 3: 'reason'}
        # ckp = f'./{args.out_dir}/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'
        ckp = f'../Weights/MiniMind2-PyTorch/{modes[args.model_mode]}_{args.dim}{moe_path}.pth'

        model = MiniMindLM(LMConfig(
            dim=args.dim,
            n_layers=args.n_layers,
            max_seq_len=args.max_seq_len,
            use_moe=args.use_moe
        ))

        state_dict = torch.load(ckp, map_location=args.device)
        model.load_state_dict({k: v for k, v in state_dict.items() if 'mask' not in k}, strict=True)
    else:
        transformers_model_path = '../Weights/MiniMind2'
        tokenizer = AutoTokenizer.from_pretrained(transformers_model_path)
        model = AutoModelForCausalLM.from_pretrained(transformers_model_path, trust_remote_code=True)

    print(f'MiniMind模型参数量: {sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6:.2f}M(illion)')
    return model
2.2.4 Initialize the Hyperparameters and the Model
From the traversal of the model's submodules below, we can clearly see the model's structure, and that the original model object contains no LoRA layers.
Code:
args = ARG()
model = init_model(args)
# Iterate over all modules in the model (including submodules) and get their names and module objects
for name, module in model.named_modules():
    print(f"Module Name: {name}")
    print(f"Module Type: {type(module)}")
    print(hasattr(module, 'lora'))
    if hasattr(module, 'weight') and module.weight is not None:
        print(f" Weight Shape: {module.weight.shape}")
    if hasattr(module, 'bias') and module.bias is not None:
        print(f" Bias Shape: {module.bias.shape}")
    print("-" * 40)
Output:
MiniMind模型参数量: 25.83M(illion)
Module Name:
Module Type: <class 'model.model.MiniMindLM'>
False
----------------------------------------
Module Name: tok_embeddings
Module Type: <class 'torch.nn.modules.sparse.Embedding'>
False
 Weight Shape: torch.Size([6400, 512])
----------------------------------------
Module Name: dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers
Module Type: <class 'torch.nn.modules.container.ModuleList'>
False
----------------------------------------
Module Name: layers.0
Module Type: <class 'model.model.MiniMindBlock'>
False
----------------------------------------
Module Name: layers.0.attention
Module Type: <class 'model.model.Attention'>
False
----------------------------------------
Module Name: layers.0.attention.wq
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.wk
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wv
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wo
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.attn_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention.resid_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.ffn_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.feed_forward
Module Type: <class 'model.model.FeedForward'>
False
----------------------------------------
Module Name: layers.0.feed_forward.w1
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.w2
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 1408])
----------------------------------------
Module Name: layers.0.feed_forward.w3
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.1
Module Type: <class 'model.model.MiniMindBlock'>
False
----------------------------------------
Module Name: layers.1.attention
Module Type: <class 'model.model.Attention'>
False
----------------------------------------
Module Name: layers.1.attention.wq
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.1.attention.wk
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.1.attention.wv
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.1.attention.wo
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.1.attention.attn_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.1.attention.resid_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.1.attention_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.1.ffn_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.1.feed_forward
Module Type: <class 'model.model.FeedForward'>
False
----------------------------------------
Module Name: layers.1.feed_forward.w1
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.1.feed_forward.w2
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 1408])
----------------------------------------
Module Name: layers.1.feed_forward.w3
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.1.feed_forward.dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
.....
2.2.5 Define the LoRA Layer
Code:
# Definition of the LoRA network structure
class LoRA(nn.Module):
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.rank = rank  # the LoRA rank, which controls the size of the low-rank matrices
        self.A = nn.Linear(in_features, rank, bias=False)   # low-rank matrix A
        self.B = nn.Linear(rank, out_features, bias=False)  # low-rank matrix B
        # Gaussian initialization for A
        self.A.weight.data.normal_(mean=0.0, std=0.02)
        # Zero initialization for B
        self.B.weight.data.zero_()

    def forward(self, x):
        return self.B(self.A(x))
2.2.6 Define the LoRA-Attachment API
Code:
def apply_lora(model, rank=16):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and module.weight.shape[0] == module.weight.shape[1]:
            lora = LoRA(module.weight.shape[0], module.weight.shape[1], rank=rank).to(model.device)
            setattr(module, "lora", lora)
            original_forward = module.forward

            # Explicit binding: default arguments capture this layer's own forward and LoRA branch
            def forward_with_lora(x, layer1=original_forward, layer2=lora):
                return layer1(x) + layer2(x)

            module.forward = forward_with_lora
2.2.7 Attach the LoRA Layers
After attaching the LoRA layers and traversing the model's submodules again, we can see that LoRA layers were added to the attention Query (wq) and output-projection (wo) layers, because those are nn.Linear modules whose weights are square 512×512 matrices.
Code:
apply_lora(model)
# Iterate over all modules in the model (including submodules) and get their names and module objects
for name, module in model.named_modules():
    print(f"Module Name: {name}")
    print(f"Module Type: {type(module)}")
    print(hasattr(module, 'lora'))
    if hasattr(module, 'weight') and module.weight is not None:
        print(f" Weight Shape: {module.weight.shape}")
    if hasattr(module, 'bias') and module.bias is not None:
        print(f" Bias Shape: {module.bias.shape}")
    print("-" * 40)
Output:
Module Name:
Module Type: <class 'model.model.MiniMindLM'>
False
----------------------------------------
Module Name: tok_embeddings
Module Type: <class 'torch.nn.modules.sparse.Embedding'>
False
 Weight Shape: torch.Size([6400, 512])
----------------------------------------
Module Name: dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers
Module Type: <class 'torch.nn.modules.container.ModuleList'>
False
----------------------------------------
Module Name: layers.0
Module Type: <class 'model.model.MiniMindBlock'>
False
----------------------------------------
Module Name: layers.0.attention
Module Type: <class 'model.model.Attention'>
False
----------------------------------------
Module Name: layers.0.attention.wq
Module Type: <class 'torch.nn.modules.linear.Linear'>
True
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.wq.lora
Module Type: <class '__main__.LoRA'>
False
----------------------------------------
Module Name: layers.0.attention.wq.lora.A
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([16, 512])
----------------------------------------
Module Name: layers.0.attention.wq.lora.B
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 16])
----------------------------------------
Module Name: layers.0.attention.wk
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wv
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([128, 512])
----------------------------------------
Module Name: layers.0.attention.wo
Module Type: <class 'torch.nn.modules.linear.Linear'>
True
 Weight Shape: torch.Size([512, 512])
----------------------------------------
Module Name: layers.0.attention.wo.lora
Module Type: <class '__main__.LoRA'>
False
----------------------------------------
Module Name: layers.0.attention.wo.lora.A
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([16, 512])
----------------------------------------
Module Name: layers.0.attention.wo.lora.B
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 16])
----------------------------------------
Module Name: layers.0.attention.attn_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention.resid_dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
Module Name: layers.0.attention_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.ffn_norm
Module Type: <class 'model.model.RMSNorm'>
False
 Weight Shape: torch.Size([512])
----------------------------------------
Module Name: layers.0.feed_forward
Module Type: <class 'model.model.FeedForward'>
False
----------------------------------------
Module Name: layers.0.feed_forward.w1
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.w2
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([512, 1408])
----------------------------------------
Module Name: layers.0.feed_forward.w3
Module Type: <class 'torch.nn.modules.linear.Linear'>
False
 Weight Shape: torch.Size([1408, 512])
----------------------------------------
Module Name: layers.0.feed_forward.dropout
Module Type: <class 'torch.nn.modules.dropout.Dropout'>
False
----------------------------------------
......
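As an optional check at this point (not part of eval_model.py), we can count how many parameters the LoRA layers added relative to the whole model; the LoRA parameters appear under names containing 'lora' because setattr() registered each LoRA block as a submodule of its nn.Linear layer:
# Optional check: how many parameters did apply_lora() add?
total_params = sum(p.numel() for p in model.parameters())
lora_params = sum(p.numel() for n, p in model.named_parameters() if 'lora' in n)

print(f'total: {total_params}, lora: {lora_params}, ratio: {lora_params / total_params:.4%}')
# Each block contributes 2 patched layers (wq, wo) * (2 * 512 * 16) = 32768 LoRA parameters.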
2.2.8 Load the LoRA Weights
Since I do not have a trained, matching LoRA weight file locally, this step is left for later; at this point the LoRA layers in the model still hold their initialization values. A small check we can still run is shown below.
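Even without a trained checkpoint, we can confirm that the untrained LoRA branches currently contribute nothing, since every B matrix starts at zero; a small check one could run, looking at just the first patched layer:
import torch

# Because every B matrix is zero-initialized, each untrained LoRA branch currently
# maps any input to zeros, so the patched forward still reproduces the base model.
for name, module in model.named_modules():
    if hasattr(module, 'lora'):
        x = torch.randn(1, module.weight.shape[1])               # input sized to this layer
        print(name, torch.count_nonzero(module.lora(x)).item())  # prints 0
        break                                                    # one layer is enough to illustrate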
3. Summary
At this point we have obtained a model object with LoRA layers attached. In this post we reviewed what LoRA is, how it works, and the math behind it, and in code we defined the LoRA layer, added a lora attribute to the selected layers, and redefined those layers' forward pass via explicit binding. Both in theory and in practice, our understanding of LoRA has deepened. If this post helped you, please give it a like; if you want to keep following this series and learn more hands-on LLM practice, consider following me. I will keep learning and updating this series. Thanks!