Large Models: alpaca-lora Fine-Tuning and Inference Deployment

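Contents

What Is LoRA
- Key parameters
- Advantages of LoRA
Fine-Tuning Deployment
- Download the project
- Change to the project directory
- Activate the conda environment
  - Download the model
  - Download the fine-tuning dataset
- Start fine-tuning
  - Failure 1
    - Cause
    - Analysis
  - Failure 2
    - Before the change
    - After the change
Inference Deployment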

What Is LoRA


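Key parameters

lora_rank (int, optional): the rank of the low-rank matrices used in LoRA fine-tuning.
lora_alpha (float, optional): the scaling factor used in LoRA fine-tuning; the low-rank update is scaled by lora_alpha / lora_rank.
lora_dropout (float, optional): the dropout probability used in LoRA fine-tuning.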
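To make the three parameters concrete, here is a minimal pure-Python sketch of a LoRA-style forward pass. It is an illustration, not the code used by alpaca-lora: the helper names (matmul, dropout, lora_forward, lora_down, lora_up) are made up for this example, and matrices are plain lists of rows.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dropout(x, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the rest."""
    if p <= 0.0:
        return x
    keep = 1.0 - p
    return [[v / keep if rng.random() < keep else 0.0 for v in row] for row in x]


def lora_forward(x, w, lora_down, lora_up, lora_rank, lora_alpha,
                 lora_dropout=0.0, training=False, rng=None):
    """Return x @ w + (lora_alpha / lora_rank) * dropout(x) @ lora_down @ lora_up.

    w is the frozen pretrained weight (d_in x d_out); lora_down (d_in x rank)
    and lora_up (rank x d_out) are the only trainable matrices.
    """
    scaling = lora_alpha / lora_rank
    h = dropout(x, lora_dropout, rng or random.Random(0)) if training else x
    base = matmul(x, w)
    update = matmul(matmul(h, lora_down), lora_up)
    return [[b + scaling * u for b, u in zip(rb, ru)] for rb, ru in zip(base, update)]


# Tiny example: identity base weight, rank-1 update, alpha = 2.
x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
lora_down = [[1.0], [1.0]]
lora_up = [[0.5, -0.5]]
print(lora_forward(x, w, lora_down, lora_up, lora_rank=1, lora_alpha=2.0))
```

In this sketch a larger lora_rank gives the update more capacity, lora_alpha controls how strongly the update is mixed into the frozen output, and lora_dropout only acts during training.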

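Advantages of LoRA

LoRA's biggest advantages are that it trains faster and uses less memory, so it can run on consumer-grade hardware.
LoRA is also efficient in multi-GPU training, where its speed advantage shows up in two ways:

Computational efficiency: LoRA only computes gradients and optimizer updates for the injected low-rank matrices, so it is cheaper than full fine-tuning. In multi-GPU training, this work on the injected matrices is spread across the GPUs, which speeds up training.
Communication efficiency: communication is often a bottleneck in multi-GPU training. LoRA only needs to synchronize the parameters of the injected matrices, so it exchanges far less data than full fine-tuning, which cuts both communication volume and communication time. As a result, LoRA is usually faster than full fine-tuning on multiple GPUs. Specifically, LoRA can lower the hardware requirement by up to 3x, which improves training efficiency.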
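A quick back-of-the-envelope calculation shows why so little has to be trained and synchronized. The sketch below counts parameters for one weight matrix and then for a whole model. The hidden size of 4096, 32 layers, rank 8, and two adapted projections per layer are assumptions chosen to resemble a 7B LLaMA-style setup; they are not values read from this project's configuration.

```python
def full_params(d_in, d_out):
    """Parameters updated when a d_in x d_out weight matrix is fully fine-tuned."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors (d_in x rank and rank x d_out)."""
    return rank * (d_in + d_out)


# Assumed values for illustration only.
hidden = 4096
rank = 8
layers = 32
adapted_per_layer = 2

per_matrix_full = full_params(hidden, hidden)        # 16777216
per_matrix_lora = lora_params(hidden, hidden, rank)  # 65536
total_lora = layers * adapted_per_layer * per_matrix_lora  # 4194304

print(per_matrix_full, per_matrix_lora, total_lora)
print(f"LoRA trains {per_matrix_lora / per_matrix_full:.4%} of each adapted matrix")
```

Under these assumptions only about four million parameters receive gradients, which is the quantity that has to be stored in optimizer state and exchanged between GPUs.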

Fine-Tuning Deployment

Download the project

git clone https://github.com/tloen/alpaca-lora.git


Change to the project directory

cd alpaca-lora

Activate the conda environment

source activate
conda activate alpaca-lora

Download the model

https://huggingface.co/decapoda-research/llama-7b-hf

Place the model in: /data/sim_chatgpt/llama-7b-hf

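Download the fine-tuning dataset

This dataset is a cleaned version of the Stanford Alpaca data, although the exact cleaning procedure is not known.

https://huggingface.co/datasets/yahma/alpaca-cleaned

Place the fine-tuning data in: /data/datasets/alpaca-cleaned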
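Each record in the dataset has three string fields: instruction, input, and output. As a rough sketch of how such a record becomes a training prompt, the code below uses the widely circulated Alpaca prompt layout; the exact template that finetune.py applies may differ, and build_prompt and the sample record are hypothetical names and data for illustration.

```python
WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record, include_output=True):
    """Format one dataset record as an Alpaca-style prompt string."""
    if record.get("input"):
        prompt = WITH_INPUT.format(instruction=record["instruction"], input=record["input"])
    else:
        prompt = NO_INPUT.format(instruction=record["instruction"])
    # During training the reference answer is appended; at inference it is left off.
    return prompt + record["output"] if include_output else prompt


# Placeholder record: the instruction appears in the dataset, the output text here is made up.
sample = {
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "<reference answer from the dataset>",
}
print(build_prompt(sample))
```

Printing a few formatted records before launching a long run is a cheap way to confirm the data path and field names are what the training script expects.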

Start fine-tuning

nohup python -u finetune.py \
    --base_model '/data/sim_chatgpt/llama-7b-hf' \
    --data_path '/data/datasets/alpaca-cleaned' \
    --output_dir './lora-alpaca' \
    >> log.out 2>&1 &

Failure 1

Cause

Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model.

The log.out log shows that there is not enough GPU memory.

Analysis

Run nvidia-smi to check GPU memory usage.
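If you want to check free GPU memory from a script before launching, the following sketch wraps nvidia-smi's CSV query mode. The function names are hypothetical, and the sample text is made-up data in the format the query prints, not output from the machine used here.

```python
import subprocess

# nvidia-smi's query mode prints one CSV line per GPU, with values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append({"index": index, "used_mib": used,
                     "total_mib": total, "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return parsed memory figures (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Made-up sample in the same format, so the parser can be tried without a GPU.
sample = "0, 10240, 24576\n1, 512, 24576\n"
for gpu in parse_gpu_memory(sample):
    print(gpu)
```

On a machine with GPUs, calling query_gpu_memory() shows which cards are already occupied by other processes, which is a common reason for the error above.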

Failure 2

RuntimeError: expected scalar type Half but found Float

Before the change

trainer.train(resume_from_checkpoint=resume_from_checkpoint)

After the change

In finetune.py, add the line with torch.autocast("cuda"): before the training call, and indent the following line beneath it:

with torch.autocast("cuda"):
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)

Then start fine-tuning again.
Note: fine-tuning takes a long time. The detailed training log is written to log.out.

Inference Deployment

In generate.py, set share=True so that the demo can be reached from the public internet.

python generate.py \
    --load_8bit \
    --base_model '/data/sim_chatgpt/llama-7b-hf' \
    --lora_weights './lora-alpaca/checkpoint-2000'

Note: the ./lora-alpaca directory contains several checkpoints, for example checkpoint-800, checkpoint-1000, and checkpoint-2000. You can choose any of them.
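If you would rather not hard-code the checkpoint name, a small helper can list the checkpoint-N folders and pick the one with the highest step. This is a sketch that assumes the folders follow the checkpoint-<step> naming shown above; list_checkpoints and latest_checkpoint are hypothetical helpers, not part of alpaca-lora.

```python
import re
from pathlib import Path

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def list_checkpoints(output_dir):
    """Return checkpoint directories under output_dir, sorted by step number."""
    found = []
    for child in Path(output_dir).iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), child))
    # Sort numerically so checkpoint-1000 comes after checkpoint-800.
    return [path for _, path in sorted(found, key=lambda item: item[0])]


def latest_checkpoint(output_dir):
    """Return the checkpoint with the highest step, or None if there is none."""
    checkpoints = list_checkpoints(output_dir)
    return checkpoints[-1] if checkpoints else None
```

For example, latest_checkpoint("./lora-alpaca") would return the path to pass as --lora_weights. Sorting by the parsed integer matters because a plain string sort would place checkpoint-1000 before checkpoint-800.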

If an error says the link cannot be created, downgrade gradio, for example: pip install gradio==3.13
The public URL appears after a minute or two.

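Open the public URL in a browser and ask a question:
Take a question from the instruction (string) column of "https://huggingface.co/datasets/yahma/alpaca-cleaned", for example "Give three tips for staying healthy.", and submit it. The answer shown on the web page is close to the output field for that record, which suggests the fine-tuning worked reasonably well.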
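To put a rough number on "close to the output", you can compare the generated text with the reference text using the standard library's difflib. This is only a character-level sketch with a hypothetical similarity helper; it is not an evaluation method used by alpaca-lora, and the strings you pass in are whatever you copy from the web page and the dataset.

```python
from difflib import SequenceMatcher


def similarity(generated, reference):
    """Return a 0..1 character-level similarity after normalizing case and whitespace."""
    a = " ".join(generated.lower().split())
    b = " ".join(reference.lower().split())
    return SequenceMatcher(None, a, b).ratio()


# Placeholder strings; replace them with the model's answer and the dataset's output field.
print(similarity("Eat a balanced diet.", "eat  a balanced   diet."))
```

Because the question comes from the training set, a high score mainly shows that the adapter learned the training data; trying a question that is not in the dataset gives a better sense of how it generalizes.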