[Deep Learning] LLaMA-Factory, a Fine-Tuning Tool for Large Models: Fine-Tuning and Deploying GLM-4-9B-Chat (2)

Contents

Data preparation
chat
Evaluating the model
Exporting the model
Deployment
Summary

References:
https://github.com/hiyouga/LLaMA-Factory/blob/main/README_zh.md
https://www.53ai.com/news/qianyanjishu/2015.html

Clone the code:

```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
```

Build the image and run the container:


```bash
cd /ssd/xiedong/glm-4-9b-xd/LLaMA-Factory

docker build -f ./docker/docker-cuda/Dockerfile \
    --build-arg INSTALL_BNB=false \
    --build-arg INSTALL_VLLM=false \
    --build-arg INSTALL_DEEPSPEED=false \
    --build-arg INSTALL_FLASHATTN=false \
    --build-arg PIP_INDEX=https://pypi.org/simple \
    -t llamafactory:latest .

docker run -dit --gpus=all \
    -v ./hf_cache:/root/.cache/huggingface \
    -v ./ms_cache:/root/.cache/modelscope \
    -v ./data:/app/data \
    -v ./output:/app/output \
    -v /ssd/xiedong/glm-4-9b-xd:/ssd/xiedong/glm-4-9b-xd \
    -p 9998:7860 \
    -p 9999:8000 \
    --shm-size 16G \
    llamafactory:latest

# a2b34ec1 is the container ID printed by docker run; use your own
docker exec -it a2b34ec1 bash

# Inside the container. Quote the requirement so the shell does not treat ">" as a redirect.
pip install "bitsandbytes>=0.37.0"
```

The image I built is kevinchina/deeplearning:llamafactory-0.8.3, and you can run it directly:


```bash
cd /ssd/xiedong/glm-4-9b-xd/LLaMA-Factory
docker run -dit --gpus '"device=0,1,2,3"' \
    -v ./hf_cache:/root/.cache/huggingface \
    -v ./ms_cache:/root/.cache/modelscope \
    -v ./data:/app/data \
    -v ./output:/app/output \
    -v /ssd/xiedong/glm-4-9b-xd:/ssd/xiedong/glm-4-9b-xd \
    -p 9998:7860 \
    -p 9999:8000 \
    --shm-size 16G \
    kevinchina/deeplearning:llamafactory-0.8.3
```

Quick start

The three commands below run LoRA fine-tuning, inference, and merging, respectively, for the Llama3-8B-Instruct model.

```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
```

For advanced usage, including multi-GPU fine-tuning, see examples/README_zh.md.

Tip

Run llamafactory-cli help to display help information.

LLaMA Board: fine-tuning through a web UI (powered by Gradio)

```bash
llamafactory-cli webui
```

Some background reading: https://www.cnblogs.com/lm970585581/p/18140564


Data preparation

Official documentation on data preparation:

https://github.com/hiyouga/LLaMA-Factory/blob/main/data/README_zh.md

Preference datasets are used in the reward-modeling stage (and in preference-optimization stages such as DPO).

Structure of an Alpaca-format dataset:

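```json
[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "system": "system prompt (optional)",
    "history": [
      ["first-round instruction (optional)", "first-round response (optional)"],
      ["second-round instruction (optional)", "second-round response (optional)"]
    ]
  }
]
```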

In a supervised instruction-tuning dataset in Alpaca format, the main columns serve the following purposes:

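instruction (human instruction, required):
- This column holds the specific instruction or question issued by the human. It is the main input from which the model generates its answer.
- Example: "Please explain the basic concepts of quantum mechanics."
input (human input, optional):
- This column holds additional input related to the instruction and may be empty. If present, it is combined with the instruction to form the complete human input.
- Example: if the instruction is "Please explain the following:", the input could be "The basic concepts of quantum mechanics."
output (model response, required):
- This column holds the target response, i.e., the output the model should produce after receiving the instruction and input.
- Example: "Quantum mechanics is a branch of physics that studies the behavior of microscopic particles; its basic concepts include wave-particle duality and the uncertainty principle."
system (system prompt, optional):
- This column provides a system-level prompt that sets the context of the conversation. It can be left empty if no particular system prompt is needed.
- Example: "You are a physicist who is good at explaining complex scientific concepts."
history (conversation history, optional):
- This column holds earlier turns of the conversation as a list of string pairs, each pair being the instruction and response of one turn. The history helps the model understand the context of the current turn.
- Example:
```json
[
  ["What is relativity?", "Relativity is the theory proposed by Einstein; it consists of special relativity and general relativity."],
  ["What are the core ideas of special relativity?", "The core ideas of special relativity are the constancy of the speed of light and the relativity of space and time."]
]
```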

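In summary, the columns play these roles:

instruction and input together form the complete human input to the model.
output is the response the model should produce for that input.
system gives the model additional context or instructions.
history supplies the conversation so far, helping the model produce more coherent answers.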
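To make these roles concrete, here is a minimal Python sketch that turns one Alpaca-format record into an ordered list of chat messages. The function name `alpaca_to_messages` is hypothetical, and joining `instruction` and `input` with a newline is an assumption about how the two are combined; LLaMA-Factory's own converter and the chat template (for example `glm4`) determine the exact prompt the model actually sees.

```python
from typing import Dict, List


def alpaca_to_messages(record: Dict) -> List[Dict[str, str]]:
    """Convert one Alpaca-format record into an ordered list of chat messages.

    Sketch only: assumes instruction and input are joined with a newline.
    """
    messages: List[Dict[str, str]] = []
    # Optional system prompt comes first.
    if record.get("system"):
        messages.append({"role": "system", "content": record["system"]})
    # Each history pair becomes one user turn followed by one assistant turn.
    for user_turn, assistant_turn in record.get("history", []):
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    # Current turn: instruction plus the optional input form the user message.
    user_parts = [record["instruction"]]
    if record.get("input"):
        user_parts.append(record["input"])
    messages.append({"role": "user", "content": "\n".join(user_parts)})
    # The output column is the target the model is trained to produce.
    messages.append({"role": "assistant", "content": record["output"]})
    return messages


if __name__ == "__main__":
    example = {
        "instruction": "Please explain the following:",
        "input": "The basic concepts of quantum mechanics.",
        "output": "Quantum mechanics studies the behavior of microscopic particles.",
        "system": "You are a physicist.",
        "history": [["What is relativity?", "A theory proposed by Einstein."]],
    }
    for msg in alpaca_to_messages(example):
        print(f"{msg['role']}: {msg['content']!r}")
```

Only the last assistant message is the training target; the history turns act as context.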

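Now suppose I want to fine-tune for a domain-specific task: the input is a long piece of material, and the model must output the material's category and the name of the person in charge mentioned in it. How should the dataset be built? Here is an example.
The dataset can be structured like this: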

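```json
[
  {
    "instruction": "Classify the following material and find the name of the person in charge mentioned in it.",
    "input": "<material text>",
    "output": "Category: <material category>; Person in charge: <name>",
    "system": "You are an expert in text classification and information extraction."
  }
]
```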
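Because the output has a fixed layout, it can be parsed back into fields, which is handy for scoring predictions field by field later. A minimal sketch with a hypothetical helper `parse_output`; the regular expression assumes exactly the `Category: ...; Person in charge: ...` layout shown above and returns `None` for anything else.

```python
import re
from typing import Dict, Optional

# Assumed layout: "Category: <category>; Person in charge: <name>"
OUTPUT_PATTERN = re.compile(
    r"^\s*Category:\s*(?P<category>[^;]+?)\s*;\s*Person in charge:\s*(?P<person>.+?)\s*$"
)


def parse_output(text: str) -> Optional[Dict[str, str]]:
    """Split a model answer into its category and person fields, or return None."""
    match = OUTPUT_PATTERN.match(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_output("Category: Financial report; Person in charge: Zhang San"))
    print(parse_output("I am not sure."))
```

Keeping the output format rigid in the training data makes this kind of parsing reliable; answers that do not match can simply be counted as wrong.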

Sample data:

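```json
[
  {
    "instruction": "Classify the following material and find the name of the person in charge mentioned in it.",
    "input": "Our company's Q1 2024 financial report shows that revenue grew by 20%. The person in charge of finance is Zhang San.",
    "output": "Category: Financial report; Person in charge: Zhang San",
    "system": "You are an expert in text classification and information extraction."
  },
  {
    "instruction": "Classify the following material and find the name of the person in charge mentioned in it.",
    "input": "According to the latest market research report, market share rose significantly this quarter. Li Si, head of the marketing department, said he is confident about the future market.",
    "output": "Category: Market research report; Person in charge: Li Si",
    "system": "You are an expert in text classification and information extraction."
  }
]
```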
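Before registering a data file, it is worth checking that it parses and has the required columns. A minimal sketch with a hypothetical `validate_alpaca` helper: `json.loads` already rejects syntax errors such as a trailing comma before `}`, and the function then checks that `instruction` and `output` are non-empty strings, that `input` and `system` are strings if present, and that `history`, if present, is a list of two-string pairs.

```python
import json
from typing import Any, List


def validate_alpaca(records: Any) -> List[str]:
    """Return a list of problems in Alpaca-format records; an empty list means OK."""
    if not isinstance(records, list):
        return ["top level must be a JSON array of records"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        # Required columns must be non-empty strings.
        for key in ("instruction", "output"):
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: missing or empty '{key}'")
        # Optional string columns, if present, must be strings.
        for key in ("input", "system"):
            if key in rec and not isinstance(rec[key], str):
                problems.append(f"record {i}: '{key}' must be a string")
        # history: a list of [instruction, response] string pairs.
        history = rec.get("history", [])
        pairs_ok = isinstance(history, list) and all(
            isinstance(turn, list) and len(turn) == 2 and all(isinstance(s, str) for s in turn)
            for turn in history
        )
        if not pairs_ok:
            problems.append(f"record {i}: 'history' must be a list of [instruction, response] pairs")
    return problems


if __name__ == "__main__":
    good = '[{"instruction": "Classify the material.", "input": "...", "output": "Category: Financial report; Person in charge: Zhang San"}]'
    print(validate_alpaca(json.loads(good)))
    try:
        json.loads('[{"instruction": "x", "output": "y",}]')  # trailing comma
    except json.JSONDecodeError as err:
        print("invalid JSON:", err)
```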

Add an entry like this to dataset_info.json:

```json
"dataset_name": {
  "file_name": "data.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "system": "system"
  }
}
```
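Editing dataset_info.json by hand is easy to get wrong, so here is a minimal Python sketch that writes the records and registers them with the column mapping shown above. `register_dataset` and the name `doc_classify` are hypothetical; in real use, point `data_dir` at LLaMA-Factory's `data` directory. It rewrites dataset_info.json with `json.dumps`, so the file's original formatting is not preserved, and an existing entry with the same name is overwritten.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

# Column mapping from the dataset_info.json entry above.
ALPACA_COLUMNS = {"prompt": "instruction", "query": "input", "response": "output", "system": "system"}


def register_dataset(data_dir: Path, name: str, file_name: str, records: List[Dict]) -> None:
    """Write records to data_dir/file_name and add `name` to data_dir/dataset_info.json."""
    data_dir.mkdir(parents=True, exist_ok=True)
    # json.dump never emits trailing commas, so the data file is always valid JSON.
    with (data_dir / file_name).open("w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)

    info_path = data_dir / "dataset_info.json"
    info = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    info[name] = {"file_name": file_name, "columns": dict(ALPACA_COLUMNS)}
    info_path.write_text(json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8")


if __name__ == "__main__":
    sample = [{
        "instruction": "Classify the following material and find the name of the person in charge mentioned in it.",
        "input": "Our company's Q1 2024 financial report shows that revenue grew by 20%. The person in charge of finance is Zhang San.",
        "output": "Category: Financial report; Person in charge: Zhang San",
        "system": "You are an expert in text classification and information extraction.",
    }]
    # Demo in a throwaway directory; use Path("data") inside LLaMA-Factory instead.
    with tempfile.TemporaryDirectory() as tmp:
        register_dataset(Path(tmp), "doc_classify", "data.json", sample)
        print((Path(tmp) / "dataset_info.json").read_text(encoding="utf-8"))
```

After registering, the dataset is selected by its name, e.g. `--dataset doc_classify`.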

For this fine-tuning run I used a dataset from an open-source project:
https://github.com/KMnO4-zx/huanhuan-chat/blob/master/dataset/train/lora/huanhuan.json
After downloading it, put the JSON file in LLaMA-Factory's data directory.

Then edit dataset_info.json in the data directory and simply add the following entry:

```json
"huanhuan": {"file_name": "huanhuan.json"},
```

[Screenshot: dataset_info.json with the huanhuan entry added]

Inside the container, launch the WebUI:

```bash
llamafactory-cli webui
```

Then open the page in a browser (host port 9998 is mapped to the WebUI's port 7860 in the container):
http://10.136.19.26:9998/

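Training from the WebUI kept failing with errors, so instead copy the command the WebUI generates and run it inside the container: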

```bash
llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path /ssd/xiedong/glm-4-9b-xd/glm-4-9b-chat \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --quantization_method bitsandbytes \
    --template glm4 \
    --flash_attn auto \
    --dataset_dir data \
    --dataset huanhuan \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/GLM-4-9B-Chat/lora/train_2024-07-23-04-22-25 \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout 0.1 \
    --lora_target all
```

After training:

```
***** train metrics *****
epoch = 2.9807
num_input_tokens_seen = 741088
total_flos = 36443671GF
train_loss = 2.5584
train_runtime = 0:09:24.59
train_samples_per_second = 19.814
train_steps_per_second = 0.308
```
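The logged throughput can be cross-checked against the batch settings in the command. A worked check in Python using only numbers from the command and the log above; the GPU count of 4 is an assumption (GPUs 0 to 3, as exposed by the prebuilt-image docker run), and because the logged rates are rounded the results are approximate.

```python
# Values copied from the training command and the metrics log above.
train_runtime_s = 9 * 60 + 24.59   # train_runtime = 0:09:24.59
samples_per_s = 19.814             # train_samples_per_second
steps_per_s = 0.308                # train_steps_per_second

per_device_batch = 2               # --per_device_train_batch_size
grad_accum = 8                     # --gradient_accumulation_steps
num_gpus = 4                       # assumption: GPUs 0-3 visible to the container

# Samples consumed per optimizer step, from the config and from the log.
effective_batch = per_device_batch * grad_accum * num_gpus
logged_samples_per_step = samples_per_s / steps_per_s

# Rough totals over the whole run.
total_steps = steps_per_s * train_runtime_s
total_samples = samples_per_s * train_runtime_s

print(f"effective batch (config): {effective_batch}")
print(f"samples per step (log):   {logged_samples_per_step:.1f}")
print(f"optimizer steps ~ {total_steps:.0f}, samples ~ {total_samples:.0f}")
```

A logged ratio this close to 64 is consistent with 2 × 8 × 4 samples per optimizer step.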

chat

[Screenshot: chat test]

Evaluating the model

This needs about 40 GB of free GPU memory; the model is large.
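A rough sense of why the model is heavy: the weights alone take (number of parameters) × (bytes per parameter). A minimal Python sketch, assuming about 9.4 billion parameters for GLM-4-9B-Chat (an approximate figure; check the model card). Generation during evaluation needs extra memory on top of this for the KV cache and activations, which grows with the batch size and `max_new_tokens`.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the model weights alone, in GiB (no KV cache or activations)."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 9.4e9  # assumption: approximate parameter count of GLM-4-9B-Chat

for dtype, nbytes in [("bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{dtype:>5}: {weight_memory_gib(NUM_PARAMS, nbytes):5.1f} GiB")
```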

As before, preview the command in the WebUI and run it from the command line:

```bash
CUDA_VISIBLE_DEVICES=1,2,3 llamafactory-cli train \
    --stage sft \
    --model_name_or_path /ssd/xiedong/glm-4-9b-xd/glm-4-9b-chat \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --quantization_method bitsandbytes \
    --template glm4 \
    --flash_attn auto \
    --dataset_dir data \
    --eval_dataset huanhuan \
    --cutoff_len 1024 \
    --max_samples 100000 \
    --per_device_eval_batch_size 2 \
    --predict_with_generate True \
    --max_new_tokens 512 \
    --top_p 0.7 \
    --temperature 0.95 \
    --output_dir saves/GLM-4-9B-Chat/lora/eval_2024-07-23-04-22-25 \
    --do_predict True \
    --adapter_name_or_path saves/GLM-4-9B-Chat/lora/train_2024-07-23-04-22-25
```

The dataset is fairly large, so I stopped the run before it finished; the results are presumably saved in /app/saves/GLM-4-9B-Chat/lora/eval_2024-07-23-04-22-25.
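Once a prediction run completes, the output directory can be scored offline. A minimal sketch that computes the exact-match rate; it assumes the predictions are written as JSON Lines to a file named `generated_predictions.jsonl` with `label` and `predict` keys, so check what your LLaMA-Factory version actually writes. Exact match is more informative for structured outputs such as the classification task above than for the free-form huanhuan dialogues.

```python
import json
from pathlib import Path


def exact_match_rate(pred_file: Path) -> float:
    """Fraction of records whose stripped 'predict' equals the stripped 'label'."""
    total = hits = 0
    with pred_file.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            row = json.loads(line)
            total += 1
            hits += int(row["predict"].strip() == row["label"].strip())
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Assumed file name inside the --output_dir used above.
    path = Path("/app/saves/GLM-4-9B-Chat/lora/eval_2024-07-23-04-22-25/generated_predictions.jsonl")
    if path.exists():
        print(f"exact match: {exact_match_rate(path):.2%}")
    else:
        print(f"no predictions found at {path}")
```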


Exporting the model

Fill in the export path (/ssd/xiedong/glm-4-9b-xd/export_test0723) in the WebUI and run the export.
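As an alternative to the WebUI, the merge can also be driven from the command line with `llamafactory-cli export <config>.yaml`, as in the quick-start commands. A minimal Python sketch that writes such a config; the keys are meant to mirror examples/merge_lora/llama3_lora_sft.yaml (verify them against your LLaMA-Factory version), the paths are the ones used in this article, and the `export_size` and `export_device` values are illustrative assumptions.

```python
from pathlib import Path

# Paths taken from this article; export_size/export_device are illustrative choices.
MERGE_CONFIG = {
    "model_name_or_path": "/ssd/xiedong/glm-4-9b-xd/glm-4-9b-chat",
    "adapter_name_or_path": "saves/GLM-4-9B-Chat/lora/train_2024-07-23-04-22-25",
    "template": "glm4",
    "finetuning_type": "lora",
    "export_dir": "/ssd/xiedong/glm-4-9b-xd/export_test0723",
    "export_size": 2,
    "export_device": "cpu",
    "export_legacy_format": "false",
}


def write_merge_config(path: Path, config: dict) -> None:
    """Write flat `key: value` lines, which is valid YAML for these simple scalars."""
    lines = [f"{key}: {value}" for key, value in config.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    out = Path("glm4_lora_merge.yaml")
    write_merge_config(out, MERGE_CONFIG)
    print(f"wrote {out}; now run: llamafactory-cli export {out}")
```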

Deployment

LLaMA-Factory can serve a model directly; you only need to pass the right arguments.

https://github.com/hiyouga/LLaMA-Factory/blob/main/src/api.py

For example:


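```bash
# Serves an OpenAI-style API on port 8000 in the container (published as 9999 by docker run above)
llamafactory-cli api \
    --model_name_or_path /ssd/xiedong/glm-4-9b-xd/export_0725_yingyong \
    --template glm4 \
    --finetuning_type lora \
    --adapter_name_or_path saves/GLM-4-9B-Chat/lora/train_2024-07-26-02-14-58
```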

Request:

```bash
curl -X 'POST' \
  'http://10.136.19.26:9999/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Who are you?"}],
    "do_sample": true,
    "temperature": 0.7,
    "top_p": 0.9,
    "n": 1,
    "max_tokens": 150,
    "stop": null,
    "stream": false
  }'
```

Request from Python:

```python
import requests

url = 'http://10.136.19.26:9999/v1/chat/completions'
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
}
data = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a text-analysis expert. Determine which application category the text belongs to."},
        # In my use case the user content is user_input + ocr_ret
        {"role": "user", "content": "loan"},
    ],
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "n": 1,
    "max_tokens": 150,
    "stop": None,
    "stream": False,
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json()['choices'][0]['message']['content'].replace('\n', ''))
```

Web UI:

```bash
# Exported model with an additional LoRA adapter
llamafactory-cli webchat \
    --model_name_or_path /ssd/xiedong/glm-4-9b-xd/export_0725_yingyong \
    --template glm4 \
    --finetuning_type lora \
    --adapter_name_or_path saves/GLM-4-9B-Chat/lora/train_2024-07-26-02-14-58

# Original GLM-4-9B-Chat, no adapter
llamafactory-cli webchat \
    --model_name_or_path /ssd/xiedong/glm-4-9b-xd/glm-4-9b-chat \
    --template glm4 \
    --finetuning_type lora
```

Summary

Having gone through all of this, I found the examples in the repository especially valuable:
https://github.com/hiyouga/LLaMA-Factory/tree/main/examples

For convenience, I pushed this image:

```bash
docker push kevinchina/deeplearning:llamafactory-0.8.3
```