Related post: Visualglm-6b (CSDN blog) https://blog.csdn.net/u012193416/article/details/131074962

This post collects a few pitfalls I ran into while fine-tuning visualglm-6b locally. Setting up the environment for this thing is a real pain, with one problem after another.
1. Data issues
Make sure you don't get the model path wrong.
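2. Model loading issues
Loading the visualglm model is quite troublesome. The HF-format weights released on Hugging Face can only be used for inference, not for fine-tuning, so you need to download weights in the fine-tuning format.
You can download this version, which was fine-tuned on X-ray data; the original Tsinghua Cloud download link no longer works:
https://huggingface.co/wangrongsheng/XrayGLM-300/tree/main
After the download finishes, double-check the path.
Then change the code as follows: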
# Directory of the downloaded fine-tuning (SAT-format) checkpoint
model_type = "/home/image_team/image_team_docker_home/lgd/e_commerce_lmm/weights/THUDM_Visualglm6b/"
model, args = FineTuneVisualGLMModel.from_pretrained(model_type, args)
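Because the path is easy to get wrong, a small pre-flight check before from_pretrained can help. The sketch below assumes the downloaded checkpoint follows the usual SwissArmyTransformer (SAT) layout: a model_config.json, a latest file holding the iteration number, and a <iteration>/mp_rank_00_model_states.pt weights file. That layout and the helper names download_xrayglm and check_sat_checkpoint are assumptions for illustration, not part of the VisualGLM code; the download helper relies on huggingface_hub.snapshot_download.

```python
import json
import os


def download_xrayglm(local_dir: str) -> str:
    """Download the XrayGLM-300 checkpoint from the Hugging Face Hub (hypothetical helper).

    Requires the `huggingface_hub` package; it is imported lazily so the
    layout check below works without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id="wangrongsheng/XrayGLM-300", local_dir=local_dir)


def check_sat_checkpoint(ckpt_dir: str) -> str:
    """Verify an ASSUMED SAT checkpoint layout and return the weights file path.

    Assumed layout (not confirmed by this post):
        ckpt_dir/model_config.json
        ckpt_dir/latest                              text file with an iteration, e.g. "300"
        ckpt_dir/<iteration>/mp_rank_00_model_states.pt
    """
    config_path = os.path.join(ckpt_dir, "model_config.json")
    latest_path = os.path.join(ckpt_dir, "latest")
    for path in (config_path, latest_path):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"missing {path}; is model_type pointing at the right directory?")
    with open(config_path, encoding="utf-8") as f:
        json.load(f)  # fail early if the config is truncated or not valid JSON
    with open(latest_path, encoding="utf-8") as f:
        iteration = f.read().strip()
    weights = os.path.join(ckpt_dir, iteration, "mp_rank_00_model_states.pt")
    if not os.path.isfile(weights):
        raise FileNotFoundError(f"missing {weights}")
    return weights
```

Calling check_sat_checkpoint(model_type) just before FineTuneVisualGLMModel.from_pretrained(model_type, args) turns a confusing load failure into a plain missing-file message.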
Next you may hit this error:
AttributeError: 'FakeTokenizer' object has no attribute 'encode'
It comes from how the ChatGLM tokenizer is loaded. Change the code as follows:
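from transformers import AutoTokenizer

# BlipImageEvalProcessor and FewShotDataset come from the VisualGLM-6B fine-tuning code, as before.
def create_dataset_function(path, args):
    # tokenizer = get_tokenizer("args")  # original call, which led to the FakeTokenizer error
    # Load the real ChatGLM tokenizer from the HF-format visualglm-6b weights instead
    tokenizer = AutoTokenizer.from_pretrained("/home/image_team/image_team_docker_home/lgd/e_commerce_lmm/weights/visualglm-6b/", trust_remote_code=True)
    image_processor = BlipImageEvalProcessor(224)
    dataset = FewShotDataset(path, image_processor, tokenizer, args)
    return dataset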
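The error means the object handed to FewShotDataset has no encode method. If you would rather not replace the tokenizer unconditionally, a hypothetical helper like ensure_tokenizer below keeps the existing tokenizer when it can encode and only otherwise calls a loader you supply (for example, a lambda wrapping the AutoTokenizer.from_pretrained call above). It is a sketch of the duck-typing check, not part of VisualGLM or SAT.

```python
from typing import Any, Callable


def ensure_tokenizer(tokenizer: Any, load_fallback: Callable[[], Any]) -> Any:
    """Return `tokenizer` if it has a callable encode(), otherwise load a replacement.

    `load_fallback` is a zero-argument callable (hypothetical usage), e.g.
    lambda: AutoTokenizer.from_pretrained(HF_DIR, trust_remote_code=True)
    """
    if callable(getattr(tokenizer, "encode", None)):
        return tokenizer
    fallback = load_fallback()
    # Guard against swapping one placeholder for another.
    if not callable(getattr(fallback, "encode", None)):
        raise TypeError(f"{type(fallback).__name__} has no usable encode()")
    return fallback
```

The loader is passed as a callable so the (slow) HF tokenizer is only loaded when it is actually needed.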
3. Version issues
Upgrade gcc to 5.4:
sudo rpm -ivh gcc-5.4.0-1.el7.centos.x86_64.rpm
export CC=/usr/local/bin/x86_64-unknown-linux-gnu-gcc
export CXX=/usr/local/bin/x86_64-unknown-linux-gnu-g++
Reference: "Upgrading gcc to 4.8, 4.9, 5.2, 6.3, 7.3 or newer on CentOS 6 and 7" - VPS侦探 https://www.vpser.net/manage/centos-6-upgrade-gcc.html
However, gcc 5.4 turned out not to be enough: compiling C++17 code later on fails with errors.
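Since what mattered here was C++17 support (gcc 5.4 failed, gcc 7.3 worked, per this post), a quick check of the compiler named by $CC can save a failed build. This is a sketch: the (7, 3) threshold is this post's empirical result, not an official requirement, and it assumes the compiler accepts -dumpfullversion -dumpversion (gcc 7 and later print the full version for -dumpfullversion; very old builds may need plain -dumpversion).

```python
import os
import re
import subprocess

# Empirical threshold from this post: gcc 5.4 failed on C++17, gcc 7.3 worked.
MIN_GCC = (7, 3)


def parse_gcc_version(text: str) -> tuple:
    """Extract (major, minor, patch) from output such as '7.3.0'; missing parts become 0."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def gcc_is_new_enough(version: tuple, minimum: tuple = MIN_GCC) -> bool:
    # Tuple comparison is lexicographic, so (7, 3, 0) >= (7, 3) is True.
    return version >= minimum


def current_gcc_version() -> tuple:
    """Ask the compiler in $CC (as exported above), falling back to plain `gcc`."""
    cc = os.environ.get("CC", "gcc")
    out = subprocess.run(
        [cc, "-dumpfullversion", "-dumpversion"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gcc_version(out)
```

Usage: gcc_is_new_enough(current_gcc_version()) returns False for the 5.4 toolchain above and True once gcc 7.3 is in place.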
Then simply run pip install -r requirements.txt:
SwissArmyTransformer>=0.3.6
transformers==4.27.1
bitsandbytes==0.39.0
deepspeed==0.14.0
Of these, bitsandbytes is the quantization library; it may require upgrading gcc and other tooling, which is a hassle.
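Because mismatched versions caused most of the trouble, it can help to compare what is actually installed with the pins above. This sketch uses importlib.metadata from the standard library and a deliberately simplified comparison (numeric release segments only, not full PEP 440, so a pre-release such as 0.14.0rc1 is treated as 0.14.0); the packaging library's SpecifierSet would be the robust choice.

```python
import re
from importlib import metadata

# Pins copied from the requirements list above.
REQUIREMENTS = {
    "SwissArmyTransformer": (">=", "0.3.6"),
    "transformers": ("==", "4.27.1"),
    "bitsandbytes": ("==", "0.39.0"),
    "deepspeed": ("==", "0.14.0"),
}


def version_tuple(version: str) -> tuple:
    """Numeric release part only, e.g. '0.14.0+cu118' -> (0, 14, 0). Simplified, not PEP 440."""
    release = re.match(r"\d+(?:\.\d+)*", version)
    if release is None:
        raise ValueError(f"unparseable version {version!r}")
    return tuple(int(p) for p in release.group(0).split("."))


def satisfies(installed: str, op: str, wanted: str) -> bool:
    have, want = version_tuple(installed), version_tuple(wanted)
    # Pad with zeros so (0, 14) compares equal to (0, 14, 0).
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    if op == "==":
        return have == want
    if op == ">=":
        return have >= want
    raise ValueError(f"unsupported operator {op!r}")


def report() -> list:
    """Return one message per mismatched or missing package (empty list means all good)."""
    problems = []
    for name, (op, wanted) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {op}{wanted})")
            continue
        if not satisfies(installed, op, wanted):
            problems.append(f"{name}: have {installed}, want {op}{wanted}")
    return problems


if __name__ == "__main__":
    for line in report() or ["all pinned packages match"]:
        print(line)
```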
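I also ran into yum hanging and not responding.
Reference: "rpm和yum卡住" (rpm and yum hang) - Zhihu https://zhuanlan.zhihu.com/p/358154111. The article describes yum stalling with no error message, unresponsive to Ctrl+C and only killable with kill -9, and suggests running yum -vv <package> to see where it gets stuck.
After that, building fused_adam kept failing.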
Reference: "deepspeed-ninja报错解决" (Fixing ninja errors when training with DeepSpeed) - Be With you https://johnson7788.github.io/2023/08/23/deepspeed-ninja%E6%8A%A5%E9%94%99%E8%A7%A3%E5%86%B3/
The problem is C++17: this nvcc setup could not compile C++17. With nvcc 11.8 and gcc 7.3 it works. The nvcc command used to build fused_adam:
/usr/local/cuda-12.1/bin/nvcc -ccbin /usr/local/bin/x86_64-unknown-linux-gnu-gcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/miniconda3/envs/visualglm/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/miniconda3/envs/visualglm/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/miniconda3/envs/visualglm/lib/python3.10/site-packages/torch/include -isystem /home/miniconda3/envs/visualglm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/miniconda3/envs/visualglm/lib/python3.10/site-packages/torch/include/TH -isystem /home/miniconda3/envs/visualglm/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-11.1/include -isystem /home/miniconda3/envs/visualglm/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -std=c++14 -c /home/miniconda3/envs/visualglm/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
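To see at a glance which nvcc, host compiler and C++ standard a JIT build actually used, the command line can be split with Python's shlex. In the command above, for instance, nvcc comes from cuda-12.1, the CUDA headers from cuda-11.1, and the build passes -std=c++14, details that may be worth cross-checking against the toolchain you believe is installed. summarize_nvcc_command is a hypothetical helper for illustration.

```python
import shlex


def summarize_nvcc_command(cmd: str) -> dict:
    """Extract nvcc path, -ccbin host compiler, -std flags and CUDA include dirs."""
    tokens = shlex.split(cmd)  # handles the quoted and escaped arguments in the log
    summary = {"nvcc": tokens[0], "host_compiler": None, "std": [], "cuda_includes": []}
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "-ccbin":
            summary["host_compiler"] = nxt
        elif tok.startswith("-std="):
            summary["std"].append(tok.split("=", 1)[1])
        elif tok == "-isystem" and nxt and "/cuda" in nxt:
            summary["cuda_includes"].append(nxt)
    return summary
```

Pasting the full failing line into summarize_nvcc_command makes version mismatches between the compiler binary and the headers easy to spot.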
4. Fine-tuning
#! /bin/bash
NUM_WORKERS=1
NUM_GPUS_PER_WORKER=4
MP_SIZE=1

script_path=$(realpath $0)
script_dir=$(dirname $script_path)
main_dir=$(dirname $script_dir)
MODEL_TYPE="visualglm-6b"
MODEL_ARGS="--max_source_length 64 \
    --max_target_length 256 \
    --lora_rank 10 \
    --layer_range 0 14 \
    --pre_seq_len 4"

# OPTIONS_SAT="SAT_HOME=$1" #"SAT_HOME=/raid/dm/sat_models"
OPTIONS_NCCL="NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_NET_GDR_LEVEL=2"
HOST_FILE_PATH="hostfile"
HOST_FILE_PATH="hostfile_single"

train_data="/home/image_team/image_team_docker_home/lgd/e_commerce_lmm/data/fewshot-data/dataset.json"
eval_data="/home/image_team/image_team_docker_home/lgd/e_commerce_lmm/data/fewshot-data/dataset.json"

gpt_options=" \
       --experiment-name finetune-$MODEL_TYPE \
       --model-parallel-size ${MP_SIZE} \
       --mode finetune \
       --train-iters 300 \
       --resume-dataloader \
       $MODEL_ARGS \
       --train-data ${train_data} \
       --valid-data ${eval_data} \
       --distributed-backend nccl \
       --lr-decay-style cosine \
       --warmup .02 \
       --checkpoint-activations \
       --save-interval 300 \
       --eval-interval 10000 \
       --save "/home/image_team/image_team_docker_home/lgd/e_commerce_lmm/results/visualglm_6b_xray" \
       --split 1 \
       --eval-iters 10 \
       --eval-batch-size 8 \
       --zero-stage 1 \
       --lr 0.0001 \
       --batch-size 4 \
       --skip-init \
       --fp16 \
       --use_lora
"

run_cmd="${OPTIONS_NCCL} ${OPTIONS_SAT} deepspeed --master_port 16666 --hostfile ${HOST_FILE_PATH} finetune_visualglm.py ${gpt_options}"
echo ${run_cmd}
eval ${run_cmd}

set +x
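A quick sanity check of what the script's numbers imply. Assuming SAT treats --batch-size as the per-GPU batch and --warmup as a fraction of --train-iters (both assumptions, not verified against SAT's source), 4 GPUs with batch 4 give 16 samples per step, 300 steps see 4,800 samples, and warmup lasts 6 steps. The learning-rate function is a generic linear-warmup plus cosine-decay schedule written for illustration; it is not SAT's exact implementation.

```python
import math

# Values copied from the script above.
NUM_WORKERS = 1
NUM_GPUS_PER_WORKER = 4
MP_SIZE = 1
BATCH_SIZE = 4        # --batch-size, ASSUMED to be per GPU
TRAIN_ITERS = 300     # --train-iters
WARMUP_FRAC = 0.02    # --warmup, ASSUMED to be a fraction of train-iters
PEAK_LR = 1e-4        # --lr


def global_batch_size() -> int:
    # Data-parallel ranks = total GPUs / model-parallel size (no gradient accumulation assumed).
    dp_size = NUM_WORKERS * NUM_GPUS_PER_WORKER // MP_SIZE
    return BATCH_SIZE * dp_size


def lr_at(step: int) -> float:
    """Generic linear warmup to PEAK_LR, then cosine decay to 0 at TRAIN_ITERS."""
    warmup_iters = round(WARMUP_FRAC * TRAIN_ITERS)
    if step < warmup_iters:
        return PEAK_LR * step / warmup_iters
    progress = min(1.0, (step - warmup_iters) / max(1, TRAIN_ITERS - warmup_iters))
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("samples per step:", global_batch_size())
    print("samples over the run:", global_batch_size() * TRAIN_ITERS)
```

For example, lr_at(3) is half the peak (5e-5), lr_at(6) reaches the peak 1e-4, and lr_at(300) has decayed to 0.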