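Model: Llama2-chat-13B-Chinese-50W
01 Download and install the Llama2 model
Hugging Face is not directly reachable from mainland China, so a proxy is required. clash-for-linux is a convenient way to set one up.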
- Install git-lfs, which is needed to download large files
sudo apt-get install git-lfs
git lfs install
- Download the Llama2 model from Hugging Face
git clone https://huggingface.co/RicardoLee/Llama2-chat-13B-Chinese-50W
The clone takes a long time, so be patient. Some files may also fail to download, in which case fetch them manually:
wget --no-check-certificate https://huggingface.co/RicardoLee/Llama2-chat-13B-Chinese-50W/resolve/main/pytorch_model-00001-of-00003.bin
wget --no-check-certificate https://huggingface.co/RicardoLee/Llama2-chat-13B-Chinese-50W/resolve/main/pytorch_model-00002-of-00003.bin
wget --no-check-certificate https://huggingface.co/RicardoLee/Llama2-chat-13B-Chinese-50W/resolve/main/pytorch_model-00003-of-00003.bin
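To find out which shards actually need a manual download, a small check along the following lines may help. It is only a sketch: it assumes the clone sits in a directory named Llama2-chat-13B-Chinese-50W under the current directory, and it treats a shard as failed when the file is missing or is still a git-lfs pointer stub (a small text file beginning with "version https://git-lfs"). The helper name shards_to_redownload is made up for this example.

```python
from pathlib import Path

REPO_URL = "https://huggingface.co/RicardoLee/Llama2-chat-13B-Chinese-50W"
# The three weight shards named in the wget commands above.
SHARDS = [f"pytorch_model-{i:05d}-of-00003.bin" for i in range(1, 4)]
# git-lfs leaves a small text stub with this prefix when the real file was not fetched.
LFS_POINTER_PREFIX = b"version https://git-lfs"


def shards_to_redownload(model_dir):
    """Return download URLs for shards that are missing or are still LFS pointer stubs."""
    urls = []
    for name in SHARDS:
        path = Path(model_dir) / name
        is_stub = False
        if path.is_file():
            with path.open("rb") as fh:
                is_stub = fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
        if not path.is_file() or is_stub:
            urls.append(f"{REPO_URL}/resolve/main/{name}")
    return urls


# Assumed clone location; prints nothing when all shards look complete.
for url in shards_to_redownload("Llama2-chat-13B-Chinese-50W"):
    print(url)
```

Each printed URL can be passed to wget as shown above. The check does not verify file sizes or checksums, so a partially downloaded shard would still pass.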
Environment setup
Install the dependencies:
python -m pip install torch==2.0.0
python -m pip install transformers==4.30.0
python -m pip install sentencepiece==0.1.97
python -m pip install peft==0.10.0
python -m pip install gradio==3.50.0
python -m pip install bitsandbytes
python -m pip install accelerate
python -m pip install scipy
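After installing, it can be worth confirming that the environment matches the pinned versions. The sketch below uses the standard-library importlib.metadata for that. The pins are copied from the commands above; the function names and the rule that a local build suffix such as "+cu117" still counts as a match are assumptions made for this example.

```python
from importlib import metadata

# Versions pinned in the pip commands above.
PINNED = {
    "torch": "2.0.0",
    "transformers": "4.30.0",
    "sentencepiece": "0.1.97",
    "peft": "0.10.0",
    "gradio": "3.50.0",
}
# Installed without a version pin; only presence is checked.
UNPINNED = ["bitsandbytes", "accelerate", "scipy"]


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(lookup=installed_version):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for package, wanted in PINNED.items():
        found = lookup(package)
        # Assumption: a local build tag like "2.0.0+cu117" counts as 2.0.0.
        if found is None or found.split("+")[0] != wanted:
            problems.append(f"{package}: wanted {wanted}, found {found}")
    for package in UNPINNED:
        if lookup(package) is None:
            problems.append(f"{package}: not installed")
    return problems


for line in check_environment():
    print(line)
```

No output means every pinned package is at the expected version and the unpinned ones are present.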
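Also download gradio_demo.py from the Chinese-LLaMA-Alpaca repository to the server. Use the raw file URL; the github.com/.../blob/... address returns an HTML page rather than the script:
wget https://raw.githubusercontent.com/ymcui/Chinese-LLaMA-Alpaca/main/scripts/inference/gradio_demo.py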
mv gradio_demo.py ~/Workspace/Llama2/
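A quick sanity check on the downloaded script can save a confusing error at launch time, since a web page saved in place of the script is HTML and will not parse as Python. The sketch below uses the standard-library ast module only to test whether the file parses; it does not run the script. The path matches the mv command above, and is_python_source is a hypothetical helper name.

```python
import ast
from pathlib import Path


def is_python_source(text):
    """Return True if the text parses as Python, False otherwise (e.g. an HTML page)."""
    try:
        ast.parse(text)
    except (SyntaxError, ValueError):
        return False
    return True


# Location used by the mv command above.
script = Path.home() / "Workspace" / "Llama2" / "gradio_demo.py"
if script.is_file():
    ok = is_python_source(script.read_text(encoding="utf-8", errors="replace"))
    print("looks like Python" if ok else "not Python, download the raw file again")
```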
Deployment
- Check GPU status
nvidia-smi
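The same information can be read programmatically, for example to pick the GPU with the most free memory before launching. This sketch relies on nvidia-smi's --query-gpu and --format=csv,noheader,nounits options, which report memory in MiB; the function names are made up for this example, and free_gpu_memory only works on a machine where nvidia-smi is on the PATH.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, total, used' rows (MiB) into {gpu_index: free_mib}."""
    free = {}
    for row in csv_text.strip().splitlines():
        index, total, used = (int(field) for field in row.split(","))
        free[index] = total - used
    return free


def free_gpu_memory():
    """Run nvidia-smi and return free memory per GPU in MiB."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)
```

Calling free_gpu_memory() on the server returns a dictionary keyed by GPU index, and the index with the largest value is a reasonable candidate for the --gpus argument.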
- Launch from the command line
python gradio_demo.py --base_model Llama2-chat-13B-Chinese-50W --tokenizer_path Llama2-chat-13B-Chinese-50W --load_in_8bit --gpus 0
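The --load_in_8bit flag matters because of the model's size. As a rough back-of-the-envelope estimate, weight memory is the parameter count times the bytes per parameter. The sketch below assumes a nominal 13 billion parameters (taken from the "13B" in the model name) and counts weights only, so activations, the KV cache and framework overhead come on top of these figures.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 13e9  # assumption: nominal parameter count implied by "13B"

for bits in (16, 8):
    print(f"{bits}-bit weights: {weight_memory_gib(N_PARAMS, bits):.1f} GiB")
# prints:
# 16-bit weights: 24.2 GiB
# 8-bit weights: 12.1 GiB
```

Under these assumptions the half-precision weights alone would not fit on a 24 GB card, while the 8-bit weights leave room for inference overhead.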