Langchain-Chatchat Deployment in Practice

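LangChain-Chatchat (formerly Langchain-ChatGLM) is an open-source retrieval-augmented generation (RAG) knowledge-base project that can be deployed offline. It is built on large language models such as ChatGLM and application frameworks such as Langchain.

It is a question-answering application over a local knowledge base, built on the ideas behind langchain. The goal is a knowledge-base Q&A solution that works well for Chinese-language scenarios and open-source models, and that can run fully offline.

Inspired by GanymedeNil's project document.ai and the ChatGLM-6B Pull Request created by AlexZhangji, the project implements a local knowledge-base Q&A application whose entire pipeline can run on open-source models. The latest version uses FastChat to connect models such as Vicuna, Alpaca, LLaMA, Koala, and RWKV. Built on the langchain framework, it can be called through an API served by FastAPI or operated through a Streamlit-based WebUI.

Because the project supports open-source LLM and Embedding models, the whole stack can be deployed privately and offline using open-source models only. The project also supports calling the OpenAI GPT API, and support for more models and model APIs will continue to be added.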

I. AutoDL image deployment

codewithgpu image and installation steps

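II. Docker image deployment

All steps below assume Ubuntu.

Prepare the NVIDIA driver and toolkit

You do not need to install the CUDA Toolkit on the host system, but you do need the NVIDIA Driver and the NVIDIA Container Toolkit.

Check whether the NVIDIA Driver and NVIDIA Container Toolkit are installed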

nvidia-smi

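If the NVIDIA driver is installed, nvidia-smi prints the GPU status, including the driver version and the CUDA version. If the driver is not installed, the command fails, typically with a "command not found" error.
Another option is to use lspci to look up NVIDIA GPU information: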

lspci | grep -i nvidia

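If the system has an NVIDIA GPU, this command prints its PCI device entry, whether or not the driver is installed. If there is no output, no NVIDIA GPU is visible on the PCI bus, for example because the GPU is from another vendor. Since lspci only lists hardware, use nvidia-smi to confirm that the driver itself is installed.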
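The two checks above can be combined into one small script. This is a minimal sketch, not part of the original project: the function name check_nvidia_driver and the messages it prints are made up for illustration, and it assumes a POSIX shell.

```shell
# Hypothetical helper: report driver status using nvidia-smi first, then lspci.
check_nvidia_driver() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        # nvidia-smi exists and ran successfully, so the driver is usable.
        echo "driver: ok"
    elif command -v lspci >/dev/null 2>&1 && lspci | grep -qi nvidia; then
        # The hardware is visible on the PCI bus, but nvidia-smi is absent or failing.
        echo "driver: missing or not loaded (NVIDIA GPU present)"
    else
        echo "driver: missing (no NVIDIA GPU found, or lspci unavailable)"
    fi
}

check_nvidia_driver
```

Only the first result means the host is ready for the container steps that follow.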

Check whether the NVIDIA Container Toolkit is installed

You can use the following command:

docker volume ls -q -f driver=nvidia-docker | wc -l

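If the command prints a nonzero number, volumes created by the nvidia-docker volume driver exist. Because wc -l always prints a count, a missing installation shows up as 0 rather than as empty output. Note that this check targets the legacy nvidia-docker (v1) volume plugin, so it can print 0 even when the current NVIDIA Container Toolkit is installed. Checking that the nvidia-ctk command exists is a more direct test.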
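As a more direct check, you can test for the nvidia-ctk command that the toolkit installs and that the configuration steps later in this guide rely on. The sketch below assumes a toolkit release recent enough to ship nvidia-ctk; the function name check_container_toolkit is hypothetical.

```shell
# Hypothetical helper: report whether the nvidia-ctk CLI is on PATH.
check_container_toolkit() {
    if command -v nvidia-ctk >/dev/null 2>&1; then
        echo "toolkit: installed"
    else
        echo "toolkit: not found"
    fi
}

check_container_toolkit
```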

Install the NVIDIA Container Toolkit

On Ubuntu, install with Apt. For other systems, refer to the NVIDIA website.

Install the toolkit

1. Configure the production repository:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Optionally, configure the repository to use experimental packages:

sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

2. Update the packages list from the repository:

sudo apt-get update

3. Install the NVIDIA Container Toolkit packages:

sudo apt-get install -y nvidia-container-toolkit

Configure the toolkit

Configure Docker

Configure the container runtime by using the nvidia-ctk command:

sudo nvidia-ctk runtime configure --runtime=docker

The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host. The file is updated so that Docker can use the NVIDIA Container Runtime.

Restart the Docker daemon:

sudo systemctl restart docker
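To confirm that the configure step took effect, you can look for the nvidia runtime entry in daemon.json. The sketch below is a crude text match rather than a real JSON parse, and it assumes the entry is keyed "nvidia", as in NVIDIA's documented output for Docker. The function name has_nvidia_runtime is hypothetical.

```shell
# Hypothetical helper: succeed if the given daemon.json mentions a "nvidia" runtime.
# The body runs in a subshell (parentheses) so the variable stays local in any POSIX shell.
has_nvidia_runtime() (
    config="${1:-/etc/docker/daemon.json}"
    # Text match only; it does not validate the JSON structure.
    [ -r "$config" ] && grep -q '"nvidia"' "$config"
)

if has_nvidia_runtime; then
    echo "nvidia runtime is configured"
else
    echo "nvidia runtime not found in daemon.json"
fi
```

Pass a different path as the first argument to check another file, such as the rootless configuration.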

Rootless mode
To configure the container runtime for Docker running in Rootless mode, follow these steps:

Configure the container runtime by using the nvidia-ctk command:

nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json

Restart the Rootless Docker daemon:

systemctl --user restart docker

Configure /etc/nvidia-container-runtime/config.toml by using the sudo nvidia-ctk command:

sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place

Verify the installation

After installing and configuring the toolkit and installing the NVIDIA GPU driver, you can verify the installation by running a sample workload.

Run a sample CUDA container:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Your output should resemble the following output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10    Driver Version: 535.86.10    CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Running the sample workload may fail with an error such as:

Error response from daemon: Get “https://registry-1.docker.io/v2

This is a docker pull error. Most likely the image pull timed out.

Solution:
1. Check the daemon.json file:

cat /etc/docker/daemon.json

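2. Without a registry mirror configured, pulls can be very slow and therefore time out. You can add the USTC (University of Science and Technology of China) and Alibaba Cloud mirrors as a registry-mirrors entry, keeping any keys that daemon.json already contains: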

{
  "registry-mirrors": [
    "https://6kx4zyno.mirror.aliyuncs.com",
    "https://docker.mirrors.ustc.edu.cn"
  ]
}
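Overwriting daemon.json with only the mirror list would drop any keys already present, including the runtimes entry that nvidia-ctk runtime configure adds. The sketch below writes a merged file to a scratch location for review. It assumes the runtime entry has the shape NVIDIA documents for Docker; the scratch path is a hypothetical choice, and your existing file may hold other keys that should be preserved as well.

```shell
# Hypothetical scratch path: the live config is untouched until you copy the file over.
proposed="${TMPDIR:-/tmp}/daemon.json.proposed"

# Merged config: the two mirrors from this guide plus the NVIDIA runtime entry (assumed shape).
cat > "$proposed" <<'EOF'
{
    "registry-mirrors": [
        "https://6kx4zyno.mirror.aliyuncs.com",
        "https://docker.mirrors.ustc.edu.cn"
    ],
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
EOF

echo "Wrote $proposed; compare it with /etc/docker/daemon.json before replacing that file."
```

Writing to a scratch file first keeps a typo from leaving the Docker daemon unable to start.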

3. Restart the service

sudo systemctl daemon-reload
sudo systemctl restart docker

Deploy the Docker image

Docker image

The image is available from DockerHub, Tencent Cloud, and Alibaba Cloud registries:

docker run -d --gpus all -p 80:8501 isafetech/chatchat:0.2.10
docker run -d --gpus all -p 80:8501 ccr.ccs.tencentyun.com/chatchat/chatchat:0.2.10
docker run -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.10
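The three commands differ only in the registry prefix. Since the first start takes a while, it helps to give the container a name so its logs are easy to follow. The sketch below only defines two helper functions and starts nothing by itself. The container name chatchat and both function names are hypothetical; the image tag and port mapping are copied from the commands above.

```shell
# Hypothetical container name, chosen for illustration.
CHATCHAT_NAME="chatchat"
# DockerHub image from this guide; substitute one of the other registries if it is faster for you.
CHATCHAT_IMAGE="isafetech/chatchat:0.2.10"

start_chatchat() {
    # Detached, all GPUs, host port 80 mapped to container port 8501, as in the commands above.
    docker run -d --name "$CHATCHAT_NAME" --gpus all -p 80:8501 "$CHATCHAT_IMAGE"
}

follow_chatchat_logs() {
    # Stream the container output; Ctrl-C stops following, not the container.
    docker logs -f "$CHATCHAT_NAME"
}
```

Call start_chatchat once, then follow_chatchat_logs to watch the startup progress.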

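1. This image is 50.1 GB, uses v0.2.10, and is based on nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04.
2. This is the regular version, not the lightweight version.
3. It ships with and enables by default one Embedding model, bge-large-zh-v1.5, and one LLM, ChatGLM3-6B.
4. The image is intended for convenient one-click deployment. Make sure the NVIDIA driver is installed on your Linux distribution.
5. Note that you do not need the CUDA Toolkit on the host system, but you do need the NVIDIA Driver and the NVIDIA Container Toolkit. See the earlier steps.
6. The first pull and the first start both take some time. On first start, check the logs: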