视频演示
【xinference】(15):在compshare上,使用docker-compose运行xinference和chatgpt-web项目,配置成功!!!
1,安装docker方法:
#!/bin/shdistribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.listsudo apt-get update && sudo apt-get install -y nvidia-container-toolkit nvidia-docker2echo "install docker finish ."sudo curl -L "https://github.com/docker/compose/releases/download/v2.28.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod 755 /usr/local/bin/docker-compose echo "install docker-compose finish ."# 把当前用户加入到 docker 组;
sudo gpasswd -a $USER docker
# 更新docker组
newgrp docker
# 增加自动启动
sudo systemctl enable docker
sudo systemctl restart dockerecho "add docker user finish ."
2,然后就可以启动docker-compose了
version: '3.5'services:##################### 使用xinference部署大模型 ###################### docker 文档
# https://inference.readthedocs.io/zh-cn/latest/getting_started/using_docker_image.html#docker-image
# 使用qwen2大模型
# https://inference.readthedocs.io/zh-cn/latest/models/builtin/llm/qwen2-instruct.html
# 启动 7b模型
# xinference launch --model-engine vllm --model-name qwen2-instruct --size-in-billions 7 --model-format awq --quantization Int4xinf:restart: alwayscontainer_name: xinfimage: xprobe/xinference:latest# 使用 GPU 资源。deploy:resources:reservations:devices:- driver: "nvidia"count: "all"capabilities: ["gpu"]ports:- 9997:9997environment:- XINFERENCE_MODEL_SRC=modelscopevolumes:- ./xinf-cache/:/root/.cache# 命令启动 xinference entrypoint: xinference-local -H 0.0.0.0 --log-level debug## https://github.com/Chanzhaoyu/chatgpt-webapp:image: chenzhaoyu94/chatgpt-web container_name: appports:- 8188:3002environment:# choose oneOPENAI_API_KEY: sk-xxx# choose oneOPENAI_ACCESS_TOKEN: xxx# API interface address, optional, available when OPENAI_API_KEY is setOPENAI_API_BASE_URL: http://xinf:9997/v1# API model, optional, available when OPENAI_API_KEY is set, https://platform.openai.com/docs/models# gpt-4, gpt-4-turbo-preview, gpt-4-0125-preview, gpt-4-1106-preview, gpt-4-0314, gpt-4-0613, gpt-4-32k, gpt-4-32k-0314, gpt-4-32k-0613, gpt-3.5-turbo-16k, gpt-3.5-turbo-16k-0613, gpt-3.5-turbo, gpt-3.5-turbo-0301, gpt-3.5-turbo-0613, text-davinci-003, text-davinci-002, code-davinci-002OPENAI_API_MODEL: qwen2-instruct# reverse proxy, optionalAUTH_SECRET_KEY:# maximum number of requests per hour, optional, unlimited by defaultMAX_REQUEST_PER_HOUR: 0# timeout, unit milliseconds, optionalTIMEOUT_MS: 60000networks:default:name: xinf-network
然后需要登陆到xinf 启动模型:
xinference launch --model-engine vllm --model-name qwen2-instruct --size-in-billions 7 --model-format awq --quantization Int4
3,启动成功之后就访问了
效果还不错!