DeepSeek-R1 671B on ktransformers, with open-webui
I. Download the GGUF model
1. Create a directory
mkdir DeepSeek-R1-GGUF
2. Download DeepSeek-R1-Q4_K_M from ModelScope into that directory
https://www.modelscope.cn/models/unsloth/DeepSeek-R1-GGUF
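The Q4_K_M quantization is stored as several split `.gguf` files, so it is worth confirming that every shard arrived before pointing ktransformers at the folder. The sketch below is a hypothetical helper (`check_gguf_shards` is not part of ktransformers or ModelScope). It assumes llama.cpp-style split names of the form `<prefix>-00001-of-0000N.gguf` and only checks that each expected file exists and is non-empty, not that its contents are intact.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that all shards of a split GGUF model exist.
# Assumes llama.cpp-style names: <prefix>-00001-of-0000N.gguf
check_gguf_shards() {
  local dir="$1" first base prefix total name i missing=0
  # Use the first matching shard to learn the prefix and the shard count.
  first=$(ls "$dir"/*-of-*.gguf 2>/dev/null | head -n 1)
  if [ -z "$first" ]; then
    echo "no split .gguf files found in $dir" >&2
    return 1
  fi
  base=$(basename "$first")
  prefix=$(printf '%s' "$base" | sed -E 's/-[0-9]+-of-[0-9]+\.gguf$//')
  total=$(printf '%s' "$base" | sed -E 's/.*-of-([0-9]+)\.gguf$/\1/')
  # 10# forces base 10 so zero-padded counts like 00009 are not read as octal.
  for ((i = 1; i <= 10#$total; i++)); do
    name=$(printf '%s-%05d-of-%s.gguf' "$prefix" "$i" "$total")
    if [ ! -s "$dir/$name" ]; then
      echo "missing or empty: $name" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example: `check_gguf_shards /DeepSeek-R1-GGUF/DeepSeek-R1-Q4_K_M && echo "all shards present"`. To fetch only this quantization, a recent ModelScope CLI can probably restrict the download with something like `modelscope download --model unsloth/DeepSeek-R1-GGUF --include 'DeepSeek-R1-Q4_K_M/*' --local_dir DeepSeek-R1-GGUF`; treat the exact flags as an assumption and confirm them with `modelscope download --help`.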
3. Install the GPU driver and CUDA
wget https://developer.download.nvidia.com/compute/cuda/12.6.0/local_installers/cuda_12.6.0_560.28.03_linux.run
sudo sh cuda_12.6.0_560.28.03_linux.run
4. GPU: NVIDIA GeForce RTX 4090 (as reported by nvidia-smi)
NVIDIA-SMI 560.35.05
CUDA Version: 12.6
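The `CUDA Version` shown by `nvidia-smi` is the highest CUDA release the installed driver supports; the toolkit actually installed is what `nvcc --version` reports. If a setup script should check the driver side automatically, here is a minimal sketch with hypothetical helper names (`version_ge` relies on GNU `sort -V`):

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the driver's supported CUDA version.

# Print the "CUDA Version: X.Y" value from nvidia-smi output read on stdin.
cuda_version_from_smi() {
  sed -nE 's/.*CUDA Version: *([0-9]+\.[0-9]+).*/\1/p' | head -n 1
}

# Succeed if version $1 is greater than or equal to version $2.
# sort -V orders versions ascending, so the smaller one comes first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: `version_ge "$(nvidia-smi | cuda_version_from_smi)" 12.6 && echo "driver OK"`. Here 12.6 is simply the version used in this guide, not a documented ktransformers minimum.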
II. Install ktransformers
1. Install dependencies
sudo apt-get install git
2. Install the uv toolchain
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
which uv
which uvx
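`which` prints nothing for a missing command, which is easy to overlook in a longer setup script. A hypothetical `require_cmds` helper that fails loudly instead, using the POSIX `command -v` builtin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any of the given commands is not on PATH.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not found on PATH: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_cmds uv uvx git`. If uv is reported missing, open a new shell or re-run `source $HOME/.local/bin/env`.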
3. Get the source code (pinned to commit 94ab2de)
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git checkout 94ab2de
git rev-parse --short HEAD
4. Create a Python virtual environment
uv venv ./venv --python 3.11
Using CPython 3.11.11
Creating virtual environment at: ./venv
Activate with: source venv/bin/activate
III. Build ktransformers
sudo apt-get install build-essential cmake
source venv/bin/activate
uv pip install -r requirements-local_chat.txt
uv pip install setuptools wheel packaging
Using Python 3.11.11 environment at:
Resolved 3 packages in 454ms
Prepared 1 package in 133ms
warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.
If the cache and target directories are on different filesystems, hardlinking may not be supported.
If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning.
Installed 2 packages in 102ms
 + setuptools==75.8.0
 + wheel==0.45.1
Using Python 3.11.11 environment at:
Audited 1 package in 2ms
Set the build parallelism to the system's physical CPU core count (72 on this machine):
export MAX_JOBS=72
export CMAKE_BUILD_PARALLEL_LEVEL=72
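Instead of hard-coding the core count, it can be derived from `/proc/cpuinfo` by counting distinct (`physical id`, `core id`) pairs, so hyper-threading siblings are not counted twice. A sketch with a hypothetical `physical_cores` function. Some VMs and non-x86 kernels omit these fields, in which case the result is wrong (possibly 0) and `lscpu` is the better source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count physical cores (not hardware threads).
# Reads /proc/cpuinfo by default, or the file given as $1.
physical_cores() {
  awk -F': *' '
    /^physical id/ { phys = $2 }               # socket of the current processor block
    /^core id/     { seen[phys "-" $2] = 1 }   # one entry per (socket, core) pair
    END { n = 0; for (k in seen) n++; print n }
  ' "${1:-/proc/cpuinfo}"
}
```

Usage: `export MAX_JOBS=$(physical_cores)` followed by `export CMAKE_BUILD_PARALLEL_LEVEL=$MAX_JOBS`.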
uv pip install flash_attn --no-build-isolation
export UV_LINK_MODE=copy
uv pip install flash_attn --no-build-isolation
export USE_NUMA=1
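`USE_NUMA=1` is aimed at machines with more than one NUMA node, typically dual-socket servers. That reading is an assumption here, so check the ktransformers documentation for the commit you build. The node count is visible in sysfs; a hypothetical helper to decide whether to set it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count NUMA nodes exposed by the kernel in sysfs.
# An alternate root directory can be passed as $1 (useful for testing).
numa_nodes() {
  local root="${1:-/sys/devices/system/node}" d n=0
  for d in "$root"/node[0-9]*; do
    # If the glob matches nothing, d is the literal pattern and -d fails.
    if [ -d "$d" ]; then n=$((n + 1)); fi
  done
  echo "$n"
}
```

Usage: `if [ "$(numa_nodes)" -gt 1 ]; then export USE_NUMA=1; fi`.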
git submodule init
git submodule update
KTRANSFORMERS_FORCE_BUILD=TRUE uv pip install . --no-build-isolation
IV. Run ktransformers
Run this from the root of the ktransformers source tree with the venv activated; --gguf_path must point at the directory that holds the DeepSeek-R1-Q4_K_M .gguf files.
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python3 ktransformers/server/main.py \
  --gguf_path /DeepSeek-R1-GGUF/DeepSeek-R1-Q4_K_M/ \
  --model_path deepseek-ai/DeepSeek-R1 \
  --model_name unsloth/DeepSeek-R1-GGUF \
  --cpu_infer 16 \
  --max_new_tokens 8192 \
  --cache_lens 32768 \
  --total_context 32768 \
  --cache_q4 true \
  --temperature 0.6 \
  --top_p 0.95 \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat.yaml \
  --force_think \
  --use_cuda_graph \
  --host 0.0.0.0 \
  --port 8080
1. Run in the background
nohup env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python3 ktransformers/server/main.py \
  --gguf_path /DeepSeek-R1-GGUF/DeepSeek-R1-Q4_K_M/ \
  --model_path deepseek-ai/DeepSeek-R1 \
  --model_name unsloth/DeepSeek-R1-GGUF \
  --cpu_infer 16 \
  --max_new_tokens 8192 \
  --cache_lens 32768 \
  --total_context 32768 \
  --cache_q4 true \
  --temperature 0.6 \
  --top_p 0.95 \
  --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat.yaml \
  --force_think \
  --use_cuda_graph \
  --host 0.0.0.0 \
  --port 8080 >> server.log 2>&1 &
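Loading the 671B weights takes a while, so the API is not reachable as soon as `nohup` returns. A sketch of a hypothetical `wait_for_port` helper that polls the port through bash's built-in `/dev/tcp` redirection (bash-only; the 5-second interval and 1800-second default timeout are arbitrary choices):

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until host:port accepts TCP connections.
# Usage: wait_for_port HOST PORT [TIMEOUT_SECONDS]
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-1800}" waited=0
  # Opening /dev/tcp/HOST/PORT in a subshell attempts a TCP connect;
  # the subshell exits non-zero while nothing is listening.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for $host:$port" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

Usage: `wait_for_port 127.0.0.1 8080 && tail -n 20 server.log` before sending the first API request.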
2. Test the API (replace IP with the server's address)
curl http://IP:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "DeepSeek-R1", "messages": [{"role": "user", "content": "Who are you?"}]}'
V. open-webui integration
1. Install open-webui
mkdir open-webui
cd open-webui
uv venv ./venv --python 3.11
source venv/bin/activate
uv pip install open-webui
2. Launch script go.sh (save it in the open-webui directory and start it with bash go.sh)
#!/usr/bin/env bash
source venv/bin/activate
export DATA_DIR="$(pwd)/data"
export ENABLE_OLLAMA_API=False
export ENABLE_OPENAI_API=True
export OPENAI_API_KEY="dont_change_this_cuz_openai_is_the_mcdonalds_of_ai"
export OPENAI_API_BASE_URL="http://IP:8080/v1"
export WEBUI_AUTH=False
export DEFAULT_USER_ROLE="admin"
export HOST=0.0.0.0
export PORT=3000
open-webui serve \
  --host "$HOST" \
  --port "$PORT"
3. Startup is slow. Once open-webui is listening on port 3000, the web UI is available at http://IP:3000. Check the port with:
netstat -netlp
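To script the same check instead of reading the table by eye, the listening sockets can be filtered by port. A hypothetical `port_in_listing` helper; it assumes the local address is the fourth column, which holds for both `netstat -netlp` and `ss -ltn` output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if TCP port $1 appears as a local listening
# port in netstat/ss output read from stdin.
port_in_listing() {
  awk -v p="$1" '
    # Column 4 is the local address, e.g. 0.0.0.0:3000, :::3000 or [::]:3000;
    # the port is whatever follows the last colon.
    { n = split($4, a, ":"); if (a[n] == p) found = 1 }
    END { exit !found }
  '
}
```

Usage: `netstat -netlp | port_in_listing 3000 && echo "open-webui is listening on 3000"`. Where net-tools is not installed, `ss -ltn | port_in_listing 3000` works the same way.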