Reference:
https://github.com/sgl-project/sglang
Written in pure Python, and claimed to be faster than vLLM and TensorRT
Currently supported models
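Installation
SGLang can be installed via pip, from source, or with Docker; pip is used here.
Note: install the latest version of flashinfer; otherwise you may hit the error ImportError: cannot import name 'top_k_top_p_sampling_from_probs' from 'flashinfer.sampling'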
pip install --upgrade pip
pip install "sglang[all]"
# Install FlashInfer CUDA kernels
pip install -U flashinfer
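
After installing, a quick way to check whether the installed flashinfer exposes the symbol named in the ImportError above is a small import probe. This is only a sketch: the helper name has_symbol is made up for illustration, and it uses nothing beyond the standard library's importlib. It assumes that a missing module or a missing attribute is the only failure mode; other errors (for example a CUDA library that fails to load) are not handled.

```python
import importlib


def has_symbol(module_name: str, attr: str) -> bool:
    """Return True if module_name imports cleanly and exposes attr.

    has_symbol is a hypothetical helper for this note, not part of
    sglang or flashinfer.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module (or one of its parent packages) is not installed.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Module and symbol names are taken from the error message above.
    ok = has_symbol("flashinfer.sampling", "top_k_top_p_sampling_from_probs")
    print("flashinfer sampling symbol available:", ok)
```

If this prints False, upgrading flashinfer as shown above is the first thing to try.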