Xinference离线安装

原创已于 2025-11-21 23:54:34 修改 · 1.4k 阅读

10 ·

CC 4.0 BY-SA版权

文章标签：

#python #语言模型

于 2025-03-05 16:40:25 首次发布

部署运行你感兴趣的模型镜像

文章目录

1 官网
2 创建虚拟环境
3 安装
4 打包
5 在服务器上安装

1 官网

安装 — Xinference

2 创建虚拟环境

conda create -n xinference python=3.10 -y

3 安装

进入虚拟环境
```
conda activate xinference
```
安装xinference和Transformers引擎
```
pip install "xinference[transformers]"
```

安装xinference和vLLM引擎

pip install "xinference[vllm]"

# FlashInfer is optional but required for specific functionalities such as sliding window attention with Gemma 2.
# For CUDA 12.4 & torch 2.4 to support sliding window attention for gemma 2 and llama 3.1 style rope
pip install flashinfer -i https://flashinfer.ai/whl/cu124/torch2.4
# For other CUDA & torch versions, please check https://docs.flashinfer.ai/installation.html

安装xinference和SGLang引擎

pip install "xinference[sglang]"

# For CUDA 12.4 & torch 2.4 to support sliding window attention for gemma 2 and llama 3.1 style rope
pip install flashinfer -i https://flashinfer.ai/whl/cu124/torch2.4

升级transformers

pip install git+https://github.com/huggingface/transformers

4 打包

# 确保在虚拟环境xinference
# conda activate xinference

# 当前虚拟环境下安装打包工具
conda install -c conda-forge conda-pack
# 打包
conda pack -n xinference -o xinference.tar.gz

5 在服务器上安装

# 1 进入服务器的虚拟环境根目录envs
cd ~/miniconda/envs

# 2 将xinference.tar.gz到envs下，解压
tar -zxvf xinference.gz

# 3 验证
conda env list
conda activate xinference
python

您可能感兴趣的与本文相关的镜像