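LMDeploy is a Python library for compressing, deploying, and serving large language models (LLMs) and vision-language models (VLMs). Its core inference engines are the TurboMind engine and the PyTorch engine. The former is written in C++ and CUDA and focuses on inference performance, while the latter is written in pure Python to lower the barrier for developers.
LMDeploy supports deploying LLMs and VLMs on Linux and Windows and requires CUDA 11.3 or later. It is compatible with the following NVIDIA GPUs: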
Volta (sm70): V100
Turing (sm75): 20 series, T4
Ampere (sm80, sm86): 30 series, A10, A16, A30, A100
Ada Lovelace (sm89): 40 series
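To see where a given card falls in the list above, the sm values can be put into a small lookup table. This is only a sketch: the table copies the list above, the function names are made up for illustration, and the second helper assumes a CUDA-enabled torch build is installed.

```python
# Architecture names and sm values copied from the compatibility list above.
SUPPORTED_ARCHS = {
    (7, 0): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (8, 9): "Ada Lovelace",
}


def arch_for_capability(major, minor):
    """Return the architecture name for a compute capability, or None if unlisted."""
    return SUPPORTED_ARCHS.get((major, minor))


def describe_local_gpu(index=0):
    """Describe the local GPU; this part needs a CUDA-enabled torch build."""
    import torch  # imported lazily so the lookup table works without torch

    major, minor = torch.cuda.get_device_capability(index)
    name = torch.cuda.get_device_name(index)
    arch = arch_for_capability(major, minor) or "not in the list above"
    return f"{name}: sm{major}{minor} ({arch})"
```

A card that is missing from the table is not necessarily unsupported; the table only mirrors the list quoted in this post.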
LMDeploy optimizes GPU memory usage better than vLLM.
nvitop  # show GPU memory usage
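If you want the same numbers from a script, for example to compare memory usage before and after starting a server, one option is to call nvidia-smi and parse its CSV output. This is a sketch that swaps in nvidia-smi for nvitop; the function names are made up for illustration, and it assumes nvidia-smi is on the PATH.

```python
import subprocess


def parse_memory_csv(text):
    """Parse lines of 'used, total' (MiB), one GPU per line, into integer pairs."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory of every GPU (values in MiB)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)
```

The parser expects plain integers, so it will raise a ValueError on a GPU that reports a non-numeric value.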
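I. Installation
Install lmdeploy in a clean conda environment (Python 3.8 - 3.12).
**Python 3.12 is currently not recommended on Linux.** Strangely, the same install does not fail on Windows. However, the torch build that gets installed on Windows does not include CUDA support, so startup fails there; the error message is shown further below.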
conda create -n lmdeploy python=3.12 -y
conda activate lmdeploy
pip install lmdeploy
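II. Error 1
Running pip install lmdeploy fails while preparing fire-0.7.0.tar.gz. This is a compatibility problem: this version of fire does not install cleanly under Python 3.12, and the log shows that setup.py cannot run because setuptools is not available in the build environment.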
(lmdeploy) root@dsw-942822-5c5dcbf687-85ktw:/mnt/workspace/Anaconda3/envs# pip install lmdeploy
Collecting lmdeploy
Downloading lmdeploy-0.7.2.post1-cp312-cp312-manylinux2014_x86_64.whl.metadata (17 kB)
Collecting accelerate>=0.29.3 (from lmdeploy)
Downloading accelerate-1.5.2-py3-none-any.whl.metadata (19 kB)
Collecting einops (from lmdeploy)
Downloading einops-0.8.1-py3-none-any.whl.metadata (13 kB)
Collecting fastapi (from lmdeploy)
Downloading fastapi-0.115.11-py3-none-any.whl.metadata (27 kB)
Collecting fire (from lmdeploy)
Downloading fire-0.7.0.tar.gz (87 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [3 lines of output]
/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.12/site-packages/_distutils_hack/__init__.py:53: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
warnings.warn(
ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Python 3.8 - 3.11 is recommended:
conda create -n lmdeploy python=3.11 -y
conda activate lmdeploy
pip install lmdeploy
The installation now completes without errors.
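Before running pip, it can help to confirm that the active interpreter falls inside the recommended range. A minimal sketch, assuming the 3.8 - 3.11 range given above; the function and constant names are made up for illustration.

```python
import sys

# Range recommended in this post: Python 3.8 through 3.11.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_recommended_python(version=None):
    """Return True if (major, minor) lies within the recommended range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if not is_recommended_python():
    print("Warning: Python %d.%d is outside the recommended 3.8 - 3.11 range" % sys.version_info[:2])
```

Run it inside the activated conda environment so that it reports the interpreter pip will actually install into.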
III. Starting the server
On Linux, pass the absolute path of the downloaded model:
lmdeploy serve api_server /mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct
IV. Error 2
The following error occurs during startup:
(lmdeploy) root@dsw-942822-5c5dcbf687-85ktw:/mnt/workspace/Anaconda3/envs# lmdeploy serve api_server /mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct
Traceback (most recent call last):
File "/mnt/workspace/Anaconda3/envs/lmdeploy/bin/lmdeploy", line 8, in <module>
sys.exit(run())
^^^^^
File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/entrypoint.py", line 14, in run
SubCliServe.add_parsers()
File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/serve.py", line 361, in add_parsers
SubCliServe.add_parser_api_server()
File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/serve.py", line 142, in add_parser_api_server
ArgumentHelper.tool_call_parser(parser_group)
File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/utils.py", line 375, in tool_call_parser
from lmdeploy.serve.openai.tool_parser import ToolParserManager
File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/openai/tool_parser/__init__.py", line 2, in <module>
from .internlm2_parser import Internlm2ToolParser
File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/openai/tool_parser/internlm2_parser.py", line 6, in <module>
import partial_json_parser
ModuleNotFoundError: No module named 'partial_json_parser'
Cause: the partial_json_parser module is missing. It is one of lmdeploy's dependencies, but it may not have been installed automatically. To fix it:
1. Install the missing dependency
pip install partial-json-parser
2. Re-run the lmdeploy serve command
lmdeploy serve api_server /mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct
This time the server starts without errors.
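To confirm from a script that the server is really accepting requests, you can poll it until it answers. This is a sketch under two assumptions: the server listens on port 23333 (the port used by the test code in this post), and it exposes the OpenAI-style /v1/models endpoint. The function name is made up for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:23333/v1/models", timeout=60.0, interval=2.0):
    """Poll the URL until it returns HTTP 200 or the timeout (seconds) expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Loading a model can take a while, so a generous timeout is usually safer than a short one.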
If the openai package is not installed yet, install it:
pip install openai
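Both problems above come down to a package that is not importable, so a quick check before starting the server can save a round trip. A minimal sketch using only the standard library; the function name and the REQUIRED list are made up for illustration, and the list covers only the packages mentioned in this post.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names, not pip names: partial-json-parser is imported as partial_json_parser.
REQUIRED = ["lmdeploy", "partial_json_parser", "openai"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```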
V. Code test (on Linux)
# Multi-turn chat
from openai import OpenAI


# Run a multi-turn chat session against the local server
def run_chat_session():
    # Initialize the client
    client = OpenAI(base_url="http://localhost:23333/v1/", api_key="123456")
    # Initialize the conversation history
    chat_history = []
    # Start the chat loop
    while True:
        # Read the user's input
        user_input = input("User: ")
        if user_input.lower() == "exit":
            print("Exiting the chat.")
            break
        # Update the history with the user's message
        chat_history.append({"role": "user", "content": user_input})
        # Ask the model for a reply
        try:
            chat_completion = client.chat.completions.create(messages=chat_history, model="/mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct")
            # Take the latest reply
            model_response = chat_completion.choices[0]
            print("AI:", model_response.message.content)
            # Update the history with the model's reply
            chat_history.append({"role": "assistant", "content": model_response.message.content})
        except Exception as e:
            print("An error occurred:", e)
            break


if __name__ == '__main__':
    run_chat_session()
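The script above hardcodes the model path as the model name, which breaks as soon as the model lives somewhere else. One way around this is to ask the server which model it is serving. This is a sketch: it assumes the server implements the OpenAI-style model listing that the openai client reaches through client.models.list(), and the helper name is made up for illustration.

```python
def first_model_id(client):
    """Return the id of the first model reported by the server.

    `client` is expected to be an openai.OpenAI instance pointed at the
    local server, created the same way as in the script above.
    """
    models = client.models.list().data
    if not models:
        raise RuntimeError("the server reported no models")
    return models[0].id
```

In the chat loop, the hardcoded path could then be replaced with model=first_model_id(client), looked up once before the loop starts.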
VI. On Windows, the installed torch build does not include CUDA support, so startup fails with the error below
(lmdeploy) PS C:\Users\fengxinzi> lmdeploy serve api_server "D:\Program Files\python\PycharmProjects\AiStudyProject\demo06\models\Qwen\Qwen2___5-0___5B-Instruct"
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "D:\envs\lmdeploy\Scripts\lmdeploy.exe\__main__.py", line 7, in <module>
File "D:\envs\lmdeploy\Lib\site-packages\lmdeploy\cli\entrypoint.py", line 39, in run
args.run(args)
File "D:\envs\lmdeploy\Lib\site-packages\lmdeploy\cli\serve.py", line 283, in api_server
else get_max_batch_size(args.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\envs\lmdeploy\Lib\site-packages\lmdeploy\utils.py", line 338, in get_max_batch_size
device_name = torch.cuda.get_device_name(0).lower()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\envs\lmdeploy\Lib\site-packages\torch\cuda\__init__.py", line 493, in get_device_name
return get_device_properties(device).name
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\envs\lmdeploy\Lib\site-packages\torch\cuda\__init__.py", line 523, in get_device_properties
_lazy_init() # will define _get_device_properties
^^^^^^^^^^^^
File "D:\envs\lmdeploy\Lib\site-packages\torch\cuda\__init__.py", line 310, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
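The output of conda list also shows that no CUDA build is present.
The error indicates that PyTorch was installed without CUDA support.
We therefore need to install a CUDA-enabled build of torch, targeting CUDA 11.8 or later.
# Choose one of the following two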
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
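Before restarting the server, it is worth checking that the reinstalled torch really has CUDA enabled. A minimal sketch using standard torch attributes; the function name is made up for illustration.

```python
def cuda_status():
    """Summarize whether the installed torch build can use CUDA."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.version.cuda is None:
        # This is the situation behind "Torch not compiled with CUDA enabled".
        return f"torch {torch.__version__} was not built with CUDA"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__} was built for CUDA {torch.version.cuda}, but no usable GPU was found"
    return f"torch {torch.__version__} with CUDA {torch.version.cuda}: {torch.cuda.get_device_name(0)}"


print(cuda_status())
```

If the first or second message appears, the server will fail again with the same AssertionError.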
After the installation finishes, start the server again; it no longer reports an error.
lmdeploy serve api_server "D:\Program Files\python\PycharmProjects\AiStudyProject\demo06\models\Qwen\Qwen2___5-0___5B-Instruct"
With that, you can connect to the server from code and everything runs.