[Notes] LMDeploy LLM inference: study notes and problems I ran into

努力努力再努力呐

Last modified 2025-03-27 12:13:04


Column: LMDeploy · Tags: study notes, LMDeploy

First published 2025-03-22 02:00:00

Article link: https://blog.youkuaiyun.com/lbp0123456/article/details/146428530


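LMDeploy is a Python library for compressing, deploying, and serving large language models (LLMs) and vision-language models (VLMs). It has two core inference engines: TurboMind and PyTorch. TurboMind is written in C++ and CUDA and focuses on inference performance. The PyTorch engine is written in pure Python to lower the barrier for developers.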

LMDeploy can deploy LLMs and VLMs on Linux and Windows and requires CUDA 11.3 or later. It is compatible with the following NVIDIA GPUs:

- Volta (sm70): V100
- Turing (sm75): 20 series, T4
- Ampere (sm80, sm86): 30 series, A10, A16, A30, A100
- Ada Lovelace (sm89): 40 series
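The sm numbers above are CUDA compute capabilities. The sketch below maps a capability tuple to its architecture name and reports whether it is one of the versions listed. The helper `describe_gpu` and its table are hypothetical and built only from the list above. `torch.cuda.get_device_capability` is used only if PyTorch with CUDA happens to be installed. An architecture reported as "unknown" is simply not in the list; that alone does not mean LMDeploy rejects it.

```python
# Hypothetical table built only from the compatibility list in this note:
# (major, minor) compute capability -> architecture name.
ARCH_BY_CAPABILITY = {
    (7, 0): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (8, 9): "Ada Lovelace",
}


def describe_gpu(capability):
    """Return (sm_tag, architecture, listed) for a (major, minor) tuple.

    `listed` is True only for the exact sm versions named in the list above.
    """
    major, minor = capability
    sm_tag = f"sm{major}{minor}"
    arch = ARCH_BY_CAPABILITY.get((major, minor), "unknown")
    return sm_tag, arch, (major, minor) in ARCH_BY_CAPABILITY


if __name__ == "__main__":
    try:
        import torch  # optional: only needed to query a real GPU
        if torch.cuda.is_available():
            print(describe_gpu(torch.cuda.get_device_capability(0)))
        else:
            print("No CUDA device visible to torch")
    except ImportError:
        print("torch not installed; example:", describe_gpu((8, 6)))
```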

LMDeploy's GPU memory (VRAM) optimization is better than vLLM's.

nvitop  # view GPU memory usage
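nvitop is a third-party tool installed with `pip install nvitop`. If you need the numbers in a script instead of an interactive view, you can call `nvidia-smi` with its CSV query options and parse the output. The sketch below does that. `parse_gpu_memory` is a hypothetical helper, and memory values are in MiB because of `nounits`. The name field is queried last so that any commas inside a GPU name stay intact.

```python
import subprocess

# nvidia-smi CSV query; "nounits" makes memory values plain MiB integers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total,name",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV lines into dicts with used/total memory in MiB."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split at most 3 times so a GPU name containing commas stays whole.
        index, used, total, name = [f.strip() for f in line.split(",", 3)]
        gpus.append({"index": int(index), "name": name,
                     "used_mib": int(used), "total_mib": int(total)})
    return gpus


if __name__ == "__main__":
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for gpu in parse_gpu_memory(out):
            print(f"GPU {gpu['index']} {gpu['name']}: "
                  f"{gpu['used_mib']}/{gpu['total_mib']} MiB")
    except (OSError, subprocess.CalledProcessError):
        print("nvidia-smi not available")
```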

Install lmdeploy in a clean conda environment (Python 3.8 - 3.12).

I. Installation

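**Python 3.12 is currently not recommended on Linux.** Puzzlingly, Windows with 3.12 does not raise the same error. However, the torch installed on Windows is a CPU-only build without CUDA, so starting the server fails there instead (the error is shown further below).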

conda create -n lmdeploy python=3.12 -y
conda activate lmdeploy
pip install lmdeploy

II. Error 1

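During `pip install lmdeploy`, preparing fire-0.7.0.tar.gz fails. This is a compatibility problem: this version of fire does not install under Python 3.12. The log itself reports that setuptools is not available in the build environment.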

(lmdeploy) root@dsw-942822-5c5dcbf687-85ktw:/mnt/workspace/Anaconda3/envs# pip install lmdeploy
Collecting lmdeploy
  Downloading lmdeploy-0.7.2.post1-cp312-cp312-manylinux2014_x86_64.whl.metadata (17 kB)
Collecting accelerate>=0.29.3 (from lmdeploy)
  Downloading accelerate-1.5.2-py3-none-any.whl.metadata (19 kB)
Collecting einops (from lmdeploy)
  Downloading einops-0.8.1-py3-none-any.whl.metadata (13 kB)
Collecting fastapi (from lmdeploy)
  Downloading fastapi-0.115.11-py3-none-any.whl.metadata (27 kB)
Collecting fire (from lmdeploy)
  Downloading fire-0.7.0.tar.gz (87 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [3 lines of output]
      /mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.12/site-packages/_distutils_hack/__init__.py:53: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
        warnings.warn(
      ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Python 3.8 - 3.11 is recommended:

conda create -n lmdeploy python=3.11 -y
conda activate lmdeploy
pip install lmdeploy
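Since the failure depended on the interpreter version, a small pre-flight check can catch it before running `pip install lmdeploy`. The sketch below encodes the 3.8 - 3.11 range recommended in this note. That range is this note's recommendation, not an official lmdeploy constraint, and `python_version_ok` is a hypothetical name.

```python
import sys

# Range recommended in this note for Linux (3.12 failed while preparing fire 0.7.0).
RECOMMENDED_MIN = (3, 8)
RECOMMENDED_MAX = (3, 11)


def python_version_ok(version_info=sys.version_info):
    """Return True if (major, minor) lies within the recommended range."""
    major_minor = tuple(version_info[:2])
    return RECOMMENDED_MIN <= major_minor <= RECOMMENDED_MAX


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if python_version_ok():
        print(f"Python {current}: inside the recommended range")
    else:
        print(f"Python {current}: outside 3.8 - 3.11, consider a new conda env")
```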

No more errors.

III. Starting the server

On Linux, pass the absolute path of the downloaded model:

lmdeploy serve api_server /mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct
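Before launching, it can help to confirm that the path really points to a downloaded model. The sketch below assumes the model uses the usual Hugging Face layout, with a `config.json` at its top level. This holds for the Qwen2.5 repositories but is an assumption for other sources. `check_model_dir` is hypothetical, and its absolute-path rule simply mirrors the advice above.

```python
import os


def check_model_dir(path):
    """Return a list of problems with a local model directory (empty if none)."""
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if not os.path.isdir(path):
        problems.append("directory does not exist")
    elif not os.path.isfile(os.path.join(path, "config.json")):
        # Assumption: Hugging Face-style model folders ship a top-level config.json.
        problems.append("config.json not found (not a Hugging Face-style model folder?)")
    return problems


if __name__ == "__main__":
    model_path = "/mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct"
    issues = check_model_dir(model_path)
    print("OK" if not issues else "; ".join(issues))
```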

IV. Error 2

Starting the server fails with the following error:

(lmdeploy) root@dsw-942822-5c5dcbf687-85ktw:/mnt/workspace/Anaconda3/envs# lmdeploy serve api_server /mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct
Traceback (most recent call last):
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/entrypoint.py", line 14, in run
    SubCliServe.add_parsers()
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/serve.py", line 361, in add_parsers
    SubCliServe.add_parser_api_server()
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/serve.py", line 142, in add_parser_api_server
    ArgumentHelper.tool_call_parser(parser_group)
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/utils.py", line 375, in tool_call_parser
    from lmdeploy.serve.openai.tool_parser import ToolParserManager
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/openai/tool_parser/__init__.py", line 2, in <module>
    from .internlm2_parser import Internlm2ToolParser
  File "/mnt/workspace/Anaconda3/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/openai/tool_parser/internlm2_parser.py", line 6, in <module>
    import partial_json_parser
ModuleNotFoundError: No module named 'partial_json_parser'

Cause: the partial_json_parser module is missing. It is one of lmdeploy's dependencies, but it may not have been installed automatically. The fix:

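1. Install the missing dependency (the pip package is partial-json-parser; it is imported as partial_json_parser)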

pip install partial-json-parser

2. Rerun the `lmdeploy serve` command

lmdeploy serve api_server /mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct

This time the server starts without errors.
If the openai package is not installed yet, remember to install it:

pip install openai
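`partial_json_parser` was missing even though lmdeploy depends on it, so it can be worth checking dependencies up front. The sketch below uses `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. The `REQUIRED` mapping (import name to pip package name) is a hypothetical list. It covers only the two packages installed by hand in this note.

```python
import importlib.util

# Hypothetical list: import name -> pip package name, for the packages
# that had to be installed manually in this note.
REQUIRED = {
    "partial_json_parser": "partial-json-parser",
    "openai": "openai",
}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose import name cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All listed dependencies are importable")
```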

V. Testing with code (Linux)

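# Multi-turn chat
from openai import OpenAI


# Multi-turn chat loop against the local lmdeploy api_server
def run_chat_session():
    # Initialize the client (lmdeploy api_server listens on port 23333 by default).
    # Any key string works unless the server enforces API keys.
    client = OpenAI(base_url="http://localhost:23333/v1/", api_key="123456")
    # Initialize the chat history
    chat_history = []
    # Start the chat loop
    while True:
        # Read user input; typing "exit" ends the session
        user_input = input("User: ")
        if user_input.lower() == "exit":
            print("Exiting chat.")
            break
        # Update the chat history (add the user's message)
        chat_history.append({"role": "user", "content": user_input})
        # Call the model; the model name is the path passed to `lmdeploy serve api_server`
        try:
            chat_completion = client.chat.completions.create(
                messages=chat_history,
                model="/mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct",
            )
            # Get the latest reply
            model_response = chat_completion.choices[0]
            print("AI:", model_response.message.content)
            # Update the chat history (add the model's reply)
            chat_history.append({"role": "assistant", "content": model_response.message.content})
        except Exception as e:
            print("Error:", e)
            break


if __name__ == '__main__':
    run_chat_session()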
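The `client.chat.completions.create` call is an HTTP POST to the OpenAI-compatible endpoint served by `lmdeploy serve api_server`. The stdlib-only sketch below builds (without sending) the equivalent request, to show what actually goes over the wire. `build_chat_request` is a hypothetical helper, and the body carries only the `model` and `messages` fields used above. The whole `chat_history` list is sent on every turn. Each request is self-contained, which is why the script appends every reply to the history itself.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, api_key="123456"):
    """Build (but do not send) a POST to an OpenAI-compatible /chat/completions endpoint."""
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    history = [{"role": "user", "content": "Hello"}]
    req = build_chat_request("http://localhost:23333/v1/",
                             "/mnt/workspace/llm/Qwen/Qwen2.5-0.5B-Instruct", history)
    print(req.get_method(), req.full_url)
    # With the server running, the request could be sent like this:
    # with urllib.request.urlopen(req) as resp:
    #     reply = json.load(resp)["choices"][0]["message"]["content"]
```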

VI. On Windows, the installed torch is a CPU-only build without CUDA, so the server fails at startup with the error below

(lmdeploy) PS C:\Users\fengxinzi> lmdeploy serve api_server "D:\Program Files\python\PycharmProjects\AiStudyProject\demo06\models\Qwen\Qwen2___5-0___5B-Instruct"
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\envs\lmdeploy\Scripts\lmdeploy.exe\__main__.py", line 7, in <module>
  File "D:\envs\lmdeploy\Lib\site-packages\lmdeploy\cli\entrypoint.py", line 39, in run
    args.run(args)
  File "D:\envs\lmdeploy\Lib\site-packages\lmdeploy\cli\serve.py", line 283, in api_server
    else get_max_batch_size(args.device)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\envs\lmdeploy\Lib\site-packages\lmdeploy\utils.py", line 338, in get_max_batch_size
    device_name = torch.cuda.get_device_name(0).lower()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\envs\lmdeploy\Lib\site-packages\torch\cuda\__init__.py", line 493, in get_device_name
    return get_device_properties(device).name
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\envs\lmdeploy\Lib\site-packages\torch\cuda\__init__.py", line 523, in get_device_properties
    _lazy_init()  # will define _get_device_properties
    ^^^^^^^^^^^^
  File "D:\envs\lmdeploy\Lib\site-packages\torch\cuda\__init__.py", line 310, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

The output of `conda list` also shows that no CUDA build is installed.

The error shows that the installed PyTorch was built without CUDA support.
We therefore need a CUDA-enabled PyTorch build, CUDA 11.8 or later:

# Pick one of the following
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
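Before starting the server again, a quick check can confirm that the reinstall worked. `torch.version.cuda` is `None` for CPU-only builds. `torch.cuda.is_available()` must return `True` for the `get_device_name` call in the traceback above to succeed. `torch_cuda_report` is a hypothetical helper name.

```python
def torch_cuda_report():
    """Summarize whether the installed PyTorch build can use CUDA."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        # Wheels from the cu118/cu121 indexes carry a "+cu118"/"+cu121" suffix here.
        "torch_version": torch.__version__,
        # None for CPU-only builds, e.g. "12.1" for a cu121 build.
        "built_with_cuda": torch.version.cuda,
        # Must be True before lmdeploy can query the GPU name.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_cuda_report())
```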

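After installing, start the server again; this time there is no error. If pip instead reports "Requirement already satisfied" and keeps the CPU-only build, run pip uninstall torch torchvision torchaudio first, then repeat the install command.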

lmdeploy serve api_server "D:\Program Files\python\PycharmProjects\AiStudyProject\demo06\models\Qwen\Qwen2___5-0___5B-Instruct"

With that, you can connect to the server from code and run it.