Background:
While deploying Tongyi Qianwen 2.5-VL-32B-Instruct (Qwen/Qwen2.5-VL-32B-Instruct) locally, I ran into a series of problems, which are recorded here.
For model usage and fine-tuning, see: Qwen2.5-VL-32B: Smarter and Lighter!
Problems
1. ValueError: 'qwen2_5_vl'
The following error occurred:
ValueError: The checkpoint you are trying to load has model type qwen2_5_vl but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
This error occurs because the installed Transformers library is too old and does not yet support the qwen2_5_vl architecture. Upgrade Transformers to the latest version with the following command (as of 2025-04-14, the latest version is 4.51.2):
pip install --upgrade transformers
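If the upgrade succeeded, the new architecture should be recognized. Below is a minimal sanity-check sketch of my own (not from the official docs); the version threshold in the comment reflects my understanding of when qwen2_5_vl support landed:
# Sketch: confirm the Transformers version and that AutoConfig now
# recognizes the qwen2_5_vl architecture.
import transformers
from transformers import AutoConfig

print(transformers.__version__)  # e.g. 4.51.2; Qwen2.5-VL needs roughly 4.49 or later

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")
print(config.model_type)  # expected: "qwen2_5_vl"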
2. ImportError: 'Qwen2_5_VLForConditionalGeneration'
The following error occurred:
ImportError: cannot import name 'Qwen2_5_VLForConditionalGeneration' from 'modelscope'
Qwen2_5_VLForConditionalGeneration is only exported by newer versions of the modelscope library. Upgrade modelscope to the latest version with the following command (as of 2025-04-14, the latest version is 1.25.0):
pip install --upgrade modelscope
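After the upgrade the import should work. As a fallback sketch of my own (not from the ModelScope docs), the class can also be imported directly from transformers if your modelscope version still lacks the re-export:
# Sketch: prefer the modelscope re-export, fall back to transformers.
try:
    from modelscope import Qwen2_5_VLForConditionalGeneration
except ImportError:
    from transformers import Qwen2_5_VLForConditionalGeneration

print(Qwen2_5_VLForConditionalGeneration.__name__)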
3. ImportError: FlashAttention2
The following error occurred:
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
This happens because FlashAttention2 was enabled, but the flash_attn package is not installed on the system, so the feature cannot be used.
There are two ways to fix it (a combined sketch follows this list):
- Give up the memory savings and speedup and fall back to the official model-loading code, i.e. change
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
"Qwen/Qwen2.5-VL-32B-Instruct",
torch_dtype=torch.bfloat16,
attn_implementation="flash_attention_2",
device_map="auto",
)
to
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
"Qwen/Qwen2.5-VL-32B-Instruct", torch_dtype="auto", device_map="auto"
)
- Install the flash_attn package:
pip install flash-attn --no-build-isolation
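The two options can also be combined in one script: select flash_attention_2 only when flash_attn is actually importable, and otherwise fall back to PyTorch's built-in SDPA backend. This is my own sketch rather than the official example; "sdpa" is a standard attn_implementation value in recent Transformers releases.
# Sketch: choose the attention backend based on whether flash_attn is installed.
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

try:
    import flash_attn  # noqa: F401
    attn_impl = "flash_attention_2"  # fast path, requires the flash-attn package
except ImportError:
    attn_impl = "sdpa"  # PyTorch scaled-dot-product attention, no extra install

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation=attn_impl,
    device_map="auto",
)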
4. cutlassF: no kernel found to launch!
The following error occurred:
RuntimeError: cutlassF: no kernel found to launch!
CUTLASS is a high-performance matrix-multiplication library built on CUDA; if your CUDA version is incompatible with the deep learning framework, flash_attn, or other libraries in use, no suitable kernel may be found.
After consulting related blog posts, the fix is to add the following at the beginning of the code:
import torch
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)
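With the flash and memory-efficient SDP kernels disabled, attention falls back to the slower math backend, which is what sidesteps the missing cutlass kernel. To confirm the toggles took effect, recent PyTorch 2.x exposes query functions for the SDP backends; a quick check sketch:
# Sketch: verify which scaled-dot-product-attention backends remain enabled.
import torch

torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

print(torch.backends.cuda.flash_sdp_enabled())          # False
print(torch.backends.cuda.mem_efficient_sdp_enabled())  # False
print(torch.backends.cuda.math_sdp_enabled())           # True -> math fallback is used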
References: