Background:
While deploying Tongyi Qianwen 2.5-VL-32B-Instruct (Qwen/Qwen2.5-VL-32B-Instruct) locally, I ran into a series of problems, which are recorded here.
For model usage and fine-tuning, see: Qwen2.5-VL-32B: Smarter and Lighter!
Problems
1. ValueError: 'qwen2_5_vl'
The following error occurred:
ValueError: The checkpoint you are trying to load has model type qwen2_5_vl but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
This error occurs because the installed Transformers library is too old and does not yet support the qwen2_5_vl architecture. Upgrade Transformers to the latest version with the following command (as of 2025-04-14, the latest version is 4.51.2):
pip install --upgrade transformers
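If the upgrade succeeded, the new architecture should be recognized. Below is a minimal sanity-check sketch of my own (not from the official docs); the version threshold in the comment reflects my understanding of when qwen2_5_vl support landed:
# Sketch: confirm the Transformers version and that AutoConfig now
# recognizes the qwen2_5_vl architecture.
import transformers
from transformers import AutoConfig

print(transformers.__version__)  # e.g. 4.51.2; Qwen2.5-VL needs roughly 4.49 or later

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")
print(config.model_type)  # expected: "qwen2_5_vl"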
2. ImportError: 'Qwen2_5_VLForConditionalGeneration'
The following error occurred:
ImportError: cannot import name 'Qwen2_5_VLForConditionalGeneration' from 'modelscope'
Qwen2_5_VLForConditionalGeneration is only exported by newer versions of the modelscope library. Upgrade modelscope to the latest version with the following command (as of 2025-04-14, the latest version is 1.25.0):
pip install --upgrade modelscope
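After the upgrade the import should work. As a fallback sketch of my own (not from the ModelScope docs), the class can also be imported directly from transformers if your modelscope version still lacks the re-export:
# Sketch: prefer the modelscope re-export, fall back to transformers.
try:
    from modelscope import Qwen2_5_VLForConditionalGeneration
except ImportError:
    from transformers import Qwen2_5_VLForConditionalGeneration

print(Qwen2_5_VLForConditionalGeneration.__name__)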
3. ImportError: FlashAttention2
The following error occurred:
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
This happens because FlashAttention2 was enabled, but the flash_attn package is not installed on the system, so the feature cannot be used.
There are two ways to fix it (a combined sketch follows this list):
- Give up the memory savings and speedup and fall back to the official model-loading code, i.e. change
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
"Qwen/Qwen2.5-VL-32B-Instruct",
torch_dtype=torch.bfloat16,
attn_implementation="flash_attention_2",
device_map="auto",
)
to
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
"Qwen/Qwen2.5-VL-32B-Instruct", torch_dtype="auto", device_map="auto"
)
- Install the flash_attn package:
pip install flash-attn --no-build-isolation
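The two options can also be combined in one script: select flash_attention_2 only when flash_attn is actually importable, and otherwise fall back to PyTorch's built-in SDPA backend. This is my own sketch rather than the official example; "sdpa" is a standard attn_implementation value in recent Transformers releases.
# Sketch: choose the attention backend based on whether flash_attn is installed.
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

try:
    import flash_attn  # noqa: F401
    attn_impl = "flash_attention_2"  # fast path, requires the flash-attn package
except ImportError:
    attn_impl = "sdpa"  # PyTorch scaled-dot-product attention, no extra install

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation=attn_impl,
    device_map="auto",
)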
4. cutlassF: no kernel found to launch!
The following error occurred:
RuntimeError: cutlassF: no kernel found to launch!
CUTLASS is a high-performance matrix-multiplication library built on CUDA; if your CUDA version is incompatible with the deep learning framework, flash_attn, or other libraries in use, no suitable kernel may be found.
After consulting related blog posts, the fix is to add the following at the beginning of the code:
import torch
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)
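With the flash and memory-efficient SDP kernels disabled, attention falls back to the slower math backend, which is what sidesteps the missing cutlass kernel. To confirm the toggles took effect, recent PyTorch 2.x exposes query functions for the SDP backends; a quick check sketch:
# Sketch: verify which scaled-dot-product-attention backends remain enabled.
import torch

torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

print(torch.backends.cuda.flash_sdp_enabled())          # False
print(torch.backends.cuda.mem_efficient_sdp_enabled())  # False
print(torch.backends.cuda.math_sdp_enabled())           # True -> math fallback is used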
References: