[Model Series] Running Qwen2.5-Omni-7B on the 370-M8



Preface

With OpenAI releasing GPT-4V and Google DeepMind rolling out Flamingo and the Gemini series (including Gemini 1.5 Pro), multimodal technology is steadily becoming a key path to practical AI. But multimodal large models usually mean ballooning parameter counts, sharply higher inference costs, and an inability to respond in real time. The question, then, is how to build an AI system that is lightweight, omni-modal, strongly generalizing, and deployable.
Against this backdrop, Qwen2.5-Omni-7B, open-sourced by Alibaba's Tongyi (Qwen) team, is a heavyweight breakthrough: it is the first model at the modest 7-billion-parameter scale to unify understanding of text, image, audio, and even video inputs with high-quality streaming speech output.


1. Platform and Environment Selection

Image: pytorch:v24.10-torch2.4.0-torchmlu1.23.1-ubuntu22.04-py310
[Please check your own image version carefully; for the changes required on older versions, see the earlier articles in this series.]

Card: any MLU 300-series card or newer
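Before going further, it can help to confirm that the container actually sees an MLU device. The check below is only a sketch: it assumes torch_mlu exposes the usual torch.mlu namespace mirroring torch.cuda, which holds for recent torch_mlu releases; if your version differs, skip it.

import torch
import torch_mlu  # registers the MLU backend with PyTorch

# Print whether an MLU device is visible and how many cards are available.
print(torch.mlu.is_available())
print(torch.mlu.device_count())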

2. Model Download

Qwen2.5-Omni-7B

apt install git-lfs -y
git-lfs clone https://www.modelscope.cn/Qwen/Qwen2.5-Omni-7B.git
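If you prefer not to pull the weights through Git LFS, ModelScope's snapshot_download fetches the same repository. The sketch below is an alternative; cache_dir is a storage path you pick yourself, and the later steps should then point at the directory it returns.

from modelscope.hub.snapshot_download import snapshot_download

# Download Qwen2.5-Omni-7B from ModelScope into a local cache directory.
model_dir = snapshot_download(
    'Qwen/Qwen2.5-Omni-7B',
    cache_dir='/path/to/your/cache',  # choose your own storage path
    revision='master',                # latest published revision
)
print(model_dir)  # directory containing the downloaded weights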

3. Environment Setup

pip uninstall transformers -y
pip install git+https://githubfast.com/huggingface/transformers@f742a644ca32e65758c3adb36225aef1731bd2a8
pip install accelerate
pip install "qwen-omni-utils[decord]"
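A quick way to confirm the pinned transformers commit took effect is to check that the Qwen2.5-Omni classes used later in this article import cleanly. This is just a sanity-check sketch, not an official verification step.

import transformers
# These imports only succeed on a transformers build that ships Qwen2.5-Omni support.
from transformers import Qwen2_5OmniModel, AutoProcessor

print(transformers.__version__)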

4. Model Conversion

The open-source weights are released in bfloat16, so before use we need to convert them from bfloat16 to float16.
The conversion script is below.

import torch
import os
import shutil
from safetensors.torch import save_file
from safetensors import safe_open

def convert_dataformat(bf16_file, fp16_file):
    if not os.path.exists(bf16_file):
        raise FileNotFoundError(f"{bf16_file} does not exist!")

    fp16_state_dict = {}
    # Load every tensor from the bfloat16 shard and cast it to float16.
    with safe_open(bf16_file, framework="pt", device="cpu") as f:
        for k in f.keys():
            fp16_state_dict[k] = f.get_tensor(k).to(torch.float16)
    save_file(fp16_state_dict, fp16_file, metadata={"format": "pt"})
    print(f"converted {os.path.basename(bf16_file)} successfully!")

# --- main ---
# Modify bf16_dir and fp16_dir according to your own environment.
bf16_dir = "/workspace/volume/guojunceshi2/Qwen2.5-Omni-7B"      # change to your own path
fp16_dir = "/workspace/volume/guojunceshi2/Qwen2.5-Omni-7B_fp16"

if not os.path.exists(fp16_dir):
    os.makedirs(fp16_dir)

for root, dirs, files in os.walk(bf16_dir):
    for item in files:
        if item.endswith("safetensors"):
            # Convert each *.safetensors weight shard to float16.
            bf16_file = os.path.join(root, item)
            fp16_file = os.path.join(fp16_dir, item)
            convert_dataformat(bf16_file, fp16_file)
        else:
            # Copy config, tokenizer, and other auxiliary files unchanged.
            ori_file_path = os.path.join(root, item)
            dst_file_path = os.path.join(fp16_dir, item)
            shutil.copyfile(ori_file_path, dst_file_path)
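As an optional sanity check (a small sketch reusing the fp16_dir defined above), you can confirm that every converted shard now stores only float16 tensors:

import glob
import os
import torch
from safetensors import safe_open

fp16_dir = "/workspace/volume/guojunceshi2/Qwen2.5-Omni-7B_fp16"
for shard in glob.glob(os.path.join(fp16_dir, "*.safetensors")):
    with safe_open(shard, framework="pt", device="cpu") as f:
        dtypes = {f.get_tensor(k).dtype for k in f.keys()}
    # Every tensor in the converted shard should now be float16.
    assert dtypes <= {torch.float16}, f"{shard} still contains {dtypes}"
    print(shard, "OK")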

5. Code Modifications

import torch  ###+
import torch_mlu  ###+
import torch_mlu.utils.gpu_migration  ###+
from transformers import Qwen2_5OmniModel, AutoProcessor
from qwen_omni_utils import process_mm_info

model_path = "/workspace/volume/guojunceshi2/Qwen2.5-Omni-7B_fp16"
# You can directly insert a local file path, a URL, or a base64-encoded image into the position where you want in the text.
messages = [
    # Image
    ## Local file path
    [{"role": "user", "content": [{"type": "image", "image": "/workspace/volume/guojunceshi2/qwen2omin/pupu.jpg"}, {"type": "text", "text": "Describe this image."}]}],
]

# Load the processor and the float16 weights produced in the previous step.
processor = AutoProcessor.from_pretrained(model_path)
model = Qwen2_5OmniModel.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")  ###+
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
audios, images, videos = process_mm_info(messages, use_audio_in_video=False)
inputs = processor(text=text, images=images, videos=videos, audios=audios, padding=True, return_tensors="pt")
print(inputs)
inputs = inputs.to("mlu")  ###+
generated_ids, generate_wav = model.generate(**inputs)
print(generated_ids)

Lines marked with ###+ are the steps added on top of the original example code.

Next, we run it directly:

(pytorch_infer) root@deepseek14-v1-5bbcd6f65c-2zx2c:/workspace/volume/guojunceshi2/qwen2omin# export MLU_VISIBLE_DEVICES=3 && python demo.py 
/torch/venv3/pytorch_infer/lib/python3.10/site-packages/torch_mlu/utils/model_transfer/__init__.py:2: FutureWarning: `torch_mlu.utils.model_transfer` is deprecated. Please use `torch_mlu.utils.gpu_migration` instead.
  warnings.warn("`torch_mlu.utils.model_transfer` is deprecated. "
Qwen2_5OmniToken2WavModel does not support eager attention implementation, fall back to sdpa
/torch/venv3/pytorch_infer/lib/python3.10/site-packages/torch_mlu/mlu/__init__.py:379: UserWarning: Linear memory is not supported on this device. Falling back to common memory. (Triggered internally at /torch_mlu/torch_mlu/csrc/framework/core/caching_allocator.cpp:718.)
  torch_mlu._MLUC._mlu_init()
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00,  1.41s/it]
/torch/venv3/pytorch_infer/lib/python3.10/site-packages/torch_mlu/utils/gpu_migration/migration.py:288: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return fn(*args, **kwargs)
[WARNING  | root               ]: System prompt modified, audio output may not work as expected. Audio output mode only works when using default system prompt 'You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech.'
{'input_ids': tensor([[151644,   8948,    198,  ..., 151644,  77091,    198]]), 'attention_mask': tensor([[1, 1, 1,  ..., 1, 1, 1]]), 'pixel_values': tensor([[1.3026, 1.3026, 1.3026,  ..., 1.5344, 1.5344, 1.5344],
        [1.3026, 1.3026, 1.3026,  ..., 1.5487, 1.5487, 1.5487],
        [1.2734, 1.2734, 1.2734,  ..., 1.5487, 1.5487, 1.5487],
        ...,
        [0.7479, 0.7187, 0.7187,  ..., 0.3399, 0.3399, 0.3399],
        [0.7917, 0.7917, 0.7917,  ..., 0.4537, 0.4110, 0.4110],
        [0.7771, 0.7771, 0.7771,  ..., 0.3684, 0.3684, 0.3684]]), 'image_grid_thw': tensor([[  1, 128,  78]])}
[2025-03-31 11:20:54.539424][CNNL][WARNING][65583][Card:0]: [cnnlFill_v3] is deprecated and will be removed in the future release, Use [cnnlFill_v4] instead.
[2025-03-31 11:20:54.564059][CNNL][WARNING][65583][Card:0]: [cnnlMasked_v4] is deprecated and will be removed in the future release, Use [cnnlMasked_v5] instead.
/torch/venv3/pytorch_infer/lib/python3.10/site-packages/torch/_tensor.py:1061: UserWarning: Casting input of dtype int64 to int32, maybe overflow! (Triggered internally at /torch_mlu/torch_mlu/csrc/aten/utils/cnnl_util.cpp:128.)
  return torch.floor_divide(self, other)
[2025-03-31 11:20:54.628056][CNNL][WARNING][65583][Card:0]: [cnnlGetConvolutionForwardAlgorithm] is deprecated and will be removed in the future release. See cnnlFindConvolutionForwardAlgo() API for replacement.
Setting `pad_token_id` to `eos_token_id`:8292 for open-end generation.
/torch/venv3/pytorch_infer/lib/python3.10/site-packages/transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py:4479: UserWarning: The getTensorDesc function with on_chip_type parameter will be deprecated in CNNL v2.0  (Triggered internally at /torch_mlu/torch_mlu/csrc/aten/cnnl/cnnlTensorDesc.cpp:171.)
  time = time.repeat(batch)
tensor([[151644,   8948,    198,  ...,   3550,     13, 151645]],
       device='mlu:0')
/opt/py3.10/lib/python3.10/tempfile.py:860: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpu2rp4x2w'>
  _warnings.warn(warn_message, ResourceWarning)

Run complete.
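The run above only prints raw token ids. If you also want readable text, plus a .wav file when speech output is actually produced, the snippet below is a minimal post-processing sketch: it reuses processor, generated_ids, and generate_wav from the demo script, assumes soundfile is installed (pip install soundfile), and uses the 24 kHz sample rate from the official Qwen2.5-Omni examples. As the warning in the log notes, speech output only works with the default system prompt, so generate_wav may well be empty here.

import soundfile as sf

# Decode the generated token ids back into text.
text_out = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(text_out)

# Save the speech waveform if the model actually produced one.
if generate_wav is not None:
    sf.write("output.wav", generate_wav.reshape(-1).detach().cpu().float().numpy(), samplerate=24000)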
