Setting Up a Qwen2-VL API

weixin_44929001

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/weixin_44929001/article/details/145521318

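Qwen2-VL is a multimodal large model that supports both understanding and generation tasks across vision and language. It accepts combined image and text input and produces text output. This guide uses the Qwen2.5-VL-7B-Instruct checkpoint.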

1. Create the `qwen-vl` virtual environment

Use conda to create a virtual environment named qwen-vl with Python 3.10:

conda create -n qwen-vl python=3.10

Once it has been created, activate the environment:

conda activate qwen-vl

2. Install PyTorch

Install PyTorch and its companion libraries (torchvision and torchaudio), selecting the CUDA 11.8 build:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

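Notes:

Make sure your system's NVIDIA driver and CUDA setup are compatible with the PyTorch build you install (CUDA 11.8 in this example).
If you have no GPU, install the CPU build by pointing --index-url at https://download.pytorch.org/whl/cpu. Simply omitting --index-url does not guarantee a CPU build; on Linux it typically pulls a CUDA-enabled build from PyPI.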
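
To confirm which build actually got installed, you can ask PyTorch itself. The following is a minimal sketch rather than part of the original setup; the helper name `describe_torch` is made up for illustration, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch():
    """Return a small report about the installed PyTorch build (hypothetical helper)."""
    # Avoid an ImportError if PyTorch is not installed in this environment
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch())
```

If `cuda_build` is set but `cuda_available` is False, the likely culprit is the driver or the GPU visibility rather than the pip install.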

3. Install Python dependencies

Install the Python packages the project needs, using the Tsinghua University mirror to speed up downloads:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

Contents of `requirements.txt`

qwen-vl-utils[decord]==0.0.8
modelscope
accelerate>=0.26.0
bitsandbytes==0.45.2
Flask==2.2.2
Werkzeug==2.2.2
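
After the install finishes, it can be useful to check that every pinned package is really present. The sketch below uses only the standard library's `importlib.metadata`; the list `REQUIRED` and the helper `installed_versions` are illustrative names, not part of the original project.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the requirements.txt above
REQUIRED = ["qwen-vl-utils", "modelscope", "accelerate", "bitsandbytes", "Flask", "Werkzeug"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```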

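Install `transformers`

Because installing transformers can be slow, you can download the source archive and install it manually. The steps below assume transformers-main.zip has already been downloaded into the dist/ directory.

Unzip transformers-main.zip: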

unzip dist/transformers-main.zip -d dist/

Install the unzipped transformers package:

pip install dist/transformers-main/ -i https://pypi.tuna.tsinghua.edu.cn/simple
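
The API code later in this article imports `Qwen2_5_VLForConditionalGeneration` from transformers, so a quick sanity check is to see whether the installed package exposes that class. This is only a sketch; `has_qwen2_5_vl` is a hypothetical helper name, and it deliberately treats any import failure as "not available".

```python
import importlib.util


def has_qwen2_5_vl():
    """Return True if the installed transformers exposes the Qwen2.5-VL model class."""
    if importlib.util.find_spec("transformers") is None:
        return False
    try:
        import transformers

        # Attribute lookup is enough here; no model weights are loaded
        return hasattr(transformers, "Qwen2_5_VLForConditionalGeneration")
    except Exception:
        # A broken or partial install is reported as "not available"
        return False


if __name__ == "__main__":
    print("Qwen2.5-VL support:", has_qwen2_5_vl())
```

If this prints False, the installed transformers is probably too old for the model used here, and reinstalling from the source archive is worth retrying.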

4. Download the model files

Use modelscope to download the Qwen2.5-VL-7B-Instruct model files and cache them in the current directory:

modelscope download --model Qwen/Qwen2.5-VL-7B-Instruct --cache_dir ./

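Notes:

The downloaded model files are stored in the directory given by --cache_dir, so make sure that path is correct.
If the download is slow, try using a proxy or switching to a different network.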
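
One way to check the path after the download is to search the cache directory for a folder that looks like a model. The sketch below is an assumption-laden illustration: `find_model_dirs` is a made-up helper, and it assumes the checkpoint ships a `config.json` alongside `.safetensors` weight files.

```python
from pathlib import Path


def find_model_dirs(cache_dir="./"):
    """Return directories under cache_dir holding a config.json plus .safetensors weights."""
    hits = []
    for config in Path(cache_dir).rglob("config.json"):
        folder = config.parent
        # Assumption: weights are stored as *.safetensors next to config.json
        if any(folder.glob("*.safetensors")):
            hits.append(folder)
    return sorted(hits)


if __name__ == "__main__":
    for folder in find_model_dirs("./"):
        print(folder)
```

The printed directory is what the loading code should ultimately point at, so it is worth comparing it with the model path used in the API script.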

5. Qwen-VL API

Code for `qwen-vl_app.py`

The complete API code is as follows:

from datetime import datetime
import os
import torch
import gc
from flask import Flask, request, jsonify
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
from qwen_vl_utils import process_vision_info
from modelscope import snapshot_download

# Set an environment variable to reduce CUDA memory fragmentation
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Initialize the Flask application
app = Flask(__name__)

# Release unused cached GPU memory
torch.cuda.empty_cache()

# Use the pretrained, instruction-tuned model
model_dir = "Qwen/Qwen2.5-VL-7B-Instruct"

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4"