In-Depth Guide: InternVL2 Multimodal Model Inference with PaddleMIX

This article explains how to run inference for the InternVL2 multimodal model on PaddleMIX. It is reposted from the PaddlePaddle (飞桨) official WeChat account.

InternVL open-source repository:

https://github.com/OpenGVLab/InternVL

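About the authors:

企鹅火烈鸟: second-year master's student at Beijing University of Posts and Telecommunications; core contributor to the Intern (书生) large-model open-source community; member of the PaddlePaddle Core Framework Contributors Club; core developer at the Baidu PaddlePaddle (Xiamen) AI Industry Empowerment Center

散步: core contributor to the Intern (书生) large-model open-source community; PaddlePaddle Developer Technical Expert; member of the PaddlePaddle Core Framework Contributors Club; distinguished expert at the Baidu PaddlePaddle (Xiamen) AI Industry Empowerment Center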

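Introduction to PaddleMIX

PaddleMIX is a multimodal large-model development toolkit built on the PaddlePaddle framework. It includes training and inference components for common SOTA multimodal models. As multimodal models have steadily improved at real-world scene understanding and logical reasoning, they have begun to show potential for industrial deployment. Driven by the practical deployment needs of the Baidu PaddlePaddle (Xiamen) AI Industry Empowerment Center, we decided to jointly add InternVL2 inference support to PaddleMIX. The goal is to bring an industry-leading multimodal model to developers in the PaddlePaddle ecosystem and to improve their experience with visual perception, image-text retrieval, image-to-text generation, and multimodal dialogue across deployment scenarios.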

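Implementing InternVL2 Multimodal Inference on PaddleMIX

Adding a new model to PaddleMIX can follow three key steps:

1. Weight conversion: convert the inference weights from the other framework into PaddlePaddle weights so that the model can be parsed, loaded, and used in the PaddlePaddle environment. Pay attention to aligning and correctly converting framework-specific data layouts.

2. Model architecture conversion: use PaConvert, the tool provided by PaddlePaddle, to convert the model code. Beyond this step, uncommon APIs and their implementations must be checked by hand so that the model structure is complete and can run inference. For InternVL2, the main difficulty was fully migrating the preprocessing and the transformers interfaces.

3. Training and inference precision alignment: before running training or inference in earnest, align the outputs of the PaddleMIX model with the reference implementation so that inference results match expectations. For trainable models, the loss curve during training must be aligned as well to keep results consistent.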

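Weight Conversion

Adding a model from any other framework to PaddlePaddle starts with converting the base weights. First download the InternVL-8B safetensors weights from Hugging Face, merge them into a single bin file, and then run the actual conversion to PaddlePaddle weights. Note that most models on Hugging Face are stored in bfloat16, but the conversion passes through NumPy as an intermediate step, and NumPy has limited bfloat16 support. bfloat16 weights therefore need extra handling during conversion: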

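is_bf16 = False
tensor = value.clone().detach()

if value.dtype == torch.bfloat16:
    is_bf16 = True
    # NumPy has no native bfloat16, so widen to float32 before the NumPy round trip
    tensor = tensor.to(torch.float32)

paddle_tensor = paddle.to_tensor(tensor.cpu().numpy())
if is_bf16:
    # Cast the Paddle weight back to bfloat16
    paddle_tensor = paddle_tensor.astype(paddle.bfloat16)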
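
Why is the detour through float32 safe? A bfloat16 value is simply the upper 16 bits of a float32, so widening and then narrowing a value that started as bfloat16 loses nothing. The sketch below illustrates this with plain bit manipulation from the standard library. The helper names are ours for illustration and are not part of torch, NumPy, or Paddle. Real framework casts round to nearest rather than truncate, but for values that are already exact bfloat16 the result is the same.

```python
import struct


def float32_to_bits(x: float) -> int:
    """Return the 32-bit IEEE-754 pattern of x stored as float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_bits_to_float32(bf16_bits: int) -> float:
    """Widen a 16-bit bfloat16 pattern to float32 by appending 16 zero bits."""
    return bits_to_float32(bf16_bits << 16)


def float32_to_bf16_bits(x: float) -> int:
    """Narrow float32 to bfloat16 by keeping the upper 16 bits (truncation)."""
    return float32_to_bits(x) >> 16


bf16_pattern = 0x3FC0                        # the bfloat16 encoding of 1.5
widened = bf16_bits_to_float32(bf16_pattern)  # float32 copy used for the NumPy step
round_trip = float32_to_bf16_bits(widened)    # cast back to bfloat16

print(widened, hex(round_trip))
```

The round trip returns the original bit pattern, which is why the conversion script can go bfloat16 -> float32 -> NumPy -> Paddle -> bfloat16 without changing any weight.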

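Deep learning frameworks differ in how they order the rows and columns of a linear layer's weight matrix: PyTorch's nn.Linear stores [out_features, in_features], while paddle.nn.Linear stores [in_features, out_features]. Linear weights must therefore also be transposed during conversion for the results to align: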

if ('linear' in torch_key) or ('proj' in torch_key) or ('vocab' in torch_key and 'weight' in torch_key) \
   or ("attn.qkv" in torch_key) or ("mlp.fc1" in torch_key) or ("mlp.fc2" in torch_key) \
   or ("feed_forward.w1" in torch_key) or ("feed_forward.w2" in torch_key) \
   or ("feed_forward.w3" in torch_key) or ("language_model.output.weight" in torch_key):

    if tensor.ndim >= 2:
        tensor = tensor.transpose(0, 1)
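
To make the key-matching rule concrete, here is a small framework-free sketch that applies the same idea to nested lists. The key prefixes and the toy values are made up for illustration, and the marker list is a shortened version of the condition above.

```python
# Substrings that mark a linear weight (shortened from the rule above)
LINEAR_KEY_MARKERS = (
    "linear", "proj", "attn.qkv", "mlp.fc1", "mlp.fc2",
    "feed_forward.w1", "feed_forward.w2", "feed_forward.w3",
)


def ndim(value) -> int:
    """Number of nested list levels, a stand-in for tensor.ndim."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]
    return depth


def needs_transpose(key: str, value) -> bool:
    """True for 2-D (or higher) weights whose key looks like a linear layer."""
    return ndim(value) >= 2 and any(marker in key for marker in LINEAR_KEY_MARKERS)


def transpose_2d(matrix):
    """Swap rows and columns: [out, in] -> [in, out]."""
    return [list(column) for column in zip(*matrix)]


# Toy state dict with hypothetical keys
state = {
    "layers.0.feed_forward.w1.weight": [[1, 2, 3], [4, 5, 6]],  # [out=2, in=3]
    "layers.0.attention_norm.weight": [1.0, 1.0, 1.0],           # 1-D, left alone
}

converted = {
    key: transpose_2d(value) if needs_transpose(key, value) else value
    for key, value in state.items()
}
print(converted)
```

The 1-D norm weight passes through unchanged, which is what the `tensor.ndim >= 2` guard in the conversion script is for.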

For convolution layers, the script also permutes the weight dimensions to match the data layout expected on the PaddlePaddle side:

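elif 'conv' in torch_key and 'weight' in torch_key:
    # Convolution layers: permute the weight dimensions.
    # torch.Tensor.transpose only swaps two dims, so permute is required here.
    # Note: paddle.nn.Conv2D also stores weights as [out, in, kH, kW];
    # confirm the target layout before relying on this branch.
    if tensor.ndim == 4:  # 2D convolution
        tensor = tensor.permute(2, 3, 1, 0)
    elif tensor.ndim == 5:  # 3D convolution
        tensor = tensor.permute(2, 3, 4, 1, 0)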

The complete conversion code is shown below. Running it produces the model_state.pdparams file needed for initial inference.

import paddle
import torch

# Placeholder path: the merged single-file PyTorch checkpoint
pytorch_state_dict = torch.load("pytorch_model.bin", map_location="cpu")
paddle_state_dict = {}

for torch_key, value in pytorch_state_dict.items():
    is_bf16 = False
    paddle_key = torch_key
    tensor = value.clone().detach()

    if value.dtype == torch.bfloat16:
        is_bf16 = True
        # NumPy has no native bfloat16, so widen to float32 first
        tensor = tensor.to(torch.float32)

    if ('linear' in torch_key) or ('proj' in torch_key) or ('vocab' in torch_key and 'weight' in torch_key) \
       or ("attn.qkv" in torch_key) or ("mlp.fc1" in torch_key) or ("mlp.fc2" in torch_key) \
       or ("feed_forward.w1" in torch_key) or ("feed_forward.w2" in torch_key) \
       or ("feed_forward.w3" in torch_key) or ("language_model.output.weight" in torch_key):
        # Linear weights: PyTorch [out, in] -> Paddle [in, out]
        if tensor.ndim >= 2:
            tensor = tensor.transpose(0, 1)

    elif 'conv' in torch_key and 'weight' in torch_key:
        # Convolution layers: permute the weight dimensions
        # (torch.Tensor.transpose only swaps two dims, so use permute)
        if tensor.ndim == 4:  # 2D convolution
            tensor = tensor.permute(2, 3, 1, 0)
        elif tensor.ndim == 5:  # 3D convolution
            tensor = tensor.permute(2, 3, 4, 1, 0)

    elif 'attention' in torch_key and 'weight' in torch_key:
        if tensor.ndim >= 2:
            tensor = tensor.transpose(0, 1)

    paddle_tensor = paddle.to_tensor(tensor.cpu().numpy())
    if is_bf16:
        # Cast back to bfloat16
        paddle_tensor = paddle_tensor.astype(paddle.bfloat16)

    paddle_state_dict[paddle_key] = paddle_tensor

paddle.save(paddle_state_dict, "model_state.pdparams")

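Model Architecture Conversion

With the weight file in hand, we can move on to building the model. We recommend PaConvert, the one-stop code conversion tool released by the PaddlePaddle team, for quickly converting model code. PaConvert automatically converts training or inference code written for other deep learning frameworks into PaddlePaddle code, which makes code migration fast. The average conversion rate by lines of code is above 90%, and for large language model inference code such as Llama it can reach 100%. With the official tool, migrating and building the model in PaddlePaddle goes smoothly.

PaConvert tries to preserve the style and structure of the original code, which simplifies the work. A single command produces the full set of converted files: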

python paconvert/main.py --in_dir torch_project [--out_dir paddle_project] [--exclude_dirs exclude_dirs] [--log_dir log_dir] [--log_level "INFO"] [--run_check 1]

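When conversion finishes, the tool prints a summary with the total number of APIs, the number converted successfully, the number not supported, and the conversion rate.

For successfully converted APIs, the code style changes slightly: full API names and parameter keywords are filled in, and comments and extra blank lines are removed, because they cannot be recognized while the code is scanned.

For unsupported APIs, the full API name is filled in and the line is marked with a >>>>>> prefix. You need to convert that API by hand and then delete the >>>>>> marker.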

===========================================  
PyTorch to Paddle Convert Start ------>:  
===========================================  
Start convert file: /workspace/PaConvert1/test_code.py --> /workspace/PaConvert1/paddle_project/test_code.py  
[test_code.py:1] remove 'import torch'  
[test_code.py:2] remove 'import torch.nn as nn'  
[test_code.py:3] remove 'import torch.optim as optim'  
[test_code.py:4] remove 'import torch.nn.functional as F'  
[test_code.py:5] remove 'from torch.nn import Linear'  
[test_code.py:6] remove 'import mmcv'  
[test_code.py] add 'import paddle' in line 1  
[test_code.py:1] [Success] Convert torch.nn.Module to Paddle  
[test_code.py:13] [Not Support] convert mmcv.cnn.ConvModule to Paddle is not supported currently  
[test_code.py:14] [Success] Convert torch.nn.MaxPool2d to Paddle  
[test_code.py:16] [Success] Convert torch.nn.Linear to Paddle  
[test_code.py:17] [Success] Convert torch.nn.Linear to Paddle  
[test_code.py:18] [Success] Convert torch.nn.Linear to Paddle  
[test_code.py:24] [Success] Convert torch.flatten to Paddle  
[test_code.py:27] [Success] Convert torch.add to Paddle  
[test_code.py:31] [Success] Convert Class Method: torch.nn.Module.parameters to Paddle  
[test_code.py:31] [Success] Convert torch.optim.SGD to Paddle  
[test_code.py:32] [Success] Convert torch.optim.lr_scheduler.MultiStepLR to Paddle  
[test_code.py:35] [Success] Convert torch.rand to Paddle  
[test_code.py:36] [Success] Convert Class Method: torch.Tensor.sum to Paddle  
[test_code.py:38] [Success] Convert Class Method: torch.nn.Module.zero_grad to Paddle  
[test_code.py:39] [Success] Convert Class Method: torch.Tensor.backward to Paddle  
[test_code.py:40] [Success] Convert Class Method: torch.optim.Optimizer.step to Paddle, just remain the same
Finish convert /workspace/PaConvert1/test_code.py --> /workspace/PaConvert1/paddle_project/test_code.py

===========================================  
Convert Summary:  
===========================================  
There are 16 Pytorch APIs in this Project:   
15  Pytorch APIs have been converted to Paddle successfully!   
1  Pytorch APIs are not supported to convert to Paddle currently!   
Convert Rate is: 93.750%    

For these 1 Pytorch APIs that currently do not support to convert, which have been marked by >>> before the line,  
please refer to [https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/model_convert/convert_from_pytorch/pytorch_api_mapping_cn.html]  
and convert it by yourself manually. In addition, these APIs will be supported in future.    
Thank you to use Paddle Code Convert Tool. You can make any suggestions  to us by submitting issues to [https://github.com/PaddlePaddle/PaConvert].    
****************************************************************  
______      _____                          _ 
 | ___ \    / ____|                        | | 
 | |_/ /_ _| |     ___  _ ____   _____ _ __| |_ 
 |  __/ _  | |    / _ \| \_ \ \ / / _ \ \__| __| 
 | | | (_| | |___| (_) | | | \ V /  __/ |  | |_ 
 \_|  \__,_|\_____\___/|_| |_|\_/ \___|_|   \__| 
***************************************************************
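
When a project is large, it helps to pull the unsupported APIs out of the log instead of reading it by eye. The following is a minimal sketch that parses log lines in the format shown above and recomputes the conversion rate. It is our own helper, not a PaConvert feature, and the regular expression only assumes the `[file:line] [Success|Not Support] message` layout seen in this sample.

```python
import re

# Three lines copied from the sample log above
SAMPLE_LOG = """\
[test_code.py:1] [Success] Convert torch.nn.Module to Paddle
[test_code.py:13] [Not Support] convert mmcv.cnn.ConvModule to Paddle is not supported currently
[test_code.py:14] [Success] Convert torch.nn.MaxPool2d to Paddle
"""

LINE_RE = re.compile(
    r"^\[(?P<file>[^:\]]+):(?P<line>\d+)\] \[(?P<status>Success|Not Support)\] (?P<msg>.*)$"
)


def summarize(log_text: str):
    """Return (success_count, unsupported_entries, conversion_rate_percent)."""
    success_count = 0
    unsupported = []
    for raw_line in log_text.splitlines():
        match = LINE_RE.match(raw_line.strip())
        if match is None:
            continue  # banner lines, import rewrites, and so on
        if match["status"] == "Success":
            success_count += 1
        else:
            unsupported.append((match["file"], int(match["line"]), match["msg"]))
    total = success_count + len(unsupported)
    rate = 100.0 * success_count / total if total else 0.0
    return success_count, unsupported, rate


success_count, unsupported, rate = summarize(SAMPLE_LOG)
for file_name, line_no, message in unsupported:
    print(f"manual fix needed at {file_name}:{line_no}: {message}")
print(f"conversion rate: {rate:.3f}%")
```

Applied to the full log above, the same arithmetic gives 15 of 16 APIs converted, which is the 93.750% reported in the summary.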

With PaConvert's API-level mapping, it is straightforward to obtain PaddlePaddle implementations of the vision encoder (paddlemix/models/internvl2/internvl_chat/modeling_intern_vit.py) and the language models (paddlemix/models/internvl2/internlm2/modeling_internlm2.py and paddlemix/models/internvl2/phi3/modeling_phi3.py). The core modules are InternVisionModel, InternLM2ForCausalLM, and Phi3ForCausalLM, for example:

# Modified from transformers.model.llama.modeling_llama.LlamaForCausalLM
class InternLM2ForCausalLM(InternLM2PretrainedModel):
    _auto_class = 'AutoModelForCausalLM'
    
    _tied_weights_keys = ['output.weight']
    
    def __init__(self, config):
        super().__init__(config)
        self.model = InternLM2Model(config)
        self.vocab_size = config.vocab_size
        self.output = nn.Linear(config.hidden_size, config.vocab_size, bias_attr=False)
    
    # ... (remainder omitted)

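The core submodules of the language model involve different DecoderLayer implementations. During conversion, check carefully that each submodule can be unit tested and run inference, and replace any API that PaConvert could not convert with the latest PaddlePaddle 3.0 equivalent.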

class InternLM2DecoderLayer(nn.Layer):
    def __init__(self, config: InternLM2Config):
        super().__init__()
        self.hidden_size = config.hidden_size
        
        self.attention = INTERNLM2_ATTENTION_CLASSES[config.attn_implementation](config=config)
        self.feed_forward = InternLM2MLP(config)
        self.attention_norm = InternLM2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
        self.ffn_norm = InternLM2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)

    def forward(
        self,
        hidden_states: paddle.Tensor,
        attention_mask: Optional[paddle.Tensor] = None,
        position_ids: Optional[paddle.Tensor] = None,
        past_key_value: Optional[Tuple[paddle.Tensor]] = None,
        output_attentions: Optional[bool] = False,
        use_cache: Optional[bool] = False,
        **kwargs
    ) -> Tuple[paddle.Tensor, Optional[Tuple[paddle.Tensor, paddle.Tensor]]]:
        """
        Args:
            hidden_states (`paddle.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
            attention_mask (`paddle.Tensor`, *optional*):
                attention mask of size `(batch_size, sequence_length)` if flash attention is used or `(batch_size, 1, 
                query_sequence_length, key_sequence_length)` if default attention is used.
            output_attentions (`bool`, *optional*):
                Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                returned tensors for more detail.
            use_cache (`bool`, *optional*):
                If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
                (see `past_key_values`).
            past_key_value (`Tuple(paddle.Tensor)`, *optional*): cached past key and value projection states
        """

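Beyond aligning the model code at the PaddlePaddle framework level, the code must be adapted by hand to the paddlenlp.transformers APIs in PaddleNLP so that configuration files, model tooling, and the flash-attention implementation can be called correctly. PaddleNLP 3.0 covers more than 80 mainstream open-source large language models and most of the transformers APIs needed for large-model inference. In most cases, changing the import from transformers to paddlenlp.transformers is enough to swap in the equivalent API: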

from paddlenlp.transformers import LlamaForCausalLM
from paddlenlp.transformers.model_outputs import CausalLMOutputWithPast
from paddlenlp.transformers.model_utils import PretrainedModel

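For shape handling, note that PaddlePaddle represents a tensor's shape as a list by default rather than a tuple, so any code that compares or derives shapes needs a careful check: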

if attn_output.shape != [bsz, self.num_heads, q_len, self.head_dim]:
    raise ValueError(
        f'`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is '
        f'{attn_output.shape}'
    )
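
The reason this matters is that Python never treats a list and a tuple as equal, even when their elements match. A quick illustration in plain Python follows. The `same_shape` helper is a hypothetical convenience of ours, not a Paddle API.

```python
def same_shape(actual, expected) -> bool:
    """Compare two shapes regardless of whether they are lists or tuples."""
    return list(actual) == list(expected)


paddle_style_shape = [2, 8, 16, 64]      # Paddle reports shapes as lists
torch_style_expected = (2, 8, 16, 64)    # code ported from PyTorch often uses tuples

naive_equal = paddle_style_shape == torch_style_expected   # list vs tuple
robust_equal = same_shape(paddle_style_shape, torch_style_expected)

print(naive_equal, robust_equal)
```

A check ported unchanged from PyTorch, with a tuple on the right-hand side, would therefore always report a mismatch and raise, which is why the converted code compares against a list.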

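Besides switching to the paddle.nn.functional.flash_attention implementation supported by PaddlePaddle, this model needs an additional flash-attention helper, paddlemix/models/internvl2/bert_padding.py. This file implements efficient sequence padding and unpadding, which improves training efficiency.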
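
To show what padding and unpadding mean here, the sketch below removes padded positions from a batch, records where each sequence starts in the flattened result, and then restores the padded layout. It uses plain lists and is only an illustration of the idea. It is not the code in bert_padding.py, and the function names and return values are our assumptions.

```python
def unpad(batch, mask):
    """Flatten valid tokens and record their positions.

    Returns (tokens, indices, cu_seqlens, max_len), where cu_seqlens holds the
    cumulative sequence lengths used to find each sequence in `tokens`.
    """
    seq_len = len(batch[0])
    tokens, indices, cu_seqlens = [], [], [0]
    for b, (row, row_mask) in enumerate(zip(batch, mask)):
        for t, (token, keep) in enumerate(zip(row, row_mask)):
            if keep:
                tokens.append(token)
                indices.append(b * seq_len + t)  # position in the flattened batch
        cu_seqlens.append(len(tokens))
    max_len = max(end - start for start, end in zip(cu_seqlens, cu_seqlens[1:]))
    return tokens, indices, cu_seqlens, max_len


def pad(tokens, indices, batch_size, seq_len, pad_value=0):
    """Scatter flattened tokens back into a padded [batch_size, seq_len] layout."""
    flat = [pad_value] * (batch_size * seq_len)
    for token, index in zip(tokens, indices):
        flat[index] = token
    return [flat[b * seq_len:(b + 1) * seq_len] for b in range(batch_size)]


batch = [[11, 12, 13, 0], [21, 22, 0, 0]]   # 0 marks padding
mask = [[1, 1, 1, 0], [1, 1, 0, 0]]

tokens, indices, cu_seqlens, max_len = unpad(batch, mask)
restored = pad(tokens, indices, batch_size=2, seq_len=4)
print(tokens, cu_seqlens, max_len)
```

Attention kernels that accept variable-length input can then work on the five real tokens instead of all eight slots, which is where the efficiency gain comes from.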

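To validate the overall model quickly, you can explicitly set attn_implementation='eager'. This option uses the most basic attention implementation for inference, which makes it easy to get the converted model running end to end.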
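
For reference, "basic attention" here means plain scaled dot-product attention with no fused kernel. The following single-head sketch in pure Python shows the computation under that assumption. It is a teaching example and not the PaddleMIX implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def eager_attention(q, k, v, causal=True):
    """Single-head attention; q, k, v are lists shaped [seq_len][head_dim]."""
    scale = 1.0 / math.sqrt(len(q[0]))
    head_dim = len(v[0])
    output = []
    for i, query in enumerate(q):
        scores = []
        for j, key in enumerate(k):
            if causal and j > i:
                scores.append(float("-inf"))  # a token cannot see later tokens
            else:
                scores.append(scale * sum(a * b for a, b in zip(query, key)))
        weights = softmax(scores)
        output.append(
            [sum(w * value[d] for w, value in zip(weights, v)) for d in range(head_dim)]
        )
    return output


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = eager_attention(q, k, v)
print(out)
```

Because every step is explicit, this path is slower than flash attention but easy to compare against the reference model while debugging.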

In addition to the main conversion, supporting PaddleMIX's auto mode for inference requires registering the new InternVL2 model in the relevant registry files:

paddlemix/auto/modeling.py
paddlemix/auto/processing.py
paddlemix/auto/tokenizer_mapping.yaml
paddlemix/trainer/trainer.py

Finally, add unit tests for the new code in tests/models/test_internvl2.py to make sure the results are reproducible and stable.

def create_and_check_model(self, pixel_values):
    model = InternVLChatModel(config=self.get_config())
    model.eval()
    
    generation_config = dict(max_new_tokens=1024, do_sample=False)
    
    with paddle.no_grad():
        result = model.chat(
            tokenizer=self.tokenizer,
            pixel_values=pixel_values,
            question='Who are you?',
            generation_config=generation_config,
        )
    
    self.parent.assertIsNotNone(result)

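Finally, wire up the entry points for the various language models and the vision encoder so that InternVL2 inference can switch quickly between configurations.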

class InternVLChatModel(MixPretrainedModel):
    config_class = InternVLChatConfig
    main_input_name = 'pixel_values'
    _no_split_modules = ['InternVisionModel', 'LlamaDecoderLayer', 'InternLM2DecoderLayer',
                         'Phi3DecoderLayer', 'Qwen2DecoderLayer']
    _supports_flash_attn_2 = True

    def __init__(self, config: InternVLChatConfig, vision_model=None, language_model=None):
        super().__init__(config)
        
        image_size = config.force_image_size or config.vision_config.image_size
        patch_size = config.vision_config.patch_size
        self.patch_size = patch_size
        self.select_layer = config.select_layer
        self.template = config.template
        self.num_image_token = int((image_size // patch_size) ** 2 * (config.downsample_ratio ** 2))
        self.downsample_ratio = config.downsample_ratio
        self.ps_version = config.ps_version

        logger.info(f'num_image_token: {self.num_image_token}')
        logger.info(f'ps_version: {self.ps_version}')

        if vision_model is not None:
            self.vision_model = vision_model
        else:
            self.vision_model = InternVisionModel(config.vision_config)

        if language_model is not None:
            self.language_model = language_model
        else:
            if config.llm_config.architectures[0] == 'LlamaForCausalLM':
                self.language_model = LlamaForCausalLM(config.llm_config)
            elif config.llm_config.architectures[0] == 'InternLM2ForCausalLM':
                self.language_model = InternLM2ForCausalLM(config.llm_config)  # [2048, 92553]
            elif config.llm_config.architectures[0] == 'Phi3ForCausalLM':
                self.language_model = Phi3ForCausalLM(config.llm_config)
            elif config.llm_config.architectures[0] == 'Qwen2ForCausalLM':
                self.language_model = Qwen2ForCausalLM(config.llm_config)  # [151655, 896]
            else:
                raise NotImplementedError(f'{config.llm_config.architectures[0]} is not implemented.')
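
The `num_image_token` formula in the constructor is easier to follow with numbers plugged in. The values below (448-pixel tiles, a patch size of 14, and a downsample ratio of 0.5) are assumptions chosen for illustration. Read the real values from the model's config.

```python
def num_image_tokens(image_size: int, patch_size: int, downsample_ratio: float) -> int:
    """Same arithmetic as the constructor: patches per tile, then downsampling."""
    patches_per_side = image_size // patch_size
    return int(patches_per_side ** 2 * (downsample_ratio ** 2))


# Assumed example values, not read from any config file
tokens_per_tile = num_image_tokens(image_size=448, patch_size=14, downsample_ratio=0.5)

# With up to 6 tiles plus 1 thumbnail, one image can contribute this many tokens
max_tiles_with_thumbnail = 6 + 1
max_tokens_per_image = tokens_per_tile * max_tiles_with_thumbnail

print(tokens_per_tile, max_tokens_per_image)
```

Under these assumptions each tile contributes 32 * 32 = 1024 patches, reduced by a factor of 4 to 256 tokens, so image tokens can take a sizable share of the context window.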

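With weight conversion and the initial model code done, simple preprocessing is enough to test InternVL2. InternVL2 handles dynamic resolution particularly well, but its preprocessing is more complex than that of other VLMs and must match the reference exactly so that the later precision alignment goes smoothly. The preprocessing lives in paddlemix/datasets/internvl_dataset.py and includes a module that other VLMs do not have: dynamic_preprocess. It processes input images dynamically and splits them evenly into image_size * image_size tiles. Because the standard PaddleMIX preprocessing pipeline does not include this module, we add it to the visual preprocessing by hand and make sure the forward and backward passes are not affected by it.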

def dynamic_preprocess(self, image, min_num=1, max_num=6, image_size=448):
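    # calculate the existing image aspect ratio (PIL's image.size is (width, height))
    orig_width, orig_height = image.size
    aspect_ratio = orig_width / orig_height

    # enumerate candidate (columns, rows) tilings that use min_num..max_num tiles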
    target_ratios = set(
        (i, j)
        for n in range(min_num, max_num + 1)
        for i in range(1, n + 1)
        for j in range(1, n + 1)
        if i * j <= max_num and i * j >= min_num
    )
    target_ratios = sorted(target_ratios, key=lambda x: x[0] * x[1])

    # find the closest aspect ratio to the target
    target_aspect_ratio = self.find_closest_aspect_ratio(
        aspect_ratio, target_ratios, orig_width, orig_height, image_size
    )

    # calculate the target width and height
    target_width = image_size * target_aspect_ratio[0]
    target_height = image_size * target_aspect_ratio[1]
    blocks = target_aspect_ratio[0] * target_aspect_ratio[1]

    # resize the image
    resized_img = image.resize((target_width, target_height))
    processed_images = []

    for i in range(blocks):
        box = (
            (i % (target_width // image_size)) * image_size,
            (i // (target_width // image_size)) * image_size,
            ((i % (target_width // image_size)) + 1) * image_size,
            ((i // (target_width // image_size)) + 1) * image_size,
        )
        # split the image
        split_img = resized_img.crop(box)
        processed_images.append(split_img)

    assert len(processed_images) == blocks

    if self.use_thumbnail and len(processed_images) != 1:
        thumbnail_img = image.resize((image_size, image_size))
        processed_images.append(thumbnail_img)

    return processed_images
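
The function above relies on find_closest_aspect_ratio, which is not shown in this article. The sketch below is one plausible version, written as a standalone function that follows the upstream InternVL logic as we understand it, so treat the tie-breaking rule as an assumption. It also computes the crop boxes without needing PIL, and it sorts candidates deterministically, which the original code does not guarantee.

```python
def candidate_grids(min_num=1, max_num=6):
    """All (columns, rows) tilings whose tile count lies in [min_num, max_num]."""
    grids = {
        (cols, rows)
        for cols in range(1, max_num + 1)
        for rows in range(1, max_num + 1)
        if min_num <= cols * rows <= max_num
    }
    return sorted(grids, key=lambda grid: (grid[0] * grid[1], grid))


def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height, image_size):
    """Pick the grid whose cols/rows ratio is closest to the image's aspect ratio."""
    best_diff = float("inf")
    best_ratio = (1, 1)
    area = width * height
    for cols, rows in target_ratios:
        diff = abs(aspect_ratio - cols / rows)
        if diff < best_diff:
            best_diff = diff
            best_ratio = (cols, rows)
        elif diff == best_diff and area > 0.5 * image_size * image_size * cols * rows:
            # Assumed tie-break: prefer more tiles when the image is large enough
            best_ratio = (cols, rows)
    return best_ratio


def tile_boxes(cols, rows, image_size):
    """Crop boxes (left, top, right, bottom) in row-major order."""
    return [
        (
            (i % cols) * image_size,
            (i // cols) * image_size,
            (i % cols + 1) * image_size,
            (i // cols + 1) * image_size,
        )
        for i in range(cols * rows)
    ]


width, height, image_size = 800, 600, 448
grid = find_closest_aspect_ratio(width / height, candidate_grids(), width, height, image_size)
boxes = tile_boxes(grid[0], grid[1], image_size)
print(grid, len(boxes), boxes[0], boxes[-1])
```

For an 800 x 600 input, the closest grid is 3 columns by 2 rows, so the image is resized to 1344 x 896 and cut into six 448 x 448 tiles before the optional thumbnail is appended.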

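At this point the model code and a basic run test are essentially complete. The next stage is aligning preprocessing and model precision.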

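Precision Alignment

Precision alignment is the last step in adapting a model to PaddlePaddle. Before a stricter comparison, we can instantiate individual components for quick precision checks. Once the components pass, we combine the language model and vision encoder and align inference and training precision for the whole model.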

if __name__ == "__main__":
    model = Phi3DecoderLayer(Phi3Config(), 32)
    model = model.astype("float16")
    dummy_tensor = paddle.randn([1, 1024, 3072]).cuda()
    dummy_tensor = dummy_tensor.astype("float16")
    out = model(dummy_tensor)

if __name__ == "__main__":
    model = InternLM2DecoderLayer(InternLM2Config())
    model = model.astype("float16")
    dummy_tensor = paddle.randn([1, 1024, 4096]).cuda()
    dummy_tensor = dummy_tensor.astype("float16")
    out = model(dummy_tensor)

if __name__ == "__main__":
    model = InternVisionEncoder(InternVisionConfig())
    input = paddle.randn([1, 196, 3200])
    out = model(input)

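PaddlePaddle also provides a convenient comparison tool, PaDiff. Its auto_diff interface automatically aligns the inference precision of models built in different frameworks. If something is wrong, it reports where the precision diff first appears: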

from padiff import auto_diff
import torch
import paddle

class SimpleModule(torch.nn.Module):
    def __init__(self):
        super(SimpleModule, self).__init__()
        self.linear1 = torch.nn.Linear(100, 10)

    def forward(self, x):
        x = self.linear1(x)
        return x

class SimpleLayer(paddle.nn.Layer):
    def __init__(self):
        super(SimpleLayer, self).__init__()
        self.linear1 = paddle.nn.Linear(100, 10)

    def forward(self, x):
        x = self.linear1(x)
        return x

module = SimpleModule()
layer = SimpleLayer()

inp = paddle.rand((100, 100)).numpy().astype("float32")
inp = ({'x': torch.as_tensor(inp)},
       {'x': paddle.to_tensor(inp)})

auto_diff(module, layer, inp, atol=1e-4, auto_init=True)

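PaDiff is not limited to inference. It covers precision alignment for several execution modes and lets custom loss functions and optimizers take part, which greatly reduces the difficulty of the work. A run goes through these stages:

1. Weight copy (when the auto_weights parameter is set to True)

2. Forward and backward alignment

3. Weight and gradient alignment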

If every layer aligns during the run, the final analysis and report look like this:

[AutoDiff] Your options:
{
    atol: `0.0001`
    auto_init: `True`
    single_step: `False`
    use_loss: `False`
    use_opt: `False`
    rtol: `1e-07`
    compare_mode: `mean`
}

[AutoDiff] Assign weight success !!!

[AutoDiff] check cfg {'atol': 0.0001, 'rtol': 1e-07, 'compare_mode': 'mean'}

[AutoDiff] Checking report in /workspace/PaDiff/padiff_dump/SimpleModule(base_model)/auto_diff
and /workspace/PaDiff/padiff_dump/SimpleLayer(raw_model)/auto_diff

[AutoDiff] Check grads cfg: {'atol': 0.0001, 'rtol': 1e-07, 'compare_mode': 'mean'}

[AutoDiff] Checking grads in /workspace/PaDiff/padiff_dump/SimpleModule(base_model)/auto_diff
and /workspace/PaDiff/padiff_dump/SimpleLayer(raw_model)/auto_diff

[AutoDiff] grads compared.

[AutoDiff] SUCCESS !!!

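PaDiff also supports a finer-grained alignment mode. Small precision differences can accumulate, so the layer where a diff is reported may not be the layer where the error actually originates. For example, auto_diff finds a diff, but the log points at a common API such as Linear, and inspection shows that Linear itself has no diff. In that case, turn on single_step in the interface parameters and adjust atol, rtol, and related parameters while re-running the check until the layer where precision actually drifts is located.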
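
The effect of tuning tolerances can be seen in a toy example. The sketch below walks two lists of per-layer outputs in execution order and reports the first layer that exceeds an element-wise tolerance of atol + rtol * |reference|. This criterion and the layer names are assumptions made for illustration. It does not reproduce PaDiff's internal comparison.

```python
def first_mismatch(reference, candidate, atol=1e-4, rtol=1e-7):
    """Return the name of the first layer whose outputs differ beyond tolerance.

    Both arguments are lists of (layer_name, values) in execution order.
    Returns None when every layer is within tolerance.
    """
    for (name, ref_values), (_, cand_values) in zip(reference, candidate):
        for ref, cand in zip(ref_values, cand_values):
            if abs(cand - ref) > atol + rtol * abs(ref):
                return name
    return None


# Hypothetical per-layer outputs from the reference and the converted model
reference = [("embed", [0.10, 0.20]), ("attention", [0.30, 0.40]), ("mlp", [0.50, 0.60])]
candidate = [("embed", [0.10, 0.20]), ("attention", [0.30, 0.4003]), ("mlp", [0.52, 0.60])]

strict = first_mismatch(reference, candidate, atol=1e-4)   # flags the small drift
relaxed = first_mismatch(reference, candidate, atol=1e-3)  # skips it, finds the big one
loose = first_mismatch(reference, candidate, atol=1e-1)    # everything passes

print(strict, relaxed, loose)
```

A strict tolerance stops at the layer with a tiny, possibly harmless drift, while a slightly looser one points at the layer with the large error, which is the kind of adjustment the paragraph above describes.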

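Used well, PaDiff speeds up accurate conversion of models from other frameworks to PaddlePaddle 3.0, so that training and inference run on a model with consistent precision.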

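Conclusion

In summary, adding a model to PaddleMIX takes three steps. First, convert the weights so the model can be parsed and used in PaddlePaddle. Second, convert the model code with PaConvert, hand-adapting uncommon APIs and their implementations so that preprocessing is correct and the vision and language models are structurally complete and can run inference. Third, align precision so that inference and training results match the expected targets.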

Follow OpenGVLab for the latest news from the general vision team

🔗 Open-source homepage: https://github.com/OpenGVLab

📮 Official email: opengvlab@pjlab.org.cn