PyTorch Quantization Tutorial: Understanding BackendConfig in Depth

Project repository (PyTorch tutorials mirror): https://gitcode.com/gh_mirrors/tuto/tutorials

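Introduction

Quantization is one of the main techniques for improving inference performance when deploying deep learning models. PyTorch ships a full quantization toolchain, and the BackendConfig API is the bridge between that toolchain and a specific hardware backend. This article explains how to use BackendConfig to describe the quantization support of a custom backend.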

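BackendConfig Basics

BackendConfig is a key component of the PyTorch quantization framework. It lets developers:

Define the quantized operator patterns that the backend supports
Specify data type configurations (DTypeConfig)
Configure operator fusion patterns
Set the observation type for each quantized pattern

At present BackendConfig is mainly supported by FX graph mode quantization; support may be extended to other quantization modes in the future.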

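Hands-on: Configuring Quantization for a Custom Backend

Suppose we have a custom backend that supports only two quantized operations: quantized linear and quantized Conv-ReLU. The steps below show how to configure it.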

1. Define the reference patterns

For quantized linear, the backend expects the following reference pattern:

[dequant - fp32_linear - quant]

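To obtain this pattern, quant-dequant pairs are first inserted at the start and end of the op, which gives the complete reference model. The backend can then lower the bracketed part into a single quantized op: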

quant1 - [dequant1 - fp32_linear - quant2] - dequant2

Similarly, for the Conv-ReLU combination the reference model is:

quant1 - [dequant1 - fp32_conv_relu - quant2] - dequant2
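
To make the reference pattern concrete, the following minimal sketch walks through the arithmetic of quant1 - [dequant1 - fp32_linear - quant2] - dequant2 in plain Python. It assumes per-tensor affine quantization to the quint8 range. The helper names (`quantize`, `dequantize`, `reference_linear`) and all scale and zero-point values are made up for illustration; they are not PyTorch APIs and this is not how FX inserts the ops.

```python
# Illustrative numerics only: per-tensor affine quantization in plain Python.
# All helper names and qparams below are hypothetical, not PyTorch APIs.

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an integer code, clamped to [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map an integer code back to its float value."""
    return (q - zero_point) * scale

def reference_linear(q_in, weight, bias, in_qparams, out_qparams):
    """The bracketed part of the pattern: dequant1 - fp32_linear - quant2."""
    x = [dequantize(q, *in_qparams) for q in q_in]       # dequant1
    y = sum(w * v for w, v in zip(weight, x)) + bias     # fp32_linear (single output unit)
    return quantize(y, *out_qparams)                     # quant2

in_qparams = (0.05, 0)     # (scale, zero_point) for the input, chosen arbitrarily
out_qparams = (0.1, 128)   # (scale, zero_point) for the output, chosen arbitrarily

x_fp32 = [1.0, 2.0]
q_in = [quantize(v, *in_qparams) for v in x_fp32]        # quant1
q_out = reference_linear(q_in, weight=[0.5, -0.25], bias=0.2,
                         in_qparams=in_qparams, out_qparams=out_qparams)
y_fp32 = dequantize(q_out, *out_qparams)                 # dequant2
```

A backend that lowers the bracketed part replaces `reference_linear` with one integer kernel that consumes `q_in` and produces `q_out` directly.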

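2. Configure data type constraints

We define a DTypeConfig that specifies the supported data types, together with constraints on the quantization parameters: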

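# Imports used throughout this walkthrough
import torch
from torch.ao.quantization import MinMaxObserver, QConfig, QConfigMapping, default_weight_observer
from torch.ao.quantization.backend_config import (
    BackendConfig, BackendPatternConfig, DTypeConfig, DTypeWithConstraints, ObservationType,
)
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

# quint8 activations: quant_min >= 0, quant_max <= 255, scale >= 2 ** -12
quint8_with_constraints = DTypeWithConstraints(
    dtype=torch.quint8,
    quant_min_lower_bound=0,
    quant_max_upper_bound=255,
    scale_min_lower_bound=2 ** -12,
)

# Constrained quint8 inputs and outputs, qint8 weights, float bias
weighted_int8_dtype_config = DTypeConfig(
    input_dtype=quint8_with_constraints,
    output_dtype=quint8_with_constraints,
    weight_dtype=torch.qint8,
    bias_dtype=torch.float)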

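These constraints require quint8 activations to use a quantization range within [0, 255] and a scale of at least 2 ** -12. Weights are qint8 and the bias stays in float.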
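
The effect of these bounds can be sketched in plain Python. The check below is a simplified model of how an observer's settings are compared against a `DTypeWithConstraints`. The `Constraints` class and the `satisfies` function are hypothetical stand-ins written for this article, not PyTorch's internal implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraints:
    """Hypothetical stand-in mirroring the DTypeWithConstraints fields used above."""
    quant_min_lower_bound: int
    quant_max_upper_bound: int
    scale_min_lower_bound: float

def satisfies(quant_min, quant_max, eps, c):
    """Return True if the observer settings stay within the backend's bounds.

    eps is the observer's minimum scale, so it must be at least
    scale_min_lower_bound for every computed scale to be acceptable.
    """
    return (
        quant_min >= c.quant_min_lower_bound
        and quant_max <= c.quant_max_upper_bound
        and eps >= c.scale_min_lower_bound
    )

quint8_bounds = Constraints(0, 255, 2 ** -12)

ok = satisfies(quant_min=0, quant_max=127, eps=2 ** -12, c=quint8_bounds)
eps_too_small = satisfies(quant_min=0, quant_max=255, eps=2 ** -20, c=quint8_bounds)
range_too_wide = satisfies(quant_min=-128, quant_max=127, eps=2 ** -12, c=quint8_bounds)
```

Under this model, the observer settings used in step 5 (quant_min=0, quant_max=127, eps=2 ** -12) pass the check, while a smaller eps or a range that dips below zero does not.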

3. Configure Conv-ReLU fusion

Conv and ReLU are separate modules in the original model, so they must be fused first:

def fuse_conv2d_relu(is_qat, conv, relu):
    # is_qat is part of the fuser method signature; it is not needed here
    return torch.ao.nn.intrinsic.ConvReLU2d(conv, relu)
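
As a quick sanity check, the fuser method can be called directly. The sketch below repeats the function so that it runs on its own; the input shape is arbitrary. It shows that the fused module wraps the original Conv2d and ReLU and computes the same result as applying them one after the other.

```python
import torch
import torch.ao.nn.intrinsic  # make the submodule import explicit

def fuse_conv2d_relu(is_qat, conv, relu):
    # is_qat is part of the fuser signature but unused in this simple case.
    return torch.ao.nn.intrinsic.ConvReLU2d(conv, relu)

conv = torch.nn.Conv2d(3, 3, 3)
relu = torch.nn.ReLU()
fused = fuse_conv2d_relu(False, conv, relu)

x = torch.rand(1, 3, 8, 8)  # arbitrary NCHW input
with torch.no_grad():
    same_output = torch.equal(fused(x), relu(conv(x)))
```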

4. Assemble the complete BackendConfig

We can now combine all of the pieces:

# Linear configuration
linear_config = BackendPatternConfig() \
    .set_pattern(torch.nn.Linear) \
    .set_observation_type(ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT) \
    .add_dtype_config(weighted_int8_dtype_config) \
    .set_root_module(torch.nn.Linear) \
    .set_qat_module(torch.ao.nn.qat.Linear) \
    .set_reference_quantized_module(torch.ao.nn.quantized.reference.Linear)

# Conv-ReLU fusion configuration
conv_relu_config = BackendPatternConfig() \
    .set_pattern((torch.nn.Conv2d, torch.nn.ReLU)) \
    .set_fused_module(torch.ao.nn.intrinsic.ConvReLU2d) \
    .set_fuser_method(fuse_conv2d_relu)

# Quantized (fused) ConvReLU configuration
fused_conv_relu_config = BackendPatternConfig() \
    .set_pattern(torch.ao.nn.intrinsic.ConvReLU2d) \
    .set_observation_type(ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT) \
    .add_dtype_config(weighted_int8_dtype_config) \
    .set_root_module(torch.nn.Conv2d) \
    .set_qat_module(torch.ao.nn.intrinsic.qat.ConvReLU2d) \
    .set_reference_quantized_module(torch.ao.nn.quantized.reference.Conv2d)

# Combine everything into the final BackendConfig
backend_config = BackendConfig("my_backend") \
    .set_backend_pattern_config(linear_config) \
    .set_backend_pattern_config(conv_relu_config) \
    .set_backend_pattern_config(fused_conv_relu_config)

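5. Configure the QConfigMapping

The QConfig must satisfy the backend's data type constraints; a pattern whose QConfig falls outside them is not quantized: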

activation_observer = MinMaxObserver.with_args(quant_min=0, quant_max=127, eps=2 ** -12)
qconfig = QConfig(activation=activation_observer, weight=default_weight_observer)

qconfig_mapping = QConfigMapping() \
    .set_object_type(torch.nn.Linear, qconfig) \
    .set_object_type(torch.nn.Conv2d, qconfig) \
    .set_object_type(torch.nn.BatchNorm2d, qconfig) \
    .set_object_type(torch.nn.ReLU, qconfig)
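
The role of `eps` is easiest to see on data with a very narrow range. The sketch below instantiates a `MinMaxObserver` with the same arguments as above and feeds it a made-up tensor. Without the `eps` floor the computed scale would be far below 2 ** -12; with it, the scale stays within the backend's bound.

```python
import torch
from torch.ao.quantization import MinMaxObserver

# Same settings as the activation observer above, instantiated directly.
observer = MinMaxObserver(quant_min=0, quant_max=127, eps=2 ** -12)

# A deliberately narrow activation range (illustrative data only).
observer(torch.tensor([0.0, 1e-6]))

# The observer clamps the scale from below by eps.
scale, zero_point = observer.calculate_qparams()
```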

6. Quantize the model

Define a simple model and quantize it:

class MyModel(torch.nn.Module):
    def __init__(self, use_bn: bool):
        super().__init__()
        self.linear = torch.nn.Linear(10, 3)
        self.conv = torch.nn.Conv2d(3, 3, 3)
        self.bn = torch.nn.BatchNorm2d(3)
        self.relu = torch.nn.ReLU()
        self.sigmoid = torch.nn.Sigmoid()
        self.use_bn = use_bn

    def forward(self, x):
        x = self.linear(x)
        x = self.conv(x)
        if self.use_bn:
            x = self.bn(x)
        x = self.relu(x)
        x = self.sigmoid(x)
        return x

# Quantization flow
example_inputs = (torch.rand(1, 3, 10, 10, dtype=torch.float),)
model = MyModel(use_bn=False)
prepared = prepare_fx(model, qconfig_mapping, example_inputs, backend_config=backend_config)
prepared(*example_inputs)  # calibrate
converted = convert_fx(prepared, backend_config=backend_config)
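
The shape of `example_inputs` may look unusual for a model that begins with a Linear layer. The short trace below uses standalone layers with the same sizes as `MyModel`, purely for illustration. It shows why a (1, 3, 10, 10) input works: Linear acts on the last dimension, and the result still has 3 channels for the Conv2d that follows.

```python
import torch

x = torch.rand(1, 3, 10, 10)

# Linear(10, 3) maps the last dimension from 10 to 3.
after_linear = torch.nn.Linear(10, 3)(x)

# Conv2d(3, 3, 3) sees 3 channels with a 10 x 3 spatial extent; a 3x3 kernel
# without padding shrinks each spatial dimension by 2.
after_conv = torch.nn.Conv2d(3, 3, 3)(after_linear)

print(tuple(after_linear.shape), tuple(after_conv.shape))
```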