Post-Training Quantization and Conversion of PyTorch Models with Distiller

Article link: https://blog.youkuaiyun.com/gitblog_00617/article/details/148526910

Distiller project repository: https://gitcode.com/gh_mirrors/di/distiller

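Overview

Model quantization is an important way to improve inference performance when deploying deep learning models. This article walks through how to apply post-training quantization (PTQ) to a PyTorch model with Distiller, and how to convert the resulting model into a native PyTorch quantized model.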

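Quantization Background

Model quantization converts a floating-point model into a low-precision representation (usually 8-bit integers), which can significantly reduce model size and compute requirements. PyTorch has shipped built-in quantization support since version 1.3, while Distiller provides its own independent quantization implementation.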
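
To make the float-to-integer mapping concrete, here is a minimal pure-Python sketch of 8-bit affine quantization. The helper names `quantize` and `dequantize` are made up for this illustration and are not part of Distiller or PyTorch; the sketch assumes the common convention `real ≈ scale * (q - zero_point)`.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers with an affine (scale, zero-point) scheme."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range is widened so that it always contains 0.0.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]  # made-up values
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
print(q, scale, zp)
print(restored)
```

Each restored value differs from the original by at most about one quantization step (`scale`), which is the precision traded for the smaller representation.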

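Distiller's quantization implementation has the following characteristics:

- It simulates quantized operations in FP32
- It can run on the GPU
- It offers flexible quantization configuration options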

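The advantages of PyTorch native quantization are:

- Optimized 8-bit execution on the CPU
- Support for exporting to the GLOW framework
- Lighter runtime dependencies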

Environment Setup

First, import the required packages. Distiller, torchnet, and matplotlib must already be installed:

import torch
import matplotlib.pyplot as plt
import os
import math
import torchnet as tnt
from copy import deepcopy
from collections import OrderedDict

import distiller
from distiller.models import create_model
import distiller.quantization as quant

Creating and Quantizing the Model

Loading the pretrained model

model = create_model(pretrained=True, dataset='imagenet', arch='resnet18', parallel=True)

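Configuring the data loaders

Create separate data loaders for the GPU and the CPU, making sure both use the same subset of the test data. The snippet assumes that `dataset`, `arch`, `dataset_path`, and `subset_size` have already been defined (for example `dataset = 'imagenet'` and `arch = 'resnet18'`, matching the model above):

# Data loader for GPU evaluation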
batch_size_gpu = 256
num_workers_gpu = 10
_, _, test_loader_gpu, _ = distiller.apputils.load_data(
    dataset, arch, dataset_path, 
    batch_size_gpu, num_workers_gpu,
    effective_test_size=subset_size, fixed_subset=True, test_only=True)

# Data loader for CPU evaluation
batch_size_cpu = 44
num_workers_cpu = 10
_, _, test_loader_cpu, _ = distiller.apputils.load_data(
    dataset, arch, dataset_path, 
    batch_size_cpu, num_workers_cpu,
    effective_test_size=subset_size, fixed_subset=True, test_only=True)
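
Both loaders request a fixed subset so that the GPU and CPU runs are scored on exactly the same samples, even though their batch sizes differ. The sketch below illustrates that idea with a seeded index sample; `fixed_subset_indices` and `batches` are hypothetical helpers written for this example and do not reproduce Distiller's actual sampling code.

```python
import random


def fixed_subset_indices(dataset_size, fraction, seed=0):
    """Pick the same pseudo-random subset of sample indices on every call."""
    subset_len = int(dataset_size * fraction)
    rng = random.Random(seed)  # private generator, unaffected by global state
    return sorted(rng.sample(range(dataset_size), subset_len))


def batches(indices, batch_size):
    """Split the index list into consecutive mini-batches."""
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


# 50000 is the size of the ImageNet validation set; 0.1 is an arbitrary fraction.
indices_gpu = fixed_subset_indices(50000, 0.1)
indices_cpu = fixed_subset_indices(50000, 0.1)
gpu_batches = batches(indices_gpu, 256)
cpu_batches = batches(indices_cpu, 44)
print(len(indices_gpu), len(gpu_batches), len(cpu_batches))  # prints: 5000 20 114
```

Because the sample set is identical, any accuracy difference between the two runs comes from the model and backend rather than from the data.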

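Applying post-training quantization

Distiller's post-training quantizer is highly configurable. The example below uses 8-bit weights and activations and reads activation statistics from a YAML file, which is assumed to have been generated beforehand by a calibration run: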

quant_mode = {'activations': 'ASYMMETRIC_UNSIGNED', 'weights': 'SYMMETRIC'}
stats_file = "resnet18_quant_stats.yaml"  # activation statistics used for quantization
dummy_input = distiller.get_dummy_input(input_shape=model.input_shape)

quantizer = quant.PostTrainLinearQuantizer(
    deepcopy(model), bits_activations=8, bits_parameters=8, mode=quant_mode,
    model_activation_stats=stats_file, overrides=None
)
quantizer.prepare_model(dummy_input)
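
The `mode` dictionary above selects symmetric quantization for weights and asymmetric unsigned quantization for activations. As a rough sketch of what distinguishes the two, the functions below derive a scale and zero-point from min/max statistics. They are hypothetical helpers that assume the convention `real ≈ scale * (q - zero_point)`; Distiller's internal formulas may be written differently.

```python
def symmetric_params(min_val, max_val, num_bits=8):
    """Symmetric mode: the range is [-max_abs, max_abs] and the zero-point is 0."""
    max_abs = max(abs(min_val), abs(max_val))
    n = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    scale = max_abs / n if max_abs > 0 else 1.0
    return scale, 0


def asymmetric_unsigned_params(min_val, max_val, num_bits=8):
    """Asymmetric unsigned mode: [min, max] is mapped onto [0, 2^bits - 1]."""
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)  # keep 0 representable
    n = 2 ** num_bits - 1  # 255 for 8 bits
    scale = (max_val - min_val) / n if max_val > min_val else 1.0
    zero_point = round(-min_val / scale)
    return scale, zero_point


# Hypothetical statistics: weights centred on zero, activations after a ReLU.
w_scale, w_zp = symmetric_params(-0.5, 0.4)
a_scale, a_zp = asymmetric_unsigned_params(0.0, 6.0)
print(w_scale, w_zp)
print(a_scale, a_zp)
```

Symmetric mode wastes part of the integer range when the data is one-sided, which is why an unsigned asymmetric range is a natural fit for non-negative activations.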

Model Conversion and Performance Comparison

Converting to a native PyTorch quantized model

pyt_model = quantizer.convert_to_pytorch(dummy_input)
print('Distiller model device:', distiller.model_device(quantizer.model))
print('PyTorch model device:', distiller.model_device(pyt_model))

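Performance evaluation

We evaluate the model in three scenarios:

- The Distiller-quantized model on the GPU
- The Distiller-quantized model on the CPU
- The native PyTorch quantized model on the CPU

The evaluation function: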

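def eval_model(data_loader, model, device, print_freq=10):
    """Evaluate `model` on `data_loader`; return (top-1, top-5, mean loss)."""
    criterion = torch.nn.CrossEntropyLoss().to(device)
    loss = tnt.meter.AverageValueMeter()
    classerr = tnt.meter.ClassErrorMeter(accuracy=True, topk=(1, 5))

    model.eval()  # switch to evaluation mode
    for step, (inputs, target) in enumerate(data_loader):
        with torch.no_grad():
            inputs, target = inputs.to(device), target.to(device)
            output = model(inputs)
            # accumulate the loss and the top-1/top-5 accuracy
            loss.add(criterion(output, target).item())
            classerr.add(output.data, target)
        # report running results every `print_freq` mini-batches
        if (step + 1) % print_freq == 0:
            print('[{:3d}] Top1: {:.3f}  Top5: {:.3f}  Loss: {:.3f}'.format(
                step + 1, classerr.value(1), classerr.value(5), loss.mean), flush=True)
    print('Overall ==> Top1: {:.3f}  Top5: {:.3f}  Loss: {:.3f}'.format(
        classerr.value(1), classerr.value(5), loss.mean), flush=True)
    return classerr.value(1), classerr.value(5), loss.mean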
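
The meter in the evaluation function is configured with `accuracy=True` and `topk=(1, 5)`, so it reports top-1 and top-5 accuracy. For readers unfamiliar with the metric, here is a small pure-Python sketch of what top-k accuracy measures. `topk_accuracy` is a hypothetical helper and the scores are made-up numbers, not results from the model.

```python
def topk_accuracy(scores, targets, k):
    """Percentage of samples whose target is among the k highest-scoring classes."""
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += target in ranked[:k]
    return 100.0 * hits / len(targets)


# Toy scores for 4 samples over 6 classes.
scores = [
    [0.1, 0.9, 0.0, 0.0, 0.0, 0.0],  # target 1 is ranked first: top-1 hit
    [0.5, 0.4, 0.3, 0.2, 0.1, 0.0],  # target 4 is ranked fifth: top-5 hit only
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],  # target 0 is ranked last: miss
    [0.2, 0.1, 0.7, 0.0, 0.0, 0.0],  # target 2 is ranked first: top-1 hit
]
targets = [1, 4, 0, 2]
top1 = topk_accuracy(scores, targets, 1)
top5 = topk_accuracy(scores, targets, 5)
print(top1, top5)  # prints: 50.0 75.0
```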

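Technical Details

Differences between the quantized layer implementations

Input quantization:
- Distiller quantizes the inputs inside each quantized module
- PyTorch native quantized modules assume their inputs are already quantized
ReLU fusion:
- Distiller marks a fused ReLU with the clip_half_range attribute
- PyTorch uses a dedicated module type, shown as QuantizedConvReLU2d when the model is printed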
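
One way to see why a fused ReLU only needs a flag on the quantizer: if the output range starts at zero, saturating to the unsigned integer grid already discards negative values. The sketch below checks this on made-up numbers. The function names are hypothetical, and it illustrates the half-range clipping idea rather than Distiller's actual code.

```python
def relu(x):
    return max(x, 0.0)


def quantize_unsigned(x, scale, num_bits=8):
    """Quantize to [0, 2^bits - 1] with zero-point 0, so negatives saturate to 0."""
    q = round(x / scale)
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale):
    return q * scale


scale = 6.0 / 255  # hypothetical output range [0, 6]
conv_outputs = [-3.2, -0.01, 0.0, 1.5, 5.9]  # made-up pre-activation values

# Path A: explicit ReLU, then quantize.  Path B: quantize only.
with_relu = [quantize_unsigned(relu(v), scale) for v in conv_outputs]
fused = [quantize_unsigned(v, scale) for v in conv_outputs]
print(with_relu == fused)  # prints: True
print([round(dequantize(q, scale), 3) for q in fused])
```

Both paths produce the same integers, so the separate ReLU can be dropped once the quantization range is restricted to the non-negative half.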

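Automatic quantize/de-quantize handling

Distiller wraps every quantized module with an input quantization step and an output de-quantization step. During conversion to PyTorch, redundant quantize/de-quantize operations are removed and kept only where they are needed:

# Example: inspect the structure of a ResNet basic block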
print(pyt_model.layer2[0])
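
The removal of redundant conversions can be pictured as a simple pass over the operation sequence: a de-quantize that is immediately followed by a quantize cancels out. This is a simplified sketch over a hypothetical linear list of op names, not the converter's real graph logic.

```python
def remove_redundant_conversions(ops):
    """Drop every 'dequant' that is immediately followed by a 'quant'."""
    result = []
    for op in ops:
        if op == "quant" and result and result[-1] == "dequant":
            result.pop()  # the pair cancels out: data stays quantized
        else:
            result.append(op)
    return result


# Hypothetical op sequence in which every layer quantizes its input and
# de-quantizes its output.
distiller_style = [
    "quant", "conv1", "dequant",
    "quant", "conv2", "dequant",
    "quant", "fc", "dequant",
]
converted = remove_redundant_conversions(distiller_style)
print(converted)  # prints: ['quant', 'conv1', 'conv2', 'fc', 'dequant']
```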

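Mixed-precision support

Specific layers can be kept in FP32, and the required quantize/de-quantize operations are inserted automatically. A dictionary such as the following is passed through the `overrides` argument of `PostTrainLinearQuantizer`: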

overrides = OrderedDict(
    [('layer2.0.downsample.0', OrderedDict([('bits_activations', None), ('bits_weights', None)]))]
)
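
When one layer stays in FP32, conversions are needed exactly where the precision changes. The sketch below plans those conversions for a hypothetical linear chain of layers. `plan_conversions` is an illustrative helper that assumes FP32 network input and output; it is not how Distiller or PyTorch implement this internally.

```python
def plan_conversions(layers, fp32_layers):
    """Insert 'quant'/'dequant' ops wherever the precision changes between layers."""
    plan = []
    quantized = False  # the network input is assumed to be FP32
    for name in layers:
        wants_quantized = name not in fp32_layers
        if wants_quantized and not quantized:
            plan.append("quant")
        elif not wants_quantized and quantized:
            plan.append("dequant")
        plan.append(name)
        quantized = wants_quantized
    if quantized:
        plan.append("dequant")  # hand an FP32 result back to the caller
    return plan


# Hypothetical chain of layers; only the middle one is kept in FP32.
layers = ["conv1", "downsample", "conv2"]
plan = plan_conversions(layers, fp32_layers={"downsample"})
print(plan)
# prints: ['quant', 'conv1', 'dequant', 'downsample', 'quant', 'conv2', 'dequant']
```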

Saving and Loading the Model

Saving the quantized model

distiller.apputils.save_checkpoint(0, 'resnet18', quantizer.model)

Loading and converting the quantized model

loaded_model = create_model(False, dataset='imagenet', arch='resnet18', parallel=True)
loaded_model = distiller.apputils.load_lean_checkpoint(loaded_model, 'checkpoint.pth.tar')
loaded_pyt_model = distiller.quantization.convert_distiller_ptq_model_to_pytorch(loaded_model, dummy_input)