Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

This post reviews a quantization method that uses integer-only arithmetic to make neural network inference more efficient. Through a specific quantization scheme, integer-only matrix multiplication, and a matching training procedure, the quantized model retains high accuracy while reducing inference latency. Experiments show that the method is particularly effective for MobileNets, preserving accuracy while speeding up on-device inference.

Abstract

The authors propose a quantization scheme whose inference uses integer arithmetic only, which is more efficient than floating-point arithmetic. They also propose a matching training procedure that preserves accuracy after quantization. The method improves the trade-off between accuracy and on-device latency, and it can be applied to MobileNets.

1 introduction

The authors summarize the two mainstream approaches for running large neural networks on resource-constrained mobile devices: 1. design new, smaller network architectures, e.g., MobileNets, SqueezeNet, ShuffleNet, and DenseNet; 2. quantize 32-bit or 64-bit weights and/or activations down to lower-precision representations, e.g., 8-bit.

In this paper, the authors tackle the problem by improving the trade-off between accuracy and on-device latency for MobileNets. Their main contributions are:

1. A quantization scheme: quantize all weights and activations to 8-bit integers, keeping only a few parameters in 32-bit, e.g., the bias vectors;

2. A quantized inference framework: runs efficiently on hardware that supports only integer arithmetic, e.g., Qualcomm Hexagon;

3. A quantized training framework: co-designed with the quantized inference framework to reduce the accuracy loss introduced by quantization;

4. An improved trade-off between accuracy and on-device latency for MobileNets on ARM CPUs.

2 quantized inference

2.1 quantization scheme

The quantization scheme is an affine mapping from an integer q to a real number r:

r = S(q - Z)

Here the constants S (scale) and Z (zero-point) are the quantization parameters: S is a positive real number, and Z is an integer of the same type as q, i.e., the quantized value that represents the real number 0. All values within a single weights array or activations array share the same quantization parameters.
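To make this concrete, here is a minimal sketch (my own illustration, not code from the paper; the helper names are hypothetical) of 8-bit affine quantization and dequantization following r = S(q - Z):

```python
import numpy as np

def choose_qparams(r_min, r_max, num_bits=8):
    """Pick S and Z so that [r_min, r_max] maps onto [0, 2^num_bits - 1].

    The range is stretched to include 0.0 so that the real value zero is
    exactly representable by an integer Z, as the paper's scheme requires.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    r_min, r_max = min(r_min, 0.0), max(r_max, 0.0)
    scale = (r_max - r_min) / (qmax - qmin)
    zero_point = int(round(qmin - r_min / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(r, scale, zero_point, num_bits=8):
    """Real-valued array -> uint8 array: q = clamp(round(r / S) + Z)."""
    q = np.round(r / scale) + zero_point
    return np.clip(q, 0, 2 ** num_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """uint8 array -> real-valued array: r = S * (q - Z)."""
    return scale * (q.astype(np.float32) - zero_point)

# Round-trip example: all values in one array share the same (S, Z).
r = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
scale, zp = choose_qparams(r.min(), r.max())
print(dequantize(quantize(r, scale, zp), scale, zp))  # close to r
```

Note that 0.0 round-trips exactly (it maps to Z and back), which matters because zero-padding is pervasive in convolutional networks.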

### PyTorch Quantization Aware Training (QAT) with YOLOv8 Implementation

#### Overview of QAT in PyTorch

Quantization Aware Training (QAT) simulates the effects of quantization during training, allowing models to be trained directly for lower-precision inference. This helps mitigate the accuracy loss incurred when converting floating-point models to integer representations such as INT8[^2]. Implementing QAT for YOLOv8 in PyTorch involves the following steps.

#### Preparation Steps Before Applying QAT to the YOLOv8 Model

Before applying QAT, make sure the environment includes the necessary libraries, such as `torch` and `torchvision`, in versions compatible with your hardware and software stack.

Also verify that the model architecture supports QAT. Some layers cannot be quantized directly, so modifications may be needed before proceeding.

```python
import torch
from ultralytics import YOLO

model = YOLO('yolov8n.pt')  # load the pre-trained YOLOv8 nano model
net = model.model           # the underlying nn.Module inside the wrapper
```

#### Configuring the Model for QAT

To prepare the model for QAT, configure it according to the PyTorch quantization documentation[^1]. The configuration attaches observers that collect statistics on activations and weights during forward passes. Eligible module sequences (e.g., Conv + BN) should be fused *before* preparing for QAT:

```python
net.train()

# Fuse eligible Conv + BN sequences first. The module names below are
# illustrative: the actual names depend on the YOLOv8 architecture and
# must be taken from net.named_modules().
# torch.quantization.fuse_modules(net, [['conv', 'bn']], inplace=True)

net.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(net, inplace=True)
```

#### Fine-Tuning Process During the QAT Phase

During the fine-tuning phase under QAT, continue training while periodically validating against a held-out dataset. Adjust the learning rate carefully: aggressive changes can hurt the convergence behavior observed before quantization constraints were applied.

Monitor the original float32 evaluation scores alongside their INT8 counterparts, which are produced by the simulated low-bit operations of the fake-quant modules inserted throughout the network.

```python
# train_one_epoch and validate are user-supplied helpers; num_epochs, device,
# and the data loaders are assumed to be defined elsewhere. A real YOLOv8
# fine-tune would use the detection loss rather than CrossEntropyLoss.
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    train_one_epoch(net, criterion, optimizer, data_loader_train, device=device)
    validate(net, criterion, data_loader_val, device=device)
```
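It can also be useful to materialize a true INT8 model for CPU inference. A minimal sketch, assuming the `net` module prepared above; `torch.quantization.convert` replaces the fake-quant observers with real quantized operators:

```python
# Swap fake-quant modules for actual int8 kernels (CPU backends such as fbgemm).
net_int8 = torch.quantization.convert(net.eval(), inplace=False)
```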
#### Exporting Post-QAT Trained Models into ONNX Format

Once the accuracy is satisfactory after a sufficient number of epochs, export the optimized graph together with the learned parameters for deployment on platforms that execute efficiently with reduced bit-width arithmetic units. The exported file should make explicit how each tensor is converted between floating-point and integer representations at runtime boundaries, so that it stays interoperable across the tools used from development through production.

```python
dummy_input = torch.randn(1, 3, 640, 640).to(device)  # YOLOv8's 640x640 input

torch.onnx.export(
    net.eval(),
    dummy_input,
    'qat_yolov8.onnx',
    opset_version=13,
    do_constant_folding=True,
    input_names=['input'],
    output_names=['output'],
)
```

#### Best Practices When Implementing QAT on YOLOv8

A few best practices make it easier to apply QAT successfully and to use the limited compute and power budgets of edge devices (mobile phones, cameras, drones, etc.) effectively:

- **Start Simple**: Begin with the smaller variants, e.g., the nano version, before moving on to larger ones.
- **Validate Early & Often**: Regularly check intermediate results to ensure there is no significant accuracy drop compared with the floating-point baseline.
- **Adjust Learning Rate Carefully**: Decrease the step size gradually, especially toward the end of training; abrupt changes can cause instability. A systematic decay schedule helps keep the whole QAT procedure stable.