[Practical Super-Resolution Network Showcase] "Real-Time Quantized Image Super-Resolution on Mobile NPUs, Mobile AI 2021 Challenge"

Article link: https://blog.youkuaiyun.com/u010087277/article/details/117065584

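Abstract: Image super-resolution is one of the most popular computer vision problems, with many important applications on mobile devices. Although many solutions have been proposed for this task, they usually cannot be used directly on smartphone AI hardware, let alone on more constrained smart TV platforms, which often support only INT8 inference. To address this problem, we introduce the first Mobile AI challenge, whose target is to develop an end-to-end, learning-based image super-resolution solution that achieves real-time performance on mobile or edge NPUs. To this end, all participants were provided with the DIV2K dataset and trained a quantized model to perform efficient 3x image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board, which has a dedicated NPU capable of accelerating neural networks. The proposed solutions are required to be compatible with all major mobile AI accelerators and to reconstruct Full HD images in 40-60 ms while producing high-fidelity results. This paper describes in detail all models proposed in the challenge.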

1. Introduction

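Image super-resolution is a classic computer vision problem whose goal is to reconstruct the original image from a downsampled version, restoring lost high-frequency information and rich texture details. Over the past few years it has become increasingly popular, because it applies directly to smartphone camera image processing, to enhancing low-resolution media, and to upscaling images and videos to the target resolution of a display panel. Many classical and deep learning-based methods have been proposed for this problem. Their biggest limitation is that they mainly aim at high fidelity scores and are not adapted to computational efficiency and mobile-related constraints, which are critical for anyone who wants to implement super-resolution or related image processing and enhancement tasks on mobile devices. In this challenge we take a step further in addressing this problem: we use the popular DIV2K image super-resolution dataset and impose additional efficiency-related constraints on the solutions.

When it comes to deploying AI-based solutions on mobile devices, designers must take the particularities of mobile NPUs and DSPs into account to design efficient models. Papers [29, 27] provide an extensive overview of smartphone AI accelerators and their performance. According to the results reported there, the latest mobile NPUs are already approaching the results of mid-range desktop GPUs released not long ago. However, two major issues still prevent neural networks from being deployed directly on mobile devices: (a) limited RAM, and (b) limited and often inefficient support for many common deep learning layers and operators. These two problems make it infeasible to process high-resolution data with standard neural network models. Each architecture therefore has to be carefully adapted to the constraints of the particular mobile AI hardware. Such optimizations include network pruning and compression [8, 23, 40, 42, 45], 16-bit/8-bit [8, 38, 37, 58] and low-bit [7, 54, 34, 43] quantization, device- or NPU-specific adaptations, and platform-aware neural architecture search [15, 47, 56, 55].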

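Although many challenges and works targeting efficient deep learning models have been proposed recently, the resulting solutions are usually evaluated on desktop CPUs or GPUs, which, given the analysis above, makes them impractical. To address this problem, we introduce the first Mobile AI challenge, in which all deep learning solutions are run and evaluated on real mobile devices. In this competition, participating teams trained their 3x super-resolution models on the DIV2K dataset, which contains a diverse set of 2K-resolution RGB images. More importantly, since many mobile and smart TV platforms can accelerate only INT8 models, all submitted solutions had to be fully quantized. During the challenge, all participants evaluated the runtime of their models and tuned them on the Synaptics Dolphin platform, which includes a dedicated NPU that can efficiently accelerate INT8 neural networks. The final score of each submission is based on both runtime and fidelity results, so that submissions balance image restoration quality against model efficiency. Finally, all proposed solutions must be fully compatible with the TensorFlow Lite framework, so they can be deployed and accelerated on any mobile platform that provides AI acceleration through the Android Neural Networks API (NNAPI) or custom TFLite delegates.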

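This competition is part of the MAI 2021 Workshop and Challenges, which also includes challenges on learned smartphone ISP [18], camera image denoising [17], real-time video super-resolution [25], single-image depth estimation [21] and camera scene detection [22].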

2. Challenge

2.1 Local Runtime Evaluation

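When developing an AI solution for mobile devices, it is important to be able to test the designed models and debug any emerging issues locally on available devices. For this purpose, participants were provided with the AI Benchmark application, which can load any custom TensorFlow Lite model and run it on any Android device with all supported acceleration options. The tool includes the latest versions of the Android NNAPI, TFLite GPU, Hexagon NN, Samsung Eden and MediaTek Neuron delegates, and therefore supports all current mobile platforms, letting users run neural networks on smartphone NPUs, APUs, DSPs, GPUs and CPUs.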

To load and run a custom TensorFlow Lite model, participants follow these steps:

1. Download AI Benchmark from the official website or Google Play and run its standard tests.

2. After the tests finish, enter PRO Mode and select the Custom Model tab.

3. Rename the exported TFLite model to model.tflite and put it in the device's Download folder.

4. Select the model type (INT8, FP16 or FP32) and the desired acceleration and inference options, then run the model.

These steps are illustrated in Figure 2.

2.3 Runtime Evaluation on the Target Platform

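In this challenge, we use the Synaptics VS680 Edge AI SoC [35] Evaluation Kit as the target runtime evaluation platform. The VS680 Edge AI SoC is integrated into smart home solutions and features a powerful NPU designed by VeriSilicon that can accelerate quantized models (up to 7 TOPS). It supports Android and can run NN inference through NNAPI, achieving INT8 AI Benchmark scores close to those of mid-range smartphone chipsets. Participants could upload their TFLite models to an external server and receive feedback on their speed: either the inference time of the model on the above NPU, or an error log if the submitted network contains an incompatible operation or improper quantization. The models were parsed and accelerated using the Synaptics TFLite delegate. The delegate parses the model dynamically, mapping its high-level representation of neural network layers and operations to an equivalent internal binary representation that can be integrated and executed efficiently on the VS680's NPU. The same setup was used in the final runtime evaluation phase. Participants were additionally provided with a list of the ops supported by the board and with model optimization guidance for fully utilizing the NPU's convolution and tensor processing resources.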

3. Challenge Results

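Of the 180 registered participants, 12 teams entered the final phase and submitted their results, TFLite models, code, executables and factsheets. Table 1 summarizes the final challenge results and reports the PSNR, SSIM and runtime of each valid solution on the final test set and the target evaluation platform. The proposed methods are described in Section 4, and the team members and affiliations are listed in Appendix A.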

3.1 Results and Discussion

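The problem considered in this challenge is very hard, since the proposed solutions have to run on the target smart home platform and be fully quantized. Although many works propose different weight quantization techniques, mainstream approaches still use floating-point activations, in which case the resulting model is incompatible with an INT8 NPU. The whole point of network quantization is then lost, because no practical benefit is obtained on real hardware. Participants in this competition therefore had to quantize the entire model, including inputs, weights, convolutions and activations; networks using floating-point ops were not accepted. As Table 1 shows, only 6 teams were able to outperform the simple bicubic upsampling baseline. For all other teams, the accuracy loss after quantization was dramatic and the models produced only corrupted outputs. A major issue for most participants was having to do without a floating-point output dequantization layer: without this layer, the quantized outputs are often normalized incorrectly, which gives the network outputs the wrong scale. A simple way to handle this is to add a clipped ReLU layer on top of the model so that the outputs are mapped linearly to the [0, 255] range. Quantization-aware training also greatly helps in obtaining good fidelity results.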
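The effect of the clipped ReLU can be illustrated with a small NumPy sketch. It is a simplified model of affine 8-bit quantization, not the TFLite implementation: the helper names, the sample values and the min/max calibration rule are assumptions chosen for illustration only.

```python
import numpy as np

def clipped_relu(x, max_value=255.0):
    """Clamp activations to [0, max_value], i.e. the valid pixel range."""
    return np.clip(x, 0.0, max_value)

def quantize_uint8(x, lo, hi):
    """Affine quantization of x to uint8 for a calibrated range [lo, hi]."""
    scale = (hi - lo) / 255.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

# Hypothetical raw outputs of the last convolution, with out-of-range values.
raw = np.array([-40.0, 0.0, 127.4, 255.0, 310.0])

# Without clipping, the calibrated range follows the outliers, so the stored
# integers no longer coincide with pixel intensities.
q_raw, scale_raw, zp_raw = quantize_uint8(raw, raw.min(), raw.max())

# With a clipped ReLU on top, the range is exactly [0, 255]: scale 1, offset 0.
clipped = clipped_relu(raw)
q_clip, scale_clip, zp_clip = quantize_uint8(clipped, 0.0, 255.0)

print("unclipped:", q_raw, scale_raw, zp_raw)
print("clipped:  ", q_clip, scale_clip, zp_clip)
```

In the clipped case every stored integer can be read directly as a pixel value, which is the property a fully quantized model needs when no floating-point layer follows to rescale the output.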

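Most of the proposed solutions are highly efficient, upscaling 640x360 input images to Full HD resolution in 60-80 ms on the target Synaptics VS680 board. Since not all TFLite operations are supported equally well on the target platform, participants had to rely on the recommended set of operations when building their models in order to maximize NPU utilization. Team Aselsan Research is the winner of the challenge: the authors achieved high fidelity and a good runtime with a relatively small model that uses grouped convolutions. Team MCG obtained even better results but, unfortunately, managed to resolve all of its model quantization issues only at the very end of the competition. This team, Noah TerminalVision and ALONG share the same idea of using convolutions or residual blocks to learn only the super-resolution residual. The quantized models from EmbededAI and mju_gogogo show only a marginal improvement over the bicubic baseline. The remaining teams could not overcome the accuracy loss caused by model quantization, even though their original floating-point networks showed quite good fidelity scores.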
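As a quick sanity check on the figures quoted above, the following sketch computes the output resolution for a 3x upscale of a 640x360 input and converts the 60-80 ms runtimes into frames per second. The variable and function names are illustrative only.

```python
# Input resolution and upscaling factor quoted in the text.
LR_WIDTH, LR_HEIGHT = 640, 360
SCALE = 3

hr_width, hr_height = LR_WIDTH * SCALE, LR_HEIGHT * SCALE

def frames_per_second(runtime_ms):
    """Throughput if images are processed back to back at the given latency."""
    return 1000.0 / runtime_ms

print(f"output resolution: {hr_width}x{hr_height}")
print(f"throughput: {frames_per_second(80):.1f} to {frames_per_second(60):.1f} fps")
```

A 3x upscale of 640x360 gives exactly 1920x1080, and a latency of 60-80 ms corresponds to roughly 12.5 to 16.7 images per second.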

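At the beginning of the runtime validation phase, most submissions either crashed on the target Synaptics platform or needed several seconds to process a single image at the target resolution. Most teams needed several iterations to arrive at a solution that could be accelerated by the NPU, even though most of the initial models were very lightweight and ran fast on desktop CPUs and GPUs. This shows that runtimes measured on common deep learning hardware are not representative when it comes to deploying models on mobile AI silicon. Given the specific constraints of IoT and mobile AI acceleration hardware and frameworks, even solutions that look very efficient can struggle badly. This makes developing deep learning solutions for mobile devices extremely challenging, although the results of this competition demonstrate that a very efficient model can still be obtained once the issues above are taken into account.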

4. Challenge Methods

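This section describes the solutions submitted by all teams that participated in the final phase of the MAI 2021 real-time image super-resolution challenge.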

4.1 Aselsan Research

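The model architecture proposed by Aselsan Research is shown in Figure 3. Its main building block is based on grouped convolutions: the input feature map is split into 4 parts, each of which is passed to its own convolution running in parallel, to reduce memory consumption and computational cost. The authors note that replacing standard convolutional layers with separable convolutions gives an even better runtime but also causes a large accuracy drop after quantization, so separable convolutions were not used in the final architecture. An additional skip connection improves the fidelity of the solution, and input normalization was omitted to speed up the model.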
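The grouped convolution idea can be sketched in NumPy as follows. This is only an illustration of splitting the feature map into 4 channel groups, not the authors' implementation: the tensor sizes, the 3x3 kernel size and the helper names are assumptions.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive stride-1 convolution with zero padding.
    x: (H, W, C_in), w: (k, k, C_in, C_out) with odd k."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    h, wd, _ = x.shape
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(k):
        for j in range(k):
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out

def grouped_conv(x, weights):
    """Split the channels into len(weights) groups and convolve each separately."""
    parts = np.split(x, len(weights), axis=2)
    return np.concatenate(
        [conv2d_same(p, w) for p, w in zip(parts, weights)], axis=2)

rng = np.random.default_rng(0)
channels, groups = 16, 4
per_group = channels // groups
x = rng.standard_normal((8, 8, channels))
weights = [rng.standard_normal((3, 3, per_group, per_group)) for _ in range(groups)]

y = grouped_conv(x, weights)

# Parameter count: grouped vs. a standard 3x3 convolution of the same width.
grouped_params = sum(w.size for w in weights)
standard_params = 3 * 3 * channels * channels
print(y.shape, grouped_params, standard_params)
```

With 4 groups, the layer uses a quarter of the weights and multiply-accumulate operations of a standard convolution with the same number of input and output channels, which is where the memory and compute savings come from.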

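The model was trained on 64x64 input patches with a batch size of 16, using the Charbonnier loss and the Adam optimizer with a learning rate ranging from 25e-4 down to 1e-4. The model was quantized with TensorFlow's standard post-training quantization tools, and a clipped ReLU was added on top of the model to avoid incorrect output normalization. A more detailed description of the model, design choices and training procedure is given in [5].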
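For reference, the Charbonnier loss is commonly defined as the mean of sqrt((x - y)^2 + eps^2), a smooth approximation of the L1 loss. The sketch below uses that common definition; the value of eps is an assumption, since the text does not state the one used by the team.

```python
import numpy as np

def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt(diff^2 + eps^2); eps is an assumed value."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    return float(np.mean(np.sqrt(diff ** 2 + eps ** 2)))

# Behaves like |diff| for large errors and stays smooth around zero.
large_error = charbonnier_loss(np.full(4, 3.0), np.zeros(4))
zero_error = charbonnier_loss(np.zeros(4), np.zeros(4))
print(large_error, zero_error)
```

For large errors the loss is nearly identical to L1, while near zero it is differentiable and bottoms out at eps instead of having a kink.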

4.2 MCG

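Team MCG proposed an anchor-based CNN architecture for this problem, shown in Figure 4. The main design idea is to learn only the residual part of the super-resolved image. If all convolutional layers are removed from the model, the pipeline reduces to the following: the input image is stacked 3x3 = 9 times (3 being the upscaling factor) and then rearranged to the output resolution by a depth-to-space layer. The result has the same resolution as the target super-resolved image; in effect, the resize is performed by repeating every pixel value 9 times. The "removed" convolutional block therefore learns the difference between this upscaled low-resolution image and the super-resolved image, and its output is added to the stacked input before the depth-to-space layer. The block consists of five 3x3 convolutions, each followed by a ReLU activation, and one additional convolution on top without an activation.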
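The claim that stacking the input 9 times and applying depth-to-space amounts to repeating every pixel 9 times can be checked with a short NumPy sketch. The `depth_to_space` helper below is written by hand and is assumed to follow the channel ordering of TensorFlow's depth-to-space op for channels-last tensors; the image size and function names are illustrative.

```python
import numpy as np

def depth_to_space(x, r):
    """Rearrange (H, W, r*r*C) into (H*r, W*r, C); channel index is (i*r + j)*C + c."""
    h, w, d = x.shape
    c = d // (r * r)
    x = x.reshape(h, w, r, r, c)      # (h, w, i, j, c)
    x = x.transpose(0, 2, 1, 3, 4)    # (h, i, w, j, c)
    return x.reshape(h * r, w * r, c)

def anchor(lr_image, r=3):
    """Stack the low-resolution image r*r times along channels, then depth-to-space."""
    stacked = np.concatenate([lr_image] * (r * r), axis=2)
    return depth_to_space(stacked, r)

rng = np.random.default_rng(0)
lr_image = rng.integers(0, 256, size=(4, 5, 3))

upscaled = anchor(lr_image)

# Reference: nearest-neighbor upsampling, i.e. each pixel repeated in a 3x3 block.
nearest = np.repeat(np.repeat(lr_image, 3, axis=0), 3, axis=1)
print(upscaled.shape, np.array_equal(upscaled, nearest))
```

In the full model, the output of the convolutional block would be added to the stacked tensor before `depth_to_space`, so the network only has to predict the correction on top of this nearest-neighbor anchor.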

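The model was trained on 64x64 input patches augmented with random flips and rotations, with a batch size of 16, the L1 loss and the Adam optimizer. The initial learning rate of 1e-3 was halved every 200 epochs. Quantization-aware training and post-training quantization were used to obtain an accurate INT8 model, and a clipped ReLU was added on top of the network to avoid incorrect output normalization. A more detailed description of the method is provided in [13].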
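The step schedule described above (start at 1e-3, halve every 200 epochs) can be written as a one-line function. This is a minimal sketch with illustrative names; it assumes the halving happens exactly at epochs 200, 400 and so on.

```python
import math

def step_lr(epoch, base_lr=1e-3, step=200, factor=0.5):
    """Learning rate after `epoch` epochs, multiplied by `factor` every `step` epochs."""
    return base_lr * factor ** (epoch // step)

schedule = {epoch: step_lr(epoch) for epoch in (0, 199, 200, 400, 600)}
for epoch, lr in schedule.items():
    print(epoch, lr)
```

The same function covers the other step schedules mentioned in this section by changing `base_lr` and `step`.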

4.3 Noah TerminalVision

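Team Noah TerminalVision proposed a small TinySRNet model, shown in Figure 5. The network contains 3 residual blocks (each with two convolutions), space-to-depth (S2D) and depth-to-space (D2S) layers, and one residual convolutional layer. The authors particularly stress the importance of the residual blocks, which help the model retain high accuracy after quantization. The network was trained with the L1 loss and the Adam optimizer for 1M iterations. The learning rate started at 5e-4 and was decayed to 1e-6 every 200K iterations. Quantization-aware training was used to improve the accuracy of the resulting INT8 model.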

4.4 ALONG

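The architecture proposed by team ALONG is very similar to the previous solution and is shown in Figure 6. The main differences are that it operates at the original resolution and uses nearest-neighbor upsampling instead of a convolution in the residual branch connecting the input and the output. The model was first trained with the L1 loss and then fine-tuned with the L2 loss on 128x128 input patches augmented with random flips and rotations. The Adam optimizer was used with an initial learning rate of 2e-4, halved every 200K steps. Quantization-aware training was used to improve the accuracy of the resulting INT8 model.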

4.5 EmbededAI

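Team EmbededAI proposed a plain re-parameterized convolutional model for super-resolution (PRPSR). The network has a purely convolutional topology: the low-resolution input image is passed to a 5x5 convolution that performs feature extraction, followed by five 3x3 convolutions, one 5x5 convolution and a pixel-shuffle layer that converts the output to the target resolution, as shown on the right of Figure 7. To improve speed, the architecture contains no residual blocks, skip connections or upsampling layers, and not even addition operations. During training, each convolution is expanded into a group of 3 operations: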

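$y = x + \mathrm{conv}_{1\times 1}(x) + \mathrm{conv}_{3\times 3}(x)$, followed by a ReLU activation, as shown on the left of Figure 7. After training, the parameters of each such group are merged into the parameters of a single convolution.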
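The merge works because convolution is linear: the 1x1 kernel and the identity mapping can both be written as 3x3 kernels that are non-zero only at the centre tap, so the three branches add up to one 3x3 kernel. The NumPy sketch below checks this numerically. It is an illustration under simplifying assumptions (equal input and output channel counts, stride 1, zero padding, no biases) and not the team's code.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive stride-1 convolution with zero padding.
    x: (H, W, C_in), w: (k, k, C_in, C_out) with odd k."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    h, wd, _ = x.shape
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(k):
        for j in range(k):
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out

rng = np.random.default_rng(0)
channels = 8
x = rng.standard_normal((6, 6, channels))
w3 = rng.standard_normal((3, 3, channels, channels))
w1 = rng.standard_normal((1, 1, channels, channels))

# Training-time block: identity + 1x1 branch + 3x3 branch.
y_train = x + conv2d_same(x, w1) + conv2d_same(x, w3)

# Inference-time block: fold the 1x1 kernel and the identity into the
# centre tap of the 3x3 kernel, leaving a single convolution.
w_merged = w3.copy()
w_merged[1, 1] += w1[0, 0] + np.eye(channels)
y_merged = conv2d_same(x, w_merged)

print(np.allclose(y_train, y_merged))
```

Because the ReLU is applied after the sum in both cases, the merged convolution followed by ReLU produces the same output as the three-branch training block.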

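The model was trained on 64x64 input patches with a batch size of 32, the L1 loss and the Adam optimizer. The initial learning rate of 5e-4 was halved every 200 epochs, for 600 epochs in total. The model was quantized with TensorFlow's post-training quantization tools, and a clipped ReLU activation was placed after the last convolutional layer to avoid incorrect output normalization.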

4.6 mju_gogogo

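The architecture proposed by team mju_gogogo is shown in Figure 8. The model is based on the EDSR network, extended with additional spatial attention blocks. The network is relatively large: it contains 6 residual blocks, each multiplied by the output of its corresponding attention unit, a 1x1 convolution that reduces the number of filters from 384 to 64, a global skip connection, three 3x3 convolutions and a depth-to-space layer. The model was first trained for 750 epochs with the mean absolute error (MAE) loss and the Adam optimizer, and then fine-tuned for 75 epochs with the MSE loss using quantization-aware training.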

5. Additional Literature

An overview of mobile-related tasks and the corresponding solutions from previous challenges can be found in the following literature:

References:

[1]. Abdelrahman Abdelhamed, Mahmoud Afifi, Radu Timofte, and Michael S Brown. Ntire 2020 challenge on real image denoising: Dataset, methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 496–497, 2020. 7
[2]. Abdelrahman Abdelhamed, Radu Timofte, and Michael S Brown. Ntire 2019 challenge on real image denoising: Methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019. 7

[5]. Mustafa Ayazoglu. Extremely lightweight quantization robust real-time single-image super resolution for mobile devices. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2021. 5, 7

[6]. Jianrui Cai, Shuhang Gu, Radu Timofte, and Lei Zhang. Ntire 2019 challenge on real image super-resolution: Methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019. 1, 7

[7]. Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W Mahoney, and Kurt Keutzer. Zeroq: A novel zero shot quantization framework. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13169–13178, 2020. 1

[8]. Cheng-Ming Chiang, Yu Tseng, Yu-Syuan Xu, Hsien-Kai Kuo, Yi-Min Tsai, Guan-Yu Chen, Koan-Sin Tan, Wei-Ting Wang, Yu-Chieh Lin, Shou-Yao Roy Tseng, et al. Deploying image deblurring across mobile devices: A perspective of quality and latency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 502–503, 2020. 1

[13]. Zongcai Du, Jie Liu, Jie Tang, and Gangshan Wu. Anchor-based plain net for mobile image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2021. 6, 7

[15]. Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1314–1324, 2019. 1

[17]. Andrey Ignatov, Kim Byeoung-su, and Radu Timofte. Fast camera image denoising on mobile gpus with deep learning, mobile ai 2021 challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2021. 2

[18]. Andrey Ignatov, Jimmy Chiang, Hsien-Kai Kuo, Anastasia Sycheva, and Radu Timofte. Learned smartphone isp on mobile npus with deep learning, mobile ai 2021 challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2021. 2

[21]. Andrey Ignatov, Grigory Malivenko, David Plowman, Samarth Shukla, and Radu Timofte. Fast and accurate single-image depth estimation on mobile devices, mobile ai 2021 challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2021. 2

[22]. Andrey Ignatov, Grigory Malivenko, and Radu Timofte. Fast and accurate quantized camera scene detection on smartphones, mobile ai 2021 challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2021. 2

[23]. Andrey Ignatov, Jagruti Patel, and Radu Timofte. Rendering natural camera bokeh effect with deep learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 418–419, 2020. 1

[24]. Andrey Ignatov, Jagruti Patel, Radu Timofte, Bolun Zheng, Xin Ye, Li Huang, Xiang Tian, Saikat Dutta, Kuldeep Purohit, Praveen Kandula, et al. Aim 2019 challenge on bokeh effect synthesis: Methods and results. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pages 3591–3598. IEEE, 2019. 7

[25]. Andrey Ignatov, Andres Romero, Heewon Kim, and Radu Timofte. Real-time video super-resolution on smartphones with deep learning, mobile ai 2021 challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2021. 2

[26]. Andrey Ignatov and Radu Timofte. Ntire 2019 challenge on image enhancement: Methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019. 7

[27]. Andrey Ignatov, Radu Timofte, William Chou, Ke Wang, Max Wu, Tim Hartley, and Luc Van Gool. Ai benchmark: Running deep neural networks on android smartphones. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pages 0–0, 2018. 1, 2

[28]. Andrey Ignatov, Radu Timofte, Sung-Jea Ko, Seung-Wook Kim, Kwang-Hyun Uhm, Seo-Won Ji, Sung-Jin Cho, JunPyo Hong, Kangfu Mei, Juncheng Li, et al. Aim 2019 challenge on raw to rgb mapping: Methods and results. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pages 3584–3590. IEEE, 2019. 7

[29]. Andrey Ignatov, Radu Timofte, Andrei Kulik, Seungsoo Yang, Ke Wang, Felix Baum, Max Wu, Lirong Xu, and Luc Van Gool. Ai benchmark: All about deep learning on smartphones in 2019. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pages 3617–3635. IEEE, 2019. 1, 2

[30]. Andrey Ignatov, Radu Timofte, Ming Qian, Congyu Qiao, Jiamin Lin, Zhenyu Guo, Chenghua Li, Cong Leng, Jian Cheng, Juewen Peng, et al. Aim 2020 challenge on rendering realistic bokeh. In European Conference on Computer Vision, pages 213–228. Springer, 2020. 7

[31]. Andrey Ignatov, Radu Timofte, Thang Van Vu, Tung Minh Luu, Trung X Pham, Cao Van Nguyen, Yongwoo Kim, Jae-Seok Choi, Munchurl Kim, Jie Huang, et al. Pirm challenge on perceptual image enhancement on smartphones: Report. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pages 0–0, 2018. 1, 7

[34]. Dmitry Ignatov and Andrey Ignatov. Controlling information capacity of binary neural network. Pattern Recognition Letters, 138:276–281, 2020. 1

[35]. Synaptics Inc. https://www.synaptics.com/technology/edge-computing. 3

[37]. Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2704–2713, 2018. 1

[38]. Sambhav R Jain, Albert Gural, Michael Wu, and Chris H Dick. Trained quantization thresholds for accurate and efficient fixed-point inference of deep neural networks. arXiv preprint arXiv:1903.08066, 2019. 1

[40]. Yawei Li, Shuhang Gu, Luc Van Gool, and Radu Timofte. Learning filter basis for convolutional neural network compression. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5623–5632, 2019. 1

[42]. Zechun Liu, Haoyuan Mu, Xiangyu Zhang, Zichao Guo, Xin Yang, Kwang-Ting Cheng, and Jian Sun. Metapruning: Meta learning for automatic neural network channel pruning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3296–3305, 2019. 1

[43]. Zechun Liu, Baoyuan Wu, Wenhan Luo, Xin Yang, Wei Liu, and Kwang-Ting Cheng. Bi-real net: Enhancing the performance of 1-bit cnns with improved representational capability and advanced training algorithm. In Proceedings of the European conference on computer vision (ECCV), pages 722–737, 2018. 1

[44]. Andreas Lugmayr, Martin Danelljan, and Radu Timofte. Ntire 2020 challenge on real-world image super-resolution: Methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 494–495, 2020. 1, 7

[45]. Anton Obukhov, Maxim Rakhuba, Stamatios Georgoulis, Menelaos Kanakis, Dengxin Dai, and Luc Van Gool. T-basis: a compact representation for neural networks. In International Conference on Machine Learning, pages 7392–7404. PMLR, 2020. 1

[47]. Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V Le. Mnasnet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2820–2828, 2019. 1

[52]. Radu Timofte, Shuhang Gu, Jiqing Wu, and Luc Van Gool. Ntire 2018 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 852–863, 2018. 1, 7

[54]. Stefan Uhlich, Lukas Mauch, Fabien Cardinaux, Kazuki Yoshiyama, Javier Alonso Garcia, Stephen Tiedemann, Thomas Kemp, and Akira Nakamura. Mixed precision dnns: All you need is a good parametrization. arXiv preprint arXiv:1905.11452, 2019. 1

[55]. Alvin Wan, Xiaoliang Dai, Peizhao Zhang, Zijian He, Yuandong Tian, Saining Xie, Bichen Wu, Matthew Yu, Tao Xu, Kan Chen, et al. Fbnetv2: Differentiable neural architecture search for spatial and channel dimensions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12965–12974, 2020. 1

[56]. Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10734–10742, 2019. 1

[58]. Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, and Xian-sheng Hua. Quantization networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7308–7316, 2019. 1