FixEfficientNet combines two existing techniques: FixRes, from the Facebook AI Team [2], and EfficientNet [3], first presented by the Google AI Research Team. FixRes is short for Fix Resolution; it tries to keep a fixed size for either the RoC (Region of Classification) used at training time or the crop used at test time. EfficientNet is a compound scaling of the dimensions of a CNN that improves both accuracy and efficiency. This article explains both techniques and why they are state of the art.
FixEfficientNet was first presented on 20 April 2020, together with the corresponding paper, by the Facebook AI Research Team [1]. The technique is used for image classification and thus belongs to the field of computer vision. It is currently the state of the art, with the best results on the ImageNet dataset: 480M parameters, a top-1 accuracy of 88.5%, and a top-5 accuracy of 98.7%.
But let’s dive in a bit deeper to get a better understanding of the combined techniques:
Understanding FixRes
Training Time
Until the Facebook AI Research Team proposed the FixRes technique, the state of the art was to extract a random square of pixels from an image. This square was used as the RoC at training time. (Note that this technique artificially increases the amount of data.) The RoC was then resized to obtain an image of a fixed size (the crop), which was fed to the convolutional neural network [2].
RoC = rectangle/square in the input image
crop = pixels of the RoC rescaled with bilinear interpolation to a certain resolution
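To make the two definitions concrete, here is a minimal pure-Python sketch of the pipeline described above: pick a random square RoC, then rescale its pixels to a fixed-size crop with bilinear interpolation. The function names (`sample_square_roc`, `bilinear_resize`, `roc_to_crop`), the toy grayscale image, and the corner-aligned sampling convention are assumptions made for illustration; real libraries may use a different pixel-alignment convention and operate on multi-channel tensors.

```python
import random


def sample_square_roc(height, width, side, rng=random):
    """Pick a random side x side square (the RoC) inside a height x width image."""
    if side > min(height, width):
        raise ValueError("RoC side must fit inside the image")
    top = rng.randint(0, height - side)
    left = rng.randint(0, width - side)
    return top, left, side


def bilinear_resize(pixels, out_size):
    """Rescale a square grayscale patch (list of rows) to out_size x out_size.

    Assumption: corners of input and output are aligned.
    """
    in_size = len(pixels)
    step = (in_size - 1) / (out_size - 1) if out_size > 1 else 0.0
    out = []
    for i in range(out_size):
        # Map the output row back to a fractional input row.
        y = i * step
        y0 = int(y)
        y1 = min(y0 + 1, in_size - 1)
        wy = y - y0
        row = []
        for j in range(out_size):
            x = j * step
            x0 = int(x)
            x1 = min(x0 + 1, in_size - 1)
            wx = x - x0
            # Interpolate horizontally on both rows, then vertically.
            upper = pixels[y0][x0] * (1 - wx) + pixels[y0][x1] * wx
            lower = pixels[y1][x0] * (1 - wx) + pixels[y1][x1] * wx
            row.append(upper * (1 - wy) + lower * wy)
        out.append(row)
    return out


def roc_to_crop(image, roc, crop_size):
    """Cut the RoC out of the image and rescale it to crop_size x crop_size."""
    top, left, side = roc
    patch = [row[left:left + side] for row in image[top:top + side]]
    return bilinear_resize(patch, crop_size)


# Toy 6 x 8 "image" whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(8)] for r in range(6)]
roc = sample_square_roc(6, 8, side=4, rng=random.Random(0))
crop = roc_to_crop(image, roc, crop_size=7)
print(len(crop), len(crop[0]))
```

Whatever RoC is sampled, the network always receives a crop of the same resolution; only the amount of image content packed into that crop changes.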
Train-time scale augmentation
To get a better understanding of what exactly FixRes does, let's take a look at the math. Changing the size of the RoC in the input image affects the distribution of the sizes of the objects given to the CNN. Suppose an object has a size of r x r in the input image. If the RoC is rescaled by a factor s, the size of the object consequently becomes rs x rs.
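The effect of the scale factor can be checked with a few numbers. The sketch below assumes a square RoC and takes s to be the ratio of the crop side to the RoC side; the helper names and the 224-pixel crop are illustrative values, not taken from the article.

```python
def scale_factor(roc_side, crop_side):
    """Assumed definition for a square RoC: s = crop side / RoC side."""
    return crop_side / roc_side


def apparent_object_size(r, roc_side, crop_side):
    """Side length in pixels of an r x r object after the RoC is resized."""
    return r * scale_factor(roc_side, crop_side)


# The same 100-pixel object seen through RoCs of different sizes,
# each resized to a 224 x 224 crop (illustrative value).
for roc_side in (448, 224, 112):
    s = scale_factor(roc_side, 224)
    print(roc_side, s, apparent_object_size(100, roc_side, 224))
```

A smaller RoC gives a larger s, so the same object appears larger to the CNN. Sampling the RoC size at random therefore changes the distribution of apparent object sizes the network sees during training.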
For the augmentation, the RandomResizedCrop of PyTorch is used. The input image has a size of H x W, from which a RoC is randomly selected. This RoC is then resized to a crop of size
The scaling of the input image (H x W) to the crop that is outp