[Paper Reading] ShuffleNet Series (V1–V2)

1. ShuffleNet V1

1.1 Abstract

We propose an extremely computation-efficient CNN architecture named ShuffleNet, designed specifically for mobile devices with very limited computing power. The architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy.

1.2 Approach

1.2.1 Channel Shuffle for Group Convolutions

In tiny networks, 1×1 convolutions are computationally expensive, so under a limited computation budget the number of feature-map channels is capped, which significantly hurts accuracy. A simple remedy is to connect channels sparsely, i.e., to apply group convolution to the 1×1 layers as well.

As shown in figure (a) below, each output feature map is then connected to only part of the input feature maps. But this brings a side effect: after stacking several such layers, each output channel is derived from only a fraction of the input channels. For example, the red output features come only from the red input features, and the blue output features only from the blue inputs. This blocks information flow between channel groups and weakens the network's representational power.

[Figure: (a) two stacked group convolutions with no cross-group information flow; (b), (c) channel shuffle enables information exchange across groups]

If we allow a group convolution to take input data from different groups, the input and output channels become fully related. As shown in figure (b) above, the channels output by each group are first divided into several subgroups, and each subgroup is then fed to a different group in the next layer. Concretely, for a convolutional layer with $g$ groups whose output has $g \times n$ channels, we first reshape the output channel dimension into $(g, n)$, then transpose it, and finally flatten it back as the input of the next layer. This is illustrated in the figure below with $g = 3$ and $n = 2$, followed by a short code sketch.

[Figure: channel shuffle via reshape, transpose, and flatten, with $g = 3$ and $n = 2$]
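The reshape–transpose–flatten recipe is only a few lines of PyTorch. Below is a minimal sketch (my own, not from the original post) tracing the channel indices for the $g = 3$, $n = 2$ example:

import torch

g, n = 3, 2                       # 3 groups, 2 channels per group
x = torch.arange(g * n)           # channel indices 0..5, one entry per channel
# reshape to (g, n), transpose to (n, g), then flatten back to g*n channels
shuffled = x.view(g, n).t().contiguous().view(-1)
print(x.tolist())         # [0, 1, 2, 3, 4, 5]
print(shuffled.tolist())  # [0, 2, 4, 1, 3, 5] -- channels from different groups interleave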

1.2.2 ShuffleNet Unit

[Figure: ShuffleNet units — (a) bottleneck unit with 3×3 depthwise convolution; (b) ShuffleNet unit with pointwise group convolution and channel shuffle; (c) ShuffleNet unit with stride 2]

Next, let us analyze the FLOPs of the ShuffleNet unit. Suppose the input size is $c \times h \times w$, the bottleneck has $m$ channels, and $g$ is the number of groups. The ResNet (left) and ResNeXt (right) building blocks are shown below:

[Figure: ResNet bottleneck unit (left) and ResNeXt unit (right)]

  • ResNet FLOPs: $(c \times 1 \times 1)hwm + (m \times 3 \times 3)hwm + (m \times 1 \times 1)hwc = 9hwm^2 + 2hwcm = hw(2cm + 9m^2)$
  • ResNeXt FLOPs: $(c/g \times 1 \times 1)hwm + (m/g \times 3 \times 3)hwm + (m/g \times 1 \times 1)hwc = hw(2cm/g + 9m^2/g)$ (the paper gives $hw(2cm + 9m^2/g)$, presumably because it treats the two 1×1 convolutions in the ResNeXt bottleneck as regular convolutions rather than group convolutions)
  • ShuffleNet FLOPs: $(c/g \times 1 \times 1)hwm + (m/m \times 3 \times 3)hwm + (m/g \times 1 \times 1)hwc = hw(2cm/g + 9m)$

As the formulas show, ShuffleNet requires the fewest FLOPs of the three under the same configuration.
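To make the comparison concrete, here is a small sketch that plugs illustrative numbers (my own, not from the paper) into the three formulas:

# FLOPs of one unit, per the three formulas above (illustrative values)
h, w, c, m, g = 28, 28, 240, 60, 3
resnet     = h * w * (2 * c * m + 9 * m ** 2)      # ~48.0M
resnext    = h * w * (2 * c * m + 9 * m ** 2 / g)  # ~31.0M (paper's form)
shufflenet = h * w * (2 * c * m / g + 9 * m)       # ~8.0M
print(resnet, resnext, shufflenet)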

1.3 Network Architecture

[Table: overall ShuffleNet architecture for different numbers of groups $g$]

The ShuffleNet units in the network are organized into three stages. The first unit of each stage uses stride 2, the number of feature-map channels doubles from one stage to the next, and the bottleneck channels within each unit are set to 1/4 of the output channels.

The number of groups $g$ controls the sparsity of the pointwise convolutions: at the same complexity, more groups allow more feature-map channels (for example, in the paper's architecture table, stage 2 has 144 output channels for $g = 1$ but 240 for $g = 3$).

1.4 PyTorch Implementation

import torch
import torch.nn as nn


class ConvBNReLU(nn.Sequential):
    # (Group) convolution + BatchNorm + ReLU6
    def __init__(self, in_channel, out_channel, kernel_size, stride, groups):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channels=in_channel, out_channels=out_channel, kernel_size=kernel_size, stride=stride, padding=padding, groups=groups),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True),
        )


class ConvBN(nn.Sequential):
    def __init__(self, in_channel, out_channel, groups):
        super(ConvBN, self).__init__(
            nn.Conv2d(in_channels=in_channel, out_channels=out_channel, kernel_size=1, stride=1, groups=groups),
            nn.BatchNorm2d(out_channel),
        )


class ChannelShuffle(nn.Module):
    def __init__(self, groups):
        super(ChannelShuffle, self).__init__()
        self.groups = groups

    def forward(self, x):
        # Channel shuffle: [N,C,H,W] -> [N,g,C/g,H,W] -> [N,C/g,g,H,W] -> [N,C,H,W]
        batch_size, num_channels, height, width = x.size()
        channels_per_group = num_channels // self.groups

        x = x.view(batch_size, self.groups, channels_per_group, height, width)
        x = torch.transpose(x, dim0=1, dim1=2).contiguous()
        x = x.view(batch_size, -1, height, width)
        return x


class ShuffleNetUnits(nn.Module):
    def __init__(self, in_channel, out_channel, stride, groups):
        super(ShuffleNetUnits, self).__init__()
        self.stride = stride
        # Stride-2 units concatenate the shortcut with the branch output,
        # so the branch only needs to produce (out_channel - in_channel) channels
        out_channel = out_channel - in_channel if self.stride > 1 else out_channel
        mid_channel = out_channel // 4  # bottleneck width is 1/4 of the output width

        self.bottleneck = nn.Sequential(
            # 1x1 pointwise group convolution, followed by channel shuffle
            ConvBNReLU(in_channel=in_channel, out_channel=mid_channel, kernel_size=1, stride=1, groups=groups),
            ChannelShuffle(groups=groups),
            # 3x3 depthwise convolution (groups = channels), matching the FLOPs analysis above
            ConvBNReLU(in_channel=mid_channel, out_channel=mid_channel, kernel_size=3, stride=stride, groups=mid_channel),
            # 1x1 pointwise group convolution, no ReLU before the add/concat
            ConvBN(in_channel=mid_channel, out_channel=out_channel, groups=groups),
        )

        if self.stride > 1:
            self.shortcut = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)

        self.relu = nn.ReLU6(inplace=True)

    def forward(self, x):
        out = self.bottleneck(x)
        if self.stride > 1:
            # Stride-2 unit: concatenate the avg-pooled shortcut with the branch output
            out = torch.cat([self.shortcut(x), out], dim=1)
        else:
            # Stride-1 unit: plain residual addition
            out += x
        return self.relu(out)


class ShuffleNet(nn.Module):
    def __init__(self, planes, layers, groups, num_classes=1000):
        super(ShuffleNet, self).__init__()

        self.stage1 = nn.Sequential(
            ConvBNReLU(in_channel=3, out_channel=24, kernel_size=3, stride=2, groups=1),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )

        self.stage2 = self._make_layer(in_channel=24, out_channel=planes[0], groups=groups, block_num=layers[0], is_stage2=True)
        self.stage3 = self._make_layer(in_channel=planes[0], out_channel=planes[1], groups=groups, block_num=layers[1], is_stage2=False)
        self.stage4 = self._make_layer(in_channel=planes[1], out_channel=planes[2], groups=groups, block_num=layers[2], is_stage2=False)

        self.globalpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Sequential(
            nn.Dropout(p=0.2),
            nn.Linear(in_features=planes[2], out_features=num_classes)
        )

        # weight init
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, mean=0, std=0.01)
                nn.init.zeros_(m.bias)

    def _make_layer(self, in_channel, out_channel, groups, block_num, is_stage2):
        layers = []
        layers.append(ShuffleNetUnits(in_channel=in_channel, out_channel=out_channel, stride=2, groups=1 if is_stage2 else groups))
        for _ in range(1, block_num):
            layers.append(ShuffleNetUnits(in_channel=out_channel, out_channel=out_channel, stride=1, groups=groups))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)

        x = self.globalpool(x)
        x = torch.flatten(x, start_dim=1)
        x = self.fc(x)
        return x


def shufflenet_g1(**kwargs):
    planes = [144, 288, 576]
    layers = [4, 8, 4]
    model = ShuffleNet(planes=planes, layers=layers, groups=1, **kwargs)
    return model


def shufflenet_g2(**kwargs):
    planes = [200, 400, 800]
    layers = [4, 8, 4]
    model = ShuffleNet(planes=planes, layers=layers, groups=2, **kwargs)
    return model


model = shufflenet_g2()
x = torch.randn(1, 3, 224, 224)
out = model(x)
print(out.size())  # torch.Size([1, 1000])


2. ShuffleNet V2

The main contribution is a set of four practical guidelines for lightweight network design; based on these four guidelines, ShuffleNet V2 is proposed as an improvement over V1.

2.1 FLOPs

$FLOPs$: the number of floating-point operations, which can be read as the amount of computation and used to measure the complexity of an algorithm or model.

For a convolutional layer:

$FLOPs = (C_{in} \times K^2) \cdot H \cdot W \cdot C_{out}$

  • $C_{in}$: input channels
  • $K$: kernel size
  • $H, W$: output feature map size
  • $C_{out}$: output channels
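As a sanity check, here is a tiny sketch that evaluates this formula for a hypothetical 3×3 convolution layer (the sizes are made up for illustration):

# FLOPs of a conv layer: (C_in * K^2) * H * W * C_out
c_in, k, h, w, c_out = 64, 3, 56, 56, 128
flops = (c_in * k ** 2) * h * w * c_out
print(flops)  # 231211008, i.e. ~231.2 MFLOPs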

$MAC$: memory access cost.

2.2 Practical Guidelines

G1. Equal input/output channel widths minimize memory access cost (MAC)


$FLOPs = (c_1 \times 1 \times 1)hwc_2$

$MAC = hwc_1 + hwc_2 + 1 \times 1 \times c_1 \times c_2 = hw(c_1 + c_2) + c_1 c_2$

By the AM–GM inequality, $hw(c_1 + c_2) \ge 2hw\sqrt{c_1 c_2}$; since the product $c_1 c_2$ is fixed by the FLOPs budget ($FLOPs = hwc_1c_2$), the MAC reaches its minimum when $c_1 = c_2$.
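A quick numerical check of G1 (with my own illustrative values): fix the FLOPs budget of the 1×1 convolution and sweep the $c_1 : c_2$ ratio; the MAC is smallest at 1:1.

# MAC = hw*(c1 + c2) + c1*c2, with FLOPs = hw*c1*c2 held fixed
h, w = 28, 28
flops = 10e6
for ratio in (1, 2, 4, 8):        # c1 : c2 = 1 : ratio
    c1c2 = flops / (h * w)        # the product c1*c2 is fixed by the FLOPs budget
    c1 = (c1c2 / ratio) ** 0.5
    c2 = c1 * ratio
    mac = h * w * (c1 + c2) + c1 * c2
    print(ratio, int(mac))        # MAC is minimized at ratio 1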

G2. Excessive group convolution increases MAC and therefore slows the model down


$FLOPs = (c_1/g \times 1 \times 1)hwc_2 = hwc_1c_2/g$

$MAC = hwc_1 + hwc_2 + 1 \times 1 \times c_1 \times c_2 / g = hw(c_1 + c_2) + c_1 c_2 / g$

The relationship between MAC and FLOPs is:

  • $MAC = hwc_1 + \frac{FLOPs \times g}{c_1} + \frac{FLOPs}{hw}$

  • With FLOPs held fixed, the larger $g$ is, the larger the MAC.

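The same kind of quick check for G2 (again with illustrative values): hold FLOPs fixed and increase the number of groups $g$; the MAC grows with $g$.

# MAC of a 1x1 group conv at fixed FLOPs: hw*c1 + FLOPs*g/c1 + FLOPs/(h*w)
h, w, c1 = 28, 28, 128
flops = 10e6
for g in (1, 2, 4, 8):
    mac = h * w * c1 + flops * g / c1 + flops / (h * w)
    print(g, int(mac))  # MAC increases monotonically with g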

G3. The fewer the branches in the model, the faster it runs

Fragmented multi-branch structures are unfriendly to devices with strong parallelism such as GPUs, and they also introduce extra overhead from kernel launches and synchronization.

G4. Element-wise operations are not negligible

Element-wise operations include ReLU, AddTensor, AddBias, and so on; they have small FLOPs but relatively heavy MAC. The paper also treats depthwise convolution as an element-wise operation.

2.3 ShuffleNet V2

The ShuffleNet V2 building blocks:

[Figure: ShuffleNet V2 units — (c) the basic unit with channel split; (d) the unit for spatial downsampling]

Shortcomings of ShuffleNet V1: the pointwise group convolutions and the bottleneck structure increase MAC (violating G1 and G2); using too many groups violates G3; and the element-wise Add in the residual shortcut is undesirable per G4.

The ShuffleNet V2 network architecture:

[Table: ShuffleNet V2 overall architecture]

2.4 PyTorch Implementation

import torch
import torch.nn as nn


class ConvBNReLU(nn.Sequential):
    def __init__(self, in_channel, out_channel, kernel_size, stride, groups):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channels=in_channel, out_channels=out_channel, kernel_size=kernel_size, stride=stride, padding=padding, groups=groups),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True),
        )


class ConvBN(nn.Sequential):
    def __init__(self, in_channel, out_channel, kernel_size, stride, groups):
        padding = (kernel_size - 1) // 2
        super(ConvBN, self).__init__(
            nn.Conv2d(in_channels=in_channel, out_channels=out_channel, kernel_size=kernel_size, stride=stride, padding=padding, groups=groups),
            nn.BatchNorm2d(out_channel),
        )


class HalfSplit(nn.Module):
    # Channel split: return the first or second half of the channels along `dim`
    def __init__(self, dim=0, first_half=True):
        super(HalfSplit, self).__init__()
        self.first_half = first_half
        self.dim = dim

    def forward(self, x):
        splits = torch.chunk(x, 2, dim=self.dim)
        return splits[0] if self.first_half else splits[1]
    

class ChannelShuffle(nn.Module):
    def __init__(self, groups):
        super(ChannelShuffle, self).__init__()
        self.groups = groups

    def forward(self, x):
        # Channel shuffle: [N,C,H,W] -> [N,g,C/g,H,W] -> [N,C/g,g,H,W] -> [N,C,H,W]
        batch_size, num_channels, height, width = x.size()
        channels_per_group = num_channels // self.groups
        x = x.view(batch_size, self.groups, channels_per_group, height, width)
        x = torch.transpose(x, dim0=1, dim1=2).contiguous()
        x = x.view(batch_size, -1, height, width)
        return x


class ShuffleNetUnits(nn.Module):
    def __init__(self, in_channel, out_channel, stride, groups):
        super(ShuffleNetUnits, self).__init__()
        self.stride = stride
        if self.stride > 1:
            # Downsampling unit: both branches see the full input and are concatenated
            mid_channel = out_channel - in_channel
        else:
            # Basic unit: split the channels in half; transform one half, pass the other through
            mid_channel = out_channel // 2
            in_channel = mid_channel
            self.first_split = HalfSplit(dim=1, first_half=True)
            self.second_split = HalfSplit(dim=1, first_half=False)

        self.bottleneck = nn.Sequential(
            ConvBNReLU(in_channel=in_channel, out_channel=in_channel, kernel_size=1, stride=1, groups=1),
            # 3x3 depthwise convolution (no ReLU), as in the paper; assumes
            # mid_channel is a multiple of in_channel (true for the configs below)
            ConvBN(in_channel=in_channel, out_channel=mid_channel, kernel_size=3, stride=stride, groups=in_channel),
            ConvBNReLU(in_channel=mid_channel, out_channel=mid_channel, kernel_size=1, stride=1, groups=1),
        )

        if self.stride > 1:
            # Shortcut branch of the downsampling unit: 3x3 depthwise + 1x1 conv
            self.shortcut = nn.Sequential(
                ConvBN(in_channel=in_channel, out_channel=in_channel, kernel_size=3, stride=stride, groups=in_channel),
                ConvBNReLU(in_channel=in_channel, out_channel=in_channel, kernel_size=1, stride=1, groups=1),
            )

        # Shuffle with 2 groups so information flows between the two concatenated branches
        self.channel_shuffle = ChannelShuffle(groups=2)

    def forward(self, x):
        if self.stride > 1:
            # Downsampling: no channel split; both branches process the full input
            x1 = self.bottleneck(x)
            x2 = self.shortcut(x)
        else:
            # Channel split: transform the first half, keep the second half as identity
            x1 = self.first_split(x)
            x2 = self.second_split(x)
            x1 = self.bottleneck(x1)
        out = torch.cat([x1, x2], dim=1)
        out = self.channel_shuffle(out)
        return out


class ShuffleNetV2(nn.Module):
    def __init__(self, planes, layers, groups, num_classes=1000):
        super(ShuffleNetV2, self).__init__()
        self.groups = groups

        self.stage1 = nn.Sequential(
            ConvBNReLU(in_channel=3, out_channel=24, kernel_size=3, stride=2, groups=1),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        self.stage2 = self._make_layers(in_channel=24, out_channel=planes[0], block_num=layers[0], is_stage2=True)
        self.stage3 = self._make_layers(in_channel=planes[0], out_channel=planes[1], block_num=layers[1], is_stage2=False)
        self.stage4 = self._make_layers(in_channel=planes[1], out_channel=planes[2], block_num=layers[2], is_stage2=False)

        self.conv5 = ConvBNReLU(in_channel=planes[2], out_channel=planes[3], kernel_size=1, stride=1, groups=1)
        self.globalpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Sequential(
            nn.Dropout(p=0.2),
            nn.Linear(in_features=planes[3], out_features=num_classes)
        )

        # weight init (Linear layers get a small-std normal init, not a constant)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, mean=0, std=0.01)
                nn.init.zeros_(m.bias)

    def _make_layers(self, in_channel, out_channel, block_num, is_stage2):
        layers = []
        layers.append(ShuffleNetUnits(in_channel=in_channel, out_channel=out_channel, stride=2, groups=1 if is_stage2 else self.groups))
        for _ in range(1, block_num):
            layers.append(ShuffleNetUnits(in_channel=out_channel, out_channel=out_channel, stride=1, groups=self.groups))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)

        x = self.conv5(x)
        x = self.globalpool(x)
        x = torch.flatten(x, start_dim=1)
        x = self.fc(x)
        return x


def shufflenet_v2_x0_5(**kwargs):
    planes = [48, 96, 192, 1024]
    layers = [4, 8, 4]
    model = ShuffleNetV2(planes=planes, layers=layers, groups=1, **kwargs)
    return model
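As with V1, a quick smoke test (the output shape assumes the default num_classes=1000):

model = shufflenet_v2_x0_5()
x = torch.randn(1, 3, 224, 224)
out = model(x)
print(out.size())  # torch.Size([1, 1000])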
