ResNet implementation and the difference between the add layer and the concatenate layer
A personal summary of ResNet
'''
Kaiming He proposed the ResNet architecture in 2015 and refined it in 2016, introducing the pre-activation
bottleneck design. The code in this post is based on the bottleneck structure and mainly implements the
pre-activation residual module.
Reference: https://blog.youkuaiyun.com/mdjxy63/article/details/81105713
'''
from keras.models import Model
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D, AveragePooling2D, MaxPooling2D, ZeroPadding2D
from keras.layers.core import Activation, Flatten, Dense, Dropout
from keras.layers import Input, add
from keras.regularizers import l2
from keras.utils.vis_utils import plot_model
import keras.backend as K
# ResNet sums feature maps element-wise, so it uses the add function;
# GoogLeNet and DenseNet concatenate feature maps along the channel axis, so they use concatenate.
# For the difference between add and concatenate, see: https://blog.youkuaiyun.com/u012193416/article/details/79479935
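# A minimal sketch of that difference (the variable names and the 32x32x64 shapes below are illustrative
# assumptions, not part of the ResNet implementation):
from keras.layers import concatenate
_feat_a = Input(shape=(32, 32, 64))
_feat_b = Input(shape=(32, 32, 64))
_summed = add([_feat_a, _feat_b])            # element-wise sum: shape stays (None, 32, 32, 64)
_stacked = concatenate([_feat_a, _feat_b])   # channel concatenation: shape becomes (None, 32, 32, 128)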
class ResNet:
    @staticmethod
    def residual_module(x, K, stride, chanDim, reduce=False, reg=1e-4, bnEps=2e-5, bnMom=0.9):
        '''
        :param x: input to the residual module
        :param K: number of filters of the final CONV in the bottleneck, i.e. the output of the module
        :param stride: stride of the convolution, used to reduce the spatial dimensions instead of max-pooling
        :param chanDim: axis along which batch normalization is performed
        :param reduce: not every residual module reduces the spatial dimensions; this bool controls it
        :param reg: regularization strength for all CONV layers in the module
        :param bnEps: epsilon that prevents division-by-zero errors in the BN layers
        :param bnMom: momentum for the moving average in the BN layers
        :return: the output tensor of the residual module
        '''
        # the shortcut branch of the ResNet module is initialized as the input data
        shortcut = x
        # the first block of the ResNet module -- 1x1 CONVs
        bn1 = BatchNormalization(axis=chanDim, epsilon=bnEps, momentum=bnMom)(x)
        ac1 = Activation('relu')(bn1)
        # the biases are in the BN layers that directly follow the convolutions, so there is no need to
        # introduce a *second* bias term; note that we change the typical CONV block ordering and use the
        # *pre-activation* ordering (BN => ACT => CONV) instead
        conv1 = Conv2D(int(K * 0.25), (1, 1), use_bias=False, kernel_regularizer=l2(reg))(ac1)
        # the second block of the ResNet module -- 3x3 CONVs
        bn2 = BatchNormalization(axis=chanDim, epsilon=bnEps, momentum=bnMom)(conv1)
        ac2 = Activation('relu')(bn2)
        conv2 = Conv2D(int(K * 0.25), (3, 3), strides=stride, padding='same', use_bias=False, kernel_regularizer=l2(reg))(ac2)
        # the third block of the ResNet module -- 1x1 CONVs
        # note: this CONV uses all K filters
        bn3 = BatchNormalization(axis=chanDim, epsilon=bnEps, momentum=bnMom)(conv2)
        ac3 = Activation('relu')(bn3)
        conv3 = Conv2D(int(K), (1, 1), use_bias=False, kernel_regularizer=l2(reg))(ac3)
        # if we would like to reduce the spatial size, apply a CONV layer to the shortcut
        if reduce:
            # when reducing the spatial dimensions, the stride is set to a value greater than 1 and the
            # shortcut is replaced by an extra 1x1 CONV on the shortcut branch
            shortcut = Conv2D(K, (1, 1), strides=stride, use_bias=False, kernel_regularizer=l2(reg))(ac1)
        # add together the shortcut (shortcut branch) and the final CONV (main branch)
        x = add([conv3, shortcut])
        # the output of the module is f(x) = conv3 + shortcut
        return x
    @staticmethod
    def build(width, height, depth, classes, stages, filters, reg=1e-4, bnEps=2e-5, bnMom=0.9, dataset='cifar'):
        '''
        :param width: width of the input images
        :param height: height of the input images
        :param depth: number of channels of the input images
        :param classes: number of output classes
        :param stages: list with the number of residual modules stacked in each stage
        :param filters: list of filter counts; filters[0] is used by the first CONV, filters[i + 1] by stage i
        :param reg: regularization strength for the CONV and Dense layers
        :param bnEps: epsilon for the BN layers
        :param bnMom: momentum for the BN layers
        :param dataset: name of the dataset the network is built for
        :return: the assembled Keras model
        '''
        inputShape = (height, width, depth)
        chanDim = -1
        # if the channel order is "channels first", modify the input shape and channel dimension
        if K.image_data_format() == 'channels_first':
            inputShape = (depth, height, width)
            chanDim = 1
        # set the input and apply a BN layer
        input = Input(shape=inputShape)
        # use a BN layer as the first layer; it acts as an added level of normalization
        # (a BN layer instead of a CONV here replaces explicit mean normalization of the input data)
        x = BatchNormalization(axis=chanDim, epsilon=bnEps, momentum=bnMom)(input)
        # check if we are training on the CIFAR dataset
        if dataset == 'cifar':
            # apply the first and single CONV layer
            x = Conv2D(filters[0], (3, 3), use_bias=False, padding='same', kernel_regularizer=l2(reg))(x)
        # loop over the number of stages (block names)
        for i in range(0, len(stages)):
            # initialize the stride, then apply a residual module used to reduce the spatial size of the
            # input volume: if this is the first stage we set the stride to (1, 1), indicating that no
            # downsampling should be performed; for every subsequent stage we apply a residual module with
            # a stride of (2, 2), which decreases the volume size
            stride = (1, 1) if i == 0 else (2, 2)
            # once we have stacked stages[i] residual modules on top of each other, the for loop brings us
            # back up to here, where we decrease the spatial dimensions of the volume and repeat the process
            x = ResNet.residual_module(x, filters[i + 1], stride=stride, chanDim=chanDim, reduce=True, bnEps=bnEps, bnMom=bnMom)
            # loop over the number of layers in the stage
            for j in range(0, stages[i] - 1):
                # apply a residual module that keeps the spatial dimensions (stride (1, 1), reduce=False)
                x = ResNet.residual_module(x, filters[i + 1], stride=(1, 1), chanDim=chanDim, bnEps=bnEps, bnMom=bnMom)
        # after stacking all the residual modules on top of each other, we move on to the classifier stage:
        # apply BN => ACT => POOL; to avoid dense/FC layers in the body we apply average pooling instead,
        # reducing the spatial size of the volume to 1 x 1
        x = BatchNormalization(axis=chanDim, epsilon=bnEps, momentum=bnMom)(x)
        x = Activation('relu')(x)
        x = AveragePooling2D((8, 8))(x)
        # softmax classifier
        x = Flatten()(x)
        x = Dense(classes, kernel_regularizer=l2(reg))(x)
        x = Activation('softmax')(x)
        # construct the model
        model = Model(input, x, name='ResNet')
        model.summary()
        return model
'''
Notes on the network structure:
The code below generates a ResNet.png file and shows how stages and filters are used.
First, the layer right after the Input is not a CONV but a BN layer, which saves us from mean-normalizing the images.
Then, because the input is CIFAR-10, a CONV layer is added.
After that the residual_module function is called as follows:
1) call residual_module once with reduce=True to reduce the spatial dimensions,
2) loop stages[i] - 1 times calling residual_module with reduce=False.
The total number of residual_module calls is therefore:
[1 + (3 - 1)] + [1 + (4 - 1)] + [1 + (6 - 1)] = 13
'''
# visualize the ResNet
model = ResNet.build(32, 32, 3, 10, stages=[3, 4, 6], filters=[64, 128, 256, 512])
plot_model(model, to_file='ResNet.png', show_shapes=True)
Inception V3
The "Inception" micro-architecture was first proposed by Szegedy et al. in the 2014 paper "Going Deeper with Convolutions".
The goal of the Inception module is to act as a "multi-level feature extractor": it applies 1×1, 3×3 and 5×5 convolutions and finally concatenates their outputs to form the input of the next layer.
This architecture was previously called GoogLeNet and is now simply referred to as Inception vN, where N is the version number assigned by Google. The Inception V3 implementation in the Keras library is based on the later paper by Szegedy et al., "Rethinking the Inception Architecture for Computer Vision", which updates the Inception module to further improve ImageNet classification accuracy. Inception V3 has fewer weights than VGG and ResNet, with a size of 96 MB.
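As a rough illustration of this concatenation-based design (the filter counts and the naive_inception_block name are assumptions for this sketch, not the actual Inception V3 configuration), an Inception-style block in Keras could look like this:

from keras.layers import Conv2D, MaxPooling2D, concatenate

def naive_inception_block(x, chanDim=-1):
    # parallel branches with different receptive fields
    branch1 = Conv2D(64, (1, 1), padding='same', activation='relu')(x)
    branch3 = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
    branch5 = Conv2D(32, (5, 5), padding='same', activation='relu')(x)
    pool = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    # stack all branches along the channel axis -- this is why Inception uses concatenate rather than add
    return concatenate([branch1, branch3, branch5, pool], axis=chanDim)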
Xception
Xception was proposed by François Chollet himself (the maintainer of Keras). Xception is an extension of the Inception architecture that replaces the standard Inception modules with depthwise separable convolutions.
Original paper: "Xception: Deep Learning with Depthwise Separable Convolutions"
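A minimal sketch of the depthwise separable building block that Xception relies on (the separable_block helper and its layer sizes are assumptions for illustration, not the exact Xception layout):

from keras.layers import SeparableConv2D, BatchNormalization, Activation

def separable_block(x, filters, chanDim=-1):
    # a depthwise 3x3 convolution followed by a pointwise 1x1 convolution, fused into one Keras layer
    x = SeparableConv2D(filters, (3, 3), padding='same', use_bias=False)(x)
    x = BatchNormalization(axis=chanDim)(x)
    return Activation('relu')(x)

Compared with a standard Conv2D of the same output size, the separable version uses far fewer parameters, which is the efficiency argument behind Xception.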
References:
The difference between concatenate and add layers in neural networks
A Keras-based implementation of ResNet
VGGNet, ResNet, Inception and Xception with Keras