CNN (Convolutional Neural Network)
How CNNs Work
A Brief Overview
[Figure: left, topology of a regular 3-layer Neural Network; right, model of a ConvNet with its neurons arranged in 3D volumes]
Left: A regular 3-layer Neural Network. Right: A ConvNet arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a ConvNet transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).
A ConvNet is made up of Layers. Every Layer has a simple API: it transforms an input 3D volume to an output 3D volume with some differentiable function that may or may not have parameters.
A simple ConvNet is a sequence of layers. Three main types of layers are used to build ConvNet architectures: the Convolutional Layer, the Pooling Layer, and the Fully-Connected Layer.
- An example architecture for CIFAR-10 classification (Example Architecture: Overview). We will go into more detail below, but a simple ConvNet for CIFAR-10 classification could have the architecture [INPUT - CONV - RELU - POOL - FC]. In more detail (see the sketch after this list):
  - INPUT [32x32x3] will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R, G, B.
  - CONV layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in a volume such as [32x32x12] if we decided to use 12 filters.
  - RELU layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ([32x32x12]).
  - POOL layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as [16x16x12].
  - FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size [1x1x10], where each of the 10 numbers corresponds to a class score, such as among the 10 categories of CIFAR-10. As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume.
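A minimal sketch of this [INPUT - CONV - RELU - POOL - FC] pipeline, assuming PyTorch and the layer sizes quoted above (12 filters, 2x2 pooling); the 5x5 kernel with padding 2 is an assumption chosen to keep the 32x32 spatial size, not something fixed by the architecture description:

```python
import torch
import torch.nn as nn

# Hypothetical CIFAR-10 ConvNet matching the volumes above:
# INPUT [32x32x3] -> CONV [32x32x12] -> RELU -> POOL [16x16x12] -> FC [1x1x10]
model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=12, kernel_size=5, padding=2),  # 32x32x3 -> 32x32x12
    nn.ReLU(),                                                            # elementwise, size unchanged
    nn.MaxPool2d(kernel_size=2, stride=2),                                # 32x32x12 -> 16x16x12
    nn.Flatten(),
    nn.Linear(16 * 16 * 12, 10),                                          # 10 class scores
)

x = torch.randn(1, 3, 32, 32)   # one RGB CIFAR-10-sized image (NCHW layout)
print(model(x).shape)           # torch.Size([1, 10])
```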
In summary
- A ConvNet architecture is a list of layers that transform one volume into another
- The commonly used layer types are CONV / POOL / RELU / FC
- Each layer implements its volume-to-volume transformation with a differentiable function
- Some layers have parameters, e.g. CONV / FC; others do not, e.g. RELU / POOL
- Some layers have hyperparameters, e.g. CONV / FC / POOL; others do not, e.g. RELU

Parameters are learned automatically from the training data during training. Hyperparameters are set by hand before training, with the goal of making the model perform better while its parameters are being trained.
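To make the parameter / no-parameter distinction concrete, one could inspect the layers of the PyTorch sketch above (this reuses the hypothetical model defined there):

```python
# Count the learnable parameters per layer of the model sketched above.
for layer in model:
    n_params = sum(p.numel() for p in layer.parameters())
    print(type(layer).__name__, n_params)

# Conv2d    912    (5*5*3*12 weights + 12 biases)      -> has parameters
# ReLU      0                                          -> no parameters
# MaxPool2d 0      (kernel size / stride are hyperparameters, not parameters)
# Flatten   0
# Linear    30730  (16*16*12*10 weights + 10 biases)   -> has parameters
```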
Convolutional Layer
The convolutional layer is the core building block of a ConvNet and carries most of the computational load.
Filters and local connectivity
- A filter (convolution kernel) typically has a small size such as $5 \times 5 \times 3$.
- The receptive field is the spatial extent of the filter, i.e. its width and height, e.g. $5 \times 5$ in the example above.
- Local Connectivity: each neuron is connected only to a local region of the input volume.
- Two examples below illustrate how to count the connections (or weights) of a neuron.
Example 1. For example, suppose that the input volume has size [32x32x3] (e.g. an RGB CIFAR-10 image). If the receptive field (or the filter size) is 5x5, then each neuron in the Conv Layer will have weights to a [5x5x3] region in the input volume, for a total of 5*5*3 = 75 weights (and +1 bias parameter). Notice that the extent of the connectivity along the depth axis must be 3, since this is the depth of the input volume.
Example 2. Suppose an input volume had size [16x16x20]. Then using an example receptive field size of 3x3, every neuron in the Conv Layer would now have a total of 3*3*20 = 180 connections to the input volume. Notice that, again, the connectivity is local in space (e.g. 3x3), but full along the input depth (20).
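A quick arithmetic check of the two counts above (plain Python, no framework assumed):

```python
# Example 1: 5x5 receptive field over a depth-3 input volume
weights_ex1 = 5 * 5 * 3           # 75 weights per neuron (+1 bias parameter)

# Example 2: 3x3 receptive field over a depth-20 input volume
connections_ex2 = 3 * 3 * 20      # 180 connections per neuron

print(weights_ex1, connections_ex2)   # 75 180
```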
Spatial arrangement
Three hyperparameters control the size of the output volume: the depth, the stride, and the zero-padding.
- depth: the depth of the output volume, i.e. the number of filters used
- stride: the step with which the filter slides over the input; usually 1 or 2
- zero-padding: the number of zeros padded around the border of the input
Computing the size of the output volume, given:
- $W$: the size of the input volume
- $F$: the size of the filter (receptive field)
- $P$: the amount of zero-padding
- $S$: the stride

The size of the output volume is $(W + 2P - F)/S + 1$.
In general, setting zero padding to be $P = (F - 1)/2$ when the stride is $S = 1$ ensures that the input volume and output volume will have the same size spatially.
Constraints on strides. The output size computed from these four hyperparameters must be an integer; otherwise the settings do not fit the input and are invalid.
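A minimal sketch of this size computation (the function name conv_output_size is just for illustration):

```python
def conv_output_size(W, F, P, S):
    """Spatial output size of a conv layer: (W + 2P - F)/S + 1."""
    size = (W + 2 * P - F) / S + 1
    # Constraint on strides: the result must be an integer,
    # otherwise the hyperparameters do not fit the input.
    if not size.is_integer():
        raise ValueError(f"invalid hyperparameters: output size {size} is not an integer")
    return int(size)

# 32x32 input, 5x5 filter, "same" padding P=(F-1)/2=2, stride 1 -> output stays 32x32
print(conv_output_size(32, 5, 2, 1))   # 32
# 16x16 input, 3x3 filter, no padding, stride 2 -> 7.5, not an integer, raises ValueError
# conv_output_size(16, 3, 0, 2)
```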
Parameter Sharing
If every value in the output volume were computed by its own filter, the number of parameters would be enormous. With parameter sharing, all neurons in a given depth slice of the output volume therefore use the same filter (the same set of weights and bias).
Using the real-world example of the first Conv Layer of the AlexNet architecture (output volume 55x55x96, filters of size 11x11x3), there are 55*55*96 = 290,400 neurons in the first Conv Layer, and each has 11*11*3 = 363 weights and 1 bias. Together, this adds up to 290,400 * 364 = 105,705,600 parameters in the first layer of the ConvNet alone. Clearly, this number is very high.
With this parameter sharing scheme, the first Conv Layer in our example would instead have only 96 unique sets of weights (one for each depth slice).
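The size of the reduction can be checked with a little arithmetic (a sketch using the numbers above):

```python
# Without parameter sharing: every neuron in the 55x55x96 output volume
# has its own 11x11x3 filter plus a bias.
neurons = 55 * 55 * 96                  # 290,400 neurons
params_per_neuron = 11 * 11 * 3 + 1     # 363 weights + 1 bias = 364
unshared = neurons * params_per_neuron  # 105,705,600 parameters

# With parameter sharing: one filter (plus one bias) per depth slice.
shared = 96 * (11 * 11 * 3 + 1)         # 34,944 parameters

print(unshared, shared)                 # 105705600 34944
```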