Improving Deep Neural Networks Study Notes (Part 3)

This article discusses the importance of hyperparameter tuning and how to approach it, including tips for choosing the learning rate, momentum, and other hyperparameters. It also introduces Batch Normalization, how to implement it, and how it helps speed up neural network training.

Author: Tyan
Blog: noahsnail.com  |  CSDN  |  简书

5. Hyperparameter tuning

5.1 Tuning process

Hyperparameters:

$\alpha$, $\beta$, $\beta_1$, $\beta_2$, $\epsilon$, number of layers, number of hidden units, learning rate decay, mini-batch size.

The learning rate $\alpha$ is the most important hyperparameter to tune. The momentum $\beta$, the mini-batch size, and the number of hidden units are second in importance.

Try random values: don't use a grid. Go coarse to fine.
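As a minimal sketch of random search with the coarse-to-fine strategy (the specific hyperparameters and ranges below are only illustrative, not from the course), each trial draws fresh values instead of stepping through a grid:

```python
import numpy as np

# Random search: each trial draws new values for every hyperparameter,
# so 25 trials explore 25 distinct values of each one (a 5x5 grid would not).
def sample_trial():
    hidden_units = np.random.randint(50, 300)             # uniform over an integer range
    mini_batch_size = int(2 ** np.random.randint(5, 10))  # 32 ... 512
    return hidden_units, mini_batch_size

trials = [sample_trial() for _ in range(25)]
# Coarse to fine: after evaluating these trials, narrow the ranges around
# the best-performing region and sample again within them.
```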

5.2 Using an appropriate scale to pick hyperparameters

Appropriate scale for hyperparameters:

$\alpha \in [0.0001, 1]$: set `r = -4 * np.random.rand()` and $\alpha = 10^r$.

More generally, if $\alpha \in [10^a, 10^b]$, pick $r$ uniformly at random from $[a, b]$ and set $\alpha = 10^r$.
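A minimal sketch of this log-scale sampling (the default endpoints correspond to the $[0.0001, 1]$ range above):

```python
import numpy as np

def sample_learning_rate(a=-4, b=0):
    """Sample alpha from [10**a, 10**b] uniformly on a log scale."""
    r = a + (b - a) * np.random.rand()  # r ~ Uniform[a, b]
    return 10 ** r

alpha = sample_learning_rate()  # values between 0.0001 and 1, evenly spread per decade
```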

Hyperparameters for exponentially weighted average

$\beta \in [0.9, 0.999]$: don't pick uniformly from $[0.9, 0.999]$. Instead, sample $1 - \beta \in [0.001, 0.1]$ on a log scale, using a similar method as for $\alpha$.

Why not sample on a linear scale? Because when $\beta$ is close to 1, even a tiny change has a huge impact on the algorithm: the exponentially weighted average covers roughly $\frac{1}{1 - \beta}$ samples, so moving $\beta$ from 0.999 to 0.9995 doubles the window from about 1000 to about 2000.
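A small sketch of the corresponding sampling for $\beta$, using the range endpoints from the notes:

```python
import numpy as np

def sample_beta():
    """Sample beta in [0.9, 0.999] by sampling 1 - beta on a log scale."""
    r = -3 + 2 * np.random.rand()   # r ~ Uniform[-3, -1]
    one_minus_beta = 10 ** r        # in [0.001, 0.1]
    return 1 - one_minus_beta

beta = sample_beta()
```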

5.3 Hyperparameters tuning in practice: Pandas vs Caviar
  • Re-test hyperparameters occasionally

  • Babysitting one model (Pandas)

  • Training many models in parallel (Caviar)

6. Batch Normalization

6.1 Normalizing activations in a network

In logistic regression, normalizing the inputs speeds up learning:

  1. Compute the means: $\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}$
  2. Subtract the means from the training set: $x := x - \mu$
  3. Compute the variances: $\sigma^2 = \frac{1}{m}\sum_{i=1}^{m} (x^{(i)})^2$ (element-wise, after the means have been subtracted)
  4. Normalize the training set: $x := x / \sigma^2$
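A minimal NumPy sketch of these steps, assuming `X` has shape `(m, n)` with one example per row:

```python
import numpy as np

def normalize_inputs(X, eps=1e-8):
    """Zero-center the training inputs and normalize their spread, feature-wise."""
    mu = X.mean(axis=0)                       # per-feature means
    X_centered = X - mu
    sigma2 = (X_centered ** 2).mean(axis=0)   # per-feature variances
    X_norm = X_centered / (sigma2 + eps)      # divided by sigma^2, as in the notes
    return X_norm, mu, sigma2

# Reuse the same mu and sigma2 to normalize the dev and test sets.
```

(Dividing by the standard deviation `np.sqrt(sigma2 + eps)` is the more common convention; the notes divide by $\sigma^2$.)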

Similarly, to speed up the training of a neural network, we can normalize intermediate values in the layers (the $z$ values in the hidden layers). This is called Batch Normalization, or Batch Norm.

Implementing Batch Norm

  1. Given some intermediate values in the neural network, $z^{(1)}, z^{(2)}, \ldots, z^{(m)}$
  2. Compute the mean: $\mu = \frac{1}{m}\sum_{i=1}^{m} z^{(i)}$
  3. Compute the variance: $\sigma^2 = \frac{1}{m}\sum_{i=1}^{m} (z^{(i)} - \mu)^2$
  4. Normalize $z$: $z_{\text{norm}}^{(i)} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \epsilon}}$
  5. Compute $\hat{z}$: $\hat{z}^{(i)} = \gamma z_{\text{norm}}^{(i)} + \beta$

Now the normalized $z$ values have mean zero and unit variance. But it may make sense for the hidden units to have a different distribution, so we use $\hat{z}$ instead of $z$, where $\gamma$ and $\beta$ are learnable parameters of the model.
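A minimal NumPy sketch of the forward computation above for one layer's mini-batch of $z$ values (shape `(m, units)`; in a real model `gamma` and `beta` would be updated by gradient descent):

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-8):
    """Normalize Z over the mini-batch, then scale and shift with gamma and beta."""
    mu = Z.mean(axis=0)                         # per-unit mean over the mini-batch
    sigma2 = Z.var(axis=0)                      # per-unit variance
    Z_norm = (Z - mu) / np.sqrt(sigma2 + eps)   # zero mean, unit variance
    Z_hat = gamma * Z_norm + beta               # learnable scale and shift
    return Z_hat, mu, sigma2
```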

6.2 Fitting Batch Norm into a neural network

Add Batch Norm to a network

$X \rightarrow Z^{[1]} \rightarrow \hat{Z}^{[1]} \rightarrow a^{[1]} \rightarrow Z^{[2]} \rightarrow \hat{Z}^{[2]} \rightarrow a^{[2]} \rightarrow \ldots$

Parameters:
$W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}, \ldots$
$\gamma^{[1]}, \beta^{[1]}, \gamma^{[2]}, \beta^{[2]}, \ldots$

If you use Batch Norm, the mean of $z^{[l]}$ is computed and subtracted within each layer, so whatever constant $b^{[l]}$ adds is cancelled out. That makes $b^{[l]}$ useless: we can set $b^{[l]} = 0$ permanently, and its role is taken over by $\beta^{[l]}$.
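A small sketch of one hidden layer's forward pass without a bias term (ReLU is assumed as the activation purely for illustration):

```python
import numpy as np

def dense_bn_relu_forward(A_prev, W, gamma, beta, eps=1e-8):
    """One layer's forward pass: linear (no bias) -> Batch Norm -> ReLU."""
    Z = A_prev @ W                                   # b[l] is omitted: Batch Norm cancels it
    mu, sigma2 = Z.mean(axis=0), Z.var(axis=0)       # mini-batch statistics
    Z_hat = gamma * (Z - mu) / np.sqrt(sigma2 + eps) + beta
    return np.maximum(0, Z_hat)                      # ReLU activation
```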

6.3 Why does Batch Norm work?

Covariate shift: you have learned a function mapping $x \rightarrow y$ and it works well. If the distribution of $x$ changes, you may need to relearn the function to make it work well again.

Hidden unit values change all the time during training, so later layers suffer from this covariate shift problem. Batch Norm reduces the amount by which the distribution of these hidden unit values shifts around.

Batch Norm as regularization

  • Each mini-batch is scaled by the mean/variance computed on just that mini-batch.
  • This adds some noise to the values $z^{[l]}$ within that mini-batch. So, similar to dropout, it adds some noise to each hidden layer's activations.

  • This has a slight regularization effect.

6.4 Batch Norm at test time

To apply the neural network at test time, you need to come up with a separate estimate of $\mu$ and $\sigma^2$. Typically you keep an exponentially weighted average of the mini-batch means and variances seen during training and use those estimates at test time.
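A minimal sketch of that estimate, assuming exponentially weighted averages of the mini-batch statistics are maintained during training (the momentum value is illustrative):

```python
import numpy as np

# During training: update the running estimates after each mini-batch.
def update_running_stats(running_mu, running_sigma2, mu, sigma2, momentum=0.9):
    running_mu = momentum * running_mu + (1 - momentum) * mu
    running_sigma2 = momentum * running_sigma2 + (1 - momentum) * sigma2
    return running_mu, running_sigma2

# At test time: normalize with the running estimates instead of batch statistics.
def batch_norm_test(Z, gamma, beta, running_mu, running_sigma2, eps=1e-8):
    Z_norm = (Z - running_mu) / np.sqrt(running_sigma2 + eps)
    return gamma * Z_norm + beta
```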

7. Multi-class classification

7.1 Softmax regression

Softmax regression generalizes logistic regression to $C$ classes. In the output layer, compute $z^{[L]} = W^{[L]} a^{[L-1]} + b^{[L]}$ and then the softmax activation $a^{[L]}_i = \frac{e^{z^{[L]}_i}}{\sum_{j=1}^{C} e^{z^{[L]}_j}}$, which gives a probability for each class.

7.2 Training a softmax classifier

  • Hard max: puts a 1 in the position of the largest element of $z$ and 0 everywhere else; softmax is a gentler, probabilistic mapping.
  • Loss function: $\mathcal{L}(\hat{y}, y) = -\sum_{j=1}^{C} y_j \log \hat{y}_j$.
  • Gradient descent with softmax: the key backpropagation step is $dz^{[L]} = \hat{y} - y$.
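A small NumPy sketch of the softmax activation and this loss (shapes and values are illustrative):

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis, with max-subtraction for numerical stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_loss(y_hat, y):
    """L(y_hat, y) = -sum_j y_j * log(y_hat_j), averaged over the mini-batch."""
    return -np.mean(np.sum(y * np.log(y_hat + 1e-12), axis=-1))

y_hat = softmax(np.array([[2.0, 1.0, 0.1]]))   # predicted class probabilities
y = np.array([[1.0, 0.0, 0.0]])                # one-hot label
loss = cross_entropy_loss(y_hat, y)
```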

8. Programming Frameworks

8.1 Deep Learning frameworks

  • Caffe/Caffe2
  • TensorFlow
  • Torch
  • Theano
  • mxnet
  • PaddlePaddle
  • Keras
  • CNTK

Choosing deep learning frameworks

  • Ease of programming (development and deployment)
  • Running speed
  • Truly open (open source with good governance)

8.2 TensorFlow
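The notes stop at the heading. As a hedged sketch of the idea covered here (assuming TensorFlow 2.x, and using the toy cost $w^2 - 10w + 25$ purely as an illustration), a framework lets you specify only the forward cost and derives the gradients for you:

```python
import tensorflow as tf

w = tf.Variable(0.0, dtype=tf.float32)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for _ in range(500):
    with tf.GradientTape() as tape:
        cost = w ** 2 - 10 * w + 25        # minimized at w = 5
    grads = tape.gradient(cost, [w])       # backward pass is computed automatically
    optimizer.apply_gradients(zip(grads, [w]))

print(w.numpy())  # approaches 5.0
```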
