[PyTorch] A detailed explanation of BatchNorm2d() in PyTorch's nn module

qq_41978139

License: CC 4.0 BY-SA

Original article: https://blog.youkuaiyun.com/bigFatCat_Tom/article/details/91619977?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.

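This article explains the role and principle of BatchNorm2d in convolutional neural networks, including its mathematical formula and its implementation in PyTorch. Example code shows how BatchNorm2d normalizes its input, the meaning of the parameters num_features, eps, momentum and affine is explained, and the computation performed by BatchNorm2d is verified by hand. The article also touches on Bessel's correction and its role in computing the sample variance.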

Basic principle

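In convolutional neural networks, a BatchNorm2d layer is commonly added after a convolutional layer to normalize its output. Each channel is shifted to zero mean and scaled to unit variance using statistics computed over the batch. This keeps the values fed into ReLU in a stable range instead of letting them grow large, which makes training more stable. For each channel, BatchNorm2d() computes:

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta$$

Here E[x] and Var[x] are the mean and the biased variance of that channel over the batch, height and width dimensions, and γ and β are learnable per-channel parameters.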
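The per-channel formula y = (x − E[x]) / sqrt(Var[x] + eps) · γ + β can be written directly with tensor operations. The sketch below assumes training mode, so batch statistics are used. `manual_batch_norm2d` is a hypothetical helper name, not a PyTorch API. It normalizes each channel over the batch, height and width dimensions and compares the result with `nn.BatchNorm2d`:

```python
import torch
import torch.nn as nn


def manual_batch_norm2d(x, weight, bias, eps=1e-5):
    """Hypothetical helper: training-mode BatchNorm2d written from the formula.

    x has shape (N, C, H, W); weight (gamma) and bias (beta) have shape (C,).
    """
    # Per-channel mean over N, H and W; keepdim gives shape (1, C, 1, 1)
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    # Biased variance (divide by N*H*W), which BatchNorm uses to normalize
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    # Reshape gamma and beta to (1, C, 1, 1) so they broadcast per channel
    return x_hat * weight.view(1, -1, 1, 1) + bias.view(1, -1, 1, 1)


torch.manual_seed(0)
x = torch.randn(4, 2, 3, 4)   # arbitrary batch of 4 samples with 2 channels
m = nn.BatchNorm2d(2)         # a freshly created module is in training mode
expected = m(x)
actual = manual_batch_norm2d(x, m.weight, m.bias, m.eps)
print(torch.allclose(actual, expected, atol=1e-5))
```

Passing the module's own `weight`, `bias` and `eps` lets the hand-written version and the layer see exactly the same parameters. Any remaining difference is floating-point rounding.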

The parameters of BatchNorm2d() are:

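1. num_features: the input is expected to have shape batch_size × num_features × height × width, and num_features is the number of features (channels) C in it.

2. eps: a value added to the denominator for numerical stability. Default: 1e-5.

3. momentum: the coefficient used to update the running estimates of the mean and variance. Default: 0.1. My understanding was that it acts as a stabilizing coefficient, similar to momentum in SGD. Strictly speaking, PyTorch's documentation points out that it differs from optimizer momentum. The update is running = (1 − momentum) × running + momentum × (current batch statistic).

4. affine: when True (the default), the layer has learnable per-channel parameters gamma (weight) and beta (bias). Each one is a 1-D tensor of length num_features.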
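A short sketch of how momentum, the running statistics and affine behave. It assumes PyTorch's documented update rule, running = (1 − momentum) × running + momentum × batch statistic, with the running variance updated from the unbiased (Bessel-corrected) batch variance. The shapes and seed are arbitrary choices:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 5, 5)             # arbitrary input: 8 samples, 3 channels
m = nn.BatchNorm2d(3, momentum=0.1)     # training mode by default

# Before any forward pass: running_mean starts at 0 and running_var at 1
old_mean = m.running_mean.clone()
old_var = m.running_var.clone()

m(x)  # one training-mode forward pass updates the running statistics

batch_mean = x.mean(dim=(0, 2, 3))
# The running variance uses the UNBIASED batch variance (divide by N*H*W - 1)
batch_var_unbiased = x.var(dim=(0, 2, 3), unbiased=True)

expected_mean = (1 - m.momentum) * old_mean + m.momentum * batch_mean
expected_var = (1 - m.momentum) * old_var + m.momentum * batch_var_unbiased
print(torch.allclose(m.running_mean, expected_mean, atol=1e-5))
print(torch.allclose(m.running_var, expected_var, atol=1e-5))

# In eval mode the running statistics replace the batch statistics
m.eval()
y = m(x)
mean_r = m.running_mean.view(1, -1, 1, 1)
var_r = m.running_var.view(1, -1, 1, 1)
manual = (x - mean_r) / torch.sqrt(var_r + m.eps)
manual = manual * m.weight.view(1, -1, 1, 1) + m.bias.view(1, -1, 1, 1)
print(torch.allclose(y, manual, atol=1e-5))

# With affine=False there are no learnable gamma/beta parameters
print(nn.BatchNorm2d(3, affine=False).weight)
```

A larger momentum makes the running estimates follow the most recent batches more closely. A smaller one averages over a longer history.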

The description above is still abstract, so let's walk through it with the following code:

Code demonstration

```python
# encoding: utf-8
import torch
import torch.nn as nn

# num_features: C from an expected input of size batch_size*num_features*height*width
# eps: default 1e-5 (the value added to the denominator in the formula for numerical stability)
# momentum: used to compute running_mean and running_var, default 0.1
m = nn.BatchNorm2d(2, affine=True)  # affine=True: the learnable weight (gamma) and bias (beta) are used
input = torch.randn(1, 2, 3, 4)     # random input, so the values differ on every run
output = m(input)

print(input)
print(m.weight)
print(m.bias)
print(output)
print(output.size())
```

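The output is shown below. Because the input is random, your numbers will differ. The weight values here come from an older PyTorch release that initialized weight randomly. Recent releases initialize weight to 1 and bias to 0.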

```
tensor([[[[ 1.4174, -1.9512, -0.4910, -0.5675],
          [ 1.2095,  1.0312,  0.8652, -0.1177],
          [-0.5964,  0.5000, -1.4704,  2.3610]],

         [[-0.8312, -0.8122, -0.3876,  0.1245],
          [ 0.5627, -0.1876, -1.6413, -1.8722],
          [-0.0636,  0.7284,  2.1816,  0.4933]]]])
Parameter containing:
tensor([0.2837, 0.1493], requires_grad=True)
Parameter containing:
tensor([0., 0.], requires_grad=True)
tensor([[[[ 0.2892, -0.4996, -0.1577, -0.1756],
          [ 0.2405,  0.1987,  0.1599, -0.0703],
          [-0.1824,  0.0743, -0.3871,  0.5101]],

         [[-0.0975, -0.0948, -0.0347,  0.0377],
          [ 0.0997, -0.0064, -0.2121, -0.2448],
          [ 0.0111,  0.1232,  0.3287,  0.0899]]]],
       grad_fn=<NativeBatchNormBackward>)
torch.Size([1, 2, 3, 4])
```
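Each output channel c equals x̂ · gamma[c] + beta[c], where x̂ has zero mean and unit variance. So each output channel should have a mean of about beta[c] and a biased standard deviation of about gamma[c] (very slightly less, because of eps). A sketch that checks this on a fresh random input. The gamma values are taken from the printout above, and the beta values are arbitrary illustrations:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m = nn.BatchNorm2d(2, affine=True)
x = torch.randn(1, 2, 3, 4)

with torch.no_grad():
    # Set non-default gamma/beta so the check is meaningful
    m.weight.copy_(torch.tensor([0.2837, 0.1493]))
    m.bias.copy_(torch.tensor([0.5, -1.0]))
    y = m(x)   # still training mode: normalized with this batch's statistics

# Per-channel statistics of the output over N, H and W
out_mean = y.mean(dim=(0, 2, 3))
out_std = y.var(dim=(0, 2, 3), unbiased=False).sqrt()
print(out_mean)  # close to m.bias
print(out_std)   # close to m.weight
```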

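Analysis: the input is a 1×2×3×4 four-dimensional tensor, while gamma and beta are 1-D tensors of length 2. There is one entry for each of the two 3×4 matrices, input[0][0] and input[0][1], and each of these channels is normalized separately. Because the batch size is 1, the statistics of channel 0 come from input[0][0] alone; with a larger batch they would be pooled over input[:, 0]. Let's apply the formula above by hand to input[0][0] and check whether the result matches output[0][0]. First, print input[0][0] and compute its mean and variance.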

```python
print("First channel of the input:")
print(input[0][0])  # this is the first 3x4 matrix (channel 0 of sample 0)

# Mean and variance of the first channel
firstDimenMean = input[0][0].mean()
firstDimenVar = input[0][0].var(unbiased=False)  # unbiased=False: no Bessel correction (divide by n, not n-1)

print(m)
print('m.eps=', m.eps)
print(firstDimenMean)
print(firstDimenVar)
```

The output is:

```
First channel of the input:
tensor([[ 1.4174, -1.9512, -0.4910, -0.5675],
        [ 1.2095,  1.0312,  0.8652, -0.1177],
        [-0.5964,  0.5000, -1.4704,  2.3610]])
BatchNorm2d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
m.eps= 1e-05
tensor(0.1825)
tensor(1.4675)
```
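Turning off Bessel's correction means the sum of squared deviations is divided by n rather than n − 1. BatchNorm2d normalizes with this biased estimate in training mode, while its running_var is updated with the unbiased one. A short sketch of the relationship between the two estimators, using a random 3×4 tensor as an assumed stand-in for input[0][0]:

```python
import torch

torch.manual_seed(0)
x = torch.randn(3, 4)   # stand-in for one 3x4 channel slice
n = x.numel()           # 12 elements

biased = x.var(unbiased=False)    # sum((x - mean)^2) / n
unbiased = x.var(unbiased=True)   # sum((x - mean)^2) / (n - 1), Bessel-corrected

# The biased variance computed by hand from its definition
manual_biased = ((x - x.mean()) ** 2).sum() / n

print(biased, unbiased, manual_biased)
print(torch.allclose(unbiased, biased * n / (n - 1)))
```

With only 12 values the two estimates differ by a factor of 12/11, so using the wrong one would visibly change the normalized output.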

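A calculator confirms that both the mean and the variance are correct. Finally, compute the normalized value of input[0][0][0][0] with the formula; it should match output[0][0][0][0]. The code is: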

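```python
# Apply the formula to a single element:
# y = (x - mean) / sqrt(var + eps) * gamma + beta
# eps belongs inside the square root, as in BatchNorm2d's definition
batchnormone = (input[0][0][0][0] - firstDimenMean) / torch.sqrt(firstDimenVar + m.eps) \
    * m.weight[0] + m.bias[0]
print(batchnormone)  # should agree with output[0][0][0][0] up to rounding
```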
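PyTorch's BatchNorm2d adds eps inside the square root, sqrt(var + eps). A variant that adds it after the square root, sqrt(var) + eps, gives almost the same number when the variance is around 1.47, as it is for input[0][0] here. The two diverge for a nearly constant channel. A small sketch comparing both placements. The element and mean are taken from the printout above, and the tiny variance 1e-6 is an arbitrary illustration:

```python
import torch

eps = 1e-5
x = torch.tensor(1.4174)     # input[0][0][0][0] from the printout
mean = torch.tensor(0.1825)  # mean of input[0][0] from the printout

for var in (torch.tensor(1.4675), torch.tensor(1e-6)):
    inside = (x - mean) / torch.sqrt(var + eps)     # BatchNorm2d's formula
    outside = (x - mean) / (torch.sqrt(var) + eps)  # eps added after the sqrt
    print(f"var={var.item():g}: inside={inside.item():.6f}, outside={outside.item():.6f}")
```

For the tiny variance, eps dominates the denominator, so the two formulas give clearly different results. This is why it matters to follow the definition when checking values by hand.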