1. Forward and Backward Propagation of Batch Normalization
Forward propagation:

For a mini-batch $B=\{x_{1},\ldots,x_{m}\}$, Batch Normalization computes

$$\mu_{B}=\frac{1}{m}\sum_{i=1}^{m} x_{i}$$

$$\sigma_{B}^{2}=\frac{1}{m}\sum_{i=1}^{m}\left(x_{i}-\mu_{B}\right)^{2}$$

$$\widehat{x}_{i}=\frac{x_{i}-\mu_{B}}{\sqrt{\sigma_{B}^{2}+\epsilon}}$$

$$y_{i}=\gamma\,\widehat{x}_{i}+\beta$$
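As a concrete reference, here is a minimal NumPy sketch of this forward pass for a batch of shape $(m, d)$; the function name `batchnorm_forward` and the choice to return the intermediate values are illustrative assumptions, not part of the original text.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Batch Normalization forward pass for a mini-batch x of shape (m, d)."""
    mu = x.mean(axis=0)                    # mini-batch mean, mu_B
    var = x.var(axis=0)                    # mini-batch variance, sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized input, x_hat_i
    y = gamma * x_hat + beta               # scale and shift, y_i
    return y, x_hat, mu, var
```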
Backward propagation:
From the forward propagation above we can draw the following computation graph (the subscript $B$ on the mean and variance is omitted from here on).
Let the loss function be $L$, and suppose the partial derivative of $L$ with respect to $y_{i}$, $\frac{\partial L}{\partial y_{i}}$, is known. We want $\frac{\partial L}{\partial \gamma}$, $\frac{\partial L}{\partial \beta}$, and $\frac{\partial L}{\partial x_{i}}$. The first two are straightforward; computing $\frac{\partial L}{\partial x_{i}}$ requires $\frac{\partial L}{\partial \widehat{x}_{i}}$, so we list it directly:
$$\frac{\partial L}{\partial \gamma}=\sum_{i=1}^{m} \frac{\partial L}{\partial y_{i}} \cdot \widehat{x}_{i} \qquad (1)$$

$$\frac{\partial L}{\partial \beta}=\sum_{i=1}^{m} \frac{\partial L}{\partial y_{i}} \qquad (2)$$

$$\frac{\partial L}{\partial \widehat{x}_{i}}=\frac{\partial L}{\partial y_{i}} \cdot \gamma \qquad (3)$$
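A minimal NumPy sketch of Eqs. (1)–(3), vectorized over the batch: `dy` holds $\frac{\partial L}{\partial y_{i}}$ for every sample, and the function name `batchnorm_backward_partial` is an illustrative assumption. It stops at $\frac{\partial L}{\partial \widehat{x}_{i}}$, before the remaining steps needed to reach $\frac{\partial L}{\partial x_{i}}$.

```python
import numpy as np

def batchnorm_backward_partial(dy, x_hat, gamma):
    """Gradients from Eqs. (1)-(3).

    dy:    (m, d) array of dL/dy_i
    x_hat: (m, d) array of normalized inputs from the forward pass
    gamma: (d,) scale parameter
    """
    dgamma = np.sum(dy * x_hat, axis=0)  # Eq. (1): sum_i dL/dy_i * x_hat_i
    dbeta  = np.sum(dy, axis=0)          # Eq. (2): sum_i dL/dy_i
    dx_hat = dy * gamma                  # Eq. (3): dL/dx_hat_i = dL/dy_i * gamma
    return dgamma, dbeta, dx_hat
```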