Neural Networks and Derivations of Related Formulas

This article works through the basic concepts of neural networks, including the input/output representation, the definition and computation of the cost function, and the detailed steps of the forward propagation and backpropagation algorithms. The key parts of the training process, such as computing partial derivatives and gradient descent, are derived with explicit mathematical formulas.

1. Neural Networks

(figure: neural_network)
The input is $[x_1, x_2, \dots, x_n]$ and the output is $[y_1, y_2, \dots, y_k]$.
When the number of output classes $k > 2$, use the one-hot vectors
$$\begin{bmatrix}1\\0\\\vdots\\0\end{bmatrix},\ \begin{bmatrix}0\\1\\\vdots\\0\end{bmatrix},\ \begin{bmatrix}0\\\vdots\\1\\0\end{bmatrix},\ \begin{bmatrix}0\\0\\\vdots\\1\end{bmatrix}$$
as the output.
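As a concrete illustration, here is a minimal NumPy sketch of this one-hot encoding (the helper name `one_hot` is this sketch's own, not from the original post):

```python
import numpy as np

def one_hot(labels, k):
    """Encode integer class labels 0..k-1 as k-dimensional one-hot column vectors."""
    y = np.zeros((k, len(labels)))
    y[labels, np.arange(len(labels))] = 1.0  # set a single 1 per column
    return y

# e.g. 4 classes, samples labeled 2 and 0 -> columns [0,0,1,0]^T and [1,0,0,0]^T
print(one_hot(np.array([2, 0]), k=4))
```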

2. Cost Function

$$J(\Theta)=-\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{K}y_k^{(i)}\log\left(h_\Theta(x^{(i)})\right)_k+\left(1-y_k^{(i)}\right)\log\left(1-\left(h_\Theta(x^{(i)})\right)_k\right)\right]+\frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2$$
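Below is a hedged NumPy sketch of evaluating $J(\Theta)$, assuming `H` holds the hypothesis outputs $h_\Theta(x^{(i)})$ column-wise, `Y` the one-hot labels, and `Thetas` the list of weight matrices (all names are illustrative):

```python
import numpy as np

def cost(H, Y, Thetas, lam):
    """Regularized cross-entropy cost J(Theta).

    H: K x m matrix of hypothesis outputs, one column per sample.
    Y: K x m one-hot label matrix.
    Thetas: list of Theta^(l) matrices, each of shape s_{l+1} x (s_l + 1).
    """
    m = Y.shape[1]
    data_term = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularization skips the bias column (index 0), matching the i,j >= 1 sums.
    reg_term = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return data_term + reg_term
```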

3. Forward Propagation

$$a^{(1)} = x$$
$$a^{(l)} = h_{\Theta^{(l-1)}}\left(\widehat{a^{(l-1)}}\right) = g\left(\Theta^{(l-1)}\begin{bmatrix}1\\a^{(l-1)}\end{bmatrix}\right),\quad 1 < l \le L$$
$$\Theta^{(l)} \in \Bbb{R}^{s_{l+1}\times(s_l+1)}$$
where $\widehat{a^{(l-1)}}$ denotes $a^{(l-1)}$ with the bias unit $1$ prepended.
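A minimal sketch of this recurrence with the sigmoid as $g$ (the `forward` helper and its conventions are this sketch's own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Thetas):
    """Return the activations a^(1)..a^(L) as column vectors.

    Thetas[l-1] is Theta^(l) in the text, of shape s_{l+1} x (s_l + 1).
    """
    a = x.reshape(-1, 1)                         # a^(1) = x
    activations = [a]
    for Theta in Thetas:
        a_hat = np.vstack([np.ones((1, 1)), a])  # prepend the bias unit 1
        a = sigmoid(Theta @ a_hat)               # a^(l) = g(Theta^(l-1) [1; a^(l-1)])
        activations.append(a)
    return activations
```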

4. Backpropagation

$$\delta^{(L)} = (a^{(L)} - y).*a^{(L)}.*(1 - a^{(L)})$$
$$\delta^{(l)} = \left(\widehat{\Theta^{(l)}}\right)^T\delta^{(l+1)}.*a^{(l)}.*(1 - a^{(l)}),\quad 1 < l < L$$
where $\widehat{\Theta^{(l)}}$ is $\Theta^{(l)}$ with the bias column $\theta_0$ removed.
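A sketch of these two equations in NumPy, reusing the hypothetical `forward` output above; `Theta[:, 1:]` plays the role of $\widehat{\Theta^{(l)}}$:

```python
import numpy as np

def backward_deltas(activations, Thetas, y):
    """Compute delta^(2)..delta^(L) from the stored activations a^(1)..a^(L)."""
    a_L = activations[-1]
    delta = (a_L - y) * a_L * (1 - a_L)          # delta^(L), element-wise products
    deltas = [delta]
    # Walk backwards: delta^(l) = Theta_hat^(l)^T delta^(l+1) .* a^(l) .* (1 - a^(l))
    for Theta, a in zip(reversed(Thetas[1:]), reversed(activations[1:-1])):
        delta = (Theta[:, 1:].T @ delta) * a * (1 - a)
        deltas.append(delta)
    return list(reversed(deltas))                # [delta^(2), ..., delta^(L)]
```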

5. Derivation of Backpropagation

From forward propagation we have:
$$\begin{bmatrix}\theta_{1,0}^{(l)} & \theta_{1,1}^{(l)} & \cdots & \theta_{1,s_l}^{(l)}\\ \theta_{2,0}^{(l)} & \theta_{2,1}^{(l)} & \cdots & \theta_{2,s_l}^{(l)}\\ \vdots & \vdots & \ddots & \vdots\\ \theta_{s_{l+1},0}^{(l)} & \theta_{s_{l+1},1}^{(l)} & \cdots & \theta_{s_{l+1},s_l}^{(l)}\end{bmatrix}\begin{bmatrix}1 \\ a_1^{(l)} \\ a_2^{(l)} \\ \vdots \\ a_{s_l}^{(l)}\end{bmatrix}=\begin{bmatrix}z_1^{(l+1)} \\ z_2^{(l+1)} \\ \vdots \\ z_{s_{l+1}}^{(l+1)}\end{bmatrix}\xrightarrow{\;g\;}\begin{bmatrix}a_1^{(l+1)} \\ a_2^{(l+1)} \\ \vdots \\ a_{s_{l+1}}^{(l+1)}\end{bmatrix}$$
$$a_i^{(l+1)} = g(z_i^{(l+1)}) = g\left(\theta_{i,0}^{(l)} + \theta_{i,1}^{(l)}a_1^{(l)} + \theta_{i,2}^{(l)}a_2^{(l)} + \cdots + \theta_{i,s_l}^{(l)}a_{s_l}^{(l)}\right)$$
Define the error of a unit as the partial derivative of the cost with respect to that unit's weighted input, $\delta_i^{(l+1)} = \frac{\partial J(x,\theta)}{\partial z_i^{(l+1)}}$; then:
$$dJ(x,\theta) = \delta_i^{(l+1)}\,d\left(z_i^{(l+1)}\right) = \delta_i^{(l+1)}\,d\left(\sum_{k=0}^{s_l}\theta_{i,k}^{(l)}a_k^{(l)}\right) \tag{1}$$
Therefore:
$$\begin{aligned}\frac{dJ(x,\theta)}{d z_m^{(l)}} &= \frac{dJ(x,\theta)}{d z^{(l+1)}}\frac{d z^{(l+1)}}{d z_m^{(l)}} = \sum_{i=1}^{s_{l+1}}\delta_i^{(l+1)}\,d\left(\sum_{k=0}^{s_l}\theta_{i,k}^{(l)}a_k^{(l)}\right)\Big/\,d z_m^{(l)}\\ &= \sum_{i=1}^{s_{l+1}}\delta_i^{(l+1)}\sum_{k=0}^{s_l}\theta_{i,k}^{(l)}\frac{d a_k^{(l)}}{d z_m^{(l)}} = \sum_{i=1}^{s_{l+1}}\delta_i^{(l+1)}\theta_{i,m}^{(l)}a_m^{(l)}\left(1 - a_m^{(l)}\right)\end{aligned}$$
where the last step uses $\frac{d a_k^{(l)}}{d z_m^{(l)}} = 0$ for $k \ne m$ together with the sigmoid derivative $\frac{d a_m^{(l)}}{d z_m^{(l)}} = a_m^{(l)}(1 - a_m^{(l)})$. This gives:
$$\delta_m^{(l)} = a_m^{(l)}\left(1 - a_m^{(l)}\right)\sum_{i=1}^{s_{l+1}}\delta_i^{(l+1)}\theta_{i,m}^{(l)}$$
Expanding to vector form:
$$\begin{bmatrix}\delta_1^{(l)} \\ \delta_2^{(l)} \\ \vdots \\ \delta_{s_l}^{(l)}\end{bmatrix}=\begin{bmatrix}a_1^{(l)} \\ a_2^{(l)} \\ \vdots \\ a_{s_l}^{(l)}\end{bmatrix}.*\left(1-\begin{bmatrix}a_1^{(l)} \\ a_2^{(l)} \\ \vdots \\ a_{s_l}^{(l)}\end{bmatrix}\right).*\left(\begin{bmatrix}\theta_{1,1}^{(l)} & \theta_{2,1}^{(l)} & \cdots & \theta_{s_{l+1},1}^{(l)}\\ \theta_{1,2}^{(l)} & \theta_{2,2}^{(l)} & \cdots & \theta_{s_{l+1},2}^{(l)}\\ \vdots & \vdots & \ddots & \vdots\\ \theta_{1,s_l}^{(l)} & \theta_{2,s_l}^{(l)} & \cdots & \theta_{s_{l+1},s_l}^{(l)}\end{bmatrix}\begin{bmatrix}\delta_1^{(l+1)} \\ \delta_2^{(l+1)} \\ \vdots \\ \delta_{s_{l+1}}^{(l+1)}\end{bmatrix}\right)$$
Finally, writing $\theta^{(l)}$ for $\Theta^{(l)}$ without its bias column (the $\widehat{\Theta^{(l)}}$ of Section 4):
$$\delta^{(l)} = a^{(l)}.*\left(1 - a^{(l)}\right).*\left({\theta^{(l)}}^T\delta^{(l+1)}\right) \tag{2}$$
Also, from equation (1):
$$\frac{dJ(x,\theta)}{d\theta_{i,j}^{(l)}} = \delta_i^{(l+1)}a_j^{(l)} \tag{3}$$
Now suppose $J(x,\theta) = \frac{1}{2}(h(x) - y)^2$; then:
$$\delta^{(L)} = \frac{dJ(x,\theta)}{dz^{(L)}} = \frac{dJ(x,\theta)}{da^{(L)}}\frac{da^{(L)}}{dz^{(L)}}$$
Since $\frac{dJ(x,\theta)}{da^{(L)}} = a^{(L)} - y$ and, for the sigmoid, $\frac{da^{(L)}}{dz^{(L)}} = a^{(L)}(1 - a^{(L)})$, we obtain:
$$\delta^{(L)} = \left(a^{(L)} - y\right)a^{(L)}\left(1 - a^{(L)}\right) \tag{4}$$
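Equation (4) is easy to sanity-check numerically; the following sketch compares it against a central-difference derivative of $J$ for a single sigmoid output unit (the values chosen are arbitrary):

```python
import numpy as np

# J = 0.5 * (g(z) - y)^2, so dJ/dz should equal (a - y) * a * (1 - a).
g = lambda z: 1.0 / (1.0 + np.exp(-z))
z, y, eps = 0.7, 1.0, 1e-6
a = g(z)
analytic = (a - y) * a * (1 - a)                                        # equation (4)
numeric = (0.5 * (g(z + eps) - y)**2 - 0.5 * (g(z - eps) - y)**2) / (2 * eps)
print(analytic, numeric)   # the two values should agree to ~1e-9
```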

6. The Backpropagation Algorithm

  1. Initialize the error accumulator $\Delta_{ij}^{(l)}$ to zero
  2. For $i = 1:m$
      (1). $a^{(1)} = x^{(i)}$, where the superscript on $a$ indexes the layer and the superscript on $x$ indexes the training sample
      (2). Use forward propagation to compute $a^{(2)}, a^{(3)}, \dots, a^{(L)}$
      (3). Initialize $\delta^{(L)} = (a^{(L)} - y^{(i)}).*a^{(L)}.*(1 - a^{(L)})$
      (4). Use backpropagation to compute each $\delta^{(l)}$
      (5). $\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)}\delta_i^{(l+1)}$
    After the loop, compute the partial derivatives $D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} + \frac{\lambda}{m}\theta_{ij}^{(l)}$ for $j \ne 0$ (the bias column is not regularized, matching the cost function in Section 2)
  3. Use the partial derivatives for gradient descent (a sketch follows this list)
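Putting Sections 3, 4, and 6 together, here is a sketch of the accumulation loop, building on the hypothetical `forward` and `backward_deltas` helpers sketched earlier (illustrative, not the post's code):

```python
import numpy as np

def backprop_gradients(X, Y, Thetas, lam):
    """Accumulate Delta^(l) over m samples and return D^(l) = dJ/dTheta^(l).

    X: n x m inputs (one column per sample); Y: K x m one-hot labels.
    """
    m = X.shape[1]
    Deltas = [np.zeros_like(T) for T in Thetas]
    for i in range(m):
        acts = forward(X[:, i], Thetas)                    # a^(1)..a^(L)
        deltas = backward_deltas(acts, Thetas, Y[:, [i]])  # delta^(2)..delta^(L)
        for l, (a, d) in enumerate(zip(acts[:-1], deltas)):
            a_hat = np.vstack([np.ones((1, 1)), a])        # [1; a^(l)]
            Deltas[l] += d @ a_hat.T                       # Delta += delta^(l+1) a^(l)^T
    Ds = []
    for T, Delta in zip(Thetas, Deltas):
        D = Delta / m
        D[:, 1:] += (lam / m) * T[:, 1:]   # regularize all but the bias column
        Ds.append(D)
    return Ds
```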

7. Training a Neural Network

A. Randomize the parameters: if $\Theta$ is all zeros, every unit in a layer computes the same value and the symmetry is never broken, so the initial values must be random (see the sketch after this list).

B. Compute the activations $a^{(l)}$ of all layers with forward propagation.

C. Compute the current cost function $J(\Theta)$.

D. Compute all partial derivatives with backpropagation.

E. Verify the partial derivatives with numerical gradient checking, $D \approx \frac{J(\Theta+\epsilon) - J(\Theta-\epsilon)}{2\epsilon}$ (see the sketch after this list).

F. Minimize the cost function with an optimization algorithm (e.g., gradient descent).
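Hedged sketches for steps A and E (the `epsilon_init` default and both helper names are this sketch's own choices, not from the post):

```python
import numpy as np

def random_init(s_in, s_out, epsilon_init=0.12):
    """Step A: break symmetry with small random weights in [-eps, +eps].

    Returns a Theta of shape s_out x (s_in + 1), bias column included.
    """
    return np.random.uniform(-epsilon_init, epsilon_init, (s_out, s_in + 1))

def numerical_gradient(J, Theta, eps=1e-4):
    """Step E: central-difference check D ~= (J(T+eps) - J(T-eps)) / (2 eps)."""
    grad = np.zeros_like(Theta)
    for idx in np.ndindex(Theta.shape):
        T_plus, T_minus = Theta.copy(), Theta.copy()
        T_plus[idx] += eps
        T_minus[idx] -= eps
        grad[idx] = (J(T_plus) - J(T_minus)) / (2 * eps)
    return grad
```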
