hw3

1.

a.

$$
\begin{aligned}
\frac{\partial y_i}{\partial x_j}
&= \frac{\partial\,\sigma(W_{i1}x_1+\dots+W_{id}x_d)}{\partial x_j} \\
&= \frac{\partial}{\partial x_j}\left(\frac{1}{1+e^{-(W_{i1}x_1+\dots+W_{id}x_d)}}\right) \\
&= \frac{e^{-(W_{i1}x_1+\dots+W_{id}x_d)}\,W_{ij}}{\left(1+e^{-(W_{i1}x_1+\dots+W_{id}x_d)}\right)^2}
\end{aligned}
$$
Let
$$
q_i=\frac{e^{-(W_{i1}x_1+\dots+W_{id}x_d)}}{\left(1+e^{-(W_{i1}x_1+\dots+W_{id}x_d)}\right)^2}
$$

$$
\frac{\partial Y}{\partial X}=
\begin{bmatrix}
q_1W_{11} & \cdots & q_1W_{1d}\\
\vdots & \ddots & \vdots\\
q_nW_{n1} & \cdots & q_nW_{nd}
\end{bmatrix}
=
\begin{bmatrix}
q_1 & \cdots & 0\\
\vdots & \ddots & \vdots\\
0 & \cdots & q_n
\end{bmatrix}
\begin{bmatrix}
W_{11} & \cdots & W_{1d}\\
\vdots & \ddots & \vdots\\
W_{n1} & \cdots & W_{nd}
\end{bmatrix}
$$
In compact form, let $z = Wx$. Then

$$
\sigma'(z_i)=q_i=\sigma(z_i)\bigl(1-\sigma(z_i)\bigr),\qquad
\sigma'(z)=\begin{bmatrix} q_1\\ q_2\\ \vdots\\ q_n \end{bmatrix}
$$

Hence

$$
\frac{\partial Y}{\partial X}=\operatorname{diag}\bigl(\sigma'(z)\bigr)\,W
$$
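As a sanity check on $\partial Y/\partial X = \operatorname{diag}(\sigma'(z))\,W$, the sketch below compares the closed form with central finite differences. It uses plain Python lists so it needs no third-party packages; the matrix `W`, the input `x`, and the helper names are illustrative assumptions, not part of the assignment.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def analytic_jacobian(W, x):
    """J = diag(sigma'(Wx)) W, using sigma'(z) = sigma(z) * (1 - sigma(z))."""
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    q = [sigmoid(z_i) * (1.0 - sigmoid(z_i)) for z_i in z]
    # Scaling row i of W by q_i is exactly the product diag(q) W.
    return [[q_i * w_ij for w_ij in row] for q_i, row in zip(q, W)]

def numeric_jacobian(W, x, eps=1e-6):
    """Central finite differences of y = sigmoid(Wx) with respect to x."""
    def f(v):
        return [sigmoid(sum(w * vj for w, vj in zip(row, v))) for row in W]
    n, d = len(W), len(x)
    J = [[0.0] * d for _ in range(n)]
    for j in range(d):
        xp = list(x); xp[j] += eps
        xm = list(x); xm[j] -= eps
        yp, ym = f(xp), f(xm)
        for i in range(n):
            J[i][j] = (yp[i] - ym[i]) / (2 * eps)
    return J

W = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7]]  # hypothetical n = 2, d = 3
x = [0.2, -0.4, 1.0]
A = analytic_jacobian(W, x)
N = numeric_jacobian(W, x)
max_err = max(abs(a - b) for ra, rn in zip(A, N) for a, b in zip(ra, rn))
print(max_err)
```

The reported error should be tiny, which indicates that row $i$ of the Jacobian is $q_i$ times row $i$ of $W$.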

b. Derive the quantity

$$
\frac{\partial L}{\partial W}=\sum_{t=0}^T\sum_{k=1}^t\frac{\partial L_t}{\partial h_t}\frac{\partial h_t}{\partial h_k}\frac{\partial h_k}{\partial W}
$$

By the chain rule, write

$$
h_k=f_1(x;W),\qquad h_t=f_2(h_k;W),\qquad L_t=\mathrm{Loss}(h_t,h_{GT})
$$

The part of $\partial L_t/\partial W$ that flows through the intermediate state $h_k$ is therefore

$$
\frac{\partial L_t}{\partial h_t}\frac{\partial h_t}{\partial h_k}\frac{\partial h_k}{\partial W}
$$

where $\partial h_k/\partial W$ is the immediate derivative at step $k$ (holding $h_{k-1}$ fixed). Because the same $W$ is shared by every step $k=1,\dots,t$, these contributions add up:

$$
\frac{\partial L_t}{\partial W}=\sum_{k=1}^{t}\frac{\partial L_t}{\partial h_t}\frac{\partial h_t}{\partial h_k}\frac{\partial h_k}{\partial W}
$$

Finally, since $L=\sum_{t=0}^{T}L_t$, summing over all time steps up to $T$ gives

$$
\frac{\partial L}{\partial W}=\sum_{t=0}^T\sum_{k=1}^t\frac{\partial L_t}{\partial h_t}\frac{\partial h_t}{\partial h_k}\frac{\partial h_k}{\partial W}
$$
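The double sum can be checked numerically. The sketch below assumes a hypothetical scalar RNN $h_k=\tanh(w\,h_{k-1}+u\,x_k)$ with loss $L_t=\tfrac12(h_t-y_t)^2$. It evaluates $\partial L/\partial w$ term by term from the formula, using $\partial h_i/\partial h_{i-1}=(1-h_i^2)\,w$ and the immediate derivative $\partial h_k/\partial w=(1-h_k^2)\,h_{k-1}$, and compares the result with a finite-difference estimate. The model, data, and parameter values are illustrative assumptions.

```python
import math

def forward(w, u, h0, xs):
    """Hypothetical scalar RNN h_k = tanh(w*h_{k-1} + u*x_k); returns [h_0, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, h0, xs, ys):
    """L = sum_{t=0}^{T} 0.5 * (h_t - y_t)^2."""
    hs = forward(w, u, h0, xs)
    return sum(0.5 * (h - y) ** 2 for h, y in zip(hs, ys))

def bptt_grad(w, u, h0, xs, ys):
    """dL/dw = sum_t sum_{k=1}^{t} dL_t/dh_t * dh_t/dh_k * dh_k/dw."""
    hs = forward(w, u, h0, xs)
    T = len(xs)
    total = 0.0
    for t in range(T + 1):
        dL_dh_t = hs[t] - ys[t]
        for k in range(1, t + 1):
            # dh_t/dh_k = prod_{i=k+1}^{t} (1 - h_i^2) * w
            dh_t_dh_k = 1.0
            for i in range(k + 1, t + 1):
                dh_t_dh_k *= (1.0 - hs[i] ** 2) * w
            # Immediate dh_k/dw, holding h_{k-1} fixed.
            dh_k_dw = (1.0 - hs[k] ** 2) * hs[k - 1]
            total += dL_dh_t * dh_t_dh_k * dh_k_dw
    return total

w, u, h0 = 0.8, 0.5, 0.1
xs = [1.0, -0.5, 0.3]        # x_1..x_T with T = 3
ys = [0.0, 0.2, -0.1, 0.4]   # y_0..y_T
eps = 1e-6
numeric = (loss(w + eps, u, h0, xs, ys) - loss(w - eps, u, h0, xs, ys)) / (2 * eps)
analytic = bptt_grad(w, u, h0, xs, ys)
print(analytic, numeric)
```

The $t=0$ term contributes nothing because the inner loop starts at $k=1$, which matches the lower limit of the inner sum.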

2.

a.

When $T=3$ (the $t=0$ term is empty because the inner sum starts at $k=1$, and $\partial h_t/\partial h_t=I$):

$$
\begin{aligned}
\frac{\partial L}{\partial W}&=\sum_{t=0}^{3}\sum_{k=1}^{t}\frac{\partial L_t}{\partial h_t}\frac{\partial h_t}{\partial h_k}\frac{\partial h_k}{\partial W}\\
&=\frac{\partial L_1}{\partial h_1}\frac{\partial h_1}{\partial h_1}\frac{\partial h_1}{\partial W}
+\frac{\partial L_2}{\partial h_2}\frac{\partial h_2}{\partial h_1}\frac{\partial h_1}{\partial W}
+\frac{\partial L_2}{\partial h_2}\frac{\partial h_2}{\partial h_2}\frac{\partial h_2}{\partial W}\\
&\quad+\frac{\partial L_3}{\partial h_3}\frac{\partial h_3}{\partial h_1}\frac{\partial h_1}{\partial W}
+\frac{\partial L_3}{\partial h_3}\frac{\partial h_3}{\partial h_2}\frac{\partial h_2}{\partial W}
+\frac{\partial L_3}{\partial h_3}\frac{\partial h_3}{\partial h_3}\frac{\partial h_3}{\partial W}
\end{aligned}
$$
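To see which $(t,k)$ pairs the double sum visits, the snippet below enumerates them for $T=3$ and prints one symbolic term per pair. In general the sum has $T(T+1)/2$ terms.

```python
T = 3
# (t, k) pairs visited by sum_{t=0}^{T} sum_{k=1}^{t}; t = 0 contributes nothing.
terms = [(t, k) for t in range(T + 1) for k in range(1, t + 1)]
for t, k in terms:
    print(f"dL{t}/dh{t} * dh{t}/dh{k} * dh{k}/dW")
print(len(terms), "terms")  # 6 terms
```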

b.

$$
\begin{aligned}
M&=QAQ^{-1}\\
M^n&=M^{n-1}M=M^{n-1}QAQ^{-1}\\
&=M^{n-2}QAQ^{-1}QAQ^{-1}=M^{n-2}QA^2Q^{-1}\qquad(\text{since }Q^{-1}Q=I)\\
&\;\;\vdots\\
&=QA^nQ^{-1}
\end{aligned}
$$
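The identity $M^n=QA^nQ^{-1}$ can be verified numerically for a $2\times 2$ case. In the sketch below, the eigenvector matrix `Q` and eigenvalues are hypothetical values chosen for illustration. It builds $M=QAQ^{-1}$, then computes $M^{30}$ two ways: by repeated multiplication, and by raising only the diagonal entries of $A$ to the 30th power.

```python
def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv2(X):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def power_by_repeated_product(M, n):
    """M^n computed as M * M * ... * M (n factors)."""
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        P = matmul(P, M)
    return P

def power_by_eigendecomposition(Q, eigs, n):
    """Q A^n Q^{-1}, where A = diag(eigs), so A^n raises each eigenvalue to n."""
    An = [[eigs[0] ** n, 0.0], [0.0, eigs[1] ** n]]
    return matmul(matmul(Q, An), inv2(Q))

# Hypothetical eigenvector matrix and eigenvalues, for illustration only.
Q = [[0.6, 0.8], [0.8, 0.6]]
eigs = [0.9, 0.4]
M = matmul(matmul(Q, [[eigs[0], 0.0], [0.0, eigs[1]]]), inv2(Q))

n = 30
direct = power_by_repeated_product(M, n)
via_eig = power_by_eigendecomposition(Q, eigs, n)
max_diff = max(abs(a - b) for ra, rb in zip(direct, via_eig) for a, b in zip(ra, rb))
print(max_diff)
```

The eigendecomposition route needs only scalar powers plus two matrix products, however large $n$ is.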

c.

$$
A^{30}=\begin{bmatrix}0.9^{30} & 0\\ 0 & 0.4^{30}\end{bmatrix},\qquad
w^{30}=\begin{bmatrix}0.6\cdot 0.9^{30} & 0.8\cdot 0.4^{30}\\ 0.8\cdot 0.9^{30} & 0.6\cdot 0.4^{30}\end{bmatrix}
$$
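Analysis: after the matrix is raised to the 30th power, all of its entries approach 0. If every eigenvalue has absolute value less than 1, then $\lambda^n\to 0$ as $n$ grows, so $A^n$ and hence $QA^nQ^{-1}$ tend to 0. If, however, some eigenvalue has absolute value greater than 1, its power grows exponentially and the corresponding column diverges to infinity.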
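The following sketch illustrates both cases. It reuses the same hypothetical eigenvector matrix `Q` and tracks the largest absolute entry of $QA^nQ^{-1}$ for $n=1,10,30$. With eigenvalues $(0.9, 0.4)$ the entries shrink; replacing $0.9$ by a hypothetical $1.1$ makes them grow.

```python
def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv2(X):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def power_via_eig(Q, eigs, n):
    """Q diag(eigs)^n Q^{-1}."""
    An = [[eigs[0] ** n, 0.0], [0.0, eigs[1] ** n]]
    return matmul(matmul(Q, An), inv2(Q))

def max_abs(X):
    return max(abs(v) for row in X for v in row)

Q = [[0.6, 0.8], [0.8, 0.6]]  # hypothetical eigenvector matrix
results = {}
for eigs in ([0.9, 0.4], [1.1, 0.4]):
    results[tuple(eigs)] = [max_abs(power_via_eig(Q, eigs, n)) for n in (1, 10, 30)]
    print(eigs, results[tuple(eigs)])
```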

3.

a.

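All three are the gate functions of an LSTM, which protect and control the cell state. Each gate is built on the sigmoid function. Among them, $i_t$ and $o_t$ control how state information is stored: $i_t$ governs what is written into the cell state, and $o_t$ governs what is read out of it.
$f_t$: forget gate. For each entry of $C_{t-1}$ it outputs a number between 0 and 1, where 1 means keep it completely and 0 means discard it completely.
$i_t$: input gate. It outputs numbers between 0 and 1 that decide which values to update, while a tanh layer creates new candidate values. The two are multiplied together and added to the cell state as the update.
$o_t$: output gate. It outputs numbers between 0 and 1 that decide which parts of the cell state are output. The cell state is passed through tanh (which maps it into the range -1 to 1) and then multiplied by the sigmoid output $o_t$ to give the required output.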
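A single LSTM step makes the roles and ranges of the three gates concrete. The sketch below assumes a scalar cell with the standard update $C_t=f_t C_{t-1}+i_t\overline{C}_t$ and $h_t=o_t\tanh(C_t)$. The parameter names (`wf`, `uf`, `bf`, ...) and their values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; p holds hypothetical weights w*, u* and biases b* per gate."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])          # forget gate, in (0, 1)
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])          # input gate, in (0, 1)
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])          # output gate, in (0, 1)
    c_tilde = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])  # candidate, in (-1, 1)
    c = f * c_prev + i * c_tilde                                   # new cell state
    h = o * math.tanh(c)                                           # gated output
    return f, i, o, c_tilde, c, h

params = {"wf": 0.5, "uf": -0.3, "bf": 1.0,
          "wi": 0.8, "ui": 0.2, "bi": -0.5,
          "wo": -0.4, "uo": 0.6, "bo": 0.1,
          "wc": 1.2, "uc": -0.7, "bc": 0.0}
f, i, o, c_tilde, c, h = lstm_step(x=0.7, h_prev=0.1, c_prev=-0.4, p=params)
print(f, i, o, c_tilde, c, h)
```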

b.

$f_t$, $i_t$ and $o_t$ are always nonnegative and lie in $[0,1]$, because each is the output of a sigmoid function, whose output range is $(0,1)$. The remaining quantities are computed with tanh, so their values lie in $[-1,1]$.

c.

Because

$$
\frac{\partial C_t}{\partial C_k}=\prod_{i=k+1}^{t}\frac{\partial C_i}{\partial C_{i-1}},
$$

setting $f_t=1$ and $i_t=0$ at every step gives

$$
C_t=f_t\otimes C_{t-1}+i_t\otimes\overline{C_t}=C_{t-1}.
$$

Therefore:

$$
\frac{\partial C_t}{\partial C_k}=\prod_{i=k+1}^{t}\frac{\partial C_i}{\partial C_{i-1}}=\prod_{i=k+1}^{t}1=1,
$$

so the gradient passes back along the cell state unchanged, neither vanishing nor exploding.
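A quick numerical check of this result uses a simplifying assumption: the gates are held constant, and the candidate value is a hypothetical constant, so only the direct path $C_{t-1}\to C_t$ is differentiated. The sketch rolls the cell state forward 50 steps and estimates $\partial C_t/\partial C_k$ by finite differences. With $f=1, i=0$ the estimate stays at 1. With a hypothetical $f=0.9$ it decays to $0.9^{50}$.

```python
def cell_state_after(c_k, steps, f, i, c_tilde=0.5):
    """Roll C_t = f*C_{t-1} + i*c_tilde forward `steps` times with constant gates."""
    c = c_k
    for _ in range(steps):
        c = f * c + i * c_tilde
    return c

def dC_t_dC_k(steps, f, i, c_k=1.0, eps=1e-6):
    """Central finite-difference estimate of dC_t/dC_k."""
    return (cell_state_after(c_k + eps, steps, f, i)
            - cell_state_after(c_k - eps, steps, f, i)) / (2 * eps)

open_gate = dC_t_dC_k(50, f=1.0, i=0.0)   # f_t = 1, i_t = 0 at every step
leaky_gate = dC_t_dC_k(50, f=0.9, i=0.0)  # hypothetical contrast case
print(open_gate, leaky_gate)
```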
