1.
a.
$$
\begin{aligned}
\frac{\partial y_i}{\partial x_j}
&=\frac{\partial\,\sigma(W_{i1}x_1+\dots+W_{id}x_d)}{\partial x_j}\\
&=\frac{\partial}{\partial x_j}\left(\frac{1}{1+e^{-(W_{i1}x_1+\dots+W_{id}x_d)}}\right)\\
&=\frac{e^{-(W_{i1}x_1+\dots+W_{id}x_d)}\,W_{ij}}{\left(1+e^{-(W_{i1}x_1+\dots+W_{id}x_d)}\right)^2}
\end{aligned}
$$
Let

$$
q_i=\frac{e^{-(W_{i1}x_1+\dots+W_{id}x_d)}}{\left(1+e^{-(W_{i1}x_1+\dots+W_{id}x_d)}\right)^2}
$$
Then $\frac{\partial y_i}{\partial x_j}=q_iW_{ij}$, and the full Jacobian is

$$
\frac{\partial Y}{\partial X}=
\left[
\begin{matrix}
q_1W_{11} & \dots & q_1W_{1d} \\
\vdots & \ddots & \vdots \\
q_nW_{n1} & \dots & q_nW_{nd}
\end{matrix}
\right]
=\left[
\begin{matrix}
q_1 & \dots & 0\\
\vdots & \ddots & \vdots \\
0 & \dots & q_n
\end{matrix}
\right]
\left[
\begin{matrix}
W_{11} & \dots & W_{1d} \\
\vdots & \ddots & \vdots \\
W_{n1} & \dots & W_{nd}
\end{matrix}
\right]
$$
Let $z = Wx$, so that $\sigma'(z_i)=q_i$ and

$$
\sigma'(z)=
\left[
\begin{matrix}
q_1 \\
q_2 \\
\vdots \\
q_n
\end{matrix}
\right]
$$

Therefore

$$
\frac{\partial Y}{\partial X}=\mathrm{diag}(\sigma'(z))\,W
$$
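As a sanity check on the result above, the following is a minimal NumPy sketch that compares $\mathrm{diag}(\sigma'(Wx))\,W$ with a central finite-difference Jacobian. The matrix shape, the random seed, and the helper names `jacobian_analytic` and `jacobian_numeric` are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def jacobian_analytic(W, x):
    """Jacobian of y = sigmoid(W x) with respect to x, as diag(q) @ W."""
    s = sigmoid(W @ x)
    q = s * (1.0 - s)  # algebraically equal to e^{-z} / (1 + e^{-z})^2
    return np.diag(q) @ W

def jacobian_numeric(W, x, eps=1e-6):
    """Central finite differences, one input coordinate at a time."""
    n, d = W.shape
    J = np.zeros((n, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps
        J[:, j] = (sigmoid(W @ (x + step)) - sigmoid(W @ (x - step))) / (2 * eps)
    return J

# Hypothetical test data: n = 3 outputs, d = 4 inputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)

max_diff = np.max(np.abs(jacobian_analytic(W, x) - jacobian_numeric(W, x)))
print("largest entrywise difference:", max_diff)
```

The two Jacobians should agree up to finite-difference error, which supports the factorization into a diagonal matrix times $W$.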
b. Derive the quantity

$$
\frac{\partial L}{\partial W}=\sum_{t=0}^T\sum_{k=1}^t\frac{\partial L_t}{\partial h_t}\frac{\partial h_t}{\partial h_k}\frac{\partial h_k}{\partial W}
$$
By the chain rule, fix one loss term $L_t$ and one earlier step $k\le t$, and write the dependencies as

$$
\begin{aligned}
h_k&=f_1(x;W)\\
h_t&=f_2(h_k;W)\\
L_t&=\mathrm{Loss}(h_t,h_{GT})
\end{aligned}
$$

The part of the gradient of $L_t$ that reaches $W$ through $h_k$ is then

$$
\frac{\partial L_t}{\partial h_t}\frac{\partial h_t}{\partial h_k}\frac{\partial h_k}{\partial W}
$$

where $\frac{\partial h_k}{\partial W}$ is the direct dependence of $h_k$ on $W$, with $h_{k-1}$ held fixed. Because the same $W$ is used at every step, these contributions are summed over all $k\le t$:

$$
\frac{\partial L_t}{\partial W}=\sum_{k=1}^t\frac{\partial L_t}{\partial h_t}\frac{\partial h_t}{\partial h_k}\frac{\partial h_k}{\partial W}
$$

Summing over all time steps up to $T$ gives

$$
\frac{\partial L}{\partial W}=\sum_{t=0}^T\sum_{k=1}^t\frac{\partial L_t}{\partial h_t}\frac{\partial h_t}{\partial h_k}\frac{\partial h_k}{\partial W}
$$
2.
a.
For $T=3$ the inner sum is empty at $t=0$, so six terms remain (one for $t=1$, two for $t=2$, three for $t=3$):

$$
\begin{aligned}
\frac{\partial L}{\partial W}&=\sum_{t=0}^3\sum_{k=1}^t\frac{\partial L_t}{\partial h_t}\frac{\partial h_t}{\partial h_k}\frac{\partial h_k}{\partial W}\\
&=\frac{\partial L_1}{\partial h_1}\frac{\partial h_1}{\partial h_1}\frac{\partial h_1}{\partial W}+
\frac{\partial L_2}{\partial h_2}\frac{\partial h_2}{\partial h_1}\frac{\partial h_1}{\partial W}+
\frac{\partial L_2}{\partial h_2}\frac{\partial h_2}{\partial h_2}\frac{\partial h_2}{\partial W}\\
&\quad+
\frac{\partial L_3}{\partial h_3}\frac{\partial h_3}{\partial h_1}\frac{\partial h_1}{\partial W}+
\frac{\partial L_3}{\partial h_3}\frac{\partial h_3}{\partial h_2}\frac{\partial h_2}{\partial W}+
\frac{\partial L_3}{\partial h_3}\frac{\partial h_3}{\partial h_3}\frac{\partial h_3}{\partial W}
\end{aligned}
$$

Here $\frac{\partial h_t}{\partial h_t}$ is the identity.
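The double sum for $T=3$ can be checked numerically on a toy model. The sketch below assumes a scalar RNN $h_t=\tanh(w\,h_{t-1}+x_t)$ with loss $L_t=\tfrac12 h_t^2$; the model, the inputs, and the initial state are hypothetical choices made only so that every factor has a closed form. It enumerates the $(t,k)$ terms and compares their sum with a finite-difference gradient.

```python
import numpy as np

def forward(w, xs, h0=0.5):
    """Return [h_0, h_1, ..., h_T] for h_t = tanh(w * h_{t-1} + x_t)."""
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(w * hs[-1] + x))
    return hs

def total_loss(w, xs):
    # L_0 does not depend on w, so only t = 1..T is summed here.
    hs = forward(w, xs)
    return sum(0.5 * h ** 2 for h in hs[1:])

def bptt_terms(w, xs):
    """One entry per (t, k) pair with 1 <= k <= t <= T."""
    hs = forward(w, xs)
    terms = []
    for t in range(1, len(xs) + 1):
        dL_dh = hs[t]  # dL_t/dh_t for L_t = 0.5 * h_t^2
        for k in range(1, t + 1):
            dh_dh = 1.0  # dh_t/dh_k, equal to 1 when k == t
            for j in range(k + 1, t + 1):
                dh_dh *= w * (1.0 - hs[j] ** 2)
            # Direct derivative of h_k w.r.t. w with h_{k-1} held fixed.
            dh_dw = (1.0 - hs[k] ** 2) * hs[k - 1]
            terms.append(dL_dh * dh_dh * dh_dw)
    return terms

w = 0.7                  # hypothetical weight
xs = [0.3, -0.2, 0.5]    # hypothetical inputs, so T = 3
terms = bptt_terms(w, xs)

eps = 1e-6
numeric = (total_loss(w + eps, xs) - total_loss(w - eps, xs)) / (2 * eps)
print(len(terms), sum(terms), numeric)
```

The list holds six terms, matching the expansion above, and their sum should agree with the finite-difference estimate up to numerical error.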
b.
$$
\begin{aligned}
M^n&=M^{n-1}M,\qquad M=QAQ^{-1}\\
M^n&=M^{n-1}QAQ^{-1}\\
&=M^{n-2}QAQ^{-1}QAQ^{-1}\\
&=M^{n-2}QA^2Q^{-1}\\
&\;\;\vdots\\
&=QA^nQ^{-1}
\end{aligned}
$$
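The identity $M^n=QA^nQ^{-1}$ can be verified numerically. The following sketch uses an arbitrary invertible $Q$ and diagonal $A$; both matrices and the exponent are hypothetical values chosen for illustration, not taken from the exercise.

```python
import numpy as np

Q = np.array([[2.0, 1.0],
              [1.0, 1.0]])   # hypothetical invertible matrix (det = 1)
A = np.diag([0.5, 1.5])      # hypothetical diagonal matrix
Q_inv = np.linalg.inv(Q)

M = Q @ A @ Q_inv
n = 5

direct = np.linalg.matrix_power(M, n)          # M multiplied by itself n times
via_diag = Q @ np.diag(np.diag(A) ** n) @ Q_inv  # only the diagonal is powered

gap = np.max(np.abs(direct - via_diag))
print("largest entrywise difference:", gap)
```

Raising a diagonal matrix to a power only requires powering its diagonal entries, which is why the factored form is convenient for large $n$.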
c.
$$
A^{30}=\left[
\begin{matrix}
0.9^{30} & 0 \\
0 & 0.4^{30}
\end{matrix}
\right]
\qquad
w^{30}=\left[
\begin{matrix}
0.6\cdot 0.9^{30} & 0.8\cdot 0.4^{30} \\
0.8\cdot 0.9^{30} & 0.6\cdot 0.4^{30}
\end{matrix}
\right]
$$
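Analysis: after raising the matrix to the 30th power, every entry is close to 0, since $0.9^{30}\approx 0.042$ and $0.4^{30}\approx 1.2\times 10^{-12}$. In general, if every eigenvalue has absolute value less than 1, each eigenvalue raised to the $n$-th power tends to 0 as $n$ grows, so the matrix power tends to 0. If instead some eigenvalue has absolute value greater than 1, its power grows exponentially and the corresponding column diverges.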
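The shrinking and growing behaviour described in the analysis can be illustrated with a short sketch. The eigenvalues 0.9 and 0.4 come from the exercise; the eigenvalue 1.1 is a hypothetical value added only to show the diverging case, and `diag_power` is an illustrative helper.

```python
import numpy as np

def diag_power(eigenvalues, n):
    """n-th power of a diagonal matrix: power each diagonal entry."""
    return np.diag(np.asarray(eigenvalues, dtype=float) ** n)

shrinking = diag_power([0.9, 0.4], 30)  # both |eigenvalue| < 1
growing = diag_power([1.1, 0.4], 30)    # hypothetical case with one |eigenvalue| > 1

print("0.9^30 =", shrinking[0, 0])
print("0.4^30 =", shrinking[1, 1])
print("1.1^30 =", growing[0, 0])
```

After 30 steps the first case has shrunk to a few percent or less of its starting scale, while the entry for 1.1 has already grown by more than an order of magnitude.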
3.
a.
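All three are gate functions in an LSTM, used to protect and control the cell state. Each gate is built on the sigmoid function. Among them, $i_t$ and $o_t$ control how state information is stored.
$f_t$: the forget gate. For each entry of $C_{t-1}$ it produces a number between 0 and 1, where 1 means keep completely and 0 means discard completely.
$i_t$: the input gate. It produces numbers between 0 and 1 that decide which values to update. A tanh layer creates the new candidate values, and the two are combined to form the update added to the cell state.
$o_t$: the output gate. It produces numbers between 0 and 1 that decide which parts of the cell state are output. The cell state is passed through tanh, giving values between -1 and 1, and the result is multiplied by the sigmoid gate to obtain the output.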
b.
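$f_t$, $i_t$ and $o_t$ are always non-negative with values in $[0,1]$, because all three are outputs of the sigmoid function, whose range lies in $[0,1]$ (for finite inputs it never actually reaches 0 or 1). The remaining quantities depend on tanh, so their values lie in $[-1,1]$.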
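The ranges stated above can be confirmed on a grid of inputs. This is a minimal sketch; the grid limits and the variable names are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10.0, 10.0, 2001)  # hypothetical pre-activation values

gate = sigmoid(z)       # what f_t, i_t, o_t look like
candidate = np.tanh(z)  # what tanh-based quantities look like

print("sigmoid range:", gate.min(), gate.max())
print("tanh range:   ", candidate.min(), candidate.max())
```

The sigmoid values stay strictly positive, while the tanh values take both signs.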
c.
Because

$$
\frac{\partial C_t}{\partial C_k}=\prod_{i=k+1}^{t}\frac{\partial C_i}{\partial C_{i-1}}
$$

setting $f_t=1,\ i_t=0$ gives

$$
\begin{aligned}
C_t&=f_t\otimes C_{t-1}+i_t\otimes\overline{C_t}\\
C_t&=C_{t-1}
\end{aligned}
$$

Each factor $\frac{\partial C_i}{\partial C_{i-1}}$ is therefore 1, so

$$
\frac{\partial C_t}{\partial C_k}=\prod_{i=k+1}^{t}\frac{\partial C_i}{\partial C_{i-1}}=\prod_{i=k+1}^{t}1=1
$$
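To make the result concrete, the sketch below applies the cell update $C_t=f_t\otimes C_{t-1}+i_t\otimes\overline{C_t}$ for several steps with $f_t=1$ and $i_t=0$, and computes the elementwise gradient as a product of forget-gate values. It assumes the gate values are constants that do not themselves depend on the cell state. The state size, the step count, and the comparison value 0.9 are hypothetical.

```python
import numpy as np

def cell_update(c_prev, f, i, c_cand):
    """Elementwise LSTM cell-state update."""
    return f * c_prev + i * c_cand

def state_gradient(forget_gates):
    """Elementwise dC_t/dC_k when gates are treated as constants:
    each step contributes a factor f_i, so the result is their product."""
    return np.prod(np.asarray(forget_gates), axis=0)

rng = np.random.default_rng(0)
steps, dim = 20, 4
c_k = rng.normal(size=dim)

# Run the update with f = 1 and i = 0; the candidate values are ignored.
c_t = c_k.copy()
for _ in range(steps):
    c_cand = np.tanh(rng.normal(size=dim))
    c_t = cell_update(c_t, np.ones(dim), np.zeros(dim), c_cand)

grad_open = state_gradient(np.ones((steps, dim)))        # f = 1 at every step
grad_leaky = state_gradient(np.full((steps, dim), 0.9))  # hypothetical f = 0.9

print(np.array_equal(c_t, c_k), grad_open, grad_leaky)
```

With $f_t=1$ the state is carried through unchanged and the gradient stays at 1, whereas a forget gate below 1 makes the gradient shrink geometrically with the number of steps.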



