State equation:

$$
s_t = (1-\alpha)\, s_{t-1} + \alpha \tanh (A s_{t-1} + B y_{t-1})
$$
or, equivalently, with the time index shifted by one:

$$
\begin{aligned}
s_{t+1} &= (1-\alpha)\, s_{t} + \alpha \tanh (A s_{t} + B y_{t}) \\
&\triangleq f(s_t, y_t)
\end{aligned}
$$
The Jacobian of $f$ with respect to $s_t$ follows row by row from $\tanh'(x) = 1 - \tanh^2(x)$, where $\tanh^2$ is applied elementwise:

$$
\begin{aligned}
\frac{\partial f_t}{\partial s_t} &= (1-\alpha) I + \alpha \frac{\partial}{\partial s_t}
\left[
\begin{array}{c}
\tanh(\sum_i A_{1i}s_{t,i} + \sum_i B_{1i}y_{t,i}) \\
\tanh(\sum_i A_{2i}s_{t,i} + \sum_i B_{2i}y_{t,i}) \\
\vdots \\
\tanh(\sum_i A_{ni}s_{t,i} + \sum_i B_{ni}y_{t,i})
\end{array}
\right] \\[2ex]
&= (1-\alpha) I + \alpha
\left[
\begin{array}{ccc}
[1-\tanh^2(\sum_i A_{1i}s_{t,i} + \sum_i B_{1i}y_{t,i})]A_{11} & \cdots & [1-\tanh^2(\sum_i A_{1i}s_{t,i} + \sum_i B_{1i}y_{t,i})]A_{1n} \\
\vdots & \ddots & \vdots \\
[1-\tanh^2(\sum_i A_{ni}s_{t,i} + \sum_i B_{ni}y_{t,i})]A_{n1} & \cdots & [1-\tanh^2(\sum_i A_{ni}s_{t,i} + \sum_i B_{ni}y_{t,i})]A_{nn}
\end{array}
\right] \\[2ex]
&= (1-\alpha) I + \alpha \left[ I - \operatorname{diag}\!\left(\tanh^2(A s_t + B y_t)\right) \right] A
\end{aligned}
$$
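As a sanity check on the closed-form Jacobian above, the following sketch compares it with a central finite-difference approximation. The function names, the dimensions, the random seed, and the value of `alpha` are illustrative assumptions and do not come from the original post.

```python
import numpy as np

def f(s, y, A, B, alpha):
    """One state update: (1 - alpha) * s + alpha * tanh(A s + B y)."""
    return (1.0 - alpha) * s + alpha * np.tanh(A @ s + B @ y)

def jacobian_closed_form(s, y, A, B, alpha):
    """(1 - alpha) I + alpha [I - diag(tanh^2(A s + B y))] A."""
    n = s.shape[0]
    # Diagonal entries of I - diag(tanh^2(A s + B y)).
    gain = 1.0 - np.tanh(A @ s + B @ y) ** 2
    # Scaling row k of A by gain[k] equals left-multiplying by diag(gain).
    return (1.0 - alpha) * np.eye(n) + alpha * gain[:, None] * A

def jacobian_finite_diff(s, y, A, B, alpha, eps=1e-6):
    """Central differences, one column per state coordinate."""
    n = s.shape[0]
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(s + e, y, A, B, alpha) - f(s - e, y, A, B, alpha)) / (2 * eps)
    return J

# Arbitrary test problem (assumed sizes: 4 states, 3 inputs).
rng = np.random.default_rng(0)
n, m, alpha = 4, 3, 0.3
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, m))
s = rng.normal(size=n)
y = rng.normal(size=m)

J_exact = jacobian_closed_form(s, y, A, B, alpha)
J_approx = jacobian_finite_diff(s, y, A, B, alpha)
max_abs_err = np.max(np.abs(J_exact - J_approx))
print("max abs difference:", max_abs_err)
```

The difference should sit at the level of finite-difference error, which suggests that row $k$ of the Jacobian depends only on row $k$ of $A$ and $B$.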
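Hence, for the spectral norm, with $0 \leq \alpha \leq 1$:

$$
\begin{aligned}
\left\| \frac{\partial f_t}{\partial s_t} \right\|_2 &\leq
(1-\alpha) + \alpha \left\| I - \operatorname{diag}\!\left(\tanh^2(A s_t + B y_t)\right) \right\|_2 \cdot \left\| A \right\|_2 \\
&\leq 1-\alpha + \alpha \|A\|_2 \\
&\leq 1
\end{aligned}
$$

The first step is the triangle inequality together with submultiplicativity of the spectral norm. The second holds because $I - \operatorname{diag}(\tanh^2(\cdot))$ is diagonal with entries in $(0, 1]$, so its spectral norm is at most 1. The last step requires $\|A\|_2 \leq 1$; without that condition the bound stops at $1 - \alpha + \alpha \|A\|_2$.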
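The bound can also be probed numerically. The sketch below rescales a random $A$ so that $\|A\|_2 = 0.9$ (an assumption made here so that the final inequality applies), then evaluates the spectral norm of the Jacobian at many random points. All names, sizes, and constants are illustrative.

```python
import numpy as np

def jacobian(s, y, A, B, alpha):
    """(1 - alpha) I + alpha [I - diag(tanh^2(A s + B y))] A."""
    gain = 1.0 - np.tanh(A @ s + B @ y) ** 2
    return (1.0 - alpha) * np.eye(s.shape[0]) + alpha * gain[:, None] * A

rng = np.random.default_rng(0)
n, m, alpha = 5, 3, 0.4

A = rng.normal(size=(n, n))
# Assumption: enforce ||A||_2 = 0.9 <= 1 so the last inequality holds.
A = 0.9 * A / np.linalg.norm(A, 2)
B = rng.normal(size=(n, m))

# Middle term of the chain: 1 - alpha + alpha * ||A||_2.
bound = 1.0 - alpha + alpha * np.linalg.norm(A, 2)

# Largest Jacobian spectral norm seen over random (s, y) samples.
worst = 0.0
for _ in range(1000):
    s = rng.normal(size=n)
    y = rng.normal(size=m)
    worst = max(worst, np.linalg.norm(jacobian(s, y, A, B, alpha), 2))

print("largest observed norm:", worst)
print("bound 1 - alpha + alpha*||A||_2:", bound)
```

Random sampling only illustrates the inequality and does not prove it. Dropping the rescaling of `A` is a quick way to see that the bound of 1 can fail when $\|A\|_2 > 1$.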