RNN
I had always assumed that a recurrent neural network outputs the y at the top of the diagram; in fact, what it outputs is a, the hidden state.
Keras's SimpleRNN
keras.layers.SimpleRNN(units, activation='tanh', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False)
Arguments
units: Positive integer, dimensionality of the output space.
return_sequences: Boolean. Whether to return the full output sequence, or only the last output of the sequence.
return_state: Boolean. Whether to return the last state in addition to the output.
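These flags follow directly from the SimpleRNN recurrence: at each step t, a_t = tanh(x_t·W + a_{t-1}·U + b), and a_t serves as both the hidden state and the output at that step. A minimal NumPy sketch (made-up dimensions and weights, not Keras's actual implementation) shows why the "last output" and the "last state" must coincide:

```python
import numpy as np

def simple_rnn(x, W, U, b):
    """Plain-NumPy SimpleRNN: returns the full output sequence and the final state."""
    batch, timesteps, _ = x.shape
    units = b.shape[0]
    a = np.zeros((batch, units))          # initial hidden state
    outputs = []
    for t in range(timesteps):
        # the output at step t *is* the new hidden state
        a = np.tanh(x[:, t] @ W + a @ U + b)
        outputs.append(a)
    return np.stack(outputs, axis=1), a

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 3))        # (batch, timesteps, features)
W = rng.standard_normal((3, 10)) * 0.1
U = rng.standard_normal((10, 10)) * 0.1
b = np.zeros(10)

seq, state = simple_rnn(x, W, U, b)
print(seq.shape, state.shape)             # (4, 5, 10) (4, 10)
# return_sequences=False would hand back seq[:, -1], which equals the state:
assert np.allclose(seq[:, -1], state)
```

With return_sequences=True you get `seq`; with return_state=True you additionally get `state`, which by construction is just `seq[:, -1]`.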
For SimpleRNN, if return_sequences=False and return_state=True, the layer returns both the output and the final state. Since an RNN's output is a itself, the two returned tensors have identical contents. Code to verify:
import keras
import numpy as np
import keras.backend as K
a = np.random.randn(100, 20, 30)
a = K.constant(a)
ls = keras.layers.SimpleRNN(10, return_state=True)
o = ls(a)
print(len(o)) # 2
print(o[0].shape) # (100, 10)
print(o[1].shape) # (100, 10)
# the two tensors are identical
print(K.eval(o[0][0]))
'''
[ 0.9602081 -0.99240303 -0.85192835 0.46388015 -0.3387786 -0.64301735
-0.41622564 -0.69669306 0.72947526 0.22001031]
'''
print(K.eval(o[1][0]))
'''
[ 0.9602081 -0.99240303 -0.85192835 0.46388015 -0.3387786 -0.64301735
-0.41622564 -0.69669306 0.72947526 0.22001031]
'''
For SimpleRNN, if return_sequences=True and return_state=True, the layer returns the full sequence (i.e. all the a values, which are exactly all the hidden states) together with the final state, so the last element of the full sequence should equal the final state. Code to verify:
import keras
import numpy as np
import keras.backend as K
a = np.random.randn(100, 20, 30)
a = K.constant(a)
ls = keras.layers.SimpleRNN(10, return_sequences=True, return_state=True)
o = ls(a)
print(len(o)) # 2
print(o[0].shape) # (100, 20, 10)
print(o[1].shape) # (100, 10)
# identical, as expected
print(K.eval(o[0][0][-1]))
'''
[-0.70218027 -0.64057803 0.99573636 -0.5584237 -0.9264632 0.6889967
0.01906533 -0.668484 -0.13754909 0.45146388]
'''
print(K.eval(o[1][0]))
'''
[-0.70218027 -0.64057803 0.99573636 -0.5584237 -0.9264632 0.6889967
0.01906533 -0.668484 -0.13754909 0.45146388]
'''
# LSTM returns two states: the hidden state h and the cell state c
ls = keras.layers.LSTM(10, return_sequences=True, return_state=True)
o = ls(a)
print(len(o)) # 3
print(o[0].shape) # (100, 20, 10)
print(o[1].shape) # (100, 10)
print(o[2].shape) # (100, 10)
print(K.eval(o[0][0][-1]))
'''
[-0.19201267 0.22842234 -0.03906217 0.04675736 -0.441967 -0.03653109
0.01334425 0.19605611 -0.38279617 0.4425362 ]
'''
print(K.eval(o[1][0]))
'''
[-0.19201267 0.22842234 -0.03906217 0.04675736 -0.441967 -0.03653109
0.01334425 0.19605611 -0.38279617 0.4425362 ]
'''
print(K.eval(o[2][0]))
'''
[-0.36619347 0.884866 -0.7736238 0.31789908 -0.733658 -0.07859978
0.02072818 0.47486547 -0.48310652 0.52434754]
'''
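So for LSTM, o[1] is the final hidden state h, identical to the last step of the output sequence, while o[2] is the final cell state c, which is a separate tensor. A hedged pure-NumPy sketch (made-up small dimensions, gate ordering following the common [input, forget, candidate, output] convention, not Keras's actual code) shows why h doubles as the output while c is an extra internal state:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b, units):
    # One LSTM step: the four gates are slices of a single affine transform.
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[:, :units])                # input gate
    f = sigmoid(z[:, units:2 * units])       # forget gate
    g = np.tanh(z[:, 2 * units:3 * units])   # cell candidate
    o = sigmoid(z[:, 3 * units:])            # output gate
    c_t = f * c_prev + i * g                 # new cell state
    h_t = o * np.tanh(c_t)                   # hidden state = the layer's output
    return h_t, c_t

rng = np.random.default_rng(0)
batch, timesteps, features, units = 4, 5, 3, 10
x = rng.standard_normal((batch, timesteps, features))
W = rng.standard_normal((features, 4 * units)) * 0.1
U = rng.standard_normal((units, 4 * units)) * 0.1
b = np.zeros(4 * units)

h = np.zeros((batch, units))
c = np.zeros((batch, units))
outputs = []
for t in range(timesteps):
    h, c = lstm_step(x[:, t], h, c, W, U, b, units)
    outputs.append(h)
seq = np.stack(outputs, axis=1)              # (batch, timesteps, units)

# The last element of the output sequence IS the final hidden state,
# while the cell state is a distinct tensor.
assert np.allclose(seq[:, -1], h)
print(seq.shape, h.shape, c.shape)           # (4, 5, 10) (4, 10) (4, 10)
```

This mirrors the Keras result above: the three returned tensors correspond to `seq`, `h`, and `c`, and only the first two agree on the last timestep.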