Model Construction
Model Introduction
The model contains two LSTM layers. The Pre-attention Bi-LSTM (bidirectional) is the core component for building the attention mechanism; its input sequence has $T_x$ steps.
The second LSTM layer sits after the attention layer (the Post-attention LSTM) and produces $T_y$ output steps. Note that at step $t$ the Post-attention LSTM's inputs are $s^{(t-1)}$ and $c^{(t)}$; we do not feed $y^{(t-1)}$ back in, because in this date-translation model (YYYY-MM-DD) adjacent output characters are only weakly related.
Note: this model performs two concatenations: the forward and backward $a^{(t)}$ values of the Pre-attention Bi-LSTM are concatenated, and the Post-attention LSTM's output $s^{(t)}$ is later concatenated (as $s^{(t-1)}$ of the next step) with the hidden states inside the attention layer.
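In other words, the Bi-LSTM's hidden state at each input step is the concatenation of its forward and backward activations, which is why the dimension seen by the attention layer is $2 n_a$:

$a^{(t)} = [\overrightarrow{a}^{(t)};\ \overleftarrow{a}^{(t)}]$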
Let's look at the attention layer in more detail. First, $s^{(t-1)}$ is repeated and concatenated with each hidden state $a^{(t)}$; passing the result through fully connected layers and a softmax layer yields the attention weights. $context^{(t)}$ is then the attention-weighted combination of the $a^{(t')}$ values.
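Written out in the same notation, the attention computation is roughly the following (a sketch; the two dense layers that produce the energies are abstracted as a function $f$):

$e^{(t,t')} = f\big([s^{(t-1)};\, a^{(t')}]\big), \qquad \alpha^{(t,t')} = \dfrac{\exp(e^{(t,t')})}{\sum_{t''=1}^{T_x} \exp(e^{(t,t'')})}, \qquad context^{(t)} = \sum_{t'=1}^{T_x} \alpha^{(t,t')}\, a^{(t')}$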
Next, let's walk through the core code. First, import the Keras modules.
from keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply
from keras.layers import RepeatVector, Dense, Activation, Lambda
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.models import load_model, Model
import keras.backend as K
import numpy as np
These layers are defined once at module level so that the same weights are shared across every output step:
repeator = RepeatVector(Tx)                                 # copies s_prev Tx times
concatenator = Concatenate(axis=-1)                         # concatenates s_prev with the Bi-LSTM hidden states
densor1 = Dense(10, activation="tanh")                      # first fully connected layer
densor2 = Dense(1, activation="relu")                       # second fully connected layer, one energy per input step
activator = Activation(softmax, name='attention_weights')   # custom softmax over axis=1 (the Tx axis)
dotor = Dot(axes=1)                                         # weighted sum of hidden states -> context vector
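Note that softmax here is not the built-in Keras activation string: the attention weights must be normalized over axis=1 (the $T_x$ time axis), not the last axis. A minimal sketch of such a helper, assuming only the Keras backend K imported above:

def softmax(x, axis=1):
    # numerically stable softmax over the requested axis (axis=1 is the Tx dimension),
    # so each output step's attention weights sum to 1 over the input steps
    e = K.exp(x - K.max(x, axis=axis, keepdims=True))
    return e / K.sum(e, axis=axis, keepdims=True)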
Computing Attention
Build the one_step_attention function to compute the attention context $context^{(t)}$.
def one_step_attention(a, s_prev):
    s_prev = repeator(s_prev)           # RepeatVector copies s_prev Tx times
    concat = concatenator([s_prev, a])  # concatenate s_prev with a
    e = densor1(concat)                 # fully connected layer, tanh activation
    energies = densor2(e)               # fully connected layer, relu activation
    alphas = activator(energies)        # softmax layer produces the attention weights
    context = dotor([alphas, a])        # compute the context vector
    return context
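As a quick shape sanity check (a hypothetical snippet; it assumes the assignment's sizes, i.e. Tx = 30 input steps and 64 = 2 x 32 Bi-LSTM units), the context for one output step should come out as a single 64-dimensional vector per example:

a_check = Input(shape=(30, 64))    # Bi-LSTM hidden states a, shape (m, Tx, 2*n_a)
s_check = Input(shape=(64,))       # previous post-attention hidden state s_prev, shape (m, n_s)
context_check = one_step_attention(a_check, s_check)
print(K.int_shape(context_check))  # expected: (None, 1, 64)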
Building the model
n_a = 32  # hidden size of the Pre-attention Bi-LSTM
n_s = 64  # hidden size of the Post-attention LSTM
post_activation_LSTM_cell = LSTM(n_s, return_state=True)      # defined globally so it is shared across all Ty output steps
output_layer = Dense(len(machine_vocab), activation=softmax)  # predicts a probability over the machine vocabulary
def model(Tx, Ty, n_a, n_s, human_vocab_size, machine_vocab_size):
    X = Input(shape=(Tx, human_vocab_size))
    s0 = Input(shape=(n_s,), name='s0')  # initial hidden state of the post-attention LSTM
    c0 = Input(shape=(n_s,), name='c0')  # initial cell state of the post-attention LSTM
    s = s0
    c = c0
    outputs = []
    a = Bidirectional(LSTM(n_a, return_sequences=True))(X)  # Pre-attention Bi-LSTM over the whole input sequence
    for t in range(Ty):
        context = one_step_attention(a, s)  # attention: context vector for output step t
        s, _, c = post_activation_LSTM_cell(context, initial_state=[s, c])  # one step of the post-attention LSTM
        out = output_layer(s)  # softmax over the machine vocabulary
        outputs.append(out)
    model = Model(inputs=[X, s0, c0], outputs=outputs)
    return model
Training the model
Next, build and train the model.
model = model(Tx, Ty, n_a, n_s, len(human_vocab), len(machine_vocab))
#model.summary()  # inspect the model
opt = Adam(lr=0.005, beta_1=0.9, beta_2=0.999, decay=0.01)  # Adam optimizer
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])  # loss, optimizer, and accuracy metric
s0 = np.zeros((m, n_s))  # initial hidden state for all m training examples
c0 = np.zeros((m, n_s))  # initial cell state for all m training examples
outputs = list(Yoh.swapaxes(0, 1))  # one-hot targets Yoh reshaped into a list of Ty arrays, one per output step
model.fit([Xoh, s0, c0], outputs, epochs=1, batch_size=100)
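Once trained, the model can translate new date strings. A hypothetical usage sketch, assuming the preprocessing helpers from the assignment's nmt_utils (string_to_int and inv_machine_vocab) are available:

example = "3 May 1979"
source = np.array(string_to_int(example, Tx, human_vocab))     # map characters to vocabulary indices, padded/clipped to Tx
source = to_categorical(source, num_classes=len(human_vocab))  # one-hot encode, shape (Tx, len(human_vocab))
prediction = model.predict([source[np.newaxis, :], np.zeros((1, n_s)), np.zeros((1, n_s))])
prediction = np.argmax(np.array(prediction), axis=-1).ravel()  # most likely character index at each of the Ty output steps
print(''.join(inv_machine_vocab[int(i)] for i in prediction))  # a YYYY-MM-DD string such as 1979-05-03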
As you can see, using Keras dramatically reduces the amount of code required.