Keras for attention


Keras does not have an official attention implementation yet, but there are several personal implementations of attention around, so I ran an experiment with one of them on the MNIST dataset. The model is a bidirectional LSTM + attention + dropout, although a bidirectional LSTM on its own is already quite strong.
References:
https://github.com/philipperemy/keras-attention-mechanism
https://github.com/keras-team/keras/issues/1472
Environment: Windows 10, Python 2.7, Keras 2+
Code:

# mnist attention
import numpy as np
np.random.seed(1337)
from keras.datasets import mnist
from keras.utils import np_utils
from keras.layers import *
from keras.models import *
from keras.optimizers import Adam

TIME_STEPS = 28
INPUT_DIM = 28
lstm_units = 64

# data pre-processing
(X_train, y_train), (X_test, y_test) = mnist.load_data('mnist.npz')
X_train = X_train.reshape(-1, 28, 28) / 255.
X_test = X_test.reshape(-1, 28, 28) / 255.
y_train = np_utils.to_categorical(y_train, num_classes=10)
y_test = np_utils.to_categorical(y_test, num_classes=10)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

# first way attention: a softmax over the time axis for every feature dimension
def attention_3d_block(inputs):
    # inputs shape: (batch, TIME_STEPS, features)
    a = Permute((2, 1))(inputs)                         # -> (batch, features, TIME_STEPS)
    a = Dense(TIME_STEPS, activation='softmax')(a)      # attention weights over the time axis
    a_probs = Permute((2, 1), name='attention_vec')(a)  # back to (batch, TIME_STEPS, features)
    # weight the inputs element-wise by the attention probabilities
    # (multiply replaces the Keras 1.x merge(..., mode='mul'))
    output_attention_mul = multiply([inputs, a_probs], name='attention_mul')
    return output_attention_mul

# build RNN model with attention
inputs = Input(shape=(TIME_STEPS, INPUT_DIM))
drop1 = Dropout(0.3)(inputs)
lstm_out = Bidirectional(LSTM(lstm_units, return_sequences=True), name='bilstm')(drop1)
attention_mul = attention_3d_block(lstm_out)
attention_flatten = Flatten()(attention_mul)
drop2 = Dropout(0.3)(attention_flatten)
# sigmoid output as in the original run; softmax is the more conventional
# choice with categorical_crossentropy
output = Dense(10, activation='sigmoid')(drop2)
model = Model(inputs=inputs, outputs=output)

# second way attention
# inputs = Input(shape=(TIME_STEPS, INPUT_DIM))
# units = 32
# activations = LSTM(units, return_sequences=True, name='lstm_layer')(inputs)
#
# attention = Dense(1, activation='tanh')(activations)
# attention = Flatten()(attention)
# attention = Activation('softmax')(attention)
# attention = RepeatVector(units)(attention)
# attention = Permute([2, 1], name='attention_vec')(attention)
# attention_mul = multiply([activations, attention], name='attention_mul')
# out_attention_mul = Flatten()(attention_mul)
# output = Dense(10, activation='sigmoid')(out_attention_mul)
# model = Model(inputs=inputs, outputs=output)
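# Note on the two variants: the second approach collapses the attention scores
# to a single weight per timestep (Dense(1) followed by a softmax over the time
# axis), whereas the first approach learns a separate softmax over timesteps
# for every feature dimension of the BiLSTM output.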

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

print('Training------------')
model.fit(X_train, y_train, epochs=10, batch_size=16)

print('Testing--------------')
loss, accuracy = model.evaluate(X_test, y_test)

print('test loss:', loss)
print('test accuracy:', accuracy)
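
To see what the attention layer has learned, the weights at the 'attention_vec' layer can be read out with an intermediate model. This is only a minimal sketch, assuming the model above has already been trained; averaging over the feature axis gives one weight per time step (i.e. per image row):

# inspect the learned attention weights (sketch; assumes `model` has been trained)
attention_model = Model(inputs=model.input,
                        outputs=model.get_layer('attention_vec').output)
att = attention_model.predict(X_test[:1])  # shape: (1, TIME_STEPS, 2 * lstm_units)
per_step = att.mean(axis=-1)[0]            # average over features -> one weight per row
print(per_step)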

Results: 98.43% accuracy on the training set and 98.95% on the test set. There does not appear to be any overfitting, so training could be continued. I previously ran the TensorFlow MNIST example, and a bidirectional LSTM on its own also reaches 98%+ accuracy.
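
Since there is no sign of overfitting after 10 epochs, one simple way to keep training while watching generalization is to pass the test set (or a held-out split) as validation data. A minimal sketch, reusing the variables from the script above:

# continue training for a few more epochs while monitoring test performance
history = model.fit(X_train, y_train,
                    epochs=5,
                    batch_size=16,
                    validation_data=(X_test, y_test))
print(history.history.keys())  # e.g. 'val_acc' (or 'val_accuracy' on newer Keras versions)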

Blog posts on attention:
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
https://www.cnblogs.com/shixiangwan/p/7573589.html
https://codekansas.github.io/blog/2016/language.html
https://distill.pub/2016/augmented-rnns/

Papers:
"Neural Machine Translation by Jointly Learning to Align and Translate"
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
"Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification"
"Hierarchical Attention Networks for Document Classification"
Comments and discussion are welcome!
