Machine Translation -- Neural Machine Translation

This article works through the neural machine translation (NMT) exercise from Andrew Ng's Deep Learning course, focusing on the attention mechanism in NMT. It first introduces the date-conversion task as background, then explains and implements the attention mechanism, and finally visualizes the attention weights to show how they help the model focus on the relevant parts of the input.


This article is based on the Week 3 programming exercise of Course 5 of Andrew Ng's Deep Learning specialization.

0. Background

To explore how machine translation works, we start with the task of translating dates. The third-party libraries, dataset, and helper code required by this program can be downloaded here.

from keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply
from keras.layers import RepeatVector, Dense, Activation, Lambda
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.models import load_model, Model
import keras.backend as K
import numpy as np

from faker import Faker
import random
from tqdm import tqdm
from babel.dates import format_date
from nmt_utils import *
import matplotlib.pyplot as plt

1. Converting human-readable dates to machine-readable dates

1.1 Dataset

The dataset we use to train the model contains 10,000 human-readable dates together with their equivalent machine-readable dates.

m = 10000
dataset, human_vocab, machine_vocab, inv_machine_vocab = load_dataset(m)
dataset[:10]  # peek at the first 10 (human-readable, machine-readable) pairs

[('9 may 1998', '1998-05-09'), ('10.09.70', '1970-09-10'), ('4/28/90', '1990-04-28'), ('thursday january 26 1995', '1995-01-26'), ('monday march 7 1983', '1983-03-07'), ('sunday may 22 1988', '1988-05-22'), ('tuesday july 8 2008', '2008-07-08'), ('08 sep 1999', '1999-09-08'), ('1 jan 1981', '1981-01-01'), ('monday may 22 1995', '1995-05-22')]

Here: dataset is a list of tuples (human_readable_date, machine_readable_date);

human_vocab is a dictionary mapping each character that appears in the human-readable dates to an integer index;

machine_vocab is a dictionary mapping each character that appears in the machine-readable dates to an integer index;

inv_machine_vocab is the inverse of machine_vocab, mapping integer indices back to characters.
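
A quick way to inspect these structures (an illustrative sketch; the exact contents depend on what load_dataset generates, so no outputs are shown):

# Illustrative sanity checks on the vocabularies returned by load_dataset.
print(len(human_vocab))      # number of distinct characters seen in the human-readable dates
print(len(machine_vocab))    # number of distinct characters in the machine-readable dates
print(inv_machine_vocab[0])  # inverse lookup: integer index -> character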

Next we preprocess the raw text data. We set the maximum length of a human-readable date to Tx = 30 and the length of a machine-readable date to Ty = 10 (the YYYY-MM-DD format is always 10 characters long).

Tx = 30
Ty = 10
X, Y, Xoh, Yoh = preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty)
print("X.shape:", X.shape)
print("Y.shape:", Y.shape)
print("Xoh.shape:", Xoh.shape)
print("Yoh.shape:", Yoh.shape)

X.shape: (10000, 30)
Y.shape: (10000, 10)
Xoh.shape: (10000, 30, 37)
Yoh.shape: (10000, 10, 11)
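
The last dimension of each one-hot array is simply the size of the corresponding character vocabulary. A small sketch making that relationship explicit, using only the variables defined above:

# X:   (m, Tx)  integer-encoded human-readable dates (length Tx)
# Y:   (m, Ty)  integer-encoded machine-readable dates (length Ty)
# Xoh, Yoh: one-hot versions of X and Y
assert Xoh.shape == (m, Tx, len(human_vocab))    # (10000, 30, 37)
assert Yoh.shape == (m, Ty, len(machine_vocab))  # (10000, 10, 11)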

2. The attention mechanism in NMT

If we want to translate an article from English to French, we cannot read the whole article and then translate it in one go; instead we translate it piece by piece, keeping the surrounding context in mind. The attention mechanism tells the NMT algorithm which parts of the input to pay special attention to at each output step.

2.1 The attention mechanism

The attention mechanism implemented in this subsection is shown in the figures below; the direct output of the one_step_attention function is the context variable.

Figure 1

Figure 2

As the figures show, before implementing one_step_attention() we need a few operations that process the inputs; in Keras, each of these steps can be abstracted as a layer. The underlying computation is explained in Andrew Ng's video lectures.
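
For reference, in the course's notation the 'attention_weights' activation defined below normalizes the energies with a softmax over the Tx input time steps, and the context vector is the corresponding weighted sum of the Bi-LSTM hidden states:

$$\alpha^{\langle t, t' \rangle} = \frac{\exp\left(e^{\langle t, t' \rangle}\right)}{\sum_{t''=1}^{T_x} \exp\left(e^{\langle t, t'' \rangle}\right)}, \qquad context^{\langle t \rangle} = \sum_{t'=1}^{T_x} \alpha^{\langle t, t' \rangle}\, a^{\langle t' \rangle}$$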

# Defined as global layer objects so their weights are shared across all Ty decoding steps
repeator = RepeatVector(Tx)
concatenator = Concatenate(axis=-1)
densor1 = Dense(10, activation='tanh')
densor2 = Dense(1, activation='relu')
activator = Activation(softmax, name='attention_weights')  # softmax is the custom version imported from nmt_utils
dotor = Dot(axes=1)

Interested readers can consult the Keras documentation for RepeatVector(), Concatenate(), Dense(), Activation(), and Dot().

def one_step_attention(a, s_prev):
    """
    Performs one step of attention: Outputs a context vector computed as a dot product of the attention weights
    "alphas" and the hidden states "a" of the Bi-LSTM.

    Arguments:
    a -- hidden state output of the Bi-LSTM, numpy-array of shape (m, Tx, 2*n_a)
    s_prev -- previous hidden state of the (post-attention) LSTM, numpy-array of shape (m, n_s)

    Returns:
    context -- context vector, input of the next (post-attention) LSTM cell
    """

    ### START CODE HERE ###
    # Use repeator to repeat s_prev to be of shape (m, Tx, n_s) so that you can concatenate it with all hidden states "a" (≈ 1 line)
    s_prev = repeator(s_prev)
    # Use concatenator to concatenate a and s_prev on the last axis (≈ 1 line)
    concat = concatenator([a, s_prev])
    # Use densor1 to propagate concat through a small fully-connected neural network to compute the "intermediate energies" variable e. (≈1 lines)
    e = densor1(concat)
    # Use densor2 to propagate e through a small fully-connected neural network to compute the "energies" variable energies. (≈1 lines)
    energies = densor2(e)
    # Use "activator" on "energies" to compute the attention w