Learning the tf.contrib.seq2seq.BahdanauAttention and tf.contrib.seq2seq.LuongAttention functions


License: CC 4.0 BY-SA



tf.contrib.seq2seq.BahdanauAttention()

__init__(
    num_units,
    memory,
    memory_sequence_length=None,
    normalize=False,
    probability_fn=None,
    score_mask_value=None,
    dtype=None,
    name='BahdanauAttention'
)
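The signature does not say what memory_sequence_length, score_mask_value and probability_fn do. As far as I can tell from the TF 1.x source, scores at memory positions past each sequence's length are replaced with score_mask_value (negative infinity when left as None), and the result is passed to probability_fn (softmax when left as None). The pure-Python sketch below imitates that for a single sequence; masked_softmax is a hypothetical helper, not a TensorFlow function, and it assumes length >= 1.

```python
import math


def masked_softmax(scores, length, score_mask_value=float("-inf")):
    """Hypothetical helper: softmax over one memory row, ignoring padding.

    Positions i >= length get score_mask_value before the softmax, which is
    how this sketch models memory_sequence_length / score_mask_value.
    Assumes length >= 1.
    """
    masked = [s if i < length else score_mask_value for i, s in enumerate(scores)]
    m = max(masked)                            # subtract the max for stability
    exps = [math.exp(s - m) for s in masked]   # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


weights = masked_softmax([1.0, 2.0, 5.0], length=2)
print(weights[2])  # 0.0: the padded third step gets no attention
```

Because the padded score becomes negative infinity, its weight is exactly zero, so padded encoder steps contribute nothing to the context vector no matter how large their raw score was.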

Depending on normalize, BahdanauAttention scores in one of two modes:

1. Bahdanau attention (normalize=False)

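Start from a standard encoder-decoder and modify its decoder slightly. Suppose the context vector at decoder time step $t'$ is $c_{t'}$. The decoder hidden state at time step $t'$ is then computed by the decoder's recurrent function $g$ from the previous output $y_{t'-1}$, the context vector and the previous hidden state: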

$$s_{t'} = g(y_{t'-1}, c_{t'}, s_{t'-1})$$

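Let $h_t$ be the encoder hidden state at time step $t$, with $T$ encoder time steps in total. The context vector at decoder time step $t'$ is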

$$c_{t'} = \sum_{t=1}^{T} \alpha_{t't} h_t$$
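The context vector is just a weighted sum of the encoder states. A minimal sketch for one decoder step, using plain Python lists as vectors; context_vector is a hypothetical name used only for illustration, and the weights are assumed to already sum to 1:

```python
def context_vector(weights, encoder_states):
    """c_{t'} = sum_t alpha_{t't} * h_t for a single decoder step t'.

    weights: T attention weights alpha_{t'1..t'T}.
    encoder_states: T encoder hidden states h_1..h_T, each a list of floats.
    """
    if len(weights) != len(encoder_states):
        raise ValueError("need one weight per encoder time step")
    dim = len(encoder_states[0])
    c = [0.0] * dim
    for alpha, h in zip(weights, encoder_states):
        for j in range(dim):
            c[j] += alpha * h[j]   # accumulate alpha_{t't} * h_t component-wise
    return c


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(context_vector([0.5, 0.25, 0.25], h))  # [0.75, 0.5]
```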

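In other words, for the current decoder time step $t'$, we take a weighted average of the encoder hidden states over all time steps $t$. The weights are called attention weights and are computed as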

$$\alpha_{t't} = \frac{\exp(e_{t't})}{\sum_{k=1}^{T} \exp(e_{t'k})}$$
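The weights are a softmax over the $T$ scores of one decoder step. A small sketch, assuming the scores are already available as a list; attention_weights is a hypothetical name, and subtracting the maximum score is a standard numerical-stability trick that does not change the result:

```python
import math


def attention_weights(scores):
    """alpha_{t't} = exp(e_{t't}) / sum_k exp(e_{t'k}) for one decoder step t'."""
    # Subtracting the max leaves the ratios unchanged but avoids overflow in exp.
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    return [x / total for x in exps]


alpha = attention_weights([2.0, 1.0, 0.1])
print([round(a, 3) for a in alpha])  # [0.659, 0.242, 0.099]
```

Shifting every score by the same constant gives the same weights, which is why the max subtraction is safe.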

where the score $e_{t't} \in \mathbb{R}$ is given by

$$e_{t't} = a(s_{t'-1}, h_t)$$

The function $a$ can be designed in many ways. In Bahdanau's paper it is

$$e_{t't} = v^\top \tanh(W_s s_{t'-1} + W_h h_t)$$
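A minimal sketch of this additive score for a single pair $(s_{t'-1}, h_t)$, using lists as vectors and lists of rows as matrices. The names bahdanau_score and matvec are illustrative, not TensorFlow APIs. In the TensorFlow source excerpted further down, $W_h h_t$ appears to correspond to keys (the memory after the memory layer) and $W_s s_{t'-1}$ to processed_query, which is why that code adds keys + processed_query before the tanh.

```python
import math


def matvec(W, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def bahdanau_score(v, W_s, W_h, s_prev, h_t):
    """e_{t't} = v^T tanh(W_s s_{t'-1} + W_h h_t).

    v has num_units entries; W_s and W_h each have num_units rows.
    """
    query_part = matvec(W_s, s_prev)   # W_s s_{t'-1}
    key_part = matvec(W_h, h_t)        # W_h h_t
    hidden = [math.tanh(q + k) for q, k in zip(query_part, key_part)]
    return sum(vi * hi for vi, hi in zip(v, hidden))


I = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, num_units = 2
print(bahdanau_score([1.0, 0.0], I, I, [0.5, 0.0], [0.5, 0.0]))  # tanh(1.0), about 0.7616
```

To get all scores for one decoder step, call it once per encoder state $h_1, \dots, h_T$ and feed the resulting list to a softmax.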

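Here $v$, $W_s$ and $W_h$, together with the weights and biases of the encoder and decoder RNNs and the embedding-layer parameters, are all model parameters that are learned jointly.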

2. With normalize=True, the score function $a$ above changes as follows:

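$$v \leftarrow g \, \frac{v}{\lVert v \rVert}$$

$$e_{t't} = v^\top \tanh(W_s s_{t'-1} + W_h h_t + b)$$

Here $g$ is a learnable scalar (unrelated to the decoder function $g$ above) and $b$ is a learnable bias vector added before the tanh; they are attention_g and attention_b in the source excerpt below.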
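The following sketch checks the weight-normalization property numerically: rescaling $v$ leaves the score unchanged, because only $g$ sets the magnitude of the effective vector. query_part and key_part stand in for $W_s s_{t'-1}$ and $W_h h_t$; the initial values of g and b mirror the initializers of attention_g and attention_b in the source excerpt, and normalized_bahdanau_score is a hypothetical helper.

```python
import math


def normalized_bahdanau_score(v, g, b, query_part, key_part):
    """(g * v / ||v||)^T tanh(key_part + query_part + b).

    query_part stands for W_s s_{t'-1} and key_part for W_h h_t, both already
    projected to num_units dimensions (hypothetical helper, for illustration).
    """
    norm_v = math.sqrt(sum(x * x for x in v))
    normed_v = [g * x / norm_v for x in v]   # g * v / ||v||
    hidden = [math.tanh(k + q + bi) for k, q, bi in zip(key_part, query_part, b)]
    return sum(nv * h for nv, h in zip(normed_v, hidden))


num_units = 3
g = math.sqrt(1.0 / num_units)   # initial value used for attention_g in the source
b = [0.0] * num_units            # attention_b is initialized to zeros
q = [0.2, -0.1, 0.4]             # stand-in for W_s s_{t'-1}
k = [0.3, 0.5, -0.2]             # stand-in for W_h h_t
e1 = normalized_bahdanau_score([1.0, 2.0, 3.0], g, b, q, k)
e2 = normalized_bahdanau_score([10.0, 20.0, 30.0], g, b, q, k)
print(abs(e1 - e2) < 1e-12)  # True: only the direction of v matters
```

Doubling $g$, by contrast, doubles every score, so $g$ effectively acts as a learned temperature for the softmax.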

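This variant is inspired by weight normalization: the length of the effective vector is set by $g$ alone, while $v$ only contributes its direction.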

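For details, see the corresponding part of the source code (the _bahdanau_score function in tf.contrib.seq2seq's attention_wrapper.py):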


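import math

from tensorflow.python.ops import array_ops, init_ops, math_ops, variable_scope


def _bahdanau_score(processed_query, keys, normalize):
    """keys: [batch, max_time, num_units]; processed_query: [batch, num_units]."""
    dtype = processed_query.dtype
    # num_units is the trailing dimension of keys (static value if known)
    num_units = keys.shape[2].value or array_ops.shape(keys)[2]
    # [batch, num_units] -> [batch, 1, num_units] to broadcast over max_time
    processed_query = array_ops.expand_dims(processed_query, 1)
    v = variable_scope.get_variable("attention_v", [num_units], dtype=dtype)
    if normalize:
        # Scalar used in weight normalization
        g = variable_scope.get_variable(
            "attention_g", dtype=dtype,
            initializer=math.sqrt((1. / num_units)))
        # Bias added prior to the nonlinearity
        b = variable_scope.get_variable(
            "attention_b", [num_units], dtype=dtype,
            initializer=init_ops.zeros_initializer())
        # normed_v = g * v / ||v||
        normed_v = g * v * math_ops.rsqrt(
            math_ops.reduce_sum(math_ops.square(v)))
        # Sum over the num_units axis -> scores of shape [batch, max_time]
        return math_ops.reduce_sum(
            normed_v * math_ops.tanh(keys + processed_query + b), [2])
    else:
        return math_ops.reduce_sum(v * math_ops.tanh(keys + processed_query), [2])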

tf.contrib.seq2seq.LuongAttention()

__init__(
    num_units,
    memory,
    memory_sequence_length=None,
    scale=False,
    probability_fn=None,
    score_mask_value=None,
    dtype=None,
    name='LuongAttention'
)
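For comparison with the Bahdanau score, here is a sketch of a multiplicative (Luong "general") score, $e = \text{query}^\top W h_t$. My understanding of the TF 1.x implementation, stated as an assumption: the memory is projected by a bias-free dense layer with num_units outputs, the query is not projected (so its depth must equal num_units), and scale=True multiplies the score by a learnable scalar g initialized to 1.0. luong_score and matvec are hypothetical names.

```python
def matvec(W, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def luong_score(query, W, h_t, scale=False, g=1.0):
    """Multiplicative score e = query^T (W h_t), optionally multiplied by g.

    W h_t plays the role of the projected memory (keys); query must have the
    same length as W h_t, i.e. num_units.
    """
    key = matvec(W, h_t)
    if len(query) != len(key):
        raise ValueError("query depth must equal num_units")
    score = sum(q * k for q, k in zip(query, key))
    return g * score if scale else score


W = [[1.0, 0.0], [0.0, 2.0]]
print(luong_score([1.0, 1.0], W, [3.0, 4.0]))                     # 11.0
print(luong_score([1.0, 1.0], W, [3.0, 4.0], scale=True, g=0.5))  # 5.5
```

Unlike the additive score, this one needs no tanh or extra vector $v$: it is a single bilinear form, and scale plays a role similar to $g$ in the normalized Bahdanau variant.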