Implementing Common Masks in TensorFlow

小汣结

Article link: https://blog.youkuaiyun.com/qwexdl/article/details/115718139

1. Masks in self-attention

1.1 The attention mask

1.1.1 Example

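import tensorflow as tf

q_mask = [1, 1, 1, 1, 0, 0]  # [seq_len]; 1 marks a valid token, 0 marks padding

# Self-attention scores have shape [seq_len, seq_len], so the mask must match.
q_mask = tf.expand_dims(q_mask, axis=-1)    # [seq_len, 1]
k_mask = tf.reshape(q_mask, [1, -1])        # [1, seq_len]
# The outer product is 1 only where both the query and the key are valid.
attention_mask = tf.matmul(q_mask, k_mask)  # [seq_len, seq_len]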

1.1.2 Usage

import tensorflow as tf

# Written for TensorFlow 1.x graph mode. Under TensorFlow 2.x eager execution
# the Session is unnecessary and print(output.numpy()) is enough.
q_mask = tf.expand_dims(tf.constant([1, 1, 1, 1, 0, 0]), axis=-1)  # [seq_len, 1]
k_mask = tf.reshape(q_mask, [1, -1])                               # [1, seq_len]
output = tf.matmul(q_mask, k_mask)                                 # [seq_len, seq_len]

with tf.Session() as sess:
    # No variables are defined, so no initializer needs to run.
    print(sess.run(output))

# [[1, 1, 1, 1, 0, 0],
#  [1, 1, 1, 1, 0, 0],
#  [1, 1, 1, 1, 0, 0],
#  [1, 1, 1, 1, 0, 0],
#  [0, 0, 0, 0, 0, 0],
#  [0, 0, 0, 0, 0, 0]]
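
The article builds the mask but does not show how it is consumed. One common approach, sketched here in plain Python so that it runs without TensorFlow, is to add a large negative bias to the masked scores before the softmax. The helper names `outer_mask` and `masked_softmax`, the constant `-1e9`, and the score values are all assumptions made for illustration; none of them come from the original article.

```python
import math

def outer_mask(q_mask, k_mask):
    """mask[i][j] is 1 only when query i and key j are both valid."""
    return [[q * k for k in k_mask] for q in q_mask]

def masked_softmax(scores, mask, neg=-1e9):
    """Row-wise softmax; positions where mask == 0 receive a large negative bias."""
    out = []
    for score_row, mask_row in zip(scores, mask):
        biased = [s + (1 - m) * neg for s, m in zip(score_row, mask_row)]
        peak = max(biased)  # subtract the row maximum for numerical stability
        exps = [math.exp(b - peak) for b in biased]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

valid = [1, 1, 0]                    # the last position is padding
mask = outer_mask(valid, valid)      # [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
scores = [[0.0, 0.0, 5.0],           # made-up attention scores
          [1.0, 1.0, 5.0],
          [2.0, 0.0, 1.0]]
weights = masked_softmax(scores, mask)
print(weights[0])                    # [0.5, 0.5, 0.0]
```

Note that a fully masked row (the padded query in the last row) still yields a distribution that sums to 1, because every entry receives the same bias. Such rows are usually ignored or zeroed out in a later step.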

2. Sequence masks

2.1 tf.sequence_mask

2.1.1 Function parameters

tf.sequence_mask(
    lengths, maxlen=None, dtype=tf.dtypes.bool, name=None
)

2.1.2 Usage

tf.sequence_mask([1, 3, 2], 5)  # [[True, False, False, False, False],
                                #  [True, True, True, False, False],
                                #  [True, True, False, False, False]]

tf.sequence_mask([1, 3, 2], 5, dtype=tf.int64)
                                # [[1 0 0 0 0]
                                #  [1 1 1 0 0]
                                #  [1 1 0 0 0]]

tf.sequence_mask([[1, 3], [2, 0]])  # [[[True, False, False],
                                    #   [True, True, True]],
                                    #  [[True, True, False],
                                    #   [False, False, False]]]

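From these examples we can see that padding positions are masked as False and valid positions as True.
If dtype=tf.int64 is specified, the same pattern is returned as 0 and 1 instead. When maxlen is omitted, as in the last example, the longest length in the input is used.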
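
The rule behind these outputs can be written out as a framework-free sketch: position `j` of row `i` is valid exactly when `j < lengths[i]`. The function `sequence_mask_ref` below is a hypothetical plain-Python reference for 1-D `lengths` only, not TensorFlow's implementation, and `masked_mean` is an assumed example of a typical use (averaging per-token values while skipping padding).

```python
def sequence_mask_ref(lengths, maxlen=None):
    """Plain-Python reference for 1-D lengths: mask[i][j] = (j < lengths[i])."""
    if maxlen is None:
        maxlen = max(lengths)  # default to the longest sequence
    return [[j < n for j in range(maxlen)] for n in lengths]

def masked_mean(values, mask):
    """Average only the entries whose mask is True (assumes at least one is)."""
    kept = [v for row_v, row_m in zip(values, mask)
            for v, m in zip(row_v, row_m) if m]
    return sum(kept) / len(kept)

mask = sequence_mask_ref([1, 3, 2], 5)
for row in mask:
    print(row)

# Made-up per-token values; the 9.0 entries sit in padding and are skipped.
values = [[2.0, 9.0, 9.0],
          [1.0, 2.0, 3.0]]
print(masked_mean(values, sequence_mask_ref([1, 3])))  # 2.0
```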

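3. Attention mask in the decoder

3.1 Lower-triangular mask

This mask ensures that each token can attend only to itself and the tokens before it, never to the tokens after it.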

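import tensorflow as tf

q_len, k_len = 5, 5
mask = tf.ones(shape=[q_len, k_len])
# Keep the lower triangle (diagonal included) and zero out everything above it.
# tf.linalg.band_part(mask, -1, 0) would give the same result.
tril = tf.linalg.LinearOperatorLowerTriangular(mask).to_dense()

# Written for TensorFlow 1.x graph mode; under 2.x, print(tril.numpy()) is enough.
with tf.Session() as sess:
    print(sess.run(tril))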

# [[1., 0., 0., 0., 0.],
#  [1., 1., 0., 0., 0.],
#  [1., 1., 1., 0., 0.],
#  [1., 1., 1., 1., 0.],
#  [1., 1., 1., 1., 1.]]
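
In practice a decoder often needs the causal mask and the padding mask at the same time. The article does not cover this combination; the sketch below, in plain Python with the hypothetical helpers `causal_mask` and `combine_masks`, assumes the two are merged by element-wise multiplication so that a position is visible only when both masks allow it.

```python
def causal_mask(n):
    """Lower-triangular mask: row i may see columns 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def combine_masks(causal, k_mask):
    """Hide padded keys on top of the causal constraint."""
    return [[c * k for c, k in zip(row, k_mask)] for row in causal]

k_mask = [1, 1, 1, 0]  # the last key is padding
combined = combine_masks(causal_mask(4), k_mask)
for row in combined:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 0]
```

The last row differs from the pure lower-triangular mask: the fourth query could see all four positions causally, but the padded key is still hidden.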