Masking in TensorFlow

When training an LSTM on input sequences of different lengths, the sequences usually have to be padded to a fixed length. In TensorFlow this can be done by appending an artificial "NUL" symbol. The problem is that the model may then learn behaviour tied to the padding symbol, which hurts generalization. The fix is to apply a mask when computing the cost, so that the padded positions are ignored. There are many examples of this in sequence-to-sequence models, e.g. the "bucketing and padding" scheme described in TensorFlow's official seq2seq tutorial.
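
As a concrete illustration, here is a minimal padding sketch (the pad id 0 and the helper name pad_batch are assumptions for illustration, not from the original post): every sequence in a batch is cut or padded to config.num_steps, and the true lengths are recorded so that the mask function below can ignore the padded steps when averaging the loss.

    PAD_ID = 0  # assumed id reserved for the artificial "NUL" symbol

    def pad_batch(sequences, num_steps):
        """Pad (or truncate) each sequence to num_steps and record the true lengths."""
        padded, lengths = [], []
        for seq in sequences:
            seq = list(seq)[:num_steps]                    # truncate if too long
            lengths.append(len(seq))                       # true, unpadded length
            padded.append(seq + [PAD_ID] * (num_steps - len(seq)))
        return padded, lengths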


    # loss has shape [batch, num_steps]; seqlen has shape [batch]
    def mask(self, loss, seqlen):
        # 1.0 for real time steps, 0.0 for padded ones
        mask = tf.sequence_mask(seqlen, maxlen=config.num_steps, dtype=tf.float32)
        # sum the per-step losses over real steps, divide by the number of real steps
        clear_loss = tf.reduce_sum(loss * mask) / tf.reduce_sum(mask)
        return clear_loss
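
A sketch of how this mask function might be wired into a training step (the tensor names logits, targets and seqlen are assumptions, not part of the original code): compute a per-step loss that keeps the [batch, num_steps] shape, then mask and average it.

    # logits:  [batch, num_steps, vocab_size]   model outputs
    # targets: [batch, num_steps]               target ids, padded with PAD_ID
    # seqlen:  [batch]                          true lengths before padding
    loss_per_step = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits, labels=targets)          # shape [batch, num_steps]
    clear_loss = self.mask(loss_per_step, seqlen)
    train_op = tf.train.AdamOptimizer(1e-3).minimize(clear_loss)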

The implementation above was inspired by the following Q&A:

Question:
Hi,

Say I want to train some LSTM unit, and my training data has variable lengths with a maximum length of say, 30.
What is the right thing to do?

In TF we cannot dynamically create a computation graph of varied lengths, so the number of LSTM unrolling is fixed.
So do we have to pad everything to have a length of 30?

Let’s say my input is a sequence of symbols from a certain alphabet, do I have to add a “NUL” symbol to my alphabet, so that my input now looks like:
w1, w2, … wn, NUL, NUL, NUL, NUL…

This is what I am doing now. However I think this is wrong as the LSTM now will learn some additional behaviours when consuming the (artificial) NUL symbol.
I’m worried that models trained this way won’t be able to generalize well when the length is not bound to 30.

Thanks!
–evan

Answer:
for transduction problems (1:1 between sequence input and target) the general approach i think is to allow the RNN to run over these NUL values but then you apply a mask to zero out the cost associated with them.

eg for sequence [w1, w2, w3, NUL, NUL, NUL]
you first calculate the per element costs, say, costs = [3.1, 4.1, 5.9, 2.6, 5.3, 5.8]

usually you’d take the mean; np.mean(costs) ≈ 4.47, but in this case you don’t care about the last three.

so now you’ll maintain a mask, 0 for NUL and 1 otherwise, mask = [1,1,1,0,0,0]
and you’ll calculate your sequence cost using this mask to zero out the costs you don’t care about;
sequence_cost = np.sum(costs * mask) / np.sum(mask)
(note! NOT np.mean(costs * mask) since the effective sequence “length” has changed from 6 to 3)

it’s “wasteful” in the sense that you’re doing more work than the unpadded version, but the argument is that the more densely packed data makes up for it through the speed-up in the lower-level libraries

there are lots of examples of this in the tensorflow seq2seq models
see http://www.tensorflow.org/tutorials/seq2seq/index.html “bucketing and padding” for the high level view of this (+ the extended idea of bucketing)
and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/seq2seq.py for more detail in code

Original thread: https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/wk8sbFGyfHA
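
For completeness, a quick numpy check of the numbers in the answer above:

    import numpy as np

    costs = np.array([3.1, 4.1, 5.9, 2.6, 5.3, 5.8])
    mask = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

    sequence_cost = np.sum(costs * mask) / np.sum(mask)
    print(sequence_cost)          # ~4.37, the mean over the 3 real steps only
    print(np.mean(costs * mask))  # ~2.18, wrong: divides by 6 instead of 3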
