tf.losses.sparse_softmax_cross_entropy()

Latest recommended article published on 2023-06-24 12:34:06

License: CC 4.0 BY-SA

Tags: #tensorflow

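This article explains how the Sparse Softmax Cross Entropy loss works and how it differs from Softmax Cross Entropy, shows through examples how to use both losses in TensorFlow, and discusses the role of the weights parameter.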

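This is a loss function commonly used for multi-class classification. Its main difference from softmax cross entropy is the label format: the softmax version takes a one-hot encoded label such as [0,0,1], whereas the sparse version takes the index of the most likely class, here 2.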

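import tensorflow as tf  # TensorFlow 1.x API (tf.losses, tf.Session)

# One sample with three classes; logits carry an explicit batch dimension.
logits = tf.constant([[0.1, 0.1, 0.8]])
labels = tf.constant([2])           # sparse label: index of the true class
labels2 = tf.constant([[0, 0, 1]])  # the same label, one-hot encoded
y1 = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
y2 = tf.losses.softmax_cross_entropy(onehot_labels=labels2, logits=logits)
with tf.Session() as sess:
    print(sess.run((y1, y2)))
    # (0.68972665, 0.68972665)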
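To make the two label formats concrete, here is a minimal pure-Python sketch of converting between them. The helper names `to_one_hot` and `to_index` are made up for illustration and are not part of TensorFlow.

```python
def to_one_hot(index, num_classes):
    """Return a one-hot list with a 1 at position `index`."""
    return [1 if i == index else 0 for i in range(num_classes)]


def to_index(one_hot):
    """Return the position of the 1 in a one-hot list."""
    return one_hot.index(1)


sparse_label = 2                                         # format used by the sparse loss
one_hot_label = to_one_hot(sparse_label, num_classes=3)  # format used by the softmax loss
print(one_hot_label)            # [0, 0, 1]
print(to_index(one_hot_label))  # 2
```

Both formats carry the same information, which is why the two losses above return the same value.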

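How it works: softmax is first applied to the logits, then the negative log of the probability at the true class is taken. The theory is covered in many places online; in code it looks like this: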

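import math
import numpy as np

# Softmax: exponentiate each logit and divide by the sum of the exponentials.
a = math.e**0.1 + math.e**0.1 + math.e**0.8
b, c, d = math.e**0.1/a, math.e**0.1/a, math.e**0.8/a
# Cross entropy against the one-hot label [0, 0, 1].
e = 0*np.log(b) + 0*np.log(c) + 1*np.log(d)
print(-e)
# 0.6897266409702166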
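The hand calculation above can be generalized to any logits and label. The following is a sketch using only the standard library; the function names are hypothetical, and subtracting the maximum logit is a common numerical-stability trick rather than a claim about TensorFlow's internals.

```python
import math


def log_softmax(logits):
    """Log of the softmax probabilities, computed via log-sum-exp."""
    m = max(logits)  # shift by the max so exp() cannot overflow
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def sparse_cross_entropy(logits, label):
    """Loss for an index label: minus the log-probability of that class."""
    return -log_softmax(logits)[label]


def onehot_cross_entropy(logits, onehot):
    """Loss for a one-hot label: minus the sum of target * log-probability."""
    return -sum(t * lp for t, lp in zip(onehot, log_softmax(logits)))


logits = [0.1, 0.1, 0.8]
loss_sparse = sparse_cross_entropy(logits, 2)
loss_onehot = onehot_cross_entropy(logits, [0, 0, 1])
print(loss_sparse, loss_onehot)  # both approximately 0.68973
```

Because the one-hot vector is zero everywhere except the true class, the sum in the one-hot version collapses to the single term picked out by the index in the sparse version.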

Regarding weights, the API documentation says:

weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of shape [batch_size], then the loss weights apply to each corresponding sample.

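So weights is a set of coefficients with one entry per sample in the batch. If a batch contains 10 samples, you can supply 10 weights; each weight is multiplied by the cross entropy of the corresponding sample, and the products are summed. With the default reduction, that sum is then divided by the number of nonzero weights.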

import tensorflow as tf  # TensorFlow 1.x API (tf.losses, tf.Session)

logits = tf.constant([[0.1, 0.1, 0.8], [0.1, 0.9, 0.0]])
labels = tf.constant([[2], [1]])
labels2 = tf.constant([[0, 0, 1], [0, 1, 0]])
# y1: unweighted sparse loss; y2: one-hot loss with a weight of 0.1 on each sample.
y1 = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
y2 = tf.losses.softmax_cross_entropy(onehot_labels=labels2, logits=logits, weights=[0.1, 0.1])
with tf.Session() as sess:
    print(sess.run((y1, y2)))
# (0.65404785, 0.06540479)
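The two numbers above can be reproduced by hand. The sketch below assumes the default reduction of the TF 1.x losses (weighted sum divided by the number of nonzero weights); `weighted_loss` and `sparse_cross_entropy` are illustrative names, not TensorFlow functions.

```python
import math


def sparse_cross_entropy(logits, label):
    """Minus the log-softmax probability of the true class."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return -(logits[label] - log_sum)


def weighted_loss(per_sample_losses, weights):
    """Weighted sum of the losses divided by the count of nonzero weights."""
    weighted_sum = sum(w * l for w, l in zip(weights, per_sample_losses))
    num_nonzero = sum(1 for w in weights if w != 0)
    return weighted_sum / num_nonzero if num_nonzero else 0.0


batch_logits = [[0.1, 0.1, 0.8], [0.1, 0.9, 0.0]]
batch_labels = [2, 1]
per_sample = [sparse_cross_entropy(lg, y) for lg, y in zip(batch_logits, batch_labels)]

unweighted = weighted_loss(per_sample, [1.0, 1.0])  # plain mean over the batch
weighted = weighted_loss(per_sample, [0.1, 0.1])    # every sample scaled by 0.1
print(unweighted, weighted)  # approximately 0.65405 and 0.06540
```

With every weight equal to 1 this reduces to the ordinary batch mean, and a uniform weight of 0.1 simply scales that mean by 0.1.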

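The key question is when to use weights. Take a multi-class problem with heavily imbalanced classes, for example class1 at 90%, class2 at 2%, and class3 at 2%. Here the weights are needed to rebalance the loss; otherwise the model is dominated by the majority class and becomes insensitive to the others, which degrades its performance.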
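One common way to pick such weights is inverse class frequency, sketched below. This scheme is an assumption for illustration and is not prescribed by the original text; the label counts mirror the 90/2/2 proportions mentioned above, and `inverse_frequency_weights` is a hypothetical helper.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to total / (num_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {c: total / (num_classes * n) for c, n in counts.items()}


# Hypothetical label set: 90 samples of class 0, 2 of class 1, 2 of class 2.
labels = [0] * 90 + [1] * 2 + [2] * 2
class_weights = inverse_frequency_weights(labels)

# The loss expects one weight per sample, so look each sample's class up.
sample_weights = [class_weights[y] for y in labels]
print(class_weights)
```

The resulting per-sample list has shape [batch_size] and could be passed as the weights argument; inside a TensorFlow graph the same lookup is typically done with `tf.gather` on a tensor of class weights.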

During training you normally feed in a whole batch, so the value returned is actually the mean of the losses computed separately for each sample.
