sigmoid cross entropy loss


License: CC 4.0 BY-SA


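This article explains the concept of cross entropy error and its role in neural network training, and uses an example to show how the mean cross entropy error is computed. It also discusses the use of cross entropy in multi-label learning.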

1. Cross Entropy Error
The mathematics behind cross entropy (CE) error and its relationship to NN training are very complex, but, fortunately, the results are remarkably simple to understand and implement. CE is best explained by example. Suppose you have just three training items with the following computed outputs and target outputs:

    computed outputs | target outputs | correct?
    0.1  0.3  0.6    | 0  0  1        | yes
    0.2  0.6  0.2    | 0  1  0        | yes
    0.3  0.4  0.3    | 1  0  0        | no

With CE, a 4-7-3 NN (4 input, 7 hidden and 3 output nodes) is instantiated and then trained using the back-propagation algorithm in conjunction with cross entropy error. After training completed, the NN model correctly predicted the species of 29 of the 30 test items (accuracy 0.9667).

Using a winner-takes-all evaluation technique, the NN predicts the first two data items correctly because the positions of the largest computed outputs match the positions of the 1 values in the target outputs, but the NN is incorrect on the third data item. The mean (average) squared error for this data is the sum of the squared errors divided by three. The squared error for the first item is (0.1 - 0)^2 + (0.3 - 0)^2 + (0.6 - 1)^2 = 0.01 + 0.09 + 0.16 = 0.26. Similarly, the squared error for the second item is 0.04 + 0.16 + 0.04 = 0.24, and the squared error for the third item is 0.49 + 0.16 + 0.09 = 0.74. So the mean squared error is (0.26 + 0.24 + 0.74) / 3 = 0.41.
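The squared-error arithmetic and the winner-takes-all check above can be reproduced in a few lines. This is a minimal Python sketch, assuming the three items from the table; the helper names are illustrative only.

```python
# Computed outputs and one-hot target outputs for the three training items.
outputs = [
    [0.1, 0.3, 0.6],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
]
targets = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]


def squared_error(y, t):
    """Sum of squared differences between computed outputs y and targets t."""
    return sum((yi - ti) ** 2 for yi, ti in zip(y, t))


def winner_takes_all_correct(y, t):
    """True if the largest computed output sits where the target has its 1."""
    return y.index(max(y)) == t.index(max(t))


per_item = [squared_error(y, t) for y, t in zip(outputs, targets)]
mse = sum(per_item) / len(per_item)

print([round(e, 2) for e in per_item])  # [0.26, 0.24, 0.74]
print(round(mse, 2))                    # 0.41
print([winner_takes_all_correct(y, t) for y, t in zip(outputs, targets)])  # [True, True, False]
```

Note that every output, not just the one paired with the 1 target, feeds into the squared error.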

Notice that in some sense the NN predicted the first two items with identical accuracy, because for both items the computed output that corresponds to the target output of 1 is 0.6. But observe that the squared errors for the first two items are different (0.26 and 0.24), because all three outputs contribute to the sum.

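The mean (average) CE error for the three items is the sum of the CE errors divided by three. The formal way to express CE error as a function is shown in Figure 2: $CE = -\sum_{i} t_i \ln(y_i)$, where $y_i$ are the computed outputs and $t_i$ the corresponding target outputs.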

In words this means, “Add up the product of the log to the base e of each computed output times its corresponding target output, and then take the negative of that sum.” So for the three items above, the CE of the first item is -(ln(0.1)*0 + ln(0.3)*0 + ln(0.6)*1) = -(0 + 0 - 0.51) = 0.51. The CE of the second item is -(ln(0.2)*0 + ln(0.6)*1 + ln(0.2)*0) = -(0 - 0.51 + 0) = 0.51. The CE of the third item is -(ln(0.3)*1 + ln(0.4)*0 + ln(0.3)*0) = -(-1.20 + 0 + 0) = 1.20. So the mean cross entropy error for the three-item data set is (0.51 + 0.51 + 1.20) / 3 = 0.74.
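The same three items can be pushed through the CE formula $CE = -\sum_i t_i \ln(y_i)$ directly. A minimal Python sketch (helper name is illustrative), assuming all computed outputs are strictly positive so the logarithm is defined:

```python
import math

# Same three items as in the table: computed outputs and one-hot targets.
outputs = [
    [0.1, 0.3, 0.6],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
]
targets = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]


def cross_entropy(y, t):
    """CE = -sum_i t_i * ln(y_i) for one item; y must be strictly positive."""
    return -sum(ti * math.log(yi) for yi, ti in zip(y, t))


per_item = [cross_entropy(y, t) for y, t in zip(outputs, targets)]
mean_ce = sum(per_item) / len(per_item)

print([round(e, 2) for e in per_item])  # [0.51, 0.51, 1.2]
print(round(mean_ce, 2))                # 0.74
```

Because the targets are one-hot, each item's CE reduces to -ln of the single output paired with the 1 target, which is why the first two items get the same value.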

Notice that when computing mean cross entropy error with neural networks in situations where the target outputs consist of a single 1 with the remaining values equal to 0, all the terms in the sum except one (the term with a 1 target) vanish because of the multiplication by the 0s. Put another way, cross entropy essentially ignores all computed outputs that don't correspond to a 1 target output. The idea is that when computing error during training, you don't care how far off the outputs associated with 0 targets are; you're only concerned with how close the single computed output that corresponds to the target value of 1 is to that value of 1. So, for the three items above, the CEs for the first two items, which in a sense were predicted with equal accuracy, are both 0.51.

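Relating this to what Andrew Ng (NG) teaches about logistic regression (LR) in his ML course, we can see that the LR loss NG describes is in fact the sigmoid cross entropy loss (note the "Notice" paragraph above). Of course, sigmoid cross entropy loss is not used only in problems like this; it can also be applied to multi-label learning problems (see the concept of multi-label learning).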
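To make the link to logistic regression concrete, here is a small Python sketch (function names are our own, for illustration) checking that the per-example LR cost, -[y ln h + (1 - y) ln(1 - h)] with h = sigmoid(z), is exactly the CE formula applied to the two outputs (h, 1 - h) with targets (y, 1 - y):

```python
import math


def sigmoid(z):
    """Logistic function; fine for the moderate z values used here."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_regression_loss(z, y):
    """Per-example LR cost: -[y ln h + (1 - y) ln(1 - h)], h = sigmoid(z)."""
    h = sigmoid(z)
    return -(y * math.log(h) + (1 - y) * math.log(1 - h))


def two_class_cross_entropy(z, y):
    """The same quantity written as CE over outputs (h, 1 - h) and targets (y, 1 - y)."""
    h = sigmoid(z)
    outputs = [h, 1 - h]
    targets = [y, 1 - y]
    return -sum(t * math.log(o) for o, t in zip(outputs, targets))


for z in (-2.0, 0.0, 1.5):
    for y in (0, 1):
        assert math.isclose(logistic_regression_loss(z, y), two_class_cross_entropy(z, y))

print(round(logistic_regression_loss(0.0, 1), 4))  # 0.6931, i.e. ln 2
```

With y = 1 only the -ln h term survives, mirroring the "Notice" paragraph: only the output paired with the 1 target matters.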
Multi-label learning differs from traditional single-label learning as follows:
Traditional single-label classification is concerned with learning from a set of examples that are associated with a single label l from a set of disjoint labels L, |L| > 1. In multi-label classification, the examples are associated with a set of labels Y ⊆ L. In the past, multi-label classification was mainly motivated by the tasks of text categorization and medical diagnosis. Nowadays, multi-label classification methods are increasingly required by modern applications, such as protein function classification, music categorization and semantic scene classification.
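In the multi-label case each label gets its own sigmoid output and its own two-class CE term, so the target vector may contain several 1s. Unlike the one-hot case above, labels with a 0 target still contribute through the (1 - t) ln(1 - p) term. A minimal Python sketch, assuming raw scores (logits) as input; the function name and the four-label example are hypothetical:

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multilabel_sigmoid_ce(logits, targets):
    """Sum over labels of -[t ln p + (1 - t) ln(1 - p)], with p = sigmoid(logit).

    Each label is treated as an independent yes/no decision, so `targets`
    may contain any number of 1s.
    """
    total = 0.0
    for x, t in zip(logits, targets):
        p = sigmoid(x)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total


# Hypothetical example: 4 labels, the item carries labels 0 and 2.
logits = [2.0, -1.0, 0.5, -3.0]
targets = [1, 0, 1, 0]
print(round(multilabel_sigmoid_ce(logits, targets), 4))
```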

2. Computation of the sigmoidCrossEntropyLoss layer in Caffe
Reference: caffecn
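The layer takes raw scores x, applies the sigmoid internally and computes the two-class CE against targets t. Computing sigmoid first and then taking logs breaks down for large |x| (for example, sigmoid(40) rounds to exactly 1.0, so ln(1 - s) is undefined). The usual remedy is the algebraically equivalent form max(x, 0) - x*t + ln(1 + e^(-|x|)). As far as we understand, Caffe's CPU forward pass uses an equivalent rearrangement and a gradient of (sigmoid(x) - t) divided by a normalizer. The Python below is a sketch of that idea, not Caffe's code: the helper names are hypothetical, and dividing by a batch size `num` is an assumption (the normalizer may differ between Caffe versions).

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def naive_sigmoid_ce(x, t):
    """-[t ln s + (1 - t) ln(1 - s)] with s = sigmoid(x); fails when s rounds to 0 or 1."""
    s = sigmoid(x)
    return -(t * math.log(s) + (1 - t) * math.log(1 - s))


def stable_sigmoid_ce(x, t):
    """Same value rewritten so no exponent is ever positive.

    x >= 0:  x - x*t + ln(1 + e^(-x))
    x <  0:    - x*t + ln(1 + e^(x))
    i.e. max(x, 0) - x*t + ln(1 + e^(-|x|)).
    """
    return max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))


def sigmoid_ce_loss(logits, targets, num):
    """Sum of per-element losses divided by `num` (assumed: the batch size)."""
    return sum(stable_sigmoid_ce(x, t) for x, t in zip(logits, targets)) / num


def sigmoid_ce_grad(logits, targets, num):
    """Gradient w.r.t. each logit: (sigmoid(x_i) - t_i) / num."""
    return [(sigmoid(x) - t) / num for x, t in zip(logits, targets)]


# The two forms agree where the naive one is usable...
for x in (-3.0, 0.0, 2.5):
    for t in (0, 1):
        assert math.isclose(naive_sigmoid_ce(x, t), stable_sigmoid_ce(x, t))

# ...but only the stable form survives a large logit.
print(stable_sigmoid_ce(40.0, 0))
```

The forward value and the gradient share the same sigmoid, which is one reason the sigmoid and the CE are fused into a single loss layer rather than stacked as two layers.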