why cross entropy loss works


Cross entropy is a measure of the difference between two probability distributions $p$ and $q$ over the same underlying random variable.
$$H(p, q) = -\sum_{x_i \in X} p(x_i) \log q(x_i)$$

In a classification problem, the random variable $X$ represents the possible categories of an instance. $p(x_i)$ and $q(x_i)$ are both probabilities that the instance belongs to category $x_i$, but they come from different sources: $p(x_i)$ is the true distribution known from the training data, while $q(x_i)$ is the distribution produced by the algorithm. The goal of the cross entropy loss is to make $q(x_i)$ equal to $p(x_i)$, so that the algorithm makes the right classification.
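To make the definition concrete, here is a minimal NumPy sketch (the `cross_entropy` helper and the example numbers are illustrative): $p$ is the one-hot true distribution given by the training label, $q$ is a distribution produced by the algorithm, and the loss is smaller when $q$ puts its mass on the correct category.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p(x_i) * log q(x_i); eps guards against log(0)."""
    q = np.clip(q, eps, 1.0)
    return -np.sum(p * np.log(q))

# true distribution from the training data: the instance belongs to category x1
p = np.array([0.0, 1.0, 0.0])

# two distributions an algorithm might produce
q_good = np.array([0.1, 0.8, 0.1])   # most mass on the correct category
q_bad = np.array([0.6, 0.2, 0.2])    # most mass on a wrong category

print(cross_entropy(p, q_good))  # ~0.22
print(cross_entropy(p, q_bad))   # ~1.61
```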

why cross entropy loss works

The short answer is that when $q(x_i)$ equals $p(x_i)$, $H(p, q)$ reaches its minimum.
To keep things simple, let's take binary classification as an example. The possible categories of an instance are denoted $X = \{x_0, x_1\}$. Since the classification is binary, the two probabilities are related by
$$p(x_1) = 1 - p(x_0)$$
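Written out for this binary case (the same expression the plotting script below evaluates), the cross entropy becomes
$$H(p, q) = -\big[\,p(x_0)\log q(x_0) + (1 - p(x_0))\log(1 - q(x_0))\,\big]$$
and the entropy of $X$ is the special case $q = p$.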
The relation between $p(x_0)$ and the entropy of $X$ is shown in the figure below:
[Figure: entropy of the random variable X as a function of p(x0)]
The entropy of a particular distribution corresponds to one point on this curve; sweeping $p(x_0)$ over all probabilities traces out the whole curve.
The cross entropy of $p(\cdot)$ and $q(\cdot)$ in the special case $p(\cdot) = q(\cdot)$ looks like this:
[Figure: cross entropy of p(.) and q(.) along the diagonal q(x0) = p(x0), which coincides with the entropy curve]
The cross entropy of $p(\cdot)$ and $q(\cdot)$ over all combinations of the two distributions looks like this:
[Figure: cross entropy surface over all combinations of p(x0) and q(x0), with the q = p diagonal marked in red]
As the figures show, whatever the true distribution $p(\cdot)$ is, driving the cross entropy toward its minimum drives the algorithm's distribution $q(\cdot)$ toward $p(\cdot)$. If the algorithm produces the right distribution, it produces the right classification. That is why minimizing the cross entropy loss makes the algorithm produce the right classification.
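The same conclusion can be checked analytically. For a fixed $p(x_0)$, differentiating the binary cross entropy above with respect to $q(x_0)$ and setting the derivative to zero gives
$$\frac{\partial H(p, q)}{\partial q(x_0)} = -\frac{p(x_0)}{q(x_0)} + \frac{1 - p(x_0)}{1 - q(x_0)} = 0 \;\Longrightarrow\; q(x_0) = p(x_0),$$
and the second derivative is positive, so this stationary point is indeed the minimum. More generally, $H(p, q) = H(p) + D_{KL}(p \,\|\, q)$ with $D_{KL}(p \,\|\, q) \ge 0$ and equal to zero only when $q = p$, so minimizing the cross entropy over $q$ always pushes $q(\cdot)$ toward $p(\cdot)$.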

plotting script

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time    : 1/4/2019 10:04 PM
# @Author  : yusisc (yusisc@gmail.com)

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm

# Cross entropy - Wikipedia
# https://en.wikipedia.org/wiki/Cross_entropy
# 2D and 3D Axes in same Figure — Matplotlib 3.0.2 documentation
# https://matplotlib.org/gallery/mplot3d/mixed_subplots.html

fig = plt.figure(figsize=plt.figaspect(0.4))

# probabilities p(x0); drop the endpoints 0 and 1 to avoid log(0)
p0 = np.linspace(0, 1, 20)
p0 = p0[1: -1]
# entropy of the binary variable X with P(x0) = p0
entropy = -(p0 * np.log(p0) +
            (1 - p0) * np.log(1 - p0))

ax0 = fig.add_subplot(1, 3, 1)
ax0.plot(p0, entropy)
ax0.set_xlabel('p(x0)', color='r')
ax0.set_ylabel('entropy of X', color='r')
ax0.set_title('entropy of random var X')

# along the diagonal q(x0) = p(x0), the cross entropy equals the entropy curve
ax1 = fig.add_subplot(1, 3, 2, projection='3d')
ax1.plot(p0, p0, entropy)
ax1.set_xlabel('p(x0)', color='r')
ax1.set_ylabel('shadow of p(x0)', color='r')
ax1.set_zlabel('entropy of X', color='r')
ax1.set_title('cross entropy of distribution p()\nand p() over random var X')

# cross entropy H(p, q) for every combination of p(x0) and q(x0)
p, q = np.meshgrid(p0, p0)
cross_entropy = -(p * np.log(q) +
                  (1 - p) * np.log(1 - q))

ax2 = fig.add_subplot(1, 3, 3, projection='3d')
# red crosses mark the diagonal q(x0) = p(x0), where H(p, q) is minimal over q
ax2.plot(p0, p0, entropy, 'r+')
ax2.plot_surface(p, q, cross_entropy, cmap=cm.coolwarm,
                 linewidth=0, antialiased=False, alpha=0.7)
ax2.set_xlabel('p(x0)', color='r')
ax2.set_ylabel('q(x0)', color='r')
ax2.set_zlabel('\n\ncross entropy \n of distribution p() and q() \n over the random variable X', color='r')
ax2.set_title('cross entropy of distribution p()\nand q() over random var X')

plt.show()

reference

Cross entropy - Wikipedia
https://en.wikipedia.org/wiki/Cross_entropy
2D and 3D Axes in same Figure — Matplotlib 3.0.2 documentation
https://matplotlib.org/gallery/mplot3d/mixed_subplots.html

### CrossEntropyLoss in PyTorch: implementation and usage

In machine learning, `CrossEntropyLoss` is a commonly used loss function, especially for classification tasks. It combines `LogSoftmax` with the negative log likelihood (NLL) loss. The following is an overview of its implementation and usage.

#### 1. Definition of CrossEntropyLoss in PyTorch

`torch.nn.CrossEntropyLoss` is a class provided by PyTorch for computing the cross entropy loss between an input tensor and a target. Internally it combines the softmax function with the negative log likelihood loss.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
```

The snippet above shows how to instantiate a `CrossEntropyLoss` object. Note that this function expects raw, unscaled scores (logits), not a probability distribution that has already gone through softmax.

#### 2. Typical usage

Suppose we are working on a multi-class classification problem, where the model output is a logits tensor of shape `[batch_size, num_classes]` and the labels are an integer tensor of shape `[batch_size]` holding the true class index of each sample in the batch.

```python
# batch size 3, 5 classes
outputs = torch.randn(3, 5, requires_grad=True)  # randomly initialized logits
labels = torch.tensor([1, 0, 4])                 # true labels

loss = criterion(outputs, labels)
print(f'Computed Loss: {loss.item()}')
```

In this example, `nn.CrossEntropyLoss` automatically performs two steps: it applies the softmax transformation to `outputs`, then computes and returns the NLL loss. There is therefore no need to call `softmax()` manually when using `CrossEntropyLoss`.

#### 3. Custom class weights

If some classes in the dataset have relatively few samples, the `weight` parameter can be used to balance the contribution of the different classes.

```python
weights = torch.tensor([1.0, 2.0, 1.0, 1.5, 0.5])
weighted_criterion = nn.CrossEntropyLoss(weight=weights)

loss_with_weights = weighted_criterion(outputs, labels)
print(f'Weighted Loss: {loss_with_weights.item()}')
```

This helps mitigate the training bias caused by class imbalance.

#### 4. Ignoring specific indices

When some data is unlabeled, or certain predictions should be skipped, the `ignore_index` parameter specifies positions that do not contribute to the accumulated loss.

```python
ignored_criterion = nn.CrossEntropyLoss(ignore_index=-100)

modified_labels = torch.tensor([1, -100, 4])  # mark the second sample as ignored (-100)
loss_ignoring_some = ignored_criterion(outputs, modified_labels)
print(f'Ignored Index Loss: {loss_ignoring_some.item()}')
```

Here `-100` marks entries that should be excluded from the loss.

#### 5. Comparison with other frameworks

Compared with TensorFlow: both frameworks support similar cross entropy functionality, but each has its own strengths in flexibility. For example, PyTorch is better suited to rapid prototyping in research projects, while TensorFlow may have the edge in large-scale distributed deployment scenarios.

### Summary

In summary, `CrossEntropyLoss` not only simplifies the otherwise complex workflow of multi-class classification tasks, but also provides rich configuration options for the special cases that arise in practice.