why cross entropy loss works


Cross entropy is a measure of the difference between two probability distributions $p$ and $q$ over the same underlying random variable.
$$H(p, q) = -\sum_{x_i \in X} p(x_i) \log q(x_i)$$

In a classification problem, the random variable $X$ represents the possible categories of an instance. $p(x_i)$ and $q(x_i)$ are both probabilities that the instance belongs to category $x_i$, but they come from different sources: $p(x_i)$ is the true distribution known from the training data, while $q(x_i)$ is the distribution produced by the algorithm. The goal of the cross entropy loss is to make $q(x_i)$ equal to $p(x_i)$, so that the algorithm makes the right classification.
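To make the definition concrete, here is a minimal NumPy sketch (the `cross_entropy` helper and the example numbers are illustrative): $p$ is the one-hot true distribution given by the training label, $q$ is a distribution produced by the algorithm, and the loss is smaller when $q$ puts its mass on the correct category.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p(x_i) * log q(x_i); eps guards against log(0)."""
    q = np.clip(q, eps, 1.0)
    return -np.sum(p * np.log(q))

# true distribution from the training data: the instance belongs to category x1
p = np.array([0.0, 1.0, 0.0])

# two distributions an algorithm might produce
q_good = np.array([0.1, 0.8, 0.1])   # most mass on the correct category
q_bad = np.array([0.6, 0.2, 0.2])    # most mass on a wrong category

print(cross_entropy(p, q_good))  # ~0.22
print(cross_entropy(p, q_bad))   # ~1.61
```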

why cross entropy loss works

The short answer is that when $q(x_i)$ equals $p(x_i)$, $H(p, q)$ reaches its minimum.
To keep things simple, let's take binary classification as an example. The possible categories of an instance are denoted $X = \{x_0, x_1\}$. Since the classification is binary, the two probabilities are related by
$$p(x_1) = 1 - p(x_0)$$
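Written out for this binary case (the same expression the plotting script below evaluates), the cross entropy becomes
$$H(p, q) = -\big[\,p(x_0)\log q(x_0) + (1 - p(x_0))\log(1 - q(x_0))\,\big]$$
and the entropy of $X$ is the special case $q = p$.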
The relation between $p(x_0)$ and the entropy of $X$ is shown in the figure below:
[Figure: entropy of the random variable X as a function of p(x0)]
The entropy of a particular distribution corresponds to one point on this curve; sweeping $p(x_0)$ over all probabilities traces out the whole curve.
The cross entropy of $p(\cdot)$ and $q(\cdot)$ in the special case $p(\cdot) = q(\cdot)$ looks like this:
[Figure: cross entropy of p(.) and q(.) along the diagonal q(x0) = p(x0), which coincides with the entropy curve]
The cross entropy of $p(\cdot)$ and $q(\cdot)$ over all combinations of the two distributions looks like this:
[Figure: cross entropy surface over all combinations of p(x0) and q(x0), with the q = p diagonal marked in red]
As the figures show, whatever the true distribution $p(\cdot)$ is, driving the cross entropy toward its minimum drives the algorithm's distribution $q(\cdot)$ toward $p(\cdot)$. If the algorithm produces the right distribution, it produces the right classification. That is why minimizing the cross entropy loss makes the algorithm produce the right classification.
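The same conclusion can be checked analytically. For a fixed $p(x_0)$, differentiating the binary cross entropy above with respect to $q(x_0)$ and setting the derivative to zero gives
$$\frac{\partial H(p, q)}{\partial q(x_0)} = -\frac{p(x_0)}{q(x_0)} + \frac{1 - p(x_0)}{1 - q(x_0)} = 0 \;\Longrightarrow\; q(x_0) = p(x_0),$$
and the second derivative is positive, so this stationary point is indeed the minimum. More generally, $H(p, q) = H(p) + D_{KL}(p \,\|\, q)$ with $D_{KL}(p \,\|\, q) \ge 0$ and equal to zero only when $q = p$, so minimizing the cross entropy over $q$ always pushes $q(\cdot)$ toward $p(\cdot)$.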

plotting script

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time    : 1/4/2019 10:04 PM
# @Author  : yusisc (yusisc@gmail.com)

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm

# Cross entropy - Wikipedia
# https://en.wikipedia.org/wiki/Cross_entropy
# 2D and 3D Axes in same Figure — Matplotlib 3.0.2 documentation
# https://matplotlib.org/gallery/mplot3d/mixed_subplots.html

fig = plt.figure(figsize=plt.figaspect(0.4))

# probabilities p(x0); drop the endpoints 0 and 1 to avoid log(0)
p0 = np.linspace(0, 1, 20)
p0 = p0[1: -1]
# entropy of the binary variable X with P(x0) = p0
entropy = -(p0 * np.log(p0) +
            (1 - p0) * np.log(1 - p0))

ax0 = fig.add_subplot(1, 3, 1)
ax0.plot(p0, entropy)
ax0.set_xlabel('p(x0)', color='r')
ax0.set_ylabel('entropy of X', color='r')
ax0.set_title('entropy of random var X')

# along the diagonal q(x0) = p(x0), the cross entropy equals the entropy curve
ax1 = fig.add_subplot(1, 3, 2, projection='3d')
ax1.plot(p0, p0, entropy)
ax1.set_xlabel('p(x0)', color='r')
ax1.set_ylabel('shadow of p(x0)', color='r')
ax1.set_zlabel('entropy of X', color='r')
ax1.set_title('cross entropy of distribution p()\nand p() over random var X')

# cross entropy H(p, q) for every combination of p(x0) and q(x0)
p, q = np.meshgrid(p0, p0)
cross_entropy = -(p * np.log(q) +
                  (1 - p) * np.log(1 - q))

ax2 = fig.add_subplot(1, 3, 3, projection='3d')
# red crosses mark the diagonal q(x0) = p(x0), where H(p, q) is minimal over q
ax2.plot(p0, p0, entropy, 'r+')
ax2.plot_surface(p, q, cross_entropy, cmap=cm.coolwarm,
                 linewidth=0, antialiased=False, alpha=0.7)
ax2.set_xlabel('p(x0)', color='r')
ax2.set_ylabel('q(x0)', color='r')
ax2.set_zlabel('\n\ncross entropy \n of distribution p() and q() \n over the random variable X', color='r')
ax2.set_title('cross entropy of distribution p()\nand q() over random var X')

plt.show()

reference

Cross entropy - Wikipedia
https://en.wikipedia.org/wiki/Cross_entropy
2D and 3D Axes in same Figure — Matplotlib 3.0.2 documentation
https://matplotlib.org/gallery/mplot3d/mixed_subplots.html

### CrossEntropyLoss in PyTorch: implementation and usage

In machine learning, `CrossEntropyLoss` is a commonly used loss function, especially for classification tasks. It combines `LogSoftmax` with the negative log likelihood (NLL) loss. The following is an overview of its implementation and usage.

#### 1. Definition of CrossEntropyLoss in PyTorch

`torch.nn.CrossEntropyLoss` is a class provided by PyTorch for computing the cross entropy loss between an input tensor and a target. Internally it combines the softmax function with the negative log likelihood loss.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
```

The snippet above shows how to instantiate a `CrossEntropyLoss` object. Note that this function expects raw, unscaled scores (logits), not a probability distribution that has already gone through softmax.

#### 2. Typical usage

Suppose we are working on a multi-class classification problem, where the model output is a logits tensor of shape `[batch_size, num_classes]` and the labels are an integer tensor of shape `[batch_size]` holding the true class index of each sample in the batch.

```python
# batch size 3, 5 classes
outputs = torch.randn(3, 5, requires_grad=True)  # randomly initialized logits
labels = torch.tensor([1, 0, 4])                 # true labels

loss = criterion(outputs, labels)
print(f'Computed Loss: {loss.item()}')
```

In this example, `nn.CrossEntropyLoss` automatically performs two steps: it applies the softmax transformation to `outputs`, then computes and returns the NLL loss. There is therefore no need to call `softmax()` manually when using `CrossEntropyLoss`.

#### 3. Custom class weights

If some classes in the dataset have relatively few samples, the `weight` parameter can be used to balance the contribution of the different classes.

```python
weights = torch.tensor([1.0, 2.0, 1.0, 1.5, 0.5])
weighted_criterion = nn.CrossEntropyLoss(weight=weights)

loss_with_weights = weighted_criterion(outputs, labels)
print(f'Weighted Loss: {loss_with_weights.item()}')
```

This helps mitigate the training bias caused by class imbalance.

#### 4. Ignoring specific indices

When some data is unlabeled, or certain predictions should be skipped, the `ignore_index` parameter specifies positions that do not contribute to the accumulated loss.

```python
ignored_criterion = nn.CrossEntropyLoss(ignore_index=-100)

modified_labels = torch.tensor([1, -100, 4])  # mark the second sample as ignored (-100)
loss_ignoring_some = ignored_criterion(outputs, modified_labels)
print(f'Ignored Index Loss: {loss_ignoring_some.item()}')
```

Here `-100` marks entries that should be excluded from the loss.

#### 5. Comparison with other frameworks

Compared with TensorFlow: both frameworks support similar cross entropy functionality, but each has its own strengths in flexibility. For example, PyTorch is better suited to rapid prototyping in research projects, while TensorFlow may have the edge in large-scale distributed deployment scenarios.

### Summary

In summary, `CrossEntropyLoss` not only simplifies the otherwise complex workflow of multi-class classification tasks, but also provides rich configuration options for the special cases that arise in practice.