Kullback-Leibler (KL) loss

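This post introduces the Kullback-Leibler (KL) loss: its definitions for discrete and continuous probability distributions, the facts that it is always non-negative and is convex in the pair of probability mass functions, and the formula for the KL loss between multivariate normal distributions.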

Kullback-Leibler ($\mathrm{KL}$) loss
(Discrete) For discrete probability distributions $F(x)$ and $G(x)$ on a common support $x_1, \dots, x_n$, the Kullback-Leibler ($\mathrm{KL}$) loss from $F(x)$ to $G(x)$ is defined[5] to be
$\mathrm{KL}\{F(x)\|G(x)\} = \sum_{i=1}^n F(x_i)\log\frac{F(x_i)}{G(x_i)}.$
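The discrete definition above can be sketched directly in code. This is a minimal illustration, not a reference implementation: the function name `kl_discrete`, the natural logarithm, and the conventions $0\log 0 = 0$ and "infinite loss when $G$ assigns zero mass where $F$ does not" are assumptions that the text does not state.

```python
import math

def kl_discrete(p, q):
    """KL loss from pmf p to pmf q, given as sequences over the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # assumed convention: 0 * log(0 / q_i) = 0
        if q_i == 0.0:
            return math.inf  # assumed convention: p has mass where q has none
        total += p_i * math.log(p_i / q_i)  # natural log, so the result is in nats
    return total

p = [0.5, 0.5]
q = [0.25, 0.75]
print(kl_discrete(p, q))  # loss from p to q
print(kl_discrete(q, p))  # loss from q to p; generally a different number
```

Swapping the arguments changes the result, which is why the definition speaks of the loss "from" one distribution "to" the other.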
(Continuous) For distributions $F(x)$ and $G(x)$ of a continuous random variable, the Kullback-Leibler ($\mathrm{KL}$) loss is defined to be
$\mathrm{KL}\{F(x)\|G(x)\} = \int_{-\infty}^{\infty}f(x)\log\frac{f(x)}{g(x)}\,dx,$
where $f(x)$ and $g(x)$ are the density functions of $F(x)$ and $G(x)$.
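To make the integral concrete, the sketch below approximates it numerically for two univariate normal densities. Everything here is an illustrative assumption rather than part of the original text: the helper names `normal_logpdf` and `kl_continuous`, the midpoint rule, the truncation of the real line to $[-12, 12]$, and the choice of $N(0, 1)$ and $N(1, 2^2)$ as example distributions.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of a univariate normal with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def kl_continuous(logf, logg, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f * log(f / g) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        lf = logf(x)
        # Work with log-densities so that log(f / g) is a simple difference.
        total += math.exp(lf) * (lf - logg(x))
    return total * h

kl_approx = kl_continuous(
    lambda x: normal_logpdf(x, 0.0, 1.0),  # f: density of N(0, 1)
    lambda x: normal_logpdf(x, 1.0, 2.0),  # g: density of N(1, 2^2)
    lo=-12.0,
    hi=12.0,
)
print(kl_approx)
```

The integration range only needs to cover the region where $f(x)$ is not negligible, since the integrand is weighted by $f(x)$.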

The Kullback-Leibler loss is always non-negative, that is,
$\mathrm{KL}\{F(x)\|G(x)\}\geqslant 0.$
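One standard way to see the non-negativity, sketched here for the discrete case only, applies Jensen's inequality to the concave logarithm. The sum is taken over the support points $x_i$ with $F(x_i) > 0$, and it is assumed that $G(x_i) > 0$ at those points; neither assumption is stated in the text above.
$-\mathrm{KL}\{F(x)\|G(x)\} = \sum_{i} F(x_i)\log\frac{G(x_i)}{F(x_i)} \leq \log\sum_{i} F(x_i)\frac{G(x_i)}{F(x_i)} = \log\sum_{i} G(x_i) \leq \log 1 = 0.$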
The Kullback-Leibler ($\mathrm{KL}$) loss $\mathrm{KL}\{F(x)\|G(x)\}$ is convex in the pair of probability mass functions $(f,g)$, i.e. if $(f_{1},g_{1})$ and $(f_{2},g_{2})$ are two pairs of probability mass functions, then $\mathrm{KL}\{\lambda f_{1}+(1-\lambda)f_{2}\|\lambda g_{1}+(1-\lambda)g_{2}\}\leq \lambda\,\mathrm{KL}(f_{1}\|g_{1})+(1-\lambda)\,\mathrm{KL}(f_{2}\|g_{2})$ for $0\leq\lambda\leq 1$.
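The convexity inequality can be spot-checked numerically. The sketch below is only a sanity check on random inputs, not a proof; the helpers `kl_discrete`, `random_pmf`, and `mix`, the support size of 4, and the fixed random seed are all assumptions made for illustration.

```python
import math
import random

def kl_discrete(p, q):
    """KL loss from pmf p to pmf q; assumes q_i > 0 wherever p_i > 0."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)

def random_pmf(n, rng):
    """Random pmf on n points with strictly positive entries."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def mix(a, b, lam):
    """Pointwise mixture lam * a + (1 - lam) * b of two pmfs."""
    return [lam * a_i + (1.0 - lam) * b_i for a_i, b_i in zip(a, b)]

rng = random.Random(0)
max_violation = 0.0
for _ in range(1000):
    f1, g1, f2, g2 = (random_pmf(4, rng) for _ in range(4))
    lam = rng.random()
    lhs = kl_discrete(mix(f1, f2, lam), mix(g1, g2, lam))
    rhs = lam * kl_discrete(f1, g1) + (1.0 - lam) * kl_discrete(f2, g2)
    # Convexity says lhs <= rhs, so lhs - rhs should never be meaningfully positive.
    max_violation = max(max_violation, lhs - rhs)

print(max_violation)  # expected to stay at zero up to floating-point rounding
```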

Example: Multivariate normal distributions
Suppose that we have two multivariate normal distributions, with means $\mu_{0},\mu_{1}$ and with (nonsingular) covariance matrices $\Sigma_{0},\Sigma_{1}$. If the two distributions have the same dimension, $k$, then the Kullback-Leibler ($\mathrm{KL}$) loss between the distributions is as follows:
$\mathrm{KL}(\mathcal{N}_{0}\|\mathcal{N}_{1})=\frac{1}{2}\left\{\mathrm{tr}\left(\Sigma_{1}^{-1}\Sigma_{0}\right)+\left(\mu_{1}-\mu_{0}\right)^{\mathrm{T}}\Sigma_{1}^{-1}\left(\mu_{1}-\mu_{0}\right)-k+\log\left(\frac{\det\Sigma_{1}}{\det\Sigma_{0}}\right)\right\}.$
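The closed-form expression above translates almost term by term into NumPy. This is a minimal sketch under stated assumptions: the function name `kl_mvn` and the example parameters are hypothetical, the inputs are assumed to be valid (symmetric positive definite) covariance matrices, and linear solves are used in place of an explicit inverse of $\Sigma_{1}$ as a numerical-stability choice that the text does not prescribe.

```python
import numpy as np

def kl_mvn(mu0, sigma0, mu1, sigma1):
    """KL loss from N(mu0, sigma0) to N(mu1, sigma1) for k-dimensional normals."""
    mu0 = np.asarray(mu0, dtype=float)
    mu1 = np.asarray(mu1, dtype=float)
    sigma0 = np.asarray(sigma0, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    k = mu0.shape[0]
    diff = mu1 - mu0
    # tr(Sigma_1^{-1} Sigma_0), computed by solving Sigma_1 X = Sigma_0.
    trace_term = np.trace(np.linalg.solve(sigma1, sigma0))
    # (mu_1 - mu_0)^T Sigma_1^{-1} (mu_1 - mu_0)
    quad_term = diff @ np.linalg.solve(sigma1, diff)
    # log(det Sigma_1 / det Sigma_0) via log-determinants to avoid overflow.
    _, logdet0 = np.linalg.slogdet(sigma0)
    _, logdet1 = np.linalg.slogdet(sigma1)
    return 0.5 * (trace_term + quad_term - k + logdet1 - logdet0)

# Hypothetical example parameters with k = 2.
mu0 = [0.0, 0.0]
sigma0 = [[1.0, 0.0], [0.0, 1.0]]
mu1 = [1.0, 0.0]
sigma1 = [[2.0, 0.0], [0.0, 2.0]]
print(kl_mvn(mu0, sigma0, mu1, sigma1))
```

For these example parameters the terms are $\mathrm{tr} = 1$, quadratic form $= 0.5$, $k = 2$, and $\log 4$, so the result should equal $\log 2 - 0.25 \approx 0.4431$. Passing the same distribution twice should give zero.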