Theoretical Foundations
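KL divergence: measures how far one probability distribution is from another. It is non-negative, and the smaller its value, the closer the two distributions are; it is zero exactly when they coincide. Note that it is not symmetric: in general D_KL(P‖Q) ≠ D_KL(Q‖P). The formula is as follows.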
$$
\begin{aligned}
D_{KL}(P \| Q) &= \sum_{i=1}^{N}\left[p\left(x_{i}\right) \log p\left(x_{i}\right) - p\left(x_{i}\right) \log q\left(x_{i}\right)\right] \\
&= \sum_{i=1}^{N} p\left(x_{i}\right) \log \frac{p\left(x_{i}\right)}{q\left(x_{i}\right)}
\end{aligned}
$$
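As a rough illustration of the definition, the sum can be evaluated directly for two discrete distributions given as lists of probabilities. This is only a minimal sketch: the function name `kl_divergence`, the use of the natural logarithm, and the conventions for zero probabilities (0 · log 0 = 0, and an infinite result when q(x_i) = 0 but p(x_i) > 0) are assumptions made here, not something the text above specifies.

```python
import math


def kl_divergence(p, q):
    """Compute D_KL(P || Q) = sum_i p_i * log(p_i / q_i) with the natural log.

    p and q are sequences of probabilities over the same N outcomes.
    (Hypothetical helper written for illustration.)
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")

    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            # Assumed convention: a term with p_i = 0 contributes nothing.
            continue
        if q_i == 0.0:
            # Assumed convention: P puts mass where Q has none, so the
            # divergence is infinite.
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    print(kl_divergence(p, p))  # identical distributions
    print(kl_divergence(p, q))  # P relative to Q
    print(kl_divergence(q, p))  # Q relative to P (generally a different value)
```

Swapping the arguments generally changes the result, which reflects the fact that KL divergence is not a symmetric distance.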