A Detailed Derivation of the Q Function in the EM Algorithm

幻影123！

Last modified 2023-12-24 21:06:20


First published 2023-12-22 23:15:10

Article link: https://blog.youkuaiyun.com/qq_33909788/article/details/135162605


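This article explains the key role of the Q function in the EM algorithm: it is the expected complete-data log-likelihood, computed in the E-step and maximized over the parameters in the M-step. The derivation shows how a lower bound on the log-likelihood reduces to the Q function, whose repeated maximization is used to approach a maximum likelihood estimate of the model parameters.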


The Q function

$\begin{aligned} Q\left(\theta, \theta^{(i)}\right) & =E_Z \left[\log P(Y, Z \mid \theta) \mid Y, \theta^{(i)}\right] \\ & =\sum_Z \log P(Y, Z \mid \theta) \cdot P\left(Z \mid Y, \theta^{(i)}\right) \end{aligned}$

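The Q function is a central object in the EM algorithm; its full name is the "expected complete-data log-likelihood". In the E-step it is computed as the conditional expectation of the complete-data log-likelihood $\log P(Y, Z \mid \theta)$ over the latent data $Z$, given the observed data $Y$ and the current estimate $\theta^{(i)}$. In the M-step it is maximized over $\theta$ to obtain the next parameter estimate.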
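As a small illustration of the definition, the sketch below evaluates $Q(\theta, \theta^{(i)})$ for a single observation in a two-component coin mixture (the three-coin setting). The function names, the parameter layout `theta = (pi, p, q)`, and the numeric values are assumptions chosen for this example, not part of the original derivation.

```python
import math

def q_function(log_joint, weights):
    """Q(theta, theta_i) = sum_z P(z | y, theta_i) * log P(y, z | theta).

    log_joint[z] : log P(y, z | theta) for each latent value z
    weights[z]   : P(z | y, theta_i), expected to sum to 1
    """
    return sum(w * lj for w, lj in zip(weights, log_joint))

def joint(y, theta):
    """P(y, z | theta) for z = coin B, coin C (hypothetical parameterization)."""
    pi, p, q = theta
    return [pi * (p if y else 1 - p), (1 - pi) * (q if y else 1 - q)]

def posterior(y, theta):
    """P(z | y, theta), obtained by normalizing the joint over z."""
    j = joint(y, theta)
    total = sum(j)
    return [v / total for v in j]

theta_i = (0.5, 0.5, 0.5)   # current estimate (assumed values)
theta = (0.4, 0.6, 0.7)     # candidate parameters (assumed values)
y = 1                       # a single observed coin toss

q_value = q_function(
    [math.log(v) for v in joint(y, theta)],   # depends on theta
    posterior(y, theta_i),                    # depends only on theta_i
)
```

Note that the weights come from $\theta^{(i)}$ while the log term comes from $\theta$; this separation is what makes the M-step tractable.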

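An earlier article (EM算法求解三硬币模型参数推导, on deriving the three-coin model parameters with EM) walked through the parameter derivation for the three-coin model from Professor Li Hang's 《统计学习方法》 (Statistical Learning Methods). The EM algorithm there was expanded directly from a Q function, and for reasons of space the proof was not shown. This article supplements that post and the derivation on page 179 of Chapter 9 of the book by deriving in detail where the Q function comes from.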

Deriving the Q function

We start from the log-likelihood of the parameter $\theta$ (formally treating $\theta$ like a random variable so that the definition of conditional probability applies):
$\begin{aligned} L(\theta) & =\log P(Y \mid \theta)=\log \frac{P(Y, \theta)}{P(\theta)}=\log \frac{\sum_Z P(Y, Z, \theta)}{P(\theta)} \\ & =\log \sum_Z \frac{P(Y, Z, \theta)}{P(\theta)}=\log \sum_Z P(Y, Z \mid \theta) \\ & =\log \sum_Z \frac{P(Y, Z, \theta)}{P(Z, \theta)} \cdot \frac{P(Z, \theta)}{P(\theta)} \end{aligned}$
that is,
$L(\theta)=\log \sum_Z P(Y \mid Z, \theta) \cdot P(Z \mid \theta)$
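The marginalization over $Z$ can be sketched numerically. The example below assumes independent 0/1 observations from a three-coin style mixture, so the sum over the whole latent vector factorizes into one sum per observation; the parameter layout `theta = (pi, p, q)` and the data values are assumptions for illustration.

```python
import math

def log_likelihood(ys, theta):
    """L(theta) = sum_j log sum_z P(y_j | z, theta) * P(z | theta)."""
    pi, p, q = theta
    prior = [pi, 1 - pi]   # P(z | theta): z = 0 is coin B, z = 1 is coin C
    heads = [p, q]         # P(y = 1 | z, theta)
    total = 0.0
    for y in ys:
        # Sum out the latent coin choice for this observation.
        marginal = sum(
            prior[z] * (heads[z] if y == 1 else 1 - heads[z]) for z in range(2)
        )
        total += math.log(marginal)
    return total

ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]   # ten assumed observations
value = log_likelihood(ys, (0.5, 0.5, 0.5))
```

With all three parameters equal to 0.5, every observation has marginal probability 0.5, which gives an easy sanity check on the function.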
Suppose the parameter estimate at iteration $i$ is $\theta^{(i)}$. We want the updated parameter to satisfy $L(\theta)>L(\theta^{(i)})$, so we consider the difference:
$L(\theta)-L\left(\theta^{(i)}\right)=\log \sum_Z P(Y \mid Z, \theta) \cdot P(Z \mid \theta)-\log P\left(Y \mid \theta^{(i)}\right)$
In the first term, multiply and divide each summand by $P\left(Z \mid Y, \theta^{(i)}\right)$:
$L(\theta)-L\left(\theta^{(i)}\right)=\log \left(\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right)}\right)-\log P\left(Y \mid \theta^{(i)}\right)$
Since $\sum_Z P \left(Z \mid Y, \theta^{(i)}\right)=1$, the second term can be multiplied by this sum without changing its value:
$L(\theta)-L\left(\theta^{(i)}\right)=\log \left(\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right)}\right)-\log P\left(Y \mid \theta^{(i)}\right) \cdot \sum_Z P\left(Z \mid Y, \theta^{(i)}\right)$
By Jensen's inequality,
$\log \sum_{j}\lambda_j \cdot y_j \geqslant \sum_j \lambda_j \cdot \log y_j$, where $y_j>0$, $\lambda_j \geqslant 0$, and $\sum_j \lambda_j =1$
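A quick numerical check of this form of Jensen's inequality may help. The helper name `jensen_gap` and the weights and values below are arbitrary choices for illustration.

```python
import math

def jensen_gap(weights, values):
    """Return log(sum_j w_j * y_j) - sum_j w_j * log(y_j).

    For positive values and weights that are non-negative and sum to 1,
    Jensen's inequality says this gap is never negative.
    """
    lhs = math.log(sum(w * v for w, v in zip(weights, values)))
    rhs = sum(w * math.log(v) for w, v in zip(weights, values))
    return lhs - rhs

gap = jensen_gap([0.2, 0.3, 0.5], [1.0, 4.0, 9.0])        # unequal values
gap_equal = jensen_gap([0.2, 0.3, 0.5], [2.0, 2.0, 2.0])  # all values equal
```

The gap is strictly positive when the values differ and vanishes (up to rounding) when they are all equal, which is the equality case of the inequality.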

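Applying this to the first term with weights $\lambda_Z=P\left(Z \mid Y, \theta^{(i)}\right)$ gives
$L(\theta)-L\left(\theta^{(i)}\right) \geqslant \sum_Z P \left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right)}-\log P\left(Y \mid \theta^{(i)}\right) \cdot \sum_Z P\left(Z \mid Y, \theta^{(i)}\right)$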

$=\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot\left[\log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right)}-\log P\left(Y \mid \theta^{(i)}\right)\right]$
$=\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{ P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)} \right) \cdot P\left(Y \mid \theta^{(i)} \right) }$
So at this point $L(\theta)-L\left(\theta^{(i)}\right) \geqslant \sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{ P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)} \right) \cdot P\left(Y \mid \theta^{(i)} \right) }$
that is, $L(\theta) \geqslant L\left(\theta^{(i)}\right)+\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{ P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)} \right) \cdot P\left(Y \mid \theta^{(i)} \right) }$
Let $B\left(\theta, \theta^{(i)}\right)=L\left(\theta^{(i)}\right)+\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{ P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)} \right) \cdot P\left(Y \mid \theta^{(i)} \right) }$
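So $B\left(\theta, \theta^{(i)}\right)$ is a lower bound of $L(\theta)$, and the bound is tight at $\theta=\theta^{(i)}$: there the argument of the logarithm is $\frac{P\left(Y, Z \mid \theta^{(i)}\right)}{P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)}=1$, so $B\left(\theta^{(i)}, \theta^{(i)}\right)=L\left(\theta^{(i)}\right)$. Consequently, any $\theta$ that raises $B\left(\theta, \theta^{(i)}\right)$ above $B\left(\theta^{(i)}, \theta^{(i)}\right)$ also raises $L\left( \theta\right)$ above $L\left(\theta^{(i)}\right)$.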
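The lower-bound property can be checked numerically. The sketch below assumes a single observation from a three-coin style mixture with `theta = (pi, p, q)`; the function names and candidate parameter values are hypothetical. It computes $L(\theta)-B(\theta, \theta^{(i)})$ for several candidates, including $\theta=\theta^{(i)}$ itself.

```python
import math

def joint(y, theta):
    """P(y, z | theta) for z = coin B, coin C."""
    pi, p, q = theta
    return [pi * (p if y else 1 - p), (1 - pi) * (q if y else 1 - q)]

def log_lik(y, theta):
    """L(theta) = log sum_z P(y, z | theta)."""
    return math.log(sum(joint(y, theta)))

def lower_bound(y, theta, theta_i):
    """B(theta, theta_i) for a single observation y."""
    j_new, j_old = joint(y, theta), joint(y, theta_i)
    p_y_old = sum(j_old)                      # P(y | theta_i)
    post = [v / p_y_old for v in j_old]       # P(z | y, theta_i)
    return log_lik(y, theta_i) + sum(
        w * math.log(jn / (w * p_y_old)) for w, jn in zip(post, j_new)
    )

theta_i = (0.4, 0.6, 0.7)
candidates = [(0.1, 0.2, 0.9), (0.5, 0.5, 0.5), (0.8, 0.9, 0.3), theta_i]
# Each gap is L(theta) - B(theta, theta_i); it should never be negative.
gaps = [log_lik(1, t) - lower_bound(1, t, theta_i) for t in candidates]
```

The last gap, taken at $\theta=\theta^{(i)}$, is zero up to rounding, matching the statement that the bound touches $L$ at the current estimate.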

Our goal is therefore $\theta^{(i+1)}=\underset{\theta}{\operatorname{argmax}} B\left(\theta, \theta^{(i)}\right)$
that is,
$\theta^{(i+1)}=\underset{\theta}{\operatorname{argmax}} \left[L\left(\theta^{(i)}\right)+\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \frac{P(Y \mid Z, \theta) \cdot P(Z \mid \theta)}{P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)}\right]$

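The three quantities $L\left( \theta^{(i)}\right)$, $P\left(Z \mid Y, \theta^{(i)}\right)$, and $P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)$ do not depend on $\theta$, so they can be treated as constants in the maximization.
Given $P\left(Z \mid Y, \theta^{(i)}\right) \cdot P\left(Y \mid \theta^{(i)}\right)>0$, the denominator only contributes an additive term that is constant in $\theta$, so removing it does not change the maximizer; the same holds for $L\left(\theta^{(i)}\right)$. Note that the factor $P\left(Z \mid Y, \theta^{(i)}\right)$ in front of the logarithm cannot be dropped, because it is the weight of each term in the sum $\sum_Z$.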

Hence
$\theta^{(i+1)}=\underset{\theta}{\operatorname{argmax}}\left[\sum_Z P\left(Z \mid Y,{ \theta}^{(i)}\right) \cdot \log \left[P(Y \mid Z, \theta) \cdot P(Z \mid \theta)\right]\right]$
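To see why dropping the constant terms leaves the maximizer unchanged, the sketch below compares the full bound with the simplified objective at several values of $\theta$. It assumes a single observation from a three-coin style mixture; the function names and parameter values are illustrative only. It uses the identity $P(Z \mid Y, \theta^{(i)}) \cdot P(Y \mid \theta^{(i)})=P(Y, Z \mid \theta^{(i)})$ for the denominator.

```python
import math

def joint(y, theta):
    """P(y, z | theta) for z = coin B, coin C."""
    pi, p, q = theta
    return [pi * (p if y else 1 - p), (1 - pi) * (q if y else 1 - q)]

def posterior(y, theta):
    """P(z | y, theta)."""
    j = joint(y, theta)
    total = sum(j)
    return [v / total for v in j]

def bound(y, theta, theta_i):
    """Full lower bound B(theta, theta_i), constants included."""
    post, j_old = posterior(y, theta_i), joint(y, theta_i)
    log_lik_old = math.log(sum(j_old))
    return log_lik_old + sum(
        w * math.log(jn / jo) for w, jn, jo in zip(post, joint(y, theta), j_old)
    )

def simplified(y, theta, theta_i):
    """Only the theta-dependent part: sum_z P(z | y, theta_i) * log P(y, z | theta)."""
    return sum(
        w * math.log(jn) for w, jn in zip(posterior(y, theta_i), joint(y, theta))
    )

theta_i = (0.4, 0.6, 0.7)
thetas = [(0.1, 0.2, 0.9), (0.5, 0.5, 0.5), (0.8, 0.9, 0.3)]
# The difference between the two objectives should be the same for every theta.
offsets = [bound(1, t, theta_i) - simplified(1, t, theta_i) for t in thetas]
```

Because the two objectives differ by the same offset at every $\theta$, they share the same argmax.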

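Define $Q\left(\theta, \theta^{(i)}\right)=\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log \left[P(Y \mid Z, \theta) \cdot P(Z \mid \theta)\right]=\sum_Z P\left(Z \mid Y, \theta^{(i)}\right) \cdot \log P(Y, Z \mid \theta)$, which is exactly the definition given at the start of the article.
Then
$\theta^{(i+1)}=\underset{\theta}{\operatorname{argmax}} Q\left(\theta, \theta^{(i)}\right)$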
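Putting the pieces together, here is a minimal sketch of the resulting iteration for the three-coin model. The closed-form M-step updates are the standard ones for this mixture (as in the textbook treatment); the data, the starting point, and the function names are assumptions for illustration, and this is not presented as the original author's implementation.

```python
import math

def em_step(ys, theta):
    """One E-step plus M-step for the three-coin model, theta = (pi, p, q)."""
    pi, p, q = theta
    # E-step: mu_j = P(z_j = coin B | y_j, theta_i), the weights inside Q.
    mus = []
    for y in ys:
        b = pi * (p if y else 1 - p)
        c = (1 - pi) * (q if y else 1 - q)
        mus.append(b / (b + c))
    # M-step: closed-form maximizer of Q(theta, theta_i) for this model.
    n, s = len(ys), sum(mus)
    new_pi = s / n
    new_p = sum(m * y for m, y in zip(mus, ys)) / s
    new_q = sum((1 - m) * y for m, y in zip(mus, ys)) / (n - s)
    return new_pi, new_p, new_q

def log_likelihood(ys, theta):
    """L(theta) for independent 0/1 observations."""
    pi, p, q = theta
    return sum(
        math.log(pi * (p if y else 1 - p) + (1 - pi) * (q if y else 1 - q))
        for y in ys
    )

ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]   # assumed observations
theta = (0.4, 0.6, 0.7)               # arbitrary starting point
history = [log_likelihood(ys, theta)]
for _ in range(5):
    theta = em_step(ys, theta)
    history.append(log_likelihood(ys, theta))
```

The values in `history` should be non-decreasing, which is the practical consequence of maximizing the lower bound at each step. The limit depends on the starting point, so EM is only expected to reach a local maximum.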