GMM
Definition
(Gaussian mixture model) The probability distribution of a Gaussian mixture model has the form:
$$P(y\mid \theta) = \sum_{k=1}^K \alpha_k\, \phi(y \mid \theta_k)$$
where $\alpha_k$ is the mixing coefficient, with $\alpha_k \ge 0$ and $\sum_{k=1}^{K} \alpha_k = 1$; $\phi(y \mid \theta_k)$ is the density of the $k$-th Gaussian component, with $\theta_k = (\mu_k,\sigma_k^2)$:
$$\phi(y \mid \theta_k) = \frac{1}{\sqrt{2 \pi}\,\sigma_k}\exp\left(-\frac{(y-\mu_k)^2}{2\sigma_k^2}\right)$$
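As a quick illustration, here is a minimal numpy sketch that evaluates this mixture density; the function names and the two-component parameter values are my own, chosen only for the example:

```python
import numpy as np

def gaussian_pdf(y, mu, sigma2):
    """Univariate Gaussian density phi(y | mu, sigma^2)."""
    return np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def gmm_pdf(y, alpha, mu, sigma2):
    """Mixture density P(y | theta) = sum_k alpha_k * phi(y | theta_k)."""
    return sum(a * gaussian_pdf(y, m, s2) for a, m, s2 in zip(alpha, mu, sigma2))

# Illustrative two-component mixture.
alpha = np.array([0.3, 0.7])    # mixing weights, nonnegative and summing to 1
mu = np.array([-2.0, 1.5])      # component means
sigma2 = np.array([1.0, 0.25])  # component variances
print(gmm_pdf(0.0, alpha, mu, sigma2))
```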
Parameter estimation: solving with EM
Assume the observations $y_1,y_2,\cdots,y_n$ are generated by a Gaussian mixture model, i.e. $P(y\mid \theta) = \sum_{k=1}^K \alpha_k \phi(y \mid \theta_k)$, whose parameters are $\theta = (\alpha_1,\cdots,\alpha_K;\theta_1,\cdots,\theta_K)$.
1. Identify the latent variables and write the complete-data log-likelihood
A Gaussian mixture model contains several components, and we do not observe which component generated each data point. We therefore introduce latent variables $\gamma_{jk}$, defined as:
$$\gamma_{jk}=\begin{cases}
1, & \text{the } j\text{-th observation comes from the } k\text{-th component}\\
0, & \text{otherwise}
\end{cases}
\qquad j=1,\cdots,n;\ k=1,\cdots,K$$
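The latent variable has a direct generative reading: first draw a component index with probabilities $\alpha$, then draw $y$ from that component. A minimal sampling sketch of this process, with assumed parameter values (note the code uses standard deviations, not variances):

```python
import numpy as np

def sample_gmm(n, alpha, mu, sigma, rng):
    """Draw n observations: z_j ~ Categorical(alpha), y_j ~ N(mu_{z_j}, sigma_{z_j}^2).

    The one-hot encoding of z_j is exactly (gamma_j1, ..., gamma_jK).
    sigma holds component standard deviations.
    """
    z = rng.choice(len(alpha), size=n, p=alpha)  # latent component labels
    y = rng.normal(mu[z], sigma[z])              # observations
    return y, z

rng = np.random.default_rng(0)
y, z = sample_gmm(1000, np.array([0.3, 0.7]), np.array([-2.0, 1.5]),
                  np.array([1.0, 0.5]), rng)
```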
The complete data are thus $(y_j,\gamma_{j1},\cdots,\gamma_{jK})$, $j=1,\cdots,n$, which gives the complete-data likelihood (using that exactly one of $\gamma_{j1},\cdots,\gamma_{jK}$ equals 1 for each $j$):
$$\begin{aligned}
P(y,\gamma \mid \theta) = L(\theta)
&=\prod_{j=1}^n P(y_j,\gamma_{j1},\cdots,\gamma_{jK}\mid \theta)\\
&=\prod_{j=1}^n \left(\prod_{k=1}^K \left(\alpha_k \phi(y_j \mid \theta_k)\right)^{\gamma_{jk}}\right)\\
&=\prod_{j=1}^n \prod_{k=1}^K \left(\alpha_k \phi(y_j \mid \theta_k)\right)^{\gamma_{jk}}\\
&=\prod_{j=1}^n \prod_{k=1}^K \left(\alpha_k \frac{1}{\sqrt{2 \pi}\,\sigma_k}\exp\left(-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right)\right)^{\gamma_{jk}}
\end{aligned}$$
The complete-data log-likelihood is then:
$$\begin{aligned}
\log P(y,\gamma \mid \theta) = \ell(\theta)
&=\sum_{k=1}^K \sum_{j=1}^n \log \left(\alpha_k \frac{1}{\sqrt{2 \pi}\,\sigma_k}\exp\left(-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right)\right)^{\gamma_{jk}}\\
&=\sum_{k=1}^K\sum_{j=1}^n \gamma_{jk}\log \left(\alpha_k \frac{1}{\sqrt{2 \pi}\,\sigma_k}\exp\left(-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right)\right)\\
&=\sum_{k=1}^K\sum_{j=1}^n \gamma_{jk}\log \alpha_k +\sum_{k=1}^K\sum_{j=1}^n \gamma_{jk}\log \left(\frac{1}{\sqrt{2 \pi}\,\sigma_k}\exp\left(-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right)\right)\\
&=\sum_{k=1}^K\log \alpha_k\sum_{j=1}^n \gamma_{jk} +\sum_{k=1}^K\sum_{j=1}^n \gamma_{jk}\left[ \log\frac{1}{\sqrt{2 \pi}}-\log\sigma_k-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right]
\end{aligned}$$
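For concreteness, a small sketch that evaluates $\ell(\theta)$ directly from a one-hot $\gamma$ matrix (the function name and shapes are my own):

```python
import numpy as np

def complete_log_likelihood(y, gamma, alpha, mu, sigma2):
    """log P(y, gamma | theta) = sum_j sum_k gamma_jk * log(alpha_k * phi(y_j | theta_k)).

    y: (n,) observations; gamma: (n, K) one-hot latent indicators.
    """
    # log phi(y_j | theta_k) = -0.5*log(2*pi*sigma_k^2) - (y_j - mu_k)^2 / (2*sigma_k^2)
    log_phi = (-0.5 * np.log(2 * np.pi * sigma2)
               - (y[:, None] - mu) ** 2 / (2 * sigma2))
    return np.sum(gamma * (np.log(alpha) + log_phi))
```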
2. E-step: determine the Q function
$$\begin{aligned}
Q(\theta,\theta^{(i)}) &=E\left[ \log P(y,\gamma \mid \theta)\mid y,\theta^{(i)} \right]\\
&=E\left\{\sum_{k=1}^K\log \alpha_k\sum_{j=1}^n \gamma_{jk} +\sum_{k=1}^K\sum_{j=1}^n \gamma_{jk}\left[ \log\frac{1}{\sqrt{2 \pi}}-\log\sigma_k-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right]\right\}\\
&=\sum_{k=1}^K\left\{\log \alpha_k\sum_{j=1}^n E\gamma_{jk} +\sum_{j=1}^n E\gamma_{jk}\left[ \log\frac{1}{\sqrt{2 \pi}}-\log\sigma_k-\frac{(y_j-\mu_k)^2}{2\sigma_k^2}\right]\right\}
\end{aligned}$$
Write $\hat \gamma_{jk}= E(\gamma_{jk}\mid y,\theta)$; then:
$$\begin{aligned}
\hat \gamma_{jk} &= 1\cdot P(\gamma_{jk} = 1 \mid y,\theta)+ 0\cdot P(\gamma_{jk} = 0 \mid y,\theta)\\
&=\frac {P(\gamma_{jk} = 1 ,y_j\mid \theta)}{P(y_j\mid \theta)}\\
&=\frac {P(\gamma_{jk} = 1 ,y_j\mid \theta)}{\sum_{k=1}^K P(\gamma_{jk} = 1 ,y_j\mid \theta)}\\
&=\frac {P(y_j\mid \gamma_{jk} = 1 ,\theta)\,P(\gamma_{jk} = 1\mid \theta)}{\sum_{k=1}^K P(y_j\mid \gamma_{jk} = 1 ,\theta)\,P(\gamma_{jk} = 1\mid \theta)}\\
&=\frac{\alpha_k \phi(y_j\mid \theta_{k})}{\sum_{k=1}^K\alpha_k \phi(y_j\mid \theta_{k})}
\end{aligned}$$
$\hat \gamma_{jk}$ is the probability, under the current model, that the $j$-th observation comes from the $k$-th component; it is called the responsibility of component $k$ for observation $y_j$.
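In code, the E-step is a single normalized weighting per observation. A minimal numpy sketch (function name and array shapes are my own choices):

```python
import numpy as np

def e_step(y, alpha, mu, sigma2):
    """Responsibilities gamma_hat[j, k] = alpha_k phi(y_j|theta_k) / sum_k' alpha_k' phi(y_j|theta_k').

    y: (n,) observations; alpha, mu, sigma2: (K,) component parameters.
    """
    # Component densities phi(y_j | theta_k), shape (n, K).
    phi = np.exp(-(y[:, None] - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    weighted = alpha * phi                              # numerators alpha_k * phi
    return weighted / weighted.sum(axis=1, keepdims=True)
```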
3. M-step
The M-step of each iteration maximizes $Q(\theta, \theta^{(i)})$ with respect to $\theta$:
$$\theta^{(i+1)} =\arg \underset{\theta}{\max} \; Q(\theta,\theta^{(i)})$$
Taking the partial derivative of $Q$ with respect to each parameter ($\mu_k,\sigma_k^2,\alpha_k$) and setting it to zero (for $\alpha_k$, under the constraint $\sum_{k=1}^K \alpha_k = 1$) yields the new parameters:
$$\begin{aligned}
\hat \mu_k &= \frac{\sum_{j=1}^n \hat \gamma_{jk}\,y_j}{\sum_{j=1}^n\hat \gamma_{jk}}\\
\hat \sigma_k^2 &= \frac{\sum_{j=1}^n \hat \gamma_{jk}(y_j-\hat\mu_k)^2}{\sum_{j=1}^n\hat \gamma_{jk}}\\
\hat \alpha_k &= \frac{\sum_{j=1}^n \hat \gamma_{jk}}{n}
\end{aligned}$$
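As a sketch of where the first update comes from: only one term of $Q$ depends on $\mu_k$, so setting its derivative to zero gives

$$\frac{\partial Q}{\partial \mu_k} = \sum_{j=1}^n \hat\gamma_{jk}\,\frac{y_j-\mu_k}{\sigma_k^2} = 0 \quad\Longrightarrow\quad \hat\mu_k = \frac{\sum_{j=1}^n \hat\gamma_{jk}\, y_j}{\sum_{j=1}^n \hat\gamma_{jk}}$$

The $\hat\sigma_k^2$ update follows the same pattern, and $\hat\alpha_k$ is obtained by maximizing $\sum_{k=1}^K \log\alpha_k \sum_{j=1}^n \hat\gamma_{jk}$ subject to $\sum_{k=1}^K \alpha_k = 1$, e.g. with a Lagrange multiplier.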
4. Repeat the E and M steps until the parameters converge.
Algorithm
Input: observed data $Y$ and a Gaussian mixture model;
Output: the Gaussian mixture model parameters
(1) Initialize the parameters;
(2) E-step: compute the responsibility of each component for each observation;
(3) M-step: compute the model parameters for the next iteration;
(4) Repeat steps (2) and (3) until the parameters converge (see the sketch below).
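Putting the steps together, here is a minimal self-contained EM loop for a univariate GMM; the initialization scheme, tolerance, and synthetic data are my own illustrative choices, not prescribed by the text:

```python
import numpy as np

def em_gmm(y, K, n_iter=200, tol=1e-8, seed=0):
    """Fit a univariate K-component GMM to data y with EM."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # (1) Initialize: uniform weights, means drawn from the data, shared variance.
    alpha = np.full(K, 1.0 / K)
    mu = rng.choice(y, size=K, replace=False)
    sigma2 = np.full(K, y.var())
    prev_ll = -np.inf
    for _ in range(n_iter):
        # (2) E-step: responsibilities gamma_hat, shape (n, K).
        phi = np.exp(-(y[:, None] - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
        weighted = alpha * phi
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # (3) M-step: closed-form updates from the formulas above.
        nk = gamma.sum(axis=0)
        mu = gamma.T @ y / nk
        sigma2 = (gamma * (y[:, None] - mu) ** 2).sum(axis=0) / nk
        alpha = nk / n
        # (4) Stop when the observed-data log-likelihood converges.
        ll = np.log(weighted.sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return alpha, mu, sigma2

# Example on synthetic data.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(1.5, 0.5, 700)])
print(em_gmm(y, K=2))
```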
Generalization
The EM algorithm can be interpreted as a maximization-maximization algorithm on the F function, and it can be generalized to the generalized expectation maximization (GEM) algorithm.