18.2 Algorithm for Probabilistic Latent Semantic Analysis
The log-likelihood function of the generative model is:
$$
\begin{aligned}
L&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log P(w_i,d_j)\\
&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log\left[\sum_{k=1}^K P(w_i \mid z_k)P(z_k \mid d_j)P(d_j)\right]\\
&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\left[\log P(d_j)+\log\left(\sum_{k=1}^K P(w_i \mid z_k)P(z_k \mid d_j)\right)\right]\\
&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log P(d_j)+\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log\left(\sum_{k=1}^K P(w_i \mid z_k)P(z_k \mid d_j)\right)
\end{aligned}
$$
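The first term does not involve the model parameters $P(w_i \mid z_k)$ and $P(z_k \mid d_j)$, so it is a constant with respect to them and can be dropped. This gives the likelihood function used in the book: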
$$
L=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log\left[\sum_{k=1}^K P(w_i \mid z_k)P(z_k \mid d_j)\right]
$$
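To make the indices concrete, here is a minimal sketch of evaluating this log-likelihood. The function name `log_likelihood`, the nested-list layout (`n[i][j]`, `p_w_z[i][k]`, `p_z_d[k][j]`) and the toy numbers are assumptions made for illustration; they do not come from the book.

```python
import math

def log_likelihood(n, p_w_z, p_z_d):
    """L = sum_i sum_j n(w_i, d_j) * log( sum_k P(w_i|z_k) P(z_k|d_j) ).

    n[i][j]     : count of word i in document j   (M x N)
    p_w_z[i][k] : P(w_i | z_k)                    (M x K)
    p_z_d[k][j] : P(z_k | d_j)                    (K x N)
    """
    M, N, K = len(n), len(n[0]), len(p_z_d)
    total = 0.0
    for i in range(M):
        for j in range(N):
            if n[i][j] == 0:
                continue  # a zero count contributes nothing to the sum
            mix = sum(p_w_z[i][k] * p_z_d[k][j] for k in range(K))
            total += n[i][j] * math.log(mix)
    return total

# Made-up toy data: M = 3 words, N = 2 documents, K = 2 topics.
n = [[2, 0], [1, 1], [0, 3]]
p_w_z = [[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]]  # each column sums to 1
p_z_d = [[0.7, 0.2], [0.3, 0.8]]              # each column sums to 1
L = log_likelihood(n, p_w_z, p_z_d)
```

Only the pairs with a nonzero count matter, so the double sum effectively runs over the words that actually occur in each document.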
E-step: compute the Q function
$$
\begin{aligned}
L&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log\left[\sum_{k=1}^K P(w_i \mid z_k)P(z_k \mid d_j)\right]\\
&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log\left[\sum_{k=1}^K P(z_k \mid w_i,d_j)\frac{P(w_i \mid z_k)P(z_k \mid d_j)}{P(z_k \mid w_i,d_j)}\right]
\end{aligned}
$$
Apply Jensen's inequality to the expression above:
$$
\log \sum_{j} \lambda_{j} y_{j} \geq \sum_{j} \lambda_{j} \log y_{j}, \quad \lambda_{j} \geq 0, \quad \sum_{j} \lambda_{j}=1
$$
Taking the posterior probabilities $P(z_k \mid w_i, d_j)$ as the weights $\lambda_k$, we get
$$
\begin{aligned}
L&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log\left[\sum_{k=1}^K P(z_k \mid w_i,d_j)\frac{P(w_i \mid z_k)P(z_k \mid d_j)}{P(z_k \mid w_i,d_j)}\right]\\
&\geqslant\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\sum_{k=1}^K P(z_k \mid w_i,d_j)\log\left[\frac{P(w_i \mid z_k)P(z_k \mid d_j)}{P(z_k \mid w_i,d_j)}\right]
\end{aligned}
$$
This is a lower bound on $L$, which expands as follows:
$$
\begin{aligned}
&\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\sum_{k=1}^K P(z_k \mid w_i,d_j)\log\left[\frac{P(w_i \mid z_k)P(z_k \mid d_j)}{P(z_k \mid w_i,d_j)}\right]\\
&\quad=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\sum_{k=1}^K P(z_k \mid w_i,d_j)\Big[\log\left[P(w_i \mid z_k)P(z_k \mid d_j)\right]-\log P(z_k \mid w_i,d_j)\Big]\\
&\quad=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\sum_{k=1}^K P(z_k \mid w_i,d_j)\log\left[P(w_i \mid z_k)P(z_k \mid d_j)\right]-\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\sum_{k=1}^K P(z_k \mid w_i,d_j)\log P(z_k \mid w_i,d_j)
\end{aligned}
$$
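The bound can be checked numerically. The sketch below, with made-up toy numbers and hypothetical helper names (`posterior`, `log_likelihood`, `jensen_bound`), evaluates the bound once with the posterior as weights and once with uniform weights. With the posterior computed from the same parameters the bound should coincide with $L$; with any other weights it should fall below $L$.

```python
import math

def posterior(p_w_z, p_z_d, i, j):
    """P(z_k | w_i, d_j) for all k, computed from the given parameters."""
    joint = [p_w_z[i][k] * p_z_d[k][j] for k in range(len(p_z_d))]
    norm = sum(joint)
    return [v / norm for v in joint]

def log_likelihood(n, p_w_z, p_z_d):
    K = len(p_z_d)
    return sum(
        n[i][j] * math.log(sum(p_w_z[i][k] * p_z_d[k][j] for k in range(K)))
        for i in range(len(n)) for j in range(len(n[0])) if n[i][j] > 0
    )

def jensen_bound(n, p_w_z, p_z_d, weights):
    """sum_ij n_ij * sum_k q_k * log(P(w_i|z_k) P(z_k|d_j) / q_k), q = weights(i, j).

    Assumes all parameters and weights are strictly positive.
    """
    total = 0.0
    for i in range(len(n)):
        for j in range(len(n[0])):
            q = weights(i, j)
            for k in range(len(p_z_d)):
                ratio = p_w_z[i][k] * p_z_d[k][j] / q[k]
                total += n[i][j] * q[k] * math.log(ratio)
    return total

# Made-up toy data: M = 3 words, N = 2 documents, K = 2 topics.
n = [[2, 0], [1, 1], [0, 3]]
p_w_z = [[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]]
p_z_d = [[0.7, 0.2], [0.3, 0.8]]

L = log_likelihood(n, p_w_z, p_z_d)
tight = jensen_bound(n, p_w_z, p_z_d, lambda i, j: posterior(p_w_z, p_z_d, i, j))
loose = jensen_bound(n, p_w_z, p_z_d, lambda i, j: [0.5, 0.5])
```

That the bound touches $L$ at the current parameters is what lets an increase in the bound carry over to an increase in $L$.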
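When maximizing the Q function, we take partial derivatives with respect to $P(w_i \mid z_k)$ and $P(z_k \mid d_j)$. The posterior $P(z_k \mid w_i, d_j)$ is computed from the current parameter estimates and held fixed, so the partial derivatives of the second term are 0. It can therefore be dropped here (it could equally be kept, since it vanishes on differentiation anyway). Therefore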
$$
Q^{\prime}=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\sum_{k=1}^K P(z_k \mid w_i,d_j)\log\left[P(w_i \mid z_k)P(z_k \mid d_j)\right]
$$
which is the $Q^{\prime}$ function in the book, where
$$
P\left(z_{k} \mid w_{i}, d_{j}\right)=\frac{P\left(w_{i} \mid z_{k}\right) P\left(z_{k} \mid d_{j}\right)}{\sum_{k=1}^{K} P\left(w_{i} \mid z_{k}\right) P\left(z_{k} \mid d_{j}\right)}
$$
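The posterior above is all the E-step needs to compute. A minimal sketch follows, assuming the parameters are stored as nested lists `p_w_z[i][k]` and `p_z_d[k][j]`; the function name `e_step`, the uniform fallback for a zero normalizer, and the toy numbers are illustrative choices, not part of the book's algorithm.

```python
def e_step(p_w_z, p_z_d):
    """Return post[i][j][k] = P(z_k | w_i, d_j) under the current parameters."""
    M, K, N = len(p_w_z), len(p_z_d), len(p_z_d[0])
    post = [[[0.0] * K for _ in range(N)] for _ in range(M)]
    for i in range(M):
        for j in range(N):
            # Numerator of the posterior for every topic k.
            joint = [p_w_z[i][k] * p_z_d[k][j] for k in range(K)]
            norm = sum(joint)  # denominator: the sum over all topics
            if norm > 0:
                post[i][j] = [v / norm for v in joint]
            else:
                # Assumed fallback: spread mass uniformly if every term is zero.
                post[i][j] = [1.0 / K] * K
    return post

# Made-up toy parameters: M = 3 words, K = 2 topics, N = 2 documents.
p_w_z = [[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]]
p_z_d = [[0.7, 0.2], [0.3, 0.8]]
post = e_step(p_w_z, p_z_d)
```

For each word-document pair the returned values sum to 1 over the topics, as a posterior distribution should.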
M-step: maximize the Q function
Because the variables $P\left(w_{i} \mid z_{k}\right)$ and $P\left(z_{k} \mid d_{j}\right)$ form probability distributions, they satisfy the constraints
$$
\begin{aligned}
&\sum_{i=1}^{M} P\left(w_{i} \mid z_{k}\right)=1, \quad k=1,2, \cdots, K \\
&\sum_{k=1}^{K} P\left(z_{k} \mid d_{j}\right)=1, \quad j=1,2, \cdots, N
\end{aligned}
$$
Apply the method of Lagrange multipliers: introduce multipliers $\tau_{k}$ and $\rho_{j}$ and define the Lagrangian $\Lambda$
$$
\Lambda=Q^{\prime}+\sum_{k=1}^{K} \tau_{k}\left(1-\sum_{i=1}^{M} P\left(w_{i} \mid z_{k}\right)\right)+\sum_{j=1}^{N} \rho_{j}\left(1-\sum_{k=1}^{K} P\left(z_{k} \mid d_{j}\right)\right)
$$
Take the partial derivatives of $\Lambda$ with respect to $P\left(w_{i} \mid z_{k}\right)$ and $P\left(z_{k} \mid d_{j}\right)$ and set them to 0. This gives the following system of equations
$$
\begin{aligned}
&\sum_{j=1}^{N} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)-\tau_{k} P\left(w_{i} \mid z_{k}\right)=0, \quad i=1,2, \cdots, M ; \quad k=1,2, \cdots, K\\
&\sum_{i=1}^{M} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)-\rho_{j} P\left(z_{k} \mid d_{j}\right)=0, \quad j=1,2, \cdots, N ; \quad k=1,2, \cdots, K
\end{aligned}
$$
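The intermediate step can be spelled out. As a sketch, treating the posteriors $P(z_k \mid w_i, d_j)$ as fixed constants from the E-step and using $\log[P(w_i \mid z_k)P(z_k \mid d_j)]=\log P(w_i \mid z_k)+\log P(z_k \mid d_j)$, the raw partial derivatives are

$$
\begin{aligned}
\frac{\partial \Lambda}{\partial P\left(w_{i} \mid z_{k}\right)}&=\sum_{j=1}^{N} \frac{n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)}{P\left(w_{i} \mid z_{k}\right)}-\tau_{k}=0\\
\frac{\partial \Lambda}{\partial P\left(z_{k} \mid d_{j}\right)}&=\sum_{i=1}^{M} \frac{n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)}{P\left(z_{k} \mid d_{j}\right)}-\rho_{j}=0
\end{aligned}
$$

Multiplying the first line by $P(w_i \mid z_k)$ and the second by $P(z_k \mid d_j)$ gives the system of equations in the form stated.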
To solve for $\tau_k$ and $\rho_j$, sum the first equation over $i$ and the second over $k$:
$$
\begin{aligned}
&\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)P(z_k \mid w_i,d_j)=\sum_{i=1}^M\tau_k P(w_i \mid z_k)=\tau_k\\
&\sum_{k=1}^K\sum_{i=1}^M n(w_i,d_j)P(z_k \mid w_i,d_j)=\sum_{k=1}^K\rho_j P(z_k \mid d_j)=\rho_j
\end{aligned}
$$
Since $\sum_{k=1}^K P(z_k \mid w_i,d_j)=1$, this gives:
$$
\begin{aligned}
\rho_j&=\sum_{k=1}^K\sum_{i=1}^M n(w_i,d_j)P(z_k \mid w_i,d_j)=\sum_{i=1}^M n(w_i,d_j)=n(d_j)\\
\tau_k&=\sum_{j=1}^N\sum_{i=1}^M n(w_i,d_j)P(z_k \mid w_i,d_j)
\end{aligned}
$$
Substituting $\tau_k$ and $\rho_j$ back into the system of equations gives the parameter estimation formulas:
$$
\begin{aligned}
&P\left(w_{i} \mid z_{k}\right)=\frac{\sum_{j=1}^{N} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)}{\sum_{m=1}^{M} \sum_{j=1}^{N} n\left(w_{m}, d_{j}\right) P\left(z_{k} \mid w_{m}, d_{j}\right)}\\
&P\left(z_{k} \mid d_{j}\right)=\frac{\sum_{i=1}^{M} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)}{n\left(d_{j}\right)}
\end{aligned}
$$
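Putting the E-step posterior and these two update formulas together gives an EM loop. The following is a minimal sketch, not a reference implementation: the function names, the random initialization, the fixed iteration count and the toy count matrix are all assumptions, and it presumes every document contains at least one word so that $n(d_j)>0$.

```python
import math
import random

def log_likelihood(n, p_w_z, p_z_d):
    K = len(p_z_d)
    return sum(
        n[i][j] * math.log(sum(p_w_z[i][k] * p_z_d[k][j] for k in range(K)))
        for i in range(len(n)) for j in range(len(n[0])) if n[i][j] > 0
    )

def plsa_em(n, K, iters=50, seed=0):
    """Run `iters` EM iterations; returns (p_w_z, p_z_d) as nested lists."""
    rng = random.Random(seed)
    M, N = len(n), len(n[0])
    # Assumed initialization: positive random values, normalized per column.
    p_w_z = [[rng.random() + 0.1 for _ in range(K)] for _ in range(M)]
    p_z_d = [[rng.random() + 0.1 for _ in range(N)] for _ in range(K)]
    for k in range(K):
        s = sum(p_w_z[i][k] for i in range(M))
        for i in range(M):
            p_w_z[i][k] /= s
    for j in range(N):
        s = sum(p_z_d[k][j] for k in range(K))
        for k in range(K):
            p_z_d[k][j] /= s

    for _ in range(iters):
        # E-step, fused with accumulating n(w_i,d_j) * P(z_k|w_i,d_j).
        num_wz = [[0.0] * K for _ in range(M)]  # summed over documents j
        num_zd = [[0.0] * N for _ in range(K)]  # summed over words i
        for i in range(M):
            for j in range(N):
                if n[i][j] == 0:
                    continue
                joint = [p_w_z[i][k] * p_z_d[k][j] for k in range(K)]
                norm = sum(joint)
                for k in range(K):
                    c = n[i][j] * joint[k] / norm
                    num_wz[i][k] += c
                    num_zd[k][j] += c
        # M-step: divide by tau_k and by rho_j = n(d_j) respectively.
        for k in range(K):
            tau_k = sum(num_wz[i][k] for i in range(M))
            for i in range(M):
                p_w_z[i][k] = num_wz[i][k] / tau_k
        for j in range(N):
            n_dj = sum(n[i][j] for i in range(M))
            for k in range(K):
                p_z_d[k][j] = num_zd[k][j] / n_dj
    return p_w_z, p_z_d

# Made-up toy counts: 4 words x 4 documents, two loosely separated groups.
n = [[4, 3, 0, 0], [3, 4, 0, 1], [0, 0, 5, 4], [0, 1, 4, 5]]
p_w_z_1, p_z_d_1 = plsa_em(n, K=2, iters=1)
p_w_z, p_z_d = plsa_em(n, K=2, iters=50)
ll_1 = log_likelihood(n, p_w_z_1, p_z_d_1)
ll_50 = log_likelihood(n, p_w_z, p_z_d)
```

Both runs start from the same seed, so `ll_50` should be no smaller than `ll_1`, reflecting that each EM iteration does not decrease the likelihood. In practice one would stop on a convergence criterion rather than a fixed iteration count.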