Statistical Learning Methods, Chapter 18: Probabilistic Latent Semantic Analysis

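This post works through the log-likelihood function of probabilistic latent semantic analysis (PLSA), shows how the Q function is obtained in the E-step, and derives the M-step parameter estimates with the method of Lagrange multipliers. It covers the simplification of the likelihood, the lower bound that yields the Q function, and the parameter update formulas.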

18.2 The Probabilistic Latent Semantic Analysis Algorithm

The log-likelihood function of the generative model is:

$$
\begin{aligned}
L&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log P(w_i,d_j)\\
&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log\left[\sum_{k=1}^K P(w_i\mid z_k)P(z_k\mid d_j)P(d_j)\right]\\
&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\left[\log P(d_j)+\log\left(\sum_{k=1}^K P(w_i\mid z_k)P(z_k\mid d_j)\right)\right]\\
&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log P(d_j)+\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log\left(\sum_{k=1}^K P(w_i\mid z_k)P(z_k\mid d_j)\right)
\end{aligned}
$$
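The first term does not involve the parameters $P(w_i\mid z_k)$ and $P(z_k\mid d_j)$, so it is a constant for the optimization and can be dropped. This gives the likelihood function used in the book:

$$
L=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log\left[\sum_{k=1}^K P(w_i\mid z_k)P(z_k\mid d_j)\right]
$$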
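To make the simplified likelihood concrete, here is a minimal NumPy sketch that evaluates it. The array layout is an assumption of this example, not something fixed by the book: `n` is an M×N word-document count matrix, `p_w_z` is M×K with entries $P(w_i\mid z_k)$, and `p_z_d` is K×N with entries $P(z_k\mid d_j)$. The toy numbers are made up.

```python
import numpy as np

def log_likelihood(n, p_w_z, p_z_d):
    """L = sum_ij n(w_i,d_j) * log sum_k P(w_i|z_k) P(z_k|d_j)."""
    # mixture[i, j] = sum_k P(w_i|z_k) P(z_k|d_j)
    mixture = p_w_z @ p_z_d
    # Cells with n = 0 contribute nothing; skipping them avoids log(0) there.
    mask = n > 0
    return float(np.sum(n[mask] * np.log(mixture[mask])))

# Toy data (hypothetical): M=3 words, N=2 documents, K=2 topics.
n = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 3.0]])
p_w_z = np.array([[0.5, 0.1],
                  [0.3, 0.3],
                  [0.2, 0.6]])  # each column is a distribution over words
p_z_d = np.array([[0.7, 0.2],
                  [0.3, 0.8]])  # each column is a distribution over topics

print(log_likelihood(n, p_w_z, p_z_d))
```

The matrix product handles the sum over $k$ for every $(i, j)$ pair at once, which is why no explicit loop over topics appears.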
E-step: computing the Q function

Multiply and divide each term inside the logarithm by the posterior $P(z_k\mid w_i,d_j)$:

$$
\begin{aligned}
L&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log\left[\sum_{k=1}^K P(w_i\mid z_k)P(z_k\mid d_j)\right]\\
&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log\left[\sum_{k=1}^K P(z_k\mid w_i,d_j)\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{P(z_k\mid w_i,d_j)}\right]
\end{aligned}
$$
Jensen's inequality states that

$$
\log \sum_{j} \lambda_{j} y_{j} \geq \sum_{j} \lambda_{j} \log y_{j}, \quad \lambda_{j} \geq 0, \quad \sum_{j} \lambda_{j}=1
$$

Applying it to the expression above, with the posterior $P(z_k\mid w_i,d_j)$ as the weights $\lambda_k$, gives

$$
\begin{aligned}
L&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\log\left[\sum_{k=1}^K P(z_k\mid w_i,d_j)\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{P(z_k\mid w_i,d_j)}\right]\\
&\geqslant\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\sum_{k=1}^K P(z_k\mid w_i,d_j)\log\left[\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{P(z_k\mid w_i,d_j)}\right]
\end{aligned}
$$
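The inequality can be checked numerically. The sketch below is only an illustration under assumed conventions (`n` is M×N, `p_w_z` is M×K, `p_z_d` is K×N, and `q[i, j, k]` is any strictly positive distribution over topics for each word-document pair); the function names and toy numbers are hypothetical.

```python
import numpy as np

def log_likelihood(n, p_w_z, p_z_d):
    """L = sum_ij n_ij * log sum_k P(w_i|z_k) P(z_k|d_j)."""
    mixture = p_w_z @ p_z_d
    mask = n > 0
    return float(np.sum(n[mask] * np.log(mixture[mask])))

def jensen_lower_bound(n, p_w_z, p_z_d, q):
    """sum_ij n_ij * sum_k q_ijk * log(P(w_i|z_k) P(z_k|d_j) / q_ijk)."""
    # joint[i, j, k] = P(w_i|z_k) * P(z_k|d_j), built by broadcasting
    joint = p_w_z[:, None, :] * p_z_d.T[None, :, :]
    return float(np.sum(n[:, :, None] * q * np.log(joint / q)))

# Toy data (hypothetical): M=3 words, N=2 documents, K=2 topics.
n = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]])
p_w_z = np.array([[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]])
p_z_d = np.array([[0.7, 0.2], [0.3, 0.8]])

# An arbitrary positive weighting over topics, normalised along k.
rng = np.random.default_rng(0)
q = rng.random((3, 2, 2)) + 0.1
q /= q.sum(axis=2, keepdims=True)

L = log_likelihood(n, p_w_z, p_z_d)
bound = jensen_lower_bound(n, p_w_z, p_z_d, q)
print(bound <= L)  # Jensen's inequality guarantees the bound never exceeds L
```

Any valid weighting yields a lower bound; choosing the posterior $P(z_k\mid w_i,d_j)$ as the weights makes the bound equal to $L$ at the current parameters, which is why the E-step uses it.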

The right-hand side is a lower bound on $L$, which can be split into two terms:

$$
\begin{aligned}
L&\geqslant\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\sum_{k=1}^K P(z_k\mid w_i,d_j)\log\left[\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{P(z_k\mid w_i,d_j)}\right]\\
&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\sum_{k=1}^K P(z_k\mid w_i,d_j)\Big[\log\big[P(w_i\mid z_k)P(z_k\mid d_j)\big]-\log P(z_k\mid w_i,d_j)\Big]\\
&=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\sum_{k=1}^K P(z_k\mid w_i,d_j)\log\big[P(w_i\mid z_k)P(z_k\mid d_j)\big]-\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\sum_{k=1}^K P(z_k\mid w_i,d_j)\log P(z_k\mid w_i,d_j)
\end{aligned}
$$
When maximizing the Q function we differentiate with respect to $P(w_i\mid z_k)$ and $P(z_k\mid d_j)$. The second term contains only the posterior $P(z_k\mid w_i,d_j)$, which is held fixed at its value from the current parameters, so its partial derivatives are zero. It can therefore be dropped here (it could also be kept, since it vanishes on differentiation anyway). Hence

$$
Q^{\prime}=\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)\sum_{k=1}^K P(z_k\mid w_i,d_j)\log\big[P(w_i\mid z_k)P(z_k\mid d_j)\big]
$$

which is the $Q^{\prime}$ function in the book. The posterior is given by Bayes' rule:

$$
P(z_k\mid w_i,d_j)=\frac{P(w_i\mid z_k)P(z_k\mid d_j)}{\sum_{l=1}^{K}P(w_i\mid z_l)P(z_l\mid d_j)}
$$
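The posterior formula maps directly onto array operations. The following is a minimal sketch, assuming `p_w_z` is an M×K array of $P(w_i\mid z_k)$ and `p_z_d` is a K×N array of $P(z_k\mid d_j)$ with strictly positive entries; the name `e_step` and the toy numbers are choices made for this example.

```python
import numpy as np

def e_step(p_w_z, p_z_d):
    """Return post with post[i, j, k] = P(z_k | w_i, d_j)."""
    # joint[i, j, k] = P(w_i|z_k) * P(z_k|d_j)
    joint = p_w_z[:, None, :] * p_z_d.T[None, :, :]
    # Normalise over the topic axis so each (i, j) slice sums to 1.
    return joint / joint.sum(axis=2, keepdims=True)

# Toy parameters (hypothetical): M=3 words, N=2 documents, K=2 topics.
p_w_z = np.array([[0.5, 0.1],
                  [0.3, 0.3],
                  [0.2, 0.6]])
p_z_d = np.array([[0.7, 0.2],
                  [0.3, 0.8]])

post = e_step(p_w_z, p_z_d)
print(post.shape)        # one topic distribution per (word, document) pair
print(post.sum(axis=2))  # every entry should be 1 up to rounding
```

Storing the full M×N×K posterior is fine for small examples; for a large vocabulary one would likely restrict the computation to pairs with nonzero counts.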
M-step: maximizing the Q function

Because the variables $P(w_i\mid z_k)$ and $P(z_k\mid d_j)$ form probability distributions, they satisfy the constraints

$$
\begin{aligned}
&\sum_{i=1}^{M} P\left(w_{i} \mid z_{k}\right)=1, \quad k=1,2, \cdots, K \\
&\sum_{k=1}^{K} P\left(z_{k} \mid d_{j}\right)=1, \quad j=1,2, \cdots, N
\end{aligned}
$$
Applying the method of Lagrange multipliers, introduce multipliers $\tau_{k}$ and $\rho_{j}$ and define the Lagrangian $\Lambda$:

$$
\Lambda=Q^{\prime}+\sum_{k=1}^{K} \tau_{k}\left(1-\sum_{i=1}^{M} P\left(w_{i} \mid z_{k}\right)\right)+\sum_{j=1}^{N} \rho_{j}\left(1-\sum_{k=1}^{K} P\left(z_{k} \mid d_{j}\right)\right)
$$

Taking the partial derivatives of $\Lambda$ with respect to $P(w_i\mid z_k)$ and $P(z_k\mid d_j)$ and setting them to zero gives the system of equations

$$
\begin{aligned}
&\sum_{j=1}^{N} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)-\tau_{k} P\left(w_{i} \mid z_{k}\right)=0, \quad i=1,2, \cdots, M ; \quad k=1,2, \cdots, K\\
&\sum_{i=1}^{M} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)-\rho_{j} P\left(z_{k} \mid d_{j}\right)=0, \quad j=1,2, \cdots, N ; \quad k=1,2, \cdots, K
\end{aligned}
$$
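For readers who want the intermediate step, the differentiation behind this system can be sketched as follows. It treats the posterior $P(z_k\mid w_i,d_j)$ as a fixed constant from the E-step and uses $\log[P(w_i\mid z_k)P(z_k\mid d_j)]=\log P(w_i\mid z_k)+\log P(z_k\mid d_j)$:

$$
\begin{aligned}
\frac{\partial \Lambda}{\partial P(w_i\mid z_k)}&=\frac{\sum_{j=1}^{N} n(w_i,d_j)P(z_k\mid w_i,d_j)}{P(w_i\mid z_k)}-\tau_k=0\\
\frac{\partial \Lambda}{\partial P(z_k\mid d_j)}&=\frac{\sum_{i=1}^{M} n(w_i,d_j)P(z_k\mid w_i,d_j)}{P(z_k\mid d_j)}-\rho_j=0
\end{aligned}
$$

Multiplying the first line by $P(w_i\mid z_k)$ and the second by $P(z_k\mid d_j)$ yields the two equations of the system.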
To solve for $\tau_k$ and $\rho_j$, sum the first equation over $i$ and the second over $k$, and use the normalization constraints:

$$
\begin{aligned}
&\sum_{i=1}^M\sum_{j=1}^N n(w_i,d_j)P(z_k\mid w_i,d_j)=\sum_{i=1}^M\tau_kP(w_i\mid z_k)=\tau_k\\
&\sum_{k=1}^K\sum_{i=1}^M n(w_i,d_j)P(z_k\mid w_i,d_j)=\sum_{k=1}^K\rho_jP(z_k\mid d_j)=\rho_j
\end{aligned}
$$
Since $\sum_{k=1}^K P(z_k\mid w_i,d_j)=1$, this gives:

$$
\begin{aligned}
\rho_j&=\sum_{k=1}^K\sum_{i=1}^M n(w_i,d_j)P(z_k\mid w_i,d_j)=\sum_{i=1}^M n(w_i,d_j)=n(d_j)\\
\tau_k&=\sum_{j=1}^N\sum_{i=1}^M n(w_i,d_j)P(z_k\mid w_i,d_j)
\end{aligned}
$$

where $n(d_j)$ is the total word count of document $d_j$.

Substituting $\tau_k$ and $\rho_j$ back into the system gives the parameter estimates:

$$
\begin{aligned}
&P\left(w_{i} \mid z_{k}\right)=\frac{\sum_{j=1}^{N} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)}{\sum_{m=1}^{M} \sum_{j=1}^{N} n\left(w_{m}, d_{j}\right) P\left(z_{k} \mid w_{m}, d_{j}\right)}\\
&P\left(z_{k} \mid d_{j}\right)=\frac{\sum_{i=1}^{M} n\left(w_{i}, d_{j}\right) P\left(z_{k} \mid w_{i}, d_{j}\right)}{n\left(d_{j}\right)}
\end{aligned}
$$
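Putting the E-step posterior and the two update formulas together gives a complete EM iteration. The sketch below is one possible NumPy rendering, not the book's reference implementation. It assumes `n` is an M×N count matrix in which every document has at least one word, and it uses a random positive initialization; the function names, the iteration count, and the toy counts are all choices made for this example.

```python
import numpy as np

def log_likelihood(n, p_w_z, p_z_d):
    """L = sum_ij n_ij * log sum_k P(w_i|z_k) P(z_k|d_j)."""
    mixture = p_w_z @ p_z_d
    mask = n > 0
    return float(np.sum(n[mask] * np.log(mixture[mask])))

def e_step(p_w_z, p_z_d):
    """post[i, j, k] = P(z_k | w_i, d_j)."""
    joint = p_w_z[:, None, :] * p_z_d.T[None, :, :]
    return joint / joint.sum(axis=2, keepdims=True)

def m_step(n, post):
    """Apply the two update formulas; returns (P(w|z), P(z|d))."""
    # weighted[i, j, k] = n(w_i, d_j) * P(z_k | w_i, d_j)
    weighted = n[:, :, None] * post
    # P(w_i|z_k): sum over documents, then divide by tau_k (sum over words too).
    num_w = weighted.sum(axis=1)                       # shape (M, K)
    p_w_z = num_w / num_w.sum(axis=0, keepdims=True)
    # P(z_k|d_j): sum over words, then divide by rho_j = n(d_j).
    p_z_d = weighted.sum(axis=0).T / n.sum(axis=0)[None, :]  # shape (K, N)
    return p_w_z, p_z_d

def plsa(n, K, n_iter=30, seed=0):
    """Run EM for a fixed number of iterations from a random start."""
    rng = np.random.default_rng(seed)
    M, N = n.shape
    p_w_z = rng.random((M, K)) + 0.1
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((K, N)) + 0.1
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)
    history = []
    for _ in range(n_iter):
        post = e_step(p_w_z, p_z_d)
        p_w_z, p_z_d = m_step(n, post)
        history.append(log_likelihood(n, p_w_z, p_z_d))
    return p_w_z, p_z_d, history

# Toy counts (hypothetical): M=4 words, N=3 documents.
n = np.array([[4.0, 1.0, 2.0],
              [2.0, 2.0, 1.0],
              [1.0, 5.0, 1.0],
              [1.0, 1.0, 6.0]])
p_w_z, p_z_d, history = plsa(n, K=2)
print(history[0], history[-1])
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a convenient check that the updates were implemented correctly. A fixed iteration count keeps the sketch short; a practical version would more likely stop when the improvement falls below a tolerance.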
