Notes on the pLSA Graphical Model

The pLSA graphical model

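The quantity to work with is p(W,D), the probability of the whole document collection. The goal of pLSA is to choose the model parameters that maximize p(W,D).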

Let us now work out p(W,D):

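p(W,D) = \prod_{m=1}^{M} \prod_{n=1}^{N_{m}} p(w_{n},d_{m}) \\
= \prod_{i} \prod_{j} p(w_{j},d_{i})^{n(w_{j},d_{i})}

Here M is the number of documents, N_{m} is the number of word tokens in the m-th document, and n(w_{j},d_{i}) is the number of times word w_{j} occurs in document d_{i}.

Note: p(w_{n},d_{m}) is the probability of the n-th word token of the m-th document, whereas p(w_{j},d_{i}) is the probability of the j-th vocabulary word in the i-th document. The second product groups identical tokens together, which is where the exponent n(w_{j},d_{i}) comes from; since d_{i} …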
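To make the counts n(w_{j},d_{i}) concrete, here is a minimal Python sketch that builds the count matrix from tokenized documents. The toy documents and the helper name `count_matrix` are invented for this illustration and are not part of the original notes.

```python
from collections import Counter


def count_matrix(docs):
    """Return (vocab, n), where n[i][j] is how often vocab[j] occurs in docs[i]."""
    # Sorted vocabulary so that the column order is deterministic.
    vocab = sorted({word for doc in docs for word in doc})
    counts = [Counter(doc) for doc in docs]
    # Counter returns 0 for words that do not occur in a document.
    n = [[c[word] for word in vocab] for c in counts]
    return vocab, n


# Hypothetical toy corpus: two already-tokenized documents.
docs = [["cat", "dog", "cat"], ["dog", "fish"]]
vocab, n = count_matrix(docs)
print(vocab)
print(n)
```

Row i of `n` corresponds to document d_{i} and column j to vocabulary word w_{j}, so the row sums are the document lengths N_{m}.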

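log(p(W,D)) = \sum_{i} \sum_{j} n(w_{j},d_{i}) log(p(w_{j},d_{i})) \\
= \sum_{i} \sum_{j} n(w_{j},d_{i}) log(p(w_{j}|d_{i}) p(d_{i})) \\
= \sum_{i} \sum_{j} n(w_{j},d_{i}) log(p(w_{j}|d_{i})) + \sum_{i} \sum_{j} n(w_{j},d_{i}) log(p(d_{i})) \\
= \sum_{i} \sum_{j} n(w_{j},d_{i}) log(\sum_{k=1}^{K} p(w_{j}|z_{k})p(z_{k}|d_{i})) + \sum_{i} \sum_{j} n(w_{j},d_{i}) log(p(d_{i}))

The last term does not depend on p(w_{j}|z_{k}) or p(z_{k}|d_{i}), so it can be ignored when maximizing over those parameters. The logarithm of the sum over k cannot be moved inside the sum, so the remaining term has no closed-form maximizer.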
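The part of the log-likelihood that depends on p(w_{j}|z_{k}) and p(z_{k}|d_{i}) can be evaluated directly, as in the following sketch. The array layout (`p_w_z[k][j]`, `p_z_d[i][k]`) and the toy numbers are assumptions made for this example only.

```python
import math


def log_likelihood(n, p_w_z, p_z_d):
    """Sum over i, j of n[i][j] * log(sum_k p(w_j|z_k) * p(z_k|d_i)).

    n[i][j]     : count of word j in document i
    p_w_z[k][j] : p(w_j | z_k)
    p_z_d[i][k] : p(z_k | d_i)
    """
    num_topics = len(p_w_z)
    total = 0.0
    for i, row in enumerate(n):
        for j, count in enumerate(row):
            if count == 0:
                continue  # words that do not occur contribute nothing
            p_w_d = sum(p_w_z[k][j] * p_z_d[i][k] for k in range(num_topics))
            total += count * math.log(p_w_d)
    return total


# Hypothetical toy setup: 2 documents, 3 words, 2 topics.
n = [[2, 1, 0], [0, 1, 1]]
p_w_z = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
p_z_d = [[1.0, 0.0], [0.0, 1.0]]
ll = log_likelihood(n, p_w_z, p_z_d)
print(ll)
```

In this toy setup every occurring word has p(w_{j}|d_{i}) = 0.5 and there are five tokens in total, so the value is 5 log(0.5).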

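Use the EM algorithm: for each pair (w_{j},d_{i}), introduce a distribution Q(z_{k}) over the K topics, with Q(z_{k}) \geqslant 0 and \sum_{k=1}^{K} Q(z_{k}) = 1. Leaving out the term that does not depend on p(w_{j}|z_{k}) or p(z_{k}|d_{i}), Jensen's inequality gives a lower bound on the objective:

\sum_{i} \sum_{j} n(w_{j},d_{i}) log(\sum_{k=1}^{K} p(w_{j}|z_{k})p(z_{k}|d_{i})) \\
= \sum_{i} \sum_{j} n(w_{j},d_{i}) log(\sum_{k=1}^{K} Q(z_{k}) \frac{p(w_{j}|z_{k})p(z_{k}|d_{i})}{Q(z_{k})}) \\
\geqslant \sum_{i} \sum_{j} n(w_{j},d_{i}) \sum_{k=1}^{K} Q(z_{k}) log(\frac{p(w_{j}|z_{k})p(z_{k}|d_{i})}{Q(z_{k})}) \\
= \sum_{i} \sum_{j} n(w_{j},d_{i}) E_{z}(log(p(w_{j}|z_{k})p(z_{k}|d_{i}))) - \sum_{i} \sum_{j} n(w_{j},d_{i}) \sum_{k=1}^{K} Q(z_{k}) log(Q(z_{k}))

Here E_{z} denotes the expectation over z_{k} under Q(z_{k}). When Q(z_{k}) is held fixed the last term is a constant, so only the E_{z} term matters for updating p(w_{j}|z_{k}) and p(z_{k}|d_{i}).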
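The Jensen step can be checked numerically for a single pair (w_{j},d_{i}). In the sketch below, `a[k]` stands for p(w_{j}|z_{k})p(z_{k}|d_{i}); the numbers are hypothetical, and the bound is the full one, including the Q(z_{k}) log(Q(z_{k})) term.

```python
import math


def lower_bound(a, q):
    """Jensen lower bound sum_k q_k * log(a_k / q_k) on log(sum_k a_k)."""
    return sum(qk * math.log(ak / qk) for ak, qk in zip(a, q) if qk > 0)


# Hypothetical values of p(w_j|z_k) * p(z_k|d_i) for K = 3 topics.
a = [0.06, 0.02, 0.12]
exact = math.log(sum(a))

uniform = [1.0 / 3] * 3                   # an arbitrary choice of Q
posterior = [ak / sum(a) for ak in a]     # Q proportional to a_k

print(exact, lower_bound(a, uniform), lower_bound(a, posterior))
```

An arbitrary Q gives a value below log(\sum_{k} a_{k}), while choosing Q proportional to a_{k} makes the bound tight, which motivates the choice of Q(z_{k}) in the E step.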

The bound holds with equality when Q(z_{k}) is the posterior probability of topic z_{k} given the pair (w_{j},d_{i}):

Q(z_{k}) = p(z_{k}|w_{j},d_{i}) \\
= \frac{p(z_{k},w_{j},d_{i})}{\sum_{k'=1}^{K} p(z_{k'},w_{j},d_{i})} \\
= \frac{p(w_{j}|z_{k}) p(z_{k}|d_{i}) p(d_{i})}{\sum_{k'=1}^{K} p(w_{j}|z_{k'}) p(z_{k'}|d_{i}) p(d_{i})} \\
= \frac{p(w_{j}|z_{k}) p(z_{k}|d_{i})}{\sum_{k'=1}^{K} p(w_{j}|z_{k'}) p(z_{k'}|d_{i})}

At this point we can solve the problem iteratively with the EM algorithm.

E: Q(z_{k}) = \frac{p(w_{j}|z_{k}) p(z_{k}|d_{i})}{\sum_{k'=1}^{K} p(w_{j}|z_{k'}) p(z_{k'}|d_{i})}
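The E step is a per-pair normalization and can be sketched as follows. The function name `e_step`, the array layout and the toy parameters are assumptions made for this example.

```python
def e_step(p_w_z, p_z_d):
    """Return q with q[i][j][k] = p(z_k | w_j, d_i).

    p_w_z[k][j] : p(w_j | z_k)
    p_z_d[i][k] : p(z_k | d_i)
    """
    num_topics = len(p_w_z)
    num_words = len(p_w_z[0])
    q = []
    for theta in p_z_d:  # theta[k] = p(z_k | d_i) for one document
        doc_q = []
        for j in range(num_words):
            joint = [p_w_z[k][j] * theta[k] for k in range(num_topics)]
            s = sum(joint)
            # Fall back to a uniform distribution if every term is zero.
            doc_q.append([x / s if s > 0 else 1.0 / num_topics for x in joint])
        q.append(doc_q)
    return q


# Hypothetical parameters: 1 document, 2 words, 2 topics.
p_w_z = [[0.6, 0.4], [0.2, 0.8]]
p_z_d = [[0.5, 0.5]]
q = e_step(p_w_z, p_z_d)
print(q)
```

For the first word the unnormalized terms are 0.3 and 0.1, so its posterior over the two topics is (0.75, 0.25).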

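M: with Q(z_{k}) fixed, \max_{p(w_{j}|z_{k}), p(z_{k}|d_{i})} \sum_{i} \sum_{j} n(w_{j},d_{i}) E_{z}(log(p(w_{j}|z_{k})p(z_{k}|d_{i})))

Here p(w_{j}|z_{k}) and p(z_{k}|d_{i}) are found by setting the partial derivatives of this objective to zero, subject to the normalization constraints \sum_{j} p(w_{j}|z_{k}) = 1 and \sum_{k=1}^{K} p(z_{k}|d_{i}) = 1 (for example with Lagrange multipliers).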
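Carrying out that constrained maximization gives the standard closed-form pLSA updates, stated here as a known result rather than something derived in these notes. Writing Q_{ij}(z_{k}) for the Q(z_{k}) that belongs to the pair (w_{j},d_{i}):

p(w_{j}|z_{k}) = \frac{\sum_{i} n(w_{j},d_{i}) Q_{ij}(z_{k})}{\sum_{j'} \sum_{i} n(w_{j'},d_{i}) Q_{ij'}(z_{k})}

p(z_{k}|d_{i}) = \frac{\sum_{j} n(w_{j},d_{i}) Q_{ij}(z_{k})}{\sum_{j} n(w_{j},d_{i})}

A minimal pure-Python sketch of the full EM loop under these updates follows. The function name `plsa_em`, the random initialization, the fixed iteration count and the toy count matrix are all assumptions made for illustration.

```python
import math
import random


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they are all zero)."""
    s = sum(values)
    if s == 0:
        return [1.0 / len(values)] * len(values)
    return [v / s for v in values]


def plsa_em(n, num_topics, iters=50, seed=0):
    """Fit p(w|z) and p(z|d) to the count matrix n[i][j] with EM."""
    rng = random.Random(seed)
    num_docs, num_words = len(n), len(n[0])
    # Random positive initialization (an assumption; any valid start works).
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(num_words)])
             for _ in range(num_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(num_topics)])
             for _ in range(num_docs)]
    history = []  # objective value before each update
    for _ in range(iters):
        num_w = [[0.0] * num_words for _ in range(num_topics)]
        num_z = [[0.0] * num_topics for _ in range(num_docs)]
        ll = 0.0
        for i in range(num_docs):
            for j in range(num_words):
                if n[i][j] == 0:
                    continue
                # E step for the pair (w_j, d_i): Q is joint[k] / s.
                joint = [p_w_z[k][j] * p_z_d[i][k] for k in range(num_topics)]
                s = sum(joint)
                ll += n[i][j] * math.log(s)
                for k in range(num_topics):
                    r = n[i][j] * joint[k] / s  # n(w_j, d_i) * Q_ij(z_k)
                    num_w[k][j] += r
                    num_z[i][k] += r
        history.append(ll)
        # M step: normalize the accumulated expected counts.
        p_w_z = [normalize(row) for row in num_w]
        p_z_d = [normalize(row) for row in num_z]
    return p_w_z, p_z_d, history


# Hypothetical counts: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
n = [[4, 3, 0, 0], [3, 5, 0, 0], [0, 0, 4, 2], [0, 0, 3, 6]]
p_w_z, p_z_d, history = plsa_em(n, num_topics=2)
print(history[0], history[-1])
```

The recorded objective should not decrease from one iteration to the next, which is a convenient sanity check when modifying the updates.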
