1. LDA Derivation
(2) Likelihood
The probability that a word $w_{m,n}$ is instantiated as a particular term $t$ is
$$
p\left(w_{m,n}=t \mid \vec{\vartheta}_{m}, \underline{\Phi}\right)=\sum_{k=1}^{K} p\left(w_{m,n}=t \mid \vec{\varphi}_{k}\right) p\left(z_{m,n}=k \mid \vec{\vartheta}_{m}\right)
$$
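That is, the probability of topic $k$ in document $m$ times the probability of term $t$ under topic $k$, summed over all topics. The likelihood of the whole document collection is: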
$$
p(\mathcal{W} \mid \underline{\Theta}, \underline{\Phi})=\prod_{m=1}^{M} p\left(\vec{w}_{m} \mid \vec{\vartheta}_{m}, \underline{\Phi}\right)=\prod_{m=1}^{M} \prod_{n=1}^{N_{m}} p\left(w_{m,n} \mid \vec{\vartheta}_{m}, \underline{\Phi}\right)
$$
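As a rough illustration of the two likelihood formulas above, the sketch below evaluates the per-word mixture and the corpus log-likelihood on a tiny toy model. The function names `word_prob` and `corpus_log_likelihood` and all numbers are made up for this example; documents are assumed to be lists of integer term ids.

```python
import math

def word_prob(theta_m, phi, t):
    """p(w = t | theta_m, Phi) = sum_k phi[k][t] * theta_m[k]."""
    return sum(phi[k][t] * theta_m[k] for k in range(len(theta_m)))

def corpus_log_likelihood(docs, theta, phi):
    """log p(W | Theta, Phi): sum of log word probabilities over all m, n."""
    return sum(
        math.log(word_prob(theta[m], phi, t))
        for m, doc in enumerate(docs)
        for t in doc
    )

# Toy setup (hypothetical numbers): K = 2 topics, V = 3 terms, M = 2 documents.
phi = [[0.7, 0.2, 0.1],    # term distribution of topic 0
       [0.1, 0.3, 0.6]]    # term distribution of topic 1
theta = [[0.9, 0.1],       # topic distribution of document 0
         [0.2, 0.8]]       # topic distribution of document 1
docs = [[0, 0, 1], [2, 1, 2]]

p = word_prob(theta[0], phi, 0)   # 0.7 * 0.9 + 0.1 * 0.1
ll = corpus_log_likelihood(docs, theta, phi)
```

The log is taken because the product over all words underflows quickly on real corpora.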
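(3) Gibbs Sampling
Each step picks one dimension of the random vector and samples its value conditioned on the current values of all other dimensions; iterating this yields the parameters to be estimated.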
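Initially, every word in the corpus is assigned a random topic, giving $Z^{(0)}$. Then count how often each term $t$ occurs under each topic $z$ and how often each topic $z$ occurs in each document $m$. Each round computes $p(z_i \mid z_{\neg i}, d, w)$: the topic assignment of the current word is excluded, and the probability of assigning the word to each topic is estimated from the topic assignments of all other words.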
Once this distribution over all topics is available for the current word, a new topic is sampled for the word from it.
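The topic of the next word is updated in the same way, until the topic distribution $\vec{\vartheta}_m$ of every document and the term distribution $\vec{\varphi}_k$ of every topic converge. The algorithm then stops and outputs the estimated parameters $\underline{\Theta}$ and $\underline{\Phi}$; the topic assignment $z_{m,n}$ of every word is obtained along the way.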
In practice, a maximum number of iterations is set. The formula used to compute $p(z_i \mid z_{\neg i}, d, w)$ at each step is called the Gibbs Updating Rule.
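The initialization step described above (random topic assignment followed by counting) could be sketched as follows. This is only an illustration: `init_counts`, `n_kt` (topic-term counts) and `n_mk` (document-topic counts) are names chosen here, and documents are assumed to be lists of integer term ids.

```python
import random

def init_counts(docs, K, V, seed=0):
    """Assign a random topic to every word and build the two count tables."""
    rng = random.Random(seed)
    # z[m][n] is the topic assigned to the n-th word of document m.
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    n_kt = [[0] * V for _ in range(K)]   # times term t occurs under topic k
    n_mk = [[0] * K for _ in docs]       # times topic k occurs in document m
    for m, doc in enumerate(docs):
        for n, t in enumerate(doc):
            k = z[m][n]
            n_kt[k][t] += 1
            n_mk[m][k] += 1
    return z, n_kt, n_mk

# Toy corpus (hypothetical): 2 documents over a vocabulary of V = 3 terms.
docs = [[0, 0, 1, 2], [2, 1, 2]]
z, n_kt, n_mk = init_counts(docs, K=2, V=3)
```

Keeping the counts in tables means each later sampling step only has to decrement and increment two entries instead of rescanning the corpus.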
(4) Joint Distribution
$$
p(\vec{w}, \vec{z} \mid \vec{\alpha}, \vec{\beta})=p(\vec{w} \mid \vec{z}, \vec{\beta})\, p(\vec{z} \mid \vec{\alpha})
$$
$$
\begin{aligned}
p(\vec{w} \mid \vec{z}, \vec{\beta}) &=\int p(\vec{w} \mid \vec{z}, \underline{\Phi})\, p(\underline{\Phi} \mid \vec{\beta})\, \mathrm{d} \underline{\Phi} \\
&=\int \prod_{z=1}^{K} \frac{1}{\Delta(\vec{\beta})} \prod_{t=1}^{V} \varphi_{z,t}^{n_{z}^{(t)}+\beta_{t}-1}\, \mathrm{d} \vec{\varphi}_{z} \\
&=\prod_{z=1}^{K} \frac{\Delta\left(\vec{n}_{z}+\vec{\beta}\right)}{\Delta(\vec{\beta})}, \quad \vec{n}_{z}=\left\{n_{z}^{(t)}\right\}_{t=1}^{V}
\end{aligned}
$$
$$
\begin{aligned}
p(\vec{z} \mid \vec{\alpha}) &=\int p(\vec{z} \mid \underline{\Theta})\, p(\underline{\Theta} \mid \vec{\alpha})\, \mathrm{d} \underline{\Theta} \\
&=\int \prod_{m=1}^{M} \frac{1}{\Delta(\vec{\alpha})} \prod_{k=1}^{K} \vartheta_{m,k}^{n_{m}^{(k)}+\alpha_{k}-1}\, \mathrm{d} \vec{\vartheta}_{m} \\
&=\prod_{m=1}^{M} \frac{\Delta\left(\vec{n}_{m}+\vec{\alpha}\right)}{\Delta(\vec{\alpha})}, \quad \vec{n}_{m}=\left\{n_{m}^{(k)}\right\}_{k=1}^{K}
\end{aligned}
$$
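The closed forms above depend only on the count tables, so the joint probability can be evaluated numerically. The sketch below does this in log space, taking $\Delta(\vec{x})=\prod_i \Gamma(x_i) / \Gamma(\sum_i x_i)$ as the Dirichlet normalizer. `log_delta` and `log_joint` are names chosen for this example, and the toy counts are made up.

```python
import math

def log_delta(vec):
    """log Delta(vec) = sum_i lgamma(vec_i) - lgamma(sum_i vec_i)."""
    return sum(math.lgamma(x) for x in vec) - math.lgamma(sum(vec))

def log_joint(n_kt, n_mk, alpha, beta):
    """log p(w, z | alpha, beta) computed from the two count tables."""
    # log p(w | z, beta): one Delta ratio per topic.
    log_w = sum(
        log_delta([c + b for c, b in zip(row, beta)]) - log_delta(beta)
        for row in n_kt
    )
    # log p(z | alpha): one Delta ratio per document.
    log_z = sum(
        log_delta([c + a for c, a in zip(row, alpha)]) - log_delta(alpha)
        for row in n_mk
    )
    return log_w + log_z

# Toy case: one document with a single word (term 0) assigned to topic 0,
# K = 2 topics, V = 2 terms, and uniform priors alpha = beta = (1, 1).
n_kt = [[1, 0], [0, 0]]
n_mk = [[1, 0]]
joint = math.exp(log_joint(n_kt, n_mk, alpha=[1.0, 1.0], beta=[1.0, 1.0]))
```

With uniform priors the single word gets topic 0 with probability 1/2 and term 0 with probability 1/2, so `joint` should come out as 1/4, which is a quick sanity check on the formulas.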
(4) Gibbs Updating Rule
For the word at position $i$, with term $t = w_i$ in document $m$, and with $\neg i$ denoting counts that exclude position $i$:

$$
\begin{aligned}
p\left(z_{i}=k \mid \vec{z}_{\neg i}, \vec{w}\right) &=\frac{p(\vec{w}, \vec{z})}{p\left(\vec{w}, \vec{z}_{\neg i}\right)}=\frac{p(\vec{w} \mid \vec{z})}{p\left(\vec{w}_{\neg i} \mid \vec{z}_{\neg i}\right) p\left(w_{i}\right)} \cdot \frac{p(\vec{z})}{p\left(\vec{z}_{\neg i}\right)} \\
&\propto \frac{\Delta\left(\vec{n}_{z}+\vec{\beta}\right)}{\Delta\left(\vec{n}_{z, \neg i}+\vec{\beta}\right)} \cdot \frac{\Delta\left(\vec{n}_{m}+\vec{\alpha}\right)}{\Delta\left(\vec{n}_{m, \neg i}+\vec{\alpha}\right)} \\
&=\frac{\Gamma\left(n_{k}^{(t)}+\beta_{t}\right) \Gamma\left(\sum_{t=1}^{V}\left(n_{k, \neg i}^{(t)}+\beta_{t}\right)\right)}{\Gamma\left(n_{k, \neg i}^{(t)}+\beta_{t}\right) \Gamma\left(\sum_{t=1}^{V}\left(n_{k}^{(t)}+\beta_{t}\right)\right)} \cdot \frac{\Gamma\left(n_{m}^{(k)}+\alpha_{k}\right) \Gamma\left(\sum_{k=1}^{K}\left(n_{m, \neg i}^{(k)}+\alpha_{k}\right)\right)}{\Gamma\left(n_{m, \neg i}^{(k)}+\alpha_{k}\right) \Gamma\left(\sum_{k=1}^{K}\left(n_{m}^{(k)}+\alpha_{k}\right)\right)} \\
&=\frac{n_{k, \neg i}^{(t)}+\beta_{t}}{\sum_{t=1}^{V}\left(n_{k, \neg i}^{(t)}+\beta_{t}\right)} \cdot \frac{n_{m, \neg i}^{(k)}+\alpha_{k}}{\left[\sum_{k=1}^{K}\left(n_{m}^{(k)}+\alpha_{k}\right)\right]-1} \\
&\propto \frac{n_{k, \neg i}^{(t)}+\beta_{t}}{\sum_{t=1}^{V}\left(n_{k, \neg i}^{(t)}+\beta_{t}\right)}\left(n_{m, \neg i}^{(k)}+\alpha_{k}\right)
\end{aligned}
$$

The last step drops the second denominator because it does not depend on $k$.
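The final line of the updating rule translates almost directly into code. The sketch below is a minimal illustration on hand-written toy counts, not a complete sampler: `topic_conditional` and `resample_token` are names chosen here, `n_kt` holds topic-term counts, `n_mk` holds document-topic counts, and the hyperparameter values are arbitrary.

```python
import random

def topic_conditional(n_kt, n_mk_row, t, alpha, beta):
    """Normalized p(z_i = k | z_{-i}, w); counts must already exclude token i."""
    weights = []
    for k in range(len(alpha)):
        # (n_{k,-i}^{(t)} + beta_t) / sum_t (n_{k,-i}^{(t)} + beta_t)
        word_part = (n_kt[k][t] + beta[t]) / (sum(n_kt[k]) + sum(beta))
        # n_{m,-i}^{(k)} + alpha_k
        doc_part = n_mk_row[k] + alpha[k]
        weights.append(word_part * doc_part)
    total = sum(weights)
    return [w / total for w in weights]

def resample_token(z, docs, n_kt, n_mk, m, n, alpha, beta, rng):
    """One Gibbs step for word n of document m, updating counts in place."""
    t, old = docs[m][n], z[m][n]
    n_kt[old][t] -= 1            # remove the token from the counts
    n_mk[m][old] -= 1
    probs = topic_conditional(n_kt, n_mk[m], t, alpha, beta)
    new = rng.choices(range(len(alpha)), weights=probs)[0]
    z[m][n] = new                # record the new topic and restore the counts
    n_kt[new][t] += 1
    n_mk[m][new] += 1
    return new

# Toy state (hypothetical): one document, K = 2 topics, V = 2 terms.
docs = [[0, 0, 1]]
z = [[0, 0, 1]]
n_kt = [[2, 0], [0, 1]]
n_mk = [[2, 1]]
alpha, beta = [0.5, 0.5], [0.1, 0.1]
new_topic = resample_token(z, docs, n_kt, n_mk, 0, 0, alpha, beta, random.Random(0))
```

A full sampler would call `resample_token` for every word in every document and repeat the sweep until the iteration limit is reached.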
(5) Term Distributions and Topic Distributions
$$
\begin{aligned}
\varphi_{k,t} &=\frac{n_{k}^{(t)}+\beta_{t}}{\sum_{t=1}^{V}\left(n_{k}^{(t)}+\beta_{t}\right)} \\
\vartheta_{m,k} &=\frac{n_{m}^{(k)}+\alpha_{k}}{\sum_{k=1}^{K}\left(n_{m}^{(k)}+\alpha_{k}\right)}
\end{aligned}
$$
These estimates are the means of the following Dirichlet posteriors:

$$
\begin{aligned}
p\left(\vec{\vartheta}_{m} \mid \vec{z}_{m}, \vec{\alpha}\right) &=\frac{1}{Z_{\vartheta_{m}}} \prod_{n=1}^{N_{m}} p\left(z_{m,n} \mid \vec{\vartheta}_{m}\right) \cdot p\left(\vec{\vartheta}_{m} \mid \vec{\alpha}\right)=\operatorname{Dir}\left(\vec{\vartheta}_{m} \mid \vec{n}_{m}+\vec{\alpha}\right) \\
p\left(\vec{\varphi}_{k} \mid \vec{z}, \vec{w}, \vec{\beta}\right) &=\frac{1}{Z_{\varphi_{k}}} \prod_{\left\{i: z_{i}=k\right\}} p\left(w_{i} \mid \vec{\varphi}_{k}\right) \cdot p\left(\vec{\varphi}_{k} \mid \vec{\beta}\right)=\operatorname{Dir}\left(\vec{\varphi}_{k} \mid \vec{n}_{k}+\vec{\beta}\right)
\end{aligned}
$$
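To make the estimates for $\varphi_{k,t}$ and $\vartheta_{m,k}$ in this section concrete, the sketch below reads them off the count tables by adding the prior and normalizing each row. The helper name `normalize_with_prior` and the toy counts are made up for illustration.

```python
def normalize_with_prior(counts, prior):
    """Row-wise (count + prior) / sum(count + prior), the Dirichlet posterior mean."""
    result = []
    for row in counts:
        denom = sum(row) + sum(prior)
        result.append([(c + p) / denom for c, p in zip(row, prior)])
    return result

# Toy counts (hypothetical): K = 2 topics, V = 2 terms, M = 1 document.
n_kt = [[3, 1], [0, 4]]   # topic-term counts n_k^(t)
n_mk = [[2, 2]]           # document-topic counts n_m^(k)
beta = [0.5, 0.5]
alpha = [1.0, 1.0]

phi = normalize_with_prior(n_kt, beta)      # term distribution per topic
theta = normalize_with_prior(n_mk, alpha)   # topic distribution per document
```

Both estimates share the same form, so one helper covers them; only the count table and the prior differ.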