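Excerpts from the author's class notes. For questions, contact humminwang@163.com
1 Factor Analysis
Reference: http://blog.youkuaiyun.com/stdcoutzyx/article/details/37559995
For a Gaussian mixture model, when the number of training samples is smaller than the sample dimension, the estimated covariance matrix is singular, so the probability density function cannot be evaluated. For other models, having fewer samples than dimensions also tends to cause overfitting.
There are two remedies. The first is to strengthen the model assumptions, for example by restricting the covariance matrix. The second is to reduce model complexity by using a model with fewer parameters, such as factor analysis.
Ways to restrict the covariance matrix: assume it is diagonal, or, more strongly, diagonal with all diagonal entries equal. Estimating a full, nonsingular covariance matrix requires more samples than dimensions, whereas under the diagonal assumption more than one sample is enough to estimate the restricted covariance matrix.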
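The singularity problem can be checked numerically. The following is a minimal sketch, assuming NumPy and synthetic standard-normal data; the sizes `m = 5` and `n = 10` are arbitrary illustration values, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 10                      # fewer samples than dimensions (illustrative sizes)
X = rng.normal(size=(m, n))       # each row is one sample

# Full maximum-likelihood covariance: its rank is at most m - 1 < n, so it is singular.
Xc = X - X.mean(axis=0)
sigma_full = Xc.T @ Xc / m
rank_full = np.linalg.matrix_rank(sigma_full)

# Diagonal restriction: only the n per-coordinate variances are estimated.
sigma_diag = np.diag(Xc.var(axis=0))
rank_diag = np.linalg.matrix_rank(sigma_diag)

print(rank_full, rank_diag)
```

The full estimate cannot be inverted, while the diagonal estimate has full rank because each coordinate variance is estimated on its own.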
Block-matrix notation for the Gaussian distribution:
Consider three variables $x_1\in R^r$, $x_2\in R^s$ and $x\in R^{r+s}$, with
$$x=\begin{bmatrix}x_1\\x_2\end{bmatrix}$$
Assume $x\sim N(\mu,\Sigma)$, so that:
$$\mu=\begin{bmatrix}\mu_1\\\mu_2\end{bmatrix},\quad \Sigma=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\\Sigma_{21}&\Sigma_{22}\end{bmatrix}$$
The marginal distribution of $x_1$ satisfies:
$$E[x_1]=\mu_1,\quad \mathrm{Cov}(x_1)=E[(x_1-\mu_1)(x_1-\mu_1)^T]=\Sigma_{11}$$
For $x$ as a whole we therefore get:
$$\mathrm{Cov}(x)=\Sigma=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\\Sigma_{21}&\Sigma_{22}\end{bmatrix}=E[(x-\mu)(x-\mu)^T]$$
$$=E\left[\begin{bmatrix}x_1-\mu_1\\x_2-\mu_2\end{bmatrix}\begin{bmatrix}x_1-\mu_1\\x_2-\mu_2\end{bmatrix}^T\right]=E\begin{bmatrix}(x_1-\mu_1)(x_1-\mu_1)^T&(x_1-\mu_1)(x_2-\mu_2)^T\\(x_2-\mu_2)(x_1-\mu_1)^T&(x_2-\mu_2)(x_2-\mu_2)^T\end{bmatrix}$$
Given $x_2$, the conditional density of $x_1$ is:
$$p(x_1\mid x_2)=\frac{p(x_1,x_2)}{p(x_2)}=\frac{p(x)}{p(x_2)}$$
This conditional distribution is again Gaussian:
$$x_1\mid x_2\sim N(\mu_{1|2},\Sigma_{1|2})$$
$$\mu_{1|2}=\mu_1+\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)$$
$$\Sigma_{1|2}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$
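The conditioning formulas translate almost directly into code. Below is a minimal NumPy sketch; the function name `condition_gaussian` and the small 2-dimensional example are hypothetical, chosen only for illustration.

```python
import numpy as np

def condition_gaussian(mu, sigma, r, x2):
    """Mean and covariance of x1 | x2 for x = [x1; x2] ~ N(mu, sigma).

    x1 is taken to be the first r coordinates of x (an assumption of this sketch).
    """
    mu1, mu2 = mu[:r], mu[r:]
    s11, s12 = sigma[:r, :r], sigma[:r, r:]
    s21, s22 = sigma[r:, :r], sigma[r:, r:]
    # Solve linear systems with Sigma_22 instead of forming its inverse explicitly.
    mu_cond = mu1 + s12 @ np.linalg.solve(s22, x2 - mu2)
    sigma_cond = s11 - s12 @ np.linalg.solve(s22, s21)
    return mu_cond, sigma_cond

# Illustrative values: x1 and x2 are both scalars here.
mu = np.array([0.0, 1.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
mu_c, sigma_c = condition_gaussian(mu, sigma, r=1, x2=np.array([2.0]))
print(mu_c, sigma_c)
```

In this example the conditional mean is $0 + 0.8\cdot(2-1) = 0.8$ and the conditional variance is $2 - 0.8^2 = 1.36$, which is smaller than the marginal variance of $x_1$.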
Factor analysis model
The factor analysis model is defined as follows:
Assume a latent variable $z\sim N(0,I)$ with $z\in R^d$ ($d<n$). Further assume that a training sample $x\in R^n$ is generated from the latent variable $z$, i.e. $x=\mu+\Lambda z+\varepsilon$, where $\varepsilon\sim N(0,\Psi)$, $\mu\in R^n$ and $\Lambda\in R^{n\times d}$.
When $z$ is known, the distribution that generates $x$ is
$$x\mid z\sim N(\mu+\Lambda z,\Psi)$$
where $\Psi$ is a diagonal matrix.
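The factor analysis model can be understood through the process that generates the training data:
- <1> In a low-dimensional space, draw m latent variables $z^{(i)}$ from a multivariate Gaussian with mean 0 and identity covariance. Each $z^{(i)}$ is a d-dimensional vector, and m is the number of samples.
- <2> Use the transformation matrix $\Lambda$ to map $z$ into n-dimensional space as $\Lambda z$. Since the factor $z$ has mean 0, the mapped vector also has mean 0.
- <3> Add a mean $\mu$ to the n-dimensional vector $\Lambda z$. This shifts the mean of the transformed $z$ in n-dimensional space.
- <4> Because real samples x contain error, add the noise $\varepsilon \sim N(0,\Psi)$ on top of this transformation.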
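The four generation steps above can be sketched in NumPy as follows. The function name `sample_factor_model` and all parameter values are hypothetical illustration choices; `psi_diag` holds the diagonal of $\Psi$.

```python
import numpy as np

def sample_factor_model(mu, lam, psi_diag, m, rng):
    """Draw m samples x = mu + Lambda z + eps, returning both x and the latent z."""
    n, d = lam.shape
    z = rng.standard_normal((m, d))                          # <1> z ~ N(0, I) in d dimensions
    mapped = z @ lam.T                                       # <2> map to n dimensions: Lambda z
    shifted = mapped + mu                                    # <3> shift by the mean mu
    eps = rng.standard_normal((m, n)) * np.sqrt(psi_diag)    # <4> noise eps ~ N(0, Psi), Psi diagonal
    return shifted + eps, z

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
lam = np.array([[1.0], [0.5], [-1.0]])       # n = 3, d = 1
psi_diag = np.array([0.1, 0.2, 0.3])
X, Z = sample_factor_model(mu, lam, psi_diag, m=200_000, rng=rng)
print(X.mean(axis=0))
```

With this many samples the empirical mean of `X` should land close to `mu`, since both $\Lambda z$ and $\varepsilon$ have mean 0.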
Deriving the factor analysis model
Model:
$$z\sim N(0,I)$$
$$\varepsilon \sim N(0,\Psi)$$
$$x=\mu+\Lambda z+\varepsilon$$
where $\varepsilon$ and $z$ are independent.
We analyze the model with the block-matrix notation for Gaussians. Under this approach $z$ and $x$ jointly follow a multivariate Gaussian distribution:
$$\begin{bmatrix}z\\x\end{bmatrix}\sim N(\mu_{zx},\Sigma)$$
It remains to find $\mu_{zx}$ and $\Sigma$. For the mean, $E[z]=0$ and $E[x]=\mu+\Lambda E[z]+E[\varepsilon]=\mu$, so $\mu_{zx}=\begin{bmatrix}0\\\mu\end{bmatrix}$.
Finding $\Sigma$ requires the blocks $\Sigma_{zz},\Sigma_{zx},\Sigma_{xz},\Sigma_{xx}$.
$$\Sigma_{zz}=E[(z-E[z])(z-E[z])^T]$$
By definition, $\Sigma_{zz}=\mathrm{Cov}(z)=I$. Recall also that $z$ and $\varepsilon$ are independent.
$$\Sigma_{zx}=E[(z-E[z])(x-E[x])^T]=E[z(\mu+\Lambda z+\varepsilon-\mu)^T]=E[zz^T]\Lambda^T+E[z\varepsilon^T]=\Lambda^T$$
The last equality uses $E[zz^T]=I$ and $E[z\varepsilon^T]=0$. The other off-diagonal block is the transpose, $\Sigma_{xz}=\Sigma_{zx}^T=\Lambda$.
$$\Sigma_{xx}=E[(x-E[x])(x-E[x])^T]=E[(\Lambda z+\varepsilon)(\Lambda z+\varepsilon)^T]$$
$$=E[\Lambda zz^T\Lambda^T+\varepsilon z^T\Lambda^T+\Lambda z\varepsilon^T+\varepsilon\varepsilon^T]=\Lambda E[zz^T]\Lambda^T+E[\varepsilon\varepsilon^T]=\Lambda\Lambda ^T+\Psi$$
This gives:
$$\begin{bmatrix}z\\x\end{bmatrix}\sim N\left(\begin{bmatrix}0\\\mu\end{bmatrix},\begin{bmatrix}I&\Lambda^T\\\Lambda&\Lambda\Lambda^T+\Psi\end{bmatrix}\right)$$
So the marginal distribution of $x$ is:
$$x\sim N(\mu,\Lambda\Lambda^T+\Psi)$$
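The joint covariance can be sanity-checked by simulation. The sketch below, assuming NumPy and arbitrary illustrative parameters, compares the empirical covariance of $[z; x]$ with the block matrix derived above.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, d = 200_000, 3, 2                          # illustrative sizes
mu = np.array([1.0, -2.0, 0.5])
lam = np.array([[1.0, 0.0],
                [0.5, -0.5],
                [-1.0, 0.3]])
psi_diag = np.array([0.1, 0.2, 0.3])             # diagonal of Psi

# Simulate the model x = mu + Lambda z + eps.
z = rng.standard_normal((m, d))
x = mu + z @ lam.T + rng.standard_normal((m, n)) * np.sqrt(psi_diag)

# Empirical covariance of the stacked vector [z; x].
emp_cov = np.cov(np.hstack([z, x]), rowvar=False)

# Covariance predicted by the derivation: [[I, Lambda^T], [Lambda, Lambda Lambda^T + Psi]].
model_cov = np.block([[np.eye(d), lam.T],
                      [lam, lam @ lam.T + np.diag(psi_diag)]])

max_err = np.abs(emp_cov - model_cov).max()
print(max_err)
```

The largest entrywise difference should be small and shrink as `m` grows, which is consistent with the derived blocks.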
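For a training set $\{x^{(1)},\dots,x^{(m)}\}$ we can write down the likelihood function. However, finding the parameters by maximizing the likelihood directly is complicated because the model contains latent variables, so we use the EM algorithm instead.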
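For reference, the log-likelihood of a training set under the marginal $x\sim N(\mu,\Lambda\Lambda^T+\Psi)$ can be evaluated as in the sketch below. The function name `log_likelihood` is hypothetical, and the tiny data set is only there to make the numbers easy to check by hand.

```python
import numpy as np

def log_likelihood(X, mu, lam, psi_diag):
    """Sum over rows x_i of log N(x_i; mu, Lambda Lambda^T + Psi)."""
    m, n = X.shape
    cov = lam @ lam.T + np.diag(psi_diag)
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)                    # log |cov|, numerically stable
    # Sum of squared Mahalanobis distances (x_i - mu)^T cov^{-1} (x_i - mu).
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T)
    return -0.5 * (m * n * np.log(2 * np.pi) + m * logdet + maha)

# One-dimensional check: Lambda = 1 and Psi = 1 give variance 2.
X = np.array([[0.0], [1.0]])
ll = log_likelihood(X, mu=np.array([0.0]), lam=np.array([[1.0]]), psi_diag=np.array([1.0]))
print(ll)
```

This function can be used to monitor EM iterations, since the log-likelihood should not decrease from one iteration to the next.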
Solving the factor analysis model with the EM algorithm
E-step: compute $Q_i(z^{(i)}\mid x^{(i)};\mu,\Lambda,\Psi)$, the posterior distribution of $z^{(i)}$ given $x^{(i)}$ under the current parameters.
Using the block-matrix form of the Gaussian from earlier, we can compute the mean and covariance of this conditional distribution:
$$\mu_{z^{(i)}|x^{(i)}}=\Lambda^T(\Lambda\Lambda^T+\Psi)^{-1}(x^{(i)}-\mu)$$
$$\Sigma_{z^{(i)}|x^{(i)}}=I-\Lambda^T(\Lambda\Lambda^T+\Psi)^{-1}\Lambda$$
Substituting these into the Gaussian density gives the probability density function of $Q_i(z^{(i)}\mid x^{(i)})$. Since it is a density over the d-dimensional vector $z^{(i)}$:
$$Q_i(z^{(i)}\mid x^{(i)})=\frac{1}{(2\pi)^{d/2}|\Sigma_{z^{(i)}|x^{(i)}}|^{1/2}}\exp\left(-\frac{1}{2}(z^{(i)}-\mu_{z^{(i)}|x^{(i)}})^T\Sigma_{z^{(i)}|x^{(i)}}^{-1}(z^{(i)}-\mu_{z^{(i)}|x^{(i)}})\right)$$
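A possible NumPy sketch of the E-step follows. The function name `e_step` is hypothetical, samples are stored as rows of `X`, and `psi_diag` is the diagonal of $\Psi$. Note that the posterior covariance does not depend on $x^{(i)}$, so it is computed once for all samples.

```python
import numpy as np

def e_step(X, mu, lam, psi_diag):
    """Posterior mean (one row per sample) and shared posterior covariance of z given x."""
    n, d = lam.shape
    cov_x = lam @ lam.T + np.diag(psi_diag)      # Lambda Lambda^T + Psi
    # gain = Lambda^T (Lambda Lambda^T + Psi)^{-1}, shape (d, n); cov_x is symmetric.
    gain = np.linalg.solve(cov_x, lam).T
    post_mean = (X - mu) @ gain.T                # row i is mu_{z|x} for sample i
    post_cov = np.eye(d) - gain @ lam            # Sigma_{z|x}, identical for every sample
    return post_mean, post_cov

# One-dimensional check: Lambda = 1, Psi = 1, mu = 0, x = 2.
post_mean, post_cov = e_step(np.array([[2.0]]), np.array([0.0]),
                             np.array([[1.0]]), np.array([1.0]))
print(post_mean, post_cov)
```

In the check, $\Lambda\Lambda^T+\Psi = 2$, so the posterior mean is $2/2 = 1$ and the posterior variance is $1 - 1/2 = 0.5$.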
M-step: maximize the following expression to obtain the parameters $\mu,\Lambda,\Psi$.
$$\sum_{i=1}^m \int Q_i(z^{(i)})\log\frac{p(z^{(i)},x^{(i)};\mu,\Lambda,\Psi)}{Q_i(z^{(i)})}\,dz^{(i)}$$
$$=\sum_{i=1}^m \int Q_i(z^{(i)})\left[\log p(x^{(i)}\mid z^{(i)};\mu,\Lambda,\Psi)+\log p(z^{(i)})-\log Q_i(z^{(i)})\right]dz^{(i)}$$
$$=\sum_{i=1}^mE_{z^{(i)}\sim Q_i}\left[\log p(x^{(i)}\mid z^{(i)};\mu,\Lambda,\Psi)+\log p(z^{(i)})-\log Q_i(z^{(i)})\right]$$
The first step uses conditional probability, $p(z,x)=p(x\mid z)\,p(z)$, to split the logarithm. The second step rewrites the integral as the expectation of $\log p(x^{(i)}\mid z^{(i)};\mu,\Lambda,\Psi)+\log p(z^{(i)})-\log Q_i(z^{(i)})$ when $z^{(i)}$ follows the distribution $Q_i$.
Solving for $\Lambda$:
$$\nabla_\Lambda \sum_{i=1}^mE\left[\log p(x^{(i)}\mid z^{(i)};\mu,\Lambda,\Psi)+\log p(z^{(i)})-\log Q_i(z^{(i)})\right]$$
$$=\nabla_\Lambda \sum_{i=1}^mE\left[\log p(x^{(i)}\mid z^{(i)};\mu,\Lambda,\Psi)\right]$$
The terms that do not depend on the parameter $\Lambda$ are dropped.
$$\nabla_\Lambda\sum_{i=1}^mE\left[\log\left(\frac{1}{(2\pi)^{n/2}|\Psi|^{1/2}}\exp\left(-\frac{1}{2}(x^{(i)}-\mu-\Lambda z^{(i)})^T\Psi^{-1}(x^{(i)}-\mu-\Lambda z^{(i)})\right)\right)\right]$$
Here $p(x^{(i)}\mid z^{(i)})$ is the Gaussian density with mean $\mu+\Lambda z^{(i)}$ and covariance $\Psi$.
$$=\nabla_\Lambda\sum_{i=1}^mE\left[-\frac{1}{2}\log|\Psi|-\frac{n}{2}\log(2\pi)-\frac{1}{2}(x^{(i)}-\mu-\Lambda z^{(i)})^T\Psi^{-1}(x^{(i)}-\mu-\Lambda z^{(i)})\right]$$
$$=\nabla_\Lambda\sum_{i=1}^m-E\left[\frac{1}{2}(x^{(i)}-\mu-\Lambda z^{(i)})^T\Psi^{-1}(x^{(i)}-\mu-\Lambda z^{(i)})\right]$$
$$=\sum_{i=1}^m\nabla_\Lambda E\left[-\mathrm{tr}\left(\frac{1}{2}{z^{(i)}}^T\Lambda^T\Psi^{-1}\Lambda z^{(i)}\right)+\mathrm{tr}\left({z^{(i)}}^T\Lambda^T\Psi^{-1}(x^{(i)}-\mu)\right)\right]$$
The last step expands the quadratic form, drops the term that does not depend on $\Lambda$, and uses the trace property $\mathrm{tr}(a)=a$ for a scalar $a$.
$$=\sum_{i=1}^m\nabla_\Lambda E\left[-\mathrm{tr}\left(\frac{1}{2}\Lambda^T\Psi^{-1}\Lambda z^{(i)}{z^{(i)}}^T\right)+\mathrm{tr}\left(\Lambda^T\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right)\right]$$
This step uses the trace property $\mathrm{tr}(AB)=\mathrm{tr}(BA)$.
$$=\sum_{i=1}^m\left(\nabla_\Lambda E\left[-\mathrm{tr}\left(\frac{1}{2}\Lambda^T\Psi^{-1}\Lambda z^{(i)}{z^{(i)}}^T\right)\right]+\nabla_\Lambda E\left[\mathrm{tr}\left(\Lambda^T\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right)\right]\right)$$
$$=\sum_{i=1}^m\left(E\left[-\nabla_\Lambda \mathrm{tr}\left(\frac{1}{2}\Lambda^T\Psi^{-1}\Lambda z^{(i)}{z^{(i)}}^T\right)\right]+E\left[\nabla_\Lambda \mathrm{tr}\left(\Lambda^T\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right)\right]\right)$$
Here the derivative and the expectation are exchanged.
$$=\sum_{i=1}^m\left(E\left[-\left(\nabla_{\Lambda^T} \mathrm{tr}\left(\frac{1}{2}\Lambda^T\Psi^{-1}\Lambda z^{(i)}{z^{(i)}}^T\right)\right)^T\right]+E\left[\left(\nabla_{\Lambda^T} \mathrm{tr}\left(\Lambda^T\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right)\right)^T\right]\right)$$
This step uses the matrix-gradient property $\nabla_{A^T} f(A)=(\nabla_A f(A))^T$, which lets us differentiate with respect to $\Lambda^T$ and transpose the result.
$$=\sum_{i=1}^m\left(E\left[-\frac{1}{2}\left(2 z^{(i)}{z^{(i)}}^T\Lambda^T\Psi^{-1}\right)^T\right]+ E\left[\left(\left(\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right)^T\right)^T\right]\right)$$
The first term uses $\nabla_A \mathrm{tr}(ABA^TC)=CAB+C^TAB^T$ with $A=\Lambda^T$, $B=\Psi^{-1}$ and $C=z^{(i)}{z^{(i)}}^T$.
The second term uses $\nabla_A \mathrm{tr}(AB)=B^T$ with $A=\Lambda^T$ and $B=\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T$.
$$=\sum_{i=1}^m\left(E\left[-\Psi^{-1}\Lambda z^{(i)}{z^{(i)}}^T\right]+E\left[\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right]\right)$$
$$=\sum_{i=1}^mE\left[-\Psi^{-1}\Lambda z^{(i)}{z^{(i)}}^T+\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right]$$
Expand the expectation, set the final result to 0, and simplify. Multiplying on the left by $\Psi$ removes $\Psi^{-1}$:
$$\sum_{i=1}^m\Lambda E_{z^{(i)}\sim Q_i}\left[ z^{(i)}{z^{(i)}}^T\right]=\sum_{i=1}^m(x^{(i)}-\mu)E_{z^{(i)}\sim Q_i}\left[{z^{(i)}}^T\right]$$
$$\Lambda=\left(\sum_{i=1}^m(x^{(i)}-\mu)E_{z^{(i)}\sim Q_i}\left[{z^{(i)}}^T\right]\right)\left(\sum_{i=1}^mE_{z^{(i)}\sim Q_i}\left[ z^{(i)}{z^{(i)}}^T\right]\right)^{-1}$$
$$E_{z^{(i)}\sim Q_i}\left[{z^{(i)}}^T\right]=\mu^T_{z^{(i)}|x^{(i)}}$$
$$E_{z^{(i)}\sim Q_i}\left[ z^{(i)}{z^{(i)}}^T\right]=\mu_{z^{(i)}|x^{(i)}}\mu^T_{z^{(i)}|x^{(i)}}+\Sigma_{z^{(i)}|x^{(i)}}$$
The second identity uses the property $\mathrm{Cov}(X)=E[XX^T]-E[X]E[X^T]$.
Finally:
$$\Lambda=\left(\sum_{i=1}^m(x^{(i)}-\mu)\mu^T_{z^{(i)}|x^{(i)}}\right)\left(\sum_{i=1}^m\left(\mu_{z^{(i)}|x^{(i)}}\mu^T_{z^{(i)}|x^{(i)}}+\Sigma_{z^{(i)}|x^{(i)}}\right)\right)^{-1}$$
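The closed-form update for $\Lambda$ can be sketched in NumPy as below. The function name `update_lambda` is hypothetical; `post_mean` holds one posterior mean $\mu_{z^{(i)}|x^{(i)}}$ per row and `post_cov` is the posterior covariance shared by all samples, both assumed to come from an E-step.

```python
import numpy as np

def update_lambda(X, mu, post_mean, post_cov):
    """M-step update: Lambda = (sum_i (x_i - mu) mu_i^T) (sum_i (mu_i mu_i^T + Sigma))^{-1}."""
    m = X.shape[0]
    diff = X - mu
    cross = diff.T @ post_mean                                # sum_i (x_i - mu) mu_i^T, shape (n, d)
    second_moment = post_mean.T @ post_mean + m * post_cov    # sum_i E[z_i z_i^T], shape (d, d)
    # cross @ inv(second_moment), computed with a linear solve instead of an explicit inverse.
    return np.linalg.solve(second_moment.T, cross.T).T

# Small hand-checkable case with n = d = 1 (illustrative numbers).
X = np.array([[2.0], [-2.0]])
new_lam = update_lambda(X, mu=np.array([0.0]),
                        post_mean=np.array([[1.0], [-1.0]]),
                        post_cov=np.array([[0.5]]))
print(new_lam)
```

In the example the numerator is $2\cdot 1 + (-2)\cdot(-1) = 4$ and the denominator is $1 + 1 + 2\cdot 0.5 = 3$, so the update gives $\Lambda = 4/3$.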
The updates for $\mu$ and $\Psi$ are derived in the same way.
$$\mu=\frac{1}{m}\sum_{i=1}^mx^{(i)}$$
$$\Psi=\frac{1}{m}\sum_{i=1}^m\left(x^{(i)}{x^{(i)}}^T-x^{(i)}\mu^T_{z^{(i)}|x^{(i)}}\Lambda^T-\Lambda\mu_{z^{(i)}|x^{(i)}}{x^{(i)}}^T+\Lambda\left(\mu_{z^{(i)}|x^{(i)}}\mu^T_{z^{(i)}|x^{(i)}}+\Sigma_{z^{(i)}|x^{(i)}}\right)\Lambda^T\right)$$
In the formula for $\Psi$, $x^{(i)}$ is understood as the centered sample $x^{(i)}-\mu$; as written it holds for the raw data only when $\mu=0$. Since $\Psi$ is diagonal, keep only the diagonal entries of this matrix.
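Putting the E-step and the three updates together gives a complete EM loop. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name `fit_factor_analysis`, the random initialization, the fixed iteration count and the synthetic data are all illustrative choices, and the data is centered by $\mu$ before the $\Psi$ update.

```python
import numpy as np

def fit_factor_analysis(X, d, n_iter=200, seed=0):
    """EM for factor analysis; returns mu, Lambda (n x d) and the diagonal of Psi."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    mu = X.mean(axis=0)                      # the mu update does not change across iterations
    diff = X - mu
    lam = rng.standard_normal((n, d))        # assumed initialization
    psi = diff.var(axis=0)
    for _ in range(n_iter):
        # E-step: posterior mean (rows of ez) and shared posterior covariance of z.
        cov_x = lam @ lam.T + np.diag(psi)
        gain = np.linalg.solve(cov_x, lam).T             # Lambda^T (Lambda Lambda^T + Psi)^{-1}
        ez = diff @ gain.T
        cov_z = np.eye(d) - gain @ lam
        # M-step for Lambda.
        ezz = ez.T @ ez + m * cov_z                      # sum_i E[z_i z_i^T]
        cross = diff.T @ ez                              # sum_i (x_i - mu) mu_{z|x}^T
        lam = np.linalg.solve(ezz.T, cross.T).T
        # M-step for Psi: diagonal of the matrix expression, computed entry by entry.
        psi = (np.sum(diff ** 2, axis=0)
               - 2.0 * np.sum(cross * lam, axis=1)
               + np.sum((lam @ ezz) * lam, axis=1)) / m
    return mu, lam, psi

# Synthetic data from a known one-factor model (illustrative parameters).
rng = np.random.default_rng(1)
lam_true = np.array([[1.0], [-0.8], [0.6], [0.5], [-1.2]])
psi_true = np.array([0.3, 0.2, 0.4, 0.25, 0.35])
Z = rng.standard_normal((5000, 1))
X = Z @ lam_true.T + rng.standard_normal((5000, 5)) * np.sqrt(psi_true)

mu_hat, lam_hat, psi_hat = fit_factor_analysis(X, d=1)
model_cov = lam_hat @ lam_hat.T + np.diag(psi_hat)
print(np.round(model_cov, 2))
```

Because $\Lambda$ is only identified up to a rotation (a sign flip when $d=1$), the fit is best judged by comparing $\Lambda\Lambda^T+\Psi$ with the sample covariance rather than by comparing $\Lambda$ with the true loading matrix directly.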