tppca的理论推导-1

本文详细介绍了tppca模型,涉及预备知识、模型形式、t分布及其相关分布,重点展示了参数估计过程,特别是如何利用EM算法计算后验分布和Q函数。通过推导,读者能理解tppca模型的构造与参数估计方法。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >


(该博客是自己在完成老师任务时经过查阅资料自行推导并整理的笔记,在变量的字母符号设置上前后有所差异,但不影响阅读,如有错误,请联系我,此笔记仅用于学习,如需转载,请注明来源,谢谢!)

1 tppca模型

1.1 预备知识

tppca模型是ppca模型在t分布下的推广,因此,要定义tppca模型,必先了解多元t分布 t ( μ , Σ , ν ) t(\mu,\Sigma,\nu) t(μ,Σ,ν)的概率密度:

p ( t ∣ μ , Σ , v ) = Γ ( ( v + d ) / 2 ) ∣ Σ ∣ − 1 / 2 Γ ( v / 2 ) ( v π ) d / 2 × [ 1 + ( t − μ ) T Σ − 1 ( t − μ ) / v ] − ( v + d ) / 2 \begin{aligned} p(\mathbf{t} \mid \boldsymbol{\mu}, \mathbf{\Sigma}, v)=& \frac{\Gamma((v+d) / 2)|\mathbf{\Sigma}|^{-1 / 2}}{\Gamma(v / 2)(v \pi)^{d / 2}} \\ & \times\left[1+(\mathbf{t}-\boldsymbol{\mu})^{\mathrm{T}} \mathbf{\Sigma}^{-1}(\mathbf{t}-\boldsymbol{\mu}) / v\right]^{-(v+d) / 2} \end{aligned} p(tμ,Σ,v)=Γ(v/2)(vπ)d/2Γ((v+d)/2)Σ1/2×[1+(tμ)TΣ1(tμ)/v](v+d)/2

其中, Σ \mathbf{\Sigma} Σ是对称和正定的,如果 ν > 1 \nu>1 ν>1 μ \boldsymbol{\mu} μ是均值向量,如果 ν > 2 \nu>2 ν>2时, Σ ν ν − 2 \frac{\mathbf{\Sigma} \nu}{\nu-2} ν2Σν是协方差矩阵;多元正态分布 N ( μ , Σ ) \mathscr{N}(\boldsymbol{\mu}, \mathbf{\Sigma}) N(μ,Σ) t ( μ , Σ , ∞ ) t(\boldsymbol{\mu}, \mathbf{\Sigma}, \infty) t(μ,Σ,)的极限。

通常直接对t分布求解参数是困难的,因此常常把多元t分布看作是一个特殊的高斯混合。例如:要从多元t分布 t ( μ , Σ , ν ) t(\mu,\Sigma,\nu) t(μ,Σ,ν)中抽取一个t向量,我们引入一个潜变量 τ \tau τ,且 τ ∼ G a m m a ( v / 2 , v / 2 ) \tau \sim Gamma(v / 2, v / 2) τGamma(v/2,v/2),概率密度函数如下:

p ( τ ) = p ( τ ; ν 2 , ν 2 ) = ν 2 ν 2 τ ν 2 − 1 Γ ( ν 2 ) exp ⁡ { − ν 2 τ } p(\tau)=p(\tau ; \frac{\nu}{2}, \frac{\nu}{2})=\frac{\frac{\nu}{2}^{\frac{\nu}{2}} \tau^{\frac{\nu}{2}-1}}{\Gamma(\frac{\nu}{2})} \exp \{-\frac{\nu}{2} \tau\} p(τ)=p(τ;2ν,2ν)=Γ(2ν)2ν2ντ2ν1exp{2ντ}

τ \tau τ给定时,向量 t t t是从多元正态分布 N ( μ , Σ / τ ) \mathscr{N}(\boldsymbol{\mu}, \mathbf{\Sigma} / \tau) N(μ,Σ/τ)中抽取的,则容易验证向量 t t t的边际分布就是 t ( μ , Σ , ν ) t(\mu,\Sigma,\nu) t(μ,Σ,ν)

具体验证过程如下

p ( t ) = ∫ 0 + ∞ p ( t ∣ τ ) p ( τ ) d τ = ∫ 0 + ∞ ( 2 π ) − d 2 ∣ Σ τ ∣ − 1 2 exp ⁡ { − 1 2 ( t − μ ) ⊤ ( Σ τ ) − 1 ( t − μ ) } ⋅ ( ν 2 ) ν 2 Γ ( ν 2 ) ⋅ τ ν 2 − 1 exp ⁡ { − ν 2 τ } d τ = ( 2 π ) − d 2 ∣ Σ ∣ − 1 2 ( ν 2 ) ν 2 Γ ( ν 2 ) ∫ 0 + ∞ τ d + v 2 − 1 exp ⁡ { − ( t − μ ) T Σ − 1 ( t − μ ) 2 τ } d τ = ( 2 π ) d 2 ∣ Σ ∣ 1 2 ( ν 2 ) ν 2 Γ ( ν 2 ) Γ ( d + v 2 ) [ ( t − μ ) T Σ − 1 ( t − μ ) 2 + ν ] d + v 2 ⋅ ∫ 0 + ∞ [ ( t − μ ) T Σ − 1 ( t − μ ) 2 ] d + v 2 Γ ( d + v 2 ) τ d + v 2 − 1 exp ⁡ { − ( t − μ ) T Σ − 1 ( t − μ ) 2 τ } d τ = Γ ( ( v + d ) / 2 ) ∣ Σ ∣ − 1 / 2 Γ ( v / 2 ) ( v π ) d / 2 × [ 1 + ( t − μ ) T Σ − 1 ( t − μ ) / v ] − ( v + d ) / 2 \begin{aligned} p(t) &=\int_{0}^{+\infty} p(t \mid \tau) p(\tau) d \tau \\ &=\int_{0}^{+\infty}(2 \pi)^{-\frac{d}{2}}\left|\frac{\Sigma}{\tau}\right|^{-\frac{1}{2}} \exp \left\{-\frac{1}{2}(t-\mu)^{\top}\left(\frac{\Sigma}{\tau}\right)^{-1}(t-\mu)\right\}\\ &\cdot \frac{\left(\frac{\nu}{2}\right)^{\frac{\nu}{2}}}{\Gamma\left(\frac{\nu}{2}\right)} \cdot \tau^{\frac{\nu}{2}-1} \exp \left\{-\frac{\nu}{2} \tau\right\} d \tau \\ &=(2 \pi)^{-\frac{d}{2}} \mid \Sigma \mid^{-\frac{1}{2}} \frac{\left(\frac{\nu}{2}\right)^{\frac{\nu}{2}}}{\Gamma(\frac{\nu}{2})}\int_{0}^{+\infty} \tau^{\frac{d+v}{2}-1} \exp\{-\frac{(t-\mu)^T \Sigma^{-1} (t-\mu)}{2} \tau\} d \tau \\ &=(2\pi)^{\frac{d}{2}} |\Sigma|^{\frac{1}{2}} \frac{\left(\frac{\nu}{2}\right)^{\frac{\nu}{2}}}{\Gamma(\frac{\nu}{2})} \frac{\Gamma(\frac{d+v}{2})}{[\frac{(t-\mu)^T \Sigma^{-1} (t-\mu)}{2}+\nu]^{\frac{d+v}{2}}} \\ &\cdot \int_{0}^{+\infty} \frac{[\frac{(t-\mu)^T \Sigma^{-1} (t-\mu)}{2}]^{\frac{d+v}{2}}}{\Gamma(\frac{d+v}{2})} \tau^{\frac{d+v}{2}-1} \exp\{-\frac{(t-\mu)^T \Sigma^{-1} (t-\mu)}{2} \tau\} d \tau \\ &= \frac{\Gamma((v+d) / 2)|\mathbf{\Sigma}|^{-1 / 2}}{\Gamma(v / 2)(v \pi)^{d / 2}} \\ & \times\left[1+(\mathbf{t}-\boldsymbol{\mu})^{\mathrm{T}} \mathbf{\Sigma}^{-1}(\mathbf{t}-\boldsymbol{\mu}) / v\right]^{-(v+d) / 2} \end{aligned} p(t)=0+p(tτ)p(τ)dτ=0+(2π)2d τΣ 21exp{21(tμ)(τΣ)1(tμ)}Γ(2ν)(2ν)2ντ2ν1exp{2ντ}dτ=(2π)2dΣ21Γ(2ν)(2ν)2ν0+τ2d+v1exp{2(tμ)TΣ1(tμ)τ}dτ=(2π)2d∣Σ21Γ(2ν)(2ν)2ν[2(tμ)TΣ1(tμ)+ν]2d+vΓ(2d+v)0+Γ(2d+v)[2(tμ)TΣ1(tμ)]2d+vτ2d+v1exp{2(tμ)TΣ1(tμ)τ}dτ=Γ(v/2)(vπ)d/2Γ((v+d)/2)Σ1/2×[1+(tμ)TΣ1(tμ)/v](v+d)/2

也就是说t的边际分布就是对 p ( t , τ ) p(t,\tau) p(t,τ)的联合分布关于 τ \tau τ积分所得,这个分布便是 t ( μ , Σ , ν ) t(\mu,\Sigma,\nu) t(μ,Σ,ν)

1.2 模型形式

t n ∣ τ n , x n = W x n + μ n + ε n x n ∣ τ n ∼ N ( 0 , I / τ n ) , ε n ∣ τ n ∼ N ( 0 , σ 2 I / τ n ) τ n ∼ G a ( v / 2 , v / 2 ) \begin{array}{l} \mathbf{t}_n \mid \tau_n, \mathbf{x}_n=\mathbf{W} \mathbf{x}_n+\boldsymbol{\mu_n}+\boldsymbol{\varepsilon_n} \\ \mathbf{x}_n|\tau_n \sim \mathscr{N}(\mathbf{0}, \mathbf{I} / \tau_n), \quad \boldsymbol{\varepsilon_n}| \tau_n \sim \mathscr{N}\left(\mathbf{0}, \sigma^{2} \mathbf{I} / \tau_n \right) \\ \tau_n \sim Ga(v / 2, v / 2) \end{array} tnτn,xn=Wxn+μn+εnxnτnN(0,I/τn),εnτnN(0,σ2I/τn)τnGa(v/2,v/2)
其中, t n \mathbf{t}_n tn表示第 n n n个观测数据,维度为 d × 1 d \times 1 d×1 x n \mathbf{x}_n xn代表潜在因子,维度为 q × 1 q \times 1 q×1,噪声 ε n \varepsilon_n εn假设是同方差的,且因子与噪声之间是互不相关的。

该模型中待估参数是: σ 2 , v , μ , W \sigma^{2} ,v ,\boldsymbol{\mu},\mathbf{W} σ2vμW

2 推导要用的分布

2.1 ( t n ∣ x n , τ n ) (t_{n} \mid x_{n}, \tau_{n}) (tnxn,τn)的分布

x n 和 τ n x_{n}和 \tau_{n} xnτn给定时,两者可以看作是固定的,所以模型公式中
t n ∣ τ n , x n = W x n + μ + ε n \mathbf{t}_n \mid \tau_n, \mathbf{x}_n=\mathbf{W} \mathbf{x}_n+\boldsymbol{\mu}+\boldsymbol{\varepsilon_n} tnτn,xn=Wxn+μ+εn只有 ε n \boldsymbol{\varepsilon_n} εn是随机的,且是正态分布,因此 t n t_n tn也应该服从正态分布。

E [ t n ] = E [ w x n + μ + ϵ n ] = x n + μ + E ( ε n ) = w x n + μ Cov ⁡ ( t n ) = Cov ⁡ ( ε n ) = σ 2 I / τ n \begin{array}{l} E\left[t_{n}\right]=E\left[w x_{n}+\mu+\epsilon_{n}\right]= x_{n}+\mu+E\left(\varepsilon_{n}\right)=w x_{n}+\mu \\ \operatorname{Cov}\left(t_{n}\right)=\operatorname{Cov}\left(\varepsilon_{n}\right)=\sigma^{2} I / \tau_n \end{array} E[tn]=E[wxn+μ+ϵn]=xn+μ+E(εn)=wxn+μCov(tn)=Cov(εn)=σ2I/τn
故:
t n ⁡ ∣ x n , τ n ∼ N ( μ + w x n , σ 2 I / τ n ) \operatorname{t_n} \mid x_{n}, \tau _n \sim N\left(\mu+w x_{n}, \sigma^{2} I / \tau_n\right) tnxn,τnN(μ+wxn,σ2I/τn)

2.2 ( t n ∣ τ n ) (t_{n} \mid \tau_{n}) (tnτn)的分布

只给定 τ n \tau_n τn时, x n x_n xn ε n \varepsilon_n εn都是随机变量,且都服从正态分布,故 ( t n ∣ τ n ) (t_{n} \mid \tau_{n}) (tnτn)也服从正态分布。

x n ∣ τ n ∼ N ( 0 , I / τ n ) , ε n ∣ τ n ∼ N ( 0 , σ 2 I / τ n ) x_{n}\left|\tau_{n} \sim N\left(0, I / \tau_{n}\right), \quad \varepsilon_{n}\right| \tau_{n} \sim N\left(0, \sigma^{2} I / \tau_{n}\right) xnτnN(0,I/τn),εnτnN(0,σ2I/τn)

E [ t n ∣ τ n ] = E [ W x n + μ + ϵ n ] = W E x n + μ + E ( ε n ) = μ \begin{aligned} E\left[t_{n} \mid \tau_{n}\right]=E\left[W x_{n}+\mu+\epsilon_{n}\right] &=W E x_{n}+\mu+E\left(\varepsilon_{n}\right) \\ &=\mu \end{aligned} E[tnτn]=E[Wxn+μ+ϵn]=WExn+μ+E(εn)=μ

c o v ( t n ∣ τ n ) = c o v ( W x n + μ + ε n ) = W W T + σ 2 I τ n cov(t_n \mid \tau_n) = cov(W x_n +\mu +\varepsilon_n) = \frac{WW^T+\sigma^2 I}{\tau_n} cov(tnτn)=cov(Wxn+μ+εn)=τnWWT+σ2I

故: ( t n ∣ τ n ) ∼ N ( μ , Σ / τ n ) (t_{n} \mid \tau_{n}) \sim N(\mu,\Sigma/\tau_n) (tnτn)N(μ,Σ/τn)

2.3 相关分布

由预备知识可知, τ n ∼ G a ( ν 2 , ν 2 ) \tau_n \sim Ga(\frac{\nu}{2},\frac{\nu}{2}) τnGa(2ν,2ν) t n ∣ τ n ∼ N ( μ , Σ / τ n ) t_n | \tau_n \sim N(\mu,\Sigma/\tau_n) tnτnN(μ,Σ/τn) x n ∣ τ n ∼ N ( 0 , I / τ n ) x_n | \tau_n \sim N(0,I/\tau_n) xnτnN(0,I/τn) ε n ∣ τ n ∼ N ( 0 , σ 2 I / τ n ) \varepsilon_n | \tau_n \sim N(0,\sigma^2 I/\tau_n) εnτnN(0,σ2I/τn),由以上分布及t分布的性质,可知 t n ∼ t ( μ , Σ , ν ) t_n \sim t(\mu,\Sigma,\nu) tnt(μ,Σ,ν) x n ∼ t ( 0 , I , ν ) x_n \sim t(0,I,\nu) xnt(0,I,ν) ε n ∼ t ( 0 , σ 2 I , ν ) \varepsilon_n \sim t(0,\sigma^2 I,\nu) εnt(0,σ2I,ν)

由此,我们可以看到在模型公式中 t n = W x n + μ n + ε n \mathbf{t}_n=\mathbf{W} \mathbf{x}_n+\boldsymbol{\mu_n}+\boldsymbol{\varepsilon_n} tn=Wxn+μn+εn,潜变量 x n x_n xn,噪声 ε n \varepsilon_n εn以及观测数据 t n t_n tn均服从t分布,因此ppca模型在t分布下的推广便是今天所要讨论的tppca模型。

3 tppca模型的参数估计

要计算模型的参数估计,主要的方法是极大似然估计。由上文可知,观测数据的分布为 t n ∼ t ( μ , Σ , ν ) t_n \sim t(\mu,\Sigma,\nu) tnt(μ,Σ,ν),因此对数似然函数
L ( Θ ) = ∑ n = 1 N log ⁡ ( p ( t n ∣ μ , Σ , ν ) ) ∝ ∑ n = 1 N { log ⁡ ( ν + d 2 ) − 1 2 log ⁡ ( ∣ Σ ∣ ) + ν 2 log ⁡ ( ν ) − log ⁡ ( Γ ( ν 2 ) ) − ν + d 2 log ⁡ ( ν + ( t n − μ ) T Σ − 1 ( t n − μ ) ) } \begin{aligned} \textrm{L}(\Theta) &= \sum_{n=1}^{N} \log(p(t_n \mid \mu,\Sigma,\nu)) \\ &\propto \sum_{n=1}^{N} \{ \log(\frac{\nu+d}{2}) - \frac{1}{2} \log(|\Sigma|) + \frac{\nu}{2}\log(\nu) - \log(\Gamma(\frac{\nu}{2})) -\frac{\nu+d}{2}\log(\nu+(t_n-\mu)^{T}\Sigma^{-1}(t_n-\mu))\} \end{aligned} L(Θ)=n=1Nlog(p(tnμ,Σ,ν))n=1N{log(2ν+d)21log(∣Σ∣)+2νlog(ν)log(Γ(2ν))2ν+dlog(ν+(tnμ)TΣ1(tnμ))}
上式中,由于其中 Σ = W W T + σ 2 I \Sigma = WW^T+\sigma^2 I Σ=WWT+σ2I,故参数空间 Θ = ( μ , W , σ 2 , ν ) \Theta = (\mu,W,\sigma^2,\nu) Θ=(μ,W,σ2,ν),要得到上述参数的MLE,必须分别对每个参数求导,真实似然很难求导得到想要的参数估计,因此我们利用t分布的性质,引入潜在变量 τ \tau τ,利用EM类型的算法进行求解。

3.1 完全数据的对数似然函数

观测数据为 t n t_n tn,潜在因子 x n x_n xn,新引入的潜变量为 τ n \tau_n τn,因此完全数据为 ( t n , x n , τ n ) (t_n,x_n,\tau_n) (tn,xn,τn),参数向量为 Θ = ( μ , W , σ 2 , ν ) \Theta = (\mu,W,\sigma^2,\nu) Θ=(μ,W,σ2,ν),因此完全数据的对数似然函数为:
L c ( Θ ) = ∑ n = 1 N log ⁡ ( p ( t n , x n , τ n ) ) = ∑ n = 1 N log ⁡ [ p ( t n ∣ x n , τ n ) p ( x n ∣ τ n ) p ( τ n ) ] = ∑ n = 1 N { − d 2 log ⁡ ( 2 π ) − d 2 log ⁡ ( σ 2 ) + d 2 log ⁡ ( τ n ) − τ n 2 σ 2 ( t n − W x n − μ ) T ( t n − W x n − μ ) − q 2 log ⁡ ( 2 π ) + d 2 log ⁡ ( τ n ) − t n 2 x n T x n + ν 2 log ⁡ ( ν 2 ) − log ⁡ ( Γ ( ν 2 ) ) + ν 2 ( log ⁡ ( τ n ) − τ n ) − log ⁡ ( t n ) } ( 去掉与参数无关项 ) ∝ − N d 2 log ⁡ ( σ 2 ) − 1 2 σ 2 ∑ n = 1 N { τ n ∥ t n − μ ∥ 2 − 2 τ n x n T W T ( t n − μ ) + t r [ W τ n x n x n T W T ] } + N ν 2 log ⁡ ( ν 2 ) − N log ⁡ ( Γ ( ν 2 ) ) + ν 2 ∑ n = 1 N ( log ⁡ τ n − τ n ) = L ( θ ) + L ( ν ) \begin{aligned} L_c(\Theta) &= \sum_{n=1}^{N} \log(p(t_n,x_n,\tau_n)) \\ &= \sum_{n=1}^{N} \log[p(t_n|x_n,\tau_n)p(x_n|\tau_n)p(\tau_n)] \\ &=\sum_{n=1}^{N} \{-\frac{d}{2} \log(2\pi) - \frac{d}{2} \log(\sigma^2) + \frac{d}{2} \log(\tau_n) - \frac{\tau_n}{2\sigma^2}(t_n - W x_n - \mu)^{T}(t_n - W x_n - \mu) \\ &-\frac{q}{2} \log(2\pi) + \frac{d}{2} \log(\tau_n) - \frac{t_n} {2}x_n^{T}x_n \\ &+\frac{\nu}{2} \log(\frac{\nu}{2}) - \log(\Gamma(\frac{\nu}{2})) + \frac{\nu}{2}(\log(\tau_n)-\tau_n)-\log(t_n)\}(去掉与参数无关项)\\ &\propto-\frac{Nd}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=1}^{N}\{\tau_n\|t_n-\mu\|^2 -2\tau_n x_n^{T}W^{T}(t_n-\mu) + tr[W\tau_n x_n x_n^{T}W^T] \}\\ &+\frac{N\nu}{2}\log(\frac{\nu}{2}) - N\log(\Gamma(\frac{\nu}{2})) + \frac{\nu}{2} \sum_{n=1}^{N}(\log\tau_n - \tau_n)\\ &=L(\theta) +L(\nu) \end{aligned} Lc(Θ)=n=1Nlog(p(tn,xn,τn))=n=1Nlog[p(tnxn,τn)p(xnτn)p(τn)]=n=1N{2dlog(2π)2dlog(σ2)+2dlog(τn)2σ2τn(tnWxnμ)T(tnWxnμ)2qlog(2π)+2dlog(τn)2tnxnTxn+2νlog(2ν)log(Γ(2ν))+2ν(log(τn)τn)log(tn)}(去掉与参数无关项)2Ndlog(σ2)2σ21n=1N{τntnμ22τnxnTWT(tnμ)+tr[WτnxnxnTWT]}+2Nνlog(2ν)Nlog(Γ(2ν))+2νn=1N(logτnτn)=L(θ)+L(ν)
根据以上完全数据的似然函数,要想使用EM算法,在观测数据 t n t_n tn下, τ n \tau_n τn x n x_n xn的后验分布。

3.2 给定观测样本,两个潜变量的后验分布

3.2.1 更新 τ n \tau_n τn的分布

p ( τ n ∣ t n ) ∝ p ( t n ∣ τ n ) p ( τ n ) = ( 2 π ) − d 2 ∣ Σ ∣ − 1 2 τ d + ν 2 − 1 exp ⁡ { − ( t n − μ ) T Σ − 1 ( t n − μ ) + ν 2 τ n } ∝ τ d + ν 2 − 1 exp ⁡ { − ( t n − μ ) T Σ − 1 ( t n − μ ) + ν 2 τ n } = G a ( d + ν 2 , ( t n − μ ) T Σ − 1 ( t n − μ ) + ν 2 ) \begin{aligned} p(\tau_n|t_n) &\propto p(t_n|\tau_n)p(\tau_n) \\ &=(2\pi)^{-\frac{d}{2}}|\Sigma|^{-\frac{1}{2}}\tau^{\frac{d+\nu}{2}-1} \exp\{-\frac{(t_n-\mu)^{T}\Sigma^{-1}(t_n-\mu)+\nu}{2} \tau_n\} \\ &\propto \tau^{\frac{d+\nu}{2}-1} \exp\{-\frac{(t_n-\mu)^{T}\Sigma^{-1}(t_n-\mu)+\nu}{2} \tau_n\} \\ &=Ga(\frac{d+\nu}{2},\frac{(t_n-\mu)^{T}\Sigma^{-1}(t_n-\mu)+\nu}{2}) \end{aligned} p(τntn)p(tnτn)p(τn)=(2π)2d∣Σ21τ2d+ν1exp{2(tnμ)TΣ1(tnμ)+ντn}τ2d+ν1exp{2(tnμ)TΣ1(tnμ)+ντn}=Ga(2d+ν,2(tnμ)TΣ1(tnμ)+ν)

3.2.2 更新潜在因子 x n x_n xn的分布

p ( x n ∣ t n , τ n ) ∝ p ( t n ∣ x n , τ n ) p ( x n ∣ τ n ) ∝ exp ⁡ { − 1 2 [ τ n x n T ( W t W + σ 2 I ) x n σ 2 − 2 τ n x n T W T ( t n − μ ) σ 2 ] } \begin{aligned} p(x_n|t_n,\tau_n) &\propto p(t_n|x_n,\tau_n)p(x_n|\tau_n) \\ &\propto \exp\{-\frac{1}{2}[\frac{\tau_n x_n^{T}(W^{t}W + \sigma^2 I)x_n}{\sigma^2}-\frac{2\tau_n x_n^{T}W^{T}(t_n-\mu)}{\sigma^2}]\} \end{aligned} p(xntn,τn)p(tnxn,τn)p(xnτn)exp{21[σ2τnxnT(WtW+σ2I)xnσ22τnxnTWT(tnμ)]}
由于 t n ∣ x n , τ n t_n|x_n,\tau_n tnxn,τn是正态分布, x n ∣ τ n x_n|\tau_n xnτn也是正态分布,因此后验分布 x n ∣ t n , τ n x_n|t_n,\tau_n xntn,τn必然也是正态分布,假设 x n ∣ t n , τ n ∼ N ( m , Ψ ) x_n|t_n,\tau_n \sim N(m,\Psi) xntn,τnN(m,Ψ),则有其概率密度函数的形式为:
p ( x n ∣ t n , τ n ) ∝ exp ⁡ { − 1 2 ( x n T Ψ − 1 x n − 2 x n T Ψ − 1 m ) } \begin{aligned} p(x_n|t_n,\tau_n) &\propto \exp\{-\frac{1}{2}(x_n^{T}\Psi^{-1}x_n - 2 x_n^{T}\Psi^{-1}m)\} \end{aligned} p(xntn,τn)exp{21(xnTΨ1xn2xnTΨ1m)}
故由上式可得:
Ψ − 1 = τ n W T W + σ 2 I σ 2 Ψ = τ n − 1 σ 2 M − 1 = τ n − 1 P = τ n − 1 [ I − W T Σ − 1 W ] m = W T Σ − 1 ( t n − μ ) \begin{aligned} \Psi^{-1} = \tau_n \frac{W^{T}W + \sigma^2 I}{\sigma^2} \\ \Psi = \tau_n^{-1} \sigma^2M^{-1} = \tau_n^{-1} P = \tau_n^{-1} [I-W^T \Sigma^{-1}W] \\ m = W^T \Sigma^{-1}(t_n-\mu) \end{aligned} Ψ1=τnσ2WTW+σ2IΨ=τn1σ2M1=τn1P=τn1[IWTΣ1W]m=WTΣ1(tnμ)

3.3 Q函数及其最大化

Q函数是完全数据的对数似然关于两个潜变量 x n x_n xn τ n \tau_n τn的后验期望。

假设 Z ∼ G a ( α , β ) Z \sim Ga(\alpha,\beta) ZGa(α,β),故 E ( Z ) = α / β E(Z) = \alpha/\beta E(Z)=α/β,因此, E ( τ n ∣ t n ) = d + ν ( t n − μ ) T Σ − 1 ( t n − μ ) + ν E(\tau_n|t_n)=\frac{d+\nu}{(t_n-\mu)^{T}\Sigma^{-1}(t_n-\mu)+\nu} E(τntn)=(tnμ)TΣ1(tnμ)+νd+ν

E ( log ⁡ τ n ) = ψ ( d + ν 2 ) − log ⁡ [ ( t n − μ ) T Σ − 1 ( t n − μ ) + ν 2 ] E(\log\tau_n) = \psi(\frac{d+\nu}{2}) - \log [\frac{(t_n-\mu)^{T}\Sigma^{-1}(t_n-\mu) + \nu}{2}] E(logτn)=ψ(2d+ν)log[2(tnμ)TΣ1(tnμ)+ν]

τ n \tau_n τn服从的是伽马分布,伽马分布属于指数族分布,而 log ⁡ τ n \log\tau_n logτn是伽马分布的一个充分统计量,因此该公式的计算使用了指数组分布的充分统计量的期望,即 E ( T ) = b ( θ ) ˙ E(T)=\dot{b(\theta)} E(T)=b(θ)˙,利用此公式即可轻松计算 E ( log ⁡ τ n ) E(\log\tau_n) E(logτn)

E ( x n ∣ t n , τ n ) = W T Σ − 1 ( t n − μ ) E(x_n|t_n,\tau_n) = W^{T}\Sigma^{-1}(t_n-\mu) E(xntn,τn)=WTΣ1(tnμ)

E ( x n x n T ∣ t n , τ n ) = τ n − 1 σ 2 ( σ 2 I + W T W ) − 1 + E ( x n ∣ t n , τ n ) E ( x n T ∣ t n , τ n ) E(x_n x_n^{T}|t_n,\tau_n) = \tau_n^{-1} \sigma^2 (\sigma^2I + W^{T}W)^{-1} + E(x_n|t_n,\tau_n)E(x_n^{T}|t_n,\tau_n) E(xnxnTtn,τn)=τn1σ2(σ2I+WTW)1+E(xntn,τn)E(xnTtn,τn)

E ( τ n x n T ∣ t n ) = E ( τ n E ( x n T ∣ t n ) ∣ t n ) = E ( τ n ∣ t n ) E ( x n T ∣ t n ) ( 因为 E ( x n T ∣ t n ) 是常数 ) E(\tau_n x_n^{T}|t_n) = E(\tau_n E(x_n^{T}|t_n)|t_n)=E(\tau_n|t_n)E(x_n^{T}|t_n) (因为E(x_n^{T}|t_n)是常数) E(τnxnTtn)=E(τnE(xnTtn)tn)=E(τntn)E(xnTtn)(因为E(xnTtn)是常数)

E ( τ n x n x n T ∣ t n ) = E ( τ n E ( x n x n T ∣ t n , τ n ) ∣ t n ) = σ 2 M − 1 + E ( τ n ∣ t n ) E ( x n ∣ t n , τ n ) E ( x n T ∣ t n , τ n ) E(\tau_n x_n x_n^{T}|t_n) = E(\tau_n E(x_n x_n^{T}|t_n,\tau_n)|t_n)=\sigma^2 M^{-1} + E(\tau_n|t_n)E(x_n|t_n,\tau_n)E(x_n^{T}|t_n,\tau_n) E(τnxnxnTtn)=E(τnE(xnxnTtn,τn)tn)=σ2M1+E(τntn)E(xntn,τn)E(xnTtn,τn)

其中, M = σ 2 I + W T W M = \sigma^{2} I +W^{T}W M=σ2I+WTW

根据上面的后验期望公式,完全数据的对数似然函数转化为Q函数为:

Q 1 ( Θ ) = Q 1 ( μ , W , σ 2 ) = − N d 2 log ⁡ σ 2 − 1 2 σ 2 ∑ n = 1 N { E ( τ n ∣ t n ) ∥ t n − μ ∥ 2 − 2 E ( τ n x n T ∣ t n ) W T ( t n − μ ) + t r [ W E ( τ n x n x n T ) W T ] } \begin{aligned} Q_1(\Theta) &= Q_1(\mu,W,\sigma^2) \\ &=-\frac{Nd}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{n=1}^{N}\{E(\tau_n|t_n)\|t_n-\mu\|^{2} - 2E(\tau_n x_n^{T}|t_n)W^{T}(t_n-\mu) + tr[WE(\tau_n x_n x_n^{T})W^{T}]\} \end{aligned} Q1(Θ)=Q1(μ,W,σ2)=2Ndlogσ22σ21n=1N{E(τntn)tnμ22E(τnxnTtn)WT(tnμ)+tr[WE(τnxnxnT)WT]}

Q 2 ( ν ) = N ν 2 log ⁡ ν 2 − N log ⁡ Γ ( ν 2 ) + ν 2 ∑ n = 1 N E ( log ⁡ τ n − τ n ) \begin{aligned} Q_2(\nu) = \frac{N\nu}{2}\log\frac{\nu}{2} - N\log\Gamma(\frac{\nu}{2}) + \frac{\nu}{2} \sum_{n=1}^{N}E(\log\tau_n - \tau_n) \end{aligned} Q2(ν)=2Nνlog2νNlogΓ(2ν)+2νn=1NE(logτnτn)

对上述的 Q 1 Q_1 Q1 Q 2 Q_2 Q2函数分别关于 ( μ , W , σ 2 ) (\mu,W,\sigma^2) (μ,W,σ2) ν \nu ν求导数:

∂ Q 1 ∂ μ = 1 2 σ 2 ∑ n = 1 N { 2 E ( τ n ∣ t n ) ( t n − μ ) − 2 W E ( τ n x n ∣ t n ) } = 0 \frac{\partial Q_1}{\partial \mu} = \frac{1}{2\sigma^2} \sum_{n=1}^{N}\{2E(\tau_n|t_n)(t_n-\mu) - 2WE(\tau_n x_n|t_n)\} = 0 μQ1=2σ21n=1N{2E(τntn)(tnμ)2WE(τnxntn)}=0

∑ n = 1 N E ( τ n ∣ t n ) ( t n − μ ) = W ∑ n = 1 N E ( τ n ∣ t n ) E ( x n ∣ τ n ) \sum_{n=1}^{N}E(\tau_n|t_n)(t_n-\mu) =W \sum_{n=1}^{N} E(\tau_n|t_n) E(x_n|\tau_n) n=1NE(τntn)(tnμ)=Wn=1NE(τntn)E(xnτn)

μ ( t + 1 ) = ∑ n = 1 N E ( τ n ∣ t n ) [ t n − W ( t + 1 ) E ( x n ∣ t n ) ] ∑ n = 1 N E ( τ n ∣ t n ) \mu^{(t+1)} = \frac{\sum_{n=1}^{N} E(\tau_n|t_n) [t_n - W^{(t+1)} E(x_n|t_n)]}{\sum_{n=1}^{N} E(\tau_n|t_n)} μ(t+1)=n=1NE(τntn)n=1NE(τntn)[tnW(t+1)E(xntn)]

ρ = 1 N ∑ n = 1 N E ( τ n ∣ t n ) \rho = \frac{1}{N} \sum_{n=1}^{N}E(\tau_n|t_n) ρ=N1n=1NE(τntn), μ ∗ = 1 N ρ ∑ n = 1 N E ( τ n ∣ t n ) t n \mu^{*} = \frac{1}{N \rho} \sum_{n=1}^{N} E(\tau_n|t_n)t_n μ=1n=1NE(τntn)tn,则有 μ \mu μ的更新迭代公式为:

μ ( t + 1 ) = μ ( t ) + σ 2 ( t ) [ Σ ( t ) ] − 1 ( μ ∗ ( t ) − μ ( t ) ) \mu^{(t+1)} = \mu^{(t)} + \sigma^{2(t)}[\Sigma^{(t)}]^{-1} (\mu^{*(t)} - \mu^{(t)}) μ(t+1)=μ(t)+σ2(t)[Σ(t)]1(μ(t)μ(t))

求解W:
d Q 1 ∣ W = 1 2 σ 2 ∑ n = 1 N { 2 t r [ ( t n − μ ) E ( τ n x n T ∣ t n ) d W T ] − 2 t r [ W E ( τ n x n x n T d W T ) ] } = 1 σ 2 t r { ∑ n = 1 N [ ( t n − μ ) E ( τ n x n T ) − W E ( τ n x n x n T ) ] d W T } \begin{aligned} dQ1_{|W} &= \frac{1}{2\sigma^2} \sum_{n=1}^{N} \{2tr[(t_n-\mu) E(\tau_n x_n^{T}|t_n) dW^{T}] -2tr[W E(\tau_n x_n x_n^{T} dW^{T})]\} \\ &=\frac{1}{\sigma^2} tr\{\sum_{n=1}^{N}[(t_n-\mu)E(\tau_n x_n^{T}) - W E(\tau_n x_n x_n^{T})] dW^T \} \end{aligned} dQ1W=2σ21n=1N{2tr[(tnμ)E(τnxnTtn)dWT]2tr[WE(τnxnxnTdWT)]}=σ21tr{n=1N[(tnμ)E(τnxnT)WE(τnxnxnT)]dWT}

∂ Q 1 ∂ W = 1 σ 2 ∑ n = 1 N { ( t n − μ ) E ( τ n x n T ) − W E ( τ n x n x n T ) } = 0 \frac{\partial Q_1}{\partial W} = \frac{1}{\sigma^2} \sum_{n=1}^{N} \{(t_n-\mu)E(\tau_n x_n^{T}) - W E(\tau_n x_n x_n^{T}) \}=0 WQ1=σ21n=1N{(tnμ)E(τnxnT)WE(τnxnxnT)}=0

W ( t + 1 ) = [ ∑ n = 1 N ( t n − μ ( t + 1 ) ) E ( τ n x n T ∣ t n ) ] × [ ∑ n = 1 N E ( τ n x n x n T ∣ t n ) ] − 1 W^{(t+1)} = [\sum_{n=1}^{N} (t_n - \mu^{(t+1)}) E(\tau_n x_n^{T}|t_n)] \times [\sum_{n=1}^{N} E(\tau_n x_n x_n^{T}|t_n)]^{-1} W(t+1)=[n=1N(tnμ(t+1))E(τnxnTtn)]×[n=1NE(τnxnxnTtn)]1

求解 σ 2 \sigma^2 σ2:
∂ Q 1 ∂ σ 2 = − N d 2 σ 2 + 1 σ 4 ∑ n = 1 N { E ( τ n ∣ t n ) ∥ t n − μ ∥ 2 − 2 E ( τ n x n T ∣ t n ) W T ( t n − μ ) + t r [ W E ( τ n x n x n T ) W T ] } = 0 \frac{\partial Q_1}{\partial \sigma^2} = -\frac{Nd}{2\sigma^2} + \frac{1}{\sigma^4} \sum_{n=1}^{N} \{ E(\tau_n|t_n)\|t_n-\mu\|^{2} - 2E(\tau_n x_n^{T}|t_n)W^{T}(t_n-\mu) + tr[WE(\tau_n x_n x_n^{T})W^{T}] \} = 0 σ2Q1=2σ2Nd+σ41n=1N{E(τntn)tnμ22E(τnxnTtn)WT(tnμ)+tr[WE(τnxnxnT)WT]}=0

σ 2 ( t + 1 ) = 1 N d ∑ n = 1 N { E ( τ n ∣ t n ) ∥ t n − μ ( t + 1 ) ∥ 2 − 2 E ( τ n x n T ∣ t n ) W ( t + 1 ) T ( t n − μ ( t + 1 ) ) + t r [ W ( t + 1 ) E ( τ n x n x n T ) W ( t + 1 ) T ] } \sigma^{2(t+1)} = \frac{1}{Nd} \sum_{n=1}^{N} \{ E(\tau_n|t_n)\|t_n-\mu^{(t+1)} \|^{2} - 2E(\tau_n x_n^{T}|t_n)W^{(t+1)T}(t_n-\mu^{(t+1)}) + tr[W^{(t+1)} E(\tau_n x_n x_n^{T})W^{(t+1)T}] \} σ2(t+1)=Nd1n=1N{E(τntn)tnμ(t+1)22E(τnxnTtn)W(t+1)T(tnμ(t+1))+tr[W(t+1)E(τnxnxnT)W(t+1)T]}

求解 ν \nu ν:

d Q 2 ( ν ) d ν = 1 2 ∑ n = 1 N [ 1 + log ⁡ ν 2 − ψ ( ν 2 ) + E ( log ⁡ τ n ∣ t n ) − E ( τ n ∣ t n ) ] = 0 \frac{dQ_2(\nu)}{d \nu} = \frac{1}{2} \sum_{n=1}^{N}[1 + \log\frac{\nu}{2} - \psi(\frac{\nu}{2}) + E(\log\tau_n|t_n) - E(\tau_n|t_n)]=0 dνdQ2(ν)=21n=1N[1+log2νψ(2ν)+E(logτntn)E(τntn)]=0

上述式子不能直接通过反解得到 ν \nu ν的数学表达式,因此需要对该式通过一元搜索得到最佳的零解,在此使用tppca论文中的近似解表达式:

y = − 1 N ∑ n = 1 N [ E ( log ⁡ ( τ n ∣ t n ) − E ( τ n ∣ t n ) ] y = - \frac{1}{N} \sum_{n=1}^{N} [E(\log(\tau_n|t_n) - E(\tau_n|t_n)] y=N1n=1N[E(log(τntn)E(τntn)]

ν ( t + 1 ) = 2 y + log ⁡ y − 1 + 0.0416 × [ 1 + erf ( 0.6594 log ⁡ ( 2.1971 y + log ⁡ y − 1 ) ) ] \nu^{(t+1)} = \frac{2}{y+\log y -1} + 0.0416 \times [ 1 + \textrm{erf}(0.6594\log(\frac{2.1971}{y + \log y -1 }))] ν(t+1)=y+logy12+0.0416×[1+erf(0.6594log(y+logy12.1971))]

其中, erf ( x ) = ( 2 π ) ∫ 0 x exp ⁡ ( − t 2 ) d t \textrm{erf}(x) = (\frac{2}{\sqrt{\pi}})\int_{0}^{x} \exp(-t^2)\textrm{d}t erf(x)=(π 2)0xexp(t2)dt

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

大浪淘沙_scc

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值