The probabilistic graphical model of the VAE is shown below.
Solid lines denote the generative process $p_{\boldsymbol{\theta}}(\mathbf{z})\, p_{\boldsymbol{\theta}}(\mathbf{x} \mid \mathbf{z})$; dashed lines denote the inference process, in which $q_{\phi}(\mathbf{z} \mid \mathbf{x})$ approximates the true posterior $p_{\boldsymbol{\theta}}(\mathbf{z} \mid \mathbf{x})$.
For a dataset $\mathbf{X}=\left\{\mathbf{x}^{(i)}\right\}_{i=1}^{N}$, our goal is to maximize the log-likelihood $\sum_{i=1}^{N} \log p_{\theta}\left(\mathbf x^{(i)}\right)$. However,

$$
\sum_{i=1}^{N} \log p_{\theta}\left(\mathbf x^{(i)}\right)=\sum_{i=1}^{N} \log \int_{\mathbf z^{(i)}} p_{\theta}\left(\mathbf x^{(i)},\mathbf z^{(i)}\right) d \mathbf z^{(i)}
$$

cannot be optimized directly, because the integral over $\mathbf z^{(i)}$ is generally intractable. For brevity, the superscripts on $\mathbf x, \mathbf z$ are dropped below.
First, introduce a variational posterior $q_{\phi}(\mathbf z \mid \mathbf x)$ that is meant to approximate $p_{\theta}(\mathbf z \mid \mathbf x)$. Then

$$
\begin{aligned} KL\left(q_{\phi}(\mathbf z \mid \mathbf x) \,\|\, p_{\theta}(\mathbf z \mid \mathbf x)\right) &=E_{q_{\phi}(\mathbf z \mid \mathbf x)}\left[\log \frac{q_{\phi}(\mathbf z \mid \mathbf x)}{p_{\theta}(\mathbf z \mid \mathbf x)}\right] \\ &=E_{q_{\phi}(\mathbf z \mid \mathbf x)}\left[\log \frac{q_{\phi}(\mathbf z \mid \mathbf x)}{p_{\theta}(\mathbf z, \mathbf x)}\right]+\log p_{\theta}(\mathbf x) \end{aligned}
$$
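The step between the two lines of the KL expansion is easy to miss, so here it is spelled out. It uses only the definition of the conditional, $p_{\theta}(\mathbf z \mid \mathbf x) = p_{\theta}(\mathbf z, \mathbf x) / p_{\theta}(\mathbf x)$, and the fact that $\log p_{\theta}(\mathbf x)$ does not depend on $\mathbf z$:

$$
\begin{aligned}
\log \frac{q_{\phi}(\mathbf z \mid \mathbf x)}{p_{\theta}(\mathbf z \mid \mathbf x)}
&= \log \frac{q_{\phi}(\mathbf z \mid \mathbf x)\, p_{\theta}(\mathbf x)}{p_{\theta}(\mathbf z, \mathbf x)}
= \log \frac{q_{\phi}(\mathbf z \mid \mathbf x)}{p_{\theta}(\mathbf z, \mathbf x)} + \log p_{\theta}(\mathbf x), \\
E_{q_{\phi}(\mathbf z \mid \mathbf x)}\left[\log p_{\theta}(\mathbf x)\right]
&= \log p_{\theta}(\mathbf x) \int q_{\phi}(\mathbf z \mid \mathbf x)\, d\mathbf z
= \log p_{\theta}(\mathbf x).
\end{aligned}
$$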
Therefore,

$$
\log p_{\theta}(\mathbf x)=KL\left(q_{\phi}(\mathbf z \mid \mathbf x) \,\|\, p_{\theta}(\mathbf z \mid \mathbf x)\right)-E_{q_{\phi}(\mathbf z \mid \mathbf x)}\left[\log \frac{q_{\phi}(\mathbf z \mid \mathbf x)}{p_{\theta}(\mathbf z, \mathbf x)}\right]
$$
Since $KL(\cdot \,\|\, \cdot) \geq 0$, it follows that

$$
\log p_{\theta}(\mathbf x) \geq-E_{q_{\phi}(\mathbf z \mid \mathbf x)}\left[\log \frac{q_{\phi}(\mathbf z \mid \mathbf x)}{p_{\theta}(\mathbf z, \mathbf x)}\right]
$$
This gives the evidence lower bound (ELBO):

$$
\begin{aligned} \mathrm{ELBO} &=E_{q_{\phi}(\mathbf z \mid \mathbf x)}\left[\log \frac{p_{\theta}(\mathbf z, \mathbf x)}{q_{\phi}(\mathbf z \mid \mathbf x)}\right] \\ &=E_{q_{\phi}(\mathbf z \mid \mathbf x)}\left[\log p_{\theta}(\mathbf x \mid \mathbf z)\right]-KL\left(q_{\phi}(\mathbf z \mid \mathbf x) \,\|\, p_{\theta}(\mathbf z)\right) \end{aligned}
$$

The second line follows from the factorization $p_{\theta}(\mathbf z, \mathbf x)=p_{\theta}(\mathbf x \mid \mathbf z)\, p_{\theta}(\mathbf z)$.
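As a sanity check, the identity $\log p_{\theta}(\mathbf x) = KL + \mathrm{ELBO}$ can be verified numerically on a toy model where every quantity has a closed form. The model below is an assumption made purely for illustration and is not from the text above: scalar $z \sim N(0, 1)$, $x \mid z \sim N(z, 1)$, so that $x \sim N(0, 2)$ and $z \mid x \sim N(x/2, 1/2)$, with $q(z \mid x) = N(m, v)$ for arbitrary $m$ and $v$.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of a scalar Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def kl_normal(m_q, v_q, m_p, v_p):
    """KL(N(m_q, v_q) || N(m_p, v_p)) for scalar Gaussians (v = variance)."""
    return 0.5 * (math.log(v_p / v_q) + (v_q + (m_q - m_p) ** 2) / v_p - 1.0)

def elbo(x, m, v):
    """ELBO for the toy model z ~ N(0,1), x|z ~ N(z,1) with q(z|x) = N(m, v)."""
    # E_q[log p(x|z)] is closed form here because E_q[(x - z)^2] = (x - m)^2 + v.
    expected_log_lik = -0.5 * (math.log(2.0 * math.pi) + (x - m) ** 2 + v)
    return expected_log_lik - kl_normal(m, v, 0.0, 1.0)

def check(x, m, v):
    """Return (log p(x), ELBO, KL(q || true posterior)) for the toy model."""
    log_px = log_normal_pdf(x, 0.0, 2.0)      # marginal: x ~ N(0, 2)
    kl_post = kl_normal(m, v, x / 2.0, 0.5)   # true posterior: z|x ~ N(x/2, 1/2)
    return log_px, elbo(x, m, v), kl_post

log_px, bound, gap = check(x=1.0, m=0.2, v=0.8)
print(log_px, bound, gap)  # bound + gap should match log_px
```

When $q$ equals the true posterior ($m = x/2$, $v = 1/2$), the gap vanishes and the bound is tight.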
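The second term of this expression can be computed analytically for suitable choices of distributions (for example, when both are Gaussian), while the first term can only be approximated by Monte Carlo sampling. Combined with the reparameterization trick, which keeps the samples differentiable with respect to $\phi$, this yields a reasonably accurate estimate:

$$
\widetilde{\mathrm{ELBO}}=\frac{1}{L} \sum_{l=1}^{L} \log p_{\theta}\left(\mathbf x \mid \mathbf z_{l}\right)-KL\left(q_{\phi}(\mathbf z \mid \mathbf x) \,\|\, p_{\theta}(\mathbf z)\right)
$$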
where $\mathbf z_{l}=g_{\phi}\left(\epsilon_{l}, \mathbf x\right)$ and $\epsilon_{l} \sim p(\epsilon)$. Everything above concerns a single data point. For a full dataset of $N$ points, we can sample a mini-batch of $M$ points at each step and use it to approximate the objective over the whole dataset:

$$
L=\frac{N}{M} \sum_{i=1}^{M} \widetilde{\mathrm{ELBO}}\left(\theta, \phi ; \mathbf x^{(i)}\right)
$$
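The estimator and the mini-batch scaling above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a full VAE: $g_{\phi}$ is taken to be the Gaussian form $\mathbf z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim N(0, I)$, the KL term is passed in as an already-computed number, and `unit_gaussian_log_lik` is a hypothetical stand-in for $\log p_{\theta}(\mathbf x \mid \mathbf z)$ in which the decoder mean is simply $\mathbf z$. All function names are made up for this example.

```python
import numpy as np

def reparameterized_samples(mu, sigma, num_samples, rng):
    """Draw z_l = mu + sigma * eps_l with eps_l ~ N(0, I) (assumed form of g)."""
    mu = np.asarray(mu, dtype=float)
    eps = rng.standard_normal((num_samples,) + mu.shape)
    return mu + np.asarray(sigma, dtype=float) * eps

def elbo_estimate(x, mu, sigma, kl, log_lik, num_samples, rng):
    """(1/L) * sum_l log p(x | z_l) - KL, with the KL value supplied by the caller."""
    z = reparameterized_samples(mu, sigma, num_samples, rng)
    return np.mean([log_lik(x, z_l) for z_l in z]) - kl

def minibatch_objective(per_point_elbos, dataset_size):
    """Scale the sum over the M sampled points by N / M."""
    per_point_elbos = np.asarray(per_point_elbos, dtype=float)
    return dataset_size / per_point_elbos.size * per_point_elbos.sum()

def unit_gaussian_log_lik(x, z):
    """Hypothetical decoder: log N(x; z, I), i.e. the decoder mean is z itself."""
    d = x.size
    return -0.5 * (d * np.log(2.0 * np.pi) + np.sum((x - z) ** 2))

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0])
# kl=0.1 is a placeholder value, not derived from mu and sigma.
estimate = elbo_estimate(x, mu=np.array([0.4, -0.8]), sigma=np.array([0.3, 0.3]),
                         kl=0.1, log_lik=unit_gaussian_log_lik,
                         num_samples=5, rng=rng)
objective = minibatch_objective([estimate], dataset_size=100)
```

Because the randomness enters only through `eps`, the samples are a deterministic function of `mu` and `sigma`, which is what lets gradients flow to the encoder parameters in an autodiff framework.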
That is the core of the VAE. A common choice of distributions is

$$
\begin{aligned} p_{\theta}(\mathbf z) & \sim N(0, I) \\ p_{\theta}(\mathbf x \mid \mathbf z) & \sim N\left(\mu_{\mathbf z}, \Sigma_{\mathbf z}\right) \text { or } \mathrm{Bernoulli}\left(p_{\mathbf z}\right) \\ q_{\phi}(\mathbf z \mid \mathbf x) & \sim N\left(\mu_{\mathbf x}, \Sigma_{\mathbf x}\right) \end{aligned}
$$
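With these choices, the two pieces of the objective take simple forms. The sketch below assumes a diagonal covariance $\Sigma_{\mathbf x} = \mathrm{diag}(\sigma^2)$, parameterized by its log-variance (the text above does not state that the covariance is diagonal), in which case the KL term to the $N(0, I)$ prior is $\frac{1}{2} \sum_j \left(\sigma_j^2 + \mu_j^2 - 1 - \log \sigma_j^2\right)$. The function names are illustrative only.

```python
import numpy as np

def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def bernoulli_log_likelihood(x, probs, eps=1e-7):
    """log p(x | z) for a Bernoulli decoder whose output probabilities are probs."""
    x = np.asarray(x, dtype=float)
    # Clip to avoid log(0) when the decoder outputs exactly 0 or 1.
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    return np.sum(x * np.log(probs) + (1.0 - x) * np.log(1.0 - probs))

# The KL is zero exactly when q matches the prior (mu = 0, variance = 1).
kl_at_prior = kl_diag_gaussian_to_standard_normal([0.0, 0.0], [0.0, 0.0])
kl_shifted = kl_diag_gaussian_to_standard_normal([1.0, 0.0], [0.0, 0.0])
```

The negative of `bernoulli_log_likelihood` is the familiar binary cross-entropy reconstruction loss summed over the dimensions of $\mathbf x$.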