Lecture 13: Generative Models

These notes cover generative models in unsupervised learning, focusing on explicit and implicit density estimation methods, including PixelRNN/CNN and the Variational Autoencoder (VAE), and explain how these models are used for image generation and for learning latent representations.



Unsupervised Learning

In contrast to supervised learning, which learns a mapping $f: x \mapsto y$ from labeled training data, unsupervised learning aims to learn the hidden structure of unlabeled data; examples include clustering (K-means), dimensionality reduction (PCA), feature learning (autoencoders), and density estimation.

Generative Models: Given training data, generate new samples from the same distribution.
This is essentially a density estimation problem (learn $p_{model}(x)$ to be similar to $p_{data}(x)$), which can be approached in two ways:

  • Explicit: explicitly define and solve for $p_{model}(x)$
  • Implicit: learn a model that can sample from $p_{model}(x)$ without explicitly defining it

Applications

  • artwork, super-resolution, colorization, etc
  • Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
  • Training generative models can also enable inference of latent representations that can be useful as general features (a useful idea)

Taxonomy

Generative models:

  • Explicit density
    • Tractable density: PixelRNN/CNN
    • Approximate density
      • Variational: Variational Auto-Encoder
      • Markov chain: Boltzmann Machine
  • Implicit density
    • Direct: GAN
    • Markov chain: GSN
PixelRNN

Starting from the top-left corner of the image, traverse the pixels left-to-right, top-to-bottom, and use an RNN to model each pixel's dependency on the previously generated pixels; the likelihood then factorizes by the chain rule, as shown below.
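The explicit density being modeled is the standard autoregressive factorization over pixels (in the raster-scan order just described):

$$p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \dots, x_{i-1})$$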

PixelCNN

Same as PixelRNN, except that a CNN over a masked neighborhood of already-generated pixels is used to model the dependencies.
Training is faster than PixelRNN
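A minimal sketch of the masked convolution at the core of a PixelCNN-style model (class name, layer sizes, and hyperparameters are illustrative assumptions, not taken from the lecture):

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """2D convolution whose kernel is masked so each pixel only sees pixels
    above it and to its left (mask type 'A' also hides the center pixel)."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        self.register_buffer('mask', torch.ones_like(self.weight))
        _, _, kh, kw = self.weight.shape
        # Zero out "future" positions: the center row to the right of the
        # center (including the center for type 'A'), and all rows below.
        self.mask[:, :, kh // 2, kw // 2 + (mask_type == 'B'):] = 0
        self.mask[:, :, kh // 2 + 1:, :] = 0

    def forward(self, x):
        self.weight.data *= self.mask   # enforce causality before each conv
        return super().forward(x)

# Tiny PixelCNN-style stack: first layer uses mask 'A', later layers mask 'B'.
model = nn.Sequential(
    MaskedConv2d('A', 1, 64, kernel_size=7, padding=3), nn.ReLU(),
    MaskedConv2d('B', 64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 256, kernel_size=1),   # logits over 256 pixel values
)

x = torch.rand(8, 1, 28, 28)             # batch of 28x28 grayscale images
logits = model(x)                        # shape (8, 256, 28, 28)
```

Training can run in parallel over all pixels (hence faster than PixelRNN), but generation still proceeds one pixel at a time.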
Advantages of explicit density modeling

  • Can explicitly compute the likelihood $p(x)$
  • Explicit likelihood of training data gives a good evaluation metric
  • Good samples

Drawback: sequential generation is slow

Variational Auto-Encoder

Autoencoder: $x \xrightarrow{\text{encoder}} z \xrightarrow{\text{decoder}} \hat{x}$, trained with $L(x) = \lVert x - \hat{x} \rVert^2$; it learns a lower-dimensional feature representation $z$ from unlabeled training data $x$. After training, throw away the decoder; the encoder can be used to initialize a supervised model.
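A minimal sketch of such an autoencoder in PyTorch (layer sizes and names are illustrative assumptions, not from the lecture):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Plain autoencoder: compress x to a low-dimensional z, then reconstruct."""
    def __init__(self, in_dim=784, z_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, z_dim))
        self.decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)        # lower-dimensional feature representation
        return self.decoder(z)     # reconstruction x_hat

model = Autoencoder()
x = torch.rand(16, 784)                      # e.g. flattened 28x28 images
loss = ((x - model(x)) ** 2).mean()          # L(x) = ||x - x_hat||^2
loss.backward()
```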
Can we generate new images from an autoencoder? This motivates the Variational Autoencoder (VAE):

$$z \sim p_{\theta^*}(z) \xrightarrow{\text{VAE}} x \sim p_{\theta^*}(x|z)$$

  • Choose the prior $p(z)$ to be simple, e.g. Gaussian: $p(z) \sim \mathcal{N}(0, 1)$
  • The conditional $p(x|z)$ is complex (it generates an image), so it is represented with a neural network

Train
In principle, $p_\theta(x) = \int p_\theta(z)\, p_\theta(x|z)\, dz$, but this integral is intractable: we cannot evaluate $p_\theta(x|z)$ for every $z$. The posterior

$$p_\theta(z|x) = \frac{p_\theta(x|z)\, p_\theta(z)}{p_\theta(x)}$$

is therefore also intractable.
Solution: in addition to the VAE decoder, define an encoder $q_\phi(z|x)$ that approximates $p_\theta(z|x)$.
Both the encoder and the decoder are probabilistic. Given an input $x$, the encoder outputs $q_\phi(z|x)$, from which $z|x \sim \mathcal{N}(\mu_{z|x}, \Sigma_{z|x})$, i.e. the representation in latent space; the decoder $p_\theta(x|z)$ then maps the latent variable $z$ back to input space, giving $x|z \sim \mathcal{N}(\mu_{x|z}, \Sigma_{x|z})$. For this reason the encoder and decoder are also called the recognition/inference network and the generation network.
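A minimal sketch of these two networks with the reparameterization trick for sampling $z|x$ (the layer sizes and the diagonal-Gaussian encoder are illustrative choices, not specified in the lecture):

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Recognition network q_phi(z|x) and generation network p_theta(x|z)."""
    def __init__(self, in_dim=784, z_dim=20, hidden=256):
        super().__init__()
        self.enc = nn.Linear(in_dim, hidden)
        self.enc_mu = nn.Linear(hidden, z_dim)        # mu_{z|x}
        self.enc_logvar = nn.Linear(hidden, z_dim)    # log of diagonal Sigma_{z|x}
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim), nn.Sigmoid())

    def encode(self, x):
        h = torch.relu(self.enc(x))
        return self.enc_mu(h), self.enc_logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps keeps the sampling step differentiable w.r.t. phi
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar

x = torch.rand(16, 784)
x_hat, mu, logvar = VAE()(x)
```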
Derivation
$$\log p_\theta(x) = \mathbb{E}_{z \sim q_\phi(z|x)}\!\left[\log p_\theta(x)\right] = \mathbb{E}_z\!\left[\log \frac{p_\theta(x|z)\, p_\theta(z)}{p_\theta(z|x)}\right] = \mathbb{E}_z\!\left[\log p_\theta(x|z)\right] - D_{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z)\right) + D_{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z|x)\right)$$

The last KL term is intractable but always $\geq 0$, so the first two terms form a tractable lower bound (the ELBO) that we maximize:

$$\max_{\theta,\phi}\; \mathbb{E}_z\!\left[\log p_\theta(x|z)\right] - D_{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z)\right)$$

The first term is a reconstruction loss (the decoder must reconstruct $x$ from $z$ sampled from the encoder), and the KL term regularizes the encoder towards the prior; the two networks are trained jointly. After training, the decoder alone is enough to generate data: sample $z \sim p(z)$ and pass it through $p_\theta(x|z)$.
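A sketch of the corresponding training objective, assuming a Bernoulli decoder and a diagonal-Gaussian encoder so both terms have simple closed forms (the helper name `vae_loss` is illustrative):

```python
import torch
import torch.nn.functional as F

def vae_loss(x_hat, x, mu, logvar):
    """Negative ELBO: reconstruction term + KL(q_phi(z|x) || N(0, I))."""
    # E_z[log p_theta(x|z)] approximated with one sample, Bernoulli likelihood
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
    # Closed-form KL between the diagonal Gaussian q_phi(z|x) and N(0, I)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Example with random tensors standing in for a VAE's outputs
x = torch.rand(16, 784)
x_hat = torch.sigmoid(torch.randn(16, 784))
mu, logvar = torch.zeros(16, 20), torch.zeros(16, 20)
print(vae_loss(x_hat, x, mu, logvar))
```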