CS231n
Lecture 13: Generative Models
Unsupervised Learning
Unlike supervised learning, which learns a mapping $f: x \mapsto y$ from labeled training data, unsupervised learning aims to learn the hidden structure of unlabeled data; examples include clustering (K-means), dimensionality reduction (PCA), feature learning (autoencoders), and density estimation.
Generative Models: Given training data, generate new samples from the same distribution
This is really a density estimation problem (learning $p_{model}(x) \approx p_{data}(x)$), which comes in two flavors:
- Explicit density: explicitly define and solve for $p_{model}(x)$
- Implicit density: learn a model that can sample from $p_{model}(x)$ without explicitly defining it
Applications
- Artwork, super-resolution, colorization, etc.
- Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
- Training generative models can also enable inference of latent representations that can be useful as general features (a nice idea)
Taxonomy
PixelRNN
Starting from the top-left corner of the image, generate pixels one by one in raster order (left to right, top to bottom), using an RNN to model each pixel's dependency on all previously generated pixels.
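Both pixel-level models maximize an explicit, tractable density: by the chain rule, the likelihood of an image $x$ factorizes over pixels in this raster order,

$$p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \dots, x_{i-1}),$$

and training maximizes the likelihood of the training data.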
PixelCNN
Same idea as PixelRNN, except the dependency on previous pixels is modeled with a CNN over the already-generated context region.
Training is faster than PixelRNN, since the context pixels are known from the training images and the convolutions can be parallelized.
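A minimal sketch of the masked convolution at the core of PixelCNN, in PyTorch (the layer widths and the two-layer stack are illustrative assumptions, not the lecture's architecture). The mask zeroes kernel weights at and after the current pixel, so each output depends only on pixels above and to the left:

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel is masked so each pixel sees only
    pixels above it and to its left (raster-scan order).
    Mask 'A' also hides the center pixel (first layer);
    mask 'B' keeps it (subsequent layers)."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        self.register_buffer('mask', torch.ones_like(self.weight))
        _, _, kh, kw = self.weight.shape
        self.mask[:, :, kh // 2 + 1:, :] = 0                          # rows below center
        self.mask[:, :, kh // 2, kw // 2 + (mask_type == 'B'):] = 0   # right of (or at) center

    def forward(self, x):
        self.weight.data *= self.mask  # enforce the autoregressive structure
        return super().forward(x)

# tiny PixelCNN for 1-channel images with 256 intensity levels
pixel_cnn = nn.Sequential(
    MaskedConv2d('A', 1, 64, kernel_size=7, padding=3), nn.ReLU(),
    MaskedConv2d('B', 64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 256, kernel_size=1),  # per-pixel 256-way softmax logits
)
```

Generation is still sequential, though: for each pixel in raster order, run the network, sample that pixel from the softmax, write it into the image, and repeat; hence the slow sampling noted below.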
Advantages of explicit density models
- Can explicitly compute the likelihood $p(x)$
- Explicit likelihood of the training data gives a good evaluation metric
- Good samples

Drawback: sequential generation ⇒ slow
Variational Auto-Encoder
Autoencoder: $x \xrightarrow{\text{encoder}} z \xrightarrow{\text{decoder}} \hat{x}$, trained with $L(x) = \|x - \hat{x}\|^2$; it learns a lower-dimensional feature representation $z$ from unlabeled training data. After training, throw away the decoder; the encoder can be used to initialize a supervised model.
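A minimal autoencoder sketch in PyTorch (fully-connected, with illustrative dimensions; these are assumptions, not the lecture's architecture):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, z_dim=32):
        super().__init__()
        # encoder: x -> lower-dimensional feature z
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, z_dim),
        )
        # decoder: z -> reconstruction x_hat
        self.decoder = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(16, 784)               # a batch of flattened images
loss = ((x - model(x)) ** 2).mean()    # L2 reconstruction loss
```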
Try generating new images from an autoencoder ⇒ VAE
- Choose the prior $p(z)$ to be simple, e.g. a unit Gaussian: $z \sim \mathcal{N}(0, I)$
- The conditional $p(x|z)$ is complex (it generates an image) ⇒ represent it with a neural network
Training
In principle the data likelihood is $p_\theta(x) = \int p_\theta(z)\, p_\theta(x|z)\, dz$, but this integral is intractable: we cannot evaluate $p_\theta(x|z)$ for every $z$. The posterior $p_\theta(z|x) = p_\theta(x|z)\, p_\theta(z) / p_\theta(x)$ is likewise intractable.
Solution: alongside the VAE decoder, define an additional encoder $q_\phi(z|x)$ that approximates the intractable posterior $p_\theta(z|x)$.
Both the encoder and the decoder are probabilistic: passing $x$ through the encoder yields $z|x \sim \mathcal{N}(\mu_{z|x}, \Sigma_{z|x})$, the representation in latent space; the decoder $p_\theta(x|z)$ then maps the latent variable $z$ back to the input space, yielding $x|z \sim \mathcal{N}(\mu_{x|z}, \Sigma_{x|z})$. Hence the encoder and decoder are also called the recognition/inference and generation networks.
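A sketch of the two networks in PyTorch, assuming diagonal Gaussians (the dimensions and single hidden layer are illustrative assumptions). The `reparameterize` step is the standard trick that writes $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, so sampling stays differentiable:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, in_dim=784, z_dim=20, hidden=400):
        super().__init__()
        # recognition/inference network q_phi(z|x): outputs mean and log-variance
        self.enc = nn.Linear(in_dim, hidden)
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        # generation network p_theta(x|z)
        self.dec = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, in_dim), nn.Sigmoid(),  # pixel means in [0, 1]
        )

    def encode(self, x):
        h = torch.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        eps = torch.randn_like(mu)                 # eps ~ N(0, I)
        return mu + torch.exp(0.5 * logvar) * eps  # z = mu + sigma * eps

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar
```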
Derivation
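A sketch of the standard lower-bound derivation, writing the KL term first so it lines up with the two loss terms described below:

$$
\begin{aligned}
\log p_\theta(x) &= \mathbb{E}_{z \sim q_\phi(z|x)}\left[\log p_\theta(x)\right] \\
&= \mathbb{E}_{z}\left[\log \frac{p_\theta(x|z)\, p_\theta(z)}{p_\theta(z|x)}\right] \quad \text{(Bayes' rule)} \\
&= \mathbb{E}_{z}\left[\log \frac{p_\theta(x|z)\, p_\theta(z)}{p_\theta(z|x)} \cdot \frac{q_\phi(z|x)}{q_\phi(z|x)}\right] \\
&= -D_{KL}\big(q_\phi(z|x) \,\|\, p_\theta(z)\big) + \mathbb{E}_{z}\left[\log p_\theta(x|z)\right] + D_{KL}\big(q_\phi(z|x) \,\|\, p_\theta(z|x)\big) \\
&\ge \underbrace{-D_{KL}\big(q_\phi(z|x) \,\|\, p_\theta(z)\big) + \mathbb{E}_{z}\left[\log p_\theta(x|z)\right]}_{\text{ELBO } \mathcal{L}(x;\theta,\phi)}
\end{aligned}
$$

The dropped KL term involves the intractable $p_\theta(z|x)$ but is always $\ge 0$, so $\mathcal{L}$ is a tractable lower bound on $\log p_\theta(x)$, maximized jointly over $\theta$ and $\phi$.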
The first term (the KL) is the encoder's loss: it pulls $q_\phi(z|x)$ toward the prior $p(z)$. The second term is the decoder's reconstruction loss. The two networks are trained jointly by maximizing the ELBO; once training is done, the decoder alone is enough to generate new data by sampling $z \sim p(z)$ and decoding.
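Continuing the sketch above, the two loss terms and post-training generation (the KL is the closed-form expression for a diagonal Gaussian against $\mathcal{N}(0, I)$; binary cross-entropy stands in for $-\log p_\theta(x|z)$ when pixels lie in $[0, 1]$):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar):
    # decoder loss: reconstruction term, -E[log p_theta(x|z)]
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
    # encoder loss: KL(q_phi(z|x) || N(0, I)) in closed form
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # = -ELBO up to constants; minimize it

model = VAE()                            # the class from the sketch above
x = torch.rand(16, 784)                  # batch of images scaled to [0, 1]
x_hat, mu, logvar = model(x)
loss = vae_loss(x, x_hat, mu, logvar)

# after training, keep only the decoder: sample z ~ N(0, I) and decode
with torch.no_grad():
    z = torch.randn(64, 20)              # 20 = z_dim
    samples = model.dec(z)               # 64 generated images
```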