Lecture 13: Generative Models

These notes cover generative models in unsupervised learning, focusing on explicit and implicit density estimation methods, including PixelRNN/CNN and the Variational Autoencoder (VAE), and explain how these models are used for image generation and for learning latent representations.



Unsupervised Learning

In contrast to supervised learning, which learns a mapping $f: x \mapsto y$ from labeled training data, unsupervised learning aims to learn the hidden structure of unlabeled data; examples include clustering (K-means), dimensionality reduction (PCA), feature learning (autoencoders), and density estimation.

Generative Models: Given training data, generate new samples from the same distribution.
This is essentially a density estimation problem (learn $p_{model}(x)$ to be similar to $p_{data}(x)$), which can be approached in two ways:

  • Explicit: explicitly define and solve for $p_{model}(x)$
  • Implicit: learn a model that can sample from $p_{model}(x)$ without explicitly defining it

Applications

  • artwork, super-resolution, colorization, etc
  • Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
  • Training generative models can also enable inference of latent representations that can be useful as general features (a useful idea)

Taxonomy

Generative models:

  • Explicit density
    • Tractable density: PixelRNN/CNN
    • Approximate density
      • Variational: Variational Auto-Encoder
      • Markov chain: Boltzmann Machine
  • Implicit density
    • Direct: GAN
    • Markov chain: GSN
PixelRNN

Starting from the top-left corner of the image, traverse the pixels left-to-right, top-to-bottom, and use an RNN to model each pixel's dependency on the previously generated pixels; the likelihood then factorizes by the chain rule, as shown below.
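The explicit density being modeled is the standard autoregressive factorization over pixels (in the raster-scan order just described):

$$p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \dots, x_{i-1})$$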

PixelCNN

Same as PixelRNN, except that a CNN over a masked neighborhood of already-generated pixels is used to model the dependencies.
Training is faster than PixelRNN
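A minimal sketch of the masked convolution at the core of a PixelCNN-style model (class name, layer sizes, and hyperparameters are illustrative assumptions, not taken from the lecture):

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """2D convolution whose kernel is masked so each pixel only sees pixels
    above it and to its left (mask type 'A' also hides the center pixel)."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        self.register_buffer('mask', torch.ones_like(self.weight))
        _, _, kh, kw = self.weight.shape
        # Zero out "future" positions: the center row to the right of the
        # center (including the center for type 'A'), and all rows below.
        self.mask[:, :, kh // 2, kw // 2 + (mask_type == 'B'):] = 0
        self.mask[:, :, kh // 2 + 1:, :] = 0

    def forward(self, x):
        self.weight.data *= self.mask   # enforce causality before each conv
        return super().forward(x)

# Tiny PixelCNN-style stack: first layer uses mask 'A', later layers mask 'B'.
model = nn.Sequential(
    MaskedConv2d('A', 1, 64, kernel_size=7, padding=3), nn.ReLU(),
    MaskedConv2d('B', 64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 256, kernel_size=1),   # logits over 256 pixel values
)

x = torch.rand(8, 1, 28, 28)             # batch of 28x28 grayscale images
logits = model(x)                        # shape (8, 256, 28, 28)
```

Training can run in parallel over all pixels (hence faster than PixelRNN), but generation still proceeds one pixel at a time.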
Advantages of explicit density modeling

  • Can explicitly compute the likelihood $p(x)$
  • Explicit likelihood of training data gives a good evaluation metric
  • Good samples

Drawback: sequential generation is slow

Variational Auto-Encoder

Autoencoder: $x \xrightarrow{\text{encoder}} z \xrightarrow{\text{decoder}} \hat{x}$, trained with $L(x) = \lVert x - \hat{x} \rVert^2$; it learns a lower-dimensional feature representation $z$ from unlabeled training data $x$. After training, throw away the decoder; the encoder can be used to initialize a supervised model.
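A minimal sketch of such an autoencoder in PyTorch (layer sizes and names are illustrative assumptions, not from the lecture):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Plain autoencoder: compress x to a low-dimensional z, then reconstruct."""
    def __init__(self, in_dim=784, z_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, z_dim))
        self.decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)        # lower-dimensional feature representation
        return self.decoder(z)     # reconstruction x_hat

model = Autoencoder()
x = torch.rand(16, 784)                      # e.g. flattened 28x28 images
loss = ((x - model(x)) ** 2).mean()          # L(x) = ||x - x_hat||^2
loss.backward()
```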
Can we generate new images from an autoencoder? This motivates the Variational Autoencoder (VAE):

$$z \sim p_{\theta^*}(z) \xrightarrow{\text{VAE}} x \sim p_{\theta^*}(x|z)$$

  • Choose the prior $p(z)$ to be simple, e.g. Gaussian: $p(z) \sim \mathcal{N}(0, 1)$
  • The conditional $p(x|z)$ is complex (it generates an image), so it is represented with a neural network

Train
In principle, $p_\theta(x) = \int p_\theta(z)\, p_\theta(x|z)\, dz$, but this integral is intractable: we cannot evaluate $p_\theta(x|z)$ for every $z$. The posterior

$$p_\theta(z|x) = \frac{p_\theta(x|z)\, p_\theta(z)}{p_\theta(x)}$$

is therefore also intractable.
Solution: in addition to the VAE decoder, define an encoder $q_\phi(z|x)$ that approximates $p_\theta(z|x)$.
Both the encoder and the decoder are probabilistic. Given an input $x$, the encoder outputs $q_\phi(z|x)$, from which $z|x \sim \mathcal{N}(\mu_{z|x}, \Sigma_{z|x})$, i.e. the representation in latent space; the decoder $p_\theta(x|z)$ then maps the latent variable $z$ back to input space, giving $x|z \sim \mathcal{N}(\mu_{x|z}, \Sigma_{x|z})$. For this reason the encoder and decoder are also called the recognition/inference network and the generation network.
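A minimal sketch of these two networks with the reparameterization trick for sampling $z|x$ (the layer sizes and the diagonal-Gaussian encoder are illustrative choices, not specified in the lecture):

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Recognition network q_phi(z|x) and generation network p_theta(x|z)."""
    def __init__(self, in_dim=784, z_dim=20, hidden=256):
        super().__init__()
        self.enc = nn.Linear(in_dim, hidden)
        self.enc_mu = nn.Linear(hidden, z_dim)        # mu_{z|x}
        self.enc_logvar = nn.Linear(hidden, z_dim)    # log of diagonal Sigma_{z|x}
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim), nn.Sigmoid())

    def encode(self, x):
        h = torch.relu(self.enc(x))
        return self.enc_mu(h), self.enc_logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps keeps the sampling step differentiable w.r.t. phi
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar

x = torch.rand(16, 784)
x_hat, mu, logvar = VAE()(x)
```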
Derivation
$$\log p_\theta(x) = \mathbb{E}_{z \sim q_\phi(z|x)}\!\left[\log p_\theta(x)\right] = \mathbb{E}_z\!\left[\log \frac{p_\theta(x|z)\, p_\theta(z)}{p_\theta(z|x)}\right] = \mathbb{E}_z\!\left[\log p_\theta(x|z)\right] - D_{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z)\right) + D_{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z|x)\right)$$

The last KL term is intractable but always $\geq 0$, so the first two terms form a tractable lower bound (the ELBO) that we maximize:

$$\max_{\theta,\phi}\; \mathbb{E}_z\!\left[\log p_\theta(x|z)\right] - D_{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z)\right)$$

The first term is a reconstruction loss (the decoder must reconstruct $x$ from $z$ sampled from the encoder), and the KL term regularizes the encoder towards the prior; the two networks are trained jointly. After training, the decoder alone is enough to generate data: sample $z \sim p(z)$ and pass it through $p_\theta(x|z)$.
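A sketch of the corresponding training objective, assuming a Bernoulli decoder and a diagonal-Gaussian encoder so both terms have simple closed forms (the helper name `vae_loss` is illustrative):

```python
import torch
import torch.nn.functional as F

def vae_loss(x_hat, x, mu, logvar):
    """Negative ELBO: reconstruction term + KL(q_phi(z|x) || N(0, I))."""
    # E_z[log p_theta(x|z)] approximated with one sample, Bernoulli likelihood
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
    # Closed-form KL between the diagonal Gaussian q_phi(z|x) and N(0, I)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Example with random tensors standing in for a VAE's outputs
x = torch.rand(16, 784)
x_hat = torch.sigmoid(torch.randn(16, 784))
mu, logvar = torch.zeros(16, 20), torch.zeros(16, 20)
print(vae_loss(x_hat, x, mu, logvar))
```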