Stanford CS231n Course Notes (Part 2)

Lecture 8:    Deep Learning Software

Lecture 9:    CNN Architectures

AlexNet


VGGNet


GoogLeNet

    - 22 total layers with weights (including each parallel layer in an Inception module)
    - "Inception module": design a good local network topology (a "network within a network") and then stack these modules on top of each other

Naive Inception module: the original Inception module is too computationally expensive.
Apply parallel filter operations on the input from the previous layer:
    - Multiple receptive field sizes for convolution (1x1, 3x3, 5x5)
    - Pooling operation (3x3)
Concatenate all filter outputs together depth-wise.
What is the problem with this? Computational complexity.
Solution: "bottleneck" layers that use 1x1 convolutions to reduce feature depth
    - Preserves spatial dimensions, reduces depth
    - Projects depth to a lower dimension (combination of 32 feature maps)
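The complexity argument can be checked with plain arithmetic. Below is a sketch using the 28x28x256 input and the filter counts from the lecture's worked example; the exact bottleneck branch layout (1x1 reductions to depth 64 before the 3x3 and 5x5, plus a 1x1 after pooling) is my reading of the module, so treat the totals as illustrative.

```python
# Multiply count for one conv layer: H_out * W_out * C_out * K * K * C_in
def conv_ops(h, w, c_out, k, c_in):
    return h * w * c_out * k * k * c_in

H = W = 28
C_in = 256

# Naive Inception module: every branch convolves the full 256-deep input
naive = (conv_ops(H, W, 128, 1, C_in)     # 1x1 conv, 128 filters
         + conv_ops(H, W, 192, 3, C_in)   # 3x3 conv, 192 filters
         + conv_ops(H, W, 96, 5, C_in))   # 5x5 conv, 96 filters

# With 1x1 "bottleneck" layers reducing depth to 64 before the large filters
bottleneck = (conv_ops(H, W, 64, 1, C_in)      # 1x1 bottleneck before 3x3
              + conv_ops(H, W, 64, 1, C_in)    # 1x1 bottleneck before 5x5
              + conv_ops(H, W, 128, 1, C_in)   # direct 1x1 branch
              + conv_ops(H, W, 192, 3, 64)     # 3x3 now on depth 64 only
              + conv_ops(H, W, 96, 5, 64)      # 5x5 now on depth 64 only
              + conv_ops(H, W, 64, 1, C_in))   # 1x1 after the pooling branch
print(naive, bottleneck)  # the bottleneck version is several times cheaper
```

The naive module comes to roughly 854M multiplies; the bottleneck version cuts this by more than half because the expensive 3x3 and 5x5 filters now see depth 64 instead of 256.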



ResNet

What happens when we keep stacking deeper layers on a plain CNN?

The 56-layer model performs worse on both training and test error.
-> The deeper model performs worse, but this is not caused by overfitting!
Since the deeper network does worse on the training set as well as the test set, overfitting cannot be the explanation.
The problem is optimization: deeper networks are harder to optimize.
- The deeper model should be able to perform at least as well as the shallower model.
- A solution by construction: copy the learned layers from the shallower model and set the additional layers to identity mappings.
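The identity-mapping construction is what residual blocks bake in: each block computes y = x + F(x), so if the residual branch F learns zero weights, the block is exactly the identity and extra depth cannot hurt by construction. A minimal numpy sketch (fully-connected layers standing in for convolutions):

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = x + W2 @ relu(W1 @ x): the skip connection carries x through."""
    h = np.maximum(0, W1 @ x)   # residual branch: linear layer + ReLU
    return x + W2 @ h           # add the residual to the identity path

d = 4
x = np.random.randn(d)

# If the residual branch has zero weights, the block is an identity map,
# so a deeper stack can always reproduce the shallower model exactly.
W1 = np.zeros((d, d))
W2 = np.zeros((d, d))
y = residual_block(x, W1, W2)
print(np.allclose(y, x))  # True
```

In a plain (non-residual) stack, representing the identity would require the extra layers to *learn* it through the nonlinearity, which is exactly what the optimizer struggles to do.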


Training ResNet in practice:
- Batch Normalization after every CONV layer
- Xavier/2 initialization from He et al.
- SGD + Momentum (0.9)
- Learning rate: 0.1, divided by 10 when validation error plateaus
- Mini-batch size 256
- Weight decay of 1e-5
- No dropout used 
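The "divide the learning rate by 10 when validation error plateaus" rule from the recipe above can be sketched as a small helper; the plateau test (no improvement over the last `patience` epochs) is one common reading of the rule, so the exact criterion is an assumption.

```python
def step_lr_on_plateau(lr, val_errors, patience=3, factor=0.1, tol=1e-4):
    """Divide lr by 10 when validation error has not improved for `patience` epochs."""
    if len(val_errors) <= patience:
        return lr                           # not enough history yet
    best_before = min(val_errors[:-patience])
    recent_best = min(val_errors[-patience:])
    if recent_best > best_before - tol:     # no meaningful recent improvement
        return lr * factor
    return lr

lr = 0.1
history = [0.40, 0.32, 0.30, 0.30, 0.30, 0.30]  # error plateaus after epoch 3
lr = step_lr_on_plateau(lr, history)
print(lr)  # dropped from 0.1 to 0.01
```

Deep learning frameworks ship this policy ready-made (e.g. `ReduceLROnPlateau` in PyTorch), but the logic is just the comparison above.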


Lecture 10:    Recurrent Neural Networks

RNN


LSTM



Lecture 11:    Detection and Segmentation

R-CNN
Object detection

Lecture 12:    Visualizing and Understanding

Lecture 13:    Generative Models

Generative models are a form of unsupervised learning (no external labels are needed).
Given training data, generate new samples from the same distribution.
We want to learn p_model(x) similar to p_data(x).
  - PixelRNN/CNN — explicit density estimation: explicitly define and solve for p_model(x)
  - Variational Autoencoder (VAE) — explicit density estimation, but via an approximate density (optimizes a lower bound)
  - GAN — implicit density estimation: learn a model that can sample from p_model(x) without explicitly defining it

PixelRNN and PixelCNN




Variational Autoencoders(VAE)



After training, throw away decoder


Autoencoders can reconstruct data, and can learn features to initialize a supervised model.
Features capture factors of variation in the training data. Can we generate new images from an autoencoder?

We want to estimate the true parameters of this generative model.

How should we represent this model?
Choose the prior p(z) to be simple, e.g. Gaussian.
This is reasonable for latent attributes, e.g. pose, amount of smile.
The conditional p(x|z) is complex (it generates an image) => represent it with a neural network.

Let's look at computing the bound (forward pass) for a given minibatch of input data.
Diagonal prior on z => independent latent variables.
Different dimensions of z encode interpretable factors of variation.
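Two pieces of that forward pass have simple closed forms: the KL term between the encoder's diagonal Gaussian q(z|x) = N(mu, diag(exp(logvar))) and the standard-normal prior, and the reparameterized sample z = mu + sigma * eps that keeps sampling differentiable. A numpy sketch (the 2-dimensional mu/logvar values are made up for illustration):

```python
import numpy as np

def kl_diag_gaussian_vs_standard(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims.
    Closed form: -0.5 * sum(1 + logvar - mu^2 - exp(logvar))."""
    return -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I), so the
# sampling step is differentiable w.r.t. the encoder outputs mu and logvar.
rng = np.random.default_rng(0)
mu = np.array([0.5, -0.3])
logvar = np.array([0.0, -1.0])
eps = rng.standard_normal(2)
z = mu + np.exp(0.5 * logvar) * eps

print(kl_diag_gaussian_vs_standard(np.zeros(2), np.zeros(2)))  # 0.0: q equals the prior
```

The diagonal covariance is what makes the latent dimensions independent under q, matching the note above about independent, interpretable factors of variation.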


Generative Adversarial Networks(GANs)

GANs: don’t work with any explicit density function!

Instead, take game-theoretic approach: learn to generate from training distribution  through 2-player game

Aside: Jointly training two networks is challenging and can be unstable. Choosing objectives with better loss landscapes helps training, and is an active area of research.

Structurally, GANs are elegant and simple (despite some controversy over similarity to earlier ideas [6~7]), and they are very easy to understand.
The whole model has only two components: 1. a generator G; 2. a discriminator D.
Generative models themselves have a long history, so the generator is nothing new: G's goal is to produce a fake-sample distribution as close as possible to the real one.
Before discriminators, a generator was trained by measuring, at each iteration, the difference between its current samples and the real samples (turning that difference into a loss) and optimizing its parameters against it.
The discriminator D changes this: D's goal is to tell generated samples apart from real ones as accurately as possible,
and G's training objective shifts from minimizing the "generated vs. real" difference to weakening D's ability to discriminate (the training objective now contains D's output).
The overall GAN framework is shown in the figure below:
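The two-player objective described above can be written down directly on discriminator scores (logits). A minimal numpy sketch; the non-saturating generator loss is the practical variant usually trained in place of the pure minimax term:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Minimax game: D maximizes  E[log D(x_real)] + E[log(1 - D(G(z)))].
# In practice G maximizes E[log D(G(z))] instead of minimizing the second
# term (the "non-saturating" variant), for stronger gradients early on.
def d_loss(real_logits, fake_logits):
    return -(np.mean(np.log(sigmoid(real_logits)))
             + np.mean(np.log(1 - sigmoid(fake_logits))))

def g_loss_nonsaturating(fake_logits):
    return -np.mean(np.log(sigmoid(fake_logits)))

# A confident discriminator (large positive logits on real samples, large
# negative logits on fakes) drives d_loss toward 0 and g_loss up.
real = np.array([5.0, 6.0])
fake = np.array([-5.0, -6.0])
print(d_loss(real, fake), g_loss_nonsaturating(fake))
```

Training alternates gradient steps on these two losses, which is exactly the joint-training instability the aside above warns about.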


Summary:


Lecture 14:    Reinforcement Learning

Lecture 15:    Hardware Acceleration

