Lecture 8: Deep Learning Software
Lecture 9: CNN Architectures
AlexNet

VGGNet

GoogLeNet
- 22 total layers with weights (including each parallel layer in an Inception module)
- “Inception module”: design a good local network topology (“network within a network”) and then stack these modules on top of each other
Naive Inception module:
The naive Inception module is too computationally expensive.
Apply parallel filter operations on the input from the previous layer:
- Multiple receptive field sizes for convolution (1x1, 3x3, 5x5)
- Pooling operation (3x3)
Concatenate all filter outputs together depth-wise
What is the problem with this?
Computational complexity
Solution: “bottleneck” layers that use 1x1 convolutions to reduce feature depth
- Preserves spatial dimensions, reduces depth!
- Projects depth to a lower dimension (each output is a combination of the input feature maps, e.g. projecting down to 32 feature maps)
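As a hedged illustration, here is a PyTorch-style sketch of an Inception module with 1x1 bottlenecks; the branch channel counts are made up for illustration and are not GoogLeNet's exact configuration:

    import torch
    import torch.nn as nn

    class InceptionModule(nn.Module):
        """Inception module with 1x1 "bottleneck" convolutions.
        Channel counts are illustrative, not GoogLeNet's exact numbers."""
        def __init__(self, in_ch):
            super().__init__()
            # 1x1 branch
            self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)
            # 1x1 bottleneck (reduce depth) -> 3x3 conv
            self.branch3 = nn.Sequential(
                nn.Conv2d(in_ch, 32, kernel_size=1),
                nn.Conv2d(32, 64, kernel_size=3, padding=1),
            )
            # 1x1 bottleneck (reduce depth) -> 5x5 conv
            self.branch5 = nn.Sequential(
                nn.Conv2d(in_ch, 16, kernel_size=1),
                nn.Conv2d(16, 32, kernel_size=5, padding=2),
            )
            # 3x3 max-pool -> 1x1 projection
            self.branch_pool = nn.Sequential(
                nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                nn.Conv2d(in_ch, 32, kernel_size=1),
            )

        def forward(self, x):
            # every branch preserves spatial size; concatenate depth-wise
            return torch.cat([self.branch1(x), self.branch3(x),
                              self.branch5(x), self.branch_pool(x)], dim=1)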
ResNet
What happens when we keep stacking deeper layers on a “plain” CNN?
56-layer model performs worse on both training and test error
-> The deeper model performs worse, but it’s not caused by overfitting!
The deeper network does worse on both the training and test sets, so overfitting is not the cause.
The problem lies in optimization: deeper networks are harder to optimize.
- The deeper model should be able to perform at least as well as the shallower model.
- A solution by construction is copying the learned layers from the shallower model and setting the additional layers to identity mappings; see the residual block sketch below.
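A minimal PyTorch-style sketch of the basic residual block that realizes this, with illustrative channel counts (downsampling/projection shortcuts omitted):

    import torch.nn as nn
    import torch.nn.functional as F

    class BasicBlock(nn.Module):
        """Residual block: output = F(x) + x.
        If the conv layers learn F(x) = 0, the block is an identity mapping,
        so a deeper model can do no worse than a shallower one."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return F.relu(out + x)   # skip connection adds the identity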
Training ResNet in practice (see the sketch after this list):
- Batch Normalization after every CONV layer
- Xavier/2 initialization from He et al.
- SGD + Momentum (0.9)
- Learning rate: 0.1, divided by 10 when validation error plateaus
- Mini-batch size 256
- Weight decay of 1e-5
- No dropout used
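A hedged sketch wiring these hyperparameters into a PyTorch training setup; `model`, `train_one_epoch`, `evaluate`, and `num_epochs` are hypothetical placeholders:

    import torch

    # "Xavier/2" init from He et al. corresponds to Kaiming initialization
    for m in model.modules():
        if isinstance(m, torch.nn.Conv2d):
            torch.nn.init.kaiming_normal_(m.weight)

    optimizer = torch.optim.SGD(model.parameters(),
                                lr=0.1,            # initial learning rate
                                momentum=0.9,
                                weight_decay=1e-5)
    # divide the learning rate by 10 when validation error plateaus
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.1)

    for epoch in range(num_epochs):
        train_one_epoch(model, optimizer)   # mini-batch size 256
        val_error = evaluate(model)
        scheduler.step(val_error)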
Lecture 10: Recurrent Neural Networks
RNN
LSTM
Lecture 11: Detection and Segmentation
R-CNN (object detection)
Lecture 12: Visualizing and Understanding
Lecture 13: Generative Models
Generative models are a form of unsupervised learning (no external labels needed).
Given training data, generate new samples from same distribution
Want to learn p_model(x) similar to p_data(x)
- PixelRNN/CNN — Explicit density estimation: explicitly define and solve for p_model(x)
- Variational Autoencoders (VAE) — Explicit density with an intractable likelihood: optimize a lower bound on p_model(x)
- GAN — Implicit density estimation: learn a model that can sample from p_model(x) without explicitly defining it

PixelRNN and PixelCNN
Variational Autoencoders (VAE)
Background: autoencoders. After training, throw away the decoder; the encoder's features can initialize a supervised model.
Autoencoders can reconstruct data, and learn features that capture factors of variation in the training data. Can we generate new images from an autoencoder?

We want to estimate the true parameters of this generative model.
How should we represent this model?
Choose prior p(z) to be simple, e.g. Gaussian.
Reasonable for latent attributes, e.g. pose, how much smile.
Conditional p(x|z) is complex (generates image) => represent with neural network
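A minimal PyTorch-style sketch of this representation, assuming MNIST-sized images; the layer sizes and names are illustrative:

    import torch
    import torch.nn as nn

    latent_dim, hidden_dim, img_dim = 20, 400, 784   # illustrative sizes (e.g. MNIST)

    # prior p(z): simple unit Gaussian over latent attributes (pose, smile, ...)
    prior = torch.distributions.Normal(torch.zeros(latent_dim),
                                       torch.ones(latent_dim))

    # conditional p(x|z) is complex => represent it with a neural network (decoder)
    decoder = nn.Sequential(
        nn.Linear(latent_dim, hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, img_dim),
        nn.Sigmoid(),        # per-pixel Bernoulli means
    )

    z = prior.sample()       # sample a latent code
    x_mean = decoder(z)      # parameters of p(x|z)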

Let's look at computing the bound (forward pass) for a given minibatch of input data
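A sketch of that forward pass, continuing the PyTorch-style setup above and assuming an `encoder` network that outputs the mean and log-variance of q(z|x):

    import torch
    import torch.nn.functional as F

    def elbo(x, encoder, decoder):
        """Lower bound L = E_q[log p(x|z)] - KL(q(z|x) || p(z)) for a minibatch x."""
        # q(z|x): encoder outputs mean and log-variance of a diagonal Gaussian
        mu, logvar = encoder(x)
        # reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # reconstruction term E_q[log p(x|z)] (per-pixel Bernoulli likelihood)
        x_mean = decoder(z)
        recon = -F.binary_cross_entropy(x_mean, x, reduction='sum')
        # KL between two Gaussians has a closed form
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon - kl    # maximize this bound during training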

Diagonal prior on z => independent latent variables.
Different dimensions of z encode interpretable factors of variation.
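After training, generation uses only the decoder: sample z from the prior and decode. A sketch (reusing the illustrative `decoder` and `latent_dim` above) of sweeping one dimension of z to visualize the factor it encodes:

    import torch

    z = torch.randn(1, latent_dim)             # sample a latent code from the prior
    for val in torch.linspace(-3, 3, steps=7):
        z_sweep = z.clone()
        z_sweep[0, 0] = val                    # vary one dimension, hold the rest fixed
        img = decoder(z_sweep)                 # each dimension can encode one factor
                                               # of variation (e.g. smile, pose)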

Generative Adversarial Networks (GANs)
GANs: don’t work with any explicit density function!
Instead, take game-theoretic approach: learn to generate from training distribution through a 2-player game
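Concretely, the 2-player game is the minimax objective from the original GAN paper:

    min_G max_D  E_{x ~ p_data}[ log D(x) ] + E_{z ~ p(z)}[ log(1 - D(G(z))) ]

The discriminator D is trained to output 1 on real data and 0 on generated samples; the generator G is trained to fool D.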

Aside: Jointly training two networks is challenging, can be unstable. Choosing objectives with better loss landscapes helps training, is an active area of research.

Structurally, GANs are clever and simple (despite some controversy over similarity to ideas in earlier classic work [6, 7]), and very easy to understand.
The whole model has only two components: 1. the generator G; 2. the discriminator D.
Generative models have a long history, so the generator itself is nothing new.
The generator G's goal is to produce a distribution of fake samples as close as possible to the real samples.
Previously, without a discriminator D, the generator was trained by measuring, at each iteration, the difference between the current generated samples and the real samples (turning that difference into a loss) and optimizing the parameters.
The discriminator D changes this: D's goal is to distinguish generated samples from real samples as accurately as possible,
and the generator G's training objective changes from minimizing the "generated-vs-real difference" to weakening the discriminator D's ability to tell them apart (the training objective now contains the discriminator D's output).
The overall framework of the GAN model is shown in the figure below:
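A minimal alternating-training sketch in PyTorch style; `G`, `D`, `opt_G`, `opt_D`, `loader`, and `noise_dim` are assumed to be defined elsewhere. The generator uses the common non-saturating loss (maximize log D(G(z))), which gives a better loss landscape early in training than the raw minimax form mentioned in the aside above:

    import torch
    import torch.nn.functional as F

    # G, D, opt_G, opt_D, loader and noise_dim are assumed/hypothetical
    for real in loader:                        # minibatch of real samples
        n = real.size(0)
        z = torch.randn(n, noise_dim)
        fake = G(z)

        # 1) discriminator step: push D(real) -> 1 and D(fake) -> 0
        d_loss = F.binary_cross_entropy(D(real), torch.ones(n, 1)) \
               + F.binary_cross_entropy(D(fake.detach()), torch.zeros(n, 1))
        opt_D.zero_grad(); d_loss.backward(); opt_D.step()

        # 2) generator step (non-saturating): push D(G(z)) -> 1 to fool D
        g_loss = F.binary_cross_entropy(D(fake), torch.ones(n, 1))
        opt_G.zero_grad(); g_loss.backward(); opt_G.step()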

Summary:
