[深度学习论文笔记][Visualizing] Visualizing and Understanding Convolutional Networks

Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European Conference on Computer Vision. Springer International Publishing, 2014. (Citations: 1207).


Occlusion Experiments

Idea Occlude portions of the input image, revealing which parts of the scene are important for classification.


Method Occlude different portions of the input image with a grey square, monitor the classifier's output probability for the correct class, and plot it as a function of the position of the grey square in the original image.
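A minimal sketch of this procedure (function and parameter names are my own; `classify` stands in for any function mapping an image to class probabilities):

```python
import numpy as np

def occlusion_sensitivity(image, classify, target_class, patch=8, stride=8, fill=0.5):
    """Slide a grey square over the image and record the classifier's
    probability for the target class at each occluder position."""
    h, w = image.shape[:2]
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill  # grey square
            heat[i, j] = classify(occluded)[target_class]
    return heat  # low values mark regions important for the class
```

Plotting `heat` as a heatmap over the image then reveals which regions the classifier relies on, as in Fig. 4.1.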


Result See Fig. 4.1. It can be seen that the model is localizing the objects within the scene, as the probability of the correct class drops significantly when the object is occluded. In the third image, if we occlude the person's head, the probability of the correct class goes up.


Deconv Approach

DeconvNet

For the relu layer, the forward pass is $a = \max(z, 0)$, and a standard backward pass propagates the signal only where the forward input $z$ was positive. The deconvnet backward pass instead applies the relu to the backward signal itself, $R' = \max(R, 0)$, keeping only its positive part.
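A minimal numpy sketch contrasting the two backward rules (function names are my own; `z` is the relu's forward input, `R` the signal coming down from the layer above):

```python
import numpy as np

def backprop_relu(R, z):
    """Standard backward pass: pass the signal only at positions
    where the forward input z was positive."""
    return R * (z > 0)

def deconvnet_relu(R):
    """Deconvnet backward pass: apply relu to the backward signal
    itself, keeping only its positive part (ignores z)."""
    return np.maximum(R, 0)
```

The deconvnet rule therefore discards negative reconstruction values regardless of what the forward activations were.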


Method For each layer, randomly select a subset of feature maps. For each feature map, find the top 9 neurons that have the highest activations. Projecting each separately down to pixel space by deconvnet reveals the different structures that excite a given feature map.
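Selecting the strongest activations can be sketched as follows (a simplification with hypothetical names: here we rank whole examples by their single strongest activation in the chosen channel, rather than tracking individual neuron positions):

```python
import numpy as np

def top_k_examples(feature_maps, channel, k=9):
    """Rank examples by their single strongest activation in one
    feature map (channel) and return the indices of the top k.
    feature_maps: array of shape (num_examples, channels, h, w)."""
    acts = feature_maps[:, channel].reshape(len(feature_maps), -1).max(axis=1)
    return np.argsort(acts)[::-1][:k]
```

Each of the returned examples would then be projected back to pixel space with the deconvnet to produce one cell of the 3x3 visualization grids in Fig. 4.2 to 4.4.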



Result See Fig. 4.2, 4.3, and 4.4. Alongside these visualizations we show the corresponding image patches.

• Strong grouping within each feature map.
• Hierarchical nature of the features in the network (layer 2: corners and other edge/color conjunctions; layer 3: textures, mesh patterns (r1, c1), and text (r2, c4); layer 4: more class-specific features, such as dog faces (r1, c1) and bird legs (r4, c2); layer 5: entire objects, such as keyboards (r1, c11) and dogs (r4)).

• Greater invariance at higher layers.

• Exaggeration of discriminative parts of the image, e.g. eyes and noses of dogs (layer 4, r1, c1).





Feature Evolution During Training The lower layers of the model can be seen to converge within a few epochs. However, the upper layers only develop after a considerable number of epochs (40-50), demonstrating the need to let the models train until fully converged. 


Feature Invariance Small transformations have a dramatic effect in the first layer of the model, but a lesser impact at the top feature layer, where the response is quasi-linear for translation and scaling. However, the output is not invariant to rotation.

