The Foundation series consists of three parts, organizing my reading notes on three papers: "Evasion attacks against machine learning at test time", "Intriguing properties of neural networks", and "Explaining and harnessing adversarial examples". This post covers "Intriguing properties of neural networks".
Abstract
Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties.
First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain hardly perceptible perturbation, which is found by maximizing the network’s prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.
In short, the paper points out that deep neural networks are prone to learning uninterpretable and counter-intuitive feature representations, and it singles out two typical properties for analysis. First, individual high-level units carry no special semantic information; the semantics live in the activation space as a whole. Second, a tiny perturbation of the input can cause the network to misclassify, and such adversarial examples are also adversarial for other networks.
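To make the second property concrete, here is a minimal PyTorch sketch of finding a hardly perceptible perturbation by increasing the network's prediction error. Note this is an assumed simplification of mine: the paper itself solves a box-constrained L-BFGS problem (minimize $c|r| + \text{loss}(x+r, l)$), while the sketch below takes a single signed-gradient step; the names `model` and `eps` are also illustrative.

```python
import torch
import torch.nn.functional as F

def adversarial_example(model, x, label, eps=0.01):
    """Sketch: perturb x slightly so the prediction error grows.

    The paper uses box-constrained L-BFGS; this single signed-gradient
    step is only an illustration of the same idea.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)  # prediction error on the true label
    loss.backward()
    r = eps * x.grad.sign()                  # small step that increases the loss
    return (x + r).detach()
```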
Framework
The paper experiments with several neural networks trained on three datasets:
MNIST dataset
a) fully connected network + softmax classifier (FC); a minimal sketch of this setup follows the list
b) classifier trained on top of an autoencoder (AE)
ImageNet dataset
a) AlexNet
~10M image samples from YouTube
a) QuocNet (unsupervised)
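For concreteness, a minimal sketch of the FC setup on MNIST. The hidden-layer size is an assumption of mine for illustration, not the paper's exact configuration:

```python
import torch.nn as nn

# Minimal FC + softmax MNIST classifier. The hidden size (100) is an
# illustrative assumption, not the exact configuration from the paper.
fc_net = nn.Sequential(
    nn.Flatten(),         # 28x28 grayscale image -> 784-dim vector
    nn.Linear(784, 100),  # hidden layer
    nn.ReLU(),
    nn.Linear(100, 10),   # logits for the 10 digit classes;
)                         # softmax is folded into the cross-entropy loss
```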
Units of $\phi(x)$
$\phi(x)$ denotes the activations of some layer of the network (the vector formed by the activation values of all units in that layer).
The target claim: some studies try to show that each unit in a layer has a semantic meaning of its own, by searching for the samples that maximize a particular feature (the activation of a single unit), i.e.:
$$x' = \arg\max_{x \in \mathcal{I}} \langle \phi(x), e_i \rangle$$

Here $e_i$ is the natural basis vector of unit $i$, and $\mathcal{I}$ is a held-out set disjoint from the training set (I think of it simply as a test set).
The rebuttal: the paper shows that an arbitrary linear combination of all units in a layer yields a similarly coherent semantic meaning, i.e.:
$$x' = \arg\max_{x \in \mathcal{I}} \langle \phi(x), v \rangle$$

Here $v$ is a random vector, i.e. a random direction in the activation space.
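Both experiments amount to the same scan: over a held-out set, find the images whose layer activations project furthest onto a chosen direction. A minimal sketch, where the function and variable names are mine and `phi` is assumed to map a batch of images to an (N, d) activation matrix:

```python
import torch

def maximizing_inputs(phi, images, direction, k=8):
    """Return the k held-out images whose activations have the largest
    inner product <phi(x), direction>.

    `direction` is either a natural basis vector e_i (per-unit analysis)
    or a random vector v (the paper's counter-experiment).
    """
    with torch.no_grad():
        scores = phi(images) @ direction   # <phi(x), direction> for every x
    return images[scores.topk(k).indices]

# Per-unit analysis:         e_i = torch.zeros(d); e_i[i] = 1.0
# Random-direction analysis: v = torch.randn(d)
```

Since the images retrieved for a random $v$ look just as semantically coherent as those retrieved for a basis vector $e_i$, the semantic information cannot reside in the individual coordinates.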