Foundations of Adversarial Machine Learning (Paper Study Notes, Part 2)


The Foundation series has three parts, consisting of reading notes on three papers: "Evasion attacks against machine learning at test time", "Intriguing properties of neural networks", and "Explaining and harnessing adversarial examples". This post covers "Intriguing properties of neural networks".

Abstract

Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties.
First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain hardly perceptible perturbation, which is found by maximizing the network’s prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.

In short, the paper argues that deep neural networks have the flaw of learning uninterpretable and counter-intuitive feature representations, and picks out two representative properties for analysis. First, individual high-level units carry no special semantic meaning; the semantic information lives in the activation space as a whole. Second, small perturbations of the input can cause the network to misclassify, and these adversarial examples remain adversarial against other networks.
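To make the second property concrete, below is a minimal PyTorch sketch of the search the paper formulates in its later sections: minimize $c\,|r| + \mathrm{loss}(f(x+r), l)$ subject to $x + r \in [0,1]^m$, where $r$ is the perturbation and $l$ a target label. The paper solves this with box-constrained L-BFGS plus a line search over $c$; the plain gradient loop, the fixed `c`, and the hyperparameters here are my simplifications for illustration.

```python
import torch
import torch.nn.functional as F

def find_adversarial(model, x, target, c=0.1, steps=200, lr=0.01):
    """Search for a small perturbation r making model(x + r) predict `target`.

    x:      batch of images with pixel values in [0, 1]
    target: LongTensor of target class indices (one per image)
    """
    r = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([r], lr=lr)
    for _ in range(steps):
        x_adv = (x + r).clamp(0.0, 1.0)     # keep perturbed pixels in the valid box
        logits = model(x_adv)
        # c * |r| trades off perturbation size against hitting the target label
        loss = c * r.abs().sum() + F.cross_entropy(logits, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + r).detach().clamp(0.0, 1.0)
```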

Framework

The paper conducts its study on three datasets, deploying several different neural networks.

MNIST dataset

a) Fully connected network + softmax (FC)
b) Autoencoder-based classifier (AE)

ImageNet dataset

a) AlexNet

~10M image samples from YouTube

a) QuocNet (unsupervised)

High-Level Feature Units (Units of $\phi(x)$)

$\phi(x)$ denotes the activation values of some layer of the network, i.e., the vector formed by the activations of all units in that layer.
The claim under attack: some prior work tries to show that each individual unit in a layer has semantic meaning, by searching for the samples that maximize a given feature (the activation of a single unit):

$$x' = \arg\max_{x \in \mathcal{I}} \langle \phi(x), e_i \rangle$$

Here $e_i$ is the natural-basis vector selecting the $i$-th unit, and $\mathcal{I}$ is a hold-out set disjoint from the training set (I think it can loosely be understood as a test set).
The rebuttal: this paper shows that an arbitrary linear combination of all units in the layer, i.e., an arbitrary direction $v \in \mathbb{R}^n$, yields samples with similarly coherent semantic meaning:

$$x' = \arg\max_{x \in \mathcal{I}} \langle \phi(x), v \rangle$$

That is, the semantic information lives in the activation space as a whole rather than in its individual coordinates.
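The two experiments differ only in the direction vector used in the inner product. A minimal sketch, assuming a function `phi` that maps a batch of held-out images to an $(N, n)$ matrix of one layer's activations; the layer width, unit index, and `k` below are illustrative:

```python
import torch

def top_images(phi, images, direction, k=8):
    """Return the k images x in `images` maximizing <phi(x), direction>."""
    with torch.no_grad():
        acts = phi(images)             # (N, n) activations of one hidden layer
        scores = acts @ direction      # inner product <phi(x), direction> per image
        idx = scores.topk(k).indices
    return images[idx]

n = 512                                # layer width (illustrative)
e_i = torch.zeros(n); e_i[3] = 1.0     # natural-basis direction: a single unit
v = torch.randn(n); v = v / v.norm()   # random direction in activation space

# top_images(phi, heldout_images, e_i)  vs  top_images(phi, heldout_images, v)
```

Inspecting the top images for $e_i$ and for $v$ side by side is exactly the paper's qualitative test: both sets turn out to be similarly semantically coherent.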
