The video below is an oral presentation by the paper's author, walking through the ideas of the paper; it is well worth watching.
https://www.bilibili.com/video/av884481653/
First, the generation of adversarial examples is formulated as an optimization problem, which the paper then tries to solve.
The hard constraint C(x+δ) = t is then replaced by minimizing a loss function that encourages the input to be classified as the target class t.
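This reformulation can be written out as in the C&W paper (δ is the perturbation, D a distance metric, c a trade-off constant, and f a surrogate loss chosen so that f(x+δ) ≤ 0 exactly when C(x+δ) = t):

```latex
% The constraint C(x+\delta) = t is highly non-linear, so it is moved
% into the objective through the surrogate loss f:
\begin{aligned}
&\text{minimize}\quad D(x,\, x+\delta) + c \cdot f(x+\delta) \\
&\text{subject to}\quad x+\delta \in [0,1]^n
\end{aligned}
```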
Next comes the construction of the loss function f. How should it be built?
The paper proposes and compares seven different candidate constructions.
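As one concrete example, the candidate that performs best in the paper (usually written f₆) compares the target class's logit against the largest non-target logit. A minimal NumPy sketch; the logit values and the confidence margin kappa below are illustrative, not from the paper:

```python
import numpy as np

def f6(logits: np.ndarray, target: int, kappa: float = 0.0) -> float:
    """C&W f_6 loss: the margin between the largest non-target logit
    and the target logit, clipped below at -kappa. The value is <= 0
    exactly when the model classifies the input as `target` (with
    confidence margin at least kappa)."""
    others = np.delete(logits, target)
    return max(float(others.max() - logits[target]), -kappa)

# Toy logits Z(x') for a 3-class model (illustrative numbers)
logits = np.array([1.0, 3.0, 0.5])

print(f6(logits, target=1, kappa=5.0))  # -> -2.0: already class 1
print(f6(logits, target=0, kappa=5.0))  # ->  2.0: not yet class 0
```

Minimizing f₆ pushes the target logit above all others, which is exactly the condition C(x+δ) = t expressed in a differentiable form.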
How is the proposed attack evaluated?
1. Compare its effectiveness against previously published attacks.
2. Attack existing defense models ("breaking current defenses"): the paper shows that their attack successfully breaks the defensive distillation defense proposed in 2016, and does so with very small distortion.
The author then suggests how to evaluate whether a newly proposed defense is actually effective:
1. Release the source code.
2. Evaluate it against the strongest known attacks as a baseline.
Q&A:
1. Can you comment on the cost/overhead of creating the attacks that we talked about?
The approach that I take is relatively slow; it takes maybe 30 seconds to one minute to generate an attack. There are approaches that generate attacks more rapidly, but the results are four or five times worse, depending on the setup. It depends on what your goal is: latency really only matters when you are attacking an online system.
2. How do you choose the pixels whose values are changed?
There are a bunch of ways you can alter the image; we look at three different distance metrics.
One of them, which I showed you, is L0.
Under L0, you start by letting the attack modify every pixel and then slowly shrink this set of modified pixels until you end up modifying very few of them.
Finally, a thing that only we can do is attack under a different distance metric, where instead of changing the fewest number of pixels we make a small change to each individual pixel.
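The three distance metrics the paper works with (L0, L2, L∞) are easy to compute directly; the arrays below are arbitrary illustrative "images", not data from the paper:

```python
import numpy as np

def l0(a, b):
    """Number of coordinates (pixels) that differ at all."""
    return int(np.count_nonzero(a != b))

def l2(a, b):
    """Euclidean distance between the flattened images."""
    return float(np.linalg.norm((a - b).ravel()))

def linf(a, b):
    """Largest change made to any single coordinate."""
    return float(np.abs(a - b).max())

x = np.zeros(4)                          # original "image"
x_adv = np.array([0.0, 0.1, 0.0, 0.3])   # perturbed version

print(l0(x, x_adv))    # -> 2   (two pixels changed)
print(linf(x, x_adv))  # -> 0.3 (biggest single-pixel change)
```

The L0 attack described above minimizes the first metric (few pixels, possibly large changes), while the L∞ attack minimizes the last one (every pixel may move, but only slightly).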
3. Your presentation assumed white box; could you comment on the strength of your attack in a black-box setting?
To run the attack without assuming access to the model, you train your own model. I assume the defender has their model; I then train my own model in roughly the same way, on similar training data, and hope that the two models end up learning roughly the same decision boundary.
Then I generate the attack on my own model, and it turns out there is a property called transferability: the attack crafted on my model is often effective against your model as well. So even if I don't know what your model is and have no access to its parameters, I can still use this approach to attack it.
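The surrogate-plus-transfer idea can be sketched with toy linear models; this is a hedged illustration of transferability, not the paper's actual attack or networks, and the weights, input, and FGSM-style step are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear binary classifiers with similar (but not identical)
# weights, standing in for the victim model and the attacker's
# surrogate trained on similar data.
w_victim = np.array([1.0, -2.0, 0.5])
w_surrogate = w_victim + rng.normal(scale=0.1, size=3)

x = np.array([0.2, 0.1, 0.3])  # the victim classifies x as positive
assert w_victim @ x > 0

# Craft the perturbation using ONLY the surrogate's gradient
# (a signed-gradient step toward the negative class).
eps = 0.3
x_adv = x - eps * np.sign(w_surrogate)

# Because the decision boundaries are similar, the perturbation
# crafted on the surrogate also fools the victim.
print(w_victim @ x_adv < 0)  # -> True in this toy example
```

The real attack replaces the linear models with neural networks and the signed-gradient step with the paper's optimization, but the transfer mechanism is the same: similar training yields similar decision boundaries.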
4. Have you compared the perturbations you produce in black-box settings, the ones that transfer to other models?
Yes, these attacks do transfer: with the approach I've outlined you can control the contents of the attack and generate attacks that transfer. To make attacks that transfer effectively you do need to increase the distortion; how much depends on the defense and on how close you can actually get to the other model. If you assume I know nothing about the other model, the distortion may need to be increased by a factor of two or three; but if I can make oracle queries against the other model, that can be reduced by a fairly large amount.
5. I noticed that your technique will always eventually perturb the image to the point where the classification changes. Is there a metric for how much the image needs to be perturbed before the model is actually fooled?
Yes. There are two types of attacks: attacks that always eventually succeed, perturbing the image until the model actually recognizes the other class, where you just try to minimize the total distortion; and attacks that say you may distort by at most this much, and within that budget do the best you can. My approach takes the former: it always minimizes the total distortion, and if you want you can set a threshold at any given point and report failure whenever the distortion exceeds it. But since the objective is to minimize the total distortion, it very rarely fails.
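The two attack types in that answer correspond to two standard optimization problems (δ is the perturbation, ε a distortion budget, t the target class; this is a common restatement of the distinction, not the talk verbatim):

```latex
\begin{aligned}
&\text{(1) minimum-distortion:} && \min_{\delta}\ \|\delta\|
  \quad \text{s.t. } C(x+\delta) = t \\
&\text{(2) bounded-distortion:} && \max_{\delta}\ \mathcal{L}\bigl(C,\, x+\delta,\, t\bigr)
  \quad \text{s.t. } \|\delta\| \le \varepsilon
\end{aligned}
```

The C&W attack solves form (1); a failure threshold on ‖δ‖ can be imposed afterwards, which is why it "very rarely fails" in the author's phrasing.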