[Summary] Robust Physical-World Attacks on Deep Learning Models

This paper studies the effectiveness of adversarial examples in the physical world. By applying adversarial perturbations to road signs, it shows that these perturbations can mislead classifiers even when viewed from different distances and angles.


I. Related Work:

A. Adversarial Examples:
Goodfellow et al.: the fast gradient sign (FGS) method, which perturbs the input along the sign of the loss gradient: $x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, y))$
Iterative optimization-based algorithms, which repeatedly optimize the perturbation against an adversarial objective.
Universal perturbations: a single perturbation vector is generated, but it does not generalize.

Common points:
1. All of these assume digital access to the input vectors of the DNN, which is too strong an assumption for deployed deep learning models (see the sketch after this list).
2. The perturbation is optimized to be as small as possible so that it is imperceptible to the human eye, but in the physical world, factors such as the camera and the environment destroy these subtle perturbations.
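As a concrete illustration of that digital-access assumption, here is a minimal FGS-style sketch in PyTorch; `model`, `image`, and `label` are hypothetical placeholders rather than anything from the paper, and this is a sketch of the general idea, not the paper's method.

```python
import torch
import torch.nn.functional as F

def fgs_attack(model, image, label, eps=0.03):
    """One-step fast gradient sign attack on a digital input tensor."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss of the true label.
    adv = image + eps * image.grad.sign()
    return adv.clamp(0.0, 1.0).detach()
```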

B. Physical Realizability of Adversarial Perturbations
Sharif et al.: attacked a face-recognition system by printing adversarial perturbations on eyeglass frames.
Kurakin et al.: showed that adversarial examples captured through a smartphone camera can still cause misclassification.
Athalye and Sutskever: improved on Kurakin's work by making adversarial examples robust to a set of 2D transformations, but neither demonstrated the attack on physical objects.
Lu et al.: experimented with road traffic signs and concluded that adversarial examples are ineffective in practice, whereas this paper shows that they can in fact be effective.

II. Problem Statement

The US and German traffic-sign datasets are used as training sets,
with LISA-CNN and GTSRB-CNN as the respective classifiers.
The focus is on attacks that do not require knowledge of where the training set comes from,
and specifically on white-box attacks.
The paper shows that physically modifying the sign itself can also fool the classifier.
Pipeline:
1. Obtain clean images of the target road sign from various angles and distances.
2. Preprocess them as input to RP2 and generate the adversarial example.
3. Physically fabricate the resulting perturbation (poster-printing attack or sticker attack).
4. Apply it to the target road sign.

III. Robust Physical Perturbations:

A. RP2

An optimization-based approach is used.
To generate a targeted adversarial example, the objective function is:

$$\arg\min_{\delta} \; \lambda\|\delta\|_p + J\big(f_\theta(x+\delta),\, y^*\big)$$

In the image domain, for a two-dimensional perturbation vector $\delta = [\delta_{(1,1)}, \ldots, \delta_{(H,W)}]$, the $\ell_p$ norm for $p > 0$ is:
$$\|\delta\|_p = \Big(\sum_{i,j} |\delta_{(i,j)}|^p\Big)^{1/p}$$

In general, the $\ell_0$ norm is the total number of perturbed pixels, and the $\ell_\infty$ norm is the magnitude of the largest perturbation.
$J(\cdot,\cdot)$ is the loss function.
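A small worked example of these norms; a minimal NumPy sketch, with the 2x2 perturbation values chosen arbitrarily for illustration.

```python
import numpy as np

# A toy 2x2 perturbation delta over a tiny "image".
delta = np.array([[0.0, 0.5],
                  [0.2, 0.0]])

l0 = np.count_nonzero(delta)              # number of perturbed pixels -> 2
l2 = np.sqrt(np.sum(np.abs(delta) ** 2))  # (sum |delta|^2)^(1/2) -> ~0.539
linf = np.max(np.abs(delta))              # largest perturbation magnitude -> 0.5

print(l0, l2, linf)
```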

Computing the perturbation mask:
The mask is shaped like graffiti or abstract art, so that the perturbation appears meaningless.
The mask is a matrix $M_x$ whose entries are 0 where no perturbation is added and 1 where the perturbation is applied.
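A minimal sketch of how such a 0/1 mask might be constructed; the image size and sticker rectangles below are arbitrary choices for illustration, not the layout used in the paper.

```python
import numpy as np

H, W = 32, 32                      # assumed input resolution for the classifier
mask = np.zeros((H, W), dtype=np.float32)

# Two rectangular "sticker" regions where the perturbation is allowed (value 1).
mask[4:10, 6:26] = 1.0             # upper band
mask[22:28, 6:26] = 1.0            # lower band

# The perturbation actually applied to an image x is mask * delta (elementwise).
```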
Optimizing spatially-constrained perturbations:
The Adam optimizer is used; the optimized objective
function becomes:

$$\arg\min_{\delta} \; \lambda\|M_x \cdot \delta\|_p + \mathrm{NPS}(M_x \cdot \delta) + \frac{1}{k}\sum_{i=1}^{k} J\big(f_\theta(x_i + M_x \cdot \delta),\, y^*\big)$$

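A hedged PyTorch sketch of this masked optimization loop; `model`, `images` (the k sampled instances x_i), `target`, and `mask` are assumed inputs, the l1 norm stands in for the generic lp term, pixel values are assumed to lie in [0, 1], and the NPS term (described next) is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def rp2_optimize(model, images, target, mask, lam=1e-3, steps=300, lr=0.1):
    """Optimize a spatially-constrained perturbation delta with Adam.

    images: tensor of shape (k, C, H, W) -- the sampled instances x_i
    target: tensor of shape (k,) filled with the target label y*
    mask:   broadcastable 0/1 tensor, e.g. shape (1, 1, H, W) -- the matrix M_x
    """
    delta = torch.zeros_like(images[:1], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        masked = mask * delta
        logits = model((images + masked).clamp(0.0, 1.0))
        # lambda * ||M_x . delta||_1  +  mean_i J(f(x_i + M_x . delta), y*)
        loss = lam * masked.abs().sum() + F.cross_entropy(logits, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (mask * delta).detach()
```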
To make the perturbation printable, an additional term is added, following the method proposed by Sharif et al.: the non-printability score (NPS) of the adversarial perturbation, where $P$ is the set of printable colors:
$$\mathrm{NPS}(\delta) = \sum_{\hat{p}\in\delta}\;\prod_{p'\in P} |\hat{p} - p'|$$
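A minimal NumPy sketch of this NPS term, taking |·| as the Euclidean distance between RGB triplets; `printable_colors` is an assumed array of colors the printer can reproduce, not the actual set used in the paper.

```python
import numpy as np

def non_printability_score(delta, printable_colors):
    """NPS(delta): sum over perturbation pixels of the product, over all
    printable colors, of the distance |p_hat - p'|.

    delta:            (H, W, 3) array of perturbation pixel colors
    printable_colors: (K, 3) array of colors the printer can reproduce
    """
    pixels = delta.reshape(-1, 3)                                  # each p_hat in delta
    # Distance from every perturbation pixel to every printable color.
    dists = np.linalg.norm(pixels[:, None, :] - printable_colors[None, :, :], axis=-1)
    # Product over printable colors, summed over pixels.
    return np.prod(dists, axis=1).sum()
```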
Types of physical adversarial examples:

subtle poster
camouflage sticker

IV. Evaluation:

Evaluation components:
environmental conditions
spatial constraints
errors introduced during fabrication
resolution changes
physical limits on imperceptibility
Concrete metrics: distance, angle, resolution

Evaluation methodology:
Step 1: lab tests (stationary tests)
Step 2: field tests (drive-by tests)

Stationary tests

Stationary testing at different distances and angles; the attack success rate is:

$$\frac{\sum_{c\in C} \mathbb{1}\{f_\theta(A(c)^{d,g}) = y^* \wedge f_\theta(c^{d,g}) = y\}}{\sum_{c\in C} \mathbb{1}\{f_\theta(c^{d,g}) = y\}}$$
where $C$ is the set of camera distances $d$ and angles $g$, $c^{d,g}$ is an image of the clean sign taken at distance $d$ and angle $g$, and $A(c)^{d,g}$ is the corresponding image of the sign with the perturbation applied.
Drive-by tests

$$\frac{\sum_{v\in V} \mathbb{1}\{f_\theta(A(v)^{k}) = y^* \wedge f_\theta(v^{k}) = y\}}{\sum_{v\in V} \mathbb{1}\{f_\theta(v^{k}) = y\}}$$
computed analogously over the sampled frames $k$ of the drive-by videos $v \in V$.
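Both metrics have the same structure: among images (or frames) where the clean sign is classified correctly as y, count the fraction whose perturbed counterpart is classified as the target y*. A minimal NumPy sketch, with the prediction arrays assumed to be given:

```python
import numpy as np

def attack_success_rate(clean_preds, adv_preds, true_label, target_label):
    """clean_preds / adv_preds: predicted labels for the clean and perturbed
    sign under the same distances/angles (stationary) or video frames (drive-by)."""
    clean_preds = np.asarray(clean_preds)
    adv_preds = np.asarray(adv_preds)
    valid = clean_preds == true_label                # denominator: clean sign classified correctly
    fooled = valid & (adv_preds == target_label)     # numerator: perturbed sign hits the target
    return fooled.sum() / max(valid.sum(), 1)
```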

Discussion:

Covering the whole sign works best but is easy to notice, so a graffiti-style or sticker-style perturbation is used instead.
After printing, the white regions did not match the intended color, so camera (photo) paper was used for printing.
At large angles and distances, the white regions are easily affected by the surrounding colors, so larger white stickers were used.
Future work:
making the perturbation less noticeable
testing whether black-box attacks are effective
finding defense methods
