Machine Learning 24: Adversarial Attack (Part 2)


Last modified 2025-12-14 16:21:54


Licensed under CC 4.0 BY-SA

Tags:

#MachineLearning #ArtificialIntelligence

First published 2025-11-30 17:47:15

Abstract

This week's lesson covers advanced topics in adversarial attacks. It explains how white-box and black-box attacks differ and how each is carried out, and analyzes why proxy models and ensemble attacks are effective in the black-box setting. The lesson then broadens the range of attacks to one-pixel attacks, universal adversarial attacks, and examples in speech, natural language processing, and the real physical world. It also introduces special forms of attack such as adversarial reprogramming and backdoor attacks. On the defense side, the lesson details the principles and limitations of passive defenses (such as input preprocessing and randomized filtering) and active defenses (such as adversarial training), showing the challenges that adversarial attacks and defenses raise in real-world deployment and the strategies for dealing with them.

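In the previous lesson we covered the basic concepts of adversarial attacks and why they need to be studied, along with the types of attack: non-targeted attacks and targeted attacks. This lesson picks up where that one left off.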

1. White-Box Attacks and Black-Box Attacks

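Everything covered so far has been a white-box attack: the gradient is computed with full knowledge of the model's parameters. Since a white-box attack requires knowing those parameters, a natural question follows: is a model safe if it is deployed online with its parameters hidden, or if it is simply never made public?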
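To make the white-box setting concrete, here is a minimal sketch of one signed-gradient step (the idea behind FGSM) against a tiny logistic-regression model. The model, its weights `w` and `b`, the input `x`, and the budget `eps` are all made-up values for illustration and do not come from the course; the point is only that the attacker reads the parameters directly in order to compute the gradient with respect to the input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """P(y=1 | x) for a logistic model whose parameters w, b are known."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x.
    For logistic regression this is (p - y) * w, so it needs w itself."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign_step_attack(w, b, x, y, eps):
    """One non-targeted step: shift every coordinate by eps in the
    direction that increases the loss on the true label y."""
    grad = input_gradient(w, b, x, y)
    sign = lambda g: 1 if g > 0 else -1 if g < 0 else 0
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical model parameters and input (assumed for illustration).
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.2], 1

x_adv = sign_step_attack(w, b, x, y, eps=0.3)
print("clean P(y=1):", predict_proba(w, b, x))
print("adversarial P(y=1):", predict_proba(w, b, x_adv))
```

With these toy numbers the clean input is classified as class 1 and the perturbed input falls on the other side of the decision boundary. A real attack on an image classifier would use a much smaller `eps` and obtain the gradient through automatic differentiation.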

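This brings us to the black-box attack, which is an attack carried out without knowing the model's parameters. In the black-box setting, if the attacker does not know the target model but does know the data it was trained on, the attacker can train a proxy network.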
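The proxy idea can be sketched on a toy problem as follows. Everything here is an assumption made for illustration: the eight 2-D training points, the use of logistic regression for both models, the hyperparameters, and the deliberately large `eps`. The target model exposes only hard labels; the attacker trains a separate proxy on the same data, attacks the proxy with its own gradient, and then checks whether the result also fools the target.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(xs, ys, w_init, lr, epochs):
    """Fit a logistic regression with plain batch gradient descent."""
    w, b, n = list(w_init), 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def make_classifier(w, b):
    """Hide the parameters behind a function that returns only a label."""
    def classify(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
    return classify

# Toy training data, assumed to be known to the attacker.
train_x = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.4, 0.3],
           [0.8, 0.9], [0.7, 0.6], [0.9, 0.7], [0.6, 0.8]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

# Target model: the attacker can call it but never sees its weights.
target_classify = make_classifier(
    *train_logreg(train_x, train_y, [0.0, 0.0], lr=0.5, epochs=2000))

# Proxy model: trained by the attacker with a different setup.
proxy_w, proxy_b = train_logreg(train_x, train_y, [0.1, -0.1],
                                lr=0.2, epochs=1000)

def attack_proxy(x, y, eps):
    """Signed-gradient step computed on the proxy's parameters only."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(proxy_w, x)) + proxy_b)
    grad = [(p - y) * wi for wi in proxy_w]
    sign = lambda g: 1 if g > 0 else -1 if g < 0 else 0
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

x, y = [0.7, 0.6], 1
x_adv = attack_proxy(x, y, eps=0.3)
fooled = target_classify(x) == y and target_classify(x_adv) != y
print("attack transferred to the target:", fooled)
```

On this linearly separable toy data the two models learn similar decision boundaries, so the perturbation found on the proxy carries over to the target. Whether that transfer holds for real networks depends on how closely the proxy resembles the target.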
