Overview
- What are adversarial examples?
- Why do they happen?
- How can they be used to compromise machine learning systems?
- What are the defenses?
- How to use adversarial examples to improve machine learning, even when there is no adversary
1. Adversarial Examples
Fooling a neural net: from panda to gibbon
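The panda-to-gibbon figure is produced with the fast gradient sign method: take the sign of the gradient of the loss with respect to the pixels and step a small distance in that direction, so the change is imperceptible but the label flips with high confidence. A minimal PyTorch-style sketch (the model, `epsilon`, and the [0, 1] pixel range are illustrative assumptions):

```python
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.007):
    """One-step fast gradient sign method: nudge every input coordinate by
    +/- epsilon in the direction that increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # keep pixels in a valid range
```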
Turning Objects into “Airplanes”
Attacking a Linear Model
The digits inside the yellow boxes are misclassified by the network!
Classifiers of the following types suffer from the same problem (a sketch of the linear-model attack follows the list):
- Linear models: logistic regression, softmax regression, SVMs
- Decision trees
- Nearest neighbors
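A linear model is already enough to reproduce the attack. For softmax regression the input gradient has a closed form, so the whole attack fits in a few lines; a hedged sketch on the sklearn digits data (the dataset, `epsilon`, and evaluating on the training set are illustrative choices):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Illustrative setup: plain softmax regression on the sklearn digits.
X, y = load_digits(return_X_y=True)
X = X / 16.0                            # scale pixels to [0, 1]
clf = LogisticRegression(max_iter=1000).fit(X, y)

# For softmax regression, the gradient of the cross-entropy loss w.r.t. the
# input is W^T (p - onehot(y)), so no deep-learning library is needed.
W = clf.coef_                           # shape (n_classes, n_features)
p = clf.predict_proba(X)
grad = (p - np.eye(W.shape[0])[y]) @ W  # shape (n_samples, n_features)

epsilon = 0.25                          # perturbation size (illustrative)
X_adv = np.clip(X + epsilon * np.sign(grad), 0.0, 1.0)

print("clean accuracy:      ", clf.score(X, y))
print("adversarial accuracy:", clf.score(X_adv, y))
```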
2. Why Adversarial Examples Happen
One hypothesis: Adversarial Examples from Overfitting
Adversarial Examples from Excessive Linearity:
Modern deep nets are very piecewise linear
Small inter-class distances
High-Dimensional Linear Models
Linear Models of ImageNet
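The excessive-linearity argument can be stated in one line: for a linear score w·x, a perturbation bounded by ε per coordinate can shift the score by an amount that grows with the input dimension, which is why imperceptibly small per-pixel changes matter in spaces as high-dimensional as ImageNet images. In LaTeX (with n input dimensions and mean absolute weight magnitude m):

```latex
% Worst-case effect of a max-norm-bounded perturbation on a linear score.
\[
  w^\top (x + \eta) = w^\top x + w^\top \eta ,
  \qquad
  \max_{\|\eta\|_\infty \le \epsilon} w^\top \eta
  = \epsilon \,\|w\|_1 \approx \epsilon\, m\, n ,
\]
% attained by \eta = \epsilon \,\mathrm{sign}(w): tiny coordinate-wise changes
% add up to a large change in the activation when n is large.
```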
3. Compromising Machine Learning Systems
Cross-model, cross-dataset generalization
Different models trained on the same data learn nearly identical weights!
Cross-technique transferability
Transferability Attack
Cross-Training Data Transferability
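One concrete way to see cross-technique transferability: craft adversarial examples against a substitute model whose gradients you can compute, then hand them to a completely different model type trained on the same data. A hedged sklearn sketch (model choices, `epsilon`, and the digits dataset are illustrative; transfer rates vary by model pair):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X = X / 16.0
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Substitute (white-box) model and an unrelated "victim" model type.
sub = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
victim = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Craft adversarial examples against the substitute only.
p = sub.predict_proba(X_te)
grad = (p - np.eye(10)[y_te]) @ sub.coef_
X_adv = np.clip(X_te + 0.25 * np.sign(grad), 0.0, 1.0)

# The perturbations often transfer even though the victim was never queried.
print("victim accuracy on clean inputs:      ", victim.score(X_te, y_te))
print("victim accuracy on transferred inputs:", victim.score(X_adv, y_te))
```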
Adversarial Examples in the Human Brain
Practical Attacks
Fool real classifiers trained by remotely hosted APIs (MetaMind, Amazon, Google)
Fool malware detector networks
Display adversarial examples in the physical world and fool machine learning systems that perceive them through a camera
Failed defenses
None of the attempted defenses solves the problem.
4. Using Adversarial Examples to Improve Machine Learning
Training on Adversarial Examples
Adversarial Training of other Models
- Linear models: SVMs and linear regression cannot learn a step function, so adversarial training is less useful for them; it ends up acting much like weight decay.
- k-NN: adversarial training is prone to overfitting.
Takeaway: neural nets can actually become more secure than other models. Adversarially trained neural nets have the best empirical success rate on adversarial examples of any machine learning model.
Adversarial Training
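Adversarial training mixes the loss on clean examples with the loss on adversarial examples crafted against the current weights. A minimal PyTorch-style training step, reusing the `fgsm_attack` sketch above (the mixing weight `alpha` and `epsilon` are illustrative):

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.007, alpha=0.5):
    """One minibatch update on a weighted mix of clean and adversarial loss."""
    x_adv = fgsm_attack(model, x, y, epsilon)  # crafted against the current weights
    optimizer.zero_grad()                      # clear grads left over from the attack
    loss = (alpha * F.cross_entropy(model(x), y)
            + (1.0 - alpha) * F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```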
Virtual Adversarial Training
Text Classification with VAT
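Virtual adversarial training replaces the true label with the model's own current prediction, so the same smoothness penalty applies to unlabeled data; that is what makes it usable for semi-supervised learning and text classification. A sketch of the usual one-step power-iteration formulation in PyTorch (`xi`, `eps`, and the number of power iterations are illustrative hyperparameters):

```python
import torch
import torch.nn.functional as F

def _l2_normalize(d):
    # Normalize each example's perturbation to unit L2 norm.
    d_flat = d.view(d.size(0), -1)
    return (d_flat / (d_flat.norm(dim=1, keepdim=True) + 1e-12)).view_as(d)

def vat_loss(model, x, xi=1e-6, eps=2.0, n_power=1):
    """Penalty for how much a worst-case small perturbation changes p(y|x).
    No labels are needed: the target distribution is the model's own output."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)

    # Power iteration: start from a random direction and refine it toward the
    # direction that most increases the KL divergence of the output distribution.
    d = _l2_normalize(torch.randn_like(x))
    for _ in range(n_power):
        d.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x + xi * d), dim=1), p,
                      reduction="batchmean")
        grad = torch.autograd.grad(kl, d)[0]
        d = _l2_normalize(grad.detach())

    # The virtual adversarial perturbation and the consistency loss it induces.
    r_adv = eps * d
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=1), p,
                    reduction="batchmean")
```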
Conclusion
- Attacking is easy
- Defending is difficult
- Adversarial training provides regularization and semi-supervised learning
- The out-of-domain input problem is a bottleneck for model-based optimization generally