Overview
- What are adversarial examples?
- Why do they happen?
- How can they be used to compromise machine learning systems?
- What are the defenses?
- How to use adversarial examples to improve machine learning, even when there is no adversary
1. Adversarial Examples
Fooling a neural net: from panda to gibbon
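The panda-to-gibbon figure is produced with the fast gradient sign method: take the sign of the gradient of the loss with respect to the pixels and step a small distance in that direction, so the change is imperceptible but the label flips with high confidence. A minimal PyTorch-style sketch (the model, `epsilon`, and the [0, 1] pixel range are illustrative assumptions):

```python
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.007):
    """One-step fast gradient sign method: nudge every input coordinate by
    +/- epsilon in the direction that increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # keep pixels in a valid range
```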
Turning Objects into “Airplanes”
Attacking a Linear Model
The digits inside the yellow boxes are misclassified by the network!
Classifiers of the following types suffer from the same problem (a sketch of the linear-model attack follows the list):
- Linear models: logistic regression, softmax regression, SVMs
- Decision trees
- Nearest neighbors
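A linear model is already enough to reproduce the attack. For softmax regression the input gradient has a closed form, so the whole attack fits in a few lines; a hedged sketch on the sklearn digits data (the dataset, `epsilon`, and evaluating on the training set are illustrative choices):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Illustrative setup: plain softmax regression on the sklearn digits.
X, y = load_digits(return_X_y=True)
X = X / 16.0                            # scale pixels to [0, 1]
clf = LogisticRegression(max_iter=1000).fit(X, y)

# For softmax regression, the gradient of the cross-entropy loss w.r.t. the
# input is W^T (p - onehot(y)), so no deep-learning library is needed.
W = clf.coef_                           # shape (n_classes, n_features)
p = clf.predict_proba(X)
grad = (p - np.eye(W.shape[0])[y]) @ W  # shape (n_samples, n_features)

epsilon = 0.25                          # perturbation size (illustrative)
X_adv = np.clip(X + epsilon * np.sign(grad), 0.0, 1.0)

print("clean accuracy:      ", clf.score(X, y))
print("adversarial accuracy:", clf.score(X_adv, y))
```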
2. Why Adversarial Examples Happen
One hypothesis: Adversarial Examples from Overfitting
Adversarial Examples from Excessive Linearity:
Modern deep nets are very piecewise linear
Small inter-class distances
High-Dimensional Linear Models
Linear Models of ImageNet
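The excessive-linearity argument can be stated in one line: for a linear score w·x, a perturbation bounded by ε per coordinate can shift the score by an amount that grows with the input dimension, which is why imperceptibly small per-pixel changes matter in spaces as high-dimensional as ImageNet images. In LaTeX (with n input dimensions and mean absolute weight magnitude m):

```latex
% Worst-case effect of a max-norm-bounded perturbation on a linear score.
\[
  w^\top (x + \eta) = w^\top x + w^\top \eta ,
  \qquad
  \max_{\|\eta\|_\infty \le \epsilon} w^\top \eta
  = \epsilon \,\|w\|_1 \approx \epsilon\, m\, n ,
\]
% attained by \eta = \epsilon \,\mathrm{sign}(w): tiny coordinate-wise changes
% add up to a large change in the activation when n is large.
```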
3. Compromising Machine Learning Systems
Cross-model, cross-dataset generalization
Different models trained on the same data learn nearly identical weights!
Cross-technique transferability
Transferability Attack
Cross-Training Data Transferability
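One concrete way to see cross-technique transferability: craft adversarial examples against a substitute model whose gradients you can compute, then hand them to a completely different model type trained on the same data. A hedged sklearn sketch (model choices, `epsilon`, and the digits dataset are illustrative; transfer rates vary by model pair):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X = X / 16.0
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Substitute (white-box) model and an unrelated "victim" model type.
sub = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
victim = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Craft adversarial examples against the substitute only.
p = sub.predict_proba(X_te)
grad = (p - np.eye(10)[y_te]) @ sub.coef_
X_adv = np.clip(X_te + 0.25 * np.sign(grad), 0.0, 1.0)

# The perturbations often transfer even though the victim was never queried.
print("victim accuracy on clean inputs:      ", victim.score(X_te, y_te))
print("victim accuracy on transferred inputs:", victim.score(X_adv, y_te))
```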
Adversarial Examples in the Human Brain
Practical Attacks
Fool real classifiers trained by remotely hosted APIs (MetaMind, Amazon, Google)
Fool malware detector networks
Display adversarial examples in the physical world and fool machine learning systems that perceive them through a camera
Failed defenses
None of the attempted defenses solves the problem.
4. Using Adversarial Examples to Improve Machine Learning
Training on Adversarial Examples
Adversarial Training of other Models
- Linear models: SVMs and linear regression cannot learn a step function, so adversarial training is less useful for them; it ends up acting much like weight decay.
- k-NN: adversarial training is prone to overfitting.
Takeaway: neural nets can actually become more secure than other models. Adversarially trained neural nets have the best empirical success rate on adversarial examples of any machine learning model.
Adversarial Training
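Adversarial training mixes the loss on clean examples with the loss on adversarial examples crafted against the current weights. A minimal PyTorch-style training step, reusing the `fgsm_attack` sketch above (the mixing weight `alpha` and `epsilon` are illustrative):

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.007, alpha=0.5):
    """One minibatch update on a weighted mix of clean and adversarial loss."""
    x_adv = fgsm_attack(model, x, y, epsilon)  # crafted against the current weights
    optimizer.zero_grad()                      # clear grads left over from the attack
    loss = (alpha * F.cross_entropy(model(x), y)
            + (1.0 - alpha) * F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```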
Virtual Adversarial Training
Text Classification with VAT
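Virtual adversarial training replaces the true label with the model's own current prediction, so the same smoothness penalty applies to unlabeled data; that is what makes it usable for semi-supervised learning and text classification. A sketch of the usual one-step power-iteration formulation in PyTorch (`xi`, `eps`, and the number of power iterations are illustrative hyperparameters):

```python
import torch
import torch.nn.functional as F

def _l2_normalize(d):
    # Normalize each example's perturbation to unit L2 norm.
    d_flat = d.view(d.size(0), -1)
    return (d_flat / (d_flat.norm(dim=1, keepdim=True) + 1e-12)).view_as(d)

def vat_loss(model, x, xi=1e-6, eps=2.0, n_power=1):
    """Penalty for how much a worst-case small perturbation changes p(y|x).
    No labels are needed: the target distribution is the model's own output."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)

    # Power iteration: start from a random direction and refine it toward the
    # direction that most increases the KL divergence of the output distribution.
    d = _l2_normalize(torch.randn_like(x))
    for _ in range(n_power):
        d.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x + xi * d), dim=1), p,
                      reduction="batchmean")
        grad = torch.autograd.grad(kl, d)[0]
        d = _l2_normalize(grad.detach())

    # The virtual adversarial perturbation and the consistency loss it induces.
    r_adv = eps * d
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=1), p,
                    reduction="batchmean")
```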
Conclusion
- Attacking is easy
- Defending is difficult
- Adversarial training provides regularization and semi-supervised learning
- The out-of-domain input problem is a bottleneck for model-based optimization generally