This thesis investigates the following hypothesis:

Deep One-Class Learning, a deep learning approach to anomaly detection based on the one-class classification paradigm, can significantly improve anomaly detection performance by learning (or transferring) data representations via one-class learning objectives, especially on semantic detection tasks.

Based on the results we present in this thesis, we will see that we can affirm this hypothesis. We summarize the main contributions and findings in the following.

1.2.1 Contributions and Findings

The main contributions and findings of this thesis are the following:

We introduce Deep SVDD, one of the first deep one-class classification methods for unsupervised anomaly detection. The objective of Deep SVDD is to learn a neural network transformation that minimizes the volume of a data-enclosing hypersphere in feature space. Through this, normal data points are mapped close to the hypersphere center, whereas anomalies are mapped away from it. We further identify a key challenge of deep one-class classification, namely the regularization against a trivial, constant solution, which we theoretically analyze for Deep SVDD. We demonstrate the practical value of Deep SVDD experimentally.
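In simplified form (the precise formulation, including its soft-boundary variant, follows in Chapter 2), the One-Class Deep SVDD objective reads

\[
\min_{\mathcal{W}} \; \frac{1}{n} \sum_{i=1}^{n} \big\| \phi(x_i; \mathcal{W}) - c \big\|^2 \;+\; \frac{\lambda}{2} \sum_{\ell=1}^{L} \big\| W^{\ell} \big\|_F^2 ,
\]

where \phi(\,\cdot\,;\mathcal{W}) denotes the network with weights \mathcal{W} = \{W^1, \ldots, W^L\}, c is the hypersphere center, and the last term is a standard weight decay regularizer. Note that a constant mapping \phi \equiv c attains zero loss here, which is exactly the trivial solution the regularization analysis addresses.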

We generalize Deep SVDD to the semi-supervised anomaly detection setting, where we introduce the Deep SAD method as well as Hypersphere Classification.

We experimentally demonstrate the value of including two types of negative examples with these methods: (i) few labeled ground-truth anomalies, and (ii) many weakly-labeled auxiliary anomalies, both of which we find can significantly improve anomaly detection performance.
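In rough form (again, the precise formulation follows in Chapter 2), the Deep SAD objective augments the Deep SVDD objective above with a term for the m labeled examples (\tilde{x}_1, \tilde{y}_1), \ldots, (\tilde{x}_m, \tilde{y}_m):

\[
\min_{\mathcal{W}} \; \frac{1}{n+m} \sum_{i=1}^{n} \big\| \phi(x_i; \mathcal{W}) - c \big\|^2
\;+\; \frac{\eta}{n+m} \sum_{j=1}^{m} \Big( \big\| \phi(\tilde{x}_j; \mathcal{W}) - c \big\|^2 \Big)^{\tilde{y}_j}
\;+\; \frac{\lambda}{2} \sum_{\ell=1}^{L} \big\| W^{\ell} \big\|_F^2 ,
\]

where \tilde{y}_j = +1 marks labeled normal samples and \tilde{y}_j = -1 marks labeled anomalies, so that the inverse-distance term pushes labeled anomalies away from the center c.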

We introduce an explainable deep one-class classification variant for anomaly detection on images, called FCDD, which uses a fully convolutional architecture to incorporate spatial coherence, a property important in computer vision. For FCDD, the mapped images directly correspond to an anomaly heatmap. We evaluate the method experimentally and find that FCDD yields competitive detection performance while providing transparent explanations. In an application to detecting defects in manufacturing, FCDD achieves state-of-the-art anomaly segmentation results.
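To illustrate how a fully convolutional architecture gives rise to such heatmaps, the following PyTorch-style sketch maps an image to a low-resolution map of local anomaly scores and upsamples it to input resolution. The tiny network, the input size, and the bilinear upsampling are illustrative stand-ins and not the exact architecture, loss, or upsampling scheme used by FCDD in Chapter 3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    """A minimal fully convolutional net: all layers preserve spatial structure,
    so each output unit only sees a local region (receptive field) of the input."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1),  # 1x1 conv: one value per spatial location
        )

    def forward(self, x):
        return self.features(x)  # shape (B, 1, H/2, W/2): low-resolution output map

net = TinyFCN()
images = torch.randn(8, 3, 64, 64)                        # a batch of dummy images
out_map = net(images)
local_scores = torch.sqrt(out_map ** 2 + 1) - 1           # pseudo-Huber: non-negative local scores
image_scores = local_scores.mean(dim=(1, 2, 3))           # one anomaly score per image
heatmaps = F.interpolate(local_scores, size=images.shape[-2:],
                         mode="bilinear", align_corners=False)  # full-resolution heatmaps
```

Because every entry of the output map depends only on a local image region, upsampling the score map directly localizes which regions contribute to the image-level anomaly score.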

We introduce a multi-context one-class classification variant for anomaly detection on text, called CVDD, which uses a multi-head self-attention mechanism to learn contextual sentence embeddings based on pre-trained word embeddings. The objective of CVDD is to learn these embeddings together with a set of context vectors, such that the two are closely aligned, while regularizing the context vectors to be diverse. In experiments, we find that this enables CVDD to capture multiple distinct themes present in an unlabeled text corpus, which allows it to perform contextual anomaly detection.
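In stylized form (the weighting and normalization details are given in Chapter 3), the CVDD objective aligns r context vectors c_1, \ldots, c_r (the rows of a context matrix C) with the corresponding attention-head sentence embeddings m_1(s), \ldots, m_r(s) under a cosine distance d, while penalizing redundant context vectors:

\[
\min_{\mathcal{W}, C} \; \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{r} \sigma_k(s_i)\, d\big(c_k, m_k(s_i)\big) \;+\; \alpha \,\big\| C C^{\top} - I \big\|_F^2 ,
\]

where \sigma_k(s_i) weights the contexts per sentence s_i and \alpha controls the strength of the diversity regularization. This is a sketch of the structure of the objective rather than its exact form.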

We present a unifying view on deep and “shallow” anomaly detection, where we distinguish the one-class classification approach from reconstruction-based methods and methods based on density estimation or generative modeling. For each of the three main approaches, we establish connections between their deep and “shallow” variants based on common underlying principles. This view contributes to a systematic understanding of existing methods and points out promising paths for future research. In a comparative evaluation, we find that the detection strategies of the various approaches are very diverse. Using techniques for explaining anomalies, we also show that anomaly detection models are prone to the “Clever Hans” effect, which occurs when a model correctly detects an anomaly, but based on the “wrong” features.

Overall, the contributions and findings above demonstrate that deep one-class learning is a useful approach to anomaly detection.

1.2.2 List of Publications

The primary contributions and findings of this thesis are based on the following peer-reviewed publications:

L. Ruff*, R. A. Vandermeulen*, N. Görnitz, L. Deecke, S. A. Siddiqui, A. Binder, E. Müller, and M. Kloft. Deep One-Class Classification. In Proceedings of the 35th International Conference on Machine Learning, volume 80, pages 4390–4399, 2018.

L. Ruff, Y. Zemlyanskiy, R. A. Vandermeulen, T. Schnake, and M. Kloft. Self-Attentive, Multi-Context One-Class Classification for Unsupervised Anomaly Detection on Text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4061–4071, 2019.

L. Ruff, R. A. Vandermeulen, N. Görnitz, A. Binder, E. Müller, K.-R. Müller, and M. Kloft. Deep Semi-Supervised Anomaly Detection. In International Conference on Learning Representations, 2020.

P. Liznerski*, L. Ruff*, R. A. Vandermeulen*, B. J. Franks, M. Kloft, and K.-R. Müller. Explainable Deep One-Class Classification. In International Conference on Learning Representations, 2021.

L. Ruff, J. R. Kauffmann, R. A. Vandermeulen, G. Montavon, W. Samek, M. Kloft, T. G. Dietterich, and K.-R. Müller. A Unifying Review of Deep and Shallow Anomaly Detection. Proceedings of the IEEE, 109(5):756–795, 2021.

The thesis also includes additional contents from the following papers:

L. Deecke, R. A. Vandermeulen, L. Ruff, S. Mandt, and M. Kloft. Image Anomaly Detection with Generative Adversarial Networks. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pages 3–17, 2018.

L. Ruff, R. A. Vandermeulen, N. Görnitz, A. Binder, E. Müller, and M. Kloft. Deep Support Vector Data Description for Unsupervised and Semi-Supervised Anomaly Detection. In ICML 2019 Workshop on Uncertainty & Robustness in Deep Learning, 2019.

P. Chong, L. Ruff, M. Kloft, and A. Binder. Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification. In International Joint Conference on Neural Networks, pages 1–9, 2020.

J. R. Kauffmann, L. Ruff, G. Montavon, and K.-R. Müller. The Clever Hans Effect in Anomaly Detection. Preprint (under review), 2020.

L. Ruff, R. A. Vandermeulen, B. J. Franks, K.-R. Müller, and M. Kloft. Rethinking Assumptions in Deep Anomaly Detection. In ICML 2021 Workshop on Uncertainty & Robustness in Deep Learning, 2021.

We note that all co-authors of these works have agreed to the use of ideas, figures, and results from the works above in this thesis.

*Equal contribution

1.2.3 Organization of the Thesis

This thesis comprises three main chapters:

Chapter 2 (One-Class Learning) In this chapter, we introduce a deep learning approach to one-class classification. We first discuss the general one-class classification objective and briefly review established shallow one-class classification methods. We then introduce the Deep SVDD method, demonstrate theoretical properties of Deep SVDD, and evaluate the method experimentally. Afterwards, we introduce the Deep SAD method and Hypersphere Classification, which constitute generalizations of Deep SVDD to the semi-supervised setting. We present an experimental evaluation on the usefulness of having few labeled ground-truth anomalies and many weakly-labeled auxiliary anomalies available.

Chapter 3 (Applications to Computer Vision and NLP) In this chapter, we introduce two deep one-class classification variants that take advantage of their specific domains. We first introduce the FCDD method for image data, which utilizes fully convolutional networks for explainable deep one-class classification. In an experimental evaluation, we show that FCDD performs competitively while providing transparent explanations and yields state-of-the-art results in a defect detection application in manufacturing. We then introduce the CVDD method for text data, which uses a self-attention mechanism to learn a multi-context one-class classification model. We evaluate CVDD experimentally on detecting novel topics and anomalous movie reviews.

Chapter 4 (A Unifying View of Anomaly Detection) In this chapter, we present a unifying view on deep and shallow anomaly detection methods. We first discuss methods based on density estimation and generative modeling followed by reconstruction methods, where we establish connections between their respective deep and shallow variants. We then present the unifying view, which also includes the one-class classification approach. Finally, we close this chapter with a comparative evaluation that includes canonical methods from the three main approaches (one-class classification, density estimation/generative modeling, reconstruction), which employ different feature representations (raw input, kernel, and neural network), respectively. Utilizing techniques for explaining anomalies, we demonstrate that the “Clever Hans” effect also occurs in anomaly detection.

In Chapter 5, we conclude, discuss the limitations of the thesis, and provide detailed paths for future research. Before we turn to the main chapters of the thesis, however, we complete this introduction and overview with a formal introduction to the anomaly detection problem.