
3.1 Explainable One-Class Classification for Images

3.1.1 The FCDD Method

FCDD combines a fully convolutional network with Deep SVDD (Section 2.2.1) and HSC (Section 2.3.5), so that the mapping of an input image is itself an image that corresponds to a downsampled anomaly heatmap. The pixels in this heatmap that lie far from the center correspond to anomalous regions in the input image. FCDD achieves this by using only convolutional and pooling layers, thereby limiting the receptive field of each output pixel.

Fully Convolutional Architecture  FCDD uses a fully convolutional network (FCN) [342, 398] $\phi_\omega : \mathbb{R}^{c \times h \times w} \to \mathbb{R}^{u \times v}$ that maps an image to a matrix of features, using alternating convolutional and pooling layers only, and does not contain any fully connected layers. In this context, pooling can be seen as a special kind of convolution with fixed parameters.
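For illustration, the following is a minimal PyTorch sketch of such a fully convolutional backbone; the layer widths, depths, and kernel sizes are illustrative placeholders, not the exact FCDD architecture.

```python
import torch
import torch.nn as nn

# Minimal sketch of a fully convolutional feature extractor phi_omega:
# it maps a (c, h, w) image to a (u, v) feature matrix using only
# convolutions and pooling, with no fully connected layers.
class SmallFCN(nn.Module):
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                        # pooling = convolution with fixed parameters
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 1, kernel_size=1, bias=True),  # 1x1 conv collapses channels; its bias plays the role of the center c
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (n, c, h, w) -> (n, u, v), with u = h/4 and v = w/4 in this sketch
        return self.features(x).squeeze(1)
```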

Figure 3.1: Visualization of a 3×3 convolution followed by a 3×3 transposed convolution with a Gaussian kernel, both using a stride of 2.

A defining property of a convolutional layer is that each pixel of its output only depends on a small region of its input, known as the output pixel's receptive field. Since the output of a convolution is produced by moving a filter over the input image, each output pixel has the same relative position as its associated receptive field in the input.

For instance, the lower-left corner of the output representation has a corresponding receptive field in the lower-left corner of the input image, etc. (see Figure 3.1). The output of several stacked convolutions also has receptive fields of limited size and consistent relative position, though their size grows with the number of layers. Due to this, FCNs incorporate the assumption of spatial coherence.
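To make this growth concrete, the following small helper (standard receptive-field arithmetic, not part of FCDD itself) computes the receptive-field size and cumulative stride of a stack of convolution and pooling layers from their kernel sizes and strides.

```python
# Receptive-field size r and cumulative stride s of stacked conv/pool layers,
# using the standard recurrence r_out = r_in + (k - 1) * s_in, s_out = s_in * stride.
def receptive_field(layers):
    r, s = 1, 1  # a single input pixel, unit stride
    for kernel_size, stride in layers:
        r = r + (kernel_size - 1) * s
        s = s * stride
    return r, s

# Example: the illustrative SmallFCN above
# (3x3 convs with stride 1, 2x2 pools with stride 2, final 1x1 conv)
layers = [(3, 1), (2, 2), (3, 1), (2, 2), (1, 1)]
print(receptive_field(layers))  # -> (10, 4): each output pixel sees a 10x10 input patch
```

For the illustrative SmallFCN sketch above this yields a 10×10 receptive field and a cumulative stride of 4; these two quantities reappear later as the kernel size and stride of the Gaussian upsampling.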

FCDD Objective  Let $\phi_\omega : \mathbb{R}^{c \times h \times w} \to \mathbb{R}^{u \times v}$ be an FCN with network weights $\omega$. Moreover, define $A_\omega(x) := \sqrt{\phi_\omega(x)^2 + 1} - 1$, that is, $A_\omega(x)$ is the pseudo-Huber loss (see also Section 2.3.5) applied to the FCN output matrix $\phi_\omega(x)$, where all operations are applied element-wise. Given $n$ unlabeled images $x_1, \ldots, x_n \in \mathcal{X}$ with $\mathcal{X} \subseteq \mathbb{R}^{c \times h \times w}$ and $m$ labeled images $(\tilde{x}_1, \tilde{y}_1), \ldots, (\tilde{x}_m, \tilde{y}_m) \in \mathcal{X} \times \mathcal{Y}$ with $\mathcal{Y} = \{\pm 1\}$, where again $\tilde{y} = +1$ denotes known normal images and $\tilde{y} = -1$ known anomalies, respectively, we define the FCDD objective as:

$$\min_\omega \; \frac{1}{n}\sum_{i=1}^{n} \big\|A_\omega(x_i)\big\|_1 \;+\; \frac{\eta}{m}\sum_{j=1}^{m}\Big(\mathbb{1}(\tilde{y}_j = +1)\,\big\|A_\omega(\tilde{x}_j)\big\|_1 \;-\; \mathbb{1}(\tilde{y}_j = -1)\,\log\big(1 - \exp(-\|A_\omega(\tilde{x}_j)\|_1)\big)\Big) \tag{3.1}$$

The hyperparameter $\eta > 0$ again controls the balance between the labeled and the unlabeled term (see also (2.13) in Section 2.3.2).
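A minimal PyTorch sketch of this objective is given below, assuming the FCN outputs for the unlabeled and labeled images are provided as tensors of shape (n, u, v) and (m, u, v); the small epsilon is an implementation detail for numerical stability, not part of the objective.

```python
import torch

def fcdd_loss(phi_unlabeled, phi_labeled=None, y=None, eta=1.0):
    """Sketch of objective (3.1). phi_unlabeled: FCN outputs of the n unlabeled
    images (shape (n, u, v)); phi_labeled / y: outputs and labels (+1 normal,
    -1 anomalous) of the m labeled images (shape (m, u, v) and (m,))."""
    # A_omega(x): element-wise pseudo-Huber of the FCN output
    A = torch.sqrt(phi_unlabeled ** 2 + 1) - 1
    scores = A.flatten(1).sum(dim=1)            # ||A_omega(x)||_1 per image
    loss = scores.mean()                        # unlabeled term: minimized for (assumed) normal images

    if phi_labeled is not None:
        A_l = torch.sqrt(phi_labeled ** 2 + 1) - 1
        scores_l = A_l.flatten(1).sum(dim=1)
        normal = (y == 1).float()
        anomalous = (y == -1).float()
        # labeled normals are pulled towards the center, labeled anomalies pushed away
        labeled = normal * scores_l - anomalous * torch.log(1 - torch.exp(-scores_l) + 1e-9)
        loss = loss + eta * labeled.mean()
    return loss
```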

We omit the usual center c in the FCDD objective (3.1), since we always train FCDD using (true, auxiliary, or artificial) negative examples, which prevents a hypersphere collapse (see Sections 2.2.2 and 2.2.3). In our implementation, we include and optimize a bias term in the last layer of our networks that corresponds to c. As discussed in Section 2.3.1, labeled anomalous samples can be a collection of auxiliary images which are different from the collection of normal images (Outlier Exposure), for example one of the many large collections of images that are freely available like 80 Million Tiny Images [548] or ImageNet [133]. When one has access to ground-truth anomalies, that is, images that are representative of anomalies that will be seen at testing time, we find that even using a few examples as the corpus of labeled anomalies performs well. Furthermore, in the absence of any sort of known anomalies, one can artificially generate synthetic anomalies, which we find to be very effective for anomaly localization as well.

Objective (3.1) maximizes $\|A_\omega(x)\|_1$ for anomalies and minimizes it for normal instances, thus we use $\|A_\omega(x)\|_1$ as the anomaly score. Entries of $A_\omega(x)$ that contribute to $\|A_\omega(x)\|_1$ correspond to regions of the input image $x$ that add to the anomaly score. The shape of these regions depends on the receptive field of the FCN.

We include a sensitivity analysis on the size of the receptive field in Appendix A.3, where we find that detection performance is not much affected within a reasonable range of sizes. Note that $A_\omega(x)$ has spatial dimensions $u \times v$ and is smaller than the original image dimensions $h \times w$. One could use $A_\omega(x)$ directly as a low-resolution heatmap of the image; however, it is often desirable to have full-resolution heatmaps.
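As a brief illustration of this use at test time, the following sketch (assuming an FCN such as the illustrative SmallFCN above) computes the low-resolution heatmap $A_\omega(x)$ and the per-image anomaly score $\|A_\omega(x)\|_1$ for a batch of images.

```python
import torch

@torch.no_grad()
def score_images(model, images):
    # 'model' is assumed to be a fully convolutional network like the
    # illustrative SmallFCN above; 'images' has shape (n, c, h, w).
    phi = model(images)                   # FCN outputs, shape (n, u, v)
    A = torch.sqrt(phi ** 2 + 1) - 1      # low-resolution anomaly heatmap A_omega(x)
    scores = A.flatten(1).sum(dim=1)      # anomaly score ||A_omega(x)||_1 per image
    return A, scores
```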

Because we usually lack ground-truth anomaly maps in an anomaly detection setting during training, it is not possible to train an FCN in a supervised way to upsample the low-resolution heatmap Aω(x) (e.g., as in [398]). For this reason we introduce an upsampling methodology based on the properties of receptive fields.

Figure 3.2: FCDD uses a fully convolutional network $\phi_\omega$ with a deep one-class classification objective to produce an anomaly heatmap $A$ of an input $x$. The lower-resolution heatmap $A$ can be upsampled to a full-resolution anomaly heatmap $A'$ via a transposed Gaussian convolution.

Heatmap Upsampling  Since we generally do not have access to ground-truth pixel annotations in anomaly detection during training, we cannot learn a deconvolutional type of structure for upsampling. Instead, we suggest a principled way to upsample the lower-resolution anomaly heatmap. For every output pixel in $A_\omega(x)$ there is a unique input pixel which lies at the center of its receptive field.

Algorithm 1: Gaussian Receptive Field Upsampling
Input: $A \in \mathbb{R}^{u \times v}$ (low-resolution anomaly heatmap)
Output: $A' \in \mathbb{R}^{h \times w}$ (full-resolution anomaly heatmap)
Define: $[G_2(\mu, \sigma)]_{x,y} := \frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{(x-\mu_1)^2 + (y-\mu_2)^2}{2\sigma^2}\right)$
  $A' \leftarrow 0$
  for all output pixels $a$ in $A$ do
    $f \leftarrow$ receptive field of $a$
    $c \leftarrow$ center of field $f$
    $A' \leftarrow A' + a \cdot G_2(c, \sigma)$
  end for
  return $A'$

It has been observed before that the influence of the receptive field for an output pixel decays in a Gaussian manner as one moves away from the center of the receptive field [345]. We use this fact to upsample $A_\omega(x)$ with a strided transposed convolution using a fixed Gaussian kernel (see Figure 3.1, right side). This procedure is described in Algorithm 1, which simply corresponds to a strided transposed convolution. The kernel size is set to the receptive field range of FCDD and the stride to the cumulative stride of FCDD. The variance of the Gaussian kernel can be picked empirically (see Appendix A.4 for details). In Figure 3.2, we give a complete overview of the FCDD method and the process of generating full-resolution anomaly heatmaps.
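A minimal PyTorch sketch of this upsampling step is given below, assuming the kernel size and stride are set to the receptive-field range and cumulative stride of the FCN (e.g., as computed by the receptive_field helper above); padding and output cropping are simplified relative to a full implementation.

```python
import math
import torch
import torch.nn.functional as F

def gaussian_upsample(A, kernel_size, stride, sigma):
    """Sketch of Algorithm 1: upsample a low-resolution heatmap A of shape
    (n, u, v) via a strided transposed convolution with a fixed 2D Gaussian
    kernel. kernel_size ~ receptive-field range, stride ~ cumulative stride
    of the FCN; sigma is chosen empirically."""
    # Build the fixed 2D Gaussian kernel G_2(., sigma)
    coords = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
    yy, xx = torch.meshgrid(coords, coords, indexing="ij")
    kernel = torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
    kernel = kernel.view(1, 1, kernel_size, kernel_size)

    # Each low-resolution entry a is "stamped" onto the output as a * G_2
    # centered at its receptive field -- exactly what conv_transpose2d does.
    A = A.unsqueeze(1)  # (n, 1, u, v)
    A_full = F.conv_transpose2d(A, kernel, stride=stride)
    return A_full.squeeze(1)  # (n, ~h, ~w); exact size depends on padding choices
```

With the illustrative SmallFCN sketch above, for instance, one would call gaussian_upsample(A, kernel_size=10, stride=4, sigma=...) with an empirically chosen sigma.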