
4.1.1 Neural Architecture Search on Low-Dimensional Data

Designing CNNs for lower-dimensional 2D image data has been addressed extensively.

Also, the reduction to 1D is straightforward as one dimension can be removed from all mathematical operations. Moving from 2D to 1D also leads to lower computational requirements and fewer trainable model parameters. Thus, 1D and 2D CNN design for medical learning problems is generally not problematic as 2D approaches can be readily adapted from the natural image domain. However, the property of being computationally cheap opens up new opportunities for CNN model design that could also benefit more challenging, higher-dimensional problems.
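To make the reduction in cost concrete, the following PyTorch sketch (an illustration, not part of our implementation) compares the parameter count of the same convolutional layer configuration in 2D and 1D; dropping one kernel dimension reduces both the number of trainable weights and the cost of each forward pass.

```python
import torch
import torch.nn as nn

# The same layer configuration in 2D and 1D. Moving from Conv2d to Conv1d
# drops one kernel dimension, reducing parameters and computation.
conv2d = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
conv1d = nn.Conv1d(in_channels=32, out_channels=64, kernel_size=3, padding=1)

params_2d = sum(p.numel() for p in conv2d.parameters())  # 32*64*3*3 + 64 = 18,496
params_1d = sum(p.numel() for p in conv1d.parameters())  # 32*64*3   + 64 =  6,208
print(params_2d, params_1d)

# The corresponding inputs: a 2D image vs. a single 1D signal (e.g. one A-scan).
x2d = torch.randn(1, 32, 64, 64)   # (batch, channels, height, width)
x1d = torch.randn(1, 32, 64)       # (batch, channels, length)
print(conv2d(x2d).shape, conv1d(x1d).shape)
```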

Manual feature engineering has been largely replaced by deep learning approaches for numerous medical, image-based learning problems over the last few years. As most CNN architectures are designed manually, there has been a shift from handcrafting features to handcrafting architectures. CNNs themselves are often difficult to design, and it is unclear what kind of architecture is suitable for which learning problem. Thus, the next intuitive step is to move from handcrafting architectures to learning architectures.

In Chapter 3, we introduced the bilevel optimization problem given in Equation 3.4.

Formally, the design choices for the architecture can be considered a subset $h^A_M \subset h_M$ of the hyperparameters that need to be selected. As described in the previous chapter, the machine learning engineer usually chooses the hyperparameters $h^A_M$.

Approaches for learning $h^A_M$ are often termed neural architecture search (NAS).

Typical NAS approaches include grid search, genetic algorithms, Bayesian optimization, or random search [232]. Recently, reinforcement learning (RL) methods have been proposed in which a recurrent controller is trained to predict an architecture's structure, using the architecture's expected validation performance as the reward to be maximized [602]. This approach has been successful for 2D image classification problems; however, the amount of computing resources required is often enormous [303, 603]. Early NAS approaches required thousands of GPU hours for learning an architecture with 2D image data [377].

The concept of NAS is very promising for the medical image domain as there is a vast amount of imaging modalities and learning problems that require architecture design.

However, the time and resource requirements of NAS are problematic for medical image data, which is often 3D or 4D in nature [291].

We propose an efficient NAS approach for segmentation of multi-dimensional medical image data. To overcome long architecture search times, we perform the search on lower-dimensional data and then transfer the learned architecture to the higher target dimension. So far, NAS approaches have largely been explored in the natural image domain, not for medical learning problems.



Fig. 4.1: The baseline segmentation architecture for 1D and 2D image data, shown for the example of retinal layer segmentation. We employ ResNet blocks as the main processing units. $n_c$ is the base number of feature maps, which is doubled every time we downsample. /2 indicates that the original images' spatial dimensions have been reduced with a stride of $r = 2$.


Therefore, we propose a strategy for typical U-Net-like [409] architectures, where we learn the submodules within the architecture at each level. Given the relatively low computational cost of processing lower-dimensional data, we investigate the approach for 1D and 2D image data on the task of medical image segmentation.

Baseline Model. As a baseline, we use a U-Net-like model, as described in Section 3.3.6. The model takes a 1D or 2D image as its input and predicts a segmentation map with the same size as the input. For the long-range connections, we use summation, following [575]. Inside the network, we use ResNet [193] blocks, which we introduced in Section 3.3.2. Convolutions use a kernel size of 3, following Simonyan et al. [450], and extensions from 1D to 2D are performed by extending all kernels isotropically by an additional dimension. At each level, we employ one or several ResNet blocks. Thus, the architecture shown in Figure 4.1 is a slightly modified U-Net with ResNet blocks and summation for the long-range connections.
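The following sketch outlines how such a ResNet block could be implemented in PyTorch; the normalization and activation layers are assumptions for illustration, as the text above only fixes the two size-3 convolutions and the residual summation.

```python
import torch
import torch.nn as nn

class ResNetBlock(nn.Module):
    """Two convolutions with kernel size 3 and a residual (summation)
    connection, usable for 1D or 2D data via the dim argument."""
    def __init__(self, channels, dim=2):
        super().__init__()
        Conv = nn.Conv1d if dim == 1 else nn.Conv2d
        Norm = nn.BatchNorm1d if dim == 1 else nn.BatchNorm2d
        self.conv1 = Conv(channels, channels, kernel_size=3, padding=1)
        self.norm1 = Norm(channels)
        self.conv2 = Conv(channels, channels, kernel_size=3, padding=1)
        self.norm2 = Norm(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.act(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return self.act(out + x)  # residual summation

# Long-range (encoder-decoder) connections also use summation instead of
# concatenation, so encoder and decoder feature maps must have equal shape.
enc = torch.randn(2, 32, 128, 128)
dec = torch.randn(2, 32, 128, 128)
fused = enc + dec
```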

Neural Architecture Search. We follow the general framework of neural architecture search with a recurrent LSTM controller and a reinforcement learning-based approach for learning new architectures [602]. The general idea of this concept is shown in Figure 4.2.

The controller is an RNN-based architecture that provides probabilistic predictions of a CNN architecture in a sequential way. The controller's parameters are denoted as $w^c_M$. Given the controller's predictions, a single architecture is sampled with probability $p_a$. This child architecture is constructed, and its parameters $w^m_M$ are trained for the task to be solved. After convergence, the child CNN's performance is determined by a reward metric $R$ on a validation set. This metric is used to scale the policy gradient of $p_a$, which can then be used for updating the controller's parameters $w^c_M$.
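A simplified sketch of such a controller is shown below (hypothetical names and sizes, not the exact controller from [602]): an LSTM cell emits one categorical decision per step, the sampled action is embedded and fed back as the next input, and the summed log-probability of the sampled decisions corresponds to $\log p_a$ for the child architecture.

```python
import torch
import torch.nn as nn

class Controller(nn.Module):
    """Sketch of an LSTM controller that predicts an architecture as a
    sequence of categorical decisions (here: one shared decision vocabulary)."""
    def __init__(self, num_choices=5, hidden=64, num_decisions=8):
        super().__init__()
        self.embed = nn.Embedding(num_choices, hidden)
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.head = nn.Linear(hidden, num_choices)
        self.num_decisions = num_decisions

    def sample(self):
        h = torch.zeros(1, self.lstm.hidden_size)
        c = torch.zeros(1, self.lstm.hidden_size)
        inp = torch.zeros(1, self.lstm.hidden_size)
        actions, log_probs = [], []
        for _ in range(self.num_decisions):
            h, c = self.lstm(inp, (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            a = dist.sample()                 # one architecture decision
            actions.append(a.item())
            log_probs.append(dist.log_prob(a))
            inp = self.embed(a)               # feed the chosen action back in
        return actions, torch.stack(log_probs).sum()  # architecture, log p_a

controller = Controller()
arch, log_p_a = controller.sample()
```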

In detail, the goal of training is to maximize the controller's expected reward, which is defined as

$$J_P(w^c_M) = \mathbb{E}_{P(a_{1:T};\, w^c_M)}[R] \tag{4.1}$$

where $a_{1:T}$ is a set of actions that can be taken by the controller.


Fig. 4.2: The general concept of neural architecture search with a recurrent LSTM controller and reinforcement learning is shown.

The reward $R$ is not differentiable with respect to $w^c_M$; therefore, a policy gradient method needs to be used. The REINFORCE method proposed by Williams [538] provides an empirical approximation of the gradient:

$$\nabla_{w^c_M} J_P(w^c_M) \approx \frac{1}{N_{\text{sample}}} \sum_{k=1}^{N_{\text{sample}}} \sum_{t=1}^{T} \nabla_{w^c_M} \log P(a_t \mid a_{(t-1):1};\, w^c_M)\,(R_k - b_e) \tag{4.2}$$

Here, $b_e$ is an exponential moving average of the previous architectures' reward metrics, which is used for reducing variance during training. $N_{\text{sample}}$ is the number of architectures that are sampled by the controller and trained to convergence afterward. $T$ is the number of possible actions and thus represents the number of hyperparameter choices available to the controller.

While this concept has led to architectures that outperformed all handcrafted alternatives at that time [602], the procedure is very costly. For every controller update, $N_{\text{sample}}$ CNN architectures need to be trained from scratch until convergence. To overcome this problem, efficient neural architecture search (ENAS) [377] has been proposed. The key idea behind the improvement is that child CNN architectures are not retrained from scratch at every iteration; instead, the trained weights are kept. Controller and child CNN are trained in an interleaved way, where each is trained for one epoch while the other one's weights remain fixed. This strategy reduces the computational effort by a factor of 1000 and has also led to competitive architectures [377]. In terms of implementation, this method is more challenging, as all possible model configurations need to be implemented simultaneously. Given the current action $a$, the architecture's connections are rerouted, and individual operations are activated or deactivated. Throughout the entire search process, all possible architectures and their weights are maintained.
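The weight-sharing idea can be sketched as follows (a minimal illustration, not the thesis implementation): all candidate operations are instantiated once and kept for the entire search, and the sampled action merely selects which operation the forward pass is routed through, so successive child architectures reuse previously trained weights.

```python
import torch
import torch.nn as nn

class SharedOp(nn.Module):
    """Holds every candidate operation simultaneously; the action chosen by
    the controller activates exactly one of them per forward pass, while the
    weights of all candidates persist across sampled architectures."""
    def __init__(self, channels):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # convolution
            nn.Identity(),                                            # skip / identity
        ])

    def forward(self, x, action):
        return self.candidates[action](x)  # only the sampled op is executed

op = SharedOp(channels=32)
x = torch.randn(1, 32, 64, 64)
y_conv = op(x, action=0)   # child architecture that uses the convolution
y_skip = op(x, action=1)   # child that uses the identity; conv weights are kept
```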

ENAS U-Net. Next, we propose ENAS U-Net, an adaptation of the ENAS framework [377] for image segmentation tasks with a U-Net. To keep the computational effort bounded, we simplify the architecture search space by keeping the general U-Net structure fixed and only learning new module blocks, similar to the micro search space in ENAS.

The input/output and downsampling/upsampling convolutional layers also stay fixed.

Considering our baseline architecture in Figure 4.1, we now learn a module block that replaces the ResNet block. For the module block search space, we let the controller learn the properties of several cells, where each cell contains 2 subcells. The cell's output is the summation of the subcells' outputs. For each subcell, the controller defines its input, which is either the module input or another cell's output, and its operation. Similar to ENAS, we allow five basic operations for the controller to choose from: convolutions with kernel size 3 or 5, average- and max-pooling with kernel size 3, and the identity transform.


Fig. 4.3: An example module block prediction by the current controller is shown. The controller sequentially predicts both the connectivity and the operations within the module block (top). The connectivity can be visualized by a directed acyclic graph (bottom left). The final implementation of the module block at layer $l$ is shown on the right.

We explore search scenarios with different numbers of cells to learn. Our controller setup for an example prediction of a module block is shown in Figure 4.3.

Overall, we carefully engineer our search space to be small such that search times remain within reasonable bounds.
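A sketch of how a sampled module block could be assembled from the controller's decisions is given below (hypothetical interface; the exact cell wiring is only illustrated): each cell contains two subcells, each subcell is defined by an input index and one of the five candidate operations, and the cell output is the sum of its two subcell outputs.

```python
import torch
import torch.nn as nn

def make_op(op_index, channels):
    """The five candidate operations of the module-block search space."""
    return [
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        nn.Conv2d(channels, channels, kernel_size=5, padding=2),
        nn.AvgPool2d(kernel_size=3, stride=1, padding=1),
        nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
        nn.Identity(),
    ][op_index]

class ModuleBlock(nn.Module):
    """Builds a block from controller decisions: one entry per cell, each
    holding two (input_index, op_index) pairs. Input index 0 refers to the
    block input, index i >= 1 to the output of cell i."""
    def __init__(self, channels, decisions):
        super().__init__()
        self.decisions = decisions
        self.ops = nn.ModuleList(
            nn.ModuleList(make_op(op, channels) for _, op in cell)
            for cell in decisions
        )

    def forward(self, x):
        outputs = [x]  # index 0: the block input
        for cell_ops, cell_dec in zip(self.ops, self.decisions):
            subcell_outs = [op(outputs[inp]) for op, (inp, _) in zip(cell_ops, cell_dec)]
            outputs.append(sum(subcell_outs))  # cell output: sum of its subcells
        return outputs[-1]  # simplification: return the last cell's output

# Two cells, with decisions as the controller might have sampled them.
block = ModuleBlock(channels=32, decisions=[[(0, 0), (0, 4)], [(1, 1), (0, 2)]])
y = block(torch.randn(1, 32, 64, 64))
```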

Training Strategy. We perform simultaneous training of model and architecture parameters as follows. First, an initial U-Net architecture is sampled based on random initialization. This architecture is trained by gradient descent for one epoch using a training set. Then, the current model weights are fixed, and the controller is trained for one epoch using a Dice score reward computed from a reward training set. Training is performed by following the REINFORCE algorithm described above. Then, the controller weights are fixed once again, a new architecture is sampled from the controller, and normal model training is performed using the training data set. Thus, the training procedure follows an interleaved schedule where all parameters are reused in each epoch. After training for $N_e$ epochs, we fix all weights and sample $N_{\text{sample}}$ candidate architectures from the controller. Each architecture is evaluated on a validation set. The best-performing model is selected based on its validation performance. This model is trained from scratch with randomly initialized parameters and is finally evaluated on a test set. In contrast, the baseline U-Net model is trained on the training set for $N_e$ epochs. The hyperparameters for the baseline U-Net model are selected based on validation set performance. Finally, the model is evaluated on the test set. The training set, reward training set, validation set, and test set are disjoint.
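The overall schedule can be summarized by the following sketch (all callables are placeholders for the problem-specific training and evaluation routines, and only one controller update per search epoch is shown for brevity):

```python
def search_and_select(controller, controller_opt, train_child_one_epoch,
                      reward_dice, validation_dice, train_from_scratch,
                      num_epochs, num_candidates, ema_decay=0.95):
    """Interleaved ENAS-style search: child weights and controller weights
    are trained alternately; afterwards, candidate architectures are sampled,
    ranked on the validation set, and the best one is retrained from scratch."""
    baseline = 0.0
    for _ in range(num_epochs):                      # N_e search epochs
        arch, _ = controller.sample()
        train_child_one_epoch(arch)                  # child: one epoch, controller fixed

        arch, log_p_a = controller.sample()          # controller update, child fixed
        reward = reward_dice(arch)                   # Dice score on the reward training set
        baseline = ema_decay * baseline + (1.0 - ema_decay) * reward  # EMA baseline b_e
        loss = -(reward - baseline) * log_p_a        # REINFORCE surrogate loss, Eq. (4.2)
        loss.backward()
        controller_opt.step()
        controller_opt.zero_grad()

    # Architecture selection after the search.
    candidates = [controller.sample()[0] for _ in range(num_candidates)]  # N_sample candidates
    best = max(candidates, key=validation_dice)      # best validation Dice with shared weights
    return train_from_scratch(best)                  # retrained model, evaluated on the test set
```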

We explore searching for architectures both on 1D data and on 2D data. Thus, for the two searches, we obtain architecture hyperparameters $h^{1D}_M$ and $h^{2D}_M$, respectively. After the search, we evaluate the architecture found by the search on 1D data both as a 1D and as a 2D CNN architecture. We consider the machine learning models:

$$f^{1D}_M(h^{RN}_M) : \mathbb{R}^{n_d} \to \mathbb{R}^{n_d} \tag{4.3}$$
$$f^{1D}_M(h^{1D}_M) : \mathbb{R}^{n_d} \to \mathbb{R}^{n_d} \tag{4.4}$$
$$f^{2D}_M(h^{RN}_M) : \mathbb{R}^{n_w \times n_d} \to \mathbb{R}^{n_w \times n_d} \tag{4.5}$$
$$f^{2D}_M(h^{1D}_M) : \mathbb{R}^{n_w \times n_d} \to \mathbb{R}^{n_w \times n_d} \tag{4.6}$$
$$f^{2D}_M(h^{2D}_M) : \mathbb{R}^{n_w \times n_d} \to \mathbb{R}^{n_w \times n_d} \tag{4.7}$$

where $h^{RN}_M$ are the architecture hyperparameters of the baseline model with ResNet blocks. As described above, the 1D CNN found by ENAS U-Net is extended to 2D by isotropically extending all mathematical operations of the CNN to 2D. Architectures found through NAS are always trained from scratch.
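A small sketch of the isotropic 1D-to-2D extension (with an assumed configuration format, not the thesis code): every kernel size $k$ found during the 1D search simply becomes $k \times k$ in the 2D model, so $h^{1D}_M$ can be reused to instantiate $f^{2D}_M(h^{1D}_M)$.

```python
def extend_arch_to_2d(arch_1d):
    """Isotropically extend a 1D architecture description to 2D: each kernel
    size k becomes the tuple (k, k); operations without a kernel (e.g. the
    identity) are copied unchanged."""
    arch_2d = []
    for op in arch_1d:
        op_2d = dict(op)
        if "kernel_size" in op_2d:
            k = op_2d["kernel_size"]
            op_2d["kernel_size"] = (k, k)
        arch_2d.append(op_2d)
    return arch_2d

# Example: the operations of a module block found by the 1D search.
h_1d = [{"type": "conv", "kernel_size": 3},
        {"type": "max_pool", "kernel_size": 3},
        {"type": "identity"}]
h_2d = extend_arch_to_2d(h_1d)
# -> kernel sizes 3 become (3, 3); the identity stays unchanged.
```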

Summary. Lower-dimensional CNNs, in particular 1D CNNs, come with the advantageous property of being computationally cheap. This makes architecture design convenient, as existing 2D architectures from the natural image domain can be easily adapted. At the same time, the low computational requirements open up a new path for architecture design: automatically learning the architecture's structure with an additional optimization level. Typically, neural architecture search is severely limited by its immense computational requirements of up to thousands of GPU hours. We explore a neural architecture search strategy that is particularly efficient, as it searches on low-dimensional data before extending architectures to higher dimensions. This approach is promising for the medical imaging domain, as there are many problems that require problem-specific architecture design. Therefore, we propose an adaptation of the ENAS framework for segmentation problems, the most common problem type in the medical image domain.