In this work, we propose several novel deep learning methods and apply them to various open multi-dimensional deep learning problems, guided by our two research questions. First, we adapt and introduce deep learning methods for different types of multi-dimensional medical image data. Second, we apply the adapted and newly proposed methods to biomedical applications where data dimensionality is a key problem that has not been addressed so far. We perform extensive and systematic experiments to validate the adapted and proposed methods across multiple application scenarios.

1.4.1 Multi-Dimensional Deep Learning Methods

1D, 2D, 3D, and 4D CNNs. Convolutional neural networks (CNNs) are the most common method for machine learning-based image processing. Typically, they are applied to 2D images, as is common practice in the natural image domain. As a first step, we design CNNs for 1D OCT image data and explore their capabilities in processing lower-dimensional data. Since processing lower-dimensional data is cheap, we explore automated architecture search on 1D image data and the resulting architectures’ transferability to higher dimensions. Moving to 2D data and 2D CNNs, we consider both the typical case of 2D spatial data processing and 2D spatio-temporal data processing, where convolution operations also cover the temporal dimension. Next, we extend spatial 2D CNNs to 3D. Previously, hardly any spatial 3D CNNs existed, and in particular none for regression problems on OCT data. Therefore, we adopt several concepts, such as Inception [476], ResNet [193], and ResNeXt [553], to design 3D CNNs for 3D image data. As an alternative, we consider the immediate extension of existing 2D CNN architectures to 3D. We enable this approach by proposing a multi-dimensional transfer learning strategy with weight scaling for reusing 2D kernels in a 3D CNN. Finally, we also design 4D CNNs for processing 4D spatio-temporal data, a field that is largely unexplored. We design several 4D variants that process the spatial and temporal data dimensions efficiently in different ways.
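
To make the weight scaling idea concrete, the following minimal sketch shows how a pretrained 2D convolution could be inflated to 3D. We use PyTorch purely for illustration; the function is a hypothetical simplification of the transfer strategy, not its exact implementation.

import torch
import torch.nn as nn

def inflate_conv2d_to_3d(conv2d: nn.Conv2d, depth: int = 3) -> nn.Conv3d:
    # Reuse pretrained 2D kernels in a 3D convolution: the 2D kernel is
    # replicated along the new depth dimension and divided by the
    # replication factor (weight scaling).
    conv3d = nn.Conv3d(
        conv2d.in_channels,
        conv2d.out_channels,
        kernel_size=(depth, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(depth // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        # weight shape: (out, in, kH, kW) -> (out, in, depth, kH, kW)
        w3d = conv2d.weight.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
        conv3d.weight.copy_(w3d)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

Dividing the replicated weights by the replication factor preserves the expected activation magnitude: for an input that is constant along the depth axis, the inflated 3D layer initially responds like the original 2D layer.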

2.5D and 3.5D CNNs. Many learning problems involve exactly two states or representations that need to be processed jointly. If each of these states is 2D or 3D in nature, we refer to the problem as 2.5D or 3.5D, respectively. In the natural image domain, a particular class of CNNs has been introduced for this type of problem, called Siamese CNNs [508]. Here, the idea is to learn similar features for similar images, for example, for matching tasks. We adapt and extend this concept for biomedical learning problems with two 2D or 3D input images. We exploit image similarity through shared processing paths and explore to what extent parameter sharing, as opposed to path-individual learning, is beneficial in this type of architecture. Furthermore, we study the properties of feature fusion and propose a novel attention-guided interaction method for improved information exchange between the two paths.
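
The following sketch illustrates the general pattern of such a two-path architecture, including a simplified attention-guided interaction module. It is a minimal PyTorch illustration with hypothetical layer sizes, not one of the exact architectures evaluated in this thesis.

import torch
import torch.nn as nn

class AttentionGuidedInteraction(nn.Module):
    # Computes an attention map from both feature maps and uses it to
    # gate the features exchanged between the two paths.
    def __init__(self, channels: int):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Conv3d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f1, f2):
        a = self.attention(torch.cat([f1, f2], dim=1))
        # Each path receives the other path's features, weighted by attention.
        return f1 + a * f2, f2 + a * f1

class Siamese3DCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # A single encoder instance processes both inputs (shared weights).
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.interaction = AttentionGuidedInteraction(32)
        self.head = nn.Linear(2 * 32, 6)  # e.g., a 6D pose difference

    def forward(self, x1, x2):
        f1, f2 = self.encoder(x1), self.encoder(x2)
        f1, f2 = self.interaction(f1, f2)
        f = torch.cat([f1.mean(dim=(2, 3, 4)), f2.mean(dim=(2, 3, 4))], dim=1)
        return self.head(f)

Using one shared encoder corresponds to full parameter sharing; instantiating a separate encoder per path yields the individually learned variant, the degree of sharing being the design choice under study.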

Recurrent-Convolutional Models. Spatio-temporal processing can be performed using convolutions for both the spatial and the temporal data dimensions. Another class of methods suitable for temporal processing is gated recurrent neural networks [200]. Here, temporal dependencies are learned in a recurrent fashion, where relevant information within the sequence is preserved through gating and a hidden state. Previous methods have used CNNs to extract a feature vector for each image in a spatio-temporal sequence, which is then processed by a recurrent model [108]. We extend this approach by using convolutional gated recurrent units (cGRUs) instead, followed by a CNN. Thus, instead of aggregating information from an abstract feature vector, we fuse local information in the initial spatio-temporal sequence while preserving the spatial data structure. We successfully apply this approach to 2D, 3D, and 4D deep learning problems, showing promising results. Based on this idea, we also propose an architecture with cGRU units between encoder and decoder for 4D segmentation problems. Here, we likewise aggregate temporal information using cGRUs while preserving the spatial context for decoding into a segmentation map.
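
As an illustration, a convolutional GRU cell for 2D images can be sketched as follows. The gate equations follow the standard GRU formulation [200], with convolutions in place of matrix multiplications; layer sizes and names are illustrative.

import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    # GRU cell whose gates are computed by convolutions, so the hidden
    # state keeps the spatial layout of the input.
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)
        self.hid_ch = hid_ch

    def forward(self, x, h=None):
        if h is None:
            h = torch.zeros(x.size(0), self.hid_ch, x.size(2), x.size(3),
                            device=x.device)
        # Update gate z and reset gate r, computed by a single convolution.
        zr = torch.sigmoid(self.gates(torch.cat([x, h], dim=1)))
        z, r = torch.chunk(zr, 2, dim=1)
        # Candidate state from the input and the reset-gated previous state.
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

The cell is unrolled over the sequence, for example h = cell(x[:, t], h) for each time step t, and the final hidden state, which retains its spatial layout, is then processed by a CNN. For 3D and 4D data, the 2D convolutions are replaced by their 3D counterparts.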

A selection of our proposed 4D deep learning methods is shown in Figure 1.5.

Fig. 1.5: Overview of our central method contributions to 4D deep learning. Top: several 4D deep learning architectures that we propose for regression problems, including 4D CNNs (RN4D, facRN4D) and recurrent-convolutional models (RN3D-GRU, cGRU-RN3D). Bottom: our cGRU-RN3D-U architecture for segmentation problems with 4D input data.

1.4.2 Multi-Dimensional Deep Learning Problems

All our adapted and proposed deep learning methods are tied to one or more biomedical applications. For each application, we study the effects of using data representations with different dimensionality. In the following, we briefly describe the individual problems and our respective insights, followed by our generalized insights.

OCT Fiber-Based Force Estimation. Precise placement of needles is a challenge in several clinical applications, such as brachytherapy or biopsy. Forces acting at the needle cause tissue deformation and needle deflection, which in turn may lead to misplacement or injury. Hence, many approaches for estimating the forces at the needle have been proposed. However, integrating sensors into the needle tip is challenging, and careful calibration is required to obtain accurate force estimates. For this purpose, we propose a fiber-optic needle tip force sensor design that uses a single OCT fiber for measurement.

The fiber images the deformation of an epoxy layer placed below the needle tip, which results in a stream of 1D depth profiles. We study different deep learning approaches to facilitate calibration between this spatio-temporal image data and the related forces. For this application, we apply 1D and 2D CNNs, as well as recurrent-convolutional models, finding that the latter are most effective for the problem.

OCT-Based Tissue Classification. A common tissue classification task is the segmentation of different layers in the human retina. Diseases such as diabetic retinopathy lead to the accumulation of fluids between retina layers, requiring continuous monitoring of the retinal layer structure. Most approaches process 2D cross-sectional images with custom, hand-crafted CNN architectures [416]. We investigate whether improved 2D CNN architectures can be found with the concept of neural architecture search. As this method is computationally costly, we study whether searching for architectures in the space of 1D CNN models using depth profiles is effective. We demonstrate that architectures found on 1D data transfer well to higher-dimensional 2D data.
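
Conceptually, transferring a searched architecture from 1D to 2D only requires re-instantiating each searched operation with its higher-dimensional counterpart. A minimal sketch with a hypothetical genotype encoding (the actual search space in this thesis is richer):

import torch.nn as nn

# A searched architecture, abstracted as a list of (operation, channels).
GENOTYPE = [("conv3", 16), ("conv5", 32), ("conv3", 64)]

def build_cnn(genotype, dim: int = 1, in_ch: int = 1) -> nn.Sequential:
    # Instantiate the same genotype as a 1D or a 2D CNN. The search runs
    # cheaply on 1D depth profiles (dim=1); setting dim=2 re-instantiates
    # every operation with its 2D counterpart for B-scan processing.
    Conv = {1: nn.Conv1d, 2: nn.Conv2d}[dim]
    layers, ch = [], in_ch
    for op, out_ch in genotype:
        k = {"conv3": 3, "conv5": 5}[op]
        layers += [Conv(ch, out_ch, kernel_size=k, padding=k // 2), nn.ReLU()]
        ch = out_ch
    return nn.Sequential(*layers)

model_1d = build_cnn(GENOTYPE, dim=1)  # used during the search
model_2d = build_cnn(GENOTYPE, dim=2)  # transferred to 2D cross sections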

Similar to retina imaging, OCT tissue classification in coronary arteries is primarily performed using 2D cross-sectional images and 2D CNNs. Here, the goal is to detect plaque deposits within the arterial walls in order to guide treatment decisions for preventing stenosis or rupture of vulnerable plaques. We consider both 2D Cartesian and 2D polar data representations for processing with a 2D CNN. Furthermore, we extend the 2D problem to 2.5D by combining the two data representations, which are processed jointly by one of our 2.5D Siamese CNN architectures. We find that Cartesian data representations appear to be preferable when paired with representation-specific data augmentation. Moreover, we show that combining the two data representations leads to improved performance.
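
The two representations are related by a polar resampling. A minimal sketch of the conversion, with illustrative sizes and normalization rather than the exact preprocessing used in this thesis:

import torch
import torch.nn.functional as F

def polar_to_cartesian(polar: torch.Tensor, size: int = 256) -> torch.Tensor:
    # polar: (B, C, n_r, n_theta) image; rows index the radius, columns
    # the angle. Each Cartesian pixel (x, y) is mapped to (r, theta) and
    # sampled from the polar image with bilinear interpolation.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, size, device=polar.device),
        torch.linspace(-1, 1, size, device=polar.device),
        indexing="ij",
    )
    r = torch.sqrt(xs ** 2 + ys ** 2)        # radius, 0 at the catheter center
    theta = torch.atan2(ys, xs) / torch.pi   # angle, normalized to [-1, 1]
    # grid_sample expects coordinates in [-1, 1];
    # grid[..., 0] indexes the width (theta), grid[..., 1] the height (r).
    grid = torch.stack([theta, 2 * r - 1], dim=-1).unsqueeze(0)
    grid = grid.expand(polar.size(0), -1, -1, -1)
    return F.grid_sample(polar, grid, align_corners=True)

Pixels with r > 1 fall outside the polar image and are zero-padded by default. In the 2.5D setting, the polar image and its Cartesian counterpart are then passed as the two inputs of a Siamese architecture.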

MRI-Based Left Ventricle Quantification. Another 2D learning problem on tissue data is left ventricle quantification, where clinically relevant parameters are extracted from 2D cardiac MR images. Although the entire relevant anatomical structure is available in a single frame, neighboring temporal frames within the cardiac cycle might allow for more consistent estimates. In this context, we study the use of 2D spatial and 3D spatio-temporal CNNs. In particular, we employ our multi-dimensional transfer learning technique for immediate transfer of architectures to higher dimensions. Furthermore, we propose a segmentation-based regularization scheme to improve geometric left ventricle parameter estimation.
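
The regularization scheme can be summarized as a multi-task objective in which an auxiliary segmentation output constrains the regression features. A hedged sketch, with illustrative loss choices and weighting rather than the exact formulation from this thesis:

import torch.nn as nn

class RegularizedLVLoss(nn.Module):
    # The model predicts both the geometric LV parameters and an auxiliary
    # segmentation map; the segmentation loss acts as a regularizer that
    # encourages features consistent with the underlying anatomy.
    def __init__(self, weight: float = 0.5):
        super().__init__()
        self.regression = nn.MSELoss()
        self.segmentation = nn.CrossEntropyLoss()
        self.weight = weight

    def forward(self, params_pred, params_true, seg_logits, seg_true):
        return (self.regression(params_pred, params_true)
                + self.weight * self.segmentation(seg_logits, seg_true))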

OCT-Based Pose and Motion Estimation. Another deep learning problem that is often addressed with 2D images and 2D projections is pose estimation, where the goal is to derive an object’s pose from one or several images. We extend this problem to 3D using spatial 3D OCT volumes and several new 3D CNNs. We find that full volumetric information is more beneficial than 2D projections that merely encode 3D space. For tasks such as tracking or motion compensation, entire motion vectors need to be estimated instead of individual poses. First, we address this problem in 3.5D with two 3D OCT volumes processed by our Siamese CNN models. Second, we extend this approach to a full 4D problem, employing our proposed 4D CNN architectures.
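
Dense 4D convolutions are computationally expensive, which is why several of our 4D variants process the spatial and temporal dimensions separately. The following sketch factorizes a 4D convolution into a spatial 3D and a temporal 1D convolution, in the spirit of facRN4D; the module name and sizes are ours, for illustration only.

import torch
import torch.nn as nn

class FactorizedConv4d(nn.Module):
    # Approximates a 4D convolution on input of shape (B, C, T, D, H, W):
    # the spatial convolution is applied per time step, the temporal
    # convolution per voxel, which is far cheaper than a dense 4D kernel.
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.spatial = nn.Conv3d(in_ch, out_ch, k, padding=k // 2)
        self.temporal = nn.Conv1d(out_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        b, c, t, d, h, w = x.shape
        # Spatial 3D convolution, applied independently per time step.
        y = self.spatial(x.transpose(1, 2).reshape(b * t, c, d, h, w))
        c2 = y.size(1)
        y = y.reshape(b, t, c2, d, h, w)
        # Temporal 1D convolution, applied independently per voxel.
        y = y.permute(0, 3, 4, 5, 2, 1).reshape(b * d * h * w, c2, t)
        y = self.temporal(y)
        return y.reshape(b, d, h, w, c2, t).permute(0, 4, 5, 1, 2, 3)

Compared to a dense 4D kernel with k^4 weights per channel pair, the factorized block only requires k^3 + k, analogous to (2+1)D factorization for video data.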

OCT Volume-Based Force Estimation. Besides pose and motion estimation, force estimation is an important task for computer- and robot-assisted interventions. In contrast to needle-based force measurement, volume-based force estimation is performed with an external imaging modality. Here, we investigate the use of OCT as an imaging modality.

Similar to OCT-based pose estimation, we first demonstrate that 3D volumetric data representations are preferable over 2D projections for deep learning-based estimation. Second, we extend this problem to full 4D spatio-temporal deep learning, finding that the use of 4D data is preferable. Moreover, we find that encoding lower-dimensional data representations in a higher-dimensional space improves force estimation performance, which indicates that higher-dimensional processing might often be preferable.

MRI-Based Multiple Sclerosis Lesion Activity Segmentation. 4D data is also relevant for the problem of longitudinal tracking of disease progression. We address this problem in the context of lesion activity segmentation, where the change in brain lesions between two MRI scans needs to be detected. First, we address this problem with our 3.5D Siamese CNNs, using the two MRI scans as the model input. Here, we employ our attention-guided interaction modules for effective information exchange between the two states. Then, we extend the problem to 4D using our architecture with cGRU units between encoder and decoder, depicted in Figure 1.5. We demonstrate that attention-based interaction modules improve performance and produce interpretable attention maps. Also, we find that full 4D processing is beneficial in this application scenario.
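
To make the segmentation architecture from Figure 1.5 more tangible, the following sketch shows the encoder-cGRU-decoder pattern with a 3D variant of the convolutional GRU cell sketched in Section 1.4.1. It is a strongly simplified PyTorch illustration; the skip connections and residual blocks of the actual cGRU-RN3D-U are omitted, and all sizes are illustrative.

import torch
import torch.nn as nn

class ConvGRUCell3d(nn.Module):
    # 3D variant of the convolutional GRU cell sketched earlier.
    def __init__(self, ch: int, k: int = 3):
        super().__init__()
        self.gates = nn.Conv3d(2 * ch, 2 * ch, k, padding=k // 2)
        self.cand = nn.Conv3d(2 * ch, ch, k, padding=k // 2)

    def forward(self, x, h):
        z, r = torch.chunk(torch.sigmoid(self.gates(torch.cat([x, h], 1))), 2, 1)
        return (1 - z) * h + z * torch.tanh(self.cand(torch.cat([x, r * h], 1)))

class CGRUSegNet(nn.Module):
    # Encoder -> cGRU bottleneck -> decoder for 4D input: every time step
    # is encoded by a shared 3D encoder, the cGRU aggregates the sequence
    # while keeping spatial context, and only the final state is decoded
    # into a segmentation map.
    def __init__(self, in_ch: int = 1, ch: int = 16, n_classes: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(in_ch, ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.cgru = ConvGRUCell3d(ch)
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(ch, n_classes, 1),
        )

    def forward(self, x):                   # x: (B, C, T, D, H, W)
        h = None
        for t in range(x.size(2)):
            f = self.encoder(x[:, :, t])    # one 3D volume per time step
            if h is None:
                h = torch.zeros_like(f)
            h = self.cgru(f, h)
        return self.decoder(h)              # (B, n_classes, D, H, W)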

Across all our applications, we find that our proposed cGRU-CNN-based approaches are effective for dealing with spatio-temporal data ranging from 2D to 4D. Similarly, our architecture with a cGRU between encoder and decoder shows promising results for 4D data processing in the context of segmentation problems. For CNNs, we explore different ways of architecture design, including adaptation, custom design, and automated learning. We find that each approach is viable in a different context: learning and adaptation are more suitable for lower-dimensional data, while custom design is required for higher-dimensional data. For Siamese CNNs, we find that the appropriate extent of shared processing is application-specific and that attention modules are useful for exchanging information between two states.

Regarding the choice of data representations, we find that using higher-dimensional data is effective across all our applications. Our deep learning models effectively exploit the additional context and consistency provided by the additional data dimensions. These insights confirm and extend the current trend of processing full 3D image volumes instead of individual slices. In particular, our insights and proposed architectures for 4D deep learning appear promising for biomedical applications where high-dimensional image data is available.