analysis [10]. Another recent application is vision-based force estimation [179]. As mechatronic force sensor integration into surgical setups can be difficult, vision-based force estimation has been studied using deep learning methods [27]. Most of these applications are designed for processing a real-time stream of 2D images or 3D volumes.

Therefore, deep learning methods for computer-assisted interventions have to deal with the challenge of 2D, 3D, and 4D spatio-temporal data analysis.

In summary, deep learning has found tremendous success in medical research. There is a vast number of potential deep learning applications in the areas of medical image analysis and computer-assisted interventions. Although many applications have already been studied, numerous open image processing problems remain that could benefit from deep learning methods. Moreover, even for applications where deep learning has found initial success, significant challenges still need to be addressed.

1.3 Research Questions

There are multiple ways of grouping work on deep learning for medical image analysis, for example, by method, imaging modality, or anatomical region. Another aspect that is commonly neglected but plays a vital role in many applications is data dimensionality. Medical images are often volumetric, and many acquisition devices offer a temporal stream of images. As a result, many problems are multi-dimensional, with data representations ranging from 1D depth profiles to 4D spatio-temporal streams of volumes.

Previously, the most common problem regarding multi-dimensional data representations was the choice between 2D and 3D data processing. For visual assessment, medical images, even if volumetric, are often viewed in 2D, slice-by-slice. Most early deep learning approaches for popular imaging modalities such as MRI or CT have also used slice-by-slice processing [63, 593]. Recently, however, an increasing number of full 3D approaches have emerged with the goal of exploiting inter-slice context in full 3D volumes.

Many approaches have demonstrated that considering full 3D, inter-slice context is beneficial for CNNs [81, 141, 231, 301, 339, 376, 575].

Similarly, early machine learning approaches for OCT image analysis have processed individual 1D depth scans, usually focusing on tissue scattering patterns [134, 308].

While OCT’s light scattering is often tissue-specific [557], 1D depth scans can be ambiguous and spatial relationships cannot be captured. Therefore, in recent years, machine learning methods have been increasingly used with 2D data representations that also capture spatial context [2, 131, 171, 282, 305, 397].

Thus, data dimensionality has started to play a significant role for several medical image analysis problems. Promising results presented for 2D and 3D MRI, or 1D and 2D OCT data open up questions for other data representations that have not been addressed so far.

For example, for MRI, there are other multi-dimensional aspects that have not been addressed frequently. Cardiac MR images are typically acquired as a sequence of 2D slices, covering an entire cardiac cycle. For function assessment, left ventricle quantification is required where parameters can be directly derived from individual 2D slices or they can be estimated considering the entire 3D spatio-temporal sequence [561]. This leads to the question of whether temporal context could allow for more consistent estimates.

[Figure 1.3, panels: 1D Depth Profile (axes: depth, intensity); 2D Spatio-Temporal Data (axes: depth, time); 2D Spatial Data (axes: depth, lateral spatial dimension); 3D Spatial Data; 4D Spatio-Temporal Data Representation; Full 4D Spatio-Temporal Data. Labels: Surface t_0, ..., Surface t_i, ..., Surface t_{n-1}; Volume t_0, ..., Volume t_i, ..., Volume t_{n-1}.]

Fig. 1.3: Example for several OCT data representations, including a 1D depth profile, a 2D spatio-temporal series of depth profiles, a 2D cross-sectional image, a full 3D volume, and two 4D spatio-temporal data representations, shown as overlaid image volume renderings.

Another open problem is longitudinal image analysis for tracking disease progression in the context of multiple sclerosis lesion activity [50]. Here, two 3D volumes or an entire 4D sequence of MR volumes needs to be analyzed to derive changes in the brain.

Deep learning with 4D data has found few applications, and it is still largely considered an open research problem [88]. Thus, there is a need for more extensive analysis of different MRI data representations and their value for deep learning problems.

The imaging modality OCT also comes with many different multi-dimensional data representations, see Figure 1.3. Fundamentally, OCT images consist of 1D depth profiles. Thus, in the context of tissue analysis, early approaches analyzed 1D intensity images to find patterns for tissue identification [496]. By acquiring multiple 1D depth profiles at neighboring locations, 2D images can be constructed, which are used in clinical practice for assessment of tissue layers [486]. Following the idea from CT and MRI applications that higher-dimensional context might be useful, 2D deep learning techniques have emerged for retina images [416] and intravascular images [3]. OCT can also be employed in needle insertion scenarios, where time series of 1D depth profiles are acquired. While individual 1D profiles have been processed with deep learning models [368], it is still unclear whether temporal context and thus 2D spatio-temporal data is useful. With more advanced scanning procedures, 3D image volumes can be constructed. While finding application in intraoperative imaging [76], 3D deep learning-based processing has rarely been used. Thus, multi-dimensional OCT deep learning problems with an intraoperative context, including pose and motion estimation [276], as well as force estimation [367], remain open problems. Advancements in MHz OCT devices have even enabled 4D spatio-temporal data generation [537], which could provide even richer context and has not been exploited so far. As a result, there are many opportunities for the use of multi-dimensional OCT data.

[Figure 1.4, panels from lower-dimensional to higher-dimensional: 1D kernel $K \in \mathbb{R}^{k_d}$, 11 parameters; 2D kernel $K \in \mathbb{R}^{k_w \times k_d}$, 121 parameters; 3D kernel $K \in \mathbb{R}^{k_h \times k_w \times k_d}$, 1331 parameters; 4D kernel $K \in \mathbb{R}^{k_t \times k_h \times k_w \times k_d}$, 14641 parameters.]

Fig. 1.4: An illustration of the curse of dimensionality and kernel dimensions. We show kernel sizes for different data dimensions where each dimension has a size of 11, as employed in the popular architecture AlexNet [262].

Overall, there are numerous multi-dimensional deep learning problems for MRI and OCT data that lack an analysis from a multi-dimensional perspective. When addressing such a problem, there is typically a choice between different data representations that can range from 1D to 4D data. This choice is accompanied by the design of a suitable deep learning method that deals effectively with the data structure.
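To make the range of representations concrete, the following minimal sketch lists the shapes that remain when a single 4D spatio-temporal stream is reduced to the lower-dimensional representations shown in Figure 1.3. The axis names (t, h, w, d), the example sizes, and the helpers `representation_shape` and `num_elements` are illustrative assumptions and are not taken from the thesis.

```python
# Hypothetical axis sizes of one 4D stream: time, height, width, depth.
FULL_SHAPE = {"t": 8, "h": 32, "w": 32, "d": 64}

# Axes kept by each representation (the naming is an assumption for illustration).
REPRESENTATIONS = {
    "1D depth profile": ("d",),
    "2D spatio-temporal": ("t", "d"),
    "2D spatial": ("w", "d"),
    "3D spatial": ("h", "w", "d"),
    "4D spatio-temporal": ("t", "h", "w", "d"),
}


def representation_shape(axes, full_shape=FULL_SHAPE):
    """Return the array shape that remains when only `axes` are kept."""
    return tuple(full_shape[a] for a in axes)


def num_elements(shape):
    """Return the number of scalar values in an array of the given shape."""
    n = 1
    for size in shape:
        n *= size
    return n


for name, axes in REPRESENTATIONS.items():
    shape = representation_shape(axes)
    print(f"{name:20s} shape={shape} elements={num_elements(shape)}")
```

Under these assumed sizes, the input grows from 64 values for a single depth profile to 524288 values for the full 4D sample, which illustrates why the choice of representation affects both the available context and the processing cost.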

Historically, deep learning model design for medical image analysis has been heavily influenced by methods proposed in the natural image domain, where deep learning for images originally emerged [301]. This becomes evident in the extensive use of 2D CNNs that were originally designed for 2D natural images [451]. This transfer has been largely successful for many medical image analysis problems [301, 447]. Thus, from a multi-dimensional perspective, the question is how to extend common 2D deep learning methods to other data representations.

Moving to lower data dimensions, 1D data does not offer much context, but it comes with the advantage of being computationally cheap. Thus, 1D data is interesting in terms of real-time applications, for example, for computer-assisted interventions. Furthermore, deep learning methods that are computationally expensive otherwise might be easy to employ on 1D data.

Moving towards higher-dimensional deep learning models is particularly challenging as the curse of dimensionality becomes a significant problem. For a CNN, the number of trainable parameters per kernel increases exponentially with the number of data dimensions, which leads to a severe risk of overfitting due to overparameterization, see Figure 1.4. Therefore, model design for higher-dimensional data requires a particular focus on efficiency in terms of the number of trainable parameters. At the same time, computational resources and memory usage become critical as processing high-dimensional data is very cost-intensive. Thus, higher-dimensional deep learning model design is complicated, which might be one of the reasons why it has not gained more traction so far, despite promising results being reported [231]. For example, 3D CNN design has been previously referred to as "a nightmare" [327]. Overall, high-dimensional deep learning model design remains a challenging problem for medical image analysis.
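The parameter counts in Figure 1.4 follow from simple arithmetic: a kernel with size k along each of n dimensions has k^n weights. The sketch below reproduces these numbers for k = 11. The function name `kernel_parameters` and the optional channel arguments are assumptions added for illustration, and bias terms are ignored.

```python
def kernel_parameters(kernel_size, num_dims, in_channels=1, out_channels=1):
    """Number of weights in one convolution layer with a hypercubic kernel.

    Bias terms are ignored. The channel arguments default to 1 so that the
    result is the size of a single kernel, as in Figure 1.4.
    """
    return (kernel_size ** num_dims) * in_channels * out_channels


for num_dims in range(1, 5):
    count = kernel_parameters(11, num_dims)
    print(f"{num_dims}D kernel, size 11 per dimension: {count} parameters")
```

Each additional dimension multiplies the single-kernel count by the kernel size, so moving from a 2D to a 4D kernel of size 11 increases the number of weights by a factor of 121. In a full layer this factor is further multiplied by the number of input and output channels.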

In summary, multi- and high-dimensional data are a promising opportunity to make use of relevant context. However, the optimal choice of data representations is often unclear, and deep learning model design is challenging, in particular for high-dimensional data. This results in two principal research questions addressed in this thesis:

1. Which data representations should be used for deep learning-based multi-dimensional medical image processing?

2. Which deep learning methods can be designed and used for processing specific data representations?

These research questions are fundamental and broad in nature, requiring an analysis of different applications and imaging modalities to obtain general insights.

Therefore, throughout this dissertation, we address these two research questions for the imaging modalities OCT and MRI in the context of several different applications. We propose multiple novel deep learning approaches and architectures for multi-dimensional image data. Our methods undergo extensive empirical evaluation in different application scenarios for addressing the two research questions in an application-specific context and for gaining a broader understanding and generalizable insights on multi-dimensional deep learning.