Deep Learning with other Multi-Dimensional Problems

While we cover a large number of OCT and MRI applications in the context of multi-dimensional deep learning, there are further problems for these two modalities with relevant deep learning approaches to consider. Also, other imaging modalities such as CT and US pose several multi-dimensional problems that share similarities with those of MRI and OCT.

For OCT, another important multi-dimensional problem is OCT angiography (OCTA). This imaging modality is used to visualize blood flow, with a primary application in ophthalmology. Retinal blood flow in larger vessels can be detected and quantified using the Doppler shift, i.e., the phase shift between consecutively acquired A-Scans [529]. This has been applied to the task of detecting patients with diabetic retinopathy [530].

Wang et al. observed that patients with diabetes showed lower than normal blood flow in retinal vessels [530]. This has also been extended to blood flow quantification in microvessels using high-speed OCT [225]. The authors computed decorrelation angiography from eight consecutive 2D intensity B-Scans with decomposition into four spectral bands. The decorrelated angiography frames were averaged and, based on slice-wise processing, a 3D angiography volume was obtained. A maximum intensity projection along the depth dimension provided an en face visualization of blood flow in the optic disc. Also, the authors demonstrated that blood flow calculated from the angiography images could serve as a marker for glaucoma detection. Moult et al. extended this idea to the problem of detecting AMD in patients [350]. Here, the authors also computed angiograms using the decorrelation of intensity B-Scans repeatedly acquired at the same spatial location. Thus, most conventional methods treated angiography as a 3D spatio-temporal problem by processing entire OCT volumes in a slice-wise fashion.
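
To make the decorrelation computation concrete, the following minimal NumPy sketch illustrates the general principle of an averaged decorrelation angiogram from repeated B-Scans, followed by an en face maximum intensity projection. It is not the exact pipeline of [225]; the normalization, axis conventions, and all variable names are our own assumptions.

```python
import numpy as np

def decorrelation_angiogram(bscans):
    """Average decorrelation between consecutive repeated B-Scans.

    bscans: array of shape (T, H, W) -- T intensity B-Scans acquired
    at the same spatial location (T = 8 in [225], here arbitrary).
    Returns an (H, W) angiogram where high values indicate motion (flow).
    """
    bscans = bscans.astype(np.float64)
    eps = 1e-8  # avoid division by zero in static, dark regions
    decorrs = []
    for t in range(bscans.shape[0] - 1):
        a, b = bscans[t], bscans[t + 1]
        # Decorrelation: 1 minus the normalized correlation of two frames.
        d = 1.0 - (a * b) / (0.5 * (a ** 2 + b ** 2) + eps)
        decorrs.append(d)
    return np.mean(decorrs, axis=0)

# Toy data: 64 slice locations, 8 repeated B-Scans of size 128x256 each.
volume = np.random.rand(64, 8, 128, 256)
angio = np.stack([decorrelation_angiogram(v) for v in volume])  # (64, 128, 256)
# En face view: maximum intensity projection along the depth dimension
# (here assumed to be axis 1 of the resulting angiography volume).
en_face = angio.max(axis=1)
```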

More recently, deep learning methods have been proposed in the context of OCT angiography. Guo et al. focused on the segmentation of vascular structures in en face OCT angiograms, which can provide information on the presence of diabetic retinopathy [181]. The authors avoided the spatio-temporal problem by calculating the angiography images using conventional methods. First, the superficial vascular complex in the retina was obtained from manual segmentation. Then, angiograms were calculated in the relevant areas using intensity images and decorrelation. A maximum intensity projection then provided a 2D en face view of the angiogram, which was segmented by a conventional 2D CNN. The authors employed an encoder-decoder architecture that also considers multi-scale context. The authors extended this approach by using additional inputs to the 2D CNN [182]. Besides the en face maximum intensity projection of the angiogram, the authors also used a maximum intensity projection of the normal OCT intensities and a depth image encoding the retinal thickness at each pixel location, obtained by retinal layer segmentation. Thus, the authors also considered additional spatial information in the problem but did not explicitly process the spatial depth dimension or the temporal dimension. Lauermann et al. proceeded similarly when addressing the problem of OCT angiogram quality assessment [274]. Deep learning-based temporal processing is not considered, as pre-computed angiography images are directly fed to a CNN for classification.

OCT angiography was addressed very differently by Liu et al., who tried to estimate OCT angiograms from a time series of OCT images, thus explicitly formulating a spatio-temporal deep learning problem [309]. Ground-truth angiograms were automatically obtained by using an algorithm that considered both intensity-based decorrelation and phase difference. Then, a CNN was trained, which received four B-Scans taken at the same spatial location as the input and predicted an angiogram at that location. The four time points were processed by stacking them in the CNN’s channel dimension. Interestingly, the authors report an improved signal-to-noise ratio over the method that was used for generating the ground-truth. In a follow-up study, Jiang et al. investigated several different CNN methods for OCT angiogram generation [226]. This includes single- and multi-path models, an encoder-decoder CNN, and a generative adversarial network. Also, the authors tried adding additional phase information for improved angiogram prediction. The authors found that adding phase information significantly improves several measures for image quality assessment.
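
Stacking repeated B-Scans in the channel dimension, as done by Liu et al. [309], can be sketched as follows. The tiny network is a hypothetical placeholder, not the authors’ architecture; only the input arrangement reflects the described approach.

```python
import torch
import torch.nn as nn

class AngioNet(nn.Module):
    """Minimal stand-in for a B-Scan-to-angiogram CNN (hypothetical)."""
    def __init__(self, n_frames=4):
        super().__init__()
        # Four repeated B-Scans enter as four input channels.
        self.net = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),  # predicted angiogram
        )

    def forward(self, x):
        return self.net(x)

# (batch, time=4, H, W) is treated as (batch, channels=4, H, W).
frames = torch.randn(2, 4, 128, 256)  # 2 locations, 4 time points each
angiogram = AngioNet()(frames)        # -> (2, 1, 128, 256)
```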

The problem of visualizing and assessing blood flow can also be tackled with MRI, usually using phase-contrast (PC) MRI [373] or arterial spin labeling [257]. Note that this type of imaging is particularly focused on blood flow itself, not blood flow as a surrogate for brain activity, as performed for functional MRI (fMRI). Cerebral blood flow has been visualized using MRI for generating 3D perfusion maps [104]. Flow maps were deemed useful for assessing blood flow and arterial stenosis. Blood flow measurements have also been employed for assessing cardiac function. Furthermore, Jerosch et al. demonstrated the feasibility of measuring blood flow from MR images [224]. While blood flow images can be processed and visualized in 2D or as 3D spatio-temporal images, 4D flow estimation and visualization have been shown to improve the assessment procedure [505]. One problem with perfusion maps is their low signal-to-noise ratio. Therefore, several conventional methods for image denoising have been proposed. For example, Bibic et al. employed a wavelet-domain filtering approach, demonstrating improved performance over conventional spatial denoising [48]. Liang et al. extended this approach by also employing non-local means filtering [296]. Also, a spatio-temporal approach was proposed using low-rank total variation [132]. Recently, deep learning approaches have been presented for this problem.

For example, Kim et al. took an image-to-image translation approach [248]. Here, a CNN was trained to reconstruct a high-quality perfusion image, obtained from multiple measurements, based on a lower-quality image from fewer measurements. The authors demonstrated improved image quality compared to the perfusion images obtained from fewer measurements. The authors employed a slice-wise 2D CNN for this problem. An extension was presented by Pinto et al. [381]. Here, the authors augmented a 2D CNN approach with a signal model for improved denoising performance. An approach by Xie et al. proceeded similarly with a different 2D CNN architecture [551]. Overall, MRI processing related to blood flow is largely performed on 2D slices.
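
The training setup of such an image-to-image translation approach can be sketched as simple paired regression. The model, loss, and data shapes below are illustrative assumptions, not the configuration of [248].

```python
import torch
import torch.nn as nn

# Illustrative 2D CNN mapping a low-quality to a high-quality perfusion slice.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Dummy paired data: averages over few vs. many perfusion measurements.
low = torch.randn(8, 1, 64, 64)   # noisy input from few measurements
high = torch.randn(8, 1, 64, 64)  # high-quality target from many measurements

for step in range(10):  # toy training loop
    pred = model(low)
    loss = loss_fn(pred, high)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```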

Additional multi-dimensional deep learning problems arise for the imaging modality fMRI. Here, blood flow is used to derive functional activity and to create a mapping of the brain [140]. As fMRI is able to capture functional activity, it can be used to detect and classify neurological brain disorders such as Alzheimer’s disease (AD), autism spectrum disorder (ASD), and attention deficit hyperactivity disorder (ADHD). Similar to other detection and classification problems, initial methods for these kinds of problems utilized conventional machine learning methods. Khazaee et al. constructed a connectivity matrix using a parcellation of the brain into multiple different functional regions to build a graph of brain functionality and extracted features from that graph. After feature selection, a Naive Bayes classification model was trained for differentiating healthy subjects, AD patients, and patients with mild cognitive impairment (MCI) [242]. This method was extended by using multivariate Granger causality analysis for building a directed graph representing functional brain regions [243]. Similar approaches have been pursued for the classification of ASD. For example, Iidaka et al. calculated correlation matrices from fMRI and used correlation features for training a probabilistic neural network model [211]. Correlation features were computed by taking several hundred different ROIs, averaging the spatial regions, and then calculating the correlation between the different time series. This forms a correlation matrix to be used as a feature source. Plitt et al. took a similar approach and compared multiple different conventional machine learning methods for the task [382]. For ADHD, Park et al. also created a connectivity graph for deriving features from fMRI [371]. The features were then used for classification with an SVM. Deshpande et al. employed similar methods in combination with FC-NNs [103].
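
The connectivity features underlying these conventional methods can be made concrete with a short sketch. The following NumPy example computes ROI-averaged time series from a given parcellation and their correlation matrix; the function name and the toy data are our own assumptions.

```python
import numpy as np

def roi_correlation_matrix(fmri, parcellation, n_rois):
    """Correlation matrix between ROI-averaged fMRI time series.

    fmri: (T, X, Y, Z) 4D time series.
    parcellation: (X, Y, Z) integer ROI labels in [0, n_rois).
    Returns an (n_rois, n_rois) correlation matrix.
    """
    T = fmri.shape[0]
    series = np.zeros((n_rois, T))
    for r in range(n_rois):
        mask = parcellation == r
        series[r] = fmri[:, mask].mean(axis=1)  # spatial average per ROI
    return np.corrcoef(series)  # pairwise Pearson correlation

# Toy example: 100 time points, a 16x16x16 volume, and 10 ROIs.
fmri = np.random.rand(100, 16, 16, 16)
parcellation = np.random.randint(0, 10, size=(16, 16, 16))
C = roi_correlation_matrix(fmri, parcellation, 10)
# The upper triangle serves as a feature vector for a classifier (e.g., SVM).
features = C[np.triu_indices(10, k=1)]
```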

Recently, deep learning methods have taken over for classification tasks based on fMRI data. An early deep learning method was proposed by Saraf et al. for AD classification [425]. The authors decomposed the 4D fMRI time series into 2D slices which were individually classified by a 2D CNN. Suk et al. used convolutional auto-encoders to extract discriminant features from fMRI data [471]. Then, an HMM was used to provide an estimate of AD, modeled as a hidden state. A similar approach was presented by Zeng et al. [585]. Here, the authors first computed connectivity features, which were then used with an auto-encoder to obtain a compressed latent representation. The representation was then used for classification with a linear SVM. A method by Zou et al. combined both structural and functional MRI images with a spatial 3D CNN [604]. The fMRI data was reduced to a 3D volume by computing voxel-wise features from the time series, which were stacked into the CNN’s channel dimension. A similar approach was proposed by Qureshi et al., where 4D time series were aggregated into 3D volumes by removing noisy components from the time series [392]. Li et al. took a straightforward approach for aggregating 4D fMRI into a 3D volume by calculating mean and standard deviation across the time dimension and stacking both features into the model’s channel dimension [293].
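
The temporal aggregation used by Li et al. [293] reduces to a few lines; the axis convention below is our assumption.

```python
import torch

# 4D fMRI: (batch, time, X, Y, Z). Aggregate the time dimension into
# two statistics and stack them as input channels for a 3D CNN.
fmri = torch.randn(2, 100, 32, 32, 32)
mean = fmri.mean(dim=1)              # (2, 32, 32, 32)
std = fmri.std(dim=1)                # (2, 32, 32, 32)
x = torch.stack([mean, std], dim=1)  # (2, 2, 32, 32, 32) -> channels=2
# x can now be fed to any torch.nn.Conv3d-based classification network.
```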

Dvornek et al. shifted the focus from aggregation into a spatial representation to temporal processing [117]. The authors took the time series extracted from multiple ROIs and, instead of computing correlation features as done in conventional methods, directly processed the time series using LSTMs. Recently, in a preliminary study, we proposed full 4D deep learning for fMRI-based ASD classification [43]. We employed both spatial 3D CNNs processing a volume obtained by a statistical summary and 4D CNNs, as well as our novel cGRU-CNN3D architecture. We demonstrated that full 4D spatio-temporal processing led to the best performance. Another very recent method by Mao et al. combined multiple 4D deep learning methods for ADHD classification [320]. The authors used several parallel processing paths with a 3D CNN followed by LSTMs, a full spatio-temporal 4D CNN, and a 3D CNN followed by temporal pooling. In summary, current deep learning methods for fMRI data already make use of high-dimensional data processing.
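
Direct temporal processing of ROI time series with an LSTM, in the spirit of [117], can be sketched as follows; all layer sizes and names are placeholders, not the authors’ configuration.

```python
import torch
import torch.nn as nn

class ROISeriesLSTM(nn.Module):
    """Classify a subject from ROI-averaged fMRI time series (sketch)."""
    def __init__(self, n_rois=90, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_rois, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):           # x: (batch, time, n_rois)
        _, (h_n, _) = self.lstm(x)  # final hidden state summarizes the series
        return self.head(h_n[-1])   # -> (batch, n_classes)

# 4 subjects, 120 time points, 90 ROIs -> class logits.
logits = ROISeriesLSTM()(torch.randn(4, 120, 90))
```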

The imaging modality CT is similar to MRI in terms of its data dimensionality. While image acquisition is usually performed to obtain 3D image volumes, processing is often also performed in 2D or using 2.5D projections. Also, temporal information can be involved, leading to up to 4D spatio-temporal data. Being one of the most widely used imaging modalities, CT comes with a vast amount of literature related to multi-dimensional deep learning problems. Here, we focus on the most relevant and prominent applications.

Typical tasks where CT imaging is employed include lung nodule detection, lung disease detection, and cardiac assessment. For lung nodule detection, conventional methods have relied on frameworks using an algorithm for rough nodule detection, followed by false positive reduction. For example, Murphy et al. performed initial lung nodule detection by using the shape index and curvedness features [352]. Then, two stages of k-NN classification were used to reduce the false positive rate. Features were largely shape-based, including measures for nodule size, shape, dimensions, and sphericity. Other conventional approaches were similar; for example, Messay et al. used intensity thresholding and morphological operations for segmentation and detection of nodule candidates [336]. Then, a set of shape and intensity features was used to classify nodules using a Fisher linear discriminant classifier. For the classification of various lung diseases, conventional methods relied on 2D patch-based approaches, feature extraction from patches, and conventional machine learning methods. Uppaluri et al. computed gray level co-occurrence matrices for features, employed several texture features, and also used the geometric fractal dimension as a feature for the training of a Naive Bayes classifier [499]. Song et al. extended this approach by adding Gabor filter-based LBP features and HOG features to their pool of features for classification [460]. Classification was performed with a modified k-NN approach. One problem that can be tackled by cardiac CT is the detection of calcifications within coronary arteries. A classical approach by Isgum et al. used multi-atlas segmentation, thresholding, and 3D connected components to obtain candidates for calcifications around the heart [216]. Then, intensity and texture features were used for voxel-wise classification using multiple classifiers, including k-NN and an SVM. Another approach by Xie et al. used an initial segmentation to detect the heart and coronary arteries [555]. Then, filtering and thresholding were used to detect calcifications. Classic image processing of 4D CT has mostly been studied in the context of respiratory motion for radiation therapy [535] and modeling of the heart [338].
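
The classical candidate detection step shared by several of these pipelines, i.e., thresholding, morphological cleanup, and connected component analysis, can be sketched with scipy. The threshold and size values below are arbitrary placeholders, not values from the cited works.

```python
import numpy as np
from scipy import ndimage

def detect_candidates(ct_volume, hu_threshold=-400.0, min_voxels=10):
    """Toy candidate detector: threshold, clean up, label 3D components.

    ct_volume: 3D array in Hounsfield units (HU). The threshold and
    minimum size are illustrative, not values from the cited works.
    """
    mask = ct_volume > hu_threshold       # intensity thresholding
    mask = ndimage.binary_opening(mask)   # remove small speckle
    labels, n = ndimage.label(mask)       # 3D connected components
    candidates = []
    for i in range(1, n + 1):
        component = labels == i
        if component.sum() >= min_voxels:  # size-based filtering
            candidates.append(ndimage.center_of_mass(component))
    # Centroids are passed on to feature-based false positive reduction.
    return candidates

volume = np.random.uniform(-1000, 400, size=(64, 64, 64))
print(len(detect_candidates(volume)))
```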

Similar to other medical imaging applications, deep learning methods recently gained traction for CT image processing. For lung nodule classification, Setio et al. proposed a deep learning-based system where nodule candidate detection is performed by a conventional method, followed by false positive reduction using several 2D CNNs [438]. The authors took a multi-view approach by selecting nine different planes, where each is processed by a different CNN, followed by prediction fusion. A more recent approach directly detects lung nodules from CT scans using CNNs [552]. The authors employ Faster R-CNN [400] for nodule detection in 2D CT slices. The nodule candidates are then processed by multiple 2D CNNs, each receiving a different 2D view. In another approach, full 3D volume processing was performed [597]. Here, the authors first employ an extension of the object detection framework Faster R-CNN to 3D for candidate nodule detection. Then, a second, multi-path CNN is employed for nodule classification. Finally, the features learned from the second CNN are used to train gradient boosting machines to obtain the final classification.
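
The multi-view idea of extracting several 2D planes around a 3D candidate for separate 2D CNNs can be sketched as follows; for brevity, only the three orthogonal planes are shown, while Setio et al. [438] use nine.

```python
import numpy as np

def orthogonal_views(volume, center, size=32):
    """Extract axial, coronal, and sagittal 2D patches around a candidate.

    volume: 3D CT array; center: (z, y, x) candidate coordinate.
    Returns three (size, size) patches, one per orthogonal plane.
    """
    z, y, x = center
    h = size // 2
    axial = volume[z, y - h:y + h, x - h:x + h]
    coronal = volume[z - h:z + h, y, x - h:x + h]
    sagittal = volume[z - h:z + h, y - h:y + h, x]
    return axial, coronal, sagittal

volume = np.random.rand(96, 96, 96)
views = orthogonal_views(volume, center=(48, 48, 48))
# Each view would be classified by its own 2D CNN; the per-view
# predictions are then fused, e.g., by averaging the probabilities.
```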

For lung disease classification, Anthimopoulos et al. also relied on a conventional patch extraction system. Then, patches were classified using a conventional 2D CNN [18]. In a large-scale study, Walsh et al. used pre-segmented axial 2D CT slices for classifying multiple diseases [516]. The 2D slices were processed by a standard 2D CNN. More recent methods also employed 3D CNNs with 3D crops from full CT images [383]. The authors compared several 3D CNN variations but did not perform a comparison to 2D slice-wise approaches. For coronary calcification detection, an early deep learning approach relied on patch-wise classification where patches were detected using thresholding and morphological operations [290]. Then, several 2D CNNs classified patches in terms of the presence of coronary calcifications. The different CNNs received different orthogonal 2D views as input. Lessmann et al. extended this approach by also using a 2D CNN for initial candidate selection in a full axial 2D CT slice [289]. More recent methods have also moved to full 3D CT image processing. For example, Ghanem et al. used 3D CT angiography images for detecting calcifications within a segmented coronary artery tree [169]. Very recently, deep learning-based 4D CT processing has also emerged, mostly in the context of image-to-image translation and image reconstruction. Leemput et al. derived non-contrast CT images from 4D spatio-temporal perfusion CT images using fused recurrent-convolutional networks [284]. This is motivated by reduced radiation exposure for patients being treated in the context of acute stroke. Also, Clark et al. reconstructed high-quality 4D cardiac CT images from undersampled 4D CT data using a 4D encoder-decoder CNN [92]. In summary, deep learning-based CT processing has moved towards higher-dimensional data processing over the years, progressing from slice-wise 2D processing to volumetric processing. While 4D CT is available for some problems, applications are still rare.

Given that CT images are widely employed, there are additional, more subtle method improvements that have been proposed for different CT applications. An overview of deep learning methods for CT image data is given by Litjens et al. [301] and Halder et al. [185].

US is more closely related to OCT, offering similar data representations ranging from 2D to 4D. The multi-dimensional aspect of US is mostly relevant for applications including echocardiography, disease detection and classification, and fetal US imaging. A frequent application for disease detection is breast cancer. Similar to all other previously discussed imaging modalities and applications, early methods for automated breast cancer detection and classification relied on a conventional pipeline with feature extraction and classic machine learning models [172]. Over recent years, this was replaced by deep learning methods, mostly processing 2D US images with CNNs [306]. Han et al. first demonstrated the effectiveness of CNNs for breast lesion classification [186]. The authors made use of the Inception architecture for classifying lesions based on small 2D image crops around the lesion. The authors observed significant performance improvement over the use of classic features and SVMs.

This approach was extended by Byra et al., where transfer learning from the ImageNet dataset was explored [68]. While transfer learning with CNN fine-tuning substantially improved performance, the authors also found that a strategy of converting the 2D US images to artificial color channels improved performance further. One way to incorporate higher-dimensional information into the problem is the use of shear-wave elastography. Here, temporal information in terms of velocity information can be used to derive 2D elastography maps. This 2D encoding of higher-dimensional data has also been used for cancer detection in breast US using 2D CNNs [595]. The use of full 3D US volumes is more common for other applications such as fetal US imaging. For example, Looney et al. performed placenta segmentation in 3D US volumes using a 3D CNN [313]. The authors relied on a multi-scale 3D CNN architecture that was previously employed for brain MRI. Another approach addressed the problem of abdomen segmentation in fetal US [432]. The authors performed initial segmentation using a multi-scale 3D CNN. Then, the segmentation maps were used in a traditional model-based segmentation algorithm to obtain improved segmentation borders. While 4D US can be acquired by obtaining 3D scans over time, deep learning methods explicitly processing this type of data are rarely found. Philip et al. used 4D US images of the heart; however, processing was applied to individual 3D volumes in the temporal sequence [379]. Thus, there are few US applications moving towards higher-dimensional data, and most methods still rely on 2D slice-wise processing.

Overall, US is widely applied in clinical practice due to its safe, cheap, and fast imaging capabilities, resulting in a large body of applications where deep learning can be helpful. A more extensive overview of deep learning-based US applications is given by Liu et al. [306] and Huang et al. [209]. A survey with a focus on 3D US was conducted by Kozegar et al. [259].

Across multiple imaging modalities, there are a variety of deep learning applications with relation to multi-dimensional data. For OCT, angiography involves temporal context, which was explicitly considered in very recent work. For MRI and CT, the advantage of 3D volumetric over 2D slice-wise processing has become very evident over the last few years. MRI applications involving blood flow and brain function have also made steps towards 4D image processing with deep learning methods very recently. A similar trend can be observed for CT; however, 4D applications for this modality are focused on reconstruction and image-to-image translation. US deep learning applications show similar trends; however, 4D deep learning is rarely found for this imaging modality so far.
