MRI-Based Left Ventricle Quantification - Deep Learning with Multi-Dimensional Medical Image Data

5.4 MRI-Based Left Ventricle Quantification

Left ventricle (LV) quantification from medical image data is often employed for the assessment of cardiac function and for diagnosing diseases [234]. Relevant LV indices include the myocardium and cavity areas, three LV cavity dimensions, six regional wall thickness (RWT) parameters, and the cardiac phase (systole or diastole). In clinical practice, additional measures such as the ejection fraction are often calculated. Different imaging modalities can be used for quantification, including echocardiography, cine MRI, and cardiac CT. While echocardiography is frequently employed, it is limited in terms of image quality, and the result is operator-dependent [385]. Currently, cardiac MRI is considered the gold standard for LV quantification, and cardiac CT is considered a viable alternative [253]. Independent of the imaging modality, LV indices are usually obtained by manual segmentation of the myocardium, which is time-consuming and associated with high intra- and inter-observer variability [469]. The problem is also challenging due to the high variability of cardiac structure between patients and the deformation during the cardiac cycle. Therefore, many automatic LV segmentation and quantification methods have been proposed.

In terms of data dimensionality, LV quantification can be considered a 2D image processing problem if individual image frames within the cardiac cycle are processed. Since function assessment requires observation across the entire cardiac cycle, the problem can also be considered 3D spatio-temporal. The problem can be extended to full 4D spatio-temporal data when using full volumetric data, which has been shown to be advantageous for echocardiography and leads to better volume estimates [269]. Similarly, 4D cardiac MRI has found clinical application [300].
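To make two of the quantities named at the start of this section concrete, the following minimal Python sketch derives the cavity and myocardium areas from a label mask by counting pixels and evaluates the standard definition of the ejection fraction, EF = (EDV - ESV) / EDV. The label convention (1 = cavity, 2 = myocardium), the function names, and the pixel spacing argument are assumptions made for illustration and are not taken from the cited works.

```python
import numpy as np


def lv_areas_mm2(label_mask, pixel_spacing_mm):
    """Return (cavity_area, myocardium_area) in mm^2 for a 2D label mask.

    Assumed (hypothetical) label convention:
    0 = background, 1 = LV cavity, 2 = myocardium.
    """
    # Physical area covered by a single pixel.
    pixel_area = pixel_spacing_mm[0] * pixel_spacing_mm[1]
    cavity_area = np.count_nonzero(label_mask == 1) * pixel_area
    myocardium_area = np.count_nonzero(label_mask == 2) * pixel_area
    return cavity_area, myocardium_area


def ejection_fraction(end_diastolic_volume, end_systolic_volume):
    """Fraction of the end-diastolic volume that is ejected per beat."""
    return (end_diastolic_volume - end_systolic_volume) / end_diastolic_volume


# Toy example: a 2x2 cavity surrounded by a one-pixel myocardium ring.
toy_mask = np.zeros((6, 6), dtype=int)
toy_mask[1:5, 1:5] = 2
toy_mask[2:4, 2:4] = 1
cavity, myocardium = lv_areas_mm2(toy_mask, pixel_spacing_mm=(1.5, 1.5))
```

The remaining indices (cavity dimensions and regional wall thicknesses) require geometric measurements along specific directions and are not covered by this sketch.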

For echocardiography, early LV segmentation methods were semi-automatic and obtained a model of the left ventricle that is deformed over time [152]. Similarly, Caiani et al. proposed a method where initial, manual markings of the endocardial border are tracked over time [70]. Another method models the LV as a cubic Hermite spline [453]. To account for deformation over time, the model can be translated, rotated, and scaled, and each control point can be varied to enable local deformation. A Kalman filter is used for tracking the LV deformation over time. More recently, deep learning approaches have been presented for LV segmentation from echocardiography images.
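The tracking idea can be illustrated with a generic linear Kalman filter. The sketch below tracks a single scalar shape parameter (for instance, a global scale factor) under a constant-velocity motion model. The state layout, the noise variances, and the function name are assumptions for illustration; the sketch does not reproduce the model of [453], whose state describes the deformation of the spline model.

```python
import numpy as np


def kalman_track(measurements, dt=1.0, process_var=1e-3, meas_var=1e-1):
    """Filter a sequence of noisy scalar measurements.

    State: [value, velocity]; only the value is observed.
    All parameters are illustrative assumptions.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity transition
    H = np.array([[1.0, 0.0]])             # observation picks the value
    Q = process_var * np.eye(2)            # process noise covariance
    R = np.array([[meas_var]])             # measurement noise covariance

    x = np.array([float(measurements[0]), 0.0])  # initial state estimate
    P = np.eye(2)                                 # initial state covariance
    estimates = []
    for z in measurements:
        # Prediction step: propagate state and covariance in time.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: correct the prediction with the new measurement.
        innovation = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innovation
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0])
    return np.array(estimates)
```

In a full tracking method, the same predict and update steps would be applied to a larger state vector, with the measurement derived from edge detections in each frame.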

For example, Chen et al. address the problem using fully convolutional 2D CNNs [82]. Specifically, they try to overcome the general problem of noisy borders and artifacts in ultrasound image segmentation with a cross-domain approach. The authors hypothesize that a CNN that solves multiple ultrasound segmentation problems simultaneously should generalize better. The authors’ CNN consists of a domain-independent, initial processing path, which is split into different paths for the different segmentation problems. The authors demonstrate that joint learning improves LV segmentation performance. Smistad et al. also investigate 2D CNNs for LV segmentation [455]. The authors try to tackle the problem of limited annotated data availability with a student-teacher approach. First, the authors obtain automated segmentations with a basic Kalman-based method, proposed in [453], for a larger dataset including images without expert annotations. The CNN is trained using these annotations and compared to the Kalman method, showing improved performance in terms of the Hausdorff distance.
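For reference, the Hausdorff distance between two contours is the largest distance from any point on one contour to its nearest point on the other, taken symmetrically. The following sketch computes it for two contours given as point lists; the function name and the brute-force pairwise formulation are choices made for illustration and say nothing about how the cited study evaluated it.

```python
import numpy as np


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two point sets of shape (N, 2) and (M, 2)."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Directed distances: worst-case nearest-neighbor distance in each direction.
    a_to_b = dists.min(axis=1).max()
    b_to_a = dists.min(axis=0).max()
    return max(a_to_b, b_to_a)
```

Because the measure is driven by the single worst-matching point, it is sensitive to local contour errors that an overlap measure may hide.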

Oktay et al. propose anatomically constrained neural networks for several tasks, including LV segmentation from echocardiography images [363]. Here, the idea is to incorporate anatomical priors such as shape into the learning process. First, an autoencoder is trained to learn a lower-dimensional representation of the label map. An additional CNN path ensures that the representation can be predicted from the original intensity images. Then, during training of the segmentation network, predicted segmentation masks are again encoded by the fixed encoder part of the autoencoder. The calculated representation is compared to a representation of the ground-truth mask using the L2 loss. This loss is also used for training the segmentation network. The authors found that this strategy improves LV segmentation performance. Jafari et al. focused on the temporal aspect of echocardiography data by employing a 3D spatio-temporal segmentation method [220]. The authors design a convolutional and recurrent model that first encodes individual frames into a lower-dimensional representation. Then, a bidirectional convolutional LSTM performs temporal processing across frames. Finally, a decoder outputs individual segmentation masks for each image. The authors also consider temporal information by including an additional input path that processes pre-computed optical flow images between frames. In another work, Smistad et al. performed left ventricle segmentation using a fully convolutional architecture while focusing on the aspect of real-time processing [454]. The CNN processed the 2D images frame-by-frame. Recently, Azarmehr et al. compared several different CNNs for segmenting the LV from echocardiography images [30]. The authors found that the original U-Net architecture [409] without any modifications outperformed other newly proposed methods.
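The shape-prior loss described above for the anatomically constrained approach can be sketched as follows. The frozen encoder is replaced here by a fixed random linear map, which is purely a stand-in; the mask size, latent dimension, loss weight, and all function names are hypothetical and not taken from [363].

```python
import numpy as np

rng = np.random.default_rng(0)
MASK_SHAPE = (16, 16)   # hypothetical mask size
LATENT_DIM = 8          # hypothetical size of the learned representation

# Stand-in for the frozen encoder of a pretrained autoencoder.
ENCODER_WEIGHTS = rng.normal(size=(LATENT_DIM, MASK_SHAPE[0] * MASK_SHAPE[1])) / 16.0


def encode(mask):
    """Map a mask to its low-dimensional representation (fixed weights)."""
    return ENCODER_WEIGHTS @ np.asarray(mask, dtype=float).reshape(-1)


def shape_prior_loss(pred_mask, gt_mask):
    """Squared L2 distance between the encoded prediction and encoded ground truth."""
    diff = encode(pred_mask) - encode(gt_mask)
    return float(np.sum(diff ** 2))


def binary_cross_entropy(pred_mask, gt_mask, eps=1e-7):
    """Pixel-wise segmentation loss used alongside the shape term."""
    p = np.clip(np.asarray(pred_mask, dtype=float), eps, 1.0 - eps)
    g = np.asarray(gt_mask, dtype=float)
    return float(-np.mean(g * np.log(p) + (1.0 - g) * np.log(1.0 - p)))


def total_loss(pred_mask, gt_mask, shape_weight=0.01):
    """Segmentation loss plus the weighted shape-prior term (weight is an assumption)."""
    return binary_cross_entropy(pred_mask, gt_mask) + shape_weight * shape_prior_loss(pred_mask, gt_mask)
```

The point of the construction is that the shape term penalizes differences in the learned representation rather than in individual pixels, so implausible shapes can be penalized even when the pixel-wise error is small.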

For cardiac MRI, initial LV segmentation and quantification methods relied on conventional image processing techniques such as deformable templates, active contour models, and level sets. Lee et al. proposed a segmentation method where region growing with iterative thresholding was employed for segmentation of the LV endocardium, followed by segmenting the epicardium using an active contour model [281]. Paragios employed a level set method for LV segmentation where local and global constraints, as well as temporal consistency, are introduced to the problem [370]. Kaus et al. built a deformable model for LV segmentation using prior knowledge from an annotated dataset [238]. Multiple similar approaches have been introduced, which are reviewed and discussed by Ngo et al. [358]. Ngo et al. also proposed a method where a traditional level set method is fused with a deep learning approach.

In recent years, deep learning methods have been employed frequently for MRI-based LV segmentation and quantification. For obtaining LV indices, one approach is to segment the myocardium with a CNN and calculate relevant metrics afterward. For example, Avendi et al. combined several 2D CNNs and deformable models [26]. The first step of their method is to detect an ROI around the heart, which is extracted from the entire cardiac MR image using a CNN. Next, the authors train a segmentation model to obtain the general shape of the myocardium. Then, an energy minimization strategy is used to adjust the predicted segmentation’s contour for improved performance. Finally, the LV area is derived from the obtained segmentation mask. Note that the method did not perform full LV quantification by calculating all relevant indices. Yang et al. also employ a two-step approach where a CNN first detects a bounding box around the heart [566]. Then, a U-Net model performs segmentation of the LV cavity. The authors also compare several different U-Net variations. Romaguera et al. directly segment the LV cavity from MR images using a fully convolutional CNN, without a distinct decoder in their model [407]. They compare to several classic approaches for LV segmentation and present a slight performance improvement. Another approach performs LV segmentation in polar instead of Cartesian image space [480, 481]. First, the authors use a CNN to locate the center of the LV cavity. Then, the image is transformed into polar space, and a second CNN predicts the inner and outer border of the myocardium. Poudel et al. considered the 3D spatio-temporal learning problem by also considering neighboring slices [384]. Similar to the work by Jafari et al. for echocardiography [220], an encoder-decoder architecture, augmented by recurrent units, is employed. The authors demonstrate performance improvements over classic approaches and a slice-wise processing method.
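The polar-space idea mentioned above can be illustrated with a simple resampling routine. Given a center point, the image is sampled along rays so that each row of the output corresponds to one radius and each column to one angle. The nearest-neighbor sampling, the grid resolution, and the function name are assumptions for illustration and are not taken from [480, 481].

```python
import numpy as np


def to_polar(image, center, n_radii, n_angles):
    """Resample a 2D image onto a (radius, angle) grid around `center` = (row, col)."""
    cy, cx = center
    radii = np.arange(n_radii)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    # Cartesian coordinates of every (radius, angle) sample, rounded to the nearest pixel.
    rows = np.rint(cy + rr * np.sin(aa)).astype(int)
    cols = np.rint(cx + rr * np.cos(aa)).astype(int)
    # Samples that fall outside the image are clamped to the border.
    rows = np.clip(rows, 0, image.shape[0] - 1)
    cols = np.clip(cols, 0, image.shape[1] - 1)
    return image[rows, cols]
```

With the LV center as origin, the roughly ring-shaped myocardium becomes a roughly horizontal band, so the inner and outer borders can each be described by one radius per angle.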

Mortazi et al. performed segmentation of multiple cardiac structures, including the LV, using multiple planar views of the heart [349]. The authors also compared MRI- and CT-based segmentation, finding similar performance. A very different approach was pursued by Mo et al., where an agent traverses the MR image to build up the contour of the LV cavity [345]. In an iterative process, patches are sampled from the MR image, which are processed by a CNN that predicts a velocity vector. This vector is used to sample the next patch for processing. Using the Poincaré map, the authors develop a stopping criterion for the trajectory generation process. The authors demonstrate performance improvement over conventional segmentation methods. A recent approach by Hu et al. combined deep learning with a conventional contour optimization algorithm based on dynamic programming [204]. First, a CNN predicts the rough contour of the endocardial and epicardial border. Then, several refinement steps follow to adjust the two borders. The authors also rely on a polar image representation for their post-processing procedure. A recent study by Tao et al. investigated the performance of CNN-based LV segmentation and LV indices calculation when using training data from different devices and different centers [483]. In general, performance improved significantly when using a multi-center and multi-device dataset for training.

As an alternative to LV segmentation and subsequent calculation of relevant indices, the values can be directly regressed from the 2D MR slices. This was proposed by Xu et al. [562] and found wider application afterward. Here, the authors employed a 2D autoencoder CNN for the extraction of relevant features. On top of the network, the authors stacked a small CNN for indices regression from the output of the autoencoder. The authors compared their method to conventional feature extraction paired with random forests for direct indices regression, demonstrating a performance improvement. In an extension of their work, Xu et al. also considered temporal context by using a CNN as a feature extractor for individual slices, which was followed by a recurrent neural network for aggregating temporal context [561, 563]. An additional auxiliary output was used to also predict the current cardiac phase. As a total of eleven LV indices have to be predicted for full quantification, the authors proposed a multi-task relationship learning scheme for capturing the correlation and dependence of the different indices.

A similar approach was pursued by Li et al., where a multi-task relationship loss was employed [292]. Here, the authors did not explicitly incorporate temporal information and determined the cardiac cycle based on a polynomial fit to the predicted cavity area size. Jang et al. proposed to perform direct indices regression using 2D and 3D spatio-temporal CNNs [222]. In particular, the authors also design a CNN that uses alternating 2D and 3D convolutions for more efficient processing. They find a slight performance improvement when employing CNNs with 3D convolutions.
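The polynomial-fit idea attributed to Li et al. above can be sketched as follows: a polynomial is fitted to the predicted cavity area over the frames of one cardiac cycle, and the sign of its derivative indicates whether the cavity is shrinking (systole) or growing (diastole). The polynomial degree, the derivative-sign rule, and the function name are assumptions made for illustration; the exact procedure of [292] is not given in this section.

```python
import numpy as np


def systole_mask_from_area(cavity_areas, degree=4):
    """Return a boolean array that is True where the fitted area curve is decreasing.

    A decreasing cavity area is interpreted as systole, an increasing one as diastole.
    """
    areas = np.asarray(cavity_areas, dtype=float)
    frames = np.arange(len(areas))
    # Smooth the noisy per-frame predictions with a least-squares polynomial fit.
    coeffs = np.polyfit(frames, areas, degree)
    # Evaluate the derivative of the fitted polynomial at every frame.
    slope = np.polyval(np.polyder(coeffs), frames)
    return slope < 0
```

Fitting a smooth curve first makes the phase estimate less sensitive to frame-wise fluctuations in the predicted area than a direct frame-to-frame difference would be.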

Other methods have combined segmentation and regression, for example, by regressing indices from a segmentation with an end-to-end model [528]. Here, the authors first pretrain a CNN for segmentation of the LV myocardium. Then, an additional CNN is attached to the model’s output, which processes the predicted segmentation maps and regresses the LV indices. The authors claim substantial improvement over previous direct regression methods. Xu et al. compared direct indices regression and calculation of indices from predicted segmentation masks [558]. The authors found that calculation from segmentation maps appears to be beneficial. When jointly predicting a segmentation map and LV indices, performance is higher than for regression only but lower than
