2.2 Spinal Metastasis Segmentation
4.1.4 Materials and Methods
In order to evaluate the proposed method, a dataset was assembled consisting of 26 follow-up MR images of patients who underwent RFAs of spinal metastases at the Department of Neuroradiology of the University Hospital of Magdeburg. These images were acquired between one and three days after the intervention and included, among others, sagittal contrast-enhanced T1-weighted and native T2-weighted MRI sequences.
Both sequences were chosen since they are the ones most commonly used by neuroradiologists for visual examination regarding treatment outcome validation in this particular case. The image data differed w.r.t. specific acquisition parameters and scanner configuration. The in-plane scan resolution ranged across the individual patient cases and imaging sequences from 0.45 mm to 1.25 mm, and the spacing between adjacent slices ranged from 3.3 mm to 4.8 mm. The image volumes of each patient case were pre-processed by a cubic interpolation from the original number of slices (between 15 and 25) to a fixed number of 64 in order to yield an almost isotropic spatial resolution and to simplify all subsequent processing steps. An experienced neuroradiologist manually contoured each necrosis slice-wise. Thus, the input data could be used for training purposes either as individual slices or as patient-wise merged volumes.
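The slice resampling described above can be sketched as follows. This is a minimal illustration, assuming Python with NumPy and SciPy; the thesis does not state the actual implementation, and the choice of `scipy.ndimage.zoom` with a cubic spline (order 3) is an assumption.

```python
import numpy as np
from scipy.ndimage import zoom

def resample_slices(volume: np.ndarray, target_slices: int = 64) -> np.ndarray:
    """Cubically interpolate the slice axis (assumed to be the last axis)
    to a fixed number of slices, as described in the text."""
    factor = target_slices / volume.shape[-1]
    # Only the inter-slice spacing is resampled; in-plane resolution is kept.
    return zoom(volume, (1.0, 1.0, factor), order=3)

# Hypothetical sagittal volume with 20 original slices
vol = np.random.rand(256, 256, 20)
resampled = resample_slices(vol)
print(resampled.shape)  # (256, 256, 64)
```

Resampling only along the slice axis reduces the 3.3–4.8 mm slice spacing towards the in-plane resolution, which is what yields the almost isotropic voxels mentioned above.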
Data Augmentation
Due to the relatively small amount of available data, each of the 26 original MRI volumes was extensively augmented using the following techniques:
• Gaussian blur: The images were blurred with a Gaussian filter with σ in the range from 0 to 0.5.
• Gamma transformation: Gamma transformations with γ in the range from 0.5 to 2 were applied to modify image intensities.
• Mirroring: Each patient volume was flipped in all directions, including vertical, i.e. craniocaudal, flips. Even though this may appear anatomically inappropriate, preliminary studies showed it to be advantageous for the final results, since it prevented fast overfitting.
• Scaling: The image volumes were scaled with randomly chosen factors between 0.6 and 1.4.
• Rotation: Rotations were applied to the image volumes in the range of ±30° around the transversal axis and ±20° around the sagittal axis.
• Elastic deformations: Random displacement fields with subsequent Gaussian smoothing of the grid with a σ ranging between 0 and 0.3 were used to elastically deform the image volumes (cf. Ronneberger et al., 2015).
• Random cropping: Finally, each patient volume was translated in a random cropping manner within a range of ±20 voxels in the sagittal and vertical direction w.r.t. the necrosis centers m_c and subsequently cropped to patches of a fixed size of 128×128×64 voxels.
After the augmentation each image volume patch was whitened by mean subtraction and a subsequent division by the standard deviation.
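The core of the augmentation and whitening steps above can be sketched as follows. This is a simplified illustration, assuming Python with NumPy/SciPy; the axis conventions, interpolation orders, and sampling scheme are assumptions, and the elastic deformation and random cropping steps are omitted for brevity.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate, zoom

rng = np.random.default_rng(0)

def augment(vol: np.ndarray) -> np.ndarray:
    # Gaussian blur with sigma in [0, 0.5]
    vol = gaussian_filter(vol, sigma=rng.uniform(0.0, 0.5))
    # Gamma transformation with gamma in [0.5, 2]
    vol = np.clip(vol, 0, None) ** rng.uniform(0.5, 2.0)
    # Mirroring: random flips along each axis (incl. craniocaudal)
    for axis in range(vol.ndim):
        if rng.random() < 0.5:
            vol = np.flip(vol, axis=axis)
    # Scaling with a factor in [0.6, 1.4] (applied in-plane here, an assumption)
    s = rng.uniform(0.6, 1.4)
    vol = zoom(vol, (s, s, 1.0), order=1)
    # Rotation: ±30° around the transversal, ±20° around the sagittal axis
    # (which array axes correspond to which anatomical axes is an assumption)
    vol = rotate(vol, rng.uniform(-30, 30), axes=(0, 1), reshape=False, order=1)
    vol = rotate(vol, rng.uniform(-20, 20), axes=(1, 2), reshape=False, order=1)
    return vol

def whiten(patch: np.ndarray) -> np.ndarray:
    # Mean subtraction and division by the standard deviation
    return (patch - patch.mean()) / (patch.std() + 1e-8)

patch = whiten(augment(np.random.rand(128, 128, 64)))
```

In a full pipeline, the scaled and rotated volume would subsequently be cropped back to the fixed 128×128×64 patch size as described above.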
It was ensured that each patch contained at least fractions of necrotised tissue. For the purpose of a stratified cross-validation, the patient data was grouped into six folds, with a 21/5 (training/validation) split for two folds and a 22/4 split for the remaining four. Each patient volume within the training set was augmented 150 times, yielding in total 3,150 volumetric and 201,600 cross-sectional training samples for each of the two 21/5-split folds, and 3,300 and 211,200, respectively, for the remaining four folds.
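The stated sample counts follow directly from the fold sizes: 150 augmented copies per training volume, each contributing 64 cross-sectional slices:

```python
# Verify the training-sample counts stated in the text
for n_train in (21, 22):
    volumes = n_train * 150   # augmented volumetric samples per fold
    slices = volumes * 64     # cross-sectional samples per fold
    print(n_train, volumes, slices)
# 21 -> 3150 volumetric / 201600 cross-sectional samples
# 22 -> 3300 volumetric / 211200 cross-sectional samples
```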
CNN Architecture
Analogously to the data augmentation strategy implemented for the spinal metastases segmentation (see Section 2.2.4), U-net and vU-net architectures were applied to this task (see Figure 4.2). Again, a patch size of 128×128 pixels for 2D slices as U-net input and 128×128×64 voxels
Figure 4.2: The U-net architecture used for multimodal 2D image input. The vU-net architecture for 3D input was analogous, but with an additional dimension for all layers and convolutional kernels. A significant difference between the two variants was the number of trainable parameters, which was about 2.85 times higher for the vU-net variants.
for volumetric vU-net input was defined, while an additional dimension contained either one or multiple MRI sequences as channels. Depending on the implemented network variant, the convolutional layers had a kernel size of 3×3 (×3), except for the last one, which applied a 1×1 (×1) kernel to reduce the dimensionality to the desired output size. Each convolutional layer was followed by a batch normalisation. Downsampling was performed by strided convolutions (stride of 2), and simplified upsampling layers replaced the commonly used up-convolutions (Isensee et al., 2017). ReLU was implemented as the activation function of each convolutional layer, except for the last layer, where a sigmoid function was used to transform the resulting values into the range between zero and one. Again, only a single epoch was used with a number of iterations equal to the number of available training samples, the Tversky loss was implemented as the loss function, and Adam was used as the optimizer with an initial learning rate of 0.01. With regard to the limited GPU memory resources, especially when incorporating image volumes, the mini-batch sizes were defined as 2 for volumetric input of the vU-nets and 32 for 2D image slices as U-net input. Finally, a threshold of 0.5 was applied to produce binary output images.
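The Tversky loss mentioned above can be sketched as follows on the sigmoid outputs. This is a minimal NumPy illustration; the α/β weights are not specified in the text, so the values below (α = 0.3, β = 0.7, a common choice that penalises false negatives more strongly) are an assumption.

```python
import numpy as np

def tversky_loss(pred: np.ndarray, target: np.ndarray,
                 alpha: float = 0.3, beta: float = 0.7,
                 eps: float = 1e-7) -> float:
    """Tversky loss on soft (sigmoid) predictions: alpha weights false
    positives, beta weights false negatives. alpha/beta are illustrative."""
    p, t = pred.ravel(), target.ravel()
    tp = float((p * t).sum())          # soft true positives
    fp = float((p * (1 - t)).sum())    # soft false positives
    fn = float(((1 - p) * t).sum())    # soft false negatives
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

pred = np.array([0.9, 0.1, 0.8, 0.2])    # sigmoid outputs
target = np.array([1.0, 0.0, 1.0, 0.0])  # binary ground truth
loss = tversky_loss(pred, target)
```

With α = β = 0.5, the Tversky loss reduces to the familiar soft Dice loss; the asymmetric weighting is what makes it attractive for small structures such as necrosis zones, where missed foreground voxels are costly.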
Experimental Design
Similar to the experimental design described in Section 2.2.4, various input and network configurations were tested. Thus, different MRI sequences were applied either as monomodal or multimodal input to both the U-nets and the vU-nets. Given the number of available patient cases, stratified 6-fold cross-validation over disjoint subsets was performed, with either four or five patients within each validation set. The results stated in the following refer to the average over all six cross-validation folds. The problem of limited training data was even more critical for this task, as only two thirds of the patient data were available compared to the spinal metastases segmentation of Section 2.2.4. Hence, a separate test set
was not set aside, since this would have further reduced the available samples and was thus unlikely to yield promising training results. This, however, did not lead to biased results or undermine an unbiased evaluation of the trained models, since no training or design decisions were based on intermediate validation results (no look-ahead bias).
Evaluation
Reference, i.e. ground truth, segmentations of the necrosis zones were performed by a neuroradiologist using the contrast-enhanced T1-weighted and native T2-weighted MRI sequences of each patient. For this purpose, a manual segmentation framework was set up to enable slice-wise contouring of the necrosis zones with subsequent 3D merging into volumetric masks. These segmentations were fed to the CNNs along with the corresponding image data in order to train the models, which subsequently predicted binary segmentation masks of unseen necrosis zones. DSC, sensitivity and specificity, as well as ASD and HD95, were used as quality measures to evaluate the produced segmentations. These measurements always referred to 3D patient volumes, which were either directly produced by the vU-nets or obtained as patient-wise merged 2D predictions of the U-nets.
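The overlap-based measures above can be sketched on binary masks as follows (a minimal NumPy illustration; ASD and HD95 additionally require surface-distance computations and are omitted here):

```python
import numpy as np

def dsc(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient between two boolean masks."""
    inter = np.logical_and(pred, ref).sum()
    return 2.0 * inter / (pred.sum() + ref.sum())

def sensitivity(pred: np.ndarray, ref: np.ndarray) -> float:
    """Fraction of reference foreground voxels that were detected."""
    tp = np.logical_and(pred, ref).sum()
    return float(tp) / ref.sum()

def specificity(pred: np.ndarray, ref: np.ndarray) -> float:
    """Fraction of reference background voxels correctly left unsegmented."""
    tn = np.logical_and(~pred, ~ref).sum()
    return float(tn) / (~ref).sum()

# Toy 2D example: predicted mask misses one column of the reference
pred = np.zeros((4, 4), dtype=bool); pred[1:3, 1:3] = True   # 4 voxels
ref = np.zeros((4, 4), dtype=bool); ref[1:3, 1:4] = True     # 6 voxels
```

In the evaluation described above, these functions would be applied to the full 3D patient volumes, i.e. either the direct vU-net output or the stack of merged 2D U-net predictions.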