
5 CNN-based Segmentation

5.1 Dataset Split

The inputs of the CNN are knee MRIs and their corresponding gold-standard segmentations (section 4.5). The data is split into three subsets, which is a common procedure when using CNNs. The sets are denominated training set, validation set, and test set, and each one is used for a different purpose.

The training set is commonly the largest of the three sets and contains the data used in the actual learning process of the neural network. It is the only portion of the data the network sees and directly learns from. The validation set is used to regularly measure the performance of the model. If the accuracy on the training set exceeds the accuracy on the validation set over several consecutive training iterations, then the model is beginning to memorize the training data itself.

This phenomenon is known as overfitting. Additionally, the validation set is used to fine-tune the model’s hyperparameters. By doing so, the network learns from the validation set, but only indirectly. The third and last subset is the test set.

Unlike the other two sets, it is never involved in the learning process and serves as a final performance measure of the model. The objective is to obtain a reliable and independent evaluation of the model based on this third subset.

One hundred coronal MRIs were split into 70% for the training set, 15% for the validation set, and 15% for the test set, a split ratio commonly used for machine learning and deep learning algorithms. The split was random and was performed separately per dataset to ensure that all sets contained images from both Dataset A and Dataset B (Table 5.1).


Table 5.1: The coronal MRIs were split into three sets for the segmentation task: training, validation, and test sets.

                            3D images               2D image slices
Set          Split ratio    Dataset A   Dataset B   Dataset A   Dataset B
Training          70 %          12          58          492        1392
Validation        15 %           3          12          123         288
Test              15 %           3          12          123         288
Total            100 %          18          82          738        1968
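As a minimal sketch of such a per-dataset split (the function and variable names are illustrative and not taken from the original code), the validation and test fractions can be rounded per dataset and the remainder assigned to training, which reproduces the counts in Table 5.1:

```python
import random

def split_ids(ids, val=0.15, test=0.15, seed=42):
    """Randomly split volume IDs into (train, val, test) subsets."""
    rng = random.Random(seed)
    ids = list(ids)
    rng.shuffle(ids)
    n_val, n_test = round(val * len(ids)), round(test * len(ids))
    return ids[n_val + n_test:], ids[:n_val], ids[n_val:n_val + n_test]

# The split is performed separately per dataset so that every subset
# contains images from both Dataset A and Dataset B (Table 5.1).
train_a, val_a, test_a = split_ids(range(18))  # Dataset A: 12 / 3 / 3
train_b, val_b, test_b = split_ids(range(82))  # Dataset B: 58 / 12 / 12
train = train_a + train_b                      # 70 training volumes in total
```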

5.2 Augmentation

Data augmentation is a common approach in ML to artificially increase the size of a dataset [106]. Generally, dataset sizes for CNNs range from tens of thousands to millions of images [106], a number difficult to attain for medical applications [157]. The larger the dataset, the better the model can account for variability in the data. The types of augmentation applied to images include translation, rotation, reflection, blurring, brightness, contrast, and gamma adjustments, among other modifications [117, 189]. Augmentation allows the model to learn invariance to such modifications [41, 157]. Moreover, it serves as a regularization mechanism that mitigates overfitting by introducing more variation into the data [106, 117, 158].

Augmentation is generally only performed on the training set, from which the model learns directly, while the validation and test sets are excluded in order to properly evaluate the model on real-life scenarios.

The following types of augmentation were applied to each 2D image slice of the training set (Fig. 5.2):

Horizontal translation in pixels ∈ {−24, 24}

Rotation around the z-axis in degrees ∈ {−5, 5}

Horizontal flip

FOV change with a factor ∈ {0.9, 1.0, 1.1}

The augmentation was performed on the downsampled (448×448 pixels) and preprocessed images (chapter 4). Each modification was applied one at a time, e.g. an image was not rotated and additionally translated, with the exception of the FOV change. The latter was performed on the original images as well as on the translated, rotated, and flipped ones. In contrast to the regular use of augmentation, the FOV change alone was also applied to the validation and test sets because it did not change the positions of the knee bones relative to the VOI. By augmenting the images, the CNN learns to be invariant to these modifications.

Figure 5.2: Augmentation of knee MRIs to virtually increase the size of the training set used for CNN-based segmentation (panels: reference VOI, 0.9×FOV, 1.1×FOV, flip, shift right, shift left, rotate CCW, rotate CW)
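For illustration, the following minimal sketch reproduces this scheme with scipy.ndimage on a single 448×448 slice. The function names (fov_change, augment_slice) are illustrative rather than the original implementation, and the zero padding for the 1.1×FOV case is a simplification: in this work, the enlarged window was instead cropped from the full-size image (see section on processing order below).

```python
import numpy as np
from scipy import ndimage

def fov_change(img, factor):
    """Resample a centered window of `factor` times the reference FOV back
    to the reference size. For factor > 1 the window exceeds the slice and
    is zero-padded here purely for illustration; in practice the missing
    pixels come from the full-size image."""
    h, w = img.shape
    ch, cw = round(h * factor), round(w * factor)
    top, left = (h - ch) // 2, (w - cw) // 2
    if factor <= 1.0:
        window = img[top:top + ch, left:left + cw]
    else:
        window = np.zeros((ch, cw), dtype=img.dtype)
        window[-top:-top + h, -left:-left + w] = img
    return ndimage.zoom(window, (h / ch, w / cw), order=1)

def augment_slice(img):
    """Apply each modification one at a time; the FOV change is
    additionally applied on top of every variant (Fig. 5.2)."""
    variants = {
        "reference":   img,
        "shift_right": ndimage.shift(img, (0, 24), order=1),
        "shift_left":  ndimage.shift(img, (0, -24), order=1),
        "rotate_ccw":  ndimage.rotate(img, 5, reshape=False, order=1),
        "rotate_cw":   ndimage.rotate(img, -5, reshape=False, order=1),
        "flip":        np.fliplr(img),
    }
    out = dict(variants)
    for name, variant in variants.items():
        for factor in (0.9, 1.1):
            out[f"{name}_fov{factor}"] = fov_change(variant, factor)
    return out
```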

Horizontal translation was introduced to mimic small localization errors of the automated cropping method (section 4.3). Thus, the CNN learns to detect bone irrespective of its exact position in the image frame. The translations were limited to 24 pixels in each direction to maintain the integrity of the shape of the bones.

Vertical translation was not performed so as not to undo the effect of the automated cropping, which had the advantage of vertically aligning images around the knee joint cavity.

Clockwise (CW) and counter-clockwise (CCW) rotations of 5° around the z-axis were applied to the images to account for variations in subject positioning in the MR scanner and for anatomical differences. Larger rotation angles were avoided in order to create realistic representations of knees in MRI.

Horizontal flips were included to artificially generate images from both the left and the right knee. This type of augmentation was especially useful for the robust detection of the Fibula, since the bone lay in the bottom left or bottom right of the image slice depending on the laterality. Vertical flips do not represent valid representations of an MRI sequence of the knee and were consequently not considered.

FOV changes were applied to account for anatomical differences between subjects. The center of the standardized VOI (section 4.3) was used to create two additional VOIs, one 10% smaller and the other 10% larger than the reference size of 130×130 mm². This type of modification effectively created three different FOVs of the same MRI (Fig. 5.2).
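For concreteness, a short sketch of the underlying VOI arithmetic (names are hypothetical): given the VOI center in mm, the three crop windows follow directly from the scaling factors.

```python
# Derive the three crop windows (in mm) from the VOI center and the
# 130 x 130 mm^2 reference size; an illustrative sketch only.
REF_EDGE_MM = 130.0

def voi_bounds(center_x, center_y, factor):
    """Return (x_min, y_min, x_max, y_max) of the scaled VOI in mm."""
    half = factor * REF_EDGE_MM / 2.0
    return (center_x - half, center_y - half,
            center_x + half, center_y + half)

for factor in (0.9, 1.0, 1.1):
    print(factor, voi_bounds(0.0, 0.0, factor))
# 0.9 -> 117 x 117 mm^2, 1.0 -> 130 x 130 mm^2, 1.1 -> 143 x 143 mm^2
```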

Applying translation and rotation after the automated cropping (section 4.3) causes parts of the structures to exit the image frame and new pixels to enter it. These new pixels are unknown information in the VOI and are filled with zeros, but they do not represent actual anatomical structures. Yet, this information is, in most cases, available in the full-size images. Therefore, augmentation was performed prior to the actual cropping to recover the “lost” anatomical structures. This was an improvement over the initial approach in [148]. Refer to appendix C for more details and visual examples.
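The difference in processing order can be made explicit in a short sketch; crop_voi and the shift transform below are stand-ins, not the actual implementation of section 4.3.

```python
import numpy as np
from scipy import ndimage

def crop_voi(img, box):
    """Crop the VOI given as (top, left, height, width) in pixels."""
    top, left, h, w = box
    return img[top:top + h, left:left + w]

def crop_then_augment(full_img, box, shift_px):
    # Naive order: pixels entering the cropped frame are zero-filled.
    return ndimage.shift(crop_voi(full_img, box), (0, shift_px),
                         order=1, cval=0.0)

def augment_then_crop(full_img, box, shift_px):
    # Order used in this work: the shifted crop window remains inside the
    # full-size image, so the "new" border pixels contain real anatomy.
    return crop_voi(ndimage.shift(full_img, (0, shift_px),
                                  order=1, cval=0.0), box)
```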

5.3 Resampling

In theory, CNNs have no image size limitation, but it is recommended to reduce the spatial resolution to decrease the number of calculations. Moreover, all images are required to have equal dimensions for training. Thus, the preprocessed, augmented, and cropped images of this work were resampled to a uniform size of 224×224 pixels per slice, which is also a common input resolution for CNNs.

Resampling can have the drawback of causing a loss of information, i.e. loss of detail in anatomical structures. Nevertheless, only a small amount of resampling was necessary due to the automated cropping (section 4.3), which had previously reduced the image size. The final size of 224×224 pixels per image slice provided enough detail to identify the shapes of the Femur, Tibia, and Fibula. The images of all three sets were resampled to this size since it is a requirement for the input shape of the CNN.
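A minimal resampling sketch, assuming scipy.ndimage; linear interpolation for the image slices is an assumption of this sketch, and the corresponding label masks would need nearest-neighbour interpolation (order=0) to keep the label values categorical.

```python
from scipy import ndimage

def resample_slice(img, size=224, order=1):
    """Resample a 2D slice to size x size pixels; use order=0 for the
    gold-standard segmentation masks so labels stay categorical."""
    h, w = img.shape
    return ndimage.zoom(img, (size / h, size / w), order=order)
```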