Method 2: CNN-MRI - 6 Age Estimation - Towards Automated Age Estimation of Young Individuals

6 Age Estimation

6.2 Method 2: CNN-MRI

6.1.3 Training

After the setup, the models are trained on the training data. For most supervised⁵ ML algorithms, training can be expressed as an optimization problem. The objective is to find the mapping function f(x) that minimizes a task-specific loss function on the training set,

$$\min_{\theta} \sum_{i=1}^{n} L\big(f(x_i; \theta),\, y_i\big),$$

where n is the number of training samples, θ the parameter of the mapping function, L the loss function, x_i the feature vector of the i-th sample, and y_i the associated label [201]. An example is least squares regression, where the goal is to minimize the sum of squares of the differences between true and predicted labels.
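As an illustration of this optimization view, the following minimal sketch fits a one-dimensional least squares regression in closed form. The linear mapping f(x; θ) = θ₁·x + θ₀, the function names, and the toy data are assumptions made for illustration; they are not taken from this work.

```python
def sum_of_squares(theta, xs, ys):
    """Loss: sum of squared differences between predicted and true labels."""
    slope, intercept = theta
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


def fit_least_squares(xs, ys):
    """Closed-form minimizer of the sum of squares for a 1D linear model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (hypothetical): labels follow y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta = fit_least_squares(xs, ys)
print(theta, sum_of_squares(theta, xs, ys))
```

Any other parameter choice yields a larger sum of squares on the same data, which is what makes the fitted θ the solution of the minimization.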

For this work, multiple ML models are trained on chronological age regression and majority classification based on the data and algorithms described above. They solve the tasks using the following combinations of features:

1. Anthropometric measurements (AM)

2. Ossification stages of the three growth plates of the knee (OS)

3. Score of the knee joint (SKJ)

4. AM and SKJ

Given the feature combinations, the number of algorithms, and the two tasks, a total of 44 model variants are trained and evaluated. Refer to the Scikit-learn documentation⁶ for more information on the specific algorithms, their parameters, and their optimization functions.

6.2 Method 2: CNN-MRI

⁵ “Supervised learning refers to a class of systems and algorithms that determine a predictive model using data points with known outcomes” (definition by DeepAI).

⁶ https://scikit-learn.org/stable/modules/classes.html

Neural networks are capable of learning and extracting information that is relevant to a specific task [187]. This assumes that the number of samples they learn from and the complexity of the problem enable them to find correlating features. With this in mind, the initial idea of this work was to use CNNs to solve the age estimation task based on the pre-processed MRIs (chapter 4), without the need to extract age-relevant features via segmentation (chapter 5) or other techniques. Hence, an architecture similar to the contracting path of the CNN for segmentation (Fig. 5.8) was adapted to the age regression problem. Predicting the age of a subject based on the pre-processed MRIs alone proved unsatisfactory: the training was unstable, and the validation loss converged early (Fig. 6.5).

This suggests underfitting, i.e., the model is not capable of generalizing well to new data (generalization gap). Even after optimizing the hyperparameters of the aforementioned CNN and training for more epochs, a similar outcome was observed.

The conclusion from these initial observations is that the problem needs to be simplified. The following hypothesis is defined: by reducing the MRIs to the age-relevant structures via bone segmentation, a stable age estimation is possible.

Figure 6.5: Training vs. validation losses for the age regression task using 2D MRIs without the bone segmentation step. The loss on the validation set converges early while the training set loss continues improving. This is a sign of underfitting (generalization gap).

This section describes how Method 2 (Fig. 6.1) simplifies the age regression problem in order to evaluate the hypothesis. The first part prepares the data for the CNN by extracting age-relevant objects from the MRIs (subsection 6.2.1). Next, a CNN architecture is engineered for the new scenario (subsection 6.2.2) and trained on masked 2D images (subsection 6.2.3). The model predictions per image slice are then sorted by subject and used to train ML algorithms on two tasks: regression of the chronological age of a subject and majority classification using the 18-year limit. Using ML algorithms makes it possible to incorporate the AM and OS as additional features to solve the tasks.
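The step from per-slice predictions to per-subject inputs for the ML algorithms can be sketched as follows. The record layout `(subject_id, slice_index, predicted_age)`, the function names, and the encoding of the majority label as 1 for ages of at least 18 years are assumptions for illustration, not the implementation used in this work.

```python
from collections import defaultdict


def group_predictions_by_subject(slice_predictions):
    """Collect per-slice age predictions into one ordered list per subject.

    slice_predictions: iterable of (subject_id, slice_index, predicted_age).
    Returns {subject_id: [predicted ages ordered by slice index]}.
    """
    grouped = defaultdict(list)
    for subject_id, slice_index, predicted_age in slice_predictions:
        grouped[subject_id].append((slice_index, predicted_age))
    return {
        subject_id: [age for _, age in sorted(items)]
        for subject_id, items in grouped.items()
    }


def majority_label(chronological_age, limit=18.0):
    """Binary target for majority classification (assumed: 1 if age >= limit)."""
    return 1 if chronological_age >= limit else 0


# Hypothetical CNN outputs for two subjects.
predictions = [("s1", 1, 17.2), ("s2", 0, 19.0), ("s1", 0, 16.8)]
features = group_predictions_by_subject(predictions)
print(features)
```

Each per-subject list can then serve as a feature vector, optionally extended with the AM and OS features.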

Method 2 is applied to both coronal and sagittal MRIs of this work. For simplicity, the following subsections describe the method on the basis of the coronal images.

6.2.1 Data Preparation

Input data. The input data of Method 2 are the preprocessed MRIs, the predicted segmentation maps, and the subject ages. The maps are obtained from the best segmentation model resulting from the 5-fold cross-validation (subsection 5.7). Afterwards, they are multiplied with the preprocessed MRIs to generate masked images (Fig. 6.6). The subject ages are retrieved from the formatted filenames of the images (Fig. 4.3), so no further data has to be supplied.
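The masking step amounts to an element-wise multiplication of each image slice with its segmentation map. The sketch below assumes a binary map (1 for bone, 0 for background) and represents slices as nested lists; the function name and toy values are hypothetical. With an array library such as NumPy, the same operation would be `image * segmentation`.

```python
def mask_slice(image, segmentation):
    """Multiply an image slice element-wise with a binary segmentation map."""
    return [
        [pixel * label for pixel, label in zip(image_row, seg_row)]
        for image_row, seg_row in zip(image, segmentation)
    ]


# Toy 2x2 slice (hypothetical intensities) and its binary bone map.
image = [[10, 20], [30, 40]]
segmentation = [[0, 1], [1, 0]]
print(mask_slice(image, segmentation))
```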

Due to the slicing technique of MRI, the outer slices of an MRI volume can exhibit sparse tissue information. This holds for the masked images as well, which show limited or no bone structures in the outer slices. Supplying a CNN for age estimation with this sparse information can mislead it about how the actual age is deduced from the images, especially when the model is trained on 2D image data, where context information from adjacent slices is missing.

Figure 6.6: Pre-processed MRI slice (left), predicted segmentation map (middle), and masked image slice (right). Using the maps to mask the images enables the extraction of age-relevant structures in the image.

Removal of sparse bone information. The removal of sparse information in the images is achieved in two data reduction phases (Fig. 6.7). The first reduction phase consists of removing all image slices containing little or no bone structures. This is achieved by first computing the total size of the segmented area of a 3D image and then removing all image slices whose segmented area is less than 2% of that total. In the second reduction phase, the number of slices per masked image stack is reduced to a predefined minimum of 12 slices for all MRIs. This is necessary for several reasons.
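A minimal sketch of the first reduction phase is given below. It assumes binary segmentation maps stored as nested lists and interprets the 2% criterion as the per-slice segmented area relative to the segmented area of the whole volume; the function names and the toy values are hypothetical.

```python
def bone_area_per_slice(segmentation_stack):
    """Count segmented (non-zero) pixels in every slice of a binary 3D map."""
    return [sum(sum(row) for row in slice_map) for slice_map in segmentation_stack]


def remove_sparse_slices(areas, min_fraction=0.02):
    """Return the indices of slices holding at least min_fraction of the total area."""
    total = sum(areas)
    if total == 0:
        return []  # no bone segmented anywhere in the volume
    return [index for index, area in enumerate(areas) if area / total >= min_fraction]


# Toy per-slice areas: the outer slices contain almost no bone.
areas = [0, 1, 50, 100, 120, 100, 50, 1, 0]
print(remove_sparse_slices(areas))
```

For the toy values, the total area is 422 pixels, so slices with fewer than about 8.4 segmented pixels are dropped and only the five central slices remain.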

First, images from Dataset A and Dataset B have 41 and 24 slices, respectively, which causes an imbalance in the data used for the CNN. Second, it ensures that the slices of different subjects contain similar bone information. Third, it makes it possible to train a neural network for age estimation not only on 2D but also on 3D image data. CNNs using 3D input data require that all input volumes have the same dimensions along every axis. Lastly, given that Method 2 is based on 2D image data and predicts an age for each image slice, ML algorithms can be trained on the age predictions of all slices.

During reduction phase 2, several actions are performed. First, the amount of bone structure per slice of a 3D image is computed based on the segmentation map. Then, a reference image slice is defined as the starting point to extract the minimum of 12 slices. The reference slice is obtained from the bone-amount distribution per slice by calculating the center of gravity (COG) of the distribution. Finally, slices are evenly selected below and above the reference to match the minimum of 12 slices.
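The second reduction phase can be sketched as follows. The COG is computed as the area-weighted mean slice index. Treating "evenly selected below and above the reference" as a contiguous window of 12 slices centered on the reference, and shifting that window when it would extend past the stack, are assumptions of this sketch; the function name and toy values are hypothetical.

```python
def select_slices_around_cog(areas, n_slices=12):
    """Pick n_slices contiguous slice indices centered on the center of gravity.

    areas: segmented bone area per slice (after reduction phase 1).
    Returns (reference_index, selected_indices).
    """
    n_total = len(areas)
    total = sum(areas)
    if n_total < n_slices or total == 0:
        raise ValueError("stack too small or contains no segmented bone")

    # Center of gravity of the bone-amount distribution over slice indices.
    cog = sum(index * area for index, area in enumerate(areas)) / total
    reference = round(cog)  # note: Python rounds exact halves to the even integer

    # Center the window on the reference, then clamp it to the stack bounds.
    start = reference - n_slices // 2
    start = max(0, min(start, n_total - n_slices))
    return reference, list(range(start, start + n_slices))


# Toy stack of 20 slices with bone in slices 4..16.
areas = [0] * 4 + [5] * 13 + [0] * 3
reference, selected = select_slices_around_cog(areas)
print(reference, selected)
```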

The whole process of removal of sparse information is executed separately per stack of 2D masked images. The pseudo-code is shown in Algorithm 4.

Figure 6.7: Removal of sparse bone information in two phases: first, the slices of the masked MRI that contain sparse or no bony structures are removed; subsequently, a predefined minimum of 12 slices is evenly extracted from the image volume for all MRIs. The reductions ensure a balanced CNN input.


Algorithm 4: Removal of sparse bone information
