Model Performance on Regression - Towards Automated Age Estimation of Young Individuals

The performance of multiple model variants using Method 1 (section 6.1), Method 2 (section 6.2), and Method 3 (section 6.3) was analyzed for the age regression task.

The models were evaluated on all ten training rounds (all) or only on the best one of each fold (best). All regression results were compared to a direct statistical evaluation of the training data, designated as stat (subsection 6.4). The stat differed between methods since the number of samples and the distribution of the training data were different. However, the same subjects were included in the test sets of each fold, irrespective of the method. This allowed an unbiased comparison between Method 1, Method 2, and Method 3. The principal metric to define the performance of the models on age regression was the mean absolute error (MAE). The performance was measured on the test set only, i.e. the part of the data unknown to the models. The test sets included n_cor = 35 subjects for all models of M1, M2 (coronal MRIs), and M3, and n_sag = 75 subjects for the models of M2 (sagittal MRIs).
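The quantities reported in the result tables can be illustrated with a minimal sketch. It assumes that the SD is the population standard deviation of the absolute errors (the text does not say which SD was used) and that stat simply predicts the training-set mean age for every test subject, as the table footnotes state. The function names and the toy ages are hypothetical and are not data from the study.

```python
from statistics import mean, pstdev


def regression_report(true_ages, predicted_ages, tolerance_years=2.0):
    """Summarise absolute errors in the same terms as the result tables."""
    abs_errors = [abs(p - t) for t, p in zip(true_ages, predicted_ages)]
    within = sum(e <= tolerance_years for e in abs_errors)
    return {
        "mae": mean(abs_errors),
        "sd": pstdev(abs_errors),  # assumption: population SD of the absolute errors
        "max_ae": max(abs_errors),
        "pct_within": 100.0 * within / len(abs_errors),
    }


def stat_baseline(train_ages, n_test):
    """Predict every test subject with the mean age of the training set."""
    return [mean(train_ages)] * n_test


# Toy ages in years, made up for illustration only.
train_ages = [14.0, 15.0, 16.0, 17.0, 18.0]
test_ages = [15.0, 19.0, 16.5, 13.0]
model_predictions = [15.5, 18.0, 16.0, 15.5]

baseline_report = regression_report(test_ages, stat_baseline(train_ages, len(test_ages)))
model_report = regression_report(test_ages, model_predictions)
print(baseline_report["mae"], model_report["mae"])
```

On these toy values the baseline reaches an MAE of 1.875 years and the hypothetical model 1.125 years, which mirrors how each table row is compared against its stat row.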

To present the results in the tables, the following designations are necessary due to the high number of model variations:

• Method 1 (M1), Method 2 (M2), Method 3 (M3)

• Anthropometric measurements (AM), ossification stages of the knee growth plates (OS), score of the knee joint (SKJ), coronal knee MRIs (COR), sagittal knee MRIs (SAG)

• Support-vector regressor (SVR), extremely randomized trees regressor (ETR), gradient tree boosting regressor (GBR)

The results for Method 1 show how ML algorithms trained on AM, OS, and SKJ improve on the statistical evaluation of the training set (Table 7.4). For both all and best rounds, the lowest MAE was achieved with a combination of AM and SKJ, suggesting that more and different data can help to reduce errors in age regression.

Using SKJ or OS made no difference to the results, but the former was preferred as it is a single feature instead of three. Finally, the best average MAE of 0.77 ± 0.60 years was attained with a support-vector regressor. Less than 5% of test subjects from all folds deviated from the true age by more than two years.

Table 7.4: Age regression performance of several model variants from Method 1 (M1) on the test sets in an “extended” 5-fold cross-validation using AM, OS, and SKJ

Rounds  Data    Regressor  MAE±SD (y)  Max AE (y)  % ≤ |2.0 y|
all     AM+SKJ  GBR        0.84±0.63   2.62        94.86
best    AM      SVR        1.00±0.69   2.57        92.00
best    OS      GBR        0.90±0.68   3.00        92.57
best    SKJ     GBR        0.90±0.68   2.98        92.57
best    AM+SKJ  SVR        0.77±0.60   2.58        95.43

∗: predicts all subjects with the mean age of the training set
all/best: all ten or best training rounds per fold are included
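The distinction between all and best rounds can be sketched as follows. The text does not state how the best round of a fold was chosen or how errors were aggregated, so this sketch assumes selection by a validation MAE and pooling of the test-set absolute errors across folds. The data layout, the `val_mae` field, and all numbers are hypothetical.

```python
from statistics import mean

# Hypothetical layout: one list of training rounds per fold. Each round carries
# a selection score (assumed: validation MAE) and its test-set absolute errors.
rounds_per_fold = [
    [
        {"val_mae": 0.9, "test_abs_errors": [0.5, 1.5]},
        {"val_mae": 0.7, "test_abs_errors": [0.4, 1.0]},
    ],
    [
        {"val_mae": 0.8, "test_abs_errors": [0.6, 0.8]},
        {"val_mae": 1.1, "test_abs_errors": [1.2, 2.0]},
    ],
]


def pooled_errors(rounds_per_fold, mode):
    """Collect test errors from every round ("all") or one round per fold ("best")."""
    errors = []
    for fold_rounds in rounds_per_fold:
        if mode == "all":
            selected = fold_rounds
        elif mode == "best":
            # Assumption: the best round is the one with the lowest validation MAE.
            selected = [min(fold_rounds, key=lambda r: r["val_mae"])]
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        for training_round in selected:
            errors.extend(training_round["test_abs_errors"])
    return errors


mae_all = mean(pooled_errors(rounds_per_fold, "all"))
mae_best = mean(pooled_errors(rounds_per_fold, "best"))
print(f"all: {mae_all:.2f} y, best: {mae_best:.2f} y")
```

Because the same test subjects appear in both modes, the best rows in the tables can only differ from the all rows through the choice of training round, not through the composition of the test set.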

The stat of the models based on Method 2 (MAE of 1.63±0.99 years) was higher than that of M1 due to a larger and augmented training set. All model variants of M2 achieved better results than the stat (Table 7.5). Considering all training rounds, the MAE improved by about 12% and the SD by about 15% when a regressor was applied to the age predictions of the CNN. The inclusion of AM and SKJ as features for the ML regressors did not markedly improve the results. Overall, the model with the best average performance on age regression achieved an MAE of 0.69±0.47 years and a maximum AE of 2.15 years. It combined all available data and used CNN and ETR in succession to regress the final age of the subjects of the test sets.
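The reported improvements of about 12% (MAE) and 15% (SD) appear consistent with the all-rounds rows of Table 7.5 for the CNN alone (0.81±0.65) and the CNN followed by an ETR (0.71±0.55). A short check, in which the helper name `relative_reduction` is hypothetical:

```python
def relative_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


# Values taken from Table 7.5, all rounds, coronal MRIs: CNN vs. CNN+ETR.
mae_gain = relative_reduction(0.81, 0.71)
sd_gain = relative_reduction(0.65, 0.55)
print(f"MAE reduced by {mae_gain:.1f}%, SD by {sd_gain:.1f}%")
# MAE reduced by 12.3%, SD by 15.4%
```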

Table 7.5: Age regression performance of several model variants from Method 2 (M2) on the test sets in an “extended” 5-fold cross-validation using coronal knee MRIs, AM, and SKJ

Rounds  Data        Regressor  MAE±SD (y)  Max AE (y)  % ≤ |2.0 y|
-       -           stat∗      1.63±0.99   3.59        59.70
all     COR         CNN        0.81±0.65   3.55        94.00
all     COR         CNN+ETR    0.71±0.55   2.46        96.78
all     COR+AM+SKJ  CNN+SVR    0.73±0.55   2.39        97.60
best    COR         CNN        0.79±0.62   3.49        95.05
best    COR         CNN+ETR    0.67±0.49   2.10        98.86
best    COR+AM+SKJ  CNN+ETR    0.69±0.47   2.15        98.29

∗: predicts all subjects with the mean age of the training set
all/best: all ten or best training rounds per fold are included

Method 2 was also applied to a larger dataset containing sagittal MRIs, which altered the stat, in comparison to the coronal case, to an MAE of 1.93±1.20 years and a maximum AE of 4.74 years due to the broader and more uniformly distributed age range of the training set. Similar to M1, the models of M2 were superior to stat, and the errors were substantially reduced by including ML-based regressors to combine the CNN age predictions (Table 7.6). The best-performing model variant of M2 using sagittal MRIs attained an MAE of 0.79±0.57 years and a maximum AE of 2.63 years. Less than 5% of predictions deviated by more than two years from the actual chronological age.

The final method evaluated on regression was M3, which integrated the AM and OS directly into the CNN instead of including these features in the training of the ML algorithms following the CNN. Similarly to M1 and M2, the statistical mean of the training set (stat) was surpassed by the models based on M3 (Table 7.7).

Likewise, the combination of CNN and ML regressors improved the results further.

Ultimately, the lowest prediction errors were achieved by using the CNN on mixed data and an ETR in succession, attaining an average MAE of 0.71±0.54 years over all five folds and a maximum AE of 2.20 years.
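Using a CNN and an ETR in succession can be sketched as a two-stage pipeline in which the CNN age prediction becomes an input feature of the tree-based regressor. The sketch below uses scikit-learn's `ExtraTreesRegressor` on synthetic stand-in data. The feature layout, the age range, the noise levels, and the hyperparameters are all assumptions made for illustration, and the thesis's actual training protocol may differ.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins, not study data: true ages and noisy first-stage CNN outputs.
n_train, n_test = 200, 35
age_train = rng.uniform(13.0, 21.0, n_train)  # arbitrary toy age range
age_test = rng.uniform(13.0, 21.0, n_test)
cnn_train = age_train + rng.normal(0.0, 1.0, n_train)
cnn_test = age_test + rng.normal(0.0, 1.0, n_test)

# Hypothetical extra column standing in for tabular inputs such as AM or SKJ.
extra_train = age_train + rng.normal(0.0, 1.5, n_train)
extra_test = age_test + rng.normal(0.0, 1.5, n_test)

X_train = np.column_stack([cnn_train, extra_train])
X_test = np.column_stack([cnn_test, extra_test])

# Second stage: extremely randomized trees regress the final age
# from the CNN prediction and the extra feature.
etr = ExtraTreesRegressor(n_estimators=200, random_state=0)
etr.fit(X_train, age_train)
final_age = etr.predict(X_test)

mae_cnn = float(np.mean(np.abs(cnn_test - age_test)))
mae_stacked = float(np.mean(np.abs(final_age - age_test)))
print(f"CNN only: {mae_cnn:.2f} y, CNN+ETR: {mae_stacked:.2f} y")
```

Swapping `ExtraTreesRegressor` for `sklearn.svm.SVR` or `sklearn.ensemble.GradientBoostingRegressor` would give the corresponding CNN+SVR and CNN+GBR variants under the same assumptions.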

Table 7.6: Age regression performance of several model variants from Method 2 (M2) on the test sets in an “extended” 5-fold cross-validation using sagittal knee MRIs

Rounds  Data  Regressor  MAE±SD (y)  Max AE (y)  % ≤ |2.0 y|
-       -     stat∗      1.93±1.20   4.74        54.98
all     SAG   CNN        0.92±0.73   4.31        90.91
all     SAG   CNN+SVR    0.81±0.62   2.86        94.85
best    SAG   CNN        0.89±0.70   4.22        92.44
best    SAG   CNN+ETR    0.79±0.57   2.63        95.73

∗: predicts all subjects with the mean age of the training set
all/best: all ten or best training rounds per fold are included

Table 7.7: Age regression performance of several model variants from Method 3 (M3) on the test sets in an “extended” 5-fold cross-validation using coronal knee MRIs, AM, and SKJ

Rounds  Data        Regressor  MAE±SD (y)  Max AE (y)  % ≤ |2.0 y|
-       -           stat∗      1.63±0.99   3.59        59.70
all     COR+AM+SKJ  CNN        0.92±0.70   3.47        91.49
all     COR+AM+SKJ  CNN+ETR    0.84±0.65   2.74        92.78
best    COR+AM+SKJ  CNN        0.85±0.64   3.29        94.33
best    COR+AM+SKJ  CNN+ETR    0.71±0.54   2.20        95.43

∗: predicts all subjects with the mean age of the training set
all/best: all ten or best training rounds per fold are included
