
3.2 Comparison of TKV Measurement Methods

3.2.5 Conclusion

Different methods for KV computation were evaluated in terms of reproducibility, accuracy, precision and time required on representative MR and CT images. The dataset in the main experiment consisted of 30 kidneys each from MRI and CT scans, covering a wide range of SKV. Overall, planimetry methods and stereology showed the highest reproducibility, low bias, and the desired accuracy and precision. However, the reproducibility of planimetry and stereology was lower on the MR than on the CT dataset, likely because the lower image quality of MR compared with CT makes kidney identification on MR more operator-dependent. High intra-rater variability was observed for the beginner operator, suggesting that KV computation on MR needs to be performed by expert operators to reliably detect KV changes. On MRI, the highest accuracy and precision were observed for planimetry-based methods, while on CT stereology performed equally well, which might again be attributable to the higher image quality

34 Chapter 3 Kidney Volume Measurement in ADPKD

KV computation    Absolute change in TKV (ml)                     Percentage change in TKV (%)
method            Octreotide-LAR  Placebo         p               Octreotide-LAR  Placebo       p
                  (n=38)          (n=37)                          (n=38)          (n=37)
ImageJ Polyline   46.1 ± 112.3    143.7 ± 158.1   0.032 (<0.05)   2.57 ± 6.07     6.72 ± 5.89   0.003 (<0.01)
Stereology        45.8 ± 114.1    152.1 ± 160.4   0.018 (<0.05)   3.30 ± 7.14     7.00 ± 5.83   0.016 (<0.05)
Mid-slice         40.1 ± 129.8    127.2 ± 186.0   0.111 (NS)      2.89 ± 10.71    6.55 ± 8.06   0.098 (NS)
Ellipsoid         36.3 ± 153.9    125.4 ± 179.1   0.102 (NS)      2.54 ± 11.66    6.35 ± 9.99   0.132 (NS)
Kidney length*    -0.25 ± 1.22    0.26 ± 1.97     0.115 (NS)      -0.69 ± 3.97    1.10 ± 6.69   0.165 (NS)

Tab. 3.5. Validation Study: Total kidney volume changes compared with baseline at 1 year of treatment with placebo or Octreotide-LAR. Total kidney volume was assessed by different kidney volume computation methods on MR images taken from the ALADIN clinical study. Abbreviations: LAR, long-acting release; KV, kidney volume; NS, not statistically significant; TKV, total kidney volume (sum of right and left kidney volumes). * Kidney length (in cm) is computed as the sum of right and left kidney lengths. p values from ANCOVA (absolute change) or unpaired t-test (percentage change).


and larger number of axial sections compared with MRI. The mid-slice method and the ellipsoid equation, despite providing quick KV estimates, were less reproducible and showed the lowest precision and accuracy on both MR and CT images.
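As a minimal sketch of how such accuracy and precision figures can be summarised, the snippet below takes bias as the mean, and precision as the standard deviation, of the per-kidney percentage difference between a method and the reference method. The function names and the volumes are hypothetical and chosen only for illustration; the exact formulas used in the experiments are not restated in this section, so this is an assumption rather than the study's own computation.

```python
from statistics import mean, stdev


def percent_differences(method_ml, reference_ml):
    """Per-kidney difference from the reference, in percent of the reference."""
    return [100.0 * (m - r) / r for m, r in zip(method_ml, reference_ml)]


def bias_and_precision(method_ml, reference_ml):
    """Return (bias, precision): mean and sample SD of the percentage differences."""
    diffs = percent_differences(method_ml, reference_ml)
    return mean(diffs), stdev(diffs)


# Hypothetical single kidney volumes in ml (illustration only, not study data).
reference = [250.0, 400.0, 800.0, 1200.0]
simplified = [275.0, 360.0, 880.0, 1080.0]

bias, precision = bias_and_precision(simplified, reference)
print(f"bias = {bias:.2f} %, precision (SD) = {precision:.2f} %")
```

In this toy case the errors cancel on average, so the bias is close to zero while the SD stays above 10%. This is the situation in which a method looks unbiased but still cannot resolve changes of a few percent.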

Our work and previously reported investigations provide evidence that neither the mid-slice method nor the ellipsoid equation can detect KV changes in the range of 3 to 5%, because their precision is much lower, ranging between 10 and 25% (i.e. the SD of the difference between KV calculated by these methods and by the reference method). The validation experiment in our work also showed that these simplified methods are not precise enough to be used in clinical studies for capturing between-treatment changes in TKV that might develop over a one-year treatment period. Moreover, owing to their high variability in estimating TKV, both the mid-slice and ellipsoid methods require an approximately 4-fold larger sample size than ImageJ polyline to capture a significant difference between TKV changes in the two treatment groups. The results also show that stereology allows detection of the difference in TKV change between the treated and control groups.
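The effect of variability on sample size can be illustrated with a textbook normal-approximation formula for comparing two group means. This is only a sketch: the formula, the 5% two-sided significance level and the 80% power are assumptions made here, and the calculation behind the 4-fold figure above may have been different. The means and SDs are the percentage changes in TKV reported in Tab. 3.5.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd_a, sd_b, alpha=0.05, power=0.80):
    """Approximate patients per group for a two-sided two-sample comparison
    of means (normal approximation, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2 * (sd_a ** 2 + sd_b ** 2) / delta ** 2)


# (mean, SD) of percentage TKV change: Octreotide-LAR, placebo (Tab. 3.5).
methods = {
    "ImageJ polyline": ((2.57, 6.07), (6.72, 5.89)),
    "Mid-slice": ((2.89, 10.71), (6.55, 8.06)),
    "Ellipsoid": ((2.54, 11.66), (6.35, 9.99)),
}

sizes = {}
for name, ((mean_t, sd_t), (mean_p, sd_p)) in methods.items():
    sizes[name] = n_per_group(mean_p - mean_t, sd_t, sd_p)
    print(f"{name}: about {sizes[name]} patients per group")
```

Under these assumptions the simplified methods need roughly three to four times as many patients per group as ImageJ polyline. The driver is their larger SD, since the between-group difference they measure is similar.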

Besides SKV measurements, kidney length is of interest since it can be easily measured on ultrasound examinations. It has recently been proposed as a predictor of disease progression [13] and shows a linear correlation with kidney volume, assessed on either MR or CT. However, this correlation comes with very low precision, so kidney length should be used only for rough estimates of TKV. Our validation study also shows that kidney length is not accurate enough to identify between-treatment changes, suggesting that it should not be recommended as an outcome measure for clinical trials.

Despite their advantages in precision and accuracy, planimetry methods require 20 to 40 minutes on average for SKV measurement (21 to 35 min for expert operators, for two kidneys).

Stereology reduces this average time for SKV measurement to 15 to 17 minutes, and the simplified methods reduce it much further (to approximately 5 to 10 minutes), but at the cost of reduced precision and accuracy of SKV measurements. To overcome the time requirement and operator dependency that limit planimetry methods, fully automated approaches would be ideal.

The limitations of manual segmentation and stereology for efficient TKV computation in clinical studies provide good motivation for investigating novel strategies to improve segmentation of polycystic kidneys from acquired imaging datasets (CT or MR). Some attempts to develop automatic segmentation tools have been reported in the literature and are described in Chapter 2, but achieving the accuracy and precision required for clinical studies remains a challenging task.

In the next chapters, we describe two different machine learning methods, based on random forests and deep convolutional neural networks respectively, for segmentation of polycystic kidneys from CT datasets of ADPKD patients. We show that by formulating the segmentation task as a pattern-recognition problem and training an efficient classification model, it is possible to identify complex patterns within the data, thereby facilitating fast and reproducible segmentation for TKV measurement in ADPKD.


Part II

Machine Learning based Approaches for Segmentation

The original question, "Can machines think?" I believe to be too meaningless to deserve discussion.

Alan Turing ("Computing Machinery and Intelligence" - Mind 59 (1950): 433-460)

4 Random Forests for Segmentation

4.1 Introduction

Random Forests, or more generally Decision Forests, are a popular ensemble learning method that has been successfully applied to a number of computer vision, machine learning, and medical image analysis tasks. One of the initial works on decision trees, by Breiman et al. [19], describing classification and regression trees (CART), strongly influenced later developments in this field. Decision trees are directed acyclic graphs consisting of a hierarchy of feature learners that together form a decision model. They use predictive modelling to make probabilistic decisions in machine learning applications. Decision trees became popular because they are computationally inexpensive, allowing fast model construction that can also be used on very large training datasets, and because they can be devised to take into account the uncertainty in a probabilistic function. One of the most popular algorithms for training decision trees is C4.5 by Quinlan [101]. To grow a decision tree, heuristic-based approaches are used to guide the algorithm through the vast hypothesis space. However, learning an optimal decision tree is known to be an NP-complete problem [77], and a single tree can grow into a complex model that does not generalize well because it overfits the training dataset.
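One common heuristic of this kind is to choose, at each node, the split that most reduces label entropy. The sketch below scores threshold splits on a single numeric feature by plain information gain. Note that C4.5 itself uses a normalised variant (the gain ratio) and CART typically uses the Gini index, so this shows the general idea rather than either algorithm. The function names and the toy intensities are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting at `values <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - children


def best_threshold(values, labels):
    """Greedy choice among midpoints of consecutive distinct feature values.
    Returns None when the feature takes a single value."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates,
               key=lambda t: information_gain(values, labels, t),
               default=None)


# Hypothetical voxel intensities with two classes (illustration only).
intensities = [10, 20, 30, 70, 80, 90]
classes = ["background"] * 3 + ["kidney"] * 3
split = best_threshold(intensities, classes)
gain = information_gain(intensities, classes, split)
print(split, gain)
```

Growing a tree repeats this greedy choice recursively on each side of the split. The greedy search is fast, but it neither guarantees a globally optimal tree nor prevents overfitting.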

Random Forests build on ensemble learning: rather than optimizing a single complex tree, they combine many weak decision trees. They consist of an ensemble of independent decision trees that follow a divide-and-conquer strategy in a probabilistic framework to solve regression, classification or clustering tasks. Random Forests can achieve better generalization by averaging the predictions of de-correlated trees.

T. K. Ho [56] first introduced random decision forests for handwritten digit recognition. In subsequent work [57], random forests were shown to yield superior generalization compared to both boosting and pruned C4.5-trained decision trees. In another approach, known as bagging, randomness is introduced during the learning process by training each tree independently on a random subset of the training data [18]. Random forests have since been used for several tasks including regression, classification, semi-supervised and/or manifold learning in both medical and general applications.
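The two ingredients described above, training each tree on a random resample of the data and averaging the trees' predictions, can be sketched in a few lines. This is a deliberately simplified illustration and not the implementation used in this work: the trees are one-split stumps on a single hypothetical feature, and the random selection of candidate features at each node, which random forests usually add, is omitted because there is only one feature.

```python
import random


def fit_stump(sample):
    """Fit a one-split tree on (x, y) pairs with y in {0, 1}.
    Returns (threshold, p_left, p_right), the class-1 fraction in each leaf."""
    xs = sorted({x for x, _ in sample})
    best = None
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        p_left, p_right = sum(left) / len(left), sum(right) / len(right)
        # Squared error of the leaf probabilities, used as the split criterion.
        error = (sum((y - p_left) ** 2 for y in left)
                 + sum((y - p_right) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, t, p_left, p_right)
    if best is None:  # only one distinct x in the sample: single constant leaf
        p = sum(y for _, y in sample) / len(sample)
        return (float("inf"), p, p)
    return best[1:]


def predict_stump(stump, x):
    threshold, p_left, p_right = stump
    return p_left if x <= threshold else p_right


def fit_forest(data, n_trees=25, seed=0):
    """Bagging: every tree sees a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Ensemble prediction: average of the individual tree probabilities."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)


# Hypothetical 1-D training set (feature value, label), illustration only.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
forest = fit_forest(data)
print(predict_forest(forest, 2), predict_forest(forest, 8))
```

Because each stump is fitted to a different bootstrap sample, the thresholds differ from tree to tree. Averaging them turns the hard decision of a single tree into a smoother probability estimate around the class boundary.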
