Contents lists available at ScienceDirect

Medical Image Analysis

journal homepage: www.elsevier.com/locate/media

The predictive value of segmentation metrics on dosimetry in organs at risk of the brain

Robert Poel (a,b), Elias Rüfenacht (b), Evelyn Hermann (a,c), Stefan Scheib (d), Peter Manser (e), Daniel M. Aebersold (a), Mauricio Reyes (b)

(a) Department of Radiation Oncology, Inselspital, Bern University Hospital, and University of Bern, Bern, Switzerland

(b) ARTORG Center for Biomedical Research, University of Bern, Bern, Switzerland

(c) Radiotherapy Department, Riviera-Chablais Hospital, Rennaz, Switzerland

(d) Varian Medical Systems Imaging Laboratory, GmbH, Switzerland

(e) Division of Medical Radiation Physics and Department of Radiation Oncology, Inselspital, Bern University Hospital, and University of Bern, Bern, Switzerland

Article info

Article history: Received 29 January 2021; Revised 29 June 2021; Accepted 2 July 2021; Available online 13 July 2021

Keywords: Segmentation parameters; Clinical validation; Radiotherapy; Brain OARs; Automatic segmentation; Dose distribution

Abstract

Background: Fully automatic medical image segmentation has been a long pursuit in radiotherapy (RT). Recent developments involving deep learning show promising results yielding consistent and time efficient contours. In order to train and validate these systems, several geometric based metrics, such as the Dice Similarity Coefficient (DSC), Hausdorff, and other related metrics are currently the standard in automated medical image segmentation challenges. However, the relevance of these metrics in RT is questionable. The quality of automated segmentation results needs to reflect clinically relevant treatment outcomes, such as dosimetry and related tumor control and toxicity. In this study, we present results investigating the correlation between popular geometric segmentation metrics and dose parameters for Organs-At-Risk (OAR) in brain tumor patients, and investigate properties that might be predictive for dose changes in brain radiotherapy.

Methods: A retrospective database of glioblastoma multiforme patients was stratified for planning difficulty, from which 12 cases were selected and reference sets of OARs and radiation targets were defined. In order to assess the relation between segmentation quality (as measured by standard segmentation assessment metrics) and quality of RT plans, clinically realistic, yet alternative contours for each OAR of the selected cases were obtained through three methods: (i) manual contours by two additional human raters, (ii) realistic manual manipulations of reference contours, and (iii) deep learning based segmentation results. On the reference structure set a reference plan was generated that was re-optimized for each corresponding alternative contour set. The correlation between segmentation metrics and dosimetric changes was obtained and analyzed for each OAR, by means of the mean dose and maximum dose to 1% of the volume (Dmax 1%). Furthermore, we conducted specific experiments to investigate the dosimetric effect of alternative OAR contours with respect to the proximity to the target, size, particular shape and relative location to the target.

Results: We found a low correlation between the DSC, reflecting the alternative OAR contours, and dosimetric changes. The Pearson correlation coefficient between the mean OAR dose effect and the Dice was -0.11. For Dmax 1%, we found a correlation of -0.13. Similar low correlations were found for 22 other segmentation metrics. The organ based analysis showed that there is a better correlation for the larger OARs (i.e. brainstem and eyes) than for the smaller OARs (i.e. optic nerves and chiasm). Furthermore, we found that proximity to the target does not make contour variations more susceptible to the dose effect. However, the direction of the contour variation with respect to the relative location of the target seems to have a strong correlation with the dose effect.

Corresponding author at: University of Bern ARTORG Center, Murtenstrasse 50, CH-3008 Bern, Switzerland. E-mail address: Mauricio.reyes@med.unibe.ch (M. Reyes).

https://doi.org/10.1016/j.media.2021.102161

1361-8415/© 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)

R. Poel, E. Rüfenacht, E. Hermann et al. Medical Image Analysis 73 (2021) 102161

Conclusions: This study shows a low correlation between segmentation metrics and dosimetric changes for OARs in brain tumor patients. Results suggest that the current metrics for image segmentation in RT, as well as deep learning systems employing such metrics, need to be revisited towards clinically oriented metrics that better reflect how segmentation quality affects dose distribution and related tumor control and toxicity.

© 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)

1. Introduction

For radiotherapy (RT) planning it is important to have accurate contours of the target as well as the organs that need to be spared. Contouring in clinical practice is predominantly performed by manual segmentation. Unfortunately, manual segmentation is subject to inconsistencies which are known as inter- and intra-observer variability (Mazzara et al., 2004; Deeley, 2011; Visser et al., 2019). Manual contouring will thus be subject to inaccuracies which are known to have a high impact on treatment quality (Jameson et al., 2010; Marks, 2013; Stanley et al., 2013; Sandström et al., 2016; Vinod et al., 2016; Cloak et al., 2019). In addition, manual segmentation of targets and organs at risk (OARs) is a very time consuming task, varying from 1 to 4 h depending on location and tumor extent (Bondiau et al., 2005; Harari et al., 2010; Deeley, 2011; Voet et al., 2011). In the current RT era where daily adaptive treatment finds its way into the clinic (Brock, 2019), the need for fast and automated segmentation is increasing. Fully automatic segmentation has therefore been one of the “holy grails” in RT.

Recent publications have shown that auto-segmentation can yield consistent and time efficient contours for different tumor sites, as summarized by Cardenas (2019). Besides its use in common clinical practice, auto-segmentation can be a useful tool for creating data for retrospective studies. With the large amount of digital imaging and dosimetric data, large retrospective studies on treatment outcome and toxicity can be performed.

The current state of the art auto-segmentation methods are based on deep learning (DL) and more particularly on convolutional neural networks (Meyer et al., 2018). Ever more deep learning based approaches are developed and are becoming clinically available through commercial products (Brunenberg, 2020; van Dijk et al., 2020). This new generation of auto-segmentation methods has outperformed the quality of atlas based and traditional machine learning based auto-segmentation approaches. The first published deep learning based auto-segmentation studies already showed results in terms of dice similarity coefficient (DSC) of well above 0.8 (Roth et al., 2015; Ben-Cohen, 2016; Hu et al., 2016; Milletari et al., 2016; Zhou, 2016; Litjens, 2017). Recent and more sophisticated DL methods show DSCs in the range over 0.8, with some reported cases exceeding 0.9, depending on the type of the OAR (Cardenas, 2019). Most recently Mlynarski et al. published impressive results in OARs of the brain by combining deep learning with sophisticated post processing methods (Mlynarski et al., 2020).

Although these results are promising, a DSC of 0.8 still leaves a lot of room for errors, especially in larger OARs, that might have a substantial impact on the treatment. More importantly, it is unknown when and where such an error occurs. Consequently, auto-segmentation results require thorough visual inspection by a trained professional, which again requires additional valuable time.

To solve this issue, one can aim to improve automatic segmentation results in terms of geometrical similarity parameters up to the point where they reach perfection (i.e. a dice of 1.0). This is an ambitious task that is pursued by many. Since progress over the last years has been incremental, it is uncertain if this goal can ever be achieved. Instead, in this study we focus on how we can validate auto-segmentation results in such a way that they can predict the quality of the treatment.

The practical standard for validating automatic segmentation is based on geometrical similarity indices of which the DSC and the Hausdorff distance are the most popular. In the case of RT we can think of other parameters that are perhaps more clinically relevant. Current methods for validating auto-segmentation results have been criticized before (Gooding et al., 2018; Maier-Hein et al., 2018; Nikolov, 2018; Vaassen et al., 2020). Gooding et al. (2018) suggest a qualitative measure of experts in the field being able to distinguish auto-segmented contours from manually drawn contours. More recently, Kofler et al. suggested new parameters for loss functions based on quality assessment by experts (Kofler, 2021). Other parameters have been suggested, like added path length (Vaassen et al., 2020) or surface dice (Nikolov, 2018), to determine how valuable contours are for RT, in terms of manual adjustment time. Although the time required to adjust auto-segmented results is important, the most relevant parameter to look at in RT is treatment outcome. Treatment outcome in general is quantified by tumor control and toxicity, both reflected by the dose distribution. Dose distribution is a readily available measure, provided one has access to an RT treatment planning system (TPS).

The correlation between contour variation and dosimetry has, to our knowledge, only been explored by Xian et al. (Xian and Chen, 2020). They studied the effect of systematic geometrical transformation to several c-shaped targets, and concluded that dosimetric indices should be included in the assessment of contour accuracy. However, in their assessment they only provided the plan of the reference contour and determined the dose parameters of the geometrical transformations on the dose distribution. Obviously, systematically moving the alternative target contour away from the reference target will decrease both geometrical similarity and dose coverage. This does not exactly reflect the dose effect of an incorrect contour, since for this matter one needs to calculate a dose distribution for both the reference target and the transformed target, and then determine the differences both distributions have on the reference contour volume.

In this study, we analyzed the correlation between the geometric similarity parameters and the effect a specific change on an OAR contour has on the dose distribution. To do so we focus on radiotherapy for intracranial diseases. A large amount of cancers situated in the brain, such as metastasis, but also primary diseases as gliomas, are being treated with RT. The brain is a location with a large amount of critical structures that are important to spare, and thus accurate delineation is of importance (Scoccianti et al., 2015). Most of the structures are small and can only be distinguished on magnetic resonance imaging (MRI). Contouring is therefore a tedious process. Deep learning methods for intracranial OAR segmentation are under development, but are up to date not yet commercially available.

Fig. 1. Graphical description of the methodology as performed on a single alternative contour set containing 8 OARs. On the computed tomography (CT) and MRI imaging of a clinical case the OAR contours are defined (i.e., reference structures), as well as an alternative structure set defined by a second physician, from an auto segmentation method, or manual manipulation of the reference. The geometric metric (in this case DSC) is determined for the alternative OARs with respect to the reference. The two structure sets are used as input for an RT plan. The beam setup, dose prescription and the optimization criteria are set based on the reference plan. This generates two different dose distributions. The reference structure set is overlaid on the output dose distributions, and the dose volume histograms (DVH) of the reference and alternative plans are determined. The difference in dose between the alternative plan and the reference plan is plotted against their respective DSC.

It is our hypothesis that currently used geometrical indices are not a good predictor of the quality of a segmentation for the purpose of intracranial RT. We analyzed the level of correlation between dosimetry and geometrical metrics used to assess segmentation quality. As geometrical metrics, we have selected a set of 23 commonly used parameters. As geometrical similarity approaches a perfect metric (i.e. a DSC of 1.0), it is expected that dose effects will be minimal. On the other hand, if there is barely geometrical similarity, it is questionable whether this information is clinically relevant at all. Consequently, we want to focus on analyzing contour variations that could present themselves in a clinical situation, regardless of how the contours are obtained. For this purpose, we want to stay away from contour alternatives that are near perfect or, on the other side, are obviously wrong. In terms of DSCs however, the values depend heavily on the respective OAR, mainly influenced by its size. For readability, we focus on one specific parameter throughout this manuscript, the DSC. We specifically choose this metric since it is still the most used parameter and is well interpreted by many professionals in the field. Furthermore, the DSC is a widely used parameter in loss functions in deep learning based auto-segmentation methods. We will come back to the other parameters in the results section.

Additionally, we performed synthetic experiments to find what other characteristics, that are not depicted by these geometric parameters, have an effect on the dose distribution. Since we are focusing on OARs, the goal of RT is to avoid dose as much as possible. The amount of dose an OAR receives is therefore dependent on its location relative to the target, dose constraints and optimization parameters. Furthermore, how the dose will be affected by a change in contour is additionally dependent on the technique of dose delivery and the shape and nature of the specific changes to the contour. Is it an over-segmentation or an under-segmentation? Are errors in the segmentation placing the OAR closer or more distant to the target? Does the size of the OAR have an influence? Are there specific outliers? Consequently, we are investigating characteristics as shape, size, distance to the target and relative location. With these findings, we expect to contribute to a better understanding as to what quality of auto-segmentation is required to obtain clinically acceptable treatments, as well as to foster the implementation of auto-segmentation into the clinics in a safe and secure way.

2. Materials and methods

2.1. Correlation on clinical cases

To assess the correlation between the DSC metric and the dose effect in OARs of the brain, we have constructed RT plans for different sets of contours on a selection of cases from a cohort of glioblastoma multiforme (GBM) patients. Fig. 1 presents a schematic overview of the methodology, which is detailed in the subsections below. From left to right: 1) Selection of clinical imaging data and reference contours. 2) Creation of alternative sets of contours to mimic segmentation and dice metric variability. 3) Calculation of DSC and other geometrical metrics used to assess segmentation. 4) Calculation of dose distributions on contour sets. 5) Assessment of dosimetric differences for the reference and the alternative contour sets. 6) Correlation analysis between dosimetric differences and DSC.

2.1.1. Clinical imaging data

The clinical data for this study was selected from a retrospective database of 100 post-operative GBM cases that have been treated with RT at the Inselspital, University Hospital Bern. All cases contained a planning computed tomography (CT) registered to MRI images and a reference structure set containing OARs as well as the target volumes.

Fig. 2. Selection and acquisition of the study data. From left to right it started with 100 post-operative glioblastoma multiforme cases. The 100 cases were stratified into 4 categories. From each category, 3 cases were selected. For these selected cases, 5 alternative sets of OAR contours were composed. Each of the alternative contour sets contained 8 OARs.

The planning target volume (PTV) was defined according to the ESTRO-ACROP guidelines (Niyazi, 2016). The OARs are contoured according to Scoccianti et al. (2015) and verified by mutual consensus of three experienced radiation oncology experts.

To obtain a representative selection of cases in terms of tumor location, the cohort of GBM cases was divided in 4 categories depending on how demanding a case is for radiotherapy planning in terms of the included OARs (Fig. 2).

Category 1: highly challenging cases, defined as the PTV overlapping with one or more critical OARs with a hard constraint (brainstem or optic tract).

Category 2: defined as the PTV overlapping with one of the hippocampi.

Category 3: those cases where the PTV resides within 20 mm of one or more OARs.

Category 4: the least challenging cases, defined as the PTV more than 20 mm away from any OAR or more than 10 mm in the cranial direction from any OAR. Since planning is performed with co-planar volumetric modulated arc therapy (VMAT) technique perpendicular to the body axis, OARs residing superior of the target are automatically spared.

From each category, three cases were included to complete our study set of 12 cases. No additional analysis is performed on the stratification categories.
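The stratification rule can be written down as a small decision function. The sketch below is only one possible reading of the four categories: the function name, its arguments and the per-OAR tuple layout are hypothetical, the categories are assumed to be checked in order of difficulty, and the cranial exception of Category 4 is assumed to apply per OAR.

```python
def planning_category(overlaps_hard_constraint_oar, overlaps_hippocampus, oar_offsets_mm):
    """Return the planning difficulty category (1 to 4) of one case.

    oar_offsets_mm holds one (distance_mm, cranial_offset_mm) tuple per OAR:
    the shortest PTV-to-OAR distance, and how far the OAR lies superior to
    the PTV (0 if it is not superior). This layout is hypothetical and only
    chosen for illustration.
    """
    if overlaps_hard_constraint_oar:  # brainstem or optic tract
        return 1
    if overlaps_hippocampus:
        return 2
    # Assumption: an OAR counts as "far" when it is more than 20 mm away
    # from the PTV, or more than 10 mm cranial of it.
    all_far = all(dist > 20.0 or cranial > 10.0 for dist, cranial in oar_offsets_mm)
    return 4 if all_far else 3


# Example: one OAR at 15 mm (not cranial), one at 40 mm.
print(planning_category(False, False, [(15.0, 0.0), (40.0, 0.0)]))
```

In the example the first OAR lies within 20 mm and is not cranial of the PTV, so under these assumptions the case falls into Category 3.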

2.1.2. Alternative contours

The reference structure sets comprise 8 selected OARs: the brainstem, the optic chiasm, the optic nerves (left and right), the eyes (left and right) and the hippocampi (left and right). Other smaller and peripherally located OARs such as the cochlea, lenses and lacrimal glands were not included since the impact on the resultant dose distribution is typically limited due to size and location.

Each of the 12 included cases received, next to the 8 reference OAR structures, five sets of alternative OAR contours. Within these alternative contours, we want to have realistic data from different sources that provides sufficient variety in relation to the reference contours. Two radiation oncology physicians manually contoured the OARs, resulting in alternative contours modeling inter-rater variability. Furthermore, for each case an alternative structure set was obtained by a standard version of an in-house developed deep learning based auto-segmentation method based on the U-net architecture (Isensee, 2021). We have specifically chosen a standard version of the auto-segmentation method that did not provide state of the art results, but instead provides us with a wider range of segmentation quality results.

A version of the U-Net (Ronneberger et al., 2015) was adjusted to meet the needs of multi-organ automatic segmentation on multiple MRI sequences. In order to incorporate recent improvements we interleaved batch normalization [33] and a 10%-dropout [75] layer after each convolution layer. The resulting feature maps of the up-sampling layer are then concatenated with the feature maps from the contractive path, which are provided by the skip connections. The ending sequence of the expanding path consists of a 1 × 1 convolution and a softmax layer to get the probabilities for each OAR and the background. For training we used focal loss (gamma = 2) [48] in combination with an ADAM optimizer (betas = (0.99, 0.999)) [38]. The initial learning rate was 10e-3, which reduced to 4 × 10e-4 after 150 epochs, and to 1.6 × 10e-4 after 250 epochs. The model was trained for 300 epochs in total, with a mini-batch size of 20 training examples.
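The two training ingredients named above, the focal loss with gamma = 2 and the stepwise learning rate, can be illustrated in a few lines. This is a minimal sketch in plain Python, not the in-house implementation: the function names are hypothetical, the focal loss is written in its basic per-voxel form without class weighting, and the schedule is expressed relative to the initial learning rate because the reported values correspond to a factor of 0.4 at each of the two steps.

```python
import math


def focal_loss(probabilities, true_class, gamma=2.0, eps=1e-12):
    """Focal loss for one voxel: -(1 - p_t)^gamma * log(p_t).

    probabilities: softmax output over classes (OARs plus background).
    true_class: index of the correct class for this voxel.
    """
    p_t = max(probabilities[true_class], eps)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def learning_rate(epoch, initial_lr, decay=0.4, milestones=(150, 250)):
    """Step schedule: multiply by `decay` once each milestone is reached.

    Assumption: `epoch` counts completed epochs, so the first drop applies
    from epoch 150 onwards.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return initial_lr * decay ** drops


# A confidently correct voxel contributes far less than an uncertain one.
print(focal_loss([0.95, 0.05], true_class=0))
print(focal_loss([0.50, 0.50], true_class=0))

# Relative learning rate at the start, after 150 and after 250 epochs.
print([learning_rate(e, initial_lr=1.0) for e in (0, 150, 250)])
```

The down-weighting of well-classified voxels is the reason focal loss is often chosen for small structures, where background voxels would otherwise dominate the loss.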

Additionally, all 12 cases received two sets of alternative OAR structures by means of controlled manual manipulation of the reference contours. These manual manipulations were designed to further increase the range of geometrical similarity, and to study the patterns of correlations at a low regime of segmentation performance. This data will complement the data of the human raters and the auto-segmentation results in order to obtain a wide distribution of possible alternatives. All structures were contoured in a research environment of the clinical version of Eclipse TPS (Eclipse, version 15.6, Varian, Palo Alto, United States of America). In summary, every case had a set of reference OARs and 5 sets of alternative OAR contours. In total 60 alternative contour sets were created, resulting in 480 alternative OAR contours.

Table 1
Structures and dose prescription.

Structure                   Dose parameter            Dose

Dose prescription
  PTV (reference only)                                60 Gy

Constraint doses
  Brainstem surface         Max dose to 1%            60 Gy
  Brainstem center          Max dose to 1%            54 Gy
  Eye (L + R)               Max dose to 1%            10 Gy
  Chiasm                    Max dose to 1%            55 Gy
  Optic nerve (L + R)       Max dose to 1%            55 Gy
  Hippocampus (L + R)       Dose to 40% of volume     7.3 Gy

Reference only
  Lens (L + R)              Max dose to 1%            10 Gy
  Lacrimal gland (L + R)    Mean dose                 25 Gy
  Cochlea (L + R)           Hard: mean dose           45 Gy
                            Soft: mean dose           32 Gy
  Retina (L + R)            Max dose to 1%            45 Gy
  Pituitary                 Hard: mean dose           45 Gy
                            Soft: mean dose           20 Gy

The structures labeled under reference only do not have alternative versions and are therefore not interchanged during the different dose calculations, since the dosimetric effect due to size and location is typically limited.

2.1.3. Geometrical similarity indices

To determine the DSC of each alternative structure with respect to the reference contour, the structure sets were exported from the TPS in RT-DICOM format. They were converted to NIfTI format in 3D Slicer software (www.slicer.org). With the open-source python software pymia (Jungo et al., 2021), the DSC for each alternative-reference contour pair was determined. Additionally, another set of 22 alternative segmentation parameters was determined using evaluation tools provided by the Visual Concept Extraction Challenge in Radiology (VISCERAL, www.visceral.eu) project. The list is supplemented with current popular measures as the average distance and the normalized surface dice (NSD) (Nikolov, 2018).
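For readers less familiar with these indices, the following sketch shows how the DSC and a Hausdorff distance can be computed for one contour pair. It does not use pymia or the VISCERAL tools; it is a plain Python illustration under the assumption that each contour is given as a set of voxel index tuples, and the function names are hypothetical. The brute-force distance search is only practical for small toy structures.

```python
import math


def dice(reference, alternative):
    """DSC = 2 |A ∩ B| / (|A| + |B|) for two sets of (x, y, z) voxel indices."""
    if not reference and not alternative:
        return 1.0  # convention chosen here: two empty contours agree
    overlap = len(reference & alternative)
    return 2.0 * overlap / (len(reference) + len(alternative))


def hausdorff(reference, alternative, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance in mm, taking the voxel spacing into account."""
    if not reference or not alternative:
        raise ValueError("Hausdorff distance is undefined for an empty contour")

    def to_mm(voxel):
        return tuple(index * size for index, size in zip(voxel, spacing))

    ref_mm = [to_mm(v) for v in reference]
    alt_mm = [to_mm(v) for v in alternative]

    def directed(points_a, points_b):
        # Largest distance from a point in A to its nearest point in B.
        return max(min(math.dist(p, q) for q in points_b) for p in points_a)

    return max(directed(ref_mm, alt_mm), directed(alt_mm, ref_mm))


# Toy example: a 4 x 4 voxel patch and the same patch shifted by one voxel in x.
reference = {(x, y, 0) for x in range(4) for y in range(4)}
alternative = {(x + 1, y, 0) for x in range(4) for y in range(4)}
print(dice(reference, alternative))                               # 0.75
print(hausdorff(reference, alternative, spacing=(0.5, 0.5, 1.0))) # 0.5
```

The toy example also shows why the two indices carry different information: a one-voxel shift costs a quarter of the DSC for this small patch, while the Hausdorff distance reports only the 0.5 mm displacement.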

2.1.4. RT plan calculation

For every case, a reference RT plan was generated based on the reference structures. The Clinical Target Volume (CTV) was defined as the resection cavity and remaining GBM, including peritumoral edema, as per ESTRO-ACROP guidelines (Niyazi, 2016). A 3 mm margin was added to form the PTV. According to clinical standard, the prescription dose for the PTV was set to 60 Gray (Gy) in a conventional scheme (30 × 2.0 Gy). The defined OARs and their respective hard and soft constraint doses can be found in Table 1.

A co-planar VMAT plan was set up, with a double full arc, and 6 megavolt X-ray flattening filter free beams, and optimized with the anisotropic analytical algorithm. The plan was accepted when all constraints were met. The plans were normalized on the PTV so that 100% of the prescribed dose covered 50% of the PTV.

For the alternative structure sets, we wanted to create a new plan while keeping all treatment parameters except the OAR structures the same. To do so we duplicated the reference plan and substituted the reference OARs with the alternative OARs. The beam orientation, prescription, constraints and optimization weights remained unchanged from the reference plan. Thereafter, the plan was re-optimized. This would result in a slightly different dose distribution because of the different orientation of the defined OARs. These plans are also normalized so that 100% of the prescribed dose covered 50% of the PTV.
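The normalization step amounts to a single scaling factor. In the study it is carried out inside the TPS; the sketch below only illustrates the arithmetic, under the assumption that all PTV voxels have equal volume, so that "100% of the prescribed dose covers 50% of the PTV" is equivalent to the median PTV dose being equal to the prescription. The function name and the dose values are made up.

```python
import statistics


def normalization_factor(ptv_doses_gy, prescription_gy=60.0):
    """Factor that scales a plan so the median PTV dose equals the prescription."""
    return prescription_gy / statistics.median(ptv_doses_gy)


# Made-up PTV voxel doses (Gy) of an un-normalized plan.
ptv_doses = [57.0, 58.0, 59.0, 60.0, 61.0]
factor = normalization_factor(ptv_doses)

# The same factor is applied to every voxel of the dose distribution,
# including the OAR voxels, so OAR doses shift with the normalization.
normalized_ptv = [dose * factor for dose in ptv_doses]
print(statistics.median(normalized_ptv))
```

Because both the reference and the alternative plans are normalized in the same way, differences in OAR dose between them are not an artifact of different target coverage at the median.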

2.1.5. Dose parameter analysis

For all constructed RT plans (1 reference, 5 alternatives per case), the dose to the OARs of the reference structure set was analyzed. This reflects the dose the actual organ (i.e., reference) would receive when it is incorrectly contoured (i.e., alternative). The difference in dose between the alternative plan and the reference plan is referred to hereafter as the dose effect or delta dose. We determined the delta dose for both the mean OAR dose and the maximum dose to 1% of the OAR volume (Dmax 1%). These are typical metrics used to determine dose constraints to specific OARs (Emami, 2013).
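The two dose effects can be sketched as follows. The example assumes that the dose of each plan has already been sampled at the voxels of the reference OAR, that all voxels have equal volume, that Dmax 1% is read as the lowest dose within the hottest 1% of the voxels, and that the delta is taken as alternative minus reference. These are assumptions made for illustration, as are the function names and the dose values.

```python
def mean_dose(doses_gy):
    """Mean dose over the voxels of a structure."""
    return sum(doses_gy) / len(doses_gy)


def dmax_1_percent(doses_gy):
    """Dose received by at least the hottest 1% of the voxels."""
    ranked = sorted(doses_gy, reverse=True)
    n_hot = max(1, (len(ranked) + 99) // 100)  # ceil(1% of voxels), at least one
    return ranked[n_hot - 1]


def dose_effect(reference_plan_doses, alternative_plan_doses):
    """Delta dose on the reference OAR voxels: alternative plan minus reference plan."""
    return {
        "delta_mean": mean_dose(alternative_plan_doses) - mean_dose(reference_plan_doses),
        "delta_dmax1": dmax_1_percent(alternative_plan_doses) - dmax_1_percent(reference_plan_doses),
    }


# Made-up doses (Gy) in 200 voxels of a reference OAR under both plans.
reference_plan = [i * 0.1 for i in range(200)]
alternative_plan = [dose + 0.5 for dose in reference_plan]
print(dose_effect(reference_plan, alternative_plan))
```

Both plans are evaluated on the same reference voxels, which mirrors the point made above: the quantity of interest is the dose the actual organ receives, not the dose inside the alternative contour.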

2.1.6. Data analysis and statistics

We analyzed the data in two ways: I) By the nature of how the segmentation variability was established, divided into three groups: inter-rater variability, manual adjustments, and auto-segmentation results. This is to show the variability in contour similarity with respect to the reference for each of these groups. II) Per specific organ type. Since segmentation metrics are influenced by the volume of the segmentation, and inter-rater variability is OAR dependent, results might differ among different sizes of OARs. The specific organ types were divided in five groups: brainstem, optic chiasm, optic nerves, eyes and hippocampi.

The correlation for each of the groups was determined by the Pearson correlation coefficient. Additionally, the correlation with 22 alternative segmentation parameters, listed in Table 3, was determined. The calculations of the metrics are performed with the open-source python software pymia (Jungo et al., 2021) and the open source implementation of the surface DSC (Nikolov, 2018), available from https://github.com/deepmind/surface-distance. All the distance parameters are computed while considering the voxel spacing.
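As a reminder of what the reported coefficient measures, the Pearson correlation between a list of DSC values and the matching list of dose effects can be computed as below. This is a generic textbook implementation in plain Python, not the authors' analysis script, and the example numbers are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented example: one (DSC, delta mean dose in Gy) pair per alternative contour.
dsc_values = [0.55, 0.62, 0.70, 0.81, 0.90]
delta_mean_dose = [0.4, 0.1, 0.9, 0.2, 0.3]
print(pearson(dsc_values, delta_mean_dose))
```

A value near -1 would mean that lower DSC reliably goes with a larger dose effect; values near 0 mean the DSC says little about the dose effect.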

2.2. Possible characteristics predictive for the dose effect

In addition to the clinical data, synthetic experiments were performed to assess the correlation between the effect of alternative OAR contours and (i) the distance with respect to the target, (ii) the size of the OAR, (iii) their relative location with respect to the target and the radiation beams, (iv) their specific shape.

2.2.1. Dice versus distance

A synthetic spherical target and one reference OAR were defined in the center of the brain in the planning CT of one of the included subjects. Based on the reference OAR, 8 alternative contours were constructed with different shapes and sizes. This resulted in a variety of DSC with respect to the reference OAR (Fig. 3). This set of 9 different OARs (reference plus alternatives) was duplicated at 5 different distances from the target starting from 1.5 cm, up to 6.5 cm, with 1.5 cm increments. For each of the 5 resulting distances a reference plan was constructed. The goal of the reference plan was to obtain the lowest possible dose to the reference OAR, without compromising the prescription dose to the target of 60 Gy. For each of the 8 alternative OARs, the reference plan was duplicated while substituting the reference OAR for each of the alternative ones in the dose optimization step, in the same way as described in Section 2.1.4.

The obtained DSC of the alternative OARs with respect to the reference are plotted against the dose effect. The dose effect is determined by the dose difference to the reference OARs between the reference and alternative plans, similar to Section 2.1.5.

2.2.2. Dice versus size

It is well known that the size of an OAR has influence on voxel wise similarity segmentation metrics such as the DSC metric. We wanted to determine if the size of an OAR would correlate with the dose effect given a specific fixed DSC. For this purpose, we synthetically created 7 spherical reference OARs ascending in size from 1.0 cc to 64.4 cc, on the planning CT of an actual subject. All OARs had the same minimum distance to the target. For each of the reference OARs, we produced two alternative OARs, obtained by displacements in two different directions, with a DSC with respect to the reference OAR of respectively 0.55 and 0.58 (see Fig. 4). A reference plan was constructed on each of the reference OARs. The goal of the plan is the same as in Section 2.2.1. This reference plan was duplicated and re-optimized for the alternative OAR contours. The dose difference for the reference OAR between reference and alternative plan was determined for each of the 7 sizes.

Fig. 3. Synthetic experiment to assess the relationship of distance to the target on the dice-dose effect. On the left, axial slice representations of the 8 synthetic variations on the spherical reference contour with their respective dice similarity coefficient. On the right, the reference OAR and the alternatives are located at 5 different distances from the target (PTV, red circle), leading to a total of 45 synthetically generated alternative contours. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Synthetic experiment to assess the influence of the size of the OAR on the dose effect with respect to the DSC metric. On the left we see the 7 reference OARs with different sizes (in green). On the right, an example of a reference OAR is shown accompanied with the respective alternative contours in blue and orange. The target is shown in red. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
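The interplay between OAR size and DSC in the size experiment can be made concrete with a little geometry. The sketch below assumes an idealized case that may differ from the actual synthetic contours: the alternative is an identical sphere that is purely translated. For two equal spheres of radius r whose centres are a distance d apart, the overlap volume is π(4r + d)(2r − d)²/12, which gives DSC = (1 − t)²(1 + t/2) with t = d/(2r). The function names are hypothetical.

```python
import math


def dice_displaced_spheres(radius, shift):
    """DSC of two equal spheres whose centres are `shift` apart."""
    if shift >= 2.0 * radius:
        return 0.0  # no overlap left
    t = shift / (2.0 * radius)
    return (1.0 - t) ** 2 * (1.0 + t / 2.0)


def shift_for_dice(radius, target_dice):
    """Shift that yields `target_dice`, found by bisection (DSC falls as shift grows)."""
    low, high = 0.0, 2.0 * radius
    for _ in range(60):
        mid = (low + high) / 2.0
        if dice_displaced_spheres(radius, mid) > target_dice:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def sphere_radius_cm(volume_cc):
    """Radius of a sphere with the given volume (1 cc = 1 cm^3)."""
    return (3.0 * volume_cc / (4.0 * math.pi)) ** (1.0 / 3.0)


# Smallest and largest reference OAR volumes of the size experiment.
for volume_cc in (1.0, 64.4):
    r = sphere_radius_cm(volume_cc)
    print(volume_cc, round(r, 2), round(shift_for_dice(r, 0.55), 2))
```

Under this idealization the DSC depends only on the ratio of shift to radius, so the same DSC corresponds to a proportionally larger absolute displacement for a larger OAR. This is one way to see why a fixed DSC can hide very different geometric errors in millimetres.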

2.2.3. Dice versus location

To determine the effect a specific location might have on the dose effect, another synthetic experiment was designed. A spherical target and a single OAR reference contour, as well as an alternative OAR, were defined on the planning CT of an actual subject. The alternative OAR had a DSC of 0.46 with respect to the reference OAR. The two OARs were duplicated to different locations with respect to the target, while keeping the same distance from the target (Fig. 5). The locations are posterior, medial, lateral and superior of the target. A reference plan was constructed for each of the reference OARs. This plan was duplicated and re-optimized on the alternative OAR contours, similar to Section 2.2.1. The dose difference for the reference OAR between reference and alternative plan was determined for each location.

Fig. 5. Assessing the relationship of location relative to the target, on the dice-dose effect. Transversal, frontal, sagittal and 3D views of a human subject’s head are depicted. The red circle represents the PTV. The pink circle is the reference OAR and the blue structure is the alternative OAR structure. The pair of OARs is duplicated in locations A, B and C. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 6. Representation of the dice versus shape synthetic experiment. The red circle represents the PTV. The pink circle represents the reference OAR. At the same location, 4 alternative OARs with similar DSC to the reference were constructed with different shapes and sizes than the reference OAR. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

2.2.4. Dice versus shape

The fourth synthetic experiment consisted of a spherical target and one reference OAR. Four alternative OARs were constructed with different shape or size, but with the same arbitrary DSC of 0.66 with respect to the reference OAR (Fig. 6). A reference plan was calculated and optimized based on the target and the reference OAR. This plan was duplicated and re-optimized while substituting the reference OAR for any of the alternative OARs. The differences in dose to the reference OAR and target, between the alternative and the reference plan, are compared to assess the correlation of different shapes of alternative OAR contours on the dose distribution.
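That contours of quite different size can share the same DSC is easy to verify for a simple idealized case. The sketch below assumes concentric spheres, which is not necessarily how the four alternative OARs of this experiment were built: one alternative is a shrunken copy of the reference (under-segmentation), the other an enlarged copy (over-segmentation). With q the ratio of the smaller to the larger volume, DSC = 2q / (1 + q), so q = DSC / (2 − DSC). The function names are hypothetical.

```python
def dice_concentric_spheres(reference_radius, alternative_radius):
    """DSC of two concentric spheres: the smaller one is the overlap."""
    smaller = min(reference_radius, alternative_radius) ** 3
    larger = max(reference_radius, alternative_radius) ** 3
    return 2.0 * smaller / (smaller + larger)


def radius_scale_for_dice(target_dice, enlarge):
    """Radius scaling of the reference sphere that produces `target_dice`."""
    volume_ratio = target_dice / (2.0 - target_dice)  # smaller / larger volume
    scale = volume_ratio ** (1.0 / 3.0)
    return 1.0 / scale if enlarge else scale


# Two alternatives with the DSC used in the shape experiment (0.66).
shrunk = radius_scale_for_dice(0.66, enlarge=False)
grown = radius_scale_for_dice(0.66, enlarge=True)
print(round(shrunk, 3), round(dice_concentric_spheres(1.0, shrunk), 2))
print(round(grown, 3), round(dice_concentric_spheres(1.0, grown), 2))
```

The shrunken contour leaves part of the organ unprotected during optimization, while the enlarged contour asks the optimizer to spare extra tissue. The DSC alone cannot tell these two situations apart, which is the motivation for looking at shape and direction separately.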

3. Results

3.1. Segmentation dose-effect correlation on clinical cases

In total, we have constructed 60 alternative structure sets for the 12 reference cases. Including the 12 reference plans we calculated 72 plans, and created 480 single pairs of DSC and their corresponding delta mean dose and delta Dmax 1%. Depending on the method by which the alternative contours were produced, results are presented in three categories: (i) human-rater variability, (ii) explicit manual manipulation and (iii) auto-segmentation results. Additionally, an organ specific analysis is performed.

3.1.1. Human rater variability

For the human rater variability, the median DSC was 0.83 (inter-quartile range [IQR]: 0.13). The median delta mean dose to the reference OAR was 0.25 (IQR: 0.80) Gy, and the median delta Dmax 1% was 0.4 (IQR: 1.1) Gy. The Pearson's correlation coefficient for the mean dose difference and the maximum dose difference with the DSC was −0.11 and −0.08, respectively. The delta mean dose and the delta Dmax 1% are plotted against their corresponding DSC in Fig. 7A and D.

3.1.2. Explicit manual manipulations

The manual manipulations resulted in a median DSC of 0.68 (IQR: 0.22). The median delta mean dose was 0.30 (IQR: 0.8) Gy and the median delta Dmax 1% was 0.50 (IQR: 1.22) Gy. The Pearson's correlation coefficient for the delta mean dose and the delta Dmax 1% with the DSC was −0.17 and −0.13, respectively. The delta mean dose and the delta Dmax 1% are plotted against their corresponding DSC, and shown in Fig. 7B and E.

3.1.3. Auto-segmentation results

The auto-segmentation results had a median DSC of 0.70 (IQR: 0.33) with respect to the reference contours. The median delta mean dose was 0.40 (IQR: 1.6) Gy and the median delta Dmax 1% was 0.75 (IQR: 1.95) Gy. The Pearson's correlation coefficient for the delta mean dose and the delta Dmax 1% with the DSC was −0.31 and −0.13, respectively. The delta mean dose and the delta Dmax 1% are plotted against their corresponding DSC, and shown in Fig. 7C and F.

3.1.4. Segmentation dose-effect per OAR type

The segmentation results differ slightly over the different OARs. The results are summarized in Table 2 and displayed as scatter plots in Fig. 8. The similarity for the chiasm and optic nerves was relatively low, with a median DSC of 0.67 and 0.66, respectively. The brainstem and the eyes showed relatively better similarity, with a median DSC of 0.85 and 0.84, respectively (Table 2). The dose effects among the different OARs did not show much difference. The highest observed median mean dose difference was 0.70 Gy for the optic chiasm, and the highest median Dmax 1% difference was 0.75 Gy for the hippocampi. The Pearson correlation coefficient is very low for the smaller OARs such as the optic nerves and optic chiasm. However, it can be considerably higher for larger OARs such as the brainstem and the eyes. Interestingly, the Pearson correlation for the brainstem is very low for the delta mean dose, but relatively high for the delta max dose (Table 2).

3.1.5. Correlation of all alternative contours combined

The correlation of the DSC and the dose effect of the three categories combined, as well as for 22 additional segmentation parameters, can be found in Table 3.
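As a side note on the overlap measures among these parameters: for two binary masks A and B, the Dice and Jaccard coefficients are tied by a fixed algebraic relation. The symbols D (Dice) and J (Jaccard) are introduced only for this note.

```latex
D = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad
J = \frac{|A \cap B|}{|A \cup B|}
\quad\Longrightarrow\quad
J = \frac{D}{2 - D}, \qquad D = \frac{2J}{1 + J}
```

The mapping is monotone but not linear, so both metrics rank contours identically, yet their Pearson correlation coefficients with a dose effect need not coincide.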

3.2. Possible characteristics predictive for the dose effect

3.2.1. Dice versus distance

A total of 40 plans were calculated on the eight alternative OARs at five different distances from the target (Fig. 3). The mean


R. Poel, E. Rüfenacht, E. Hermann et al. Medical Image Analysis 73 (2021) 102161

Fig. 7. Scatter plots of the DSC versus the dose effect. The dose effects of the three different natures of alternative contours are plotted against their respective DSC. From left to right, the human-rater alternatives, the manually manipulated alternatives and the auto-segmented alternatives. The mean dose results are located in the upper plots while the Dmax 1% results are shown below.

Table 2

Results of the organ-specific analysis of the correlation of the DSC and the dose effect. DSC and dose differences are given as median (IQR).

| OAR | Mean volume (cc) | DSC | Delta mean dose (Gy) | Pearson correlation - Mean dose and DSC | Delta max dose (Gy) | Pearson correlation - Max dose and DSC |
|---|---|---|---|---|---|---|
| Brainstem | 26.5 | 0.849 (0.097) | 0.2 (0.8) | −0.013 | 0.5 (1.3) | −0.387 |
| Eyes | 8.38 | 0.843 (0.145) | 0.2 (0.6) | −0.396 | 0.3 (0.7) | 0.312 |
| Optic Chiasm | 0.24 | 0.674 (0.207) | 0.7 (1.5) | −0.04 | 0.4 (1.8) | −0.072 |
| Hippocampi | 1.85 | 0.733 (0.268) | 0.3 (0.8) | −0.289 | 0.75 (1.3) | −0.147 |
| Optic Nerves | 0.36 | 0.659 (0.245) | 0.4 (1.0) | −0.063 | 0.6 (1.5) | −0.006 |

Table 3

Pearson correlation coefficients for additional segmentation parameters. The metrics used are a collection of segmentation metrics compiled by the VISCERAL evaluation software (www-visceral.eu), complemented by some newer popular metrics such as the average distance and NSD (Nikolov, 2018).

| Correlation coefficient with: | Mean dose | Maximum dose |
|---|---|---|
| Similarity measures | | |
| Dice | −0.112 | −0.137 |
| Jaccard | −0.134 | −0.152 |
| Area under curve | −0.117 | −0.140 |
| Cohen kappa | −0.134 | −0.164 |
| Rand index | 0.015 | −0.102 |
| Adjusted rand index | −0.134 | −0.164 |
| Interclass correlation | −0.134 | −0.164 |
| Volumetric Similarity Coefficient | −0.055 | −0.035 |
| Mutual information | −0.102 | 0.054 |
| Normalized Surface Dice | 0.075 | 0.010 |
| Distance measures | | |
| Hausdorff distance | 0.186 | 0.184 |
| Average HDD | 0.160 | 0.175 |
| Average Distance | −0.011 | 0.080 |
| Mahalanobis Distance | 0.083 | 0.168 |
| Variation of info | −0.031 | 0.091 |
| Global consistency error | −0.023 | 0.097 |
| Probabilistic distance | 0.103 | 0.202 |
| Classic Measures | | |
| Sensitivity | −0.117 | −0.140 |
| Specificity | 0.071 | −0.027 |
| Precision | −0.108 | −0.141 |
| F-Measure | −0.134 | −0.164 |
| Accuracy | 0.015 | −0.102 |
| Fallout | −0.071 | 0.027 |

dose and Dmax 1% received by the reference OARs are plotted against the DSC, for each distance, in Fig. 9. From Fig. 9A and C we observed that the dose effect to the OAR does not seem to be directly influenced by the distance between target and OAR. The dose versus Dice plots do not show more variation as the distance to the target decreases. The absolute dose differences (Fig. 9B and D) show that proximity to the target does not necessarily lead to a larger dose effect. Where we expected to see increasing dose effects with decreasing distance to the target, we actually see that specific alternative contours, characterized by their DSC on the x-axis, show a lot of dose variation (indicated by the asterisks in Fig. 9B). The other alternative contours, on the other hand, show almost no dose variation at all, regardless of the distance to the target.

Fig. 8. Scatter plots of the DSC versus the dose effect for the 5 different organ types. The mean dose (blue dots) and the max dose (orange dots) effects, in Gy, are plotted against their respective DSC on the x-axis. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.2.2. Dice versus size

The different reference OAR sizes and their alternative contours, with DSC values of 0.55 and 0.58 respectively, show different results in the mean dose effect and the Dmax 1% dose effect. When the size of the OAR increases, the maximum dose increases and the mean dose decreases. This is a logical consequence, since a larger OAR leaves less room for the dose to avoid the OAR near the target, while at the same time the volume receiving less dose grows with the increased size of the OAR. The dose effect seems to follow a different trend: it increases with increasing size of the OAR, but seems to stabilize and slowly decrease once a specific size is reached (Fig. 10).

3.2.3. Dice versus location

The mean dose and the Dmax 1% to the reference OARs are shown in Fig. 11A and B. The difference in dose due to planning on the alternative OAR differs with its location relative to the target. These data show that one single specific contour deviation of an OAR can lead to an increase in dose, a decrease in dose, or no change in dose at all, depending on its location relative to the target. The same effect can also be seen for the coverage of the PTV (Fig. 11C and D).

3.2.4. Dice versus shape

The five plans, optimized for the 5 different OARs, have been analyzed. In Fig. 12, the mean dose and Dmax 1% to the reference OAR are depicted for each of the plans. Despite having the same DSC with respect to the reference OAR, the mean dose to the reference OAR can vary up to 7.7 Gy between different alternatives. The largest difference in Dmax 1% among the alternative plans was 11.2 Gy. The target coverage is stable among all plans (Fig. 12, center).

4. Discussion

This study shows the correlation between current segmentation parameters and dosimetric effects in a selection of GBM cases treated with VMAT RT. It was our hypothesis that geometric similarity might not be a good method to validate, or qualify, contours for the purpose of radiotherapy. The same question has previously been investigated by other authors, but with a slightly different motivation. Gooding et al. used an adapted Turing test for the clinical validation of auto-segmented contours (Gooding et al., 2018). This approach was motivated by the benchmark trap, which is created by comparing results to a ground truth that does not actually exist. Vaassen et al. also proposed a different contouring validation scheme by claiming that correction time is clinically more important than geometrical similarity (Vaassen et al., 2020). This resulted in a new parameter that is better able to predict the amount of manual adjustment time. Although manual adjustment time is clinically relevant, it assumes that all contours require correction. However, our data suggest that many OAR contours do not need correction at all.

In this study, we looked at contour validation from the perspective of the clinical end goal of radiotherapy. Hence, we looked at the dosimetric effects, which are directly related to the treatment outcome. We asked ourselves the questions: How incorrect does a contour need to be for it to start influencing the dose? Are there parameters able to predict the dose effect? In these regards, we assessed the correlation between the geometrical similarity and the dose effect in OARs of the brain, and found a low correlation. This held not only for the DSC metric but also for other well-known segmentation assessment metrics, as well as recently introduced improved metrics. As expected, some amount of correlation was found. However, worsening geometrical similarity correlated only weakly with any particular dose effect. In conclusion, the predictive value of current segmentation parameters for the corresponding dose effect is inadequate for segmentation tasks in brain radiation therapy planning. It cannot be determined whether a specific contour would be clinically unacceptable based on the analyzed segmentation metrics.

Fig. 9. Represented are the doses to the reference OARs at different distances as shown in Fig. 3. The dots represent the plan based on the specific alternative OAR with the corresponding DSC on the x-axis. The upper plots show the results for the mean dose to the OAR (A) and the absolute difference with respect to the reference plan (B). The lower plots show the Dmax 1% of the OAR (C) and the absolute difference in Dmax 1% (D). The asterisks in B indicate cases that show a lot of dose effect variation among the different distances.

Fig. 10. The influence of the size of an OAR on the dosimetric effect for two fixed alternative contours with a respective DSC of 0.55 (alternative A) and 0.58 (alternative B). The dose difference of the reference plan and the plan optimized on the specific alternative is plotted against the size of the volume in cubic centimeters (cc). The absolute dose difference is given in Gy.

These results differ from the conclusions of Xian et al. (Xian and Chen, 2020), who also looked at the correlation of contour variation and dose. We see some significant differences in the experimental design of our study and theirs. They specifically looked at the correlation of geometrical similarity and dosimetric indices to target structures, while we focused on OARs. Even though they concluded that geometrical similarity is not sufficient for clinical contour validation, they showed strong correlations in their results. A good reason for this could be the fact that they analyzed targets. The target is the structure at which all the dose is directed, and it has a steep dose fall-off. Consequently, systematically transforming this target contour over the existing dose distribution will lead to a correlating effect.

Fig. 11. Influence of location on the dose effect to a specific alteration in OAR contour. Bar graphs represent the dose effect to the OAR (A, B) and the PTV (C, D) for the reference plans (pink) and the alternative plans (blue), for the 4 different locations shown in Fig. 5. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 12. Influence of shape and volume of an alternative OAR contour on the dose effect. The bar plots represent the dose received by the reference OAR (pink, in Fig. 5) for the reference plan and the 4 alternative plans. The colors and the shape above the bars correlate with the shapes and colors from Fig. 5. The middle bar graph represents the dose coverage (D95%) to the PTV.

This method was also used by Beasley et al. when they looked at the correlation of DSC between “ground truth” and auto-segmented parotid and larynx contours (Beasley et al., 2016). They did not re-optimize the plans based on the alternative contour but rather overlaid both “ground truth” and alternative contour on the existing dose distribution to determine the dose effect. In line with our results, they found a weak correlation between segmentation metrics and dose effect that was OAR dependent. However, it should be mentioned that they only had 10 data pairs per OAR to determine this correlation.

In this study, we looked at a large number of alternative OAR contours in the brain, where the target area varies in location. Many more factors are involved in determining the dose effect to a specific OAR, increasing the complexity of predicting which factors will have an effect and to what extent. We investigated a few characteristics that have an influence on the dose effect to OARs in the brain. Notably, we found that the distance from a target does not directly influence the dose effect; however, change in a specific direction does seem to have a more prominent effect. In addition, the relative location of the OAR and its contour variation with respect to the target could be of large influence. Furthermore, we noticed that the size of a specific OAR could have an influence on the dose effect, and that the dosimetric parameters depend on the size as well. That is, the difference between mean and max dose is small for small OARs but can be large for larger OARs. Nevertheless, size is a confounding factor in volume-based similarity metrics. Variability of these metrics can differ largely among different types of OARs. Fig. 8 shows that this variability is also present in our test data. The variability of smaller longitudinal structures such as the optic nerves and chiasm is larger than that of larger, more spherical structures such as the eyes and the brainstem.

Although we cannot draw firm conclusions because of the synthetic nature of the experiments, they do show the complexity of how dose is affected in OARs. Moreover, they show how many parameters and factors are involved in determining the final dose distribution. Since many critical OARs are in close proximity within the brain, they can also influence the dose to each other (i.e. a change in contour to OAR A could lead to a dose effect in OAR B). This is something we did not account for in the current study and necessitates more investigation.

As mentioned, the dose effect is also dependent on how the dose is delivered and how the optimization is performed. In this study we worked with a clinical protocol delivering a co-planar VMAT technique using a dose prescription where the constraint doses to the OARs were prioritized. Different delivery techniques and different optimization approaches will therefore lead to different dosimetric outcomes. Although we used stratification to have a diverse distribution of cases, it has to be mentioned that dosimetry is very case-specific and many specific situations are not covered by our study data. The results from this study are therefore only valid for this particular type of RT delivery to the brain region. On the other hand, the general rationale and experimental setup to investigate whether geometrical similarity metrics are a good predictor of RT quality could be valid for different types of RT in different regions, and is worth investigating.

As additional follow-up, we believe it is important to find characteristics that do reflect treatment quality, in other words, to find good predictor parameters of the dose effect. A parameter like this would be very helpful for clinical validation of contours that are derived from manual contouring or any type of auto-contouring, as long as there is a reference to compare against. Additionally, such a parameter would be very useful as part of the cost function for designing and optimizing deep learning based auto-segmentation methods (Ma et al., 2021). Kofler et al. proposed incorporating qualitative measures into the loss function of a tumor segmentation method (Kofler, 2021). This can lead to several improvements. First, validating a contouring system on a robust measure of treatment quality is expected to improve the clinical implementation of such tools. Secondly, if one is able to determine that changes to dose effects are negligible despite geometrical differences, one can establish a more clinically oriented performance objective for an auto-segmentation method.

For clinical RT it could mean that we do not have to visually inspect and manually adjust all OAR contours. If we can predict that segmented outcomes do not have a dose effect, we can skip the inspection and correction part for these cases. Another scenario could be to predict which specific contours have an effect on the dose distribution. In this case, the visual inspection and manual correction step, which is often required when using auto-segmentation, could be made significantly more efficient.

Looking at the results from our data, we observed that a large number of alternative contours do not lead to a significant dose effect (Figs. 7 and 8). However, the question is whether this is also clinically insignificant. This is not an easy question to answer. In general, any increase in dose to an OAR is undesired. However, due to the optimization process in RT, an increase in dose to a specific region often results in a decrease somewhere else. This can be beneficial if this region is a critical organ as well; however, it will be detrimental if it comes at the expense of the target coverage. Therefore, it is difficult to say that a certain increase to a specific organ is affecting the overall treatment quality. Furthermore, an absolute increase in dose is difficult to quantify. At what increase, either in absolute or relative numbers, is a change significant? Should it be absolute dose or relative dose, or should it be relative to its specific dose constraint? Besides, it is important which parameter, mean dose or Dmax 1%, is used. For instance, if one looks at the arbitrary threshold of 2.0 Gy absolute dose effect, from the 960 parameters analyzed in this study, 79 exceeded this threshold. Of these 79, in 46 the dose increased, while in the other 33 the dose decreased.
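The threshold count described above amounts to a simple tally over signed dose differences. The sketch below assumes a strict "greater than" comparison on the absolute value; the function name and the six example values are hypothetical and are not study data.

```python
import numpy as np

def summarize_exceedances(signed_dose_effects_gy, threshold_gy=2.0):
    """Count dose effects whose magnitude exceeds a threshold, split by sign."""
    deltas = np.asarray(signed_dose_effects_gy, dtype=float)
    exceeds = np.abs(deltas) > threshold_gy  # strict comparison (assumption)
    return {
        "total": int(deltas.size),
        "exceeding": int(exceeds.sum()),
        "increase": int((exceeds & (deltas > 0)).sum()),
        "decrease": int((exceeds & (deltas < 0)).sum()),
    }

# Made-up signed differences in Gy (alternative minus reference plan).
summary = summarize_exceedances([0.1, -2.5, 3.0, 1.9, -0.4, 2.0])
```

Keeping the sign separate from the magnitude makes it visible that a contour deviation can lower the OAR dose as well as raise it, which matters when deciding whether an exceedance is clinically harmful.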

A solution for this problem might be found in normal tissue complication probability models (Yorke, 2001). Provided that valid models are available for the specific OARs in the brain, one is able to determine the trade-off between sub-optimal contours and the increase in chance of developing a specific complication.
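To make the idea concrete, the sketch below uses the Lyman-Kutcher-Burman (LKB) formulation, one widely used NTCP model. Choosing LKB is an assumption made for illustration and is not stated in the text; the parameter values (`td50`, `m`, `n`) in the example are placeholders and not clinical values for any brain OAR.

```python
import math

def generalized_eud(doses_gy, volumes, n):
    """Generalized equivalent uniform dose of a differential DVH."""
    total = float(sum(volumes))
    if total <= 0:
        raise ValueError("volumes must sum to a positive value")
    return sum((v / total) * d ** (1.0 / n)
               for d, v in zip(doses_gy, volumes)) ** n

def lkb_ntcp(doses_gy, volumes, td50, m, n):
    """LKB NTCP: standard normal CDF of (gEUD - TD50) / (m * TD50)."""
    t = (generalized_eud(doses_gy, volumes, n) - td50) / (m * td50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# Placeholder parameters and a two-bin DVH, for illustration only.
params = {"td50": 50.0, "m": 0.1, "n": 0.5}
ntcp_at_td50 = lkb_ntcp([50.0, 50.0], [1.0, 1.0], **params)
ntcp_higher = lkb_ntcp([60.0, 60.0], [1.0, 1.0], **params)
```

Evaluating such a model on the reference plan and on the plan optimized with an alternative contour would express a contour deviation as a change in complication probability rather than as a dose difference in Gy.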

Even though our data included a wide variety of geometrical similarity values, i.e. an average DSC of 0.71 ± 0.19, the dose effect in the large majority of cases proved to be limited. This information is indeed encouraging for exploring new approaches to improve and implement auto-segmentation methods.

In conclusion, currently used segmentation assessment parameters, which are mainly based on geometrical similarity, are not well correlated with dosimetric changes on OARs in the brain. Our results also show that in the brain the majority of imperfect contours, whether resulting from manual segmentation, auto-segmentation or deliberate manipulations, do not lead to clinically relevant dose changes. Other characteristics, such as the relative distance and orientation to the target and the shape and nature of the contour variation, seem to influence which specific contour variations do lead to dose changes. These results suggest that the current evaluation metrics for medical image segmentation in radiation therapy, as well as the training of deep learning systems employing such metrics, need to be revisited towards clinically oriented metrics that more accurately reflect how segmentation quality affects dosimetry and related tumor control and toxicity.

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Stefan Scheib is a full time employee of Varian Medical Systems, Imaging Laboratory GmbH, Dättwil, Switzerland. The other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
