M -e st im a to rs

(1)

142

1 0 M u lt iv a ria te S c a tt e r M a tr ic e s

10.1

M o d e ls

aThemultivariateNormalDistributionDensity

f y;µ,|Σ =c·exp − 12 (Y−µ) T|Σ −1(Y−µ)

1/c= 1

(2π) m/2det D|Σ E1/2

Moreinsightisprovidedbythefollowingderivation.

(2)

14310.1

bNormaldistributionasatransformationfamily

Z∼Nmh0,Ii:centered,sphericallydistributedobservations=mrandomvariablesZ (j),standardnormal,independent.

Consider,foranyvectorµandregularmatrixB,

X=µ+BZ

=linear(oraffine)transformationofZ.CallthedistributionofXF µ,B .Transformationsgenerateafamilyofdistributions.

(3)

144

cGeneral:Useanygroupoftransformationsonany“starting”distribution.

dExpectationandcov.matrixforlineartransformations,ifEhZi=0andvarhZi=I,

EhXi=µ,varhXi=BB T

(4)

145

eIdentifiabilityProblem:Orthogonaltransformations

X=µ+QZ,QQ T=IleavethedistributionofZunchanged(=definitionof“sphericallysymmetric”distr.)

−→ParameterBisnotidentifyable!

Covariancematrix:varhXi=BB T=:|ΣisasuitableparameterinsteadofB.(different|Σ−→differentdistributions)

(5)

14610.1 fEllipticaldistributionsLetF0beanysphericallysymmetricdistribution.Z∼F0.

F µ,|Σ =distributionofX=µ+BZ,

BB T=|Σ,|Σregular.Familyiscalledthefamilyofellipticaldistr.generatedbyF0.

|Σscattermatrix,notcovarianceingeneral.–Covariancemaynotexist,–varhZ∼F0imaybeσ 20 Iwithσ06=1.

(6)

147

gDensitiesDensityofF0mustbeafunctionofthe(squared)“radius”,

f0hzi= ef0 kzk 2 .Fornormal: ef0 kzk 2 =c·exp −kzk 2/2

Generalµ,|Σ:LetBsuchthatBB T=|Σ.

z=B −1(x−µ)−→JacobiandethBi=det |Σ 1/2

f x;µ,|Σ =det |Σ −1/2f0hzi=det |Σ −1/2ef0 kzk 2

=det |Σ −1/2ef0 (x−µ) T|Σ −1(x−µ)

(7)

14810.1 hExamplesofellipticaldistributions:Multivariatetdistr.=distr.ofB b −1bµforXi∼Nmh0,Ii

Not“natural”forobservations!IfZ∼F0,thecomponentsarenotindependent.IndependentX (j)cannotbemodelledbyanyellipticalfamilyexceptfortheNormal!

(8)

14910.1

iSizeandShape(Log-)Sizeofthecovariancematrix:

τ:=log det |Σ = 1m Pk loghλkiλk=eigenvaluesof|Σ.

Shape:“standarddeviation”ofthelogeigenvalues:

η 2= Pk (loghλki−τ) 2.

Bothsize&shapeareinvariantunderorthogonaltransf.

(9)

150

10.2

E q u iv a ria n t E st im a to rs

aEstimatorsbµand b|Σshould“respect”thetransformationnatureofthefamily:IfYi=a+CXi,then

bµhY1,...,Yni=a+CbµhX1,...,Xnib|ΣhY1,...,Yni=C b|ΣhX1,...,XniC Tastheserelationsholdifbµand b|Σarereplacedbytheparametersµand|Σ.

Estimatorsforwhichtheserelationsholdarecalledequivariant(withrespecttoaffinetransf.s,whicharethegeneratingfa-mily)

(10)

15110.2 bOtherway’round:GivenbµhX1,...,Xniand b|ΣhX1,...,XniLetB bbea“squareroot”of b|Σ,B bB b T= b|Σ.Then,

bµ D...,B b −1(Xi−bµ),... E=0b|Σ D...,B b −1(Xi−bµ),... E=I

−→equationdeterminingbµand bB.bµand bB bB T= b|Σusuallyuniquelydeterminedbytheseeqns. Thus,itissufficienttodefinewhenbµ=0and b|Σ=I.EverysuchdefinitionyieldspotentiallysuitableestimatingfunctionalsµhFiand|ΣhFi.

(11)

152

10.3

M -e st im a to rs

aMaximumlikelihoodforellipticaldistributionsWrite ef0hkzk 2i=:exph−ρhkzk 2ii.(Noticethe 2intheargument!)

ℓℓ x;µ,|Σ =log f x;µ,|Σ

=−ρ (x−µ) T|Σ −1(x−µ) − 12 logdet |Σ

∂ℓℓ∂µ =ρ ′ kzk 2 2|Σ −1(x−µ)=ρ ′ kzk 2 2B −1z

∂ℓℓ∂|Σ =ρ ′ kzk 2 |Σ −1(x−µ)(x−µ) T|Σ −1−|Σ −1

=B −Tρ ′ kzk 2 zz T−I B −1

(12)

153

Forthesecondequation,wehaveusedtheformulas

∂(logdethAi)∂A =A −T

u=x TV −1x⇒ ∂u∂V =−V −1xx TV −1

Thus,maximumlikelihoodsolves

ave ρ ′ kzik 2 zi =0 ave ρ ′ kzik 2 ziz Ti =I

Here,ρ ′isaweightingfunction,notaψfunction(becauseofthe 2intheargumentofρ).

(13)

15410.3

bM-estimators

avei Dw (µ) kzik 2 zi E=0

avei Dw (η) kzik 2 ziz Ti E=avei Dw (δ) kzik 2

I E

NotethatforMLEs,w (µ)=w (η)andw (δ)hui=1.Asinthelocation-scalecase,theM-estimatorsaremoregeneralthanMLEforlong-taileddistributions.

w (δ)isnotintroducedintheusualliterature(Maronna+...).

(14)

155

Alternativeform:Weightedclassicalestimatorswithimplicitelydefinedweights

bµ= avei w (µ) D 2i Xi

avei w (µ) D 2i

b|Σ= avei w (η) D 2i (Xi−bµ)(Xi−bµ) T

avei w (δ) D 2i

D 2i =(Xi−bµ) Tb|Σ −1(Xi−bµ)

Weightedclassicalestimatorscanonlybeobtainedwhenallo-wingforaw (δ),andchoosingw (δ)=w (η)(=w (µ)).

Computation:Iterativelyreweightedclassicalestimates.

(15)

156

cInfluenceFunctionAbbreviateIFhzi=IF Dz;[bµ, b|Σ],Nmh0,Ii E

IF (µ)hzi=a (µ)w (µ)hkzk 2izIF (|Σ)hzi=a (η)w (η)hkzk 2izz T−ew (δ)hkzk 2iIwhereew (δ)dependsonw (δ)andw (η).

Generalcase:IF (µ) Dx;[bµ, b|Σ],Nm µ,|Σ

E

=BIF (µ)hzi IF (|Σ) Dx;[bµ, b|Σ],Nm µ,|Σ

E

=BIF (|Σ)hziB T

Forallequivariantestimators,theIFmusthavethisform.(notonlyM-estimators)

(16)

157 Calculationofa (µ),a (η),a (τ):Weneedintegralsoftheform

D= Zahuibhui Td eF0hui

whereahkzki=ψhz;0,Ii,bhkzki=shz,0,IiFortheFisherInformationandtheas.var.ofanequivariantest.thesametypeofintegralisneededwitha=b=sanda=b=ψ,respectively.

Theseintegralsarecharacterizedbyd (µ),d (η),d (τ),where

d (µ)= Zum ω (µ)a huiω (µ)b huid eF0hui d (η)=(1+2/m) −1 Zum 2ω (µ)a huiω (µ)b huid eF0hui

d (τ)= 12m Zψ (τ)a huiψ (τ)b huid eF0hui

(17)

158

Proof:SeeHampeletal.

D=“d-typematrix”=specialstructure!Thisshowsthatψ (τ)hui=u·ω (η)hui−m·ω (δ)huiisthefunctionthatcharacterizestheM-estimatorof|Σtogetherwithω (η)—notω (δ)!

(18)

15910.3 dSensitivitiesMaximumofthenormoftheIF.Itisnaturaltofocusonthestandardizedsensitivities.Standardizedsensitivities−→invariantw.r.t.affinetransf.=independentofµ,|Σ.

−→onlyneedtoconsiderIFatF0.Thenormsplitsinto2parts,kIFk 2=kIF (µ)k 2+kIF (|Σ)k 2

(19)

160 Forthe|Σterm,wegetafurthersplit:Letψ (τ)hui:=u·w (η)hui−m·w (δ)hui

kIF (|Σ)hzik 2:= X

j,k IF (|Σ)hzi 2

j,k

= a (η)w (η)hkzk 2ikzk 2 2+ a (τ)ψ (τ)hkzk 2i 2

=:kIF (η)hzik 2+|IF (τ)hzi| 2

ψ (τ)&w (η)connectedtosizeandshapeparameters,see10.1.i.

Naturalchoicew (δ)=w (η)−→ kIF (|Σ)hzik 2=w (η) kzk 2 (a (η)+a (τ))kzk 2−a (µ)m allowsforeasydefinitionofredescendingestimatorswhui→0foru→∞(cf.“weightedscale”7.1.e)

(20)

16110.3

eOptimalrobustestimationBoundedinfluencefunction,butasefficientforcentralmodel(normaldistribution)aspossible!Bound3“parts”separately,

kIF (µ)k≤γµ,kIF (η)k≤γη,|IF (τ)|≤γτ. Result:w (µ)hui=min D1, pbµ/u E,w (η)hui=minh1,bη/ui,

ψ (τ)hui=hbτ hu−mβi(Huberfunction)or

w (δ)hui=uw (η)hui−(u−mβ)minh1,bτ/(u−mβ)i,

βasuitableconstantforFisherconsistency,dependingonbηandbτ.

(21)

162

10.4

B re a k d o w n

aBreakdownofcovariancematrixestimatorsNobreakdownof b|Σmeanseigenvalues<∞and>γh|Σi>0(boundedawayfrom0).

If|Σ→singular, b|Σ→shouldalsobesingular(equalsubspace).Goalofmultivariatestatistics:Studyrelationsbetweenvariables Forallaffinelyequivariantestimators,boundednessof b|Σfor|Σ=Iissufficient.

(22)

163 10.4

bBreakdownofM-estimatorsResult:breakdownpoint<1/m.

Proof:Considerthe“barrowwheel”distribution:LetFτ=Nh0,diaghτ,1,...,1ii,Hdistributionof[U,0,...,0]withU∼χ 2p−1 Chisqu.distr.,Gε,τ=(1−ε)Fτ+εHlookslikethewheelofabarrow.

(23)

164 Letε=1/mandτ→0.Then,ThGε,0i=cIisasolutionoftheM-estimatingequationfor|Σ.

−→Breakdown!

Thebarrowwheelsituationiseasilydetectedinpractise–unlessrotatedsuchthatthefirstaxisismappedtothespacediagonal.Then,thescatterplotofeverypairofvariableslookssphericalifthedimensionisnotsmall.

(24)

16510.4

cWhatdoesthismean?Consider100obs.of7variables=700data.Assume2%grosserrors

−→about13-14(multivar.)obs.contain≥1error(s)

−→breakdownat2%ofthedata,not1/m=1/7

−→another“curseofdimension”

(25)

16610.4

dNonaffinelyequivariantestimators?Idea:Estimateeachelementofthecovariancematrixrobustly!

E.g.,GnanadesikanandKettenring(1972):Notethat

cov X (j),X (k) = 14 var X (j)+X (k) −var X (j)+X (k)

Robustversion:

bρ DX (j),X (k) E= 14 bσ 2 DX (j)bσhX (j)i + X (k)bσhX (k)i E−

bσ 2 DX (j)bσhX (j)i − X (k)bσhX (k)i E

dcov DX (j),X (k) E=bρ DX (j),X (k) E·bσ DX (j) Ebσ DX (k) E

(26)

167 10.4

eProblem:Theresultingmatrixwillingeneralnotbepositive(semi-)definite.

Tricktomakethempositivedefinite:ApplyEigenvalueanalysise|Σ=QDQ T

withdiagonalDandorthogonalQ.If e|Σwasacovariancematrix,thenDwouldcontainthevariancesDiioftheuncorrelatedvariablesZ=Q −1X.

−→Estimaterobustvariancesbvk oftheZ (k)

Thenuseback-transformedmatrixasestimate,b|Σ=Q Tdiaghbv1,...,bvmiQ

(27)

16810.4

fOntheotherhand:

•Theseestimatorsarenotequivariant.Buttheywanttoreactto“outliersinsinglevariables”,whichisnotanaffinelyinvariantmodel.

•Singularcovariancematriceswillmostprobablynotbedetected.(Thinkofthebarrowwheel!)

(28)

169

10.5

E st im a tio n w it h h ig h b re a k d o w n p o in t

aProjectionestimatorOutlyingnessofanobservation:One-dimensional:|xi−bµ|/bσbµ,bσ:robustestimatorsoflocationandscatter.

Multi-dimensional:Examineprojectionsb TXi,|b|=1.Outlyingness:|xi−bµh...,b TXh ,...i|/bσh...,b TXh ,...iMultivariateoutlyingness:Maximizeoverprojectiondirectionsb

(29)

170

Useoutlyingnesstodefineweightsforweightedestimator:

Di=maxb:kbk=1

hix−bµ...,bX,... T

h/bσ...,bX,... T

bµ= avei w D 2i Xi

avei w D 2i

b|Σ= avei w D 2i (Xi−bµ)(Xi−bµ) T

avei w D 2i

Disadvantage:Computationally“impossible”!Wayout:seebelow.

(30)

171 10.5 bS-estimators“Backwarddefinition”:Assumethat b|Σistheestimate.ShapematrixS= b|Σ/(det b|Σ 1/mdeterminesoutlyingnessd 2i =(xi−bµ) TS −1(xi−bµ).Scaleestimatebσ d 21 ,...,d 2n .Minimizethisscale!...anduseitastheestimateofsize.

[bµ,S b]=argmin bσ 2 ...,(xi−µ) TS −1(xi−µ),... |

µ,S:dethSi=1

b|Σ=S bbσ 2 D...,(xi−bµ) TS b −1(xi−bµ),... E

bσcanbeanM-estimatorofscaleoranotherrobustestimator.

Earlyexample:MinimalCovarianceDeterminant(MCD):usestrimmedscale.

Computation:SimilarcomplexityasProjectionestimator.

(31)

172

Finally,adataexample:Irisdata,firstspecies=50obs.,first2vars.–contaminatedby4grosserrors(colored).Ellipsesaresizedsuchthattheyshouldcontain80%ofthedata.Classicalestimates(larger,red...)andMCD(smaller,blue,---).

4.44.64.85.05.25.45.65.8

2.5 3.0 3.5 4.0 4.5

Length

Width

(32)

173 10.5 cComputationofhighbreakdownpointestimatorsMinimaneeded–overalldirectionsinR morallshapematricescannotbecomputed(exceptforverysmalldim.).Needan“intelligent”selection,overwhichtheminimumis“hopefully”closetotheglobalminimum.Atleast:Breakdownshouldbeavoided.

Basicidea:Choose“candidate”forminimumfromarandomminimalsubsetofobs.(called“elementalsubset”).Hopeforcandidatesthatarefreeofcontamination.Thisshouldgiveareliablestartingpointforsomekindoflocalminimzation,whichwilldefinetheestimate.

(33)

174

Projections:mpointsdetermineahyperplane.Projectonthedirectionperpendiculartoit.Shapematrices:needm+1points.

Probabilitytogeta“clean”elementalsubset:

P=1−(1−(1−ε) m) rforrrandomchoices.Exercise:CalculatethenumberofreplicatesneededtoobtainP=0.9,say,forseveralcombinationsofεandm!

(34)

175 10.5

dMessages

•Normaldistribution:The“onlyrealistic”model!

f x;µ,|Σ =c·exp− 12 d 2,whered 2=(x−µ) T|Σ −1(x−µ) T

•Affinetransformations,appliedtoNmh0,Iidefinethefamilyofnormaldistributions,Nm µ,|Σ

−→transformationfamily,−→Ellipticaldistributions,−→asksforequivariantestimators.

•M-estimatorsgivenbyonly3scalarfunctionsw (µ),w (η),ψ (τ).

(35)

176

•BreakdownofM-estimatorsispoor:

≤1/mofthe(vector)obs.−→≤≈1/m 2ofthedata!

•Estimatorswithhighbreakdownpoint:Projection-andS-estimatorsComputationis“impossible”.Thebestwecandois“elementalsubsets”whichavoidsbreakdownwithaprobabilitythatdependsonthedimensionmandthenumberofreplicatesr.