142
1 0 M u lt iv a ria te S c a tt e r M a tr ic e s
10.1
M o d e ls
aThemultivariateNormalDistributionDensity
f y;µ,|Σ =c·exp − 12 (Y−µ) T|Σ −1(Y−µ)
1/c= 1
(2π) m/2det D|Σ E1/2
Moreinsightisprovidedbythefollowingderivation.
14310.1
bNormaldistributionasatransformationfamily
Z∼Nmh0,Ii:centered,sphericallydistributedobservations=mrandomvariablesZ (j),standardnormal,independent.
Consider,foranyvectorµandregularmatrixB,
X=µ+BZ
=linear(oraffine)transformationofZ.CallthedistributionofXF µ,B .Transformationsgenerateafamilyofdistributions.
144
cGeneral:Useanygroupoftransformationsonany“starting”distribution.
dExpectationandcov.matrixforlineartransformations,ifEhZi=0andvarhZi=I,
EhXi=µ,varhXi=BB T
145
eIdentifiabilityProblem:Orthogonaltransformations
X=µ+QZ,QQ T=IleavethedistributionofZunchanged(=definitionof“sphericallysymmetric”distr.)
−→ParameterBisnotidentifyable!
Covariancematrix:varhXi=BB T=:|ΣisasuitableparameterinsteadofB.(different|Σ−→differentdistributions)
14610.1 fEllipticaldistributionsLetF0beanysphericallysymmetricdistribution.Z∼F0.
F µ,|Σ =distributionofX=µ+BZ,
BB T=|Σ,|Σregular.Familyiscalledthefamilyofellipticaldistr.generatedbyF0.
|Σscattermatrix,notcovarianceingeneral.–Covariancemaynotexist,–varhZ∼F0imaybeσ 20 Iwithσ06=1.
147
gDensitiesDensityofF0mustbeafunctionofthe(squared)“radius”,
f0hzi= ef0 kzk 2 .Fornormal: ef0 kzk 2 =c·exp −kzk 2/2
Generalµ,|Σ:LetBsuchthatBB T=|Σ.
z=B −1(x−µ)−→JacobiandethBi=det |Σ 1/2
f x;µ,|Σ =det |Σ −1/2f0hzi=det |Σ −1/2ef0 kzk 2
=det |Σ −1/2ef0 (x−µ) T|Σ −1(x−µ)
14810.1 hExamplesofellipticaldistributions:Multivariatetdistr.=distr.ofB b −1bµforXi∼Nmh0,Ii
Not“natural”forobservations!IfZ∼F0,thecomponentsarenotindependent.IndependentX (j)cannotbemodelledbyanyellipticalfamilyexceptfortheNormal!
14910.1
iSizeandShape(Log-)Sizeofthecovariancematrix:
τ:=log det |Σ = 1m Pk loghλkiλk=eigenvaluesof|Σ.
Shape:“standarddeviation”ofthelogeigenvalues:
η 2= Pk (loghλki−τ) 2.
Bothsize&shapeareinvariantunderorthogonaltransf.
150
10.2
E q u iv a ria n t E st im a to rs
aEstimatorsbµand b|Σshould“respect”thetransformationnatureofthefamily:IfYi=a+CXi,then
bµhY1,...,Yni=a+CbµhX1,...,Xnib|ΣhY1,...,Yni=C b|ΣhX1,...,XniC Tastheserelationsholdifbµand b|Σarereplacedbytheparametersµand|Σ.
Estimatorsforwhichtheserelationsholdarecalledequivariant(withrespecttoaffinetransf.s,whicharethegeneratingfa-mily)
15110.2 bOtherway’round:GivenbµhX1,...,Xniand b|ΣhX1,...,XniLetB bbea“squareroot”of b|Σ,B bB b T= b|Σ.Then,
bµ D...,B b −1(Xi−bµ),... E=0b|Σ D...,B b −1(Xi−bµ),... E=I
−→equationdeterminingbµand bB.bµand bB bB T= b|Σusuallyuniquelydeterminedbytheseeqns. Thus,itissufficienttodefinewhenbµ=0and b|Σ=I.EverysuchdefinitionyieldspotentiallysuitableestimatingfunctionalsµhFiand|ΣhFi.
152
10.3
M -e st im a to rs
aMaximumlikelihoodforellipticaldistributionsWrite ef0hkzk 2i=:exph−ρhkzk 2ii.(Noticethe 2intheargument!)
ℓℓ x;µ,|Σ =log f x;µ,|Σ
=−ρ (x−µ) T|Σ −1(x−µ) − 12 logdet |Σ
∂ℓℓ∂µ =ρ ′ kzk 2 2|Σ −1(x−µ)=ρ ′ kzk 2 2B −1z
∂ℓℓ∂|Σ =ρ ′ kzk 2 |Σ −1(x−µ)(x−µ) T|Σ −1−|Σ −1
=B −Tρ ′ kzk 2 zz T−I B −1
153
Forthesecondequation,wehaveusedtheformulas
∂(logdethAi)∂A =A −T
u=x TV −1x⇒ ∂u∂V =−V −1xx TV −1
Thus,maximumlikelihoodsolves
ave ρ ′ kzik 2 zi =0 ave ρ ′ kzik 2 ziz Ti =I
Here,ρ ′isaweightingfunction,notaψfunction(becauseofthe 2intheargumentofρ).
15410.3
bM-estimators
avei Dw (µ) kzik 2 zi E=0
avei Dw (η) kzik 2 ziz Ti E=avei Dw (δ) kzik 2
I E
NotethatforMLEs,w (µ)=w (η)andw (δ)hui=1.Asinthelocation-scalecase,theM-estimatorsaremoregeneralthanMLEforlong-taileddistributions.
w (δ)isnotintroducedintheusualliterature(Maronna+...).
155
Alternativeform:Weightedclassicalestimatorswithimplicitelydefinedweights
bµ= avei w (µ) D 2i Xi
avei w (µ) D 2i
b|Σ= avei w (η) D 2i (Xi−bµ)(Xi−bµ) T
avei w (δ) D 2i
D 2i =(Xi−bµ) Tb|Σ −1(Xi−bµ)
Weightedclassicalestimatorscanonlybeobtainedwhenallo-wingforaw (δ),andchoosingw (δ)=w (η)(=w (µ)).
Computation:Iterativelyreweightedclassicalestimates.
156
cInfluenceFunctionAbbreviateIFhzi=IF Dz;[bµ, b|Σ],Nmh0,Ii E
IF (µ)hzi=a (µ)w (µ)hkzk 2izIF (|Σ)hzi=a (η)w (η)hkzk 2izz T−ew (δ)hkzk 2iIwhereew (δ)dependsonw (δ)andw (η).
Generalcase:IF (µ) Dx;[bµ, b|Σ],Nm µ,|Σ
E
=BIF (µ)hzi IF (|Σ) Dx;[bµ, b|Σ],Nm µ,|Σ
E
=BIF (|Σ)hziB T
Forallequivariantestimators,theIFmusthavethisform.(notonlyM-estimators)
157 Calculationofa (µ),a (η),a (τ):Weneedintegralsoftheform
D= Zahuibhui Td eF0hui
whereahkzki=ψhz;0,Ii,bhkzki=shz,0,IiFortheFisherInformationandtheas.var.ofanequivariantest.thesametypeofintegralisneededwitha=b=sanda=b=ψ,respectively.
Theseintegralsarecharacterizedbyd (µ),d (η),d (τ),where
d (µ)= Zum ω (µ)a huiω (µ)b huid eF0hui d (η)=(1+2/m) −1 Zum 2ω (µ)a huiω (µ)b huid eF0hui
d (τ)= 12m Zψ (τ)a huiψ (τ)b huid eF0hui
158
Proof:SeeHampeletal.
D=“d-typematrix”=specialstructure!Thisshowsthatψ (τ)hui=u·ω (η)hui−m·ω (δ)huiisthefunctionthatcharacterizestheM-estimatorof|Σtogetherwithω (η)—notω (δ)!
15910.3 dSensitivitiesMaximumofthenormoftheIF.Itisnaturaltofocusonthestandardizedsensitivities.Standardizedsensitivities−→invariantw.r.t.affinetransf.=independentofµ,|Σ.
−→onlyneedtoconsiderIFatF0.Thenormsplitsinto2parts,kIFk 2=kIF (µ)k 2+kIF (|Σ)k 2
160 Forthe|Σterm,wegetafurthersplit:Letψ (τ)hui:=u·w (η)hui−m·w (δ)hui
kIF (|Σ)hzik 2:= X
j,k IF (|Σ)hzi 2
j,k
= a (η)w (η)hkzk 2ikzk 2 2+ a (τ)ψ (τ)hkzk 2i 2
=:kIF (η)hzik 2+|IF (τ)hzi| 2
ψ (τ)&w (η)connectedtosizeandshapeparameters,see10.1.i.
Naturalchoicew (δ)=w (η)−→ kIF (|Σ)hzik 2=w (η) kzk 2 (a (η)+a (τ))kzk 2−a (µ)m allowsforeasydefinitionofredescendingestimatorswhui→0foru→∞(cf.“weightedscale”7.1.e)
16110.3
eOptimalrobustestimationBoundedinfluencefunction,butasefficientforcentralmodel(normaldistribution)aspossible!Bound3“parts”separately,
kIF (µ)k≤γµ,kIF (η)k≤γη,|IF (τ)|≤γτ. Result:w (µ)hui=min D1, pbµ/u E,w (η)hui=minh1,bη/ui,
ψ (τ)hui=hbτ hu−mβi(Huberfunction)or
w (δ)hui=uw (η)hui−(u−mβ)minh1,bτ/(u−mβ)i,
βasuitableconstantforFisherconsistency,dependingonbηandbτ.
162
10.4
B re a k d o w n
aBreakdownofcovariancematrixestimatorsNobreakdownof b|Σmeanseigenvalues<∞and>γh|Σi>0(boundedawayfrom0).
If|Σ→singular, b|Σ→shouldalsobesingular(equalsubspace).Goalofmultivariatestatistics:Studyrelationsbetweenvariables Forallaffinelyequivariantestimators,boundednessof b|Σfor|Σ=Iissufficient.
163 10.4
bBreakdownofM-estimatorsResult:breakdownpoint<1/m.
Proof:Considerthe“barrowwheel”distribution:LetFτ=Nh0,diaghτ,1,...,1ii,Hdistributionof[U,0,...,0]withU∼χ 2p−1 Chisqu.distr.,Gε,τ=(1−ε)Fτ+εHlookslikethewheelofabarrow.
164 Letε=1/mandτ→0.Then,ThGε,0i=cIisasolutionoftheM-estimatingequationfor|Σ.
−→Breakdown!
Thebarrowwheelsituationiseasilydetectedinpractise–unlessrotatedsuchthatthefirstaxisismappedtothespacediagonal.Then,thescatterplotofeverypairofvariableslookssphericalifthedimensionisnotsmall.
16510.4
cWhatdoesthismean?Consider100obs.of7variables=700data.Assume2%grosserrors
−→about13-14(multivar.)obs.contain≥1error(s)
−→breakdownat2%ofthedata,not1/m=1/7
−→another“curseofdimension”
16610.4
dNonaffinelyequivariantestimators?Idea:Estimateeachelementofthecovariancematrixrobustly!
E.g.,GnanadesikanandKettenring(1972):Notethat
cov X (j),X (k) = 14 var X (j)+X (k) −var X (j)+X (k)
Robustversion:
bρ DX (j),X (k) E= 14 bσ 2 DX (j)bσhX (j)i + X (k)bσhX (k)i E−
bσ 2 DX (j)bσhX (j)i − X (k)bσhX (k)i E
dcov DX (j),X (k) E=bρ DX (j),X (k) E·bσ DX (j) Ebσ DX (k) E
167 10.4
eProblem:Theresultingmatrixwillingeneralnotbepositive(semi-)definite.
Tricktomakethempositivedefinite:ApplyEigenvalueanalysise|Σ=QDQ T
withdiagonalDandorthogonalQ.If e|Σwasacovariancematrix,thenDwouldcontainthevariancesDiioftheuncorrelatedvariablesZ=Q −1X.
−→Estimaterobustvariancesbvk oftheZ (k)
Thenuseback-transformedmatrixasestimate,b|Σ=Q Tdiaghbv1,...,bvmiQ
16810.4
fOntheotherhand:
•Theseestimatorsarenotequivariant.Buttheywanttoreactto“outliersinsinglevariables”,whichisnotanaffinelyinvariantmodel.
•Singularcovariancematriceswillmostprobablynotbedetected.(Thinkofthebarrowwheel!)
169
10.5
E st im a tio n w it h h ig h b re a k d o w n p o in t
aProjectionestimatorOutlyingnessofanobservation:One-dimensional:|xi−bµ|/bσbµ,bσ:robustestimatorsoflocationandscatter.
Multi-dimensional:Examineprojectionsb TXi,|b|=1.Outlyingness:|xi−bµh...,b TXh ,...i|/bσh...,b TXh ,...iMultivariateoutlyingness:Maximizeoverprojectiondirectionsb
170
Useoutlyingnesstodefineweightsforweightedestimator:
Di=maxb:kbk=1
hix−bµ...,bX,... T
h/bσ...,bX,... T
bµ= avei w D 2i Xi
avei w D 2i
b|Σ= avei w D 2i (Xi−bµ)(Xi−bµ) T
avei w D 2i
Disadvantage:Computationally“impossible”!Wayout:seebelow.
171 10.5 bS-estimators“Backwarddefinition”:Assumethat b|Σistheestimate.ShapematrixS= b|Σ/(det b|Σ 1/mdeterminesoutlyingnessd 2i =(xi−bµ) TS −1(xi−bµ).Scaleestimatebσ d 21 ,...,d 2n .Minimizethisscale!...anduseitastheestimateofsize.
[bµ,S b]=argmin bσ 2 ...,(xi−µ) TS −1(xi−µ),... |
µ,S:dethSi=1
b|Σ=S bbσ 2 D...,(xi−bµ) TS b −1(xi−bµ),... E
bσcanbeanM-estimatorofscaleoranotherrobustestimator.
Earlyexample:MinimalCovarianceDeterminant(MCD):usestrimmedscale.
Computation:SimilarcomplexityasProjectionestimator.
172
Finally,adataexample:Irisdata,firstspecies=50obs.,first2vars.–contaminatedby4grosserrors(colored).Ellipsesaresizedsuchthattheyshouldcontain80%ofthedata.Classicalestimates(larger,red...)andMCD(smaller,blue,---).
4.44.64.85.05.25.45.65.8
2.5 3.0 3.5 4.0 4.5
Length
Width
173 10.5 cComputationofhighbreakdownpointestimatorsMinimaneeded–overalldirectionsinR morallshapematricescannotbecomputed(exceptforverysmalldim.).Needan“intelligent”selection,overwhichtheminimumis“hopefully”closetotheglobalminimum.Atleast:Breakdownshouldbeavoided.
Basicidea:Choose“candidate”forminimumfromarandomminimalsubsetofobs.(called“elementalsubset”).Hopeforcandidatesthatarefreeofcontamination.Thisshouldgiveareliablestartingpointforsomekindoflocalminimzation,whichwilldefinetheestimate.
174
Projections:mpointsdetermineahyperplane.Projectonthedirectionperpendiculartoit.Shapematrices:needm+1points.
Probabilitytogeta“clean”elementalsubset:
P=1−(1−(1−ε) m) rforrrandomchoices.Exercise:CalculatethenumberofreplicatesneededtoobtainP=0.9,say,forseveralcombinationsofεandm!
175 10.5
dMessages
•Normaldistribution:The“onlyrealistic”model!
f x;µ,|Σ =c·exp− 12 d 2,whered 2=(x−µ) T|Σ −1(x−µ) T
•Affinetransformations,appliedtoNmh0,Iidefinethefamilyofnormaldistributions,Nm µ,|Σ
−→transformationfamily,−→Ellipticaldistributions,−→asksforequivariantestimators.
•M-estimatorsgivenbyonly3scalarfunctionsw (µ),w (η),ψ (τ).
176
•BreakdownofM-estimatorsispoor:
≤1/mofthe(vector)obs.−→≤≈1/m 2ofthedata!
•Estimatorswithhighbreakdownpoint:Projection-andS-estimatorsComputationis“impossible”.Thebestwecandois“elementalsubsets”whichavoidsbreakdownwithaprobabilitythatdependsonthedimensionmandthenumberofreplicatesr.