4 Residual Analysis
4.1 Introduction
4.1.a Assumptions: $E_i \sim \mathcal{N}\langle 0, \sigma^2 \rangle$
(a) $\mathrm{E}\langle E_i \rangle = 0$: linearity, additivity,
(b) equal variances: $\mathrm{var}\langle E_i \rangle = \sigma^2$,
(c) normal distribution,
(d) $E_i$ independent.
Check model assumptions! → find better model!
Find a better model by
• transformation of variables,
• additional terms, like interactions,
• weights for observations,
• use of alternative methods for estimation and inference.
Simple regression: examine the scatterplot of $Y$ vs. $X$.
Multiple explanatory variables: use a linear combination of the $X$'s for the horizontal axis
→ use $\hat{\beta}_0 + \hat{\beta}_1 X^{(1)} + \hat{\beta}_2 X^{(2)} + \ldots = \hat{y}$, the "fitted values".
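The fitted values can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the helper names `fit_simple_regression` and `fitted_values` and the data are hypothetical, and only the standard library is used.

```python
def fit_simple_regression(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssq_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssq_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


def fitted_values(coefs, rows):
    """Linear combination b0 + b1*x1 + b2*x2 + ... for each row of regressors."""
    b0, *slopes = coefs
    return [b0 + sum(b * v for b, v in zip(slopes, row)) for row in rows]


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_regression(x, y)
y_hat = fitted_values((b0, b1), [[xi] for xi in x])  # horizontal axis of the plot
```

With several explanatory variables, `fitted_values` takes one row of regressor values per observation, so the same single axis works whatever the number of X's.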
[Figure: log10(tremor) against fitted values.]
4.2 Residuals and fitted values
4.2.a Tukey-Anscombe plot: residuals $R_i = y_i - \hat{y}_i$ vs. fitted values $\hat{y}_i$.
[Figure: Tukey-Anscombe plot, residuals $R_i$ against fitted values $\hat{y}_i$.]
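The points of a Tukey-Anscombe plot can be computed as follows. This is a sketch for simple regression with hypothetical data and helper names. It also checks two properties that hold for any least-squares fit with an intercept: the residuals sum to zero and are uncorrelated with the fitted values, so only patterns beyond a straight line can show up in the plot.

```python
def simple_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssq_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssq_x
    return y_bar - b1 * x_bar, b1


def tukey_anscombe_points(x, y):
    """Return (fitted value, residual) pairs, one per observation."""
    b0, b1 = simple_fit(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return list(zip(fitted, residuals))


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]

points = tukey_anscombe_points(x, y)
sum_of_residuals = sum(r for _, r in points)         # zero up to rounding
cross_product = sum(f * r for f, r in points)        # zero up to rounding
```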
4.2.b What kinds of deviations from assumptions can show up?
(a) Regression function: pattern of the points. A sliding mean (smoother) may show curvature.
(b) Equality of variances: (vertical) variation of the points around the smooth. Points may "fan out" to the right. Seen more precisely in a plot of absolute residuals against the fit.
(c) Distribution of errors: do the points scatter symmetrically around the 0 line (or the smooth)? Outliers?
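A sliding mean of the residuals, and of the absolute residuals, can be sketched as below. The window width of 5 and the helper names are assumptions made for illustration. A real analysis would typically use a more refined smoother. The data are deliberately curved (y = x²) so that the straight-line fit leaves a visible pattern.

```python
def running_mean(values, window=5):
    """Centered sliding mean; the window is truncated at both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def smooth_against_fit(fitted, values, window=5):
    """Order the values by fitted value, then apply the sliding mean."""
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    return running_mean([values[i] for i in order], window)


# Hypothetical curved data: a straight line is the wrong regression function.
x = [float(i) for i in range(21)]
y = [xi ** 2 for xi in x]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

smooth = smooth_against_fit(fitted, residuals)                        # check (a)
abs_smooth = smooth_against_fit(fitted, [abs(r) for r in residuals])  # check (b)
```

Here the smooth is positive at both ends and negative in the middle, which is the curvature pattern described under (a).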
4.2.c How to judge?
• Are deviations in the range of chance? → Simulated curves
[Figure: residuals $R_i$ against fitted values $\hat{y}_i$, with simulated curves.]
• Are deviations dangerous? The answer depends on the purpose.
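One way to obtain such simulated curves is sketched below. This is an assumed procedure, not necessarily the one behind the figure: responses are simulated from the fitted line with normal errors of the estimated standard deviation, the model is refitted, and the residuals are smoothed. The number of simulations (19), the window width, the seeds and the data are all hypothetical choices, and x is assumed to be sorted in ascending order.

```python
import math
import random


def simple_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssq_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssq_x
    return y_bar - b1 * x_bar, b1


def residuals_of_fit(x, y):
    b0, b1 = simple_fit(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def running_mean(values, window=5):
    """Centered sliding mean; the window is truncated at both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def simulated_smooths(x, y, n_sim=19, window=5, seed=1):
    """Smoothed residual curves for data simulated from the fitted model."""
    rng = random.Random(seed)
    b0, b1 = simple_fit(x, y)
    res = residuals_of_fit(x, y)
    sigma_hat = math.sqrt(sum(r * r for r in res) / (len(x) - 2))
    curves = []
    for _ in range(n_sim):
        y_sim = [b0 + b1 * xi + rng.gauss(0.0, sigma_hat) for xi in x]
        curves.append(running_mean(residuals_of_fit(x, y_sim), window))
    return curves


# Hypothetical data, for illustration only (x sorted in ascending order).
rng_data = random.Random(0)
x = [float(i) for i in range(1, 21)]
y = [0.5 + 0.3 * xi + rng_data.gauss(0.0, 0.2) for xi in x]

observed = running_mean(residuals_of_fit(x, y))
curves = simulated_smooths(x, y)
lower = [min(c[i] for c in curves) for i in range(len(x))]
upper = [max(c[i] for c in curves) for i in range(len(x))]
# Positions where the observed smooth leaves the band of simulated curves.
outside = [i for i in range(len(x)) if not lower[i] <= observed[i] <= upper[i]]
```

If the observed smooth stays within the band of simulated curves, its deviations from zero are of a size that chance alone would produce.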
4.3 Distribution of errors
4.3.a (c) Normal distribution? Histogram of the $E_i$ ... → residuals $R_i$.
! Note that $Y$ does NOT need to be normally distributed!
[Figure: histogram of the residuals; axes: frequency and probability density.]
4.3.b Refinement: quantile-quantile plot (QQ-plot, normal plot)
[Figure: ordered residuals against quantiles of the standard normal distribution.]
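The coordinates of a normal QQ-plot can be computed with the standard library alone. This is a minimal sketch: the plotting positions (i − 0.5)/n are one common convention and are an assumption here, and the residuals are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """Pairs (theoretical standard normal quantile, ordered residual)."""
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals, for illustration only.
residuals = [0.12, -0.30, 0.05, 0.21, -0.08, -0.15, 0.27]
points = normal_qq_points(residuals)
```

For normally distributed errors the points scatter around a straight line. Systematic bending indicates skewness or long tails.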
4.3.c Are deviations significant? → goodness-of-fit test
4.3.d ... or simulate!
[Figure: six histograms of residuals (frequency).]
[Figure: six QQ-plots, empirical against theoretical quantiles.]
4.3.e Distribution of errors? Errors $E_i \ne$ residuals $R_i$.
$R_i = Y_i - \hat{y}_i$, both random. $\hat{y}_i$ depends on $Y_i$, hence on $E_i$.
4.3.f $R_i \sim \mathcal{N}\langle 0, \sigma^2 (1 - H_{ii}) \rangle$. $H_{ii}$: leverage.
• $Y_i \to Y_i + \Delta y_i$ → $\hat{y}_i \to \hat{y}_i + H_{ii}\, \Delta y_i$
• $H_{ii}$ measures the "distance" between $x_i$ and $\bar{x}$.
simple regr.: $H_{ii} = (1/n) + (x_i - \bar{x})^2 / \mathrm{SSQ}^{(X)}$.
multiple r.: $H_{ii} = (1/n) + d\langle x_i, \bar{x} \rangle^2$. $d$: Mahalanobis dist.
• $0 \le H_{ii} \le 1$, $\mathrm{ave}_i\langle H_{ii} \rangle = p/n$.
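The leverage formula for simple regression and the effect of shifting one response can be checked numerically. This is a sketch with hypothetical data and helper names. For simple regression with an intercept, p = 2, so the leverages should sum to 2.

```python
def simple_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssq_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssq_x
    return y_bar - b1 * x_bar, b1


def fitted(x, y):
    b0, b1 = simple_fit(x, y)
    return [b0 + b1 * xi for xi in x]


def leverages(x):
    """H_ii = 1/n + (x_i - x_bar)^2 / SSQ(X) for simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    ssq_x = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / ssq_x for xi in x]


# Hypothetical data: the last point lies far from the mean of x.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.0, 2.1, 2.9, 4.2, 9.8]

h = leverages(x)

# Shift the last response by delta and see how its fitted value moves.
delta = 1.0
y_shifted = y[:4] + [y[4] + delta]
change = fitted(x, y_shifted)[4] - fitted(x, y)[4]  # equals h[4] * delta
```

The isolated point has a leverage of 0.92, so its fitted value follows almost the full shift in its own response.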
4.3.g Standardized residuals → identical distribution.
$\widetilde{R}_i = R_i \big/ \left( \hat{\sigma} \sqrt{1 - H_{ii}} \right)$
Use standardized residuals for checking the distribution!
Usually unimportant!
But: the notion of leverage will be used again!
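Standardized residuals for simple regression can be sketched as follows. The estimate of σ uses the usual n − p degrees of freedom with p = 2, which is an assumption not spelled out above. The function name and data are hypothetical.

```python
import math


def standardized_residuals(x, y):
    """Return (residuals, sigma_hat, standardized residuals) for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssq_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssq_x
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Error standard deviation estimated with n - 2 degrees of freedom.
    sigma_hat = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    h = [1 / n + (xi - x_bar) ** 2 / ssq_x for xi in x]
    standardized = [
        r / (sigma_hat * math.sqrt(1 - hi)) for r, hi in zip(residuals, h)
    ]
    return residuals, sigma_hat, standardized


# Hypothetical data: the last point has high leverage.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.0, 2.1, 2.9, 4.2, 9.8]

residuals, sigma_hat, standardized = standardized_residuals(x, y)
```

The correction matters only for points with large leverage, where the raw residual is markedly smaller than the underlying error.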
4.4 Should we transform the target variable?
4.4.a What if deviations do show up?
Cf. medical diagnosis on the basis of symptoms. Study the symptoms of known diseases for calibration.
Disease: missing log transformation →
• curved smooth in the TA (Tukey-Anscombe) plot (plate shape),
• variation fanning out to the right,
• skewed distribution.
→ Transformation syndrome
[Figure: four diagnostic panels. Tukey-Anscombe plot: residuals against fitted values. Spread plot: square root of |standardized residuals| against fitted values. Histogram of standardized residuals (frequency). QQ-plot: standardized residuals against theoretical quantiles.]
4.4.b First-aid transformations
amounts → log
counts → sqrt
percentages → "arc sin" transformation, asin(sqrt(p/100))
4.4.c Outliers
4.4.d Long-tailed distributions → robust methods.
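The first-aid transformations listed above can be written as one small function. This is a sketch: the function name and the `kind` labels are hypothetical, and the base-10 logarithm is an assumption (another base only rescales the transformed values).

```python
import math


def first_aid_transform(value, kind):
    """Apply the first-aid transformation for the given kind of variable."""
    if kind == "amount":        # positive amounts: logarithm
        return math.log10(value)
    if kind == "count":         # counts: square root
        return math.sqrt(value)
    if kind == "percentage":    # percentages p in [0, 100]: arc sine
        return math.asin(math.sqrt(value / 100))
    raise ValueError(f"unknown kind: {kind!r}")


log_amount = first_aid_transform(100.0, "amount")
root_count = first_aid_transform(9, "count")
angle = first_aid_transform(50.0, "percentage")
```

Amounts must be strictly positive for the logarithm. Zero values need separate treatment, which is not covered here.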
4.5 Residuals and Explanatory Variables
4.5.a Plot residuals against explanatory variables → transformation of the $x$'s, additional terms
4.5.b Non-constant variances → weighted regression
4.5.c Influential points
4.5.d Independence of errors: plot residuals against sequence
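The weighted regression suggested above for non-constant variances can be sketched for the simple-regression case. The weights are assumed to be inversely proportional to the error variances, which must be known up to a constant. The function name, the data and the choice of weights 1/x² (error standard deviation proportional to x) are hypothetical.

```python
def weighted_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x with weights w_i."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.3, 9.5]

# Assumption: error standard deviation proportional to x, so w_i = 1 / x_i^2.
weights = [1 / xi ** 2 for xi in x]
b0_w, b1_w = weighted_fit(x, y, weights)

# Equal weights reproduce ordinary least squares.
b0_ols, b1_ols = weighted_fit(x, y, [1.0] * len(x))
```

Observations with large assumed variance receive small weights and therefore pull less on the fitted line.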