3. Diagnostics and Remedial Measures

Academic year: 2021

So far, we took data (Xi, Yi) and we assumed

Yi = β0 + β1 Xi + εi,   i = 1, 2, . . . , n,

where

εi iid N(0, σ²),

β0, β1 and σ² are unknown parameters,

the Xi's are fixed constants.
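As a quick illustration, this model is easy to simulate. The sketch below is minimal; the parameter values (β0 = 2, β1 = 0.5, σ = 1) and the fixed Xi's are arbitrary choices for illustration, not values from the text.

```python
import random

# Minimal simulation sketch of the SLR model; beta0, beta1, sigma and
# the fixed X's are arbitrary illustrative choices.
random.seed(0)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 20
X = [float(i) for i in range(1, n + 1)]              # X_i's: fixed constants
eps = [random.gauss(0.0, sigma) for _ in range(n)]   # eps_i iid N(0, sigma^2)
Y = [beta0 + beta1 * x + e for x, e in zip(X, eps)]  # Y_i = beta0 + beta1*X_i + eps_i
```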

Question:

What are the possible mistakes or violations of these assumptions?


1. Regression function is not linear (E(Y) ≠ β0 + β1X)

2. Error terms do not have a constant variance (var(εi) ≠ σ², i = 1, . . . , n)

3. Error terms are not independent (cor(εi, εi′) ≠ 0, i ≠ i′)

4. Model fits all but one or a few outlying observations

5. The error terms are not normally distributed

6. Simple linear regression is not reasonable (model should have more predictors)

We will use Residual Plots to diagnose the problems.

Residuals: ei = Yi − Ŷi = Yi − (b0 + b1 Xi)

Sample Mean: ē = (1/n) Σi ei = 0


Sample Variance: 1/(n − 1) Σi (ei − ē)² = 1/(n − 1) Σi ei² ≈ MSE

We will sometimes use standardized (semistudentized) residuals

ei* = (ei − ē)/√MSE = ei/√MSE
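These quantities are straightforward to compute directly. The sketch below uses a small made-up data set and the usual least-squares formulas for b0 and b1.

```python
# Residuals and semistudentized residuals for a small made-up data set.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
    sum((x - xbar) ** 2 for x in X)
b0 = ybar - b1 * xbar
e = [y - (b0 + b1 * x) for x, y in zip(X, Y)]  # e_i = Y_i - Yhat_i
ebar = sum(e) / n                              # sample mean: 0 up to rounding
MSE = sum(ei ** 2 for ei in e) / (n - 2)       # SSE / (n - 2)
e_star = [ei / MSE ** 0.5 for ei in e]         # semistudentized residuals
```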


Nonlinearity of Regression Function (1.)

Use a residual plot against the predictor variable, X, or a residual plot against the fitted values, Ŷ. Look for systematic tendencies!

Example:

Xi = amount of water/week

Yi = plant growth in first 2 months


[Figure: plant growth vs. water/week and the residual plot against water/week; the residuals run negative, then positive, then negative as water/week increases, indicating a nonlinear regression function.]


Nonconstancy of Error Variance (2.)

We diagnose nonconstant error variance by observing a residual plot against X and looking for structure.

Example:

Xi = salary

Yi = money spent on entertainment


[Figure: entertainment spending vs. salary and the residual plot against salary; the spread of the residuals around 0 increases with salary, indicating nonconstant error variance.]


Nonindependence of Error Terms (3.)

We diagnose nonindependence of errors over time or in some sequence by observing a residual plot against time (or the sequence) and looking for a trend.

Example:

Xi = #hours worked

Yi = #parts completed


[Figure: #parts completed vs. #hours worked and the residual plot against #hours; neither plot shows an obvious problem.]


But, if the data were collected in sequence,

day 1: (X1, Y1)
day 2: (X2, Y2)
...
day n: (Xn, Yn)

then we can see the effect of learning.


[Figure: the residual plot against #hours shows no structure around 0, but the residual plot against day shows a trend over time (the learning effect).]


Model fits all but a few observations (4.)

Example: LS Estimates with 2 outlying points (solid) and without them (dashed).

Rule of Thumb: If |ei|/√MSE > 3, then check the data point (ensure that it was not recorded incorrectly)!

Do not throw points away simply because they are outliers (relative to the assumed SLR)!

Outliers are detected by observing a plot of ei vs. Xi.
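The rule of thumb amounts to a simple filter on the semistudentized residuals. In the sketch below the residuals are made up, with one artificially extreme value; in practice they would come from the fitted SLR.

```python
# Flag observations with |e_i| / sqrt(MSE) > 3 for checking.
# Made-up residuals with one artificially extreme value (index 20).
e = [-1.0] * 10 + [0.0] * 10 + [10.0]   # sums to zero, like true residuals
n = len(e)
MSE = sum(ei ** 2 for ei in e) / (n - 2)
flagged = [i for i, ei in enumerate(e) if abs(ei) / MSE ** 0.5 > 3]
print(flagged)  # [20]
```

Note that a flagged point should be checked, not automatically discarded, exactly as the text warns.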


[Figure: y vs. x with the LS fits with (solid) and without (dashed) the two outlying points, and the plot of ei/√MSE vs. x with reference lines at −3, 0, and +3.]


Errors not normally distributed (5.)

We assumed ε1, . . . , εn iid N(0, σ²), but we can't observe these error terms!

We will be convinced that this assumption is reasonable, if e1, . . . , en appear to be iid N(0,MSE).

Fact: If e1, . . . , en iid N(0, MSE), then one can show that the expected value of the ith smallest residual is

√MSE · z((i − 3/8)/(n + 1/4)),   i = 1, 2, . . . , n,

where z(p) is the p-quantile of the standard normal distribution.


Then we have pairs

residual          expected residual
e(min)            √MSE · z((1 − 0.375)/(n + 0.25))
e(2nd smallest)   √MSE · z((2 − 0.375)/(n + 0.25))
...               ...
e(max)            √MSE · z((n − 0.375)/(n + 0.25))


Notice: If Y1, . . . , Y4 iid N(0, σ²), then E(Y1) = · · · = E(Y4) = 0, and E(Ȳ) = 0, but

E(Ymin) = σ · z((1 − 0.375)/(4 + 0.25)) = σ z(0.147) = −1.05σ,
E(Y2nd) = σ · z((2 − 0.375)/(4 + 0.25)) = σ z(0.382) = −0.30σ,
E(Y3rd) = σ · z((3 − 0.375)/(4 + 0.25)) = σ z(0.618) = +0.30σ,
E(Ymax) = σ · z((4 − 0.375)/(4 + 0.25)) = σ z(0.853) = +1.05σ.

Thus, we plot the ei against their expected values (Normal Probability Plot) to detect departures from normality.
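These expected values can be computed with the standard normal quantile function. The sketch below uses `statistics.NormalDist` for z(p) and reproduces the n = 4, σ = 1 values from the text.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # z(p): standard normal p-quantile

def expected_ordered(n, mse=1.0):
    # Expected value of the i-th smallest residual:
    # sqrt(MSE) * z((i - 3/8) / (n + 1/4)), i = 1, ..., n
    return [mse ** 0.5 * z((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

vals = [round(v, 2) for v in expected_ordered(4)]
print(vals)  # [-1.05, -0.3, 0.3, 1.05], matching the text
```

Plotting the ordered residuals against `expected_ordered(n, MSE)` gives the normal probability plot.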


[Figure: two normal probability plots of semistudentized residuals against expected residuals; points close to a straight line are consistent with normality.]


Omission of important predictors (6.)

Example:

Xi = #years of education

Yi = salary

Suppose we also have: Zi = #years at current job


[Figure: salary vs. #years of education, and the semistudentized residuals plotted against #years in job; the residuals show structure in Zi.]


This means that a better model would be (Multiple Regression Model)

E(Yi) = β0 + β1 Xi + β2 Zi


Lack of Fit Test

Formal Test for: H0: E(Y) = β0 + β1X vs. HA: not H0

We can't use this test unless there are multiple Y's observed at at least one value of X.

Motivation: SLR restricts the means to be on a line! How much better could we do without this restriction?


[Figure: Y vs. X with observations at levels X1, X2, X3, X4, the fitted line Ê(Y) = b0 + b1X, a fitted value Ŷ2, the level mean Ȳ2, and an observation Y2j.]


The less restrictive model puts no structure on the means at each level of X.

New Notation: Y values are observed at c different levels of X, say X1, X2, . . . , Xc.

nj such Y values, say Y1j, Y2j, . . . , Ynjj, are observed at level Xj, j = 1, 2, . . . , c, with nj ≥ 1.

Let Ȳj = (1/nj) Σi Yij be the average of the Y's at Xj, and Ŷj = b0 + b1 Xj the fitted mean under the SLR.

The data now look like

at X1: (Y11, X1), (Y21, X1), . . . , (Yn11, X1)   Ȳ1
at X2: (Y12, X2), (Y22, X2), . . . , (Yn22, X2)   Ȳ2
...
at Xc: (Y1c, Xc), (Y2c, Xc), . . . , (Yncc, Xc)   Ȳc


Note that

Yij − Ŷj = (Yij − Ȳj) + (Ȳj − Ŷj).

Let's partition the SSE into 2 pieces:

SSE = SSPE + SSLF,

where

Σj Σi (Yij − Ŷj)² = Σj Σi (Yij − Ȳj)² + Σj Σi (Ȳj − Ŷj)²,

with sums over j = 1, . . . , c and i = 1, . . . , nj.

If SSPE ≈ SSE, it says that the means (Ȳj) are close to the fitted values (Ŷj).

That is, even if we fit a less restrictive model, we can't reduce the amount of unexplained variability.

If SSLF ≈ SSE, the means (Ȳj) are far away from the fitted values (Ŷj) and the (linear) restriction seems unreasonable.
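The partition can be checked numerically. In the sketch below the grouped Y values and the estimates b0, b1 are made up; the identity SSE = SSPE + SSLF holds for any fitted line, because the within-level cross terms Σi (Yij − Ȳj) vanish.

```python
# SSE = SSPE + SSLF for data grouped by level of X (made-up numbers).
groups = {1.0: [2.0, 2.4], 2.0: [3.9, 4.3, 4.1], 3.0: [6.5, 6.1]}
b0, b1 = 0.1, 2.0   # SLR estimates, assumed given here

sspe = sslf = sse = 0.0
for xj, ys in groups.items():
    ybar_j = sum(ys) / len(ys)   # level mean Ybar_j
    yhat_j = b0 + b1 * xj        # fitted mean Yhat_j
    sspe += sum((y - ybar_j) ** 2 for y in ys)   # pure error
    sslf += len(ys) * (ybar_j - yhat_j) ** 2     # lack of fit
    sse += sum((y - yhat_j) ** 2 for y in ys)    # total error

print(abs(sse - (sspe + sslf)) < 1e-12)  # True
```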


Thus,

SSTO = SSE + SSR = SSLF + SSPE + SSR.

Formal Test for: H0: E(Y) = β0 + β1X vs. HA: E(Y) ≠ β0 + β1X

Define

MSLF = SSLF/(c − 2) and MSPE = SSPE/(n − c).

Test Statistic: F = MSLF/MSPE

Rejection Rule: reject if F > F(1 − α; c − 2, n − c)


This fits nicely into our ANOVA Table:

Source of variation   SS      df      MS
Regression            SSR     1       MSR
Error                 SSE     n − 2   MSE
  Lack of Fit         SSLF    c − 2   MSLF
  Pure Error          SSPE    n − c   MSPE
Total                 SSTO    n − 1


Example: Suppose that the house prices follow a SLR in #bedrooms. The estimated regression function is

Ê(price/1,000) = −37.2 + 43.0 (#bedrooms)

Source of variation   SS        df    MS
Regression            62,578     1    62,578
Error                 117,028   91    1,286
  Lack of Fit           4,295    3    1,432
  Pure Error          112,733   88    1,281
Total                 179,606   92

Because F = MSLF/MSPE = 1,432/1,281 = 1.12 < F(0.95; 3, 88) = 2.71, we do not reject H0.
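The F statistic here is just arithmetic on the table; a quick check, with the SS and df taken directly from the ANOVA table above:

```python
# Lack-of-fit F statistic from the house-price ANOVA table.
sslf, df_lf = 4295, 3      # c - 2 = 3
sspe, df_pe = 112733, 88   # n - c = 88
mslf = sslf / df_lf        # about 1,432
mspe = sspe / df_pe        # about 1,281
F = mslf / mspe
print(round(F, 2))  # 1.12, below F(0.95; 3, 88) = 2.71
```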


[Figure: house price (in $1,000s) plotted against #bedrooms (1 to 5).]


Remedies for Problems 1. to 6.

Many of the remedies rely on more advanced material, so we won’t see them until later.

Transformations are one way to fix problem 1. (nonlinear regression function) and the combination of problems 1. and 2. (nonlinear regression function with nonconstant error variance).


Motivation: Consider the function y = x²

x    y          x²    y
0    0          0     0
1    1          1     1
2    4          4     4
3    9          9     9
4    16         16    16

[Figure: y plotted against x (the parabola y = x²) and y plotted against x² (a straight line).]

If you have (x1, y1), (x2, y2), . . . , (xn, yn) and you know y = f(x), then (f(x1), y1), (f(x2), y2), . . . , (f(xn), yn) will be on a straight line.
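This can be verified with a small least-squares fit: regressing y on the transformed predictor f(x) = x² recovers an exact straight line.

```python
# If y = x^2, then the points (x_i^2, y_i) lie exactly on a line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]        # y = x^2: curved in x
ts = [x ** 2 for x in xs]        # transformed predictor f(x) = x^2

# least-squares fit of y on the transformed predictor
n = len(ts)
tbar, ybar = sum(ts) / n, sum(ys) / n
b1 = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys)) / \
    sum((t - tbar) ** 2 for t in ts)
b0 = ybar - b1 * tbar
residuals = [y - (b0 + b1 * t) for t, y in zip(ts, ys)]
print(b1, b0)  # 1.0 0.0: a perfect line, all residuals zero
```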


Two situations in which transformations may help.

Situation 1: nonlinear regression function with constant error variances (1.)

Note that E(Y ) doesn’t appear to be a linear function of X, that is, the points do not seem to lie on a line. The spread of the Y ’s at each level of X appears to be constant, however.

[Figure: X vs. Y; a curved trend with constant spread of the Y's at each level of X.]


Remedy – Transform X. We consider √X.

Do not transform Y, because this will disturb the spread of the Y's at each level of X.

[Figure: sqrt(X) vs. Y; the points now lie along a straight line.]


Situation 2: nonlinear regression function with nonconstant error variances (1. with 2.)

Note that E(Y) isn't a linear function of X. The variance of the Y's at each level of X is increasing with X.

[Figure: X vs. Y; a curved trend with spread increasing in X.]


Remedy – Transform Y (or maybe X and Y). We consider √Y and hope that both problems are fixed.

[Figure: X vs. sqrt(Y); a straight-line trend with roughly constant spread.]


Prototypes for Transforming Y

[Figure: three prototype scatterplots of Y vs. X.]

Try √Y, log10 Y, or 1/Y.


Prototypes for Transforming X

[Figure: three prototype scatterplots of Y vs. X.]

Use √X or log10 X (left); X² or exp(X) (middle); 1/X or exp(−X) (right).
