• Keine Ergebnisse gefunden

Solution to Series 3

N/A
N/A
Protected

Academic year: 2022

Aktie "Solution to Series 3"

Copied!
7
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

Solution to Series 3

1. Estimate all effects in the following 3×3 designs. Do interactions exist?

a)

B Total

1 2 3

1 10 15 20 15

A 2 10 15 20 15

3 10 15 20 15

Total 10 15 20 15

interaction effects B main effects A

1 2 3

1 0 0 0 0

A 2 0 0 0 0

3 0 0 0 0

main effects B -5 0 5 µˆ= 15

b)

B Total

1 2 3

1 26 22 21 23

A 2 23 19 18 20

3 17 13 12 14

Total 22 18 17 19

interaction effects B main effects A

1 2 3

1 0 0 0 4

A 2 0 0 0 1

3 0 0 0 -5

main effects B 3 -1 -2 µˆ= 19

c)

B Total

1 2 3

1 26 23 20 23

A 2 18 19 23 20

3 13 15 14 14

Total 19 19 19 19

interaction effects B main effects A

1 2 3

1 3 0 -3 4

A 2 -2 -1 3 1

3 -1 1 0 -5

main effects B 0 0 0 µˆ= 19

2. a) Plot the data.

> drill <- read.table(file="http://stat.ethz.ch/Teaching/Datasets/drill.txt",header=TRUE)

> drill$A <- as.factor(drill$A)

> drill$B <- as.factor(drill$B)

> drill$C <- as.factor(drill$C)

> drill$D <- as.factor(drill$D)

> par(mfrow=c(2,2))

> plot(drill$A,drill$Y,xlab="A")

> plot(drill$B,drill$Y,xlab="B")

> plot(drill$C,drill$Y,xlab="C")

> plot(drill$D,drill$Y,xlab="D")

(2)

−1 1

51015

A

−1 1

51015

B

−1 1

51015

C

−1 1

51015

D

From the plots we see that there could be an significant effect for the factors B, C and D but probably not for A. Also the interactions BC and CD look quite promising from the interaction plots.

b) Do an analysis with all main effects and all interactions.

> mod1 <- aov(Y~A*B*C*D,data=drill)

> summary(mod1)

Df Sum Sq Mean Sq

A 1 3.33 3.33

B 1 43.49 43.49

C 1 165.51 165.51

D 1 20.88 20.88

A:B 1 0.09 0.09

A:C 1 1.42 1.42

B:C 1 9.06 9.06

A:D 1 2.84 2.84

B:D 1 0.78 0.78

C:D 1 10.21 10.21

A:B:C 1 0.11 0.11

A:B:D 1 1.39 1.39

A:C:D 1 2.28 2.28

B:C:D 1 0.13 0.13

A:B:C:D 1 1.16 1.16

We get a strange result due to the fact that we do not have any degrees of freedom left for the residuals.

We have 15 effects for factors and the overall mean with, that is 16 df and only 16 observations. The model is therefore saturated.

c) Do an analysis with all main effects and all 2-fold interactions.

> mod2 <- aov(Y~A+B+C+D+A:B+A:C+A:D+B:C+B:D+C:D,data=drill)

> summary(mod2)

Df Sum Sq Mean Sq F value Pr(>F)

A 1 3.33 3.33 3.285 0.12968

B 1 43.49 43.49 42.894 0.00124 **

C 1 165.51 165.51 163.225 5.23e-05 ***

D 1 20.88 20.88 20.597 0.00618 **

A:B 1 0.09 0.09 0.089 0.77774

A:C 1 1.42 1.42 1.397 0.29044

(3)

A:D 1 2.84 2.84 2.800 0.15512

B:C 1 9.06 9.06 8.935 0.03048 *

B:D 1 0.78 0.78 0.772 0.41969

C:D 1 10.21 10.21 10.067 0.02473 * Residuals 5 5.07 1.01

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We see that the main factors B, C and D are significant on a 5%level as well as the 2-fold interactions BC and CD.

d) Check the residuals and improve your model if necessary.

> par(mfrow=c(2,2),mar=c(3,2,3,2))

> plot(mod2)

2 4 6 8 10 12 14

−1.00.00.51.0

Fitted values

Residuals vs Fitted

16

8 15

−2 −1 0 1 2

−1012

Theoretical Quantiles

Standardized residuals

Normal Q−Q

16

8 15

2 4 6 8 10 12 14

0.00.40.81.2

Scale−Location

16

8 15

−2−1012

Standardized residuals

−1 1

A :

Constant Leverage:

Residuals vs Factor Levels

16

8 15

> library(faraway)

> halfnorm(mod2$effects[-1],labs=names(mod2$effects[-1]))

(4)

0.0 0.5 1.0 1.5

024681012

Half−normal quantiles

Sorted Data

B1

C1

We see that probably the heteroscedasticity assumption is violated. We try a log-transform of the response variable.

> drill.e <- drill

> drill.e$Y <- log(drill$Y)

> mod3 <- aov(Y~A+B+C+D+A:B+A:C+A:D+B:C+B:D+C:D,data=drill.e)

> summary(mod3)

Df Sum Sq Mean Sq F value Pr(>F) A 1 0.068 0.068 10.119 0.024511 * B 1 1.346 1.346 201.504 3.12e-05 ***

C 1 5.331 5.331 798.098 1.04e-06 ***

D 1 0.427 0.427 63.854 0.000496 ***

A:B 1 0.005 0.005 0.707 0.438754 A:C 1 0.000 0.000 0.064 0.810121 A:D 1 0.018 0.018 2.680 0.162530 B:C 1 0.010 0.010 1.509 0.273906 B:D 1 0.001 0.001 0.134 0.729647 C:D 1 0.039 0.039 5.768 0.061498 . Residuals 5 0.033 0.007

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> par(mfrow=c(2,2),mar=c(3,2,3,2))

> plot(mod3)

(5)

0.5 1.0 1.5 2.0 2.5

−0.100.000.050.10

Fitted values

Residuals vs Fitted

15 7

16

−2 −1 0 1 2

−2−1012

Theoretical Quantiles

Standardized residuals

Normal Q−Q

15

7 16

0.5 1.0 1.5 2.0 2.5

0.00.40.81.2

Scale−Location

15 7

16

−2−1012

Standardized residuals

−1 1

A :

Constant Leverage:

Residuals vs Factor Levels

15 7

16

We see that the Tukey-Anscombe plot looks better after the log-transform.

3. a) Plot the data.

> soft <- read.table(file="http://stat.ethz.ch/Teaching/Datasets/softdrinkANOVA.txt",header=TRUE)

> soft$sugar <- as.factor(soft$sugar)

> soft$soda <- as.factor(soft$soda)

> soft$water <- as.factor(soft$water)

> soft$temp <- as.factor(soft$temp)

> par(mfrow=c(2,2))

> plot(soft$sugar,soft$score,sub="sugar")

> plot(soft$soda,soft$score,sub="soda")

> plot(soft$water,soft$score,sub="water")

> plot(soft$temp,soft$score,sub="temp")

(6)

1 2

160170180190200

sugar

1 2

160170180190200

soda

1 2

160170180190200

1 2

160170180190200

From the plots we can say that probably the factors sugar, soda and temp have an significant influence on the flavor of softdrinks.

b) Analyze the data. Which factors are important?

> modS <- aov(score~sugar*soda*water*temp,data=soft)

> summary(modS)

Df Sum Sq Mean Sq F value Pr(>F) sugar 1 2312.0 2312.0 241.778 4.45e-11

soda 1 946.1 946.1 98.941 2.96e-08

water 1 21.1 21.1 2.209 0.157

temp 1 561.1 561.1 58.680 9.69e-07

sugar:soda 1 3.1 3.1 0.327 0.575

sugar:water 1 0.1 0.1 0.013 0.910

soda:water 1 0.5 0.5 0.052 0.822

sugar:temp 1 666.1 666.1 69.660 3.19e-07

soda:temp 1 12.5 12.5 1.307 0.270

water:temp 1 12.5 12.5 1.307 0.270

sugar:soda:water 1 4.5 4.5 0.471 0.503 sugar:soda:temp 1 0.0 0.0 0.000 1.000 sugar:water:temp 1 2.0 2.0 0.209 0.654 soda:water:temp 1 0.1 0.1 0.013 0.910 sugar:soda:water:temp 1 21.1 21.1 2.209 0.157

Residuals 16 153.0 9.6

sugar ***

soda ***

water

temp ***

sugar:soda sugar:water soda:water

sugar:temp ***

soda:temp water:temp sugar:soda:water sugar:soda:temp sugar:water:temp

(7)

soda:water:temp sugar:soda:water:temp Residuals

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We see that the factors sugar, soda and temp are very significant. This supports our sugestions from taska). We also see that the interaction of sugar and temp is highly significant.

> par(mfrow=c(2,2),mar=c(3,2,3,2))

> plot(modS)

160 170 180 190

−4−2024

Fitted values

Residuals vs Fitted

3 4 26

−2 −1 0 1 2

−1.5−0.50.51.5

Theoretical Quantiles

Standardized residuals

Normal Q−Q

3

4 26

160 170 180 190

0.00.40.81.2

Scale−Location

3 4 26

−2−1012

Standardized residuals

1 2

sugar :

Constant Leverage:

Residuals vs Factor Levels

4

3

26

> library(faraway)

> halfnorm(modS$effects[-1],labs=names(modS$effects[-1]))

0.0 0.5 1.0 1.5 2.0

01020304050

Half−normal quantiles

Sorted Data

● ● ● ● ●● ● ●● ● ● ●● ● ● ● ●● ●

soda2

sugar2

Referenzen

ÄHNLICHE DOKUMENTE

The model without the predictor smoking fits sufficiently well. Moreover, the result of the Chi-squared test for the Null deviance and both deviance based individual tests

Disregarding these, the mean of the residuals is negative, so the zero expectation assumption seems to be violated.. The constant variance assumption seems to be fine (without

The model without the predictor smoking fits sufficiently well. Moreover, the result of the Chi-squared test for the Null deviance and both deviance based individual tests

Order 17 seems quite high (and hence difficult to interpret), so orders 6 or 9 would be preferred. However, we cannot see by eye whether order 6 or 9 is really sufficient; this can

This means if we calculate exp(r.pred$pred), we get the predicted values on the original scale. In Part b), however, the formation of differences needs reversing (by taking

1. a) An experimenter wishes to compare four treatments in blocks of two runs. Find a BIBD with six blocks. Find a BIBD with seven blocks. Analyze these data in a split plot anova..

Der Traininszustand (Variable “Exercise”) scheint gem¨ ass dem Plot etwas erstaunlicherweise keinen Einfluss auf die Herzfrequenz zu haben. Weiter scheint es, dass zwis- chen

Applied Statistical Regression Dr. Thus a line with slope c and axis intercept d is drawn in this subtask. Which means the line drawn is described by the equation x = cy + d, i.e..