
Extending univariate methods


Academic year: 2022


(1)


Applied Multivariate Statistics – Spring 2013

(2)

Overview

• Multivariate t-test (one sample, two samples)

• MANOVA

• Multivariate Linear Regression

(3)

Revision: One-sample z-Test

1. Model: $X_1, \dots, X_n \sim N(\mu, \sigma_X^2)$ iid, $\sigma_X$ known

2. Hypotheses: $H_0: \mu = \mu_0$, $H_A: \mu \neq \mu_0$

3. Test statistic: $T = \frac{\bar{X}_n - \mu_0}{\sigma_{\bar{X}_n}} = \frac{\sqrt{n}\,(\bar{X}_n - \mu_0)}{\sigma_X}$

If $H_0$ is true: $\bar{X}_n \sim N(\mu_0, \sigma_X^2/n)$ and thus $T \sim N(0,1)$

4. Observe the test statistic: $t$

5. Compute the p-value, i.e. the probability of seeing something as extreme as $t$ or even more extreme than $t$ if $H_0$ is true: $P(|T| > |t|)$
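Steps 3 to 5 of the z-test can be sketched in a few lines of Python. This is a minimal illustration, assuming numpy and scipy are available; the function name `z_test_one_sample`, the sample values and the "known" $\sigma_X = 0.3$ are all made up for the example.

```python
import math

import numpy as np
from scipy.stats import norm


def z_test_one_sample(x, mu0, sigma_x):
    """Two-sided one-sample z-test with known standard deviation sigma_x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Step 3: T = sqrt(n) * (mean - mu0) / sigma_x, which is N(0, 1) under H0
    t = math.sqrt(n) * (x.mean() - mu0) / sigma_x
    # Step 5: two-sided p-value P(|T| > |t|)
    p_value = 2 * norm.sf(abs(t))
    return t, p_value


# Hypothetical data; sigma_x = 0.3 is assumed to be known
x = [5.1, 4.9, 5.6, 5.3, 5.0]
t, p_value = z_test_one_sample(x, mu0=5.0, sigma_x=0.3)
print(f"t = {t:.3f}, p = {p_value:.3f}")
```

The factor 2 comes from the two-sided alternative $H_A: \mu \neq \mu_0$: both tails of $N(0,1)$ count as "at least as extreme".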

(4)

Revision: One-sample t-Test

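1. Model: $X_1, \dots, X_n \sim N(\mu, \sigma_X^2)$ iid, $\sigma_X$ unknown

2. Hypotheses: $H_0: \mu = \mu_0$, $H_A: \mu \neq \mu_0$

3. Test statistic: $T = \frac{\bar{X}_n - \mu_0}{\hat{\sigma}_{\bar{X}_n}} = \frac{\sqrt{n}\,(\bar{X}_n - \mu_0)}{\hat{\sigma}_X}$, where $\hat{\sigma}_X$ is the sample standard deviation

If $H_0$ is true: $\bar{X}_n \sim N(\mu_0, \sigma_X^2/n)$, and with $\sigma_X$ replaced by its estimate, $T \sim t_{n-1}$

4. Observe the test statistic: $t$

5. Compute the p-value, i.e. the probability of seeing something as extreme as $t$ or even more extreme than $t$ if $H_0$ is true: $P(|T| > |t|)$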
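The only change from the z-test is that $\sigma_X$ is estimated and the reference distribution becomes $t_{n-1}$. A minimal Python sketch, assuming numpy and scipy; `t_test_one_sample` is a hypothetical helper, checked against scipy's built-in `ttest_1samp`, and the data are made up.

```python
import numpy as np
from scipy import stats


def t_test_one_sample(x, mu0):
    """Two-sided one-sample t-test; sigma_X is estimated from the data."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma_hat = x.std(ddof=1)  # sample standard deviation (divisor n - 1)
    t = np.sqrt(n) * (x.mean() - mu0) / sigma_hat
    # Under H0, T ~ t_{n-1}; two-sided p-value P(|T| > |t|)
    p_value = 2 * stats.t.sf(abs(t), df=n - 1)
    return t, p_value


x = [5.1, 4.9, 5.6, 5.3, 5.0]  # hypothetical data
t, p_value = t_test_one_sample(x, mu0=5.0)
reference = stats.ttest_1samp(x, popmean=5.0)  # scipy's built-in version
print(t, reference.statistic, p_value, reference.pvalue)
```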

(5)

Hotelling’s one-sample T-Test: 𝚺 known

1. Model: $X_1, \dots, X_n \sim N_p(\mu, \Sigma)$ iid, $\Sigma$ known; $p$ dimensions

2. Hypotheses: $H_0: \mu = \mu_0$, $H_A: \mu \neq \mu_0$

3. Test statistic: $T = n(\bar{X}_n - \mu_0)^T \Sigma^{-1} (\bar{X}_n - \mu_0)$, the squared Mahalanobis distance between the sample mean and $\mu_0$ (with respect to $\mathrm{Cov}(\bar{X}_n) = \Sigma/n$)

If $H_0$ is true: $T \sim \chi^2_p$

4. Observe the test statistic: $t$

5. Compute the p-value, i.e. the probability of seeing something as extreme as $t$ or even more extreme than $t$ if $H_0$ is true: $P(T > t)$
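A minimal Python sketch of the known-$\Sigma$ test, assuming numpy and scipy. The helper name `hotelling_known_sigma`, the tiny 2-dimensional sample and the choice $\Sigma = I$ are hypothetical, picked so the result is easy to check by hand ($\bar{X}_n = (0.5, 0.5)$, so $T = 4 \cdot 0.5 = 2$).

```python
import numpy as np
from scipy.stats import chi2


def hotelling_known_sigma(X, mu0, Sigma):
    """One-sample test of H0: mu = mu0 when the covariance Sigma is known."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    diff = X.mean(axis=0) - np.asarray(mu0, dtype=float)
    # T = n * diff^T Sigma^{-1} diff; solve() avoids forming the inverse explicitly
    t = n * diff @ np.linalg.solve(Sigma, diff)
    p_value = chi2.sf(t, df=p)  # under H0, T ~ chi^2_p
    return t, p_value


# Hypothetical 2-dimensional sample; Sigma is assumed to be the identity
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
t, p_value = hotelling_known_sigma(X, mu0=[0.0, 0.0], Sigma=np.eye(2))
print(t, p_value)
```

With $p = 1$ and $\Sigma = \sigma_X^2$, $T$ is exactly the square of the one-sample z statistic, which is why $\chi^2_1$ replaces $N(0,1)$.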

(6)

Hotelling’s one-sample T-Test: 𝚺 unknown

1. Model: $X_1, \dots, X_n \sim N_p(\mu, \Sigma)$ iid, $\Sigma$ unknown; $p$ dimensions

2. Hypotheses: $H_0: \mu = \mu_0$, $H_A: \mu \neq \mu_0$

3. Test statistic: $T = n(\bar{X}_n - \mu_0)^T S^{-1} (\bar{X}_n - \mu_0)$, the estimated squared Mahalanobis distance between the sample mean and $\mu_0$ ($S$: sample covariance matrix)

If $H_0$ is true: $\frac{n-p}{p(n-1)}\, T \sim F_{p,n-p}$

4. Observe the test statistic: $t$

5. Compute the p-value, i.e. the probability of seeing something as extreme as $t$ or even more extreme than $t$ if $H_0$ is true: $P\left(F_{p,n-p} > \frac{n-p}{p(n-1)}\, t\right)$

R: Function “HotellingsT2” in package “ICSNP”
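The R function is not reproduced here; instead, a minimal Python sketch of the same test (numpy and scipy assumed, `hotelling_one_sample` is a hypothetical name, not a port of `HotellingsT2`). It includes the rescaling $\frac{n-p}{p(n-1)}$ that turns $T$ into an $F_{p,n-p}$ statistic, and it checks the $p = 1$ case, where the test must coincide with the one-sample t-test because $F_{1,n-1} = t_{n-1}^2$.

```python
import numpy as np
from scipy import stats


def hotelling_one_sample(X, mu0):
    """Hotelling's one-sample T^2 test of H0: mu = mu0 with Sigma estimated by S."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    diff = X.mean(axis=0) - np.asarray(mu0, dtype=float)
    S = np.atleast_2d(np.cov(X, rowvar=False))  # sample covariance, divisor n - 1
    t2 = n * diff @ np.linalg.solve(S, diff)
    # Rescale so that the statistic is F_{p, n-p} under H0
    f_stat = (n - p) / (p * (n - 1)) * t2
    p_value = stats.f.sf(f_stat, p, n - p)
    return t2, f_stat, p_value


# Sanity check with p = 1: F = t^2 and the p-value matches the t-test
x = np.array([5.1, 4.9, 5.6, 5.3, 5.0])  # hypothetical data
t2, f_stat, p_value = hotelling_one_sample(x.reshape(-1, 1), mu0=[5.0])
print(t2, f_stat, p_value)
```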

(7)

F distribution

$F_{m,n} = \frac{\chi^2_m / m}{\chi^2_n / n}$, where $\chi^2_m$ and $\chi^2_n$ are independent
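The definition can be checked by simulation. A minimal sketch, assuming numpy and scipy; the degrees of freedom $m = 3$, $n = 20$, the seed and the number of draws are arbitrary choices. The simulated ratio should have mean close to $n/(n-2)$ and a 95% quantile close to scipy's $F_{m,n}$ quantile.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, n, draws = 3, 20, 200_000

# Two independent chi-square samples, each divided by its degrees of freedom
ratio = (rng.chisquare(m, draws) / m) / (rng.chisquare(n, draws) / n)

# Compare the simulation with scipy's F_{m,n}: mean n / (n - 2) and 95% quantile
print(ratio.mean(), stats.f.mean(m, n))
print(np.quantile(ratio, 0.95), stats.f.ppf(0.95, m, n))
```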

(8)

Example: Change in Pulmonary Response after Exposure to Cotton Dust

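12 workers: measure lung capacity (3 dimensions)

Same 12 workers after 6 hours of exposure to cotton dust: measure lung capacity again (3 dimensions)

Paired test: take the difference for each worker and each variable, then apply Hotelling's one-sample test to the differences with $\mu_0 = 0$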
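A minimal Python sketch of the paired analysis, assuming numpy and scipy. The measurements below are simulated stand-ins (the real study data are not given here): the baseline means, the spread and the average change of -0.1 are invented purely to have something to compute on.

```python
import numpy as np
from scipy import stats

# Simulated stand-in data (NOT the study's measurements):
# 12 workers x 3 lung-capacity variables, before and after exposure
rng = np.random.default_rng(1)
before = rng.normal(loc=[4.0, 3.2, 5.0], scale=0.4, size=(12, 3))
after = before + rng.normal(loc=-0.1, scale=0.2, size=(12, 3))

# Paired design: one difference per worker and variable
D = after - before
n, p = D.shape
d_bar = D.mean(axis=0)
S = np.cov(D, rowvar=False)

# One-sample Hotelling test of H0: mean difference = 0 (Sigma unknown)
t2 = n * d_bar @ np.linalg.solve(S, d_bar)
f_stat = (n - p) / (p * (n - 1)) * t2
p_value = stats.f.sf(f_stat, p, n - p)
print(f"T^2 = {t2:.2f}, F = {f_stat:.2f}, p = {p_value:.4f}")
```

Taking differences first removes the worker-to-worker variation in baseline lung capacity, which is the point of the paired design.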

(9)

Revision: Two-sample t-Test

1. Model: $X_1, \dots, X_n \sim N(\mu_X, \sigma_X^2)$ iid, $\sigma_X$ unknown; $Y_1, \dots, Y_m \sim N(\mu_Y, \sigma_X^2)$ iid

2. Hypotheses: $H_0: \mu_X = \mu_Y$, $H_A: \mu_X \neq \mu_Y$

3. Test statistic: $T = \frac{(\bar{X}_n - \bar{Y}_m) - (\mu_X - \mu_Y)}{\hat{\sigma}_{\bar{X}_n - \bar{Y}_m}}$

If $H_0$ is true: $T \sim t_{n+m-2}$

4. Observe the test statistic: $t$

5. Compute the p-value, i.e. the probability of seeing something as extreme as $t$ or even more extreme than $t$ if $H_0$ is true: $P(|T| > |t|)$

Can be extended to $\sigma_X \neq \sigma_Y$
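A minimal Python sketch of the pooled two-sample t-test, assuming numpy and scipy; `pooled_t_test` is a hypothetical helper and the data are made up. It is compared with scipy's `ttest_ind(..., equal_var=True)`.

```python
import numpy as np
from scipy import stats


def pooled_t_test(x, y):
    """Two-sample t-test assuming a common (unknown) variance in both groups."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = x.size, y.size
    # Pooled estimate of the common variance sigma_X^2
    s2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
    se = np.sqrt(s2 * (1 / n + 1 / m))  # estimated sd of mean(x) - mean(y)
    t = (x.mean() - y.mean()) / se     # mu_X - mu_Y = 0 under H0
    p_value = 2 * stats.t.sf(abs(t), df=n + m - 2)
    return t, p_value


x = [5.1, 4.9, 5.6, 5.3, 5.0]  # hypothetical data
y = [4.7, 4.8, 5.0, 4.6]
t, p_value = pooled_t_test(x, y)
reference = stats.ttest_ind(x, y, equal_var=True)
print(t, reference.statistic, p_value, reference.pvalue)
```

For the extension to $\sigma_X \neq \sigma_Y$, scipy offers Welch's version via `stats.ttest_ind(x, y, equal_var=False)`.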

(10)

Hotelling’s Two-Sample T-Test: 𝚺 unknown, but equal in both groups

1. Model: $X_1, \dots, X_n \sim MVN(\mu_X, \Sigma)$ iid, $\Sigma$ unknown, $p$ dims.

$Y_1, \dots, Y_m \sim MVN(\mu_Y, \Sigma)$ iid

2. Hypotheses: $H_0: \mu_X = \mu_Y$, $H_A: \mu_X \neq \mu_Y$

3. Test statistic: $T = \frac{nm(n+m-p-1)}{(n+m)(n+m-2)p}\, (\bar{X}_n - \bar{Y}_m)^T S^{-1} (\bar{X}_n - \bar{Y}_m)$, where $S$ is the pooled sample covariance matrix

If $H_0$ is true: $T \sim F_{p,n+m-p-1}$

4. Observe the test statistic: $t$

5. Compute the p-value, i.e. the probability of seeing something as extreme as $t$ or even more extreme than $t$ if $H_0$ is true: $P(T > t)$

R: Function “HotellingsT2” in package “ICSNP”
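A minimal Python sketch of the two-sample statistic with pooled covariance, assuming numpy and scipy (`hotelling_two_sample` is a hypothetical name, not the ICSNP function). The example data only mimic the shape of a before/after comparison with 20 and 15 observations in 3 dimensions; all values are simulated. With $p = 1$ the statistic reduces to the square of the pooled two-sample t statistic.

```python
import numpy as np
from scipy import stats


def hotelling_two_sample(X, Y):
    """Hotelling's two-sample test of H0: mu_X = mu_Y with a common Sigma."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, p = X.shape
    m = Y.shape[0]
    d = X.mean(axis=0) - Y.mean(axis=0)
    # Pooled sample covariance matrix (estimates the common Sigma)
    S = ((n - 1) * np.cov(X, rowvar=False) + (m - 1) * np.cov(Y, rowvar=False)) / (n + m - 2)
    S = np.atleast_2d(S)
    scale = n * m * (n + m - p - 1) / ((n + m) * (n + m - 2) * p)
    f_stat = scale * (d @ np.linalg.solve(S, d))
    p_value = stats.f.sf(f_stat, p, n + m - p - 1)  # T ~ F_{p, n+m-p-1} under H0
    return f_stat, p_value


# Simulated data: 20 vs. 15 observations, 3 measurements each (hypothetical values)
rng = np.random.default_rng(2)
before = rng.normal(loc=[2.0, 30.0, 6.0], scale=[0.05, 0.2, 0.03], size=(20, 3))
after = rng.normal(loc=[2.0, 30.1, 6.0], scale=[0.05, 0.2, 0.03], size=(15, 3))
f_stat, p_value = hotelling_two_sample(before, after)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```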

(11)

Example: Quality control for screws

20 screws:

- winding [mm]

- length [mm]

- diameter [mm]

Plant struck by lightning: are the machines still adjusted correctly?

15 screws:

- winding

- length

- diameter

(12)

Revision: One-way ANOVA

• Are the expected values in three groups the same?

• ANOVA:

- Compare variation within groups and between groups

- Assume normality

[Figure: observations in groups G = 1, G = 2, G = 3. Is a common expected value plausible?]
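The comparison of within-group and between-group variation can be written out directly. A minimal Python sketch, assuming numpy and scipy; `one_way_anova` is a hypothetical helper, the three groups are made-up numbers, and the result is checked against scipy's `f_oneway`.

```python
import numpy as np
from scipy import stats


def one_way_anova(*groups):
    """F test of H0: all groups have the same expected value."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.concatenate(groups)
    N, k = all_obs.size, len(groups)
    grand_mean = all_obs.mean()
    # Variation between group means and within groups
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f_stat = (ss_between / (k - 1)) / (ss_within / (N - k))
    p_value = stats.f.sf(f_stat, k - 1, N - k)  # F_{k-1, N-k} under H0
    return f_stat, p_value


g1 = [4.1, 3.9, 4.3, 4.0]  # hypothetical measurements for G = 1, 2, 3
g2 = [4.6, 4.8, 4.5]
g3 = [3.8, 4.0, 3.7, 3.9, 4.1]
f_stat, p_value = one_way_anova(g1, g2, g3)
reference = stats.f_oneway(g1, g2, g3)
print(f_stat, reference.statistic, p_value, reference.pvalue)
```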

(13)

MANOVA

• Are the multi-dimensional expected values in three groups the same?

• MANOVA:

- Compare the within-group and between-group covariance matrices (test statistics based on eigenvalues)

- Assume normality

[Figure: observations in groups G = 1, G = 2, G = 3. Is a common expected value plausible?]
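One of the eigenvalue-based MANOVA statistics is Wilks' Lambda (others are Pillai's trace, the Hotelling-Lawley trace and Roy's largest root). A minimal Python sketch, assuming numpy only: it builds the within-group matrix $W$ and between-group matrix $B$ (sums of squares and cross-products) and computes $\Lambda = \prod_i 1/(1+\lambda_i)$ from the eigenvalues $\lambda_i$ of $W^{-1}B$, which equals $\det W / \det(W+B)$. The helper name and the simulated groups are hypothetical; p-values, which software derives from F approximations, are not computed here.

```python
import numpy as np


def manova_wilks(*groups):
    """Wilks' Lambda for H0: all groups share the same mean vector."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    grand_mean = np.vstack(groups).mean(axis=0)
    p = groups[0].shape[1]
    W = np.zeros((p, p))  # within-group sums of squares and cross-products
    B = np.zeros((p, p))  # between-group sums of squares and cross-products
    for g in groups:
        centered = g - g.mean(axis=0)
        W += centered.T @ centered
        diff = (g.mean(axis=0) - grand_mean).reshape(-1, 1)
        B += g.shape[0] * (diff @ diff.T)
    # Eigenvalues of W^{-1} B are real and >= 0; tiny imaginary parts are rounding noise
    eigvals = np.linalg.eigvals(np.linalg.solve(W, B)).real
    wilks = np.prod(1.0 / (1.0 + eigvals))
    return wilks, eigvals, W, B


# Simulated 2-dimensional data for three groups (hypothetical means)
rng = np.random.default_rng(5)
g1 = rng.normal(loc=[0.0, 0.0], size=(10, 2))
g2 = rng.normal(loc=[0.5, 0.0], size=(10, 2))
g3 = rng.normal(loc=[0.0, 0.8], size=(10, 2))
wilks, eigvals, W, B = manova_wilks(g1, g2, g3)
print(wilks, eigvals)
```

Small values of $\Lambda$ mean the between-group variation is large relative to the within-group variation, i.e. evidence against a common mean vector.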

(14)

Revision: Univariate (Multiple) Linear Regression

• N samples, p predictors, 1 response

• Univariate Linear Regression model:

$Y = \beta_0 + \sum_{j=1}^{p} X_j \beta_j + \epsilon = f(X) + \epsilon$

For N samples using matrix notation: $Y = X\beta + E$

where

Y: N*1 matrix, X: N*(p+1) (first column all ones, for $\beta_0$), $\beta$: (p+1)*1, E: N*1

• Criterion to optimize: $RSS(\beta) = \sum_{i=1}^{N} (y_i - f(x_i))^2$

• Solution: $\hat{\beta} = (X^T X)^{-1} X^T Y$
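A minimal Python sketch of the least-squares solution, assuming numpy; the data, seed and "true" coefficients are simulated for illustration. It solves the normal equations $(X^TX)\beta = X^TY$ instead of forming the inverse explicitly, and compares the result with numpy's general least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 50, 2

# Design matrix N x (p+1): a column of ones for beta_0, then p predictors
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta_true = np.array([1.0, 2.0, -0.5])  # hypothetical coefficients
Y = X @ beta_true + rng.normal(scale=0.1, size=N)

# beta_hat = (X^T X)^{-1} X^T Y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check with numpy's least-squares solver (minimizes RSS directly)
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat, beta_lstsq)
```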

(15)

Multivariate (Multiple) Linear Regression

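• N samples, p predictors, K responses

• Univariate Linear Regression model for each response $k = 1, \dots, K$:

$Y_k = \beta_{0k} + \sum_{j=1}^{p} X_j \beta_{jk} + \epsilon_k = f_k(X) + \epsilon_k$

$\mathrm{Cov}(\epsilon) = \Sigma$ with $\epsilon = (\epsilon_1, \dots, \epsilon_K)^T$: errors between responses can be correlated

For N samples using matrix notation: $Y = XB + E$

where

Y: N*K matrix, X: N*(p+1), B: (p+1)*K, E: N*K

• Criterion to optimize: $RSS(B, \Sigma) = \sum_{i=1}^{N} (y_i - f(x_i))^T \Sigma^{-1} (y_i - f(x_i))$

• Solution: $\hat{B} = (X^T X)^{-1} X^T Y$, i.e. the same coefficients as fitting each response separately by least squares; the error correlation $\Sigma$ does not change $\hat{B}$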
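A minimal Python sketch illustrating that $\hat{B}$ coincides with K separate univariate fits even when the errors are correlated, assuming numpy. The design, the coefficient matrix and the error covariance $\Sigma$ are all simulated, hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, K = 60, 2, 3

# Design matrix with a leading column of ones for the intercepts beta_{0k}
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
B_true = rng.normal(size=(p + 1, K))  # hypothetical coefficients

# Correlated errors across the K responses: each row of E ~ N(0, Sigma)
Sigma = np.array([[1.0, 0.8, 0.3],
                  [0.8, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])
E = rng.multivariate_normal(np.zeros(K), Sigma, size=N)
Y = X @ B_true + E  # N x K

# Joint least-squares solution B_hat = (X^T X)^{-1} X^T Y
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Same coefficients from K separate univariate regressions
B_separate = np.column_stack(
    [np.linalg.lstsq(X, Y[:, k], rcond=None)[0] for k in range(K)]
)
print(np.round(B_hat, 3))
```

The correlation between responses only matters for inference, e.g. joint tests such as Wilks' Lambda across all responses, not for the point estimates.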

(16)

Are MANOVA and Multivariate Linear Regression useful?

• Multivariate regression and MANOVA are not well supported in statistical software (including R)

• Useful if you want to test whether a predictor has an influence on any of the responses

• Possible in theory, but not well supported:

- simultaneous confidence intervals for several parameters

- tests among parameters of different responses

• R: Function “lm” with a matrix as y and “summary(…, test = “Wilks”)”

(17)

Concepts to know

• Hotelling’s T-test

• Idea of MANOVA and Multivariate Regression

(18)

R functions to know

• “HotellingsT2”

• “Manova”

• “lm” with y being a matrix
