
Academic year: 2022

(1)

Visualization 1

Applied Multivariate Statistics – Spring 2013

(2)

Goals

• Covariance, Correlation (true / sample version)

• Test for zero correlation: Fisher’s z-Transformation

• Scatterplot / Scatterplot matrix

• Covariance matrix / Correlation matrix

• Multivariate Normal Distribution

• Mahalanobis distance

(3)

Visualization in 1d

(4)

Normal distribution in 1d:

Most common model choice

$\varphi_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \cdot \frac{(x-\mu)^2}{\sigma^2}\right)$

(5)

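Normal distribution in 1d:

Most common model choice

$\varphi_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \cdot \frac{(x-\mu)^2}{\sigma^2}\right)$

The term $\frac{(x-\mu)^2}{\sigma^2}$ is the squared Mahalanobis distance = squared distance from the mean in standard deviations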
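
To make the role of the squared Mahalanobis distance in the 1d density concrete, here is a minimal Python sketch (the course itself works in R). The function name `normal_pdf` and its parameter names are illustrative assumptions, not part of the slides.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x, written via the squared Mahalanobis distance."""
    # Squared distance from the mean, measured in standard deviations
    md2 = (x - mu) ** 2 / sigma2
    return math.exp(-0.5 * md2) / math.sqrt(2 * math.pi * sigma2)

# Points at the same Mahalanobis distance from the mean get the same density
left = normal_pdf(3.0, mu=5.0, sigma2=4.0)   # one standard deviation below the mean
right = normal_pdf(7.0, mu=5.0, sigma2=4.0)  # one standard deviation above the mean
print(left, right)
```

The density depends on x only through `md2`, which is why contours of equal density are contours of equal Mahalanobis distance.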

(6)

One variable: Expected value and variance

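• Expected value: $\mu = E[X] = \int x f(x)\,dx$

Estimate: Mean $\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

• Variance: $\sigma_X^2 = \mathrm{Var}(X) = E[(X - E[X])^2] = \int (x - E[X])^2 f(x)\,dx$

Estimate: Sample variance $\hat{\sigma}_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

• Standard deviation: $\sigma_X = \sqrt{\mathrm{Var}(X)}$

Estimate: Square root of the sample variance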
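
The three estimates above can be sketched in a few lines of Python (standard library only). The function names and the data values are made up for illustration; note the divisor n − 1 in the sample variance.

```python
import math

def sample_mean(xs):
    """Estimate of the expected value: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Estimate of the variance with divisor n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_sd(xs):
    """Estimate of the standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values only
print(sample_mean(data), sample_variance(data), sample_sd(data))
```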

(7)

Two variables: Covariance and Correlation

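• Covariance: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] \in (-\infty, \infty)$

• Correlation: $\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} \in [-1, 1]$

• Sample covariance: $\widehat{\mathrm{Cov}}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

• Sample correlation: $r_{xy} = \widehat{\mathrm{Cor}}(x, y) = \frac{\widehat{\mathrm{Cov}}(x, y)}{\hat{\sigma}_x \hat{\sigma}_y}$

• Correlation is invariant to changes in units, covariance is not

(e.g. kilo/gram, meter/kilometer, etc.)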
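
A small Python sketch of the sample covariance and sample correlation, checking the unit-invariance claim by converting kilograms to grams. Function names and the weight/height values are invented for illustration.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def sample_cor(xs, ys):
    """Sample correlation: covariance divided by both sample standard deviations."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

weight_kg = [60.0, 72.0, 81.0, 55.0, 90.0]   # illustrative values only
height_m = [1.65, 1.80, 1.78, 1.60, 1.92]    # illustrative values only
weight_g = [1000.0 * w for w in weight_kg]   # same weights in grams

cov_kg, cov_g = sample_cov(weight_kg, height_m), sample_cov(weight_g, height_m)
cor_kg, cor_g = sample_cor(weight_kg, height_m), sample_cor(weight_g, height_m)
print(cov_kg, cov_g)  # covariance scales with the unit
print(cor_kg, cor_g)  # correlation does not
```

Changing the unit multiplies the covariance by the conversion factor, while the same factor cancels in the correlation.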

(8)

Scatterplot: Correlation is scale invariant

(9)

Intuition and pitfalls for correlation

Correlation = LINEAR relation

Source: Wikipedia
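
One way to see the pitfall numerically: a perfectly deterministic but non-linear relation can have sample correlation zero. The sketch below uses y = x² on a symmetric grid; the data and function names are illustrative assumptions, not taken from the slides.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def sample_cor(xs, ys):
    """Sample correlation coefficient."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys_quadratic = [x ** 2 for x in xs]      # y is a function of x, but not a linear one
ys_linear = [2.0 * x + 1.0 for x in xs]  # exact linear relation

print(sample_cor(xs, ys_quadratic))  # no linear association detected
print(sample_cor(xs, ys_linear))     # perfect linear association
```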

(10)

Test for zero correlation: Fisher’s z-Test

• X, Y (bivariate) normally distributed with true correlation $\rho$

• Take n samples

• Compute sample correlation r

• Compute $z = \frac{1}{2} \log\left(\frac{1+r}{1-r}\right)$

• Compute $\xi = \frac{1}{2} \log\left(\frac{1+\rho}{1-\rho}\right)$

• For large n: $\sqrt{n-1}\,(z - \xi) \sim N(0, 1)$

• Use cor.test() in R to test and get confidence intervals
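
The steps of the z-test can be sketched in Python as follows. This is a hand-rolled illustration of the large-n approximation stated on this slide, not a reimplementation of R's cor.test(); all function names are assumptions. The scaling √(n − 1) follows the slide; many references use n − 3 instead, and for large n the difference is negligible.

```python
import math

def fisher_z(r):
    """Fisher's z-transformation: 0.5 * log((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def fisher_z_test(r, n, rho0=0.0):
    """Two-sided test of H0: rho = rho0, using sqrt(n - 1) * (z - xi) ~ N(0, 1)."""
    stat = math.sqrt(n - 1) * (fisher_z(r) - fisher_z(rho0))
    p_value = 2 * (1 - normal_cdf(abs(stat)))
    return stat, p_value

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for rho, back-transformed with tanh."""
    z = fisher_z(r)
    half_width = z_crit / math.sqrt(n - 1)
    return math.tanh(z - half_width), math.tanh(z + half_width)

stat, p_value = fisher_z_test(r=0.5, n=50)  # illustrative r and n
ci_low, ci_high = fisher_ci(r=0.5, n=50)
print(stat, p_value, (ci_low, ci_high))
```

The back-transformation uses tanh because Fisher's z is the inverse hyperbolic tangent of r.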

(11)

Many dimensions: Scatterplot matrix

(12)

Covariance matrix / correlation matrix:

Table of pairwise values

• True covariance matrix: $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$

• True correlation matrix: $C_{ij} = \mathrm{Cor}(X_i, X_j)$

• Sample covariance matrix: $S_{ij} = \widehat{\mathrm{Cov}}(x_i, x_j)$

Diagonal: Variances

• Sample correlation matrix: $R_{ij} = \widehat{\mathrm{Cor}}(x_i, x_j)$

Diagonal: 1
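
As a sketch of how these tables of pairwise values are built, the Python code below computes S and R from a small data matrix (rows = observations, columns = variables). The data and function names are illustrative assumptions; in R the same tables come from cov and cor.

```python
import math

def cov_matrix(rows):
    """Sample covariance matrix S with S[i][j] = sample covariance of columns i and j."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]

def cor_matrix(rows):
    """Sample correlation matrix R, obtained by rescaling S with the standard deviations."""
    s = cov_matrix(rows)
    p = len(s)
    sd = [math.sqrt(s[j][j]) for j in range(p)]
    return [[s[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

rows = [  # illustrative data: 5 observations of 3 variables
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 4.0, 1.0],
    [4.0, 3.0, 2.5],
    [5.0, 5.0, 2.0],
]
S = cov_matrix(rows)
R = cor_matrix(rows)
print(S)
print(R)
```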

(13)

Multivariate Normal Distribution:

Most common model choice

$f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} \cdot (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$

(14)

Multivariate Normal Distribution:

Fun facts

If X1, …, Xp are multivariate normal, then

• every linear combination Y = a1 X1 + … + ap Xp is normally distributed

• every projection onto a subspace is multivariate normally distributed

If the margins are normally distributed, it is NOT GUARANTEED that the underlying distribution is multivariate normal

(i.e., “multivariate normal” is stronger than just normal margins)

(15)

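Multivariate Normal Distribution: Example 1

1000 random samples

$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

Diagonal entries of $\Sigma$: variance along X1 and variance along X2

Off-diagonal entries of $\Sigma$: covariance btw. X1 and X2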

(16)

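Multivariate Normal Distribution: Example 2

1000 random samples

$\mu = \begin{pmatrix} 5 \\ 10 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 10 & 3 \\ 3 & 2 \end{pmatrix}$

Diagonal entries of $\Sigma$: variance along X1 and variance along X2

Off-diagonal entries of $\Sigma$: covariance btw. X1 and X2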
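
A rough Python sketch of how 1000 samples with the mean and covariance of Example 2 could be drawn. It uses a 2×2 Cholesky factor and independent standard normals, which is one common construction and not necessarily what R's mvrnorm does internally; the function names and the seed are arbitrary assumptions.

```python
import math
import random

def cholesky_2x2(sigma):
    """Lower-triangular L with L L^T = sigma, for a 2x2 covariance matrix."""
    a, b, c = sigma[0][0], sigma[0][1], sigma[1][1]
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return l11, l21, l22

def sample_mvn_2d(mu, sigma, n, rng):
    """Draw n samples x = mu + L z, where z has independent N(0, 1) entries."""
    l11, l21, l22 = cholesky_2x2(sigma)
    samples = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
    return samples

mu = [5.0, 10.0]
sigma = [[10.0, 3.0], [3.0, 2.0]]
samples = sample_mvn_2d(mu, sigma, 1000, random.Random(2013))  # arbitrary seed

# Sample mean, variances and covariance should be close to mu and sigma
n = len(samples)
mean1 = sum(x1 for x1, _ in samples) / n
mean2 = sum(x2 for _, x2 in samples) / n
var1 = sum((x1 - mean1) ** 2 for x1, _ in samples) / (n - 1)
var2 = sum((x2 - mean2) ** 2 for _, x2 in samples) / (n - 1)
cov12 = sum((x1 - mean1) * (x2 - mean2) for x1, x2 in samples) / (n - 1)
print(mean1, mean2, var1, var2, cov12)
```

With only 1000 samples the estimates fluctuate around the true values, so they will match $\mu$ and $\Sigma$ only approximately.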

(17)

Multivariate Normal Distribution:

Most common model choice (p dimensions)

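$f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} \cdot (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$

The term $(x-\mu)^T \Sigma^{-1} (x-\mu)$ is the sq. Mahalanobis distance $MD^2(x)$ = sq. distance from the mean in standard deviations IN DIRECTION OF X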

(18)

Mahalanobis distance: Example

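$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 25 & 0 \\ 0 & 1 \end{pmatrix}$

Point (20,0): MD = 4

Point (0,10): MD = 10

Point (10,7): MD = 7.3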
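
The three distances in this example can be checked with a short Python sketch. The helper names are assumptions, and the explicit 2×2 inverse stands in for what solve would do in R.

```python
import math

def inverse_2x2(m):
    """Inverse of a 2x2 matrix via the closed-form adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mahalanobis(x, mu, sigma):
    """Mahalanobis distance sqrt((x - mu)^T sigma^{-1} (x - mu)) in two dimensions."""
    inv = inverse_2x2(sigma)
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    q = d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    return math.sqrt(q)

mu = [0.0, 0.0]
sigma = [[25.0, 0.0], [0.0, 1.0]]
for point in [(20.0, 0.0), (0.0, 10.0), (10.0, 7.0)]:
    print(point, round(mahalanobis(point, mu, sigma), 1))
```

The point (20,0) is furthest from the mean in Euclidean terms but closest in Mahalanobis distance, because X1 has standard deviation 5 while X2 has standard deviation 1.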

(19)

Concepts to know

• Covariance, Correlation (true / sample version)

• Test for zero correlation: Fisher’s z-Transformation

• Scatterplot / Scatterplot matrix

• Covariance matrix / Correlation matrix

• Multivariate Normal Distribution

• Mahalanobis distance

(20)

R commands to know

• read.csv, head, str, dim

• colMeans, cov, cor

• mvrnorm, t, solve, %*%
