Multidimensional Scaling

Applied Multivariate Statistics – Spring 2012

Outline

• Fundamental Idea

• Classical Multidimensional Scaling

• Non-metric Multidimensional Scaling

Basic Idea

How to represent in two dimensions?

Idea 1: Projection

Idea 2: Squeeze on table

Close points stay close

Which idea is better?

Idea of MDS

• Represent a high-dimensional point cloud in few (usually 2) dimensions while keeping the distances between points similar

• Classical/Metric MDS: Use a clever projection (R: cmdscale)

• Non-metric MDS: Squeeze data on table (R: isoMDS)

Classical MDS

• Problem: Given Euclidean distances among points, recover the positions of the points!

• Example: Road distances between 21 European cities (almost Euclidean, but not quite)

Classical MDS

• First try:

Classical MDS

• Flip axes:

Can identify points only up to
- shift
- rotation
- reflection
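The reason positions are only identifiable up to shift, rotation and reflection is that none of these operations changes any pairwise distance. A minimal sketch of that check, in Python with NumPy rather than the R used in the slides; the points, the angle and the helper `pairwise_distances` are made up for illustration:

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                      # five made-up points in the plane

angle = 0.7                                      # arbitrary rotation angle
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])  # rotation matrix
F = np.diag([1.0, -1.0])                         # reflection: flip the second axis
shift = np.array([3.0, -2.0])                    # translation

Y = X @ R.T @ F + shift                          # rotate, reflect, then shift

# The two configurations differ, but their distance matrices agree
same_distances = np.allclose(pairwise_distances(X), pairwise_distances(Y))
```

Any map produced from a distance matrix alone may therefore come out rotated or mirrored relative to a geographic map, and flipping axes afterwards is legitimate.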

Classical MDS

• Another example: Air pollution in US cities

• The range of manu and popul is much bigger than the range of wind

• Need to standardize to give every variable equal weight
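Standardizing here means centering each variable and dividing by its standard deviation before distances are computed, so that variables with a large range do not dominate. A small sketch under stated assumptions: the numbers are invented (they are not the air pollution data), the columns merely play the roles of manu, popul and wind, and `standardize` is a hypothetical helper written in Python with NumPy rather than R:

```python
import numpy as np

def standardize(X):
    """Centre each column and scale it to unit sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Invented values; columns stand in for manu, popul, wind
X = np.array([[  35.0,   71.0,  6.0],
              [ 213.0,  582.0,  9.4],
              [  91.0,  132.0,  8.2],
              [1064.0, 1513.0, 10.1]])

Z = standardize(X)   # every column now has mean 0 and standard deviation 1
```

Euclidean distances computed from `Z` weight all three variables equally, whereas distances computed from `X` would be driven almost entirely by the first two columns.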

Classical MDS

Classical MDS: Theory

• Input: Euclidean distances between n objects in p dimensions

• Output: Positions of the points up to rotation, reflection, shift

• Two steps:

- Compute the inner products matrix B from the distances
- Compute the positions from B

Classical MDS: Theory – Step 1

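• Inner products matrix: $B = XX^T$

• Connect to distance: $d_{ij}^2 = b_{ii} + b_{jj} - 2 b_{ij}$

• Center points to avoid shift invariance

• Invert relationship (“doubly centered”):

$b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - d_{i\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2\right)$

Here $d_{i\cdot}^2$, $d_{\cdot j}^2$ and $d_{\cdot\cdot}^2$ denote the row, column and overall means of the squared distances.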
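Step 1 can be checked numerically: double centering the squared distances should reproduce the inner products of the centered points. The following is a sketch in Python with NumPy (not the R used in the slides); the function name `double_center` and the random test points are assumptions made for the example:

```python
import numpy as np

def double_center(D):
    """Return B = -1/2 * J D^2 J, where J = I - (1/n) 1 1^T is the centering matrix.

    Entry (i, j) equals -1/2 * (d_ij^2 - row mean - column mean + overall mean).
    """
    D2 = np.asarray(D, dtype=float) ** 2
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ D2 @ J

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))                 # six made-up points in 3 dimensions
Xc = X - X.mean(axis=0)                     # centered points

diff = Xc[:, None, :] - Xc[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))       # Euclidean distance matrix

B = double_center(D)                        # should match Xc @ Xc.T
```

Writing the double centering as a product with the centering matrix is only a compact way of subtracting row and column means and adding back the overall mean.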

Classical MDS: Theory – Step 2

• Since $B = XX^T$, we need the “square root” of B

• B is a symmetric and positive semidefinite n×n matrix

• Thus, B can be diagonalized: $B = V \Lambda V^T$

$\Lambda$ is a diagonal matrix with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ on the diagonal (“eigenvalues”)

V contains as columns the normalized eigenvectors

• Some eigenvalues will be zero; drop them: $B = V_1 \Lambda_1 V_1^T$

• Take “square root”: $X = V_1 \Lambda_1^{1/2}$
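Both steps together fit in a few lines. This is a minimal sketch in Python with NumPy, not the implementation behind R's cmdscale; the function name `classical_mds`, its return values and the test points are assumptions for illustration:

```python
import numpy as np

def classical_mds(D, m=2):
    """Recover an m-dimensional configuration from a distance matrix D.

    Returns the coordinates and all eigenvalues of B in decreasing order.
    """
    D2 = np.asarray(D, dtype=float) ** 2
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J                      # step 1: double centering

    eigvals, eigvecs = np.linalg.eigh(B)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to decreasing
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    lam = np.clip(eigvals[:m], 0.0, None)      # guard against tiny negative values
    X = eigvecs[:, :m] * np.sqrt(lam)          # step 2: X = V_1 Lambda_1^(1/2)
    return X, eigvals

# Made-up planar points: their distances should be reproduced exactly
rng = np.random.default_rng(2)
P = rng.normal(size=(7, 2))
D = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))

X_hat, eigvals = classical_mds(D, m=2)
D_hat = np.sqrt(((X_hat[:, None, :] - X_hat[None, :, :]) ** 2).sum(axis=-1))
```

The recovered `X_hat` generally differs from `P` by a shift, rotation or reflection, but `D_hat` matches `D` up to rounding.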

Classical MDS: Low-dim representation

• Keep only the few (e.g. 2) largest eigenvalues and the corresponding eigenvectors

• The resulting X will be the low-dimensional representation we were looking for

• Goodness of fit (GOF) if we reduce to m dimensions (should be at least 0.8):

$\mathrm{GOF} = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$

• Finds the “optimal” low-dim representation: Minimizes

$S = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( d_{ij}^2 - (d_{ij}^{(m)})^2 \right)$
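The goodness of fit is simply the share of the eigenvalue sum captured by the m largest eigenvalues. A small worked example in Python, with made-up eigenvalues and a hypothetical helper `goodness_of_fit`:

```python
def goodness_of_fit(eigvals, m):
    """Share of the eigenvalue sum captured by the m largest eigenvalues."""
    lam = sorted(eigvals, reverse=True)
    return sum(lam[:m]) / sum(lam)

eigvals = [6.0, 3.0, 0.8, 0.2]        # invented, non-negative eigenvalues of B
gof = goodness_of_fit(eigvals, m=2)   # (6 + 3) / 10 = 0.9
```

With these values a two-dimensional representation would pass the 0.8 rule of thumb from the slide.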

Classical MDS: Pros and Cons

+ Optimal for Euclidean input data

+ Still optimal if B has non-negative eigenvalues (pos. semidefinite)

+ Very fast

- No guarantees if B has negative eigenvalues

However, in practice, it is still used then. New measures for goodness of fit:

$\mathrm{GOF} = \frac{\sum_{i=1}^{m} |\lambda_i|}{\sum_{i=1}^{n} |\lambda_i|}$

$\mathrm{GOF} = \frac{\sum_{i=1}^{m} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i^2}$

$\mathrm{GOF} = \frac{\sum_{i=1}^{m} \max(0, \lambda_i)}{\sum_{i=1}^{n} \max(0, \lambda_i)}$
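The three alternative measures differ only in how negative eigenvalues enter the sums. The sketch below evaluates them on an invented spectrum with one negative eigenvalue; the function names are hypothetical and the eigenvalues are assumed to be sorted in decreasing order:

```python
def gof_abs(lam, m):
    """GOF based on absolute values of the eigenvalues."""
    return sum(abs(x) for x in lam[:m]) / sum(abs(x) for x in lam)

def gof_squared(lam, m):
    """GOF based on squared eigenvalues."""
    return sum(x ** 2 for x in lam[:m]) / sum(x ** 2 for x in lam)

def gof_positive(lam, m):
    """GOF that counts negative eigenvalues as zero."""
    return sum(max(0.0, x) for x in lam[:m]) / sum(max(0.0, x) for x in lam)

lam = [5.0, 3.0, 1.0, -1.0]   # invented eigenvalues, decreasing, one negative
m = 2

g_abs = gof_abs(lam, m)        # 8 / 10
g_sq = gof_squared(lam, m)     # 34 / 36
g_pos = gof_positive(lam, m)   # 8 / 9
```

On the same spectrum the three measures give 0.8, about 0.94 and about 0.89, so it matters which one is reported.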

Non-metric MDS: Idea

• Sometimes, there is no strict metric on the original points

• Example: How much do you like the portraits? (1: Not at all, 10: Very much)

Ratings 2, 6, 9 or 1, 5, 10?

Non-metric MDS: Idea

• Absolute values are not that meaningful

• Ranking is important

• Non-metric MDS finds a low-dimensional representation that respects the ranking of distances
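"Respecting the ranking" can be made concrete: whenever one pair of objects is more dissimilar than another, its distance in the representation should not be smaller. A minimal check in Python, with a hypothetical helper `respects_ranking` and illustrative values for three objects A, B, C:

```python
from itertools import combinations

def respects_ranking(delta, d):
    """True if no two pairs are ordered one way in delta and the other way in d."""
    for a, b in combinations(list(delta), 2):
        if (delta[a] - delta[b]) * (d[a] - d[b]) < 0:
            return False
    return True

# Only the ranks of the dissimilarities are known: AB < BC < AC
delta = {("A", "B"): 1, ("B", "C"): 2, ("A", "C"): 3}

d_good = {("A", "B"): 2.0, ("B", "C"): 2.9, ("A", "C"): 5.2}   # same ordering
d_bad = {("A", "B"): 3.0, ("B", "C"): 2.0, ("A", "C"): 5.0}    # AB and BC swapped

ok_good = respects_ranking(delta, d_good)
ok_bad = respects_ranking(delta, d_bad)
```

Note that `d_good` passes even though its values have nothing to do with the numbers 1, 2, 3; only the order matters.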

Non-metric MDS: Theory

• $\delta_{ij}$ is the true dissimilarity, $d_{ij}$ is the distance in the representation

• Minimize STRESS ($\theta$ is an increasing function):

$S = \frac{\sum_{i<j} \left(\theta(\delta_{ij}) - d_{ij}\right)^2}{\sum_{i<j} d_{ij}^2}$

• Optimize over both the positions of the points and $\theta$

• $\hat{d}_{ij} = \theta(\delta_{ij})$ is called “disparity”

• Solved numerically (isotonic regression); classical MDS as starting value; very time consuming
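For a fixed configuration, the best increasing function is found by isotonic regression of the distances, ordered by dissimilarity. The sketch below evaluates the STRESS formula as written on the slide for one fixed set of distances; it does not optimize the point positions, it assumes there are no ties among the dissimilarities, and the function names are hypothetical. It is pure Python and is not how isoMDS is implemented:

```python
def isotonic_regression(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to the sequence y."""
    blocks = []                                   # each block is [sum, count]
    for value in y:
        blocks.append([value, 1])
        # Merge blocks while the previous block mean exceeds the current one
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def stress(delta, d):
    """STRESS for dissimilarities delta and distances d, both listed over pairs i < j."""
    order = sorted(range(len(delta)), key=lambda k: delta[k])
    fitted = isotonic_regression([d[k] for k in order])
    disparities = [0.0] * len(d)
    for k, value in zip(order, fitted):
        disparities[k] = value                    # disparity = theta(delta)
    numerator = sum((dh - dk) ** 2 for dh, dk in zip(disparities, d))
    denominator = sum(dk ** 2 for dk in d)
    return numerator / denominator, disparities

delta = [1, 2, 3]                  # ranks of the dissimilarities for AB, BC, AC
s_bad, disp_bad = stress(delta, [3.0, 2.0, 5.0])    # order of AB and BC violated
s_good, disp_good = stress(delta, [2.0, 2.9, 5.2])  # order respected
```

In the first case the two offending distances are pooled to a common disparity of 2.5, giving a STRESS of 0.5/38; in the second the distances are already monotone, so the disparities equal the distances and the STRESS is zero. A full non-metric MDS alternates this step with moving the points.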

Non-metric MDS: Example for intuition (only)

True points in high-dimensional space: A, B, C with $d_{AB} < d_{BC} < d_{AC}$

Compute best representation: distances 2, 3 and 5, STRESS = 19.7

Non-metric MDS: Example for intuition (only)

True points in high-dimensional space: A, B, C with $d_{AB} < d_{BC} < d_{AC}$

Compute best representation: distances 2, 2.7 and 4.8, STRESS = 20.1

Non-metric MDS: Example for intuition (only)

True points in high-dimensional space: A, B, C with $d_{AB} < d_{BC} < d_{AC}$

Compute best representation: distances 2, 2.9 and 5.2, STRESS = 18.9

Stop if minimal STRESS is found.

We will finally represent the distances $d_{AB} = 2$, $d_{BC} = 2.9$, $d_{AC} = 5.2$

Non-metric MDS: Pros and Cons

+ Fulfills a clear objective without many assumptions (minimize STRESS)

+ Results don’t change with rescaling or monotonic variable transformation

+ Works even if you only have rank information

- Slow in large problems

- Usually only a local (not global) optimum is found

- Only gets ranks of distances right

Non-metric MDS: Example

• Do people in the same party vote alike?

• Agreement of 15 congressmen in 19 votes
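To feed voting records into MDS they first have to be turned into pairwise dissimilarities. One plausible choice, which is an assumption here and not stated on the slide, is to count the votes on which two members disagree. The toy records below are invented and much smaller than the 15 by 19 data set; `disagreement_matrix` is a hypothetical helper:

```python
def disagreement_matrix(votes):
    """votes[i] is the string of votes of member i; count pairwise disagreements."""
    n = len(votes)
    return [[sum(a != b for a, b in zip(votes[i], votes[j]))
             for j in range(n)]
            for i in range(n)]

# Invented records for three members on five votes (Y = yes, N = no)
votes = ["YYNYN",
         "YYNNN",
         "NNYNY"]

D = disagreement_matrix(votes)
# D == [[0, 1, 5],
#       [1, 0, 4],
#       [5, 4, 0]]
```

Such counts are natural input for non-metric MDS, since only their ordering needs to be trusted.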

Non-metric MDS: Example

Concepts to know

• Classical MDS:

- Finds low-dim projection that respects distances
- Optimal for Euclidean distances
- No clear guarantees for other distances
- Fast

• Non-metric MDS:

- Squeezes data points on table
- Respects only rankings of distances
- (Locally) solves clear objective
- Slow

R commands to know

• cmdscale: included in the standard R distribution

• isoMDS: from package “MASS”
