Multidimensional Scaling

Academic year: 2022

(1)

Multidimensional Scaling

Applied Multivariate Statistics – Spring 2013

(2)

Outline

• Fundamental Idea

• Classical Multidimensional Scaling

• Non-metric Multidimensional Scaling

(3)

Basic Idea

How to represent in two dimensions?

(4)

Idea 1: Projection

(5)

Idea 2: Squeeze on table

Close points stay close

(6)

Which idea is better?

(7)

Idea of MDS

• Represent a high-dimensional point cloud in few (usually 2) dimensions, keeping distances between points similar

• Classical/Metric MDS: Use a clever projection

R: cmdscale

• Non-metric MDS: Squeeze data on table, only conserve ranks

R: isoMDS

(8)

Classical MDS

• Problem: Given Euclidean distances among points, recover the positions of the points!

• Example: Road distances between 21 European cities (almost Euclidean, but not quite)

(9)

Classical MDS

• First try:

(10)

Classical MDS

• Flip axes:

Points can be identified only up to

- shift

- rotation

- reflection

(11)

Classical MDS

• Another example: Air pollution in US cities

• The ranges of manu and popul are much bigger than the range of wind

• Need to standardize to give every variable equal weight
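The effect of standardizing can be sketched in a few lines. This is only an illustration in Python, not the R code used in the course; the toy data and the helper names `standardize` and `euclidean_distances` are hypothetical. Each column is scaled to mean 0 and sample standard deviation 1 before distances are computed.

```python
import math
from statistics import mean, stdev

def standardize(rows):
    """Scale every column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def euclidean_distances(rows):
    """Full matrix of pairwise Euclidean distances between rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]

# Hypothetical toy data: column 1 has a large range, column 2 a small one
data = [[100.0, 9.0], [900.0, 10.0], [500.0, 12.0], [300.0, 7.0]]

raw_d = euclidean_distances(data)               # dominated by column 1
std_d = euclidean_distances(standardize(data))  # both columns contribute
```

Without standardizing, the raw distances are driven almost entirely by the wide-range column; after standardizing, both columns carry comparable weight.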

(12)

Classical MDS

(13)

Classical MDS: Theory

• Input: Euclidean distances between n objects in p dimensions

• Output: Positions of points up to rotation, reflection, shift

• Two steps:

- Compute inner products matrix B from distances

- Compute positions from B

(14)

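Classical MDS: Theory – Step 1

• Inner products matrix $B = XX^T$, where X is the n * q data matrix:

$$b_{ij} = \sum_{k=1}^{q} x_{ik} x_{jk}$$

• Connect to distance:

$$d_{ij}^2 = \sum_{k=1}^{q} (x_{ik} - x_{jk})^2 = \dots = b_{ii} + b_{jj} - 2 b_{ij}$$

• Center points to avoid shift invariance:

$$\bar{x} = 0 \;\rightarrow\; \sum_{i=1}^{n} x_{ik} = 0 \;\rightarrow\; \sum_{i=1}^{n} b_{ij} = \sum_{j=1}^{n} b_{ij} = 0$$

• Invert relationship ("doubly centered"):

$$b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - d_{i\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2\right)$$

Here $d_{i\cdot}^2$, $d_{\cdot j}^2$ and $d_{\cdot\cdot}^2$ denote the row mean, column mean and overall mean of the squared distances.

(Hint for middle of page 108: Plug in (4.3) and equations on top of page 108 to show that the expression involving d's is equal to $b_{ij}$)

• Thus, we obtained B from the distance matrix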
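The double-centering formula can be checked on a tiny example. The following is a minimal Python sketch, not the course's R code; the function name `double_center` and the toy points are assumptions made for illustration. For points that are already centered, the result should equal the inner products $x_i x_j$.

```python
def double_center(D):
    """Return B with b_ij = -1/2 (d_ij^2 - d_i.^2 - d_.j^2 + d_..^2)."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]

# Toy check: three points on a line at -1, 0, 1 (mean is already 0)
points = [-1.0, 0.0, 1.0]
D = [[abs(a - b) for b in points] for a in points]
B = double_center(D)  # should match the inner products points[i] * points[j]
```

Only the distance matrix D enters the computation; the coordinates are used here solely to verify the result.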

(15)

Classical MDS: Theory – Step 2

• Since $B = XX^T$, we need the "square root" of B

• B is a symmetric and positive semidefinite n*n matrix

• Thus, B can be diagonalized:

$$B = V \Lambda V^T$$

$\Lambda$ is a diagonal matrix with $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$ on the diagonal ("eigenvalues")

V contains as columns the normalized eigenvectors

• Some eigenvalues will be zero; drop them:

$$B = V_1 \Lambda_1 V_1^T$$

• Take "square root":

$$X = V_1 \Lambda_1^{1/2}$$

• Thus we obtained the positions of the points from the distances between all points
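Both steps together can be sketched as follows. This is an illustrative Python version assuming NumPy is available; it is not the implementation behind R's cmdscale, and the function name `classical_mds` and the rectangle test data are hypothetical. The recovered positions should reproduce the input distances, even though the points themselves may be shifted, rotated or reflected.

```python
import numpy as np

def classical_mds(D, m=2):
    """Recover m-dimensional positions from a Euclidean distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # Step 1: doubly centered B
    eigvals, eigvecs = np.linalg.eigh(B)       # ascending order, B symmetric
    order = np.argsort(eigvals)[::-1][:m]      # indices of the m largest
    lam = np.clip(eigvals[order], 0.0, None)   # guard against round-off below 0
    X = eigvecs[:, order] * np.sqrt(lam)       # Step 2: X = V_1 Lambda_1^(1/2)
    return X, eigvals[::-1]                    # eigenvalues in descending order

# Hypothetical test data: corners of a 3 x 4 rectangle
P = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)

X, eigvals = classical_mds(D, m=2)
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # distances of X
```

Comparing `D_hat` with `D` rather than `X` with `P` is deliberate, since only the distances are identifiable.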

(16)

Classical MDS: Low-dim representation

• Keep only the few (e.g. 2) largest eigenvalues and the corresponding eigenvectors

• The resulting X will be the low-dimensional representation we were looking for

• Goodness of fit (GOF) if we reduce to m dimensions:

$$\mathrm{GOF} = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$$

(should be at least 0.8)

• Finds "optimal" low-dim representation: Minimizes

$$S = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( d_{ij}^2 - (d_{ij}^{(m)})^2 \right)$$

where $d_{ij}^{(m)}$ is the distance between points i and j in the m-dimensional representation
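The GOF criterion can be turned into a small helper for choosing m. This is a sketch in Python under the assumption that all eigenvalues are non-negative; the eigenvalues and the names `gof` and `smallest_dimension` are hypothetical, and the 0.8 threshold is the rule of thumb stated above.

```python
def gof(eigenvalues, m):
    """Share of the eigenvalue sum captured by the m largest eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)
    return sum(lam[:m]) / sum(lam)

def smallest_dimension(eigenvalues, threshold=0.8):
    """Smallest m whose GOF reaches the threshold."""
    for m in range(1, len(eigenvalues) + 1):
        if gof(eigenvalues, m) >= threshold:
            return m
    return len(eigenvalues)

# Hypothetical eigenvalues of B (non-negative, as for Euclidean input)
eigenvalues = [6.0, 3.0, 1.0, 0.0]

gof_1 = gof(eigenvalues, 1)           # 6 / 10
gof_2 = gof(eigenvalues, 2)           # 9 / 10
m_needed = smallest_dimension(eigenvalues)
```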

(17)

Classical MDS: Pros and Cons

+ Optimal for Euclidean input data

+ Still optimal if B has non-negative eigenvalues (pos. semidefinite)

+ Very fast

- No guarantees if B has negative eigenvalues

However, in practice, it is still used then. New measures for goodness of fit:

$$\mathrm{GOF} = \frac{\sum_{i=1}^{m} |\lambda_i|}{\sum_{i=1}^{n} |\lambda_i|}$$

$$\mathrm{GOF} = \frac{\sum_{i=1}^{m} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i^2}$$

$$\mathrm{GOF} = \frac{\sum_{i=1}^{m} \max(0, \lambda_i)}{\sum_{i=1}^{n} \max(0, \lambda_i)}$$

Used in R function "cmdscale"
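To see how the three measures differ once a negative eigenvalue appears, they can be evaluated side by side. The sketch below is plain Python that follows the formulas as written on this slide; it makes no claim about the exact computation inside cmdscale, and the eigenvalues and function names are hypothetical.

```python
def gof_abs(lam, m):
    """GOF based on absolute eigenvalues."""
    return sum(abs(l) for l in lam[:m]) / sum(abs(l) for l in lam)

def gof_squared(lam, m):
    """GOF based on squared eigenvalues."""
    return sum(l * l for l in lam[:m]) / sum(l * l for l in lam)

def gof_positive(lam, m):
    """GOF that ignores negative eigenvalues."""
    return sum(max(0.0, l) for l in lam[:m]) / sum(max(0.0, l) for l in lam)

# Hypothetical eigenvalues, sorted in descending order, one of them negative
lam = [5.0, 3.0, 1.0, -1.0]

g_abs = gof_abs(lam, 2)        # 8 / 10
g_sq = gof_squared(lam, 2)     # 34 / 36
g_pos = gof_positive(lam, 2)   # 8 / 9
```

The measures agree when all eigenvalues are non-negative only in the first and third case; the squared version weights large eigenvalues more heavily.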

(18)

Non-metric MDS: Idea

• Sometimes, there is no strict metric on the original points

• Example: How beautiful are these persons?

(1: Not at all, 10: Very much)

2 6 9 OR 1 5 10 ??

(19)

Non-metric MDS: Idea

• Absolute values are not that meaningful

• Ranking is important

• Non-metric MDS finds a low-dimensional representation which respects the ranking of distances

(20)

Non-metric MDS: Theory

• $\delta_{ij}$ is the true dissimilarity, $d_{ij}$ is the distance of representation

• Minimize STRESS ($\theta$ is an increasing function):

$$S = \frac{\sum_{i<j} \left(\theta(\delta_{ij}) - d_{ij}\right)^2}{\sum_{i<j} d_{ij}^2}$$

• Optimize over both position of points and $\theta$

• $\hat{d}_{ij} = \theta(\delta_{ij})$ is called "disparity"

• Solved numerically (isotonic regression); classical MDS as starting value; very time consuming
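For a fixed configuration of points, the best increasing $\theta$ can be found by isotonic regression, and STRESS then follows from the formula above. The Python sketch below covers only this inner step, not the full optimization over point positions; the pool-adjacent-violators helper `isotonic_fit`, the function `stress` and the representation distances are assumptions for illustration.

```python
def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to values."""
    blocks = []  # each block stores [sum, count]
    for v in values:
        blocks.append([v, 1])
        # merge while the previous block mean exceeds the last block mean
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def stress(delta, d):
    """STRESS and disparities for dissimilarities delta and distances d.

    Both lists hold one value per pair i < j, in the same pair order.
    """
    order = sorted(range(len(delta)), key=lambda k: delta[k])
    fitted = isotonic_fit([d[k] for k in order])
    d_hat = [0.0] * len(d)
    for pos, k in enumerate(order):
        d_hat[k] = fitted[pos]
    num = sum((h - x) ** 2 for h, x in zip(d_hat, d))
    den = sum(x ** 2 for x in d)
    return num / den, d_hat

# Pairs AB, BC, AC: dissimilarities and hypothetical representation distances
delta = [2.0, 3.0, 5.0]
d = [2.5, 2.0, 5.0]          # AB and BC are in the wrong order here
s, d_hat = stress(delta, d)  # disparities pool AB and BC to their mean
```

If the representation distances already follow the ranking of the dissimilarities, the disparities equal the distances and STRESS is zero.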

(21)

Non-metric MDS: Example for intuition (only)

[Figure: true points A, B, C in high-dimensional space with $\delta_{AB} < \delta_{BC} < \delta_{AC}$; values shown: AB = 2, BC = 3, AC = 5]

Compute best representation: STRESS = 19.7

(22)

Non-metric MDS: Example for intuition (only)

[Figure: true points A, B, C in high-dimensional space with $\delta_{AB} < \delta_{BC} < \delta_{AC}$; values shown: AB = 2, BC = 2.7, AC = 4.8]

Compute best representation: STRESS = 20.1

(23)

Non-metric MDS: Example for intuition (only)

[Figure: true points A, B, C in high-dimensional space with $\delta_{AB} < \delta_{BC} < \delta_{AC}$; values shown: AB = 2, BC = 2.9, AC = 5.2]

Compute best representation: STRESS = 18.9

We will finally represent the "transformed true distances" (called disparities):

$$\hat{d}_{AB} = 2, \quad \hat{d}_{BC} = 2.9, \quad \hat{d}_{AC} = 5.2$$

instead of the true distances:

$$\delta_{AB} = 2, \quad \delta_{BC} = 3, \quad \delta_{AC} = 5$$

Stop if minimal STRESS is found.
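The defining property of the disparities in this example is that they keep the ranking of the true distances. A short Python check illustrates this; it assumes that the three numbers on each of the example slides refer to the pairs AB, BC, AC in the same layout as the final slide, and the helper name `ranking` is hypothetical.

```python
def ranking(values):
    """Pair indices ordered from smallest to largest value (no ties assumed)."""
    return sorted(range(len(values)), key=lambda k: values[k])

pairs = ["AB", "BC", "AC"]
delta = [2.0, 3.0, 5.0]  # true distances from the example

# Candidate disparities tried in the example (pair assignment is an assumption)
candidates = [
    [2.0, 3.0, 5.0],     # STRESS = 19.7
    [2.0, 2.7, 4.8],     # STRESS = 20.1
    [2.0, 2.9, 5.2],     # STRESS = 18.9, the final choice
]

# Every candidate must order the pairs exactly as the true distances do
preserved = [ranking(c) == ranking(delta) for c in candidates]
```

The STRESS values in the comments are copied from the slides; they are not recomputed here because the underlying configurations are not given.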

(24)

Non-metric MDS: Pros and Cons

+ Fulfills a clear objective without many assumptions (minimize STRESS)

+ Results don't change with rescaling or monotonic variable transformation

+ Works even if you only have rank information

- Slow in large problems

- Usually only local (not global) optimum found

- Only gets ranks of distances right

(25)

Non-metric MDS: Example

• Do people in the same party vote alike?

• Data: for each pair of 15 congressmen, the number of votes (out of 19) on which they disagreed

(26)

Non-metric MDS: Example

(27)

Concepts to know

• Classical MDS:

- Finds low-dim projection that respects distances

- Optimal for Euclidean distances

- No clear guarantees for other distances

- Fast

• Non-metric MDS:

- Squeezes data points on table

- Respects only rankings of distances

- (Locally) solves clear objective

- Slow

(28)

R commands to know

• cmdscale included in standard R distribution

• isoMDS from package "MASS"
