Multidimensional Scaling

(1)

Multidimensional Scaling

Applied Multivariate Statistics – Spring 2013

(2)

Outline

 Fundamental Idea

 Classical Multidimensional Scaling

 Non-metric Multidimensional Scaling

(3)

Basic Idea

How to represent in two dimensions?

(4)

Idea 1: Projection

(5)

Idea 2: Squeeze on table

Close points stay close

(6)

Which idea is better?

(7)

Idea of MDS

 Represent high-dimensional point cloud in few (usually 2) dimensions keeping distances between points similar

 Classical/Metric MDS: Use a clever projection (R: cmdscale)

 Non-metric MDS: Squeeze data on table, only conserve ranks (R: isoMDS)

(8)

Classical MDS

 Problem: Given Euclidean distances among points, recover the positions of the points!

 Example: Road distances between 21 European cities (almost Euclidean, but not quite)

…

(9)

Classical MDS

 First try:

(10)

Classical MDS

 Flip axes:

Can identify points only up to
- shift
- rotation
- reflection
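Why the positions are only determined up to shift, rotation and reflection can be checked numerically: all three operations leave every pairwise distance unchanged. The following is a minimal sketch with a made-up 2-D configuration; the helper names (`shift`, `rotate`, `reflect`, `same_distances`) are illustrative and not taken from the slides.

```python
import math

def pairwise_distances(points):
    """Euclidean distance between every pair of 2-D points."""
    return [[math.dist(p, q) for q in points] for p in points]

def shift(points, dx, dy):
    """Move all points by the same vector (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]

def rotate(points, angle):
    """Rotate all points around the origin by `angle` (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def reflect(points):
    """Mirror all points (flip the sign of the first coordinate)."""
    return [(-x, y) for x, y in points]

def same_distances(a, b, tol=1e-9):
    """True if both configurations have identical distance matrices."""
    da, db = pairwise_distances(a), pairwise_distances(b)
    return all(abs(u - v) < tol
               for row_a, row_b in zip(da, db)
               for u, v in zip(row_a, row_b))

# Made-up example configuration (not the city data from the slides).
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
moved = reflect(rotate(shift(points, 2.0, -1.0), math.pi / 3))

print(same_distances(points, moved))
```

Since the distance matrix is the only input to MDS, it cannot distinguish `points` from `moved`; flipping axes of the resulting map is therefore always allowed.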

(11)

Classical MDS

 Another example: Air pollution in US cities

 Range of manu and popul is much bigger than range of wind

 Need to standardize to give every variable equal weight
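A small sketch of the standardization step, assuming z-scores (mean 0, standard deviation 1 per variable) are what is meant. The three columns mimic the variables manu, popul and wind, but the numbers are invented for illustration and are not the air pollution data.

```python
import math
import statistics

def standardize(rows):
    """Scale every column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]  # divides by n - 1
    return [[(v - m) / s for v, m, s in zip(row, means, sds)]
            for row in rows]

# Invented values: columns play the role of manu, popul, wind.
rows = [
    [35.0, 71.0, 6.0],
    [213.0, 582.0, 9.1],
    [1692.0, 3369.0, 10.4],
    [91.0, 132.0, 8.2],
]
z = standardize(rows)

# Before scaling the distance is dominated by the large-range columns;
# after scaling all three variables contribute on the same scale.
print(math.dist(rows[0], rows[1]))
print(math.dist(z[0], z[1]))
```

The distance matrix passed to MDS would then be computed from `z` rather than from `rows`.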

(12)

Classical MDS

(13)

Classical MDS: Theory

 Input: Euclidean distances between n objects in q dimensions

 Output: Position of points up to rotation, reflection, shift

 Two steps:

- Compute inner products matrix B from distance
- Compute positions from B

(14)

Classical MDS: Theory – Step 1

 Inner products matrix $B = XX^T$, where $X$ is the $n \times q$ data matrix:

$b_{ij} = \sum_{k=1}^{q} x_{ik} x_{jk}$

 Connect to distance:

$d_{ij}^2 = \sum_{k=1}^{q} (x_{ik} - x_{jk})^2 = \dots = b_{ii} + b_{jj} - 2 b_{ij}$

 Center points to avoid shift invariance:

$\bar{x} = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_{ik} = 0 \;\Rightarrow\; \sum_{i \text{ or } j} b_{ij} = 0$

 Invert relationship (“doubly centered”):

$b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - d_{i\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2\right)$

(Hint for middle of page 108: Plug in (4.3) and equations on top of page 108 to show that the expression involving d’s is equal to b_ij)

 Thus, we obtained B from the distance matrix
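Step 1 can be sketched in a few lines, reading the dot subscripts as means over the dotted index (row mean, column mean, overall mean of the squared distances). The function name `double_center` and the toy configuration are illustrative; the check at the end compares the result with the inner products of the already centered points.

```python
def double_center(d2):
    """Inner product matrix B from a matrix of squared distances."""
    n = len(d2)
    row_mean = [sum(row) / n for row in d2]                 # d^2_{i.}
    col_mean = [sum(d2[i][j] for i in range(n)) / n
                for j in range(n)]                          # d^2_{.j}
    total_mean = sum(row_mean) / n                          # d^2_{..}
    return [[-0.5 * (d2[i][j] - row_mean[i] - col_mean[j] + total_mean)
             for j in range(n)]
            for i in range(n)]

# Toy configuration whose column means are zero (already centered).
x = [(-2.0, -1.0), (0.0, 2.0), (2.0, -1.0)]

# Squared Euclidean distances d_ij^2 between all pairs.
d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in x] for p in x]

b = double_center(d2)

# Direct inner products x_i . x_j for comparison.
inner = [[sum(a * c for a, c in zip(p, q)) for q in x] for p in x]
print(b)
print(inner)
```

Only the distances enter `double_center`; the coordinates in `x` are used here solely to build the input and to verify the output.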

(15)

Classical MDS: Theory – Step 2

 Since $B = XX^T$, we need the “square root” of B

 B is a symmetric and positive semidefinite n*n matrix

 Thus, B can be diagonalized: $B = V \Lambda V^T$

$\Lambda$ is a diagonal matrix with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ on the diagonal (“eigenvalues”)

V contains the normalized eigenvectors as columns

 Some eigenvalues will be zero; drop them: $B = V_1 \Lambda_1 V_1^T$

 Take “square root”: $X = V_1 \Lambda_1^{1/2}$

 Thus we obtained the positions of the points from the distances between all points
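Step 2 can be sketched with NumPy's symmetric eigensolver `np.linalg.eigh`. This is an assumption about tooling (the slides use R), and the function name `positions_from_b`, the tolerance and the toy matrix are illustrative only.

```python
import numpy as np

def positions_from_b(b, tol=1e-9):
    """Coordinates X with X X^T = B for a symmetric PSD matrix B."""
    eigenvalues, eigenvectors = np.linalg.eigh(b)   # ascending order
    order = np.argsort(eigenvalues)[::-1]           # largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Drop eigenvalues that are zero up to rounding error.
    keep = eigenvalues > tol * max(eigenvalues[0], 1.0)
    # Scale each kept eigenvector by the square root of its eigenvalue.
    return eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])

# Toy inner product matrix built from a centered 2-D configuration.
x_true = np.array([[-2.0, -1.0], [0.0, 2.0], [2.0, -1.0]])
b = x_true @ x_true.T

x = positions_from_b(b)
print(x.shape)
print(np.allclose(x @ x.T, b))
```

The recovered `x` generally differs from `x_true` by a rotation or reflection, but it reproduces `b` and therefore all pairwise distances. Keeping only the first two columns of the result gives the low-dimensional map.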

(16)

Classical MDS: Low-dim representation

 Keep only few (e.g. 2) largest eigenvalues and corresponding eigenvectors

 The resulting X will be the low-dimensional representation we were looking for

 Goodness of fit (GOF) if we reduce to m dimensions:

$\mathrm{GOF} = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$ (should be at least 0.8)

 Finds “optimal” low-dim representation: Minimizes

$S = \sum_{i=1}^{n} \sum_{j=1}^{n} \left(d_{ij}^2 - (d_{ij}^{(m)})^2\right)$
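As a sketch of how the GOF rule of thumb could be applied: compute the ratio for increasing m and stop at the first m that reaches 0.8. The eigenvalues below are made up and assumed non-negative; the function names are illustrative.

```python
def goodness_of_fit(eigenvalues, m):
    """Share of the eigenvalue sum carried by the m largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:m]) / sum(ordered)

def smallest_dimension(eigenvalues, threshold=0.8):
    """Smallest m whose GOF reaches the threshold."""
    for m in range(1, len(eigenvalues) + 1):
        if goodness_of_fit(eigenvalues, m) >= threshold:
            return m
    return len(eigenvalues)

# Made-up, non-negative eigenvalues of B.
eigenvalues = [6.0, 3.0, 1.0, 0.0, 0.0]

for m in range(1, 4):
    print(m, goodness_of_fit(eigenvalues, m))
print(smallest_dimension(eigenvalues))
```

With these values one dimension explains 60% of the eigenvalue sum and two dimensions 90%, so a two-dimensional map would pass the 0.8 threshold.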

(17)

Classical MDS: Pros and Cons

+ Optimal for Euclidean input data

+ Still optimal, if B has non-negative eigenvalues (pos. semidefinite)

+ Very fast

- No guarantees if B has negative eigenvalues

However, in practice, it is still used then. New measures for Goodness of fit:

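$\mathrm{GOF} = \frac{\sum_{i=1}^{m} |\lambda_i|}{\sum_{i=1}^{n} |\lambda_i|}$

$\mathrm{GOF} = \frac{\sum_{i=1}^{m} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i^2}$

$\mathrm{GOF} = \frac{\sum_{i=1}^{m} \max(0, \lambda_i)}{\sum_{i=1}^{n} \max(0, \lambda_i)}$ (used in R function “cmdscale”)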
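The three alternative measures differ only in how each eigenvalue is transformed before summing, which the following sketch makes explicit. It implements the formulas as written on the slide, with made-up eigenvalues including one negative value; it is not a reimplementation of what `cmdscale` reports.

```python
def gof(eigenvalues, m, transform):
    """Ratio of transformed eigenvalue sums: first m over all n."""
    ordered = sorted(eigenvalues, reverse=True)
    top = sum(transform(v) for v in ordered[:m])
    total = sum(transform(v) for v in ordered)
    return top / total

def positive_part(v):
    """max(0, v): negative eigenvalues contribute nothing."""
    return max(0.0, v)

def square(v):
    return v * v

# Made-up eigenvalues; the negative one signals non-Euclidean input.
eigenvalues = [5.0, 3.0, 1.0, -1.0]

print(gof(eigenvalues, 2, abs))            # absolute values
print(gof(eigenvalues, 2, square))         # squares
print(gof(eigenvalues, 2, positive_part))  # positive parts
```

For non-negative eigenvalues the first and third variants coincide with the plain ratio of eigenvalue sums.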

(18)

Non-metric MDS: Idea

 Sometimes, there is no strict metric on original points

 Example: How beautiful are these persons?

(1: Not at all, 10: Very much)

2, 6, 9 OR 1, 5, 10 ??

(19)

Non-metric MDS: Idea

 Absolute values are not that meaningful

 Ranking is important

 Non-metric MDS finds a low-dimensional representation which respects the ranking of distances

(20)

Non-metric MDS: Theory

 $\delta_{ij}$ is the true dissimilarity, $d_{ij}$ is the distance of representation

 Minimize STRESS ($\theta$ is an increasing function):

$S = \frac{\sum_{i<j} \left(\theta(\delta_{ij}) - d_{ij}\right)^2}{\sum_{i<j} d_{ij}^2}$

 Optimize over both position of points and $\theta$

 $\hat{d}_{ij} = \theta(\delta_{ij})$ is called “disparity”

 Solved numerically (isotonic regression); Classical MDS as starting value; very time consuming
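For a fixed configuration, the optimal increasing function can be found by isotonic regression, after which STRESS follows directly from the formula. The sketch below uses the pool-adjacent-violators algorithm as one common way to do isotonic regression; it covers only this inner step, not the outer optimization over point positions, and it does not claim to match what `isoMDS` does internally. The function names and the configuration distances are made up.

```python
def isotonic_fit(values):
    """Pool adjacent violators: least-squares non-decreasing fit."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block mean exceeds the following one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def disparities(delta, d):
    """theta(delta): increasing in delta and as close as possible to d."""
    order = sorted(range(len(delta)), key=lambda k: delta[k])
    fitted = isotonic_fit([d[k] for k in order])
    d_hat = [0.0] * len(delta)
    for position, k in enumerate(order):
        d_hat[k] = fitted[position]
    return d_hat

def stress(delta, d):
    """STRESS as defined above, with disparities from isotonic regression."""
    d_hat = disparities(delta, d)
    numerator = sum((h - v) ** 2 for h, v in zip(d_hat, d))
    return numerator / sum(v ** 2 for v in d)

# Dissimilarities for the pairs AB, BC, AC and made-up map distances.
delta = [2.0, 3.0, 5.0]
d_bad = [2.0, 3.5, 3.0]    # BC and AC are in the wrong order
d_good = [1.0, 2.0, 4.0]   # same ranking as delta

print(disparities(delta, d_bad))
print(stress(delta, d_bad))
print(stress(delta, d_good))
```

A configuration whose distances already have the same ranking as the dissimilarities gets STRESS 0, whatever the absolute values are; only rank violations are penalized.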

(21)

Non-metric MDS: Example for intuition (only)

[Figure: true points A, B, C in high-dimensional space, with pairwise distances 2, 3 and 5]

$\delta_{AB} < \delta_{BC} < \delta_{AC}$

Compute best representation

STRESS = 19.7

(22)

Non-metric MDS: Example for intuition (only)

[Figure: points A, B, C with the distances changed to 2, 2.7 and 4.8]

$\delta_{AB} < \delta_{BC} < \delta_{AC}$

STRESS = 20.1

(23)

Non-metric MDS: Example for intuition (only)

[Figure: points A, B, C with the distances changed to 2, 2.9 and 5.2]

$\delta_{AB} < \delta_{BC} < \delta_{AC}$

STRESS = 18.9

We will finally represent the “transformed true distances” (called disparities):

$\hat{d}_{AB} = 2,\ \hat{d}_{BC} = 2.9,\ \hat{d}_{AC} = 5.2$

instead of the true distances:

$\delta_{AB} = 2,\ \delta_{BC} = 3,\ \delta_{AC} = 5$

Stop if minimal STRESS is found.

(24)

Non-metric MDS: Pros and Cons

+ Fulfills a clear objective without many assumptions (minimize STRESS)

+ Results don’t change with rescaling or monotonic variable transformation

+ Works even if you only have rank information

- Slow in large problems

- Usually only local (not global) optimum found

- Only gets ranks of distances right

(25)

Non-metric MDS: Example

 Do people in the same party vote alike?

 Data: for each pair of 15 congressmen, the number of votes (out of 19) on which they disagreed

…

(26)

Non-metric MDS: Example

(27)

Concepts to know

 Classical MDS:

- Finds low-dim projection that respects distances
- Optimal for Euclidean distances
- No clear guarantees for other distances
- Fast

 Non-metric MDS:

- Squeezes data points on table
- Respects only rankings of distances
- (Locally) solves clear objective
- Slow

(28)

R commands to know

 cmdscale included in standard R distribution

 isoMDS from package “MASS”
