Dimension reduction, embeddings
Sándor Kisfaludi-Bak
Computational Geometry, Summer semester 2020
Overview
• Embeddings, distortion, Johnson-Lindenstrauss
• Random partitions
• Embedding into HSTs
• Further embeddings into Euclidean space
Embeddings, distortion
Definition. An embedding f from the metric space (X, dist_X) to (Y, dist_Y) is K-bi-Lipschitz if there exists a c > 0 such that for all x, x′ ∈ X we have
c · dist_X(x, x′) ≤ dist_Y(f(x), f(x′)) ≤ cK · dist_X(x, x′).
Definition. The distortion of an embedding f : X → Y is the smallest ∆ s.t. f is ∆-bi-Lipschitz.
If Y = ℝ^d, then we want
dist(x, x′) ≤ ‖f(x) − f(x′)‖₂ ≤ ∆ · dist(x, x′).
Why distortion is necessary
Take Y = ℝ^d, and let X be the star metric on {a, b, c, d}: dist_X(a, b) = dist_X(a, c) = dist_X(a, d) = 1, and distance 2 between any two of b, c, d.
[Figure: the star X with center a, and the images of b, c, d, which must be pairwise at distance ≥ 2 for a non-contracting embedding]
Where to put a?
min max{‖a − b‖, ‖a − c‖, ‖a − d‖} is attained when a is the circumcenter of b, c, d...
... and when bcd is equilateral of side length 2.
The distortion is ‖b − a‖ / dist_X(a, b) = 2/√3, since the circumradius of an equilateral triangle with side 2 is 2/√3.
In general, the n-star needs distortion Ω(n^{1/d}) when Y = ℝ^d.
The Johnson-Lindenstrauss Lemma
Theorem (Johnson, Lindenstrauss 1984). Given n points P ⊆ ℝ^{n−1} and ε ∈ (0, 1], there is an embedding f : P → ℝ^d with distortion 1 + ε, where d = O(log n / ε²).
a.k.a. "dimension reduction", "JL lemma"
• works for any ambient dimension ℝ^N
• f can be: orthogonal projection to a random d-dimensional subspace (suitably scaled)
• can be derandomized (Engebretsen et al. 2002)
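A minimal Python sketch of dimension reduction in this spirit, using a scaled random Gaussian matrix instead of a random orthogonal projection (a standard alternative JL construction); the constant 8 in the target dimension is one common choice, and `jl_embed` is a name made up here.

```python
import numpy as np

def jl_embed(P, eps, rng=None):
    """Project the rows of P (an n x D array) to d = O(log n / eps^2)
    dimensions with a scaled random Gaussian matrix; with high probability
    every pairwise distance is preserved up to a factor of 1 +- eps."""
    rng = np.random.default_rng() if rng is None else rng
    n, D = P.shape
    d = int(np.ceil(8 * np.log(n) / eps**2))  # the constant 8 is one common choice
    G = rng.standard_normal((D, d)) / np.sqrt(d)
    return P @ G

# usage: the unit simplex e_1, ..., e_n becomes almost equidistant in R^{O(log n)}
P = np.eye(500)            # pairwise distances are all sqrt(2)
Q = jl_embed(P, eps=0.5)
# pairwise distances of Q now lie in [(1 - eps)sqrt(2), (1 + eps)sqrt(2)] w.h.p.
```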
Almost equidistant set in ℝ^{O(log n)}
Let e_i = (0, . . . , 0, 1, 0, . . . , 0) (the i-th standard basis vector).
The set e₁, . . . , e_n is equidistant (unit simplex).
It can't be embedded isometrically into ℝ^d if d < n − 1. But!
Folklore. For any fixed ε > 0, there is a set P of n points in ℝ^{O(log n)} s.t. ‖p − p′‖₂ ∈ [1, 1 + ε] for all distinct p, p′ ∈ P.
Proof. Apply the JL lemma to the simplex above (and rescale).
Random partitions
Partitions, probabilistic partitions
Goal: partition (X, dist) into clusters of diameter at most ∆, s.t. x, y ∈ X are in the same cluster iff dist(x, y) ≤ ∆.
Clearly unattainable!
P_X: set of all partitions of X. Pick a random partition Π ∈ P_X from some distribution D over P_X.
Revised goal: Pr(x, x′ are separated in Π) is small if dist(x, x′) is small.
Example: X = ℝ.
Partition into intervals [x₀ + i∆, x₀ + (i + 1)∆), where x₀ is a uniformly random shift in [0, ∆).
Pr(x, y are separated) ≤ |x − y| / ∆.
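A small sketch of the shifted interval partition; `interval_cluster` is a hypothetical helper, and the simulation just checks the |x − y|/∆ bound empirically (for these values the separation probability is exactly 0.2).

```python
import random

def interval_cluster(x, delta, shift):
    """Index of the interval [shift + i*delta, shift + (i+1)*delta) containing x."""
    return int((x - shift) // delta)

# empirical check of Pr(x, y separated) <= |x - y| / delta
delta, x, y, trials = 1.0, 0.3, 0.5, 100_000
sep = sum(interval_cluster(x, delta, s) != interval_cluster(y, delta, s)
          for s in (random.uniform(0, delta) for _ in range(trials)))
print(sep / trials)   # ~0.2, matching the bound |x - y| / delta = 0.2
```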
Random partition for any metric space
Set ∆ = 2^u.
Let σ be a uniformly random permutation of X, and let α ∈ [1/4, 1/2] be uniformly random.
Greedy partition:
Put all points within distance R := α∆ of σ₁ into the first cluster.
Remove the cluster from σ, and repeat with the first remaining point.
Cluster diameter is at most 2R = 2α∆ ≤ ∆. ✓
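A minimal sketch of this greedy partition, assuming the metric is given as a callable `dist`; `random_partition` is a name made up here.

```python
import random

def random_partition(points, dist, delta, rng=random):
    """Greedy random partition: a uniformly random permutation and a random
    radius R = alpha*delta with alpha ~ U[1/4, 1/2]; the first remaining
    point grabs everything within distance R of it, then repeat.
    Every cluster has diameter at most 2R <= delta."""
    sigma = list(points)
    rng.shuffle(sigma)
    R = rng.uniform(0.25, 0.5) * delta
    clusters, remaining = [], sigma
    while remaining:
        center, rest = remaining[0], remaining[1:]
        clusters.append([center] + [p for p in rest if dist(center, p) <= R])
        remaining = [p for p in rest if dist(center, p) > R]
    return clusters
```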
Clustering quality
Lemma. For any x ∈ X and t ≤ ∆/8,
Pr(B(x, t) ⊄ Π(x)) ≤ (8t/∆) · ln(M/m),
where m = # of points at distance ≤ ∆/8 from x, and M = # of points at distance ≤ ∆ from x.
Proof. Let U = set of points w with B(w, ∆/2) ∩ B(x, t) ≠ ∅.
U = (w₁, . . . , w_{|U|}) := sorted by increasing distance from x.
E_k := event that w_k is the first point in σ whose cluster intersects B(x, t), BUT B(x, t) ⊄ Π(w_k).
If B(x, t) ⊄ Π(x), then some E_k must occur.
E_k only if R is in some range
Let I_k = [dist(x, w_k) − t, dist(x, w_k) + t].
Claim: R ∉ I_k ⇒ Pr(E_k) = 0.
If dist(x, w_k) < R − t, then B(w_k, R) ⊇ B(x, t), so Pr(E_k) = 0.
If dist(x, w_k) > R + t, then B(w_k, R) ∩ B(x, t) = ∅, so E_k is impossible.
⇒ Pr(E_k) = 0 if k ≤ m or k > M, since R ∈ [∆/4, ∆/2] and t ≤ ∆/8.
[Figure: x with B(x, t), the ball of radius ∆/8 around x, and w_k with R ∈ [∆/4, ∆/2]]
Pr(E_k) = Pr(E_k ∩ (R ∈ I_k)) = Pr(R ∈ I_k) · Pr(E_k | R ∈ I_k),
and Pr(R ∈ I_k) ≤ length(I_k) / (∆/2 − ∆/4) = 2t / (∆/4) = 8t/∆.
The points w₁, . . . , w_{k−1} are closer to x than w_k, so if one of them (w_i) occurs before w_k in σ, then w_k is not the first to scoop from B(x, t), as dist(x, w_i) ≤ dist(x, w_k) ≤ R + t.
⇒ Pr(E_k | R ∈ I_k) ≤ 1/k.
Random partition quality estimate
Pr(B(x, t) ⊄ Π(x)) = Σ_{k=1}^{|U|} Pr(E_k) = Σ_{k=m+1}^{M} Pr(E_k)
= Σ_{k=m+1}^{M} Pr(R ∈ I_k) · Pr(E_k | R ∈ I_k)
≤ Σ_{k=m+1}^{M} (8t/∆) · (1/k)
< (8t/∆) · ln(M/m). □
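A quick empirical sanity check of the lemma on random points in the plane, reusing the `random_partition` sketch from above; the constants are arbitrary test values.

```python
import math, random

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
x, delta = pts[0], 0.5
t = delta / 8

bad, trials = 0, 2000
for _ in range(trials):
    clusters = random_partition(pts, math.dist, delta)
    pi_x = next(c for c in clusters if x in c)           # the cluster Pi(x)
    bad += any(math.dist(x, p) <= t and p not in pi_x for p in pts)

m = sum(math.dist(x, p) <= delta / 8 for p in pts)
M = sum(math.dist(x, p) <= delta for p in pts)
print(bad / trials, "<=", 8 * t / delta * math.log(M / m))
```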
Embedding into HSTs
HSTs and quadtrees
Definition. A hierarchically well-separated tree (HST) is a metric space on the leaves of a rooted tree T where each vertex v has a label ∆_v ≥ 0 s.t.
• leaves have label ∆_v = 0,
• each internal vertex v has ∆_v > 0, and ∆_u ≤ ∆_v for any child u,
• if x, x′ are leaves, then dist_T(x, x′) = ∆_{lca(x,x′)}.
Example: quadtree.
T = quadtree, ∆_v = diameter of cell v.
‖x − x′‖₂ ≤ ∆_{lca(x,x′)} = dist_T(x, x′): a bad embedding of P ⊂ ℝ^d into a tree metric.
k-HST: an HST where ∆_u ≤ ∆_v/k for each child u of v.
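A minimal representation of an HST matching this definition; `HSTNode` and `hst_dist` are hypothetical names, and the distance really is just the label of the lowest common ancestor.

```python
from dataclasses import dataclass, field

@dataclass
class HSTNode:
    label: float                       # Delta_v: 0 at leaves, positive inside
    point: object = None               # the metric-space point, at leaves only
    children: list = field(default_factory=list)

def hst_dist(root, x, y):
    """dist_T(x, y) = label of the lowest common ancestor of leaves x and y."""
    def path_to(v, p):                 # root-to-leaf path holding point p
        if not v.children:
            return [v] if v.point == p else None
        for child in v.children:
            tail = path_to(child, p)
            if tail is not None:
                return [v] + tail
        return None
    lca = root
    for a, b in zip(path_to(root, x), path_to(root, y)):
        if a is b:
            lca = a
        else:
            break
    return lca.label
```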
Probabilistic embedding into a 2-HST
A randomized algorithm for a non-contracting embedding of X into an HST T has probabilistic distortion
max_{x,y∈X} E(dist_T(x, y)) / dist_X(x, y).
Theorem. Given (X, dist), there is a randomized embedding into a 2-HST with probabilistic distortion ≤ 24 ln n.
Proof. Wlog. scale X so that diam(X) = 1.
Start with P = X, and set T's root label to 1.
Compute a random partition with ∆ = diam(P)/2, and set the diameters of the partition classes as the child labels. Recurse on each class.
Level of a node v in T: ⌈log ∆_v⌉ ≤ 0.
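A sketch of this recursion, reusing `HSTNode` and `random_partition` from above. Child labels are cluster diameters, which are at most ∆ = diam(P)/2, so the result is indeed a 2-HST; the guard for coincident points is an implementation detail not on the slide.

```python
import random

def diameter(points, dist):
    return max((dist(p, q) for p in points for q in points), default=0.0)

def build_hst(points, dist, rng=random):
    """Randomized 2-HST embedding: label the node by the current diameter,
    split with a random partition at Delta = diam/2, recurse on each cluster."""
    if len(points) == 1:
        return HSTNode(label=0.0, point=points[0])
    diam = diameter(points, dist)
    node = HSTNode(label=diam)
    if diam == 0.0:                    # coincident points: make leaves directly
        node.children = [HSTNode(label=0.0, point=p) for p in points]
        return node
    for cluster in random_partition(points, dist, diam / 2, rng):
        node.children.append(build_hst(cluster, dist, rng))
    return node
```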
Bounding distortion of the random HST embedding
Suppose x, y ∈ X have lca u in T. Then dist_T(x, y) = ∆_u ≤ 2^{level(u)}.
σ: the path from the root of T to the leaf x; σ_i: the level-i node of σ (if it exists).
Set t := dist_X(x, y) and n_i := |B_X(x, 2^i)|.
E_i: event that B_X(x, t) ⊄ Π(σ_i).
Y_i: indicator that E_i occurs but E_j does not occur for any j > i.
We have dist_T(x, y) ≤ Σ_i 2^i Y_i. Set j := ⌊log t⌋.
If i < j, then Pr(E_i) = 0 ⇒ E(Y_i) = 0.
If i ≥ j, then by the clustering quality lemma (with ∆ = 2^i),
E(Y_i) = Pr(E_i ∩ Ē_{i+1} ∩ · · · ∩ Ē_0) ≤ Pr(E_i) ≤ (8t/2^i) · ln(|B_X(x, 2^i)| / |B_X(x, 2^i/8)|) = (8t/2^i) · ln(n_i/n_{i−3}).
Distortion bound wrap-up:
E(dist_T(x, y)) ≤ E(Σ_i 2^i Y_i) = Σ_i 2^i E(Y_i) ≤ Σ_{i=j}^{0} 2^i · (8t/2^i) · ln(n_i/n_{i−3})
= 8t · ln Π_{i=j}^{0} (n_i/n_{i−3}) ≤ 8t · ln(n_0 · n_{−1} · n_{−2}) ≤ 8t · ln n³ = 24t ln n,
since the product telescopes and each n_i ≤ n.
k-median in HST
Computing k-median in an HST is "easy":
• make it into a binary HST (new nodes get the same label),
• dynamic program: the subproblem at node v with parameter ℓ ∈ [k] asks for the cheapest ℓ-median for the descendants of v,
• recursive step: for each a, b with a + b = ℓ, compute an a-median in the left child subtree and a b-median in the right child subtree.
Running time: O(k²n).
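A sketch of the dynamic program on a binary HST, under one concrete accounting convention (assumed here, not spelled out on the slide): a child subtree receiving 0 centers defers all its leaves to the current node v, where each pays ∆_v, since v is then the lca of such a leaf and its nearest center.

```python
import math

def k_median_hst(root, k):
    """O(k^2 n) DP for k-median on a *binary* HST. costs[l] = cheapest total
    connection cost for the leaves under v with exactly l centers placed
    inside v's subtree."""
    def solve(v):                       # returns (costs, number of leaves)
        if not v.children:              # a leaf can host at most one center
            costs = [math.inf] * (k + 1)
            costs[1] = 0.0
            return costs, 1
        (cl, nl), (cr, nr) = solve(v.children[0]), solve(v.children[1])
        costs = [math.inf] * (k + 1)
        for l in range(1, k + 1):
            for a in range(l + 1):      # a centers left, l - a centers right
                # a child with 0 centers defers its leaves here: each pays Delta_v
                left = cl[a] if a >= 1 else nl * v.label
                right = cr[l - a] if l - a >= 1 else nr * v.label
                costs[l] = min(costs[l], left + right)
        return costs, nl + nr
    costs, _ = solve(root)
    return min(costs[1:])               # best with at most k centers
```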
Application: k-median approximation in metric spaces
Theorem. There is an O(log n)-approximation (in expectation) for k-median in any metric space (X, dist_X).
Proof. Embed P ⊆ X into an HST T. Compute optimal cluster centers C in T.
C induces a clustering 𝒳 of P (the center of p is nn_X(p, C)).
Return (C, 𝒳). Let (C_opt, 𝒳_opt) be an optimal solution.
γ(C, dist_X) ≤ γ(C, dist_T) ≤ γ(C_opt, dist_T) = Σ_{p∈P} dist_T(p, C_opt) ≤ Σ_{p∈P} dist_T(p, nn_X(p, C_opt)).
Taking expectations over the random embedding:
E(γ(C, dist_X)) ≤ Σ_{p∈P} E(dist_T(p, nn_X(p, C_opt))) = Σ_{p∈P} O(dist_X(p, nn_X(p, C_opt)) · log n) = O(γ(𝒳_opt, C_opt, dist_X) · log n).
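An end-to-end sketch combining the pieces above (`build_hst`, `k_median_hst`) with a binarization helper as described on the k-median slide; recovering the actual centers C would additionally require backtracking the DP.

```python
import math, random

def binarize(v):
    """Binary HST: chain surplus children under new nodes carrying the same
    label (as on the k-median slide); leaf distances are unchanged."""
    for c in v.children:
        binarize(c)
    while len(v.children) > 2:
        a, b = v.children.pop(), v.children.pop()
        v.children.append(HSTNode(label=v.label, children=[a, b]))
    return v

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(60)]
root = binarize(build_hst(pts, math.dist))
print(k_median_hst(root, k=4))   # optimal k-median cost in dist_T; its centers,
                                 # evaluated in dist_X, give the O(log n)-approx.
```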
Further embeddings into ℓ2
Embedding into ℓ2
Theorem (Bourgain 1985). Any n-point metric space can be embedded into ℝ^{O(log n)} (with the ℓ2 metric) with distortion O(log n).
This is tight for constant-degree expanders.
Some proof ideas for a weaker version:
• Forget the dimension (use JL at the end).
• Spread Φ: the ratio of the largest to the smallest distance in X. By "snapping" distances less than r/n or much more than r, we get new metrics on X with spread Φ = O(n²); there are O(n²) distinct such metrics, and we get coordinates from each.
• For a given resolution r, use O(log n) random HST embeddings of diameter r. Flip a coin for each cluster; the heads clusters form an anchor set Y_j.
• Embedding: the j-th coordinate of x w.r.t. the anchors Y_j is dist(x, Y_j). This is non-contracting. For each resolution we get O(log n) coordinates.
• Let x, y be arbitrary, and let r be a resolution where r/2 < dist(x, y)/2 < r. ⇒ x and y are in different clusters, and with probability 1/2 the ball B(x, O(r/log n)) is contained in the cluster of x.
• Chernoff ⇒ w.h.p. a constant proportion of the coordinates j will differ by Ω(r/log n) (when x and y get different coin flips).
• If they differ on k coordinates, then these coordinates contribute distance at least Ω(√k · r/log n).
Embedding special metrics into ℓ2
Tree metric: induced by a positively edge-weighted tree.
Theorem (Matoušek 1999). Any tree metric can be embedded into ℓ2 with distortion O(√(log log n)).
The distortion bound is tight (up to constant factors).
Theorem (Rao 1999). Let G be a graph class that excludes some forbidden minor H (e.g. planar graphs). Then any G-metric can be embedded into ℓ2 with distortion O(√(log n)).
The distortion bound is tight (up to constant factors).