
(1)

Dimension reduction, embeddings

Sándor Kisfaludi-Bak

Computational Geometry, Summer semester 2020

(2-5)

Overview

Embeddings, distortion, Johnson-Lindenstrauss

Random partitions

Embedding into HSTs

Further embeddings into Euclidean space

(6-8)

Embeddings, distortion

Definition. An embedding f from the metric space (X, dist_X) to (Y, dist_Y) is K-bi-Lipschitz if there exists a c > 0 such that for all x, x′ ∈ X we have

c · dist_X(x, x′) ≤ dist_Y(f(x), f(x′)) ≤ cK · dist_X(x, x′).

Definition. The distortion of an embedding f : X → Y is the smallest ∆ s.t. f is ∆-bi-Lipschitz.

If Y = R^d, then (after rescaling) we want

dist(x, x′) ≤ ‖f(x) − f(x′)‖₂ ≤ ∆ · dist(x, x′).
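Since our point sets are finite, the optimal scale c can be factored out, and the distortion is simply the ratio between the largest and smallest stretch over all pairs. A minimal sketch of this computation (the function and argument names are our own, not from the slides):

```python
import itertools
import math

def distortion(dist_X, images):
    """Smallest K such that x_i -> images[i] is K-bi-Lipschitz for some c > 0:
    with the optimal c, K equals (max stretch) / (min stretch) over all pairs."""
    ratios = [math.dist(images[i], images[j]) / dist_X[i][j]
              for i, j in itertools.combinations(range(len(images)), 2)]
    return max(ratios) / min(ratios)
```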

(9-13)

Why distortion is necessary

Take Y = R^d, and X = the star with center a, leaves b, c, d, and unit edge lengths (so dist_X(a, ·) = 1, and the leaves are pairwise at distance 2).

[Figure: the star a–b, a–c, a–d with edge labels 1, and its image in R^d with b, c, d pairwise at distance 2.]

Where to put a?

min(max{‖a − b‖, ‖a − c‖, ‖a − d‖}) is attained when a is the circumcenter ...

... and when bcd is equilateral of side length 2.

The distortion is ‖b − a‖ / dist_X(a, b) = 2/√3.

In general, the n-star needs distortion Ω(n^{1/d}) when Y = R^d.
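A quick numeric check of the 2/√3 value (a sketch; the variable names are our own):

```python
import math

# bcd equilateral with side length 2: circumradius = side / sqrt(3)
circumradius = 2 / math.sqrt(3)   # = ||a - b|| when a is the circumcenter
print(circumradius)               # 1.1547... = 2/sqrt(3)
# dist_X(a, b) = 1, so the distortion is 2/sqrt(3) ≈ 1.1547, while the
# leaf pairs are preserved exactly: ||b - c|| = 2 = dist_X(b, c).
```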

(14-15)

The Johnson-Lindenstrauss Lemma

Theorem (Johnson, Lindenstrauss 1984). Given n points P ⊂ R^{n−1} and ε ∈ (0, 1], there is an embedding f : P → R^d with distortion 1 + ε, where d = O(log n / ε²).

works for R^N with any N

f can be: an orthogonal projection to a random d-dimensional subspace (suitably rescaled)

can be derandomized (Engebretsen et al. 2002)

a.k.a. "dimension reduction", "JL lemma"
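The slide uses an orthogonal projection to a random subspace; a common, essentially equivalent variant uses a scaled Gaussian random matrix. A sketch under that assumption (the constant 8 is an illustrative choice, not from the slides):

```python
import numpy as np

def jl_embed(P, eps, rng=np.random.default_rng()):
    """Map the rows of P (n points in R^m) to R^d with d = O(log n / eps^2)
    via a scaled Gaussian matrix; pairwise distances are then preserved up
    to a factor 1 + eps with high probability."""
    n, m = P.shape
    d = int(np.ceil(8 * np.log(n) / eps**2))   # the constant 8 is illustrative
    G = rng.normal(size=(m, d)) / np.sqrt(d)   # entries N(0, 1/d)
    return P @ G
```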

(16-18)

Almost equidistant set in R^{O(log n)}

Let e_i = (0, . . . , 0, 1, 0, . . . , 0), with the 1 in position i.

The set e_1, . . . , e_n is equidistant (the unit simplex).

It can't be embedded isometrically into R^d if d < n − 1. But!

Folklore. For any fixed ε > 0, there is a set P of n points in R^{O(log n)} s.t. ‖p − p′‖₂ ∈ [1, 1 + ε] for all distinct p, p′ ∈ P.

Proof. Use the JL lemma on the simplex above (and rescale).
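A self-contained sketch of this folklore construction (the seed, constants, and names are our own choices):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, eps = 200, 0.5
d = int(np.ceil(8 * np.log(n) / eps**2))   # O(log n) target dimension

simplex = np.eye(n)                        # e_1, ..., e_n, pairwise distance sqrt(2)
G = rng.normal(size=(n, d)) / np.sqrt(d)
P = (simplex @ G) / np.sqrt(2)             # rescale: target distances are ~1

dists = [float(np.linalg.norm(P[i] - P[j])) for i, j in combinations(range(n), 2)]
print(min(dists), max(dists))              # all pairwise distances close to 1
```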

(19)

Random partitions

(20-23)

Partitions, probabilistic partitions

Goal: partition (X, dist) into clusters of diameter at most ∆, s.t. x, y ∈ X are in the same cluster iff dist(x, y) ≤ ∆.

Clearly unattainable!

P_X: set of all partitions of X. Pick a random partition Π ∈ P_X from some distribution D over P_X.

Revised goal: Pr(x, x′ are separated in Π) is small if dist(x, x′) is small.

Example: X = R.

Partition: intervals [x₀ + i∆, x₀ + (i + 1)∆), where x₀ is a uniform random shift.

Pr(x, y are separated) = |x − y| / ∆ (for |x − y| ≤ ∆)
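A sketch of the shifted-grid partition with a Monte Carlo check of the separation probability (all names are our own):

```python
import random

def cluster_index(x, delta, shift):
    """Index i of the interval [shift + i*delta, shift + (i+1)*delta) containing x."""
    return int((x - shift) // delta)

def separation_probability(x, y, delta, trials=100_000):
    """Monte Carlo estimate of Pr(x, y separated) over a uniform random shift;
    should approach |x - y| / delta for |x - y| <= delta."""
    sep = 0
    for _ in range(trials):
        shift = random.uniform(0, delta)
        sep += cluster_index(x, delta, shift) != cluster_index(y, delta, shift)
    return sep / trials

print(separation_probability(0.0, 0.3, 1.0))   # ≈ 0.3 = |x - y| / delta
```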

(24-26)

Random partition for any metric space

Set ∆ = 2^u.

Let σ be a uniform random permutation of X, and α ∈ [1/4, 1/2] uniform random.

Greedy partition (see the sketch below):

Put all points within distance R := α∆ of σ₁ into the first cluster.

Remove the cluster from σ, repeat with the next remaining point.

Cluster diameter is at most 2R = 2α∆ ≤ ∆.
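A direct sketch of this greedy partition (the function name and the dist callback are our own conventions):

```python
import random

def random_partition(points, dist, delta, rng=random):
    """Greedy random partition: draw alpha in [1/4, 1/2] and a uniform random
    permutation sigma; each still-unassigned point of sigma grabs all remaining
    points within distance R = alpha * delta into a new cluster."""
    R = rng.uniform(0.25, 0.5) * delta
    sigma = list(points)
    rng.shuffle(sigma)
    remaining = set(points)
    clusters = []
    for w in sigma:
        if w not in remaining:
            continue                     # w was swallowed by an earlier cluster
        cluster = {x for x in remaining if dist(w, x) <= R}
        remaining -= cluster
        clusters.append(cluster)
    return clusters
```

By the triangle inequality every cluster has diameter at most 2R ≤ ∆, as on the slide.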

(27-28)

Clustering quality

Lemma. For any x ∈ X and t ≤ ∆/8,

Pr(B(x, t) ⊄ Π(x)) ≤ (8t/∆) ln(M/m),

where m = # of pts at distance ≤ ∆/8 from x, and M = # of pts at distance ≤ ∆.

Proof. Let U = pts w where B(w, ∆/2) ∩ B(x, t) ≠ ∅.

U = (w₁, . . . , w_{|U|}) := sorted by increasing distance from x.

E_k := event that w_k is the first in σ s.t. Π(w_k) ∩ B(x, t) ≠ ∅, BUT B(x, t) ⊄ Π(w_k).

If B(x, t) ⊄ Π(x), then some E_k must occur.

(29-31)

E_k only if R is in some range

Let I_k = [dist(x, w_k) − t, dist(x, w_k) + t].

Claim: R ∉ I_k ⇒ Pr(E_k) = 0.

If dist(x, w_k) < R − t, then B(w_k, R) ⊇ B(x, t), so Pr(E_k) = 0.

If dist(x, w_k) > R + t, then B(w_k, R) ∩ B(x, t) = ∅, so E_k is impossible.

In particular, Pr(E_i) = 0 if i ≤ m or i > M.

[Figure: x with radii t and ∆/8, and w_k with radius R ∈ [∆/4, ∆/2].]

Pr(E_k) = Pr(E_k ∩ (R ∈ I_k)) = Pr(R ∈ I_k) · Pr(E_k | R ∈ I_k)

Pr(R ∈ I_k) ≤ length(I_k)/(∆/2 − ∆/4) = 2t/(∆/4) = 8t/∆

w₁, . . . , w_{k−1} are all at least as close to x as w_k, so if one of them (w_i) occurs before w_k in σ, then w_k is not the first to scoop from B(x, t), as dist(x, w_i) ≤ dist(x, w_k) ≤ R + t.

Pr(E_k | R ∈ I_k) ≤ 1/k

(32)

Random partition quality estimate

Pr(B(x, t) ⊄ Π(x)) = Σ_{k=1}^{|U|} Pr(E_k) = Σ_{k=m+1}^{M} Pr(E_k)

= Σ_{k=m+1}^{M} Pr(R ∈ I_k) · Pr(E_k | R ∈ I_k)

≤ Σ_{k=m+1}^{M} (8t/∆) · (1/k) < (8t/∆) ln(M/m)

(33)

Embedding into HSTs

(34-36)

HSTs and quadtrees

Definition. A hierarchically well-separated tree (HST) is a metric space on the leaves of a rooted tree T where each vertex v has a label ∆_v ≥ 0 s.t.

leaves have label ∆_v = 0

each internal vertex v has ∆_v > 0, and for any child u: ∆_u ≤ ∆_v

if x, x′ are leaves, then dist_T(x, x′) = ∆_{lca(x,x′)}

Example: quadtree. T = quadtree, ∆_v = diameter of cell v.

‖x − x′‖₂ ≤ ∆_{lca(x,x′)} = dist_T(x, x′): a bad (but non-contracting) embedding of P ⊂ R^d into a tree metric.

k-HST: an HST where ∆_u ≤ ∆_v/k for every child u of v.
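A minimal sketch illustrating the definition: the leaf-to-leaf distance is the label at the lowest common ancestor. The Node class and helper names are our own, and the O(n)-per-query lookup is only for clarity:

```python
class Node:
    """HST node: label is Delta_v (0 at leaves); leaves carry a point."""
    def __init__(self, label=0.0, children=(), point=None):
        self.label = label
        self.children = list(children)
        self.point = point

def contains(node, x):
    """Does the subtree of node have a leaf carrying point x?"""
    return node.point == x or any(contains(c, x) for c in node.children)

def hst_dist(root, x, y):
    """dist_T(x, y) = the label Delta at the lowest common ancestor of x and y."""
    if x == y:
        return 0.0
    for child in root.children:
        if contains(child, x) and contains(child, y):
            return hst_dist(child, x, y)   # the lca lies deeper
    return root.label                      # root itself is the lca
```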

(37-39)

Probabilistic embedding into a 2-HST

A randomized alg. for a non-contracting embedding from X into an HST T has probabilistic distortion:

max_{x,y∈X} E(dist_T(x, y)) / dist_X(x, y)

Theorem. Given (X, dist), there is a randomized embedding into a 2-HST with prob. distortion 24 ln n.

Proof. Wlog. scale X so that diam(X) = 1.

Start with P = X, set T's root label to 1.

Compute a random partition with ∆ = diam(P)/2, set the diameters of the partition classes as the child labels. Recurse on each child (sketched below).

level of node v in T: level(v) := ⌈log ∆_v⌉ ≤ 0
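A recursive sketch of this construction, reusing random_partition and Node from the earlier sketches (so not fully standalone; it assumes pairwise-distinct points):

```python
import random

def build_2hst(points, dist, rng=random):
    """Randomized 2-HST embedding sketch: label a node by the diameter of its
    point set, split via random_partition with Delta = diam/2, and recurse.
    Child labels (their own diameters) are at most 2R <= diam/2, i.e. at most
    half the parent label, so the result is a 2-HST."""
    if len(points) == 1:
        return Node(label=0.0, point=points[0])
    diam = max(dist(p, q) for p in points for q in points)
    clusters = random_partition(points, dist, diam / 2, rng)
    return Node(label=diam,
                children=[build_2hst(list(c), dist, rng) for c in clusters])
```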

(40-42)

Bounding distortion of rand. HST embedding

x, y ∈ X have lca u in T. dist_T(x, y) = ∆_u ≤ 2^{level(u)}.

σ: path from the root of T to leaf x. σ_i: the level-i node of σ (if it exists).

E_i: event that B_X(x, dist_X(x, y)) ⊄ Π(σ_i).

Y_i: indicator that E_i occurs but for all j > i event E_j does not.

We have dist_T(x, y) ≤ Σ_i 2^i Y_i. Set j := ⌊log dist_X(x, y)⌋.

If i < j, then Pr(E_i) = 0, so E(Y_i) = 0.

If i ≥ j, then

E(Y_i) = Pr(E_i ∩ ¬E_{i+1} ∩ · · · ∩ ¬E_0) ≤ (8 dist_X(x, y)/2^i) · ln(|B_X(x, 2^i)| / |B_X(x, 2^i/8)|)

(43)

Distortion bound wrap-up

Set n_i = |B_X(x, 2^i)|, and t := dist_X(x, y).

E(dist_T(x, y)) ≤ E(Σ_i 2^i Y_i) = Σ_i 2^i E(Y_i) ≤ Σ_{i=j}^{0} 2^i · (8t/2^i) ln(n_i/n_{i−3})

= 8t ln(∏_{i=j}^{0} n_i/n_{i−3}) ≤ 8t ln(n₀ · n₋₁ · n₋₂) ≤ 24t ln n.

(44-47)

k-median in HSTs

Computing k-median in an HST is "easy":

make it into a binary HST (new nodes get the same label as their parent)

Dynamic program. Subproblem at v, param ℓ ∈ [k]: what is the cheapest ℓ-median for the descendants of v?

Recursive step: for each a, b with a + b = ℓ, combine an a-median in the left child subtree with a b-median in the right child subtree (see the sketch below).

Running time: O(k²n)
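A sketch of this DP, assuming the binary-HST Node format from the earlier sketch. One detail the slide leaves implicit: if a child subtree receives 0 centers, its leaves must cross the parent, and in an HST each then pays exactly the parent's label:

```python
import math

def k_median_hst(root, k):
    """O(k^2 n) DP on a binary HST: cost[l] = cheapest l-median value for the
    leaves below a node, using l centers placed below it. Serving within a
    subtree is never worse than crossing it, so centers only help locally."""
    def solve(v):                          # returns (#leaves, cost[0..k])
        if not v.children:
            return 1, [math.inf] + [0.0] * k   # a leaf needs >= 1 local center
        (nl, cl), (nr, cr) = solve(v.children[0]), solve(v.children[1])
        cost = [math.inf] * (k + 1)
        for l in range(1, k + 1):
            for a in range(l + 1):
                left = cl[a] if a else nl * v.label        # left leaves cross v
                right = cr[l - a] if l - a else nr * v.label
                cost[l] = min(cost[l], left + right)
        return nl + nr, cost
    return solve(root)[1][k]
```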

(48-50)

Application: k-median approximation in metric spaces

Theorem. There is an O(log n)-approximation for k-median in any metric space (X, dist_X).

Proof. Embed P ⊆ X into an HST T. Compute cluster centers C in T.

C induces a clustering 𝒳 in P (the center of p is nn_X(p, C)).

Return C, 𝒳. OPT: (C_opt, 𝒳_opt).

γ(C, dist_X) ≤ γ(C, dist_T) ≤ γ(C_opt, dist_T) = Σ_{p∈P} dist_T(p, C_opt) ≤ Σ_{p∈P} dist_T(p, nn_X(p, C_opt))

Taking expectations:

E(γ(C, dist_X)) ≤ Σ_{p∈P} E(dist_T(p, nn_X(p, C_opt))) = Σ_{p∈P} O(dist_X(p, nn_X(p, C_opt)) · log n) = O(γ(𝒳_opt, C_opt, dist_X) · log n)

(51)

Further embeddings into ℓ₂

(52-53)

Embedding into ℓ₂

Theorem (Bourgain 1985). Any n-pt metric space can be embedded into R^{O(log n)} (with the ℓ₂ metric) with distortion O(log n).

This is tight for constant-degree expanders.

Some proof ideas for a weaker version:

forget the dimension (use JL at the end)

for a given resolution r, use O(log n) random HST embeddings of diameter r

flip a coin for each cluster; the "heads" clusters together form an anchor set Y_j

embedding: the j-th coord of x wrt. anchor set Y_j is dist(x, Y_j)

this is non-contracting

for each resolution we get O(log n) coords

(54)

spread Φ: ratio of largest/smallest distance in X. By

’snapping’ distances less than r/n or much more than r, we get new metrics on X with spread Φ = O(n2), and there are O(n2) distinct metrics, get coords from each.

Let x, y arbitrary, and r a resolution where

r/2 < dist(x, y)/2 < r. x and y are in different

clusters, and with prob. 1/2 the ball B(x, O(1/ log n)) is contained in the cluster of x

Chernoff w.h.p. a constant proportion of the

coordiantes j will differ by Ω(r/ log n) (when x, y get different coin flips)

if they differ on k flips, then these cords contribute distance at least Ω(

k/ log n).

Proof ideas for weak Bourgain, ctd.

(55)

Embedding special metrics into `2

Tree metric: induced by possitively edge-weighted tree.

Distortion bound is tight (up to constant factors.)

Theorem (Matouˇsek 1999). Any tree metric can be embedded into `2 with distortion O(

log log n).

(56)

Embedding special metrics into `2

Tree metric: induced by possitively edge-weighted tree.

Distortion bound is tight (up to constant factors.)

Distortion bound is tight (up to constant factors.)

Theorem (Rao 1999). Let G be graph class that excludes some forbidden minor H (e.g. planar graphs.). Then any G-metric can be mebedded into `2 with distortion O(

log n).

Theorem (Matouˇsek 1999). Any tree metric can be embedded into `2 with distortion O(

log log n).
