Uniqueness of Ordinal Embedding


Matthäus Kleindessner KLEINDESSNER@INFORMATIK.UNI-HAMBURG.DE

Ulrike von Luxburg LUXBURG@INFORMATIK.UNI-HAMBURG.DE
Department of Computer Science, University of Hamburg, Germany

Abstract

Ordinal embedding refers to the following problem: all we know about an unknown set of points $x_1, \dots, x_n \in \mathbb{R}^d$ are ordinal constraints of the form $\|x_i - x_j\| < \|x_k - x_l\|$; the task is to construct a realization $y_1, \dots, y_n \in \mathbb{R}^d$ that preserves these ordinal constraints. It has been conjectured since the 1960s that, given all ordinal constraints, a large but finite set of points can be approximately reconstructed up to a similarity transformation. The main result of our paper is a formal proof of this conjecture.

Keywords: non-metric multidimensional scaling, monotone mapping, isotonic mapping

1. Introduction

We consider the problem of ordinal embedding, also called ordinal scaling, non-metric multidimensional scaling, monotonic embedding, or isotonic embedding. Consider a set of points $x_1, \dots, x_n$ in some metric space $(X, \operatorname{dist})$, but assume that the distances between these points are unknown. All we get to see are ordinal relationships, namely whether $\operatorname{dist}(x_i, x_j) < \operatorname{dist}(x_k, x_l)$ or vice versa. The goal of ordinal embedding is to construct $y_1, \dots, y_n \in \mathbb{R}^d$ such that all ordinal constraints are preserved (throughout the paper, $\|\cdot\|$ denotes the Euclidean norm):

$$\operatorname{dist}(x_i, x_j) < \operatorname{dist}(x_k, x_l) \;\Rightarrow\; \|y_i - y_j\| < \|y_k - y_l\|.$$

The problem of ordinal embedding was first studied in the psychometric community by Shepard (1962a,b) and Kruskal (1964a,b); see also the monograph Borg and Groenen (2005). Lately it has drawn considerable attention in the machine learning community (Quist and Yona, 2004; Rosales and Fung, 2006; Agarwal et al., 2007; Shaw and Jebara, 2009; McFee and Lanckriet, 2009; Jamieson and Nowak, 2011a; McFee and Lanckriet, 2011; Tamuz et al., 2011; Ailon, 2012), also in its special case of ranking (Ouyang and Gray, 2008; McFee and Lanckriet, 2010; Jamieson and Nowak, 2011b; Lan et al., 2012; Wauthier et al., 2013). Even though ordinal embedding dates back to the 1960s and is widely used in practice, surprisingly little is known about its theoretical properties. Strikingly, one of its most elementary properties, namely the uniqueness of ordinal embeddings, has never been established in a finite sample setting. It is widely believed that, given all ordinal relationships, a point set $x_1, \dots, x_n \in \mathbb{R}^d$ can be approximately reconstructed up to a similarity transformation if $n$ is “large enough” (p. 294 of Shepard, 1966; Section 2.2 of Borg and Groenen, 2005; Section 4.13.2 of Dattorro, 2005). Numerous simulation experiments have been published as supporting evidence (Shepard, 1966; Young, 1970; Sherman, 1972). The main result of our paper is a formal proof that this uniqueness conjecture is indeed true:

Consider a sequence of points $(x_n)_{n \in \mathbb{N}}$ that is dense in some “nice” set $K \subseteq \mathbb{R}^d$. Let $y^n_1, \dots, y^n_n$ be any ordinal embedding of $x_1, \dots, x_n$. Then, as $n \to \infty$, the set of embedded points always converges to the set of original points, up to similarity transformations such as rotations, translations, rescalings, or reflections. This even holds if we only know about “local ordinal relationships”, that is, distance comparisons between points in small subregions.

Our proofs are elementary in the sense that we do not apply any heavy mathematical machinery. However, the details are delicate and require careful treatment.

2. Setup, definitions and notation

We start this section with the definition of the two central notions in our paper, ordinal embeddings and isotonic functions. We will see below that these two notions are closely related.

Definition 1 (Ordinal embedding) Consider two sets $X_n = \{x_1, \dots, x_n\} \subseteq \mathbb{R}^d$ and $Y_n = \{y_1, \dots, y_n\} \subseteq \mathbb{R}^d$. $Y_n$ is an ordinal embedding of $X_n$ if for all $1 \le i, j, k, l \le n$,

$$\|x_i - x_j\| < \|x_k - x_l\| \;\Rightarrow\; \|y_i - y_j\| < \|y_k - y_l\|. \tag{1}$$

$Y_n$ is a weak ordinal embedding of $X_n$ if (1) holds for all $1 \le i, j, k, l \le n$ with $i = k$. $Y_n$ is a strong ordinal embedding of $X_n$ if (1) holds for all $1 \le i, j, k, l \le n$ and additionally $\|x_i - x_j\| = \|x_k - x_l\| \Rightarrow \|y_i - y_j\| = \|y_k - y_l\|$ for all $1 \le i, j, k, l \le n$.
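Definition 1 can be checked mechanically on small point sets. The following is a minimal brute-force sketch (the helper name `is_ordinal_embedding` and its `mode` argument are our own, not from the paper). It loops over all $n^4$ index quadruples, so it is only meant for illustration, and it compares floating-point distances exactly, so the strong variant is only meaningful when ties are represented exactly (for example, with small integer coordinates).

```python
from itertools import product
from math import dist


def is_ordinal_embedding(X, Y, mode="plain"):
    """Brute-force check of Definition 1.

    mode = "weak":   only quadruples with i == k are checked,
    mode = "plain":  all quadruples (i, j, k, l),
    mode = "strong": all quadruples, and equal distances must stay equal.
    Runs in O(n^4) time, so use it only for small illustrative sets.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of points")
    n = len(X)
    for i, j, k, l in product(range(n), repeat=4):
        if mode == "weak" and i != k:
            continue
        dx_ij, dx_kl = dist(X[i], X[j]), dist(X[k], X[l])
        dy_ij, dy_kl = dist(Y[i], Y[j]), dist(Y[k], Y[l])
        # Condition (1): a strict inequality among X-distances must stay strict in Y.
        if dx_ij < dx_kl and not dy_ij < dy_kl:
            return False
        # Extra condition of a strong ordinal embedding: ties are preserved.
        if mode == "strong" and dx_ij == dx_kl and dy_ij != dy_kl:
            return False
    return True


X = [(0, 0), (1, 0), (0, 2)]
Y = [(2 * x + 3, 2 * y + 1) for x, y in X]  # image of X under a similarity
print(is_ordinal_embedding(X, Y, mode="strong"))          # True
print(is_ordinal_embedding(X, [(0, 0), (3, 0), (0, 1)]))  # False
```

The second call fails because $\|x_1 - x_2\| < \|x_1 - x_3\|$ while the proposed images reverse this comparison.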

Definition 2 (Isotonic functions) Let $\Omega \subseteq \mathbb{R}^d$ and $f: \Omega \to \mathbb{R}^d$ be an arbitrary function. $f$ is a similarity if there is $\lambda > 0$ such that for all $x, y \in \Omega$ we have $\|f(x) - f(y)\| = \lambda \|x - y\|$. $f$ is isotonic, or an isotony, if for all $x, y, z, w \in \Omega$,

$$\|x - y\| < \|z - w\| \;\Rightarrow\; \|f(x) - f(y)\| < \|f(z) - f(w)\|.$$

$f$ is weakly isotonic if this property is only required to hold for $x, y, z, w \in \Omega$ with $x = z$. $f$ is strongly isotonic if it is isotonic and additionally satisfies $\|x - y\| = \|z - w\| \Rightarrow \|f(x) - f(y)\| = \|f(z) - f(w)\|$ for all $x, y, z, w \in \Omega$. We say that $f$ is locally a similarity / (weakly / strongly) isotonic if for each point $x \in \Omega$ there exists a neighborhood $U(x)$ in $\Omega$ such that $f|_{U(x)}$ has the corresponding property. If we want to emphasize that a function $f: \Omega \to \mathbb{R}^d$ has a property not only locally but on all of $\Omega$, we sometimes say that $f$ is globally a similarity / (weakly / strongly) isotonic.

Let us mention some obvious but important observations. Similarities $f: \mathbb{R}^d \to \mathbb{R}^d$ are nothing other than the well-known similarity transformations, given by $f(x) = \lambda O x + b$ for some $\lambda > 0$, some orthogonal matrix $O$ and an offset $b \in \mathbb{R}^d$. For general $\Omega$, they are simply given by the restrictions of similarity transformations to $\Omega$ (see Lemma A in Appendix A). Obviously, we have

$$\text{similarity} \;\Rightarrow\; \text{strongly isotonic} \;\Rightarrow\; \text{isotonic} \;\Rightarrow\; \text{weakly isotonic},$$

but for general $\Omega$ none of the converses is true. Any weakly isotonic function is injective. If $f$ is a similarity or a strong isotony, so is $f^{-1}$, but this does not necessarily hold for isotonies. A composition of similarities / (weak / strong) isotonies is again a similarity / (weak / strong) isotony.
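The first implication and the failure of its converse on finite sets can be seen numerically. The sketch below (helper names are our own) builds a planar similarity $f(x) = \lambda O x + b$, checks that it rescales every distance by the same factor $\lambda$, and then exhibits the map $t \mapsto t^3$ on $\Omega = \{0, 1, 3\} \subseteq \mathbb{R}$, which is isotonic there (the distances $1 < 2 < 3$ become $1 < 26 < 27$) but clearly not a similarity.

```python
from itertools import product
from math import cos, sin, dist, isclose


def planar_similarity(lam, theta, b, reflect=False):
    """Return f(x) = lam * O x + b, where O is a rotation by theta,
    optionally preceded by the reflection (x, y) -> (x, -y)."""
    def f(p):
        x, y = p
        if reflect:
            y = -y
        return (lam * (cos(theta) * x - sin(theta) * y) + b[0],
                lam * (sin(theta) * x + cos(theta) * y) + b[1])
    return f


def distance_ratios(points, f):
    """Ratios ||f(p) - f(q)|| / ||p - q|| over all pairs of distinct points."""
    return [dist(f(p), f(q)) / dist(p, q)
            for a, p in enumerate(points) for q in points[a + 1:]]


def is_isotonic_1d(points, f):
    """Definition 2 on a finite subset of R, checked over all quadruples."""
    for p, q, r, s in product(points, repeat=4):
        if abs(p - q) < abs(r - s) and not abs(f(p) - f(q)) < abs(f(r) - f(s)):
            return False
    return True


pts = [(0.0, 0.0), (1.0, 2.0), (-3.0, 1.0), (4.0, -1.0)]
f = planar_similarity(2.5, 0.7, (1.0, -2.0), reflect=True)
print(all(isclose(t, 2.5) for t in distance_ratios(pts, f)))  # True

cube = lambda t: t ** 3
print(is_isotonic_1d([0, 1, 3], cube))  # True
print([abs(cube(p) - cube(q)) / abs(p - q)
       for p, q in [(0, 1), (1, 3), (0, 3)]])  # [1.0, 13.0, 9.0]
```

The non-constant ratios show that isotony on a finite set does not force a similarity; the paper's question is how close such a map must be to one.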

Obviously, $y_1, \dots, y_n$ is a (weak / strong) ordinal embedding of $x_1, \dots, x_n$ if and only if the mapping $f: \{x_1, \dots, x_n\} \to \{y_1, \dots, y_n\}$ given by $f(x_i) = y_i$ is (weakly / strongly) isotonic. The uniqueness question for ordinal embedding can thus be formalized as follows: if $f$ is a (weakly / strongly) isotonic mapping between two finite point sets, can it be approximated by a similarity? It is well known that any strongly isotonic function $f: \mathbb{R}^d \to \mathbb{R}^d$ defined on the full domain $\mathbb{R}^d$ is a similarity transformation. One can see this by exploiting properties of sphere-preserving mappings in Euclidean geometry (see McKemie and Väisälä (1999) and also the argumentation in Shepard, 1966), by an elegant argument related to positive definite functions (Schoenberg, 1938), and also via the Beckman-Quarles theorem (Beckman and Quarles, 1953). The key question of this paper is in what sense such a property already holds for functions defined on a finite set.

Let us conclude this section with some standard notation for the rest of the paper. For any subset $A \subseteq \mathbb{R}^d$ we denote its linear hull by $[A] = \{\sum_{i=1}^n \lambda_i a_i : n \in \mathbb{N},\ a_i \in A,\ \lambda_i \in \mathbb{R}\}$ and its affine hull by $H(A) = \{\sum_{i=1}^n \lambda_i a_i : n \in \mathbb{N},\ a_i \in A,\ \lambda_i \in \mathbb{R},\ \sum_{i=1}^n \lambda_i = 1\}$. For $z \in \mathbb{R}^d$ and $r > 0$ the open ball with center $z$ and radius $r$ is $U_r(z) = \{x \in \mathbb{R}^d : \|x - z\| < r\}$ and the closed ball is $\overline{U}_r(z) = \{x \in \mathbb{R}^d : \|x - z\| \le r\}$. For a vector-valued function $f: X \to \mathbb{R}^d$ and $j = 1, \dots, d$ we write $f^j$ for the $j$-th component of $f$. For two functions $f: X_1 \to \mathbb{R}^d$ and $g: X_2 \to \mathbb{R}^d$ and an arbitrary subset $X \subseteq X_1 \cap X_2$ we denote the supremum norm between $f$ and $g$ on $X$ by $\|f - g\|_{\infty(X)} = \sup_{x \in X} \|f(x) - g(x)\|$. At some points we will speak of a cross-polytope. By this we mean the image $T(C)$ of the $d$-dimensional standard cross-polytope $C$, which is the convex hull of all permutations of $(\pm 1, 0, 0, \dots, 0) \in \mathbb{R}^d$, under some similarity transformation $T$.

3. Main results

In this section we present our main results. The proofs of the theorems are deferred to Sections 4 and 5. Our key question is to what extent an isotonic function $f$ is uniquely determined, up to a similarity transformation. Our first result concerns the infinite case. We show that if $f$ is defined on a dense subset of some “nice” set $G$ and $f$ is locally isotonic, then it is actually a similarity.

Theorem 3 (Isotonic on a dense set implies similarity) Let $G \subseteq \mathbb{R}^d$ be an open and connected domain and $\Omega \subseteq G$ a dense subset. Let $f: \Omega \to \mathbb{R}^d$ be a locally isotonic function. Then there exists a unique extension of $f$ to a similarity transformation $F: \mathbb{R}^d \to \mathbb{R}^d$.

The next theorem deals with the finite case and is the main result of this paper. We consider $X_n = \{x_1, \dots, x_n\} \subseteq \mathbb{R}^d$ and an isotonic mapping $\varphi_n: X_n \to \varphi_n(X_n)$; hence $\varphi_n(X_n) = \{\varphi_n(x_1), \dots, \varphi_n(x_n)\}$ is an ordinal embedding of $X_n$. We prove that $\varphi_n$ can be approximated by a similarity transformation, up to arbitrary precision as $n \to \infty$.

Theorem 4 (Isotonic on a finite set implies approximate similarity)

1. Global Version: Let $K = \overline{U}_r(z) \subseteq \mathbb{R}^d$ be a closed and bounded ball (for some arbitrary $r > 0$, $z \in \mathbb{R}^d$). Let $(x_n)_{n \in \mathbb{N}}$ be a sequence of points $x_n \in K$ such that $\{x_n : n \in \mathbb{N}\}$ is dense in $K$. Let $0 < R < \infty$ and $(\varphi_n)_{n \in \mathbb{N}}$ be a sequence of isotonic functions $\varphi_n: \{x_1, \dots, x_n\} \to U_R(0) \subseteq \mathbb{R}^d$. Then there exists a sequence $(S_n)_{n \in \mathbb{N}}$ of similarity transformations $S_n: \mathbb{R}^d \to \mathbb{R}^d$ such that

$$\|S_n - \varphi_n\|_{\infty(\{x_1, \dots, x_n\})} \to 0 \quad \text{as } n \to \infty. \tag{2}$$

2. Local Version: More generally, let $K = \bigcup_{i=1}^k K_i \subseteq \mathbb{R}^d$ be a finite union of closed and bounded balls such that $\bigcup_{i=1}^k K_i^\circ$ is connected. Let $(x_n)_{n \in \mathbb{N}}$ be a sequence of points $x_n \in K$ such that $\{x_n : n \in \mathbb{N}\}$ is dense in $K$. Let $0 < R < \infty$ and $(\varphi_n)_{n \in \mathbb{N}}$ be a sequence of functions $\varphi_n: \{x_1, \dots, x_n\} \to U_R(0) \subseteq \mathbb{R}^d$ such that

$$\forall i \in \{1, \dots, k\}: \quad \varphi_n|_{\{x_1, \dots, x_n\} \cap K_i} \text{ is isotonic}.$$

Then there exists a sequence $(S_n)_{n \in \mathbb{N}}$ of similarity transformations $S_n: \mathbb{R}^d \to \mathbb{R}^d$ with (2).

Our proofs show that we can replace the set $K$ in Part 1 of Theorem 4 by a cross-polytope or any convex set “between a cross-polytope and a ball”. Consequently, we can replace $K$ in Part 2 by any finite union of such sets if we additionally assume that all these sets satisfy $K_i \subseteq \overline{K_i^\circ}$. Note that the assumption that all functions $\varphi_n$ map to the same bounded ball $U_R(0)$ is necessary. Otherwise we could blow up the configuration of the image points by a larger and larger constant and prevent the approximation error $\|S_n - \varphi_n\|_\infty$ from converging.
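To make the quantity in (2) concrete for $d = 2$, one can fit a similarity to the pairs $(x_i, \varphi_n(x_i))$ and measure the largest deviation. The sketch below is our own illustration, not the construction used in the proofs: points are encoded as complex numbers, the helper `fit_similarity_2d` is hypothetical, and the fit is by least squares, which does not minimize the sup norm, so the returned error is only an upper bound on $\inf_S \|S - \varphi_n\|_{\infty(\{x_1, \dots, x_n\})}$. The degenerate case where the fitted coefficient $a$ is $0$ (not a similarity) is not treated.

```python
def fit_similarity_2d(xs, ys):
    """Least-squares fit of a planar similarity S with S(xs[k]) close to ys[k].

    Points are complex numbers. Orientation-preserving similarities are
    z -> a*z + b, orientation-reversing ones are z -> a*conj(z) + b (a != 0).
    Both families are fitted and the one with the smaller squared error is kept.
    Returns (sup_error, S), where sup_error = max_k |S(xs[k]) - ys[k]|.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two corresponding points")
    wc = sum(ys) / len(ys)
    best = None
    for reflect in (False, True):
        zs = [z.conjugate() if reflect else z for z in xs]
        zc = sum(zs) / len(zs)
        den = sum(abs(z - zc) ** 2 for z in zs)
        if den == 0:
            raise ValueError("domain points must not all coincide")
        # Closed-form complex least squares after centering both point sets.
        a = sum((w - wc) * (z - zc).conjugate() for z, w in zip(zs, ys)) / den
        b = wc - a * zc
        sse = sum(abs(a * z + b - w) ** 2 for z, w in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, reflect)
    _, a, b, reflect = best

    def S(z):
        return a * (z.conjugate() if reflect else z) + b

    return max(abs(S(z) - w) for z, w in zip(xs, ys)), S


xs = [0, 1, 1j, 1 + 1j, 0.5 + 0.25j]
err, S = fit_similarity_2d(xs, [(1 + 1j) * z + (3 - 2j) for z in xs])
print(err < 1e-9)  # True: the data come from a similarity
err, S = fit_similarity_2d(xs, [z + 0.1 * z * z for z in xs])
print(err > 0)     # True: a quadratic map is not a similarity
```

Complex arithmetic keeps the planar case short; in higher dimensions the analogous fit would be an orthogonal Procrustes problem.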

4. Proof of Theorem 3 (the infinite case)

The proof of Theorem 3 consists of a number of steps, which we formulate as separate lemmas.

Lemma 5 (Isotonic implies continuous) Let $\Omega \subseteq \mathbb{R}^d$ and $f: \Omega \to \mathbb{R}^d$ be a locally isotonic function. Then $f$ is continuous. If we additionally assume $\Omega$ to be a set with at least one limit point which is contained in it and $f$ to be globally isotonic, then $f$ is even uniformly continuous.

Proof (sketch) Since continuity is a local property, it suffices to show that for any point $x \in \Omega$ there is a neighborhood $U(x)$ in $\Omega$ such that $f|_{U(x)}$ is continuous. Hence, w.l.o.g. we may assume $f$ to be globally isotonic. The key observation is that if $f$ were discontinuous at one point, the distance between different points in $f(\Omega)$ would be bounded from below by a positive constant. In case $\Omega$ is uncountable, this immediately contradicts the separability of $\mathbb{R}^d$. In general, a compactness argument leads to the desired contradiction. In case $\Omega$ has a limit point which is contained in it, denote one such point by $x_0$ and let $\varepsilon > 0$ be arbitrary. We already know that $f$ is continuous, and hence there exists $\delta > 0$ such that $\|f(x) - f(x_0)\| < \varepsilon$ for all $x \in \Omega$ with $\|x - x_0\| < \delta$. Let $x' \in \Omega$ with $0 < \|x' - x_0\| = \delta' < \delta$ (such a point $x'$ exists since $x_0$ is a limit point). For all $x, y \in \Omega$ with $\|x - y\| < \delta'$ we have $\|x - y\| < \|x' - x_0\|$ and hence $\|f(x) - f(y)\| < \|f(x') - f(x_0)\| < \varepsilon$.

The next lemma shows that if $\Omega \subseteq \mathbb{R}^d$ is a ball and $f: \Omega \to \mathbb{R}^d$ is weakly isotonic, then $f$ is even strongly isotonic, at least on a slightly smaller ball.

Lemma 6 (Weakly isotonic implies strongly isotonic on balls) Let $\Omega = U_\varepsilon(z) \subseteq \mathbb{R}^d$ and $f: \Omega \to \mathbb{R}^d$ be weakly isotonic. Then $f|_{U_{\varepsilon/4}(z)}$ is strongly isotonic.

Proof (sketch) In order to prove that $f|_{U_{\varepsilon/4}(z)}$ is isotonic, we have to show that $\|f(x) - f(y)\| < \|f(v) - f(w)\|$ for all $x, y, v, w \in U_{\varepsilon/4}(z)$ with $\|x - y\| < \|v - w\|$. The idea is to use intermediate points $u_1, \dots, u_n \in \Omega$ such that

$$\|x - y\| < \|y - u_1\| < \|u_1 - u_2\| < \dots < \|u_{n-1} - u_n\| < \|u_n - v\| < \|v - w\|.$$

Any two consecutive distances in this chain share a common point, so weak isotony applies to each link, and it follows that

$$\|f(x) - f(y)\| < \|f(y) - f(u_1)\| < \|f(u_1) - f(u_2)\| < \dots < \|f(u_{n-1}) - f(u_n)\| < \|f(u_n) - f(v)\| < \|f(v) - f(w)\|.$$

Using continuity, we can show that $f|_{U_{\varepsilon/4}(z)}$ is even strongly isotonic.

The following proposition already shows that for functions defined on all points of an open and connected domain, all the properties we defined in Definition 2 are equivalent. The key ingredient in the proof is that an isotony maps the midpoint of a line segment between two points in $\Omega$ to the midpoint of the line segment between the corresponding image points.

Proposition 7 (Weakly isotonic implies similarity) Let $\Omega \subseteq \mathbb{R}^d$ be an open and connected domain and $f: \Omega \to \mathbb{R}^d$ be a locally weakly isotonic function. Then $f$ is globally a similarity.

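Proof (details can be found in Appendix B) First we consider a globally strongly isotonic function $f: \Omega = U_r(z) \to \mathbb{R}^d$. This allows us to define a function $\mu: [0, \operatorname{diam} \Omega] \to [0, \operatorname{diam} f(\Omega)]$ by $\mu(\|x - y\|) = \|f(x) - f(y)\|$ for all $x, y \in \Omega$ (this is well defined because $f$ maps equal distances to equal distances). In order to show that $f$ is a similarity, we have to show that $\mu$ is linear. By showing that $f$ maps the midpoint of a line segment between two points in $\Omega$ to the midpoint of the line segment between the corresponding image points, we iteratively obtain $\mu\big(\tfrac{j}{2^i} \operatorname{diam} \Omega\big) = \tfrac{j}{2^i} \operatorname{diam} f(\Omega)$, $i \in \mathbb{N}$, $j \in \{0, \dots, 2^i\}$ (see Appendix B for details). By Lemma 5, $f$ is continuous and so is $\mu$, implying that $\mu(t) = t \cdot (\operatorname{diam} f(\Omega) / \operatorname{diam} \Omega)$.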

Now assume that $\Omega$ is open and connected and $f: \Omega \to \mathbb{R}^d$ is a locally weakly isotonic function. By Lemma 6, $f$ is locally strongly isotonic. Hence, given $x \in \Omega$ we can choose $\varepsilon(x) > 0$ such that $U_{\varepsilon(x)}(x) \subseteq \Omega$ and $f|_{U_{\varepsilon(x)}(x)}: U_{\varepsilon(x)}(x) \to \mathbb{R}^d$ is globally strongly isotonic. It follows from the above that $f|_{U_{\varepsilon(x)}(x)}$ is a similarity, and hence $f: \Omega \to \mathbb{R}^d$ is locally a similarity. By Lemma B (see Appendix A), $f$ is even globally a similarity.
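The midpoint property is what forces $\mu$ to be linear in the first part of the proof above. A worked version of the halving step, assuming (as there) that $f$ is strongly isotonic on the ball $\Omega$, so that $\mu$ is well defined, and that $f$ maps midpoints to midpoints: for $x, y \in \Omega$ with $\|x - y\| = t$ and $m = \frac{x + y}{2}$, which lies in $\Omega$ because the ball is convex,

$$\mu\Big(\frac t2\Big) = \|f(x) - f(m)\| = \Big\|f(x) - \frac{f(x) + f(y)}{2}\Big\| = \frac12\,\|f(x) - f(y)\| = \frac12\,\mu(t).$$

Applying the midpoint property repeatedly to the dyadic subdivision of the segment from $x$ to $y$ shows that its points are mapped to equally spaced points on the segment from $f(x)$ to $f(y)$, hence $\mu\big(\frac{j}{2^i}\,t\big) = \frac{j}{2^i}\,\mu(t)$ for $j \in \{0, \dots, 2^i\}$; continuity of $\mu$ then gives $\mu(s) = \frac{s}{t}\,\mu(t)$ for all $s \in [0, t]$.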

Finally, a continuous extension of an isotonic mapping is isotonic too. The proof is elementary.

Lemma 8 (Continuous extension inherits isotony) Let $\Omega \subseteq \mathbb{R}^d$ such that $K = \overline{\Omega}$ is convex. Let $f: \Omega \to \mathbb{R}^d$ be isotonic and $F: K \to \mathbb{R}^d$ be a continuous extension of $f$. Then $F$ is isotonic.

Now, we have collected all ingredients to prove Theorem3.

Proof of Theorem 3 (sketch) In case $f$ is globally isotonic and $\overline{\Omega} = \overline{G}$ is convex, we consider the unique continuous extension $\tilde{F}$ of $f$ to $\overline{\Omega}$. This is possible since $f$ is uniformly continuous by Lemma 5. By Lemma 8, $\tilde{F}$ is isotonic. According to Proposition 7, $\tilde{F}|_G$ is even a similarity. By Lemma A (see Appendix A), $\tilde{F}|_G$ can be uniquely extended to a similarity $F: \mathbb{R}^d \to \mathbb{R}^d$. For the general case we restrict $f$ to several intersections of $\Omega$ and small balls. Each such restriction falls under the previous case and therefore has a unique extension. We can show that all these extensions have to coincide, similarly to the proof of Lemma B from Appendix A.

5. Proof of Theorem 4 (the finite case)

Case $d = 1$. The case $d = 1$ is particularly simple: it is easy to see that any weakly isotonic function $f: \Omega \to \mathbb{R}$ (with $\Omega \subseteq \mathbb{R}$) is either strictly increasing or strictly decreasing. The following lemma is the main step of the proof in the one-dimensional case. It considers points that approximate a grid, and proves that this property remains intact after an isotonic mapping. See Figure 1 for an illustration.

JMLR: Workshop and Conference Proceedings vol 35:1–2, 2014 27th Annual Conference on Learning Theory

[Figure 1 panels: (a) $N = 1$: grid points $0, \frac12, 1$ with $y^l_{1,1}, y^r_{1,1}$ next to $\frac12$; (b) $N = 2$: grid points $0, \frac14, \frac12, \frac34, 1$ with $y^l_{2,1}, y^r_{2,1}$ next to $\frac14$, $y^l_{1,1}, y^r_{1,1}$ next to $\frac12$, and $y^l_{2,3}, y^r_{2,3}$ next to $\frac34$.]

Figure 1: The idea of Lemma 9 is to place points in small intervals close to the grid points $i/2^N$ ($y^l_{k,i}$ on the left side, $y^r_{k,i}$ on the right side) in such a way that the ordinal constraints between all these points suffice to determine the grid cells they belong to, independently of their exact location within the intervals.

Lemma 9 (Isotonic maps approximately preserve a grid) Let $N \in \mathbb{N}$. For some $\varepsilon_1 < 1/2^{2N+1}$ set $\varepsilon_k = \varepsilon_1 2^{k-1}$, $2 \le k \le N$, and $\delta = \varepsilon_1/2$. For $k \in \{1, \dots, N\}$ and $i \in \{1, 3, \dots, 2^k - 1\}$ set $x_{k,i} = i/2^k$ and let $y^l_{k,i}$, $y^r_{k,i}$ be arbitrary elements of $(x_{k,i} - \varepsilon_k - \delta,\, x_{k,i} - \varepsilon_k)$ and $(x_{k,i} + \varepsilon_k,\, x_{k,i} + \varepsilon_k + \delta)$, respectively. Let

$$\varphi: \{0, 1\} \cup \{y^m_{k,i} : m \in \{l, r\},\ k \le N,\ i \in \{1, 3, \dots, 2^k - 1\}\} \to [0, 1]$$

be a weakly isotonic function with $\varphi(0) = 0$ and $\varphi(1) = 1$. Then it holds that

$$\big|y^m_{k,i} - \varphi(y^m_{k,i})\big| < \frac{1}{2^N}, \qquad m \in \{l, r\},\ k \le N,\ i \in \{1, 3, \dots, 2^k - 1\}. \tag{3}$$

Proof (details can be found in Appendix B) By induction over $N$ we prove

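$$\varphi(y^l_{k,i}) \in \left(\frac{2^{N-k} i - 1}{2^N},\ \frac{2^{N-k} i}{2^N}\right), \qquad \varphi(y^r_{k,i}) \in \left(\frac{2^{N-k} i}{2^N},\ \frac{2^{N-k} i + 1}{2^N}\right),$$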

for all $k \le N$, $i \in \{1, 3, \dots, 2^k - 1\}$, which immediately implies (3). The basis is clear (see Figure 1(a)): due to $\varphi(0) = 0$ and $\varphi(1) = 1$, $\varphi$ is strictly increasing and hence $0 = \varphi(0) < \varphi(y^l_{1,1}) < \varphi(y^r_{1,1}) < \varphi(1) = 1$. Since $|y^l_{1,1} - 0| < |y^l_{1,1} - 1|$ and $\varphi$ is weakly isotonic, we have $|\varphi(y^l_{1,1}) - \varphi(0)| < |\varphi(y^l_{1,1}) - \varphi(1)|$ and thus can conclude that $\varphi(y^l_{1,1}) \in (0, 1/2)$. In the same way we obtain $\varphi(y^r_{1,1}) \in (1/2, 1)$. We demonstrate the inductive step by proving that the statement also holds for $N = 2$ (see Figure 1(b)): we already know that $\varphi(y^l_{1,1}) \in (0, 1/2)$ and $\varphi(y^r_{1,1}) \in (1/2, 1)$. Furthermore, since $\varphi$ is strictly increasing, we have

$$0 < \varphi(y^l_{2,1}) < \varphi(y^r_{2,1}) < \varphi(y^l_{1,1}) < \varphi(y^r_{1,1}) < \varphi(y^l_{2,3}) < \varphi(y^r_{2,3}) < 1.$$

The choice of $(\varepsilon_k)_{1 \le k \le N}$ and $\delta$ guarantees that $|y^l_{2,1} - 0| < |y^l_{2,1} - y^l_{1,1}|$ and $|y^r_{2,1} - y^r_{1,1}| < |y^r_{2,1} - 0|$, leading to $|\varphi(y^l_{2,1}) - 0| < |\varphi(y^l_{2,1}) - \varphi(y^l_{1,1})|$ and $|\varphi(y^r_{2,1}) - \varphi(y^r_{1,1})| < |\varphi(y^r_{2,1}) - 0|$. This yields $2\varphi(y^l_{2,1}) < \varphi(y^l_{1,1}) < 1/2$ and $1/2 < \varphi(y^r_{1,1}) < 2\varphi(y^r_{2,1})$, and thus $\varphi(y^l_{2,1}) \in (0, 1/4)$ and $\varphi(y^r_{2,1}), \varphi(y^l_{1,1}) \in (1/4, 1/2)$. In the same way we can show that $\varphi(y^r_{2,3}) \in (3/4, 1)$ and $\varphi(y^l_{2,3}), \varphi(y^r_{1,1}) \in (1/2, 3/4)$.
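The configuration of Lemma 9 is easy to generate and probe numerically. The sketch below is our own illustration (all helper names are hypothetical): it places each $y^m_{k,i}$ at the midpoint of its admissible interval, using $\varepsilon_k = \varepsilon_1 2^{k-1}$ and $\delta = \varepsilon_1/2$, checks weak isotony of a candidate $\varphi$ by brute force, and evaluates the left-hand side of (3). For $N = 2$ and $\varepsilon_1 = 1/64$ it confirms the two distance comparisons used in the inductive step above, and it shows that $t \mapsto t^3$, although strictly increasing with $\varphi(0) = 0$ and $\varphi(1) = 1$, is not weakly isotonic on these points, so Lemma 9 does not apply to it.

```python
from itertools import product


def lemma9_points(N, eps1):
    """One admissible configuration for Lemma 9 (midpoints of the allowed intervals).

    Keys are "0", "1" and tuples (m, k, i) with m in {"l", "r"}; values are coordinates.
    """
    if not eps1 < 1 / 2 ** (2 * N + 1):
        raise ValueError("Lemma 9 requires eps1 < 1/2^(2N+1)")
    delta = eps1 / 2
    pts = {"0": 0.0, "1": 1.0}
    for k in range(1, N + 1):
        eps_k = eps1 * 2 ** (k - 1)
        for i in range(1, 2 ** k, 2):
            x = i / 2 ** k
            pts[("l", k, i)] = x - eps_k - delta / 2  # middle of (x - eps_k - delta, x - eps_k)
            pts[("r", k, i)] = x + eps_k + delta / 2  # middle of (x + eps_k, x + eps_k + delta)
    return pts


def is_weakly_isotonic(values, phi):
    """|x-y| < |x-z| must imply |phi(x)-phi(y)| < |phi(x)-phi(z)| (common point x)."""
    for x, y, z in product(values, repeat=3):
        if abs(x - y) < abs(x - z) and not abs(phi(x) - phi(y)) < abs(phi(x) - phi(z)):
            return False
    return True


def max_grid_error(pts, phi):
    """Left-hand side of (3), maximized over all points y^m_{k,i}."""
    return max(abs(y - phi(y)) for key, y in pts.items() if key not in ("0", "1"))


pts = lemma9_points(N=2, eps1=1 / 64)
values = list(pts.values())
print(len(pts))  # 8
print(abs(pts[("l", 2, 1)]) < abs(pts[("l", 2, 1)] - pts[("l", 1, 1)]))  # True
print(is_weakly_isotonic(values, lambda t: t), max_grid_error(pts, lambda t: t))  # True 0.0
print(is_weakly_isotonic(values, lambda t: t ** 3))  # False
```

For $t \mapsto t^3$ the comparison from $y^l_{1,1}$ fails: $y^l_{2,3}$ is closer to it than $y^l_{2,1}$, but after cubing the order of these two distances is reversed.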

Now it is straightforward to prove Theorem 4 for the case $d = 1$ (Proposition 10 implies Part 1 of Theorem 4; the proof of Part 2 is the same as for the case $d \ge 2$, which follows later on).

Proposition 10 (Statement for $d = 1$) Let $I = [a, b]$ (for some $-\infty < a < b < \infty$) and let $(x_n)_{n \in \mathbb{N}}$ be a sequence of points $x_n \in I$ such that $\{x_n : n \in \mathbb{N}\}$ is dense in $I$. Let $0 < R < \infty$ and $(\varphi_n)_{n \in \mathbb{N}}$ be a sequence of weakly isotonic functions $\varphi_n: \{x_1, \dots, x_n\} \to [-R, R]$. Then there exists a sequence $(S_n)_{n \in \mathbb{N}}$ of similarity transformations $S_n: \mathbb{R} \to \mathbb{R}$ with (2).

Proof (sketch) By appropriately rescaling the domain and the image of $\varphi_n$ we may assume that $I = [0, 1]$ and that $\varphi_n$ maps to $[0, 1]$ with $\varphi_n(0) = 0$, $\varphi_n(1) = 1$. We use Lemma 9 to show that, for large values of $n$, $\varphi_n$ can be approximated by the identity: choose $N \in \mathbb{N}$ such that $1/2^N$ is sufficiently small. Since $\{x_n : n \in \mathbb{N}\}$ is dense in $I$, there exists $N_0 \in \mathbb{N}$ such that each of the intervals $(x_{k,i} - \varepsilon_k - \delta,\, x_{k,i} - \varepsilon_k)$ and $(x_{k,i} + \varepsilon_k,\, x_{k,i} + \varepsilon_k + \delta)$ as defined in Lemma 9 (for the chosen $N$) contains an element of $\{x_1, \dots, x_{N_0}\}$. If $n \ge N_0$, $y \in \{x_1, \dots, x_n\}$, and $y$ is one of these elements, we immediately obtain $|y - \varphi_n(y)| < 1/2^N$ according to (3). If $y$ is not one of these elements, we can use the monotonicity of $\varphi_n$ to infer that $|y - \varphi_n(y)|$ is small.
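The last step of the proof can be made explicit. Assume, as after the normalization above, that $\varphi_n$ is strictly increasing, and let $a < y < b$ be the neighbors of $y$ among the points covered by (3) together with $0$ and $1$, so that $|a - \varphi_n(a)| < 2^{-N}$ and $|b - \varphi_n(b)| < 2^{-N}$. Then

$$y - (b - a) - 2^{-N} \le a - 2^{-N} < \varphi_n(a) < \varphi_n(y) < \varphi_n(b) < b + 2^{-N} \le y + (b - a) + 2^{-N},$$

so $|y - \varphi_n(y)| < (b - a) + 2^{-N}$. Every grid point $j/2^N$, $1 \le j \le 2^N - 1$, equals $i/2^k$ for exactly one $k \le N$ and odd $i$, so consecutive points of Lemma 9 (including $0$ and $1$) are less than $2^{-N}$ apart, which gives $|y - \varphi_n(y)| < 2^{1-N}$.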

Case $d \ge 2$. The case $d \ge 2$ is harder to deal with. Our basic idea is to show that an isotonic mapping $\varphi_n: \{x_1, \dots, x_n\} \to \mathbb{R}^d$, up to some rescaling, is an $\varepsilon(n)$-near-isometry, that is, $\varphi_n$ satisfies

$$\|x - y\| - \varepsilon(n) \le \|\varphi_n(x) - \varphi_n(y)\| \le \|x - y\| + \varepsilon(n), \qquad x, y \in \{x_1, \dots, x_n\}. \tag{4}$$

Then, by a theorem of Alestalo et al. (2001), $\varphi_n$ can be approximated by an isometry up to an error depending (essentially) only on $\varepsilon(n)$ and going to zero as $\varepsilon(n) \to 0$.
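Condition (4) is easy to evaluate on a finite set. A minimal sketch (the helper `near_isometry_eps` and its `scale` parameter are our own; `scale` stands for the unspecified rescaling mentioned above) computes the smallest $\varepsilon$ for which $x \mapsto f(x)/\text{scale}$ satisfies (4).

```python
from itertools import combinations
from math import dist


def near_isometry_eps(points, f, scale=1.0):
    """Smallest eps such that x -> f(x) / scale satisfies (4) on `points`:
    | ||f(x) - f(y)|| / scale - ||x - y|| | <= eps for all pairs x, y."""
    return max((abs(dist(f(x), f(y)) / scale - dist(x, y))
                for x, y in combinations(points, 2)), default=0.0)


pts = [(0, 0), (1, 0), (0, 1), (2, 3)]
print(near_isometry_eps(pts, lambda p: (-p[1], p[0])))        # rotation: essentially 0
print(near_isometry_eps(pts, lambda p: (1.1 * p[0], p[1])))   # stretch: about 0.145
print(near_isometry_eps(pts, lambda p: (3 * p[0], 3 * p[1]), scale=3.0))  # essentially 0
```

The third call illustrates the phrase “up to some rescaling”: a pure dilation is a $0$-near-isometry once its factor is divided out.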

For proving that $\varphi_n$ is an $\varepsilon(n)$-near-isometry we observe the following: since $\varphi_n$ is isotonic, it is sufficient to prove (4) only for some pairs $x, y$ such that $\|x - y\|$ is roughly uniformly distributed in $[0, \operatorname{diam}\{x_1, \dots, x_n\}]$. Hence, we would like to consider points close to a straight line and argue in a way similar to Lemma 9 that their relative positions along the line are almost preserved by an isotonic mapping. The problem is that, in general, there is no guarantee that the points are still close to a straight line after applying an isotony. However, assuming that there are points located close to the vertices of a cross-polytope and that these are “fixed” points (this is Assumption (?) in the following lemma), we can show that this is indeed the case, and Lemma 9 can be generalized in the following sense. Here we only provide a sketch of the lemma (see also Figure 2 for an explanation). A detailed version can be found in Appendix C.

Lemma 11 (Under Assumption (?), isotonic mappings preserve an approximately straight line) Let $d \ge 2$. Let $N \in \mathbb{N}$ be fixed such that

$$\omega = 24 \left(\frac{\Gamma(\frac d2 + 1)}{\pi^{d/2}}\right)^{1/d} \left(\frac{1}{2^N - 1}\right)^{1/d} < \frac{1}{2(d-1)}.$$

Let $U^+_s, U^-_s, \tilde U^+_s, \tilde U^-_s$, $s = 1, \dots, d$, and $U^j_{k,i}, U^l_{k,i}, U^r_{k,i}$, $k \le N$, $i \in \{1, 3, \dots, 2^k - 1\}$, $j \in \{2, \dots, d\}$, be open balls with certain properties (see Appendix C for details). Let $X^+_s, X^-_s \in \mathbb{R}^d$, $s = 1, \dots, d$, be arbitrary elements of $U^+_s$ and $U^-_s$, respectively, let $z^j_{k,i} \in \mathbb{R}^d$ be an arbitrary element of $U^j_{k,i}$, and let $y^l_{k,i}, y^r_{k,i} \in \mathbb{R}^d$ be arbitrary elements of $U^l_{k,i}$ and $U^r_{k,i}$, respectively.

Let

$$\varphi: \{X^+_1, X^-_1, \dots, X^+_d, X^-_d\} \cup \{z^j_{k,i} : k \le N,\ i \in \{1, 3, \dots, 2^k - 1\},\ j \in \{2, \dots, d\}\} \cup \{y^m_{k,i} : m \in \{l, r\},\ k \le N,\ i \in \{1, 3, \dots, 2^k - 1\}\} \to \mathbb{R}^d$$

be an isotonic function and assume that

$$\varphi(X^+_s) \in \tilde U^+_s, \quad \varphi(X^-_s) \in \tilde U^-_s, \qquad s = 1, \dots, d. \tag{?}$$

Set $\gamma(-1) = \gamma(1) = \tilde\alpha_1$ and $\gamma(0) = \tilde\alpha_1 + \frac{d-1}{2}(\omega + \rho)$ (where $\tilde\alpha_1$ is the radius of the balls $\tilde U^+_1, \tilde U^-_1$ and $\rho$ is a small number depending on the size and location of the balls $\tilde U^+_s, \tilde U^-_s$, $s = 2, \dots, d$), and define for $2 \le k \le N$ and $i \in \{1, 3, \dots, 2^k - 1\}$ the positive expression $\gamma(-1 + i/2^{k-1})$ recursively by

$$\gamma\Big(-1 + \frac{i}{2^{k-1}}\Big) = \frac12 \left(\gamma\Big(-1 + \frac{i-1}{2^{k-1}}\Big) + \gamma\Big(-1 + \frac{i+1}{2^{k-1}}\Big) + (d-1)(\omega + 2\rho)\right).$$

Let $N^* < N$ be such that $N^* \cdot 2^{N^*} < \frac{1}{5(d+1)(\omega + \rho + \tilde\alpha_1)}$. Then we have

$$\big\|y^m_{k,i} - \varphi(y^m_{k,i})\big\| < \gamma(x_{k,i}) + \omega + (d-1)(\omega + \rho) < 3d\sqrt{\omega}, \qquad m \in \{l, r\}, \tag{5}$$

where $x_{k,i} = -1 + \frac{i}{2^{k-1}}$, for all $1 \le k \le N^*$ and $i \in \{1, 3, \dots, 2^k - 1\}$.

[Figure 2 panels: (a) for $d = 2$, the points $X^\pm_1 \in U^\pm_1$, $X^\pm_2 \in U^\pm_2$, and points $y^l_{k,i} \in U^l_{k,i}$, $y^r_{k,i} \in U^r_{k,i}$, $z^2_{k,i} \in U^2_{k,i}$ along the segment from $(-1, 0)$ to $(1, 0)$; (b) their images, with $\varphi(X^\pm_1) \in \tilde U^\pm_1$, $\varphi(X^\pm_2) \in \tilde U^\pm_2$ and the region $\{p : \|p - \varphi(X^+_2)\| < \|p - \varphi(X^-_2)\|\}$.]

Figure 2: Explanation of Lemma 11 (for $d = 2$). We consider an isotonic map $\varphi$ defined on the following point set (see 2(a)): (i) $X^+_1, X^-_1, X^+_2, X^-_2$ are located in small balls around the vertices of a cross-polytope and are assumed to be “fixed” under $\varphi$ (this is Assumption (?)). (ii) The points $y^l_{k,i}, y^r_{k,i}$ approximate a grid as in Lemma 9 on the line segment between $X^-_1$ and $X^+_1$ and are closer to $X^-_2$ than to $X^+_2$. (iii) The points $z^2_{k,i}$ are close to the points $y^l_{k,i}$ and $y^r_{k,i}$ but are closer to $X^+_2$ than to $X^-_2$. Since $\varphi$ is isotonic, the points $\varphi(y^l_{k,i}), \varphi(y^r_{k,i})$ are closer to $\varphi(X^-_2)$ than to $\varphi(X^+_2)$ and hence $\varphi^2(y^l_{k,i}), \varphi^2(y^r_{k,i}) < \rho$, whereas for the points $\varphi(z^2_{k,i})$ it is the other way round, so that $\varphi^2(z^2_{k,i}) > -\rho$ (see 2(b)). However, $y^m_{k,i}$ ($m \in \{l, r\}$) and $z^2_{k,i}$ are close to each other, and so are $\varphi(y^m_{k,i})$ and $\varphi(z^2_{k,i})$. We can conclude that all points $\varphi(y^m_{k,i})$ are close to the first coordinate axis. This allows us to estimate the location of $\varphi(y^m_{k,i})$ along similar lines as in Lemma 9.

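Proof (sketch) We prove that for all $1 \le k \le N^*$ and $i \in \{1, 3, \dots, 2^k - 1\}$,

$$\begin{aligned} \varphi(y^l_{k,i}) &\in \big(x_{k,i} - \gamma(x_{k,i}) - \omega,\ x_{k,i} + \gamma(x_{k,i})\big) \times (-\rho - \omega, \rho)^{d-1}, \\ \varphi(y^r_{k,i}) &\in \big(x_{k,i} - \gamma(x_{k,i}),\ x_{k,i} + \gamma(x_{k,i}) + \omega\big) \times (-\rho - \omega, \rho)^{d-1}. \end{aligned} \tag{6}$$

It is elementary to show that $\gamma(x_{k,i}) < \frac12 (d-1)\sqrt{3\omega}$, $k \le N^*$, and because of $y^l_{k,i} \in (x_{k,i} - \omega, x_{k,i}) \times (-\omega, 0)^{d-1}$ and $y^r_{k,i} \in (x_{k,i}, x_{k,i} + \omega) \times (-\omega, 0)^{d-1}$, this immediately yields (5).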

All points $y^l_{k,i}, y^r_{k,i}, z^j_{k,i}$ lie in the convex hull of the points $X^+_1, X^-_1, \dots, X^+_d, X^-_d$. Since $\varphi$ is isotonic and satisfies Assumption (?), one can roughly estimate that

$$\varphi(y^l_{k,i}),\ \varphi(y^r_{k,i}),\ \varphi(z^j_{k,i}) \in [-3, 3]^d. \tag{7}$$

The idea for proving $\varphi^j(y^l_{k,i}), \varphi^j(y^r_{k,i}) \in (-\rho - \omega, \rho)$, $j \in \{2, \dots, d\}$, is the following. Let $j$ be fixed. For $m \in \{l, r\}$, $k \in \{1, \dots, N\}$, $i \in \{1, 3, \dots, 2^k - 1\}$ we have $\|y^m_{k,i} - X^-_j\| < \|y^m_{k,i} - X^+_j\|$ and $\|z^j_{k,i} - X^+_j\| < \|z^j_{k,i} - X^-_j\|$. Since $\varphi$ is isotonic, it follows that $\|\varphi(y^m_{k,i}) - \varphi(X^-_j)\| < \|\varphi(y^m_{k,i}) - \varphi(X^+_j)\|$ and $\|\varphi(z^j_{k,i}) - \varphi(X^+_j)\| < \|\varphi(z^j_{k,i}) - \varphi(X^-_j)\|$. Because of (?) and (7), we can conclude that $\varphi^j(y^m_{k,i}) < \rho$ and $\varphi^j(z^j_{k,i}) > -\rho$ (see Figure 2(b)). The distance between any two points $z^j_{k_1,i_1}, z^j_{k_2,i_2}$ is larger than the distance between any two points $z^j_{k,i}, y^l_{k,i}$ (or $z^j_{k,i}, y^r_{k,i}$, respectively), that is, for $m \in \{l, r\}$, $k \in \{1, \dots, N\}$, $i \in \{1, 3, \dots, 2^k - 1\}$ it holds that

$$\|z^j_{k,i} - y^m_{k,i}\| < \min\Big\{\|u - v\| : u \neq v \in \big\{z^j_{\tilde k, \tilde i} : \tilde k \le N,\ \tilde i \in \{1, 3, \dots, 2^{\tilde k} - 1\}\big\}\Big\}. \tag{8}$$

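Let $m \in \{l, r\}$, $k_0 \le N$, $i_0 \in \{1, 3, \dots, 2^{k_0} - 1\}$ be arbitrary and write $r = \|\varphi(z^j_{k_0,i_0}) - \varphi(y^m_{k_0,i_0})\|$. Due to (8) and $\varphi$ being isotonic, all points $\varphi(z^j_{k,i})$ are located at distance larger than $r$ from each other, which implies that the intersection of two balls (whether open or closed) with radius $r/2$ and centers $\varphi(z^j_{k_1,i_1})$ and $\varphi(z^j_{k_2,i_2})$, respectively, is empty. Recall (7). Due to (?) and, again, $\varphi$ being isotonic, we clearly have $r \le 3$. Hence, for each point $\varphi(z^j_{k,i})$ at least a fraction of $1/2^d$ of the volume of the ball $U_{r/2}(\varphi(z^j_{k,i}))$ is contained in $[-3, 3]^d$ too. Since there are $\sum_{k=1}^N 2^{k-1} = 2^N - 1$ points $z^j_{k,i}$ for fixed $j$ and $[-3, 3]^d$ has volume $6^d$, we can infer that

$$(2^N - 1)\, \frac{1}{2^d}\, \frac{\pi^{d/2}}{\Gamma(\frac d2 + 1)} \left(\frac r2\right)^d \le 6^d,$$

or equivalently $r \le \omega$. Hence, we have $|\varphi^j(z^j_{k_0,i_0}) - \varphi^j(y^m_{k_0,i_0})| \le \|\varphi(z^j_{k_0,i_0}) - \varphi(y^m_{k_0,i_0})\| \le \omega$ and finally obtain $\varphi^j(y^l_{k_0,i_0}), \varphi^j(y^r_{k_0,i_0}) \in (-\rho - \omega, \rho)$. Similar to (8), we also have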

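$$\|y^l_{k,i} - y^r_{k,i}\| < \min\Big\{\|u - v\| : u \neq v \in \big\{z^j_{\tilde k, \tilde i} : \tilde k \le N,\ \tilde i \in \{1, 3, \dots, 2^{\tilde k} - 1\}\big\}\Big\},$$

and with the same argument as above we obtain $|\varphi^1(y^l_{k,i}) - \varphi^1(y^r_{k,i})| \le \omega$, $k \le N$, $i \in \{1, 3, \dots, 2^k - 1\}$. Now, (6) can be shown by induction over $k$.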
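The constant $\omega$ is exactly the radius at which the packing estimate in the proof above becomes an equality. A small sketch (helper names are our own) evaluates $\omega$, checks that plugging $r = \omega$ into the left-hand side of the packing inequality returns $6^d$, and finds the smallest $N$ satisfying the hypothesis $\omega < 1/(2(d-1))$ of Lemma 11.

```python
from math import gamma, pi


def omega(d, N):
    """omega of Lemma 11 for dimension d and grid depth N."""
    return 24 * (gamma(d / 2 + 1) / pi ** (d / 2)) ** (1 / d) * (1 / (2 ** N - 1)) ** (1 / d)


def packing_lhs(d, N, r):
    """(2^N - 1) disjoint balls of radius r/2, each with at least a 2^-d share inside [-3, 3]^d."""
    ball_volume = pi ** (d / 2) / gamma(d / 2 + 1) * (r / 2) ** d
    return (2 ** N - 1) * ball_volume / 2 ** d


def smallest_admissible_N(d):
    """Smallest N with omega(d, N) < 1 / (2(d - 1)), the condition in Lemma 11 (d >= 2)."""
    if d < 2:
        raise ValueError("Lemma 11 assumes d >= 2")
    N = 1
    while not omega(d, N) < 1 / (2 * (d - 1)):
        N += 1
    return N


print(smallest_admissible_N(2))                      # 10
print(round(packing_lhs(2, 10, omega(2, 10)), 6))   # 36.0, i.e. 6^d for d = 2
```

Because $\omega$ decreases like $(2^N - 1)^{-1/d}$, the loop always terminates; for larger $d$ the required $N$ grows roughly linearly in $d \log_2 d$.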

The following lemma shows that Assumption (?), which says that points close to the vertices of a cross-polytope are mapped approximately to themselves, can be taken as satisfied if the isotonic function acts on sufficiently many points. See Figure 3 for an explanation. Again, we only provide a sketch of the lemma here; the detailed version is in Appendix C.

[Figure 3 panel: the configuration of Lemma 12 for $d = 3$, showing $A$, $B$, $Z^\pm_2$, $Z^\pm_3$, a point $x_1 \in U_\delta((-1 + \frac{1}{N'}, 0, 0))$, and a point $e^+_{3,v} \in U_\varepsilon(E^+_{3,v})$ for one $v \in \{-1, 1\}^3$.]

Figure 3: Explanation of Lemma 12. We consider an isotonic mapping $\varphi$ defined on the following point set: (i) $A$ and $B$ are opposite vertices of a cross-polytope. (ii) The points $e^-_{s,v}, e^+_{s,v}$ are located in small balls around the vertices of hypercubes placed around the remaining vertices of the cross-polytope. (iii) Numerous points $x_i$ are located in small balls which are placed equidistantly between $A$ and $B$. This yields ordinal constraints sufficient to show that all points $e^-_{s,v}, e^+_{s,v}$ are “fixed” under $\varphi$ up to some similarity transformation. The figure shows the setting of Lemma 12 for $d = 3$.

Lemma 12 (Assumption (?) can be taken as satisfied) Let $d \ge 2$. Let $N' \in \mathbb{N}$ be such that

$$\omega' = 32 \left(\frac{\Gamma(\frac d2 + 1)}{\pi^{d/2}}\right)^{1/d} \frac{1}{\sqrt[d]{N'}}$$

is sufficiently small, and let $r < 1$ and $\mu, \delta, \varepsilon > 0$ be appropriately chosen real numbers (see Appendix C for details). Define points $A, B \in \mathbb{R}^d$ and $Z^-_s, Z^+_s \in \mathbb{R}^d$, $s \in \{2, \dots, d\}$, by

$$A = (-1, 0, \dots, 0), \quad B = (1, 0, \dots, 0), \quad Z^-_2 = (0, -r, 0, 0, \dots), \quad Z^+_2 = (0, r, 0, 0, \dots),$$
$$Z^-_3 = (0, 0, -r, 0, \dots), \quad Z^+_3 = (0, 0, r, 0, \dots), \quad \text{and so forth.}$$

For $s \in \{2, \dots, d\}$ and $v \in \{-1, 1\}^d$ set $E^-_{s,v} = Z^-_s + \mu v$, $E^+_{s,v} = Z^+_s + \mu v$, and let $e^-_{s,v}, e^+_{s,v} \in \mathbb{R}^d$ be arbitrary elements of $U_\varepsilon(E^-_{s,v})$ and $U_\varepsilon(E^+_{s,v})$, respectively. For $i \in \{1, \dots, 2N' - 1\}$ let $x_i \in \mathbb{R}^d$ be an arbitrary element of $U_\delta((-1 + \frac{i}{N'}, 0, \dots, 0))$. Let

$$\varphi: \{A, B\} \cup \{e^-_{s,v}, e^+_{s,v} : s \in \{2, \dots, d\},\ v \in \{-1, 1\}^d\} \cup \{x_i : i = 1, \dots, 2N' - 1\} \to \mathbb{R}^d$$

be an isotonic function with $\|\varphi(A) - \varphi(B)\| = 2$.

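Then there exist a constant $C$ (depending only on $d$) and an isometry $S: \mathbb{R}^d \to \mathbb{R}^d$ such that

$$\|A - S(\varphi(A))\| \le C\sqrt{A(\omega')}, \quad \|B - S(\varphi(B))\| \le C\sqrt{A(\omega')}, \quad \|Z^m_s - S(\varphi(e^m_{s,v}))\| \le C\sqrt{A(\omega')}, \quad m \in \{-, +\},\ s \in \{2, \dots, d\},$$

where $v = (1, 1, 1, \dots, 1)$ and $A(\omega')$ only depends on $\omega'$ and $d$ and satisfies $A(\omega') \to 0$ as $\omega' \to 0$.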
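The “elementary calculations” in the proof below compare distances from $A$ to the centers $E^\pm_{2,v}$. The sketch below builds the centers of Lemma 12 and checks the two comparisons $\|A - E^-_{2,v}\| < \|A - E^+_{2,v}\|$ and $\|A - E^+_{2,v^c}\| < \|A - E^-_{2,v^c}\|$ for $v = (1, \dots, 1)$. The helper name and the concrete values $d = 3$, $r = 0.5$, $\mu = 0.05$ are our own illustrative choices, not the constraints from Appendix C; for small enough $\varepsilon$ the same strict inequalities carry over to the perturbed points $e^\pm_{2,v}$.

```python
from itertools import product
from math import dist


def lemma12_centers(d, r, mu):
    """Centers A, B, Z[(m, s)] and E[(m, s, v)] of Lemma 12, with m in {-1, +1}."""
    def axis_point(s, c):
        # c times the s-th standard basis vector (s is 1-based, as in the text)
        return tuple(c if t == s else 0.0 for t in range(1, d + 1))

    A, B = axis_point(1, -1.0), axis_point(1, 1.0)
    Z = {(m, s): axis_point(s, m * r) for s in range(2, d + 1) for m in (-1, 1)}
    E = {(m, s, v): tuple(zc + mu * vc for zc, vc in zip(Z[(m, s)], v))
         for (m, s) in Z for v in product((-1, 1), repeat=d)}
    return A, B, Z, E


d, r, mu = 3, 0.5, 0.05
A, B, Z, E = lemma12_centers(d, r, mu)
v, vc = (1,) * d, (-1,) * d
print(len(E))  # 32 = 2 * (d - 1) * 2^d
print(dist(A, E[(-1, 2, v)]) < dist(A, E[(1, 2, v)]))    # True
print(dist(A, E[(1, 2, vc)]) < dist(A, E[(-1, 2, vc)]))  # True
```

Shifting by $\mu v$ moves both centers toward $B$ or toward $A$ along the first axis, but the sign of their second coordinate still decides which of the two is closer to $A$.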

Proof (sketch) With an argument similar to the one following (8) in the proof of Lemma 11, we can show that all points $p_1, p_2$ in the domain of $\varphi$ with $\|p_1 - p_2\| < \|A - x_1\|$ must satisfy $\|\varphi(p_1) - \varphi(p_2)\| < \omega'/2$. The parameters $\mu$, $\delta$ and $\varepsilon$ are chosen in such a way that $\|e^m_{s,v_1} - e^m_{s,v_2}\| < \|A - x_1\|$ for any $m \in \{-, +\}$, $s \in \{2, \dots, d\}$ and for all $v_1, v_2 \in \{-1, 1\}^d$. Using this and the assumption that $\varphi$ is isotonic, we can show that

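$$\begin{aligned}
2 - \omega' &< \|p^+_s - p^-_s\| \le 2, && s = 1, \dots, d, \\
\|p^+_s - p^-_{s'}\| - \omega' &< \|p^+_s - p^+_{s'}\| < \|p^+_s - p^-_{s'}\| + \omega', && s \neq s' \in \{1, \dots, d\}, \\
\|p^-_s - p^-_{s'}\| - \omega' &< \|p^-_s - p^+_{s'}\| < \|p^-_s - p^-_{s'}\| + \omega', && s \neq s' \in \{1, \dots, d\},
\end{aligned} \tag{9}$$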

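where $p^+_1 = \varphi(B)$, $p^-_1 = \varphi(A)$ and $p^+_s = \varphi(e^+_{s,v})$, $p^-_s = \varphi(e^-_{s,v})$ for $s = 2, \dots, d$. For example, let us prove $\|p^-_1 - p^-_2\| - \omega' < \|p^-_1 - p^+_2\| < \|p^-_1 - p^-_2\| + \omega'$. Elementary calculations show that $\|A - e^-_{2,v}\| < \|A - e^+_{2,v}\|$ and $\|A - e^+_{2,v^c}\| < \|A - e^-_{2,v^c}\|$ with $v^c = (-1, -1, -1, \dots, -1)$. We infer $\|p^-_1 - p^-_2\| < \|p^-_1 - p^+_2\|$, which already gives the lower bound, and $\|p^-_1 - \varphi(e^+_{2,v^c})\| < \|p^-_1 - \varphi(e^-_{2,v^c})\|$, and thus obtain

$$\begin{aligned}
\|p^-_1 - p^+_2\| &\le \|p^-_1 - \varphi(e^+_{2,v^c})\| + \|\varphi(e^+_{2,v^c}) - p^+_2\| < \|p^-_1 - \varphi(e^-_{2,v^c})\| + \|\varphi(e^+_{2,v^c}) - p^+_2\| \\
&\le \|p^-_1 - p^-_2\| + \|p^-_2 - \varphi(e^-_{2,v^c})\| + \|\varphi(e^+_{2,v^c}) - p^+_2\| < \|p^-_1 - p^-_2\| + \omega',
\end{aligned}$$

where in the last step each of the two small summands is below $\omega'/2$, because $e^m_{2,v}$ and $e^m_{2,v^c}$ are closer than $\|A - x_1\|$. From (9) we can infer that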

$$|\langle p^+_s - p^-_s,\ p^+_{s'} - p^-_{s'}\rangle| < 10\omega', \qquad s \neq s' \in \{1, \dots, d\}. \tag{10}$$

Furthermore, we can show that $\|(p^+_s + p^-_s) - (p^+_{s'} + p^-_{s'})\|$, $s \neq s' \in \{1, \dots, d\}$, is small (provided $\omega'$ is small), that is,

$$\|(p^+_s + p^-_s) - (p^+_{s'} + p^-_{s'})\|^2 \le d \left(\frac{20\omega' + 8\,\frac{10d\omega'}{4d-1}(4d)^{d-1}}{2 - \omega' - \frac{10d\omega'}{4d-1}(4d)^{d-1}}\right)^2, \qquad s \neq s' \in \{1, \dots, d\}. \tag{11}$$

This is done by first applying the Gram-Schmidt process to the vectors $(p^+_s - p^-_s)$, $s = 1, \dots, d$. By doing so we obtain an orthonormal basis of $\mathbb{R}^d$ whose elements (appropriately rescaled) differ from the vectors $(p^+_s - p^-_s)$ only up to a small error (depending on $\omega'$). Considering the Fourier coefficients of $(p^+_s + p^-_s) - (p^+_{s'} + p^-_{s'})$ with respect to this orthonormal basis then leads to (11).

Now, setting $Z^-_1 = A$, $Z^+_1 = B$, we consider the mapping $f: \{Z^-_1, Z^+_1, \dots, Z^-_d, Z^+_d\} \to \{p^-_1, p^+_1, \dots, p^-_d, p^+_d\}$ given by $f(Z^m_s) = p^m_s$ for $m \in \{-, +\}$, $s \in \{1, \dots, d\}$. Using (10) and (11) it is straightforward to show that $f$ is a $2\sqrt{A(\omega')}$-near-isometry, that is,

$$\|x - y\| - 2\sqrt{A(\omega')} \le \|f(x) - f(y)\| \le \|x - y\| + 2\sqrt{A(\omega')}, \qquad x, y \in \{Z^-_1, Z^+_1, \dots, Z^-_d, Z^+_d\}.$$

According to Alestalo et al. (2001), Theorem 3.3, there exist a constant $C'$ (depending only on $d$; it can be chosen independently of the parameters $r, \mu, \delta, \varepsilon$) and an isometry $T: \mathbb{R}^d \to \mathbb{R}^d$ such that $\|T(x) - f(x)\| \le 2C'\sqrt{A(\omega')}$, $x \in \{Z^-_1, Z^+_1, \dots, Z^-_d, Z^+_d\}$. Setting $S = T^{-1}$ and $C = 2C'$, the statement of Lemma 12 follows immediately, since $\|x - S(f(x))\| = \|T(x) - f(x)\|$ for the isometry $T$.
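The Gram-Schmidt step used for (11) can be sketched as follows (helper names and the sample vectors are our own). For vectors that are almost pairwise orthogonal and of length close to 2, as the vectors $(p^+_s - p^-_s)$ are by (9) and (10), the orthonormal basis produced stays close to the vectors divided by 2, and the Fourier coefficients of any vector are simply its inner products with the basis elements.

```python
from math import sqrt


def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormalize linearly independent vectors in order."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = sum(a * b for a, b in zip(v, q))  # component of v along q
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = sqrt(sum(wi * wi for wi in w))
        if norm == 0:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def fourier_coefficients(u, basis):
    """Coefficients <u, q> of u with respect to an orthonormal basis."""
    return [sum(a * b for a, b in zip(u, q)) for q in basis]


# Vectors of length about 2 that are almost pairwise orthogonal.
vs = [(2.0, 0.01, 0.0), (0.02, 2.0, -0.01), (0.0, 0.01, 1.99)]
qs = gram_schmidt(vs)
deviation = max(sqrt(sum((qi - vi / 2) ** 2 for qi, vi in zip(q, v)))
                for q, v in zip(qs, vs))
print(deviation)  # small: the basis is close to the rescaled vectors v / 2
```

The deviation is governed by how far the input vectors are from being orthogonal and of length 2, which is the role the bounds (9) and (10) play in the proof.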

Now we can prove Theorem 4 for $d \ge 2$.

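Proof of Part 1 of Theorem 4 (sketch) By Lemma C (see Appendix A) it is sufficient to prove that for every $\varepsilon_0 > 0$ there exists $N(\varepsilon_0) \in \mathbb{N}$ such that for all $n \ge N(\varepsilon_0)$ there is a similarity transformation $S(n, \varepsilon_0): \mathbb{R}^d \to \mathbb{R}^d$ with $\|\varphi_n - S(n, \varepsilon_0)\|_{\infty(\{x_1, \dots, x_n\})} < \varepsilon_0$.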

In a nutshell, the basic idea is the following. Assume $K$ is a ball with diameter only slightly larger than two and containing all the balls of Lemma 11. If $n \in \mathbb{N}$ is sufficiently large, each of these balls contains an element of $\{x_1, \dots, x_n\}$. Assume for the moment that $\varphi_n$ satisfies Assumption (?) of Lemma 11. Then from (5) we obtain an estimate for $\|\varphi_n(x) - \varphi_n(y)\|$ for roughly uniformly distributed values of $\|x - y\|$ in $[0, 2] \approx [0, \operatorname{diam}\{x_1, \dots, x_n\}]$. Since $\varphi_n$ is isotonic, this gives an estimate for $\|\varphi_n(x) - \varphi_n(y)\|$ for all $x, y \in \{x_1, \dots, x_n\}$, which is sufficient to show that $\varphi_n$ is an $\varepsilon$-near-isometry for some small $\varepsilon$. Hence, we can uniformly approximate $\varphi_n$ by an isometry according to Alestalo et al. (2001). It remains to argue why Assumption (?) of Lemma 11 can indeed be taken as satisfied; this is precisely the statement of Lemma 12.

A bit more precisely, the main steps of the proof can be summarized as follows:

1. Since $\{x_n : n \in \mathbb{N}\}$ is dense in $K$, we can choose $N_0 \in \mathbb{N}$ so large that there are points $x_A, x_B \in \{x_1, \dots, x_{N_0}\}$ and a similarity transformation $T: \mathbb{R}^d \to \mathbb{R}^d$ with the following properties:

• $T(x_A) = (-1, 0, 0, \dots, 0)$, $T(x_B) = (1, 0, 0, \dots, 0)$

• $U_1(0) \subseteq T(K)$, and $\operatorname{diam} T(K)$ is “sufficiently small”

• $\forall y \in T(K): U_{r_0}(y) \cap \{T(x_1), \dots, T(x_{N_0})\} \neq \emptyset$, where $r_0 > 0$ is smaller than the minimal radius of the finitely many open balls considered in Step 3 and smaller than $\delta_0$ from Step 6.

In the following, we consider $\varphi_n$ for a fixed $n \ge N_0$.