
2001, Vol. 11, No. 2, 452–469

LIMIT LAWS FOR PARTIAL MATCH QUERIES IN QUADTREES

By Ralph Neininger and Ludger Rüschendorf, Universität Freiburg

It is proved that in an idealized uniform probabilistic model the cost of a partial match query in a multidimensional quadtree after normalization converges in distribution. The limiting distribution is given as a fixed point of a random affine operator. Also a first-order asymptotic expansion for the variance of the cost is derived and results on exponential moments are given. The analysis is based on the contraction method.

1. Introduction. A partial match query is one of several types of queries on a file that organizes multidimensional data. Databases for multidimensional data are of special interest for applications in geographical information systems, computer graphics and computational geometry.

Structures maintaining multiattribute keys should support the usual dictionary operations as well as some associative queries. Examples of such associative queries are nearest neighbor queries, partial match queries and convex or orthogonal range queries. Relevant data structures which support associative queries are considered in the books of Knuth (1998) and Samet (1990a, b). These structures can be divided into comparison-based algorithms and methods based on digital techniques. The digital techniques use binary representations of the keys; examples are tries and digital search trees. Examples of comparison-based structures are quadtrees and multidimensional binary search trees (K-d trees). These algorithms work with comparisons of whole keys instead of binary representations. For an analysis of the performance of basic parameters of these structures see Mahmoud (1992).

In this paper we give an asymptotic probabilistic analysis of the cost of partial match queries in quadtrees. We assume the data to belong to some $d$-dimensional domain $D = D_1 \times \cdots \times D_d$, which, using binary encodings, we can map into the unit cube $[0,1]^d$. For a partial match query a query $q = (q_1, \ldots, q_d)$ is given, where $q_i \in [0,1] \cup \{*\}$ for $1 \le i \le d$. Here $*$ denotes that this component is left unspecified. Then all data in the file that match the query $q$ have to be retrieved, that is, all keys which are identical to $q$ in all the components where $q$ is specified (the components with $q_i \ne *$). For the probabilistic analysis of partial match retrieval we assume the uniform probabilistic model following Flajolet and Puech (1986). The uniform probabilistic model assumes all components in the data and the specified components in the query to be independent and uniformly distributed on $[0,1]$. For comparison-based algorithms this is equivalent to the more general model where the components are drawn independently from any continuous distribution. Moreover, we assume throughout this work the idealization that queries in subtrees are independent.

Received September 1999; revised June 2000.

AMS 2000 subject classifications. Primary 68Q25, 60F05; secondary 68P10.

Key words and phrases. Quadtree, partial match query, contraction method, multidimensional data structure, analysis of algorithms.

The quadtree structure is due to Finkel and Bentley (1974). It extends the classical idea of binary search trees to multidimensional data. For the construction of the quadtree we refer to Mahmoud (1992). Essentially, a data point partitions the search space by the hyperplanes through it perpendicular to the axes. Applied recursively, this principle leads to a decomposition of the search space into quadrants, and the quadtree corresponds to this partitioning. A partial match query in a quadtree starts at the root of the tree.

According to the comparisons of the specified components of the query with the corresponding components of the root, some of the subtrees of the root have to be searched recursively. The cost of a partial match query in a quadtree is measured by the number of nodes traversed during this search. We denote this cost in a quadtree containing $n$ nodes by $C_n$.

The cost has already been studied in the uniform probabilistic model. In dimension $d = 2$ and with $s = 1$ component specified, the first-order asymptotic expansions for the mean and variance are known. Flajolet, Gonnet, Puech and Robson (1993) derived

$$\mathbb{E}\,C_n \sim \gamma n^{\alpha-1} \quad\text{with}\quad \alpha = \frac{\sqrt{17}-1}{2} \quad\text{and}\quad \gamma = \frac{\Gamma(2\alpha)}{2\Gamma^3(\alpha)}. \tag{1}$$

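Martínez, Panholzer and Prodinger (2000) recently found $\operatorname{Var} C_n \sim \beta n^{2\alpha-2}$ with

$$\beta = \frac{(2\alpha-1)\Gamma(2\alpha)}{3\alpha(\alpha-1)\Gamma^4(\alpha)} - \frac{\Gamma^2(2\alpha)}{4\Gamma^6(\alpha)}. \tag{2}$$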
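As a quick numerical sanity check of (1) and (2) (our addition, not part of the original analysis), the sketch below evaluates $\alpha$, $\gamma$ and $\beta$ with Python's standard `math.gamma`. The function name `constants_2d` is hypothetical; the formulas are exactly those displayed above.

```python
import math

# Illustrative sketch: evaluates the constants of (1) and (2); name is ours.
def constants_2d():
    """Return (alpha, gamma, beta) for d = 2, s = 1."""
    alpha = (math.sqrt(17) - 1) / 2
    g_a = math.gamma(alpha)           # Gamma(alpha)
    g_2a = math.gamma(2 * alpha)      # Gamma(2 alpha)
    gamma = g_2a / (2 * g_a ** 3)
    beta = ((2 * alpha - 1) * g_2a / (3 * alpha * (alpha - 1) * g_a ** 4)
            - g_2a ** 2 / (4 * g_a ** 6))
    return alpha, gamma, beta

if __name__ == "__main__":
    alpha, gamma, beta = constants_2d()
    # alpha is the root of alpha * (alpha + 1) = 4, the case d = 2, s = 1 of (4)
    print(f"alpha = {alpha:.6f}, residual = {alpha * (alpha + 1) - 4:.1e}")
    print(f"gamma = {gamma:.6f}, beta = {beta:.6f}")
```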

In arbitrary dimension $d$ with $1 \le s \le d-1$ components specified, Flajolet, Gonnet, Puech and Robson (1993) derived

$$\mathbb{E}\,C_n \sim \gamma_{s,d}\, n^{\alpha-1}, \tag{3}$$

where $\gamma_{s,d}$ is an (unknown) positive constant which can be approximated numerically and $\alpha \in (1,2)$ is the unique solution of the indicial equation

$$\alpha^{d-s}(\alpha+1)^{s} = 2^{d}. \tag{4}$$
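The indicial equation (4) has no closed-form solution for general $s$ and $d$. A minimal sketch, assuming plain bisection is accurate enough: taken in logarithms, the left side minus the right side is strictly increasing in $\alpha$, negative at $\alpha = 1$ and positive at $\alpha = 2$ whenever $1 \le s \le d-1$, so the root in $(1,2)$ is bracketed. The helper name `solve_indicial` is ours.

```python
import math

# Illustrative sketch (hypothetical helper): root of alpha^(d-s) (alpha+1)^s = 2^d.
def solve_indicial(s: int, d: int, tol: float = 1e-14) -> float:
    if not 1 <= s <= d - 1:
        raise ValueError("need 1 <= s <= d - 1")

    def f(a: float) -> float:
        # strictly increasing; f(1) = (s - d) ln 2 < 0 and f(2) = s ln(3/2) > 0
        return (d - s) * math.log(a) + s * math.log(a + 1) - d * math.log(2)

    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    for d in range(2, 6):
        for s in range(1, d):
            print(d, s, round(solve_indicial(s, d), 6))
```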

An expansion for the variance of C_n was not known up to now.

In this paper we give limit laws for $C_n$ in any dimension and derive the first-order asymptotic expansion of the variance of $C_n$, as well as results on exponential moments. The normalized cost

$$X_n = \frac{C_n - \mathbb{E}\,C_n}{n^{\alpha-1}} \tag{5}$$

converges weakly to a random variable which is characterized as the fixed point of a random affine operator. For the proof we use the contraction method.

This method was introduced by Rösler (1991) for the analysis of Quicksort. The contraction method has been further developed independently in Rösler (1992) and Rachev and Rüschendorf (1995). For a recent survey of this method see Rösler and Rüschendorf (2000).

Limit laws for the cost of partial match queries in the uniform probabilistic model for K-d tries and some variants of K-d trees were recently derived in Schachinger (2000) and Neininger (2000), respectively. In all these data structures the mean and the standard deviation of the cost of a partial match query are known to be of the same order of magnitude. Therefore we do not need the second-order term in the expansion of the mean of $C_n$ in order to derive a limiting operator. The second-order term of the mean has turned out to be crucial for the problem of the internal path length of related random trees [see Dobrow and Fill (1999), Rösler (2000) and Neininger and Rüschendorf (1999)]. From this point of view the partial match query problem bears some similarity to the running time of the FIND algorithm in the model of Mahmoud, Modarres and Smythe (1995). Nevertheless, for the FIND problem it is easier to derive information on the limit distribution from the fixed-point equation, owing to the purely one-sided character of the FIND algorithm.

2. Standard quadtrees in dimension $d = 2$. We denote by $W = (U, V)$ the first key to be inserted, which is stored in the root of the random two-dimensional quadtree. The variables $U$ and $V$ are independent and uniformly distributed on $[0,1]$. The root $W = w = (u, v)$ partitions the unit square into four quadrants with volumes given by $\mathcal{V}(w) = (uv,\, u(1-v),\, (1-u)v,\, (1-u)(1-v))$. We denote by $I^{(n)}$ the vector of the cardinalities of the subtrees of the root of a random quadtree with $n$ nodes. Then, conditionally given $W = w$, the vector $I^{(n)}$ is multinomial $M(n-1, \mathcal{V}(w))$ distributed:

$$\mathcal{L}\big(I^{(n)} \mid W = w\big) = M\big(n-1, \mathcal{V}(w)\big).$$

The weak law of large numbers for $I^{(n)}$ then implies

$$\frac{I^{(n)}}{n} \xrightarrow{\ P\ } \mathcal{V}(W) = \big(UV,\, U(1-V),\, (1-U)V,\, (1-U)(1-V)\big) \tag{6}$$

with $(U, V)$ uniformly distributed on $[0,1]^2$.

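This implies $L_1$-convergence of bounded continuous functionals of $I^{(n)}/n$; in particular,

$$\begin{aligned}
\mathbb{E}\Big(\frac{I^{(n)}_k}{n}\Big)^{2\alpha-2} &\to \mathbb{E}\,(UV)^{2\alpha-2} = \frac{1}{(2\alpha-1)^2},\\
\mathbb{E}\Big[\mathbf{1}_{\{Y<U\}}\Big(\frac{I^{(n)}_1}{n}\Big)^{2\alpha-2}\Big] &\to \mathbb{E}\big[\mathbf{1}_{\{Y<U\}}(UV)^{2\alpha-2}\big] = \frac{1}{2\alpha(2\alpha-1)}
\end{aligned} \tag{7}$$

if $Y$ is uniformly distributed on $[0,1]$ and independent of $I^{(n)}$ and $U$.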
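The convergence (7) can be illustrated by simulation. The sketch below is our illustration (`mc_split_moment` is a hypothetical name): it draws the root $(U, V)$ and $Y$, samples $I^{(n)}_1$ as a binomial $(n-1, UV)$ count, which is the first marginal of the multinomial $M(n-1, \mathcal{V}(W))$, and averages $\mathbf{1}_{\{Y<U\}}(I^{(n)}_1/n)^{2\alpha-2}$. For moderate $n$ the result should be close to $1/(2\alpha(2\alpha-1))$, up to Monte Carlo error and a small finite-$n$ bias.

```python
import math
import random

# Illustrative Monte Carlo sketch of (7); the function name is hypothetical.
def mc_split_moment(n: int, trials: int, seed: int = 1) -> float:
    """Estimate E[1{Y<U} (I_1^(n)/n)^(2 alpha - 2)] for a random 2-d quadtree root.

    Given the root (U, V), each of the other n - 1 keys lands in quadrant 1
    (first coordinate < U, second < V) with probability U*V.
    """
    alpha = (math.sqrt(17) - 1) / 2
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u, v, y = rng.random(), rng.random(), rng.random()
        if y >= u:
            continue                          # the indicator 1{Y < U} is zero
        p = u * v
        i1 = sum(rng.random() < p for _ in range(n - 1))   # Binomial(n-1, UV)
        total += (i1 / n) ** (2 * alpha - 2)
    return total / trials

if __name__ == "__main__":
    alpha = (math.sqrt(17) - 1) / 2
    print(mc_split_moment(100, 20_000), 1 / (2 * alpha * (2 * alpha - 1)))
```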

For a partial match query in dimension 2, one component of the search pattern is specified. Without loss of generality we may assume that the first component is specified, so the pattern is of the form $(S, *)$. For the distributional analysis of partial match queries let $C_n$ denote the number of nodes traversed in the quadtree during a partial match retrieval. According to the uniform probabilistic model, we assume that the first component $Y$ of the search pattern is uniformly distributed on $[0,1]$ and independent of the random quadtree. Then $C_1 = 1$; we define $C_0 = 0$. Conditionally given $I^{(n)}$, the subtrees are mutually independent and distributed as random quadtrees with $I^{(n)}_1, \ldots, I^{(n)}_4$ nodes. For this reason the number $C_n$ of traversed nodes satisfies the distributional recursive equation

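$$C_n \stackrel{\mathcal{D}}{=} \mathbf{1}_{\{Y<U\}}\big(C^{(1)}_{I^{(n)}_1} + C^{(2)}_{I^{(n)}_2}\big) + \mathbf{1}_{\{Y\ge U\}}\big(C^{(3)}_{I^{(n)}_3} + C^{(4)}_{I^{(n)}_4}\big) + 1. \tag{8}$$

Here $(Y, U, V)$, $(C^{(1)}_i), \ldots, (C^{(4)}_i)$ are independent, $Y, U, V$ are uniformly distributed on $[0,1]$, $C^{(k)}_i \stackrel{\mathcal{D}}{=} C_i$ for $k = 1, \ldots, 4$, $i \in \mathbb{N}_0$, and $I^{(n)}$ is multinomial $M(n-1, \mathcal{V}(w))$ distributed given $(U, V) = w$. Related independence properties are used throughout the paper in recursive equations of a similar form without stating them explicitly in each case.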
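To make the recursion (8) concrete, here is a minimal Python sketch of a two-dimensional point quadtree and the partial match search for a pattern $(y, *)$. It is our illustration, not code from the paper; the names `Node`, `child_slot`, `build_quadtree`, `partial_match_cost` and `mean_cost` are hypothetical. Child slots $0, \ldots, 3$ correspond to the quadrants $1, \ldots, 4$ with volumes $\mathcal{V}(w)$, and a key equal to the node in some coordinate goes to the upper side (an assumed tie convention).

```python
import random
from typing import List, Optional, Sequence, Tuple

# Illustrative sketch only; names are hypothetical, not from the paper.
Point = Tuple[float, float]

class Node:
    """Point-quadtree node holding one key and four child slots."""
    __slots__ = ("key", "children")

    def __init__(self, key: Point):
        self.key = key
        self.children: List[Optional["Node"]] = [None, None, None, None]

def child_slot(w: Point, p: Point) -> int:
    """Slot of key p below node w, ordered like V(w) = (uv, u(1-v), (1-u)v, (1-u)(1-v))."""
    return 2 * (p[0] >= w[0]) + (p[1] >= w[1])

def build_quadtree(points: Sequence[Point]) -> Optional[Node]:
    root: Optional[Node] = None
    for p in points:
        if root is None:
            root = Node(p)
            continue
        node = root
        while True:
            k = child_slot(node.key, p)
            if node.children[k] is None:
                node.children[k] = Node(p)
                break
            node = node.children[k]
    return root

def partial_match_cost(node: Optional[Node], y: float) -> int:
    """Number of nodes visited by the query (y, *), cf. recursion (8)."""
    if node is None:
        return 0
    # Y < U: search the two quadrants with first coordinate < u, else the other two.
    slots = (0, 1) if y < node.key[0] else (2, 3)
    return 1 + sum(partial_match_cost(node.children[k], y) for k in slots)

def mean_cost(n: int, reps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E C_n under the uniform probabilistic model."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        total += partial_match_cost(build_quadtree(pts), rng.random())
    return total / reps
```

For example, `mean_cost(1000, 200)` estimates $\mathbb{E}\,C_{1000}$; dividing by $1000^{\alpha-1}$ gives only a rough estimate of $\gamma$, since lower-order terms are not negligible for small $n$.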

Recall the first-order expansions for the mean and variance of $C_n$. Flajolet, Gonnet, Puech and Robson (1993) derived

$$\mathbb{E}\,C_n \sim \gamma n^{\alpha-1} \tag{9}$$

with $\alpha$ and $\gamma$ given in (1). In Martínez, Panholzer and Prodinger (2000) it is shown that

$$\operatorname{Var} C_n \sim \beta n^{2\alpha-2} \tag{10}$$

holds with $\beta$ as in (2).

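Therefore a normalized version $X_n$ of $C_n$ is given by $X_n = (C_n - \mathbb{E}\,C_n)/n^{\alpha-1}$. The modified recursion for $X_n$ is given by

$$\begin{aligned}
X_n \stackrel{\mathcal{D}}{=}\ & \mathbf{1}_{\{Y<U\}}\sum_{k=1}^{2}\Big(\frac{I^{(n)}_k}{n}\Big)^{\alpha-1}\big(X^{(k)}_{I^{(n)}_k} + \gamma + o(1)\big) \\
& + \mathbf{1}_{\{Y\ge U\}}\sum_{k=3}^{4}\Big(\frac{I^{(n)}_k}{n}\Big)^{\alpha-1}\big(X^{(k)}_{I^{(n)}_k} + \gamma + o(1)\big) - \gamma + o(1).
\end{aligned} \tag{11}$$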

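This recursion and (6) suggest that a limit $X$ of $X_n$ is a solution of the limiting equation

$$\begin{aligned}
X \stackrel{\mathcal{D}}{=}\ & \mathbf{1}_{\{Y<U\}}\Big[(UV)^{\alpha-1}\big(X^{(1)}+\gamma\big) + \big(U(1-V)\big)^{\alpha-1}\big(X^{(2)}+\gamma\big)\Big] \\
& + \mathbf{1}_{\{Y\ge U\}}\Big[\big((1-U)V\big)^{\alpha-1}\big(X^{(3)}+\gamma\big) + \big((1-U)(1-V)\big)^{\alpha-1}\big(X^{(4)}+\gamma\big)\Big] - \gamma.
\end{aligned} \tag{12}$$

Here $Y, U, V, X^{(1)}, \ldots, X^{(4)}$ are independent, $Y, U, V$ are uniformly distributed on $[0,1]$, and $X^{(k)} \stackrel{\mathcal{D}}{=} X$ for $k = 1, \ldots, 4$.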

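We define

$$\mathcal{M}_0^2 = \big\{\mu \in \mathcal{M}^1(\mathbb{R}) : \mathbb{E}(\mu) = 0,\ \operatorname{Var}(\mu) < \infty\big\},$$

where $\mathbb{E}(\mu)$ and $\operatorname{Var}(\mu)$ are defined, respectively, as the expectation and variance of a random variable with distribution $\mu$, and $\mathcal{M}^1(\mathbb{R})$ denotes the space of probability measures on the real line. We define the random affine operator corresponding to (12) by $T\colon \mathcal{M}^1(\mathbb{R}) \to \mathcal{M}^1(\mathbb{R})$,

$$\begin{aligned}
T(\mu) = \mathcal{L}\Big(& \mathbf{1}_{\{Y<U\}}\Big[(UV)^{\alpha-1}\big(Z^{(1)}+\gamma\big) + \big(U(1-V)\big)^{\alpha-1}\big(Z^{(2)}+\gamma\big)\Big] \\
& + \mathbf{1}_{\{Y\ge U\}}\Big[\big((1-U)V\big)^{\alpha-1}\big(Z^{(3)}+\gamma\big) + \big((1-U)(1-V)\big)^{\alpha-1}\big(Z^{(4)}+\gamma\big)\Big] - \gamma\Big),
\end{aligned} \tag{13}$$

where $Y, U, V, Z^{(1)}, \ldots, Z^{(4)}$ are independent, $Y, U, V$ are uniformly distributed on $[0,1]$ and $\mathcal{L}(Z^{(k)}) = \mu$ for $k = 1, \ldots, 4$.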

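Our aim is to show that $T$ is the limiting operator of the recursive sequence $(X_n)$ in (11). We endow $\mathcal{M}_0^2 \subset \mathcal{M}^1(\mathbb{R})$ with the minimal $\ell_2$-metric,

$$\ell_2(\mu, \nu) = \inf\Big\{\big(\mathbb{E}(X-Y)^2\big)^{1/2} : \mathcal{L}(X) = \mu,\ \mathcal{L}(Y) = \nu\Big\}. \tag{14}$$

For random variables $X, Y$ we use synonymously $\ell_2(X, Y) = \ell_2(\mathcal{L}(X), \mathcal{L}(Y))$. Then $(\mathcal{M}_0^2, \ell_2)$ is a complete metric space, and $\ell_2(\mu_n, \mu) \to 0$ is equivalent to

$$\mu_n \to \mu \ \text{weakly} \quad\text{and}\quad \int x^2\, d\mu_n(x) \to \int x^2\, d\mu(x). \tag{15}$$

The infimum in (14) is attained. Random variables $X, Y$ with $\mathcal{L}(X) = \mu$, $\mathcal{L}(Y) = \nu$ and $\ell_2(\mu, \nu) = (\mathbb{E}(X-Y)^2)^{1/2}$ are called optimal couplings of $(\mu, \nu)$.

See Rachev (1991) and Bickel and Freedman (1981) for basic facts on the minimal $\ell_2$-metric.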
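For empirical measures on the real line the infimum in (14) is easy to compute: a standard fact is that the monotone (quantile) coupling is optimal, so for two samples of equal size one pairs the order statistics. The sketch below (our helper `ell2_empirical`, not from the paper) does exactly this; it is the kind of tool one might use to monitor $\ell_2$-distances between simulated iterates.

```python
import math
from typing import Sequence

# Illustrative sketch; the helper name is hypothetical.
def ell2_empirical(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Minimal L2 distance between two empirical measures on R with equally many atoms.

    On the real line the comonotone coupling (pair the i-th smallest values)
    attains the infimum in (14).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    a, b = sorted(xs), sorted(ys)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))
```

For instance, `ell2_empirical([0.0, 2.0], [1.0, 1.0])` pairs 0 with 1 and 2 with 1, giving distance 1.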

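Lemma 2.1. $T\colon \mathcal{M}_0^2 \to \mathcal{M}_0^2$, with $T$ given in (13), is a contraction w.r.t. $\ell_2$:

$$\ell_2(T(\mu), T(\nu)) \le \xi\, \ell_2(\mu, \nu) \quad\text{for all } \mu, \nu \in \mathcal{M}_0^2, \qquad \xi = \frac{2}{\sqrt{19 - 3\sqrt{17}}} = 0.776\ldots \tag{16}$$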

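Proof. Obviously $\operatorname{Var}(T(\mu)) < \infty$. Furthermore, $\mathbb{E}(T(\mu)) = 0$ follows from $\mathbb{E}[\mathbf{1}_{\{Y<U\}}U^{\alpha-1}V^{\alpha-1}] = 1/4$. So $T$ is a well-defined mapping $T\colon \mathcal{M}_0^2 \to \mathcal{M}_0^2$. To prove contractivity let $\mu, \nu \in \mathcal{M}_0^2$ and let $(W^{(k)}, Z^{(k)})$, $Y$, $U$, $V$ be independent, with $Y, U, V$ uniformly distributed on $[0,1]$. Let $(W^{(k)}, Z^{(k)})$ be optimal couplings of $(\mu, \nu)$; that is, $\mathcal{L}(W^{(k)}) = \mu$, $\mathcal{L}(Z^{(k)}) = \nu$ and $\ell_2^2(\mu, \nu) = \mathbb{E}(W^{(k)} - Z^{(k)})^2$ for $k = 1, \ldots, 4$. Then, using the independence properties and $\mathbb{E}W^{(k)} = \mathbb{E}Z^{(k)} = 0$,

$$\begin{aligned}
\ell_2^2(T(\mu), T(\nu)) &\le \mathbb{E}\Big(\mathbf{1}_{\{Y<U\}}\Big[(UV)^{\alpha-1}(W^{(1)}-Z^{(1)}) + (U(1-V))^{\alpha-1}(W^{(2)}-Z^{(2)})\Big] \\
&\qquad + \mathbf{1}_{\{Y\ge U\}}\Big[((1-U)V)^{\alpha-1}(W^{(3)}-Z^{(3)}) + ((1-U)(1-V))^{\alpha-1}(W^{(4)}-Z^{(4)})\Big]\Big)^2 \\
&= \mathbb{E}\Big(\mathbf{1}_{\{Y<U\}}\Big[(UV)^{2\alpha-2}(W^{(1)}-Z^{(1)})^2 + (U(1-V))^{2\alpha-2}(W^{(2)}-Z^{(2)})^2\Big] \\
&\qquad + \mathbf{1}_{\{Y\ge U\}}\Big[((1-U)V)^{2\alpha-2}(W^{(3)}-Z^{(3)})^2 + ((1-U)(1-V))^{2\alpha-2}(W^{(4)}-Z^{(4)})^2\Big]\Big) \\
&= 4\,\mathbb{E}\big[\mathbf{1}_{\{Y<U\}}(UV)^{2\alpha-2}(W^{(1)}-Z^{(1)})^2\big] \\
&= 4\,\mathbb{E}\big[\mathbf{1}_{\{Y<U\}}(UV)^{2\alpha-2}\big]\, \ell_2^2(\mu, \nu).
\end{aligned} \tag{17}$$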

Now, from

$$\mathbb{E}\big[\mathbf{1}_{\{Y<U\}}(UV)^{2\alpha-2}\big] = \frac{1}{2\alpha(2\alpha-1)} = \frac{1}{19 - 3\sqrt{17}}$$

the assertion follows. ✷
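The two expectations used in the proof of Lemma 2.1 follow from $\mathbb{E}\,U^c = 1/(c+1)$ for $c > -1$, from integrating out $Y$ first ($\mathbb{P}(Y < U \mid U) = U$), and from $\alpha(\alpha+1) = 4$, which is (4) for $d = 2$, $s = 1$. A short worked computation:

```latex
\begin{aligned}
\mathbb{E}\big[\mathbf{1}_{\{Y<U\}}U^{\alpha-1}V^{\alpha-1}\big]
  &= \mathbb{E}\big[U^{\alpha}\big]\,\mathbb{E}\big[V^{\alpha-1}\big]
   = \frac{1}{(\alpha+1)\,\alpha} = \frac{1}{4},\\
\mathbb{E}\big[\mathbf{1}_{\{Y<U\}}(UV)^{2\alpha-2}\big]
  &= \mathbb{E}\big[U^{2\alpha-1}\big]\,\mathbb{E}\big[V^{2\alpha-2}\big]
   = \frac{1}{2\alpha(2\alpha-1)},\\
2\alpha(2\alpha-1) &= (\sqrt{17}-1)(\sqrt{17}-2) = 19-3\sqrt{17},
\qquad \xi^2 = \frac{4}{19-3\sqrt{17}} \approx 0.603 .
\end{aligned}
```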

By Banach's fixed point theorem, $T$ has a unique fixed point $\rho$ in $\mathcal{M}_0^2$, and

$$\ell_2(T^n(\mu), \rho) \to 0 \quad\text{exponentially fast for any } \mu \in \mathcal{M}_0^2.$$

We call a random variable $X$ with distribution $\rho$ also a fixed point of $T$ [compare (12)].

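The representation of the limiting operator $T$ can be simplified. We have

$$T(\mu) = \mathcal{L}\Big(U^{(\alpha-1)/2}\Big[V^{\alpha-1}\big(Z^{(1)}+\gamma\big) + (1-V)^{\alpha-1}\big(Z^{(2)}+\gamma\big)\Big] - \gamma\Big) \tag{18}$$

with $U, V, Z^{(1)}, Z^{(2)}$ independent, $U, V$ uniformly distributed on $[0,1]$, and $\mathcal{L}(Z^{(1)}) = \mathcal{L}(Z^{(2)}) = \mu$. The proof follows from an elementary calculation, observing that the sets of the indicator functions in (13) are disjoint and that the factor they select ($U$ on $\{Y<U\}$, $1-U$ on $\{Y \ge U\}$) has the same distribution as $\sqrt{U}$, which has the density $2x$ for $0 \le x \le 1$. By an additional translation it follows that $X$ is a fixed point of $T$ in $\mathcal{M}_0^2$ if and only if $\tilde X = X + \gamma$ is a fixed point of

$$\tilde T(\mu) = \mathcal{L}\Big(U^{(\alpha-1)/2}\Big[V^{\alpha-1}Z^{(1)} + (1-V)^{\alpha-1}Z^{(2)}\Big]\Big) \tag{19}$$

in $\mathcal{M}_\gamma^2 = \{\mu \in \mathcal{M}^1(\mathbb{R}) : \mathbb{E}(\mu) = \gamma,\ \operatorname{Var}(\mu) < \infty\}$.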
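The key step behind (18) is a one-line computation: the factor selected by the indicators in (13) satisfies $\mathbb{P}(\text{factor} \le x) = \int_0^x u\,du + \int_{1-x}^1 (1-u)\,du = x^2$, the distribution function of $\sqrt{U}$. A small Monte Carlo check of this (our illustration; the helper names are hypothetical):

```python
import bisect
import random

# Illustrative sketch; helper names are hypothetical.
def selected_factor(rng: random.Random) -> float:
    """Factor picked by the indicators in (13): U on {Y < U}, 1 - U on {Y >= U}."""
    u, y = rng.random(), rng.random()
    return u if y < u else 1.0 - u

def max_cdf_gap(samples: int = 100_000, grid: int = 20, seed: int = 7) -> float:
    """Largest gap on a grid between the empirical CDF of selected_factor and
    x**2, the CDF of sqrt(U) (density 2x on [0, 1])."""
    rng = random.Random(seed)
    xs = sorted(selected_factor(rng) for _ in range(samples))
    gaps = []
    for k in range(1, grid):
        x = k / grid
        ecdf = bisect.bisect_right(xs, x) / samples   # fraction of samples <= x
        gaps.append(abs(ecdf - x * x))
    return max(gaps)
```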

Theorem 2.2 (Limit theorem for partial match queries in two-dimensional quadtrees). The normalized number $X_n$ of nodes traversed during a partial match query in a random two-dimensional quadtree converges w.r.t. $\ell_2$ to the unique fixed point $X$ in $\mathcal{M}_0^2$ of the limiting operator $T$, that is,

$$\ell_2(X_n, X) \to 0.$$

The translated limiting distribution $\tilde X = X + \gamma$ is the unique solution in $\mathcal{M}_\gamma^2$ of the limiting equation

$$Z \stackrel{\mathcal{D}}{=} U^{(\alpha-1)/2}\Big[V^{\alpha-1}Z^{(1)} + (1-V)^{\alpha-1}Z^{(2)}\Big] \tag{20}$$

with $U, V, Z^{(1)}, Z^{(2)}$ independent, $U, V$ uniformly distributed on $[0,1]$, and $Z^{(1)} \stackrel{\mathcal{D}}{=} Z^{(2)} \stackrel{\mathcal{D}}{=} Z$.

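Proof. We use random variables $X^{(k)}_n \stackrel{\mathcal{D}}{=} X_n$, $X^{(k)} \stackrel{\mathcal{D}}{=} X$ for $k = 1, \ldots, 4$ such that $(X^{(k)}_n, X^{(k)})$ are optimal couplings of $(X_n, X)$; that is, $\ell_2^2(X_n, X) = \mathbb{E}(X^{(k)}_n - X^{(k)})^2$. Furthermore, let $I^{(n)}$ be, conditionally given $(U, V) = w$, multinomial $M(n-1, \mathcal{V}(w))$ distributed. Then by (6) it holds that $I^{(n)}/n \to \mathcal{V}(U, V)$ in probability. Moreover, let $U$, $V$ and $Y$ be independent and uniformly distributed on $[0,1]$, and assume that $((X^{(1)}_n)_{n\in\mathbb{N}}, X^{(1)}), \ldots, ((X^{(4)}_n)_{n\in\mathbb{N}}, X^{(4)})$, $(I^{(n)}, U, V)$, $Y$ are independent.

For the estimate of $\ell_2(X_n, X)$ we use the $L_2$-distance of the special representations of $X_n$ and $X$ given by (11) and (13), respectively. Then, using the independence properties and $\mathbb{E}X^{(k)} = \mathbb{E}X^{(k)}_n = 0$, we obtain

$$\begin{aligned}
\ell_2^2(X_n, X) &\le \mathbb{E}\Big(\mathbf{1}_{\{Y<U\}}\Big[\Big(\frac{I^{(n)}_1}{n}\Big)^{\alpha-1}\big(X^{(1)}_{I^{(n)}_1}+\gamma\big) - (UV)^{\alpha-1}\big(X^{(1)}+\gamma\big) + \Big(\frac{I^{(n)}_2}{n}\Big)^{\alpha-1}\big(X^{(2)}_{I^{(n)}_2}+\gamma\big) - (U(1-V))^{\alpha-1}\big(X^{(2)}+\gamma\big)\Big] \\
&\qquad + \mathbf{1}_{\{Y\ge U\}}\Big[\Big(\frac{I^{(n)}_3}{n}\Big)^{\alpha-1}\big(X^{(3)}_{I^{(n)}_3}+\gamma\big) - ((1-U)V)^{\alpha-1}\big(X^{(3)}+\gamma\big) \\
&\qquad\qquad + \Big(\frac{I^{(n)}_4}{n}\Big)^{\alpha-1}\big(X^{(4)}_{I^{(n)}_4}+\gamma\big) - ((1-U)(1-V))^{\alpha-1}\big(X^{(4)}+\gamma\big)\Big] + o(1)\Big)^2 \\
&= \mathbb{E}\Big(\mathbf{1}_{\{Y<U\}}\Big[\Big(\Big(\frac{I^{(n)}_1}{n}\Big)^{\alpha-1}\big(X^{(1)}_{I^{(n)}_1}+\gamma\big) - (UV)^{\alpha-1}\big(X^{(1)}+\gamma\big)\Big)^2 \\
&\qquad\qquad + \Big(\Big(\frac{I^{(n)}_2}{n}\Big)^{\alpha-1}\big(X^{(2)}_{I^{(n)}_2}+\gamma\big) - (U(1-V))^{\alpha-1}\big(X^{(2)}+\gamma\big)\Big)^2\Big] \\
&\qquad + \mathbf{1}_{\{Y\ge U\}}\Big[\Big(\Big(\frac{I^{(n)}_3}{n}\Big)^{\alpha-1}\big(X^{(3)}_{I^{(n)}_3}+\gamma\big) - ((1-U)V)^{\alpha-1}\big(X^{(3)}+\gamma\big)\Big)^2 \\
&\qquad\qquad + \Big(\Big(\frac{I^{(n)}_4}{n}\Big)^{\alpha-1}\big(X^{(4)}_{I^{(n)}_4}+\gamma\big) - ((1-U)(1-V))^{\alpha-1}\big(X^{(4)}+\gamma\big)\Big)^2\Big]\Big) + o(1),
\end{aligned} \tag{21}$$

where the mixed terms are $o(1)$, using independence and $\mathbb{E}\big[(I^{(n)}_1/n)^{\alpha-1} - (UV)^{\alpha-1}\big] = o(1)$ [analogously for $I^{(n)}_2, I^{(n)}_3, I^{(n)}_4$]. The four summands occurring in (21) are identically distributed. This implies

$$\begin{aligned}
\ell_2^2(X_n, X) &\le 4\,\mathbb{E}\Big[\mathbf{1}_{\{Y<U\}}\Big(\Big(\frac{I^{(n)}_1}{n}\Big)^{\alpha-1}\big(X^{(1)}_{I^{(n)}_1}+\gamma\big) - (UV)^{\alpha-1}\big(X^{(1)}+\gamma\big)\Big)^2\Big] + o(1) \\
&= 4\,\mathbb{E}\Big[\mathbf{1}_{\{Y<U\}}\Big(\Big(\frac{I^{(n)}_1}{n}\Big)^{\alpha-1}\big(X^{(1)}_{I^{(n)}_1} - X^{(1)}\big) + \Big(\Big(\frac{I^{(n)}_1}{n}\Big)^{\alpha-1} - (UV)^{\alpha-1}\Big)\big(X^{(1)}+\gamma\big)\Big)^2\Big] + o(1) \\
&= 4\,\mathbb{E}\Big[\mathbf{1}_{\{Y<U\}}\Big(\frac{I^{(n)}_1}{n}\Big)^{2\alpha-2}\big(X^{(1)}_{I^{(n)}_1} - X^{(1)}\big)^2\Big] \\
&\qquad + 4\,\mathbb{E}\Big[\mathbf{1}_{\{Y<U\}}\Big(\Big(\frac{I^{(n)}_1}{n}\Big)^{\alpha-1} - (UV)^{\alpha-1}\Big)^2\big(X^{(1)}+\gamma\big)^2\Big] \\
&\qquad + 8\,\mathbb{E}\Big[\mathbf{1}_{\{Y<U\}}\Big(\frac{I^{(n)}_1}{n}\Big)^{\alpha-1}\big(X^{(1)}_{I^{(n)}_1} - X^{(1)}\big)\Big(\Big(\frac{I^{(n)}_1}{n}\Big)^{\alpha-1} - (UV)^{\alpha-1}\Big)\big(X^{(1)}+\gamma\big)\Big] + o(1).
\end{aligned} \tag{22}$$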

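As a consequence of (6) we obtain

$$\mathbb{E}\Big(\Big(\frac{I^{(n)}_1}{n}\Big)^{\alpha-1} - (UV)^{\alpha-1}\Big)^2 \to 0 \quad\text{as } n \to \infty. \tag{23}$$

Since $X^{(1)}$ is independent of $(I^{(n)}, U, V)$ and has a finite second moment, the second summand of (22) therefore converges to 0. With the Cauchy–Schwarz inequality and (23), the absolute value of the third term is bounded from above by

$$8\Big(\mathbb{E}\Big(\Big(\frac{I^{(n)}_1}{n}\Big)^{\alpha-1} - (UV)^{\alpha-1}\Big)^2\big(X^{(1)}+\gamma\big)^2\Big)^{1/2}\Big(\mathbb{E}\big(X^{(1)}_{I^{(n)}_1} - X^{(1)}\big)^2\Big)^{1/2} = o(1)\Big(\mathbb{E}\big(X^{(1)}_{I^{(n)}_1} - X^{(1)}\big)^2\Big)^{1/2} \le o(1)\,\mathbb{E}\big(X^{(1)}_{I^{(n)}_1} - X^{(1)}\big)^2 + o(1).$$

For the last inequality observe that if the expectation is less than 1, then both sides are $o(1)$; otherwise its square root is bounded by the expectation itself. Therefore, from (22) we derive, with $a_n = \ell_2^2(X_n, X)$,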

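$$\begin{aligned}
a_n &\le 4\,\mathbb{E}\Big[\Big(\mathbf{1}_{\{Y<U\}}\Big(\frac{I^{(n)}_1}{n}\Big)^{2\alpha-2} + o(1)\Big)\big(X^{(1)}_{I^{(n)}_1} - X^{(1)}\big)^2\Big] + o(1) \\
&= 4\sum_{i=0}^{n-1}\mathbb{E}\Big[\mathbf{1}_{\{I^{(n)}_1 = i\}}\Big(\mathbf{1}_{\{Y<U\}}\Big(\frac{i}{n}\Big)^{2\alpha-2} + o(1)\Big)\Big]\,\mathbb{E}\big(X^{(1)}_i - X^{(1)}\big)^2 + o(1) \\
&= 4\sum_{i=0}^{n-1}\mathbb{E}\Big[\mathbf{1}_{\{I^{(n)}_1 = i\}}\Big(\mathbf{1}_{\{Y<U\}}\Big(\frac{i}{n}\Big)^{2\alpha-2} + o(1)\Big)\Big]\,\ell_2^2(X_i, X) + o(1).
\end{aligned} \tag{24}$$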

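Thus, from (7),

$$\begin{aligned}
a_n &\le 4\sum_{i=0}^{n-1}\mathbb{E}\Big[\mathbf{1}_{\{I^{(n)}_1 = i\}}\Big(\mathbf{1}_{\{Y<U\}}\Big(\frac{i}{n}\Big)^{2\alpha-2} + o(1)\Big)\Big]\sup_{1\le i\le n-1}a_i + o(1) \\
&= \Big(4\,\mathbb{E}\big[\mathbf{1}_{\{Y<U\}}(UV)^{2\alpha-2}\big] + o(1)\Big)\sup_{1\le i\le n-1}a_i + o(1) \\
&= \big(\xi^2 + o(1)\big)\sup_{1\le i\le n-1}a_i + o(1),
\end{aligned} \tag{25}$$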

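where $\xi$ is defined in (16); in particular $\xi^2 < 1$. Thus $(a_n)_{n\in\mathbb{N}}$ is bounded. We denote $a = \limsup_{n\to\infty} a_n$. Now we can conclude as in Rösler (1991). For a given $\varepsilon > 0$ there exist $n_0 \in \mathbb{N}$ and $\xi^+ < 1$ with $a_n \le a + \varepsilon$ and $\xi^2 + o(1) \le \xi^+ < 1$ for all $n \ge n_0$. Then from (24) it follows that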

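$$\begin{aligned}
a_n &\le 4\sum_{i=0}^{n_0-1}\mathbb{E}\Big[\mathbf{1}_{\{I^{(n)}_1 = i\}}\Big(\mathbf{1}_{\{Y<U\}}\Big(\frac{i}{n}\Big)^{2\alpha-2} + o(1)\Big)\Big]a_i \\
&\qquad + 4\sum_{i=n_0}^{n-1}\mathbb{E}\Big[\mathbf{1}_{\{I^{(n)}_1 = i\}}\Big(\mathbf{1}_{\{Y<U\}}\Big(\frac{i}{n}\Big)^{2\alpha-2} + o(1)\Big)\Big](a+\varepsilon) + o(1) \\
&\le \xi^+(a+\varepsilon) + o(1).
\end{aligned} \tag{26}$$

Here the first sum tends to 0, since $\mathbb{P}(I^{(n)}_1 < n_0) \to 0$ by (6). Now $n \to \infty$ yields $a \le \xi^+(a+\varepsilon)$; since $\varepsilon > 0$ is arbitrary and $\xi^+ < 1$, this implies $a = 0$. ✷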

Convergence in the $\ell_2$-metric implies weak convergence and convergence of the second moments. For this reason the constant $\beta$ in (2) can be rederived from the limiting equation (20). We give this argument in detail in Section 3 for the general $d$-dimensional case with $1 \le s \le d-1$ components specified, where the first-order expansion of the variance was not known before.

We cannot solve the limiting equation (20) explicitly. From a simulation we get an estimate for the density of the translated limiting distribution in (20) (see Fig. 1).

The plot was produced by iterating the translated limiting operator $\tilde T$ ten times, starting with $\delta_\gamma$, the Dirac measure in $\gamma$. We produced 15,000 samples of $\tilde T^{10}\delta_\gamma$ and applied a standard smoothing routine of S-Plus to the histogram of the data.
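The simulation behind Fig. 1 can be sketched as follows. This is our reconstruction under stated assumptions, not the authors' S-Plus code: a draw from $\tilde T^{k}\delta_\gamma$ is generated exactly by recursion, each application of $\tilde T$ in (19) consuming two independent draws of the previous iterate, so one sample costs $2^k$ leaf evaluations. Squaring (19) also gives an exact recursion for the variances of the iterates, $v_{k+1} = \xi^2 v_k + (\xi^2 + 2B(\alpha,\alpha)/\alpha - 1)\gamma^2$ with $v_0 = 0$ and $\xi$ from (16), whose fixed point is $\beta$ from (2); `exact_variance` implements it for comparison. All function names are hypothetical.

```python
import math
import random
import statistics

# Illustrative sketch; constants from (1), function names are hypothetical.
ALPHA = (math.sqrt(17) - 1) / 2
GAMMA = math.gamma(2 * ALPHA) / (2 * math.gamma(ALPHA) ** 3)

def sample_iterate(depth: int, rng: random.Random) -> float:
    """One exact draw from tilde-T^depth(delta_gamma), with tilde-T as in (19)."""
    if depth == 0:
        return GAMMA                      # the Dirac measure at gamma
    u, v = rng.random(), rng.random()
    z1 = sample_iterate(depth - 1, rng)   # two independent draws of the
    z2 = sample_iterate(depth - 1, rng)   # previous iterate
    return u ** ((ALPHA - 1) / 2) * (v ** (ALPHA - 1) * z1 + (1 - v) ** (ALPHA - 1) * z2)

def simulate(n_samples: int, depth: int, seed: int = 2001) -> list:
    rng = random.Random(seed)
    return [sample_iterate(depth, rng) for _ in range(n_samples)]

def exact_variance(depth: int) -> float:
    """Var(tilde-T^depth(delta_gamma)) via v_{k+1} = xi^2 v_k + c, v_0 = 0."""
    b = math.gamma(ALPHA) ** 2 / math.gamma(2 * ALPHA)   # B(alpha, alpha)
    xi2 = 1 / (ALPHA * (ALPHA - 0.5))                    # xi^2 from (16)
    c = (xi2 + 2 * b / ALPHA - 1) * GAMMA ** 2
    v = 0.0
    for _ in range(depth):
        v = xi2 * v + c
    return v

if __name__ == "__main__":
    xs = simulate(2000, 8)
    print(statistics.fmean(xs), GAMMA)
    print(statistics.pvariance(xs), exact_variance(8))
```

With the paper's settings, `simulate(15_000, 10)` would produce a sample for a histogram like Fig. 1, but in pure Python this is slow; smaller depths already give iterates whose law has mean exactly $\gamma$ and a variance close to $\beta$.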

Fig. 1. Estimated density of the translated limiting distribution.

3. The multidimensional quadtree. We consider a partial match query for a $d$-dimensional random quadtree with $1 \le s \le d-1$ components specified. By symmetry we assume without loss of generality that these are the first $s$ coordinates. Then, after comparing the search pattern with an internal node of the quadtree, we have to inspect $2^{d-s}$ subtrees at this node for the subsequent search. A node $w \in [0,1]^d$ partitions the quadrant it belongs to into $2^d$ subquadrants. Let the index of a subquadrant be given by

$$\sum_{i=1}^{d} 2^{d-i}\,\mathbf{1}_{\{w_i \le p_i\}}, \qquad w = (w_i),\ p = (p_i),$$

if $p$ is a point in this subquadrant. A key $p$ is inserted in the $k$th subtree if it belongs to the $k$th subquadrant. For the binary representation of $0 \le k \le 2^d - 1$,

$$k = \sum_{i=1}^{d} a_i 2^{d-i}, \qquad a_i = a_i(k) \in \{0, 1\},$$

let

$$E(k) = \{i \in \{1, \ldots, d\} : a_i(k) = 1\}, \qquad N(k) = \{i \in \{1, \ldots, d\} : a_i(k) = 0\}.$$

Then, equivalently, $p$ is inserted in the $k$th subtree of a node $w$ if $p_i \ge w_i$ for all $i \in E(k)$ and $p_i < w_i$ for all $i \in N(k)$.
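The indexing convention for subquadrants can be written out directly. A minimal sketch (our helper names `subquadrant_index` and `bit_sets`, not from the paper) computing $k$ for a key $p$ below a node $w$, and the sets $E(k)$, $N(k)$ with 1-based coordinates:

```python
from typing import Sequence, Set, Tuple

# Illustrative sketch; helper names are hypothetical.
def subquadrant_index(w: Sequence[float], p: Sequence[float]) -> int:
    """k = sum_{i=1}^d 2^(d-i) 1{w_i <= p_i}; coordinate 1 is the most significant bit."""
    d = len(w)
    return sum(2 ** (d - i) for i in range(1, d + 1) if w[i - 1] <= p[i - 1])

def bit_sets(k: int, d: int) -> Tuple[Set[int], Set[int]]:
    """E(k) and N(k): 1-based coordinates whose binary digit a_i(k) is 1, resp. 0."""
    bits = [(k >> (d - i)) & 1 for i in range(1, d + 1)]   # a_1, ..., a_d
    e = {i for i, a in enumerate(bits, start=1) if a == 1}
    n = {i for i, a in enumerate(bits, start=1) if a == 0}
    return e, n
```

For example, with $w = (0.5, 0.5, 0.5)$ and $p = (0.7, 0.2, 0.9)$ the digits are $(1, 0, 1)$, so $k = 5$, $E(5) = \{1, 3\}$ and $N(5) = \{2\}$.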

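The volumes of the quadrants generated by the root $u \in [0,1]^d$ of the tree are given by

$$\mathcal{V}_k(u) = \prod_{i \in N(k)} u_i \prod_{i \in E(k)} (1 - u_i), \qquad 0 \le k \le 2^d - 1;$$

here $\mathcal{V}(u) = (\mathcal{V}_0(u), \ldots, \mathcal{V}_{2^d-1}(u))$ denotes the vector of the generated volumes.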
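A sketch of the volume vector $\mathcal{V}(u)$ under the same indexing (the helper `volumes` is hypothetical; it needs Python 3.8+ for `math.prod`). For $d = 2$ it reproduces $(uv, u(1-v), (1-u)v, (1-u)(1-v))$ from Section 2.

```python
import itertools
import math
from typing import List, Sequence

# Illustrative sketch; the helper name is hypothetical.
def volumes(u: Sequence[float]) -> List[float]:
    """V_k(u) = prod_{i in N(k)} u_i * prod_{i in E(k)} (1 - u_i), k = 0, ..., 2^d - 1.

    itertools.product enumerates the digits (a_1, ..., a_d) in lexicographic
    order, which is exactly the order of k = sum_i a_i 2^(d-i).
    """
    vols = []
    for bits in itertools.product((0, 1), repeat=len(u)):
        vols.append(math.prod(ui if a == 0 else 1 - ui for ui, a in zip(u, bits)))
    return vols
```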

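The vector $I^{(n)}$ of the cardinalities of the subtrees of a random $d$-dimensional quadtree with $n$ nodes is, conditionally given the root $U$, multinomially distributed:

$$\mathcal{L}\big(I^{(n)} \mid U = u\big) = M\big(n-1, \mathcal{V}(u)\big),$$

where $U$, the first key to be inserted, is uniformly distributed on $[0,1]^d$. As in the two-dimensional case, convergence in probability of $I^{(n)}/n$ follows:

$$\frac{I^{(n)}}{n} \xrightarrow{\ P\ } \mathcal{V}(U) = \big(\mathcal{V}_0(U), \ldots, \mathcal{V}_{2^d-1}(U)\big). \tag{27}$$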

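We denote the $s$ specified components of the search pattern by $Y = (Y_1, \ldots, Y_s)$. The variables $Y_i$ are independent, uniformly distributed on $[0,1]$, and independent of the random quadtree. In order to give a concise form of the recursive distributional equation for the number $C_n$ of nodes traversed during a partial match query in a random $d$-dimensional quadtree with $n$ nodes and $1 \le s \le d-1$ components specified, we define, for $j_1, \ldots, j_s \in \{0, 1\}$,

$$\mathbf{1}_{j_1\ldots j_s}(U, Y) = \prod_{\substack{1 \le i \le s\\ j_i = 0}} \mathbf{1}_{\{Y_i < U_i\}} \prod_{\substack{1 \le i \le s\\ j_i = 1}} \mathbf{1}_{\{Y_i \ge U_i\}}.$$

Then, analogously to (8), it holds that

$$C_n \stackrel{\mathcal{D}}{=} \sum_{j_1, \ldots, j_d = 0}^{1} \mathbf{1}_{j_1\ldots j_s}(U, Y)\, C^{(j)}_{I^{(n)}_j} + 1. \tag{28}$$

Here and in the following $(j_1, \ldots, j_d)$ is the binary representation of $j$, that is,

$$j = \sum_{i=1}^{d} j_i 2^{d-i}. \tag{29}$$

In (28) the variables $Y = (Y_1, \ldots, Y_s)$, $U = (U_1, \ldots, U_d)$, $(C^{(0)}_i), \ldots, (C^{(2^d-1)}_i)$ are independent, $Y$ and $U$ are uniformly distributed on $[0,1]^s$ and $[0,1]^d$, respectively, $C^{(k)}_i \stackrel{\mathcal{D}}{=} C_i$, and $I^{(n)}$ is, conditionally given $U = u$, multinomial $M(n-1, \mathcal{V}(u))$ distributed.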
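Recursion (28) mirrors the search procedure. A minimal sketch of a $d$-dimensional point quadtree with a partial match search on the first $s$ coordinates (our illustration; `QNode`, `quadrant_index`, `insert_key`, `build`, `partial_match_cost` and `mean_cost` are hypothetical names): at each node the digits $j_1, \ldots, j_s$ are forced by the comparisons, $j_i = 0$ if $Y_i < U_i$ and $j_i = 1$ otherwise, and all $2^{d-s}$ completions $j_{s+1}, \ldots, j_d$ are visited, using the numbering (29).

```python
import random
from typing import List, Optional, Sequence

# Illustrative sketch only; names are hypothetical, not from the paper.
class QNode:
    """Node of a d-dimensional point quadtree with 2**d child slots."""
    def __init__(self, key: Sequence[float]):
        self.key = tuple(key)
        self.children: List[Optional["QNode"]] = [None] * (2 ** len(self.key))

def quadrant_index(w: Sequence[float], p: Sequence[float]) -> int:
    """j = sum_i j_i 2^(d-i) with j_i = 1{w_i <= p_i}, cf. (29)."""
    d = len(w)
    return sum(1 << (d - 1 - i) for i in range(d) if w[i] <= p[i])

def insert_key(root: Optional[QNode], p: Sequence[float]) -> QNode:
    if root is None:
        return QNode(p)
    node = root
    while True:
        j = quadrant_index(node.key, p)
        if node.children[j] is None:
            node.children[j] = QNode(p)
            return root
        node = node.children[j]

def build(points) -> Optional[QNode]:
    root = None
    for p in points:
        root = insert_key(root, p)
    return root

def partial_match_cost(node: Optional[QNode], y: Sequence[float]) -> int:
    """Nodes visited by a query whose first s = len(y) coordinates are specified."""
    if node is None:
        return 0
    s, d = len(y), len(node.key)
    prefix = 0                          # digits j_1 ... j_s, forced by comparisons
    for i in range(s):
        prefix = 2 * prefix + (0 if y[i] < node.key[i] else 1)
    cost = 1
    for tail in range(2 ** (d - s)):    # free digits j_{s+1} ... j_d
        cost += partial_match_cost(node.children[(prefix << (d - s)) | tail], y)
    return cost

def mean_cost(n: int, d: int, s: int, reps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E C_n in the uniform probabilistic model."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        root = build([rng.random() for _ in range(d)] for _ in range(n))
        total += partial_match_cost(root, [rng.random() for _ in range(s)])
    return total / reps
```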

For $d$-dimensional quadtrees a first-order expansion is known only for the mean of $C_n$. In Flajolet, Gonnet, Puech and Robson (1993) the asymptotic expansion

$$\mathbb{E}\,C_n \sim \gamma_{s,d}\, n^{\alpha-1} \tag{30}$$

is proved. Here $\gamma_{s,d}$ is a positive constant which can in principle be approximated numerically, and $\alpha \in (1,2)$ is the unique solution of the indicial equation

$$\alpha^{d-s}(\alpha+1)^{s} = 2^{d}. \tag{31}$$

An asymptotic expansion for the variance of $C_n$ was unknown up to now. We will prove later that

$$\operatorname{Var} C_n \sim \beta_{s,d}\, n^{2\alpha-2},$$

where $\beta_{s,d} > 0$ has an explicit representation in terms of $\alpha$ and $\gamma_{s,d}$. The normalized number of traversed nodes,

$$X_n = \frac{C_n - \mathbb{E}\,C_n}{n^{\alpha-1}},$$

with $\alpha$ given by (31), satisfies the modified recursion

$$X_n \stackrel{\mathcal{D}}{=} \sum_{j_1, \ldots, j_d = 0}^{1} \mathbf{1}_{j_1\ldots j_s}(U, Y)\Big(\frac{I^{(n)}_j}{n}\Big)^{\alpha-1}\big(X^{(j)}_{I^{(n)}_j} + \gamma_{s,d}\big) - \gamma_{s,d} + o(1). \tag{32}$$

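Note that here and in the following we use (29) in our notation. We define

$$U_{j_1\ldots j_d} = \prod_{\substack{1 \le i \le d\\ j_i = 0}} U_i \prod_{\substack{1 \le i \le d\\ j_i = 1}} (1 - U_i)$$

for $j_1, \ldots, j_d \in \{0, 1\}$. By the convergence of the coefficients in (27) it seems reasonable that a distributional limit $X$ of $X_n$ is a solution of the limiting equation

$$X \stackrel{\mathcal{D}}{=} \sum_{j_1, \ldots, j_d = 0}^{1} \mathbf{1}_{j_1\ldots j_s}(U, Y)\, U^{\alpha-1}_{j_1\ldots j_d}\big(X^{(j)} + \gamma_{s,d}\big) - \gamma_{s,d}. \tag{33}$$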

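Therefore, we define the limiting operator $T\colon \mathcal{M}^1(\mathbb{R}) \to \mathcal{M}^1(\mathbb{R})$,

$$T(\mu) = \mathcal{L}\Big(\sum_{j_1, \ldots, j_d = 0}^{1} \mathbf{1}_{j_1\ldots j_s}(U, Y)\, U^{\alpha-1}_{j_1\ldots j_d}\big(Z^{(j)} + \gamma_{s,d}\big) - \gamma_{s,d}\Big), \tag{34}$$

where $Y, U, Z^{(0)}, \ldots, Z^{(2^d-1)}$ are independent, $Y$ and $U$ are uniformly distributed on $[0,1]^s$ and $[0,1]^d$, respectively, and $\mathcal{L}(Z^{(j)}) = \mu$ for $j = 0, \ldots, 2^d - 1$.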

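Lemma 3.1. The limiting operator $T\colon \mathcal{M}_0^2 \to \mathcal{M}_0^2$ is a contraction w.r.t. $\ell_2$:

$$\ell_2(T(\mu), T(\nu)) \le \xi\, \ell_2(\mu, \nu) \quad\text{for all } \mu, \nu \in \mathcal{M}_0^2, \tag{35}$$

$$\xi = \Big(\frac{1}{\alpha^{s}(\alpha - 1/2)^{d-s}}\Big)^{1/2} < 1. \tag{36}$$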

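Proof. Obviously $\operatorname{Var}(T(\mu)) < \infty$. Since the summands in (34) are identically distributed, we derive

$$\begin{aligned}
\mathbb{E}(T(\mu)) &= 2^d\, \mathbb{E}\Big[\prod_{i=1}^{s}\mathbf{1}_{\{Y_i<U_i\}} \prod_{i=1}^{d} U_i^{\alpha-1}\Big]\gamma_{s,d} - \gamma_{s,d} \\
&= 2^d \big(\mathbb{E}[\mathbf{1}_{\{Y_1<U_1\}}U_1^{\alpha-1}]\big)^{s}\big(\mathbb{E}\,U_1^{\alpha-1}\big)^{d-s}\gamma_{s,d} - \gamma_{s,d} \\
&= 2^d (\alpha+1)^{-s}\alpha^{-(d-s)}\gamma_{s,d} - \gamma_{s,d} \\
&= 2^d\, 2^{-d}\gamma_{s,d} - \gamma_{s,d} = 0,
\end{aligned}$$

where the indicial equation (31) is used. So $T\colon \mathcal{M}_0^2 \to \mathcal{M}_0^2$ is well defined.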
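The two constants in Lemma 3.1 are easy to evaluate numerically. The sketch below (hypothetical helper names; bisection as the root finder is our choice) checks that the factor $2^d(\alpha+1)^{-s}\alpha^{-(d-s)}$ appearing in $\mathbb{E}(T(\mu))$ equals 1 by (31), and computes $\xi$ from (36); for $d = 2$, $s = 1$ it should agree with (16).

```python
import math

# Illustrative sketch; helper names are hypothetical.
def alpha_sd(s: int, d: int) -> float:
    """Root alpha in (1, 2) of alpha^(d-s) (alpha+1)^s = 2^d, cf. (31), by bisection."""
    lo, hi = 1.0, 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if (d - s) * math.log(mid) + s * math.log(mid + 1) < d * math.log(2):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_factor(s: int, d: int) -> float:
    """2^d (alpha+1)^(-s) alpha^(-(d-s)); equal to 1 by (31), hence E T(mu) = 0."""
    a = alpha_sd(s, d)
    return 2 ** d * (a + 1) ** (-s) * a ** (-(d - s))

def contraction_constant(s: int, d: int) -> float:
    """xi = (alpha^s (alpha - 1/2)^(d-s))^(-1/2) from (36)."""
    a = alpha_sd(s, d)
    return (a ** s * (a - 0.5) ** (d - s)) ** -0.5

if __name__ == "__main__":
    for d in range(2, 6):
        for s in range(1, d):
            print(d, s, round(contraction_constant(s, d), 4))
```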

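To prove contractivity let $\mu, \nu \in \mathcal{M}_0^2$, let $(W^{(k)}, Z^{(k)})$, $Y$, $U$ be independent, $Y, U$ uniformly distributed on $[0,1]^s$ and $[0,1]^d$, respectively, and let $(W^{(k)}, Z^{(k)})$ be optimal couplings of $(\mu, \nu)$, that is, $\mathcal{L}(W^{(k)}) = \mu$, $\mathcal{L}(Z^{(k)}) = \nu$ and $\ell_2^2(\mu, \nu) = \mathbb{E}(W^{(k)} - Z^{(k)})^2$ for $k = 0, \ldots, 2^d - 1$. Then, using the independence properties and $\mathbb{E}W^{(k)} = \mathbb{E}Z^{(k)} = 0$, we conclude similarly to (17)

$$\begin{aligned}
\ell_2^2(T(\mu), T(\nu)) &\le \mathbb{E}\sum_{j_1, \ldots, j_d = 0}^{1}\mathbf{1}_{j_1\ldots j_s}(U, Y)\, U^{2\alpha-2}_{j_1\ldots j_d}\big(Z^{(j)} - W^{(j)}\big)^2 \\
&= 2^d\, \mathbb{E}\Big[\prod_{i=1}^{s}\mathbf{1}_{\{Y_i<U_i\}}U_i^{2\alpha-2}\prod_{i=s+1}^{d}U_i^{2\alpha-2}\Big]\ell_2^2(\mu, \nu) \\
&= 2^d (2\alpha)^{-s}(2\alpha-1)^{-(d-s)}\,\ell_2^2(\mu, \nu) \\
&= \frac{1}{\alpha^{s}(\alpha-1/2)^{d-s}}\,\ell_2^2(\mu, \nu).
\end{aligned}$$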

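This implies assertion (35). Since $\alpha \in (1,2)$ we have $2\alpha > \alpha + 1$ and $2\alpha - 1 > \alpha$, which together with (31) yields

$$\xi = \big(\alpha^{s}(\alpha-1/2)^{d-s}\big)^{-1/2} = \Big(\frac{2^d}{(2\alpha)^{s}(2\alpha-1)^{d-s}}\Big)^{1/2} < \Big(\frac{2^d}{(\alpha+1)^{s}\alpha^{d-s}}\Big)^{1/2} = 1. \qquad ✷$$

As in the two-dimensional case, a simplification of the limiting operator $T$ is possible [cf. (18), (19)]. We denote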

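$$U_{j_{s+1}\ldots j_d} = \prod_{\substack{s+1 \le i \le d\\ j_i = 0}} U_i \prod_{\substack{s+1 \le i \le d\\ j_i = 1}} (1 - U_i)$$

for $j_{s+1}, \ldots, j_d \in \{0, 1\}$. Then

$$T(\mu) = \mathcal{L}\Big(\prod_{i=1}^{s}U_i^{(\alpha-1)/2}\sum_{j_{s+1}, \ldots, j_d = 0}^{1}U^{\alpha-1}_{j_{s+1}\ldots j_d}\big(X^{(j_{s+1}\ldots j_d)} + \gamma_{s,d}\big) - \gamma_{s,d}\Big),$$

where $\{U, X^{(j_{s+1}\ldots j_d)} : j_{s+1}, \ldots, j_d = 0, 1\}$ is an independent family, $U$ is uniformly distributed on $[0,1]^d$, and $\mathcal{L}(X^{(j_{s+1}\ldots j_d)}) = \mu$ for all $j_{s+1}, \ldots, j_d = 0, 1$.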

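With an additional translation it follows that $X$ is a fixed point of $T$ in $\mathcal{M}_0^2$ if and only if $\tilde X = X + \gamma_{s,d}$ is a fixed point of the operator $\tilde T$ in $\mathcal{M}_{\gamma_{s,d}}^2$ given by

$$\tilde T(\mu) = \mathcal{L}\Big(\prod_{i=1}^{s}U_i^{(\alpha-1)/2}\sum_{j_{s+1}, \ldots, j_d = 0}^{1}U^{\alpha-1}_{j_{s+1}\ldots j_d}X^{(j_{s+1}\ldots j_d)}\Big). \tag{37}$$

In (37), again, $\{U, X^{(j_{s+1}\ldots j_d)} : j_{s+1}, \ldots, j_d = 0, 1\}$ is independent, $U$ is uniformly distributed on $[0,1]^d$, and $\mathcal{L}(X^{(j_{s+1}\ldots j_d)}) = \mu \in \mathcal{M}_{\gamma_{s,d}}^2$.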

Theorem 3.2 (Limit theorem for partial match queries in quadtrees). The normalized number $X_n$ of nodes traversed during a partial match query in a random $d$-dimensional quadtree with $1 \le s \le d-1$ components specified converges w.r.t. $\ell_2$ to the unique fixed point $X$ in $\mathcal{M}_0^2$ of the limiting operator $T$ given in (34), that is,

$$\ell_2(X_n, X) \to 0.$$

The translated limiting distribution $\tilde X = X + \gamma_{s,d}$ is the unique fixed point in $\mathcal{M}_{\gamma_{s,d}}^2$ of the operator

$$\tilde T(\mu) = \mathcal{L}\Big(\prod_{i=1}^{s}U_i^{(\alpha-1)/2}\sum_{j_{s+1}, \ldots, j_d = 0}^{1}U^{\alpha-1}_{j_{s+1}\ldots j_d}X^{(j_{s+1}\ldots j_d)}\Big)$$

given in (37).

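Proof. Using (27), (32) and (35), the proof of the two-dimensional case can be extended to the multidimensional case. With $a_n = \ell_2^2(X_n, X)$, analogously to (21)–(24) the recursion

$$a_n \le 2^d\sum_{i=0}^{n-1}\mathbb{E}\Big[\mathbf{1}_{\{I^{(n)}_0 = i\}}\Big(\prod_{l=1}^{s}\mathbf{1}_{\{Y_l<U_l\}}\Big(\frac{i}{n}\Big)^{2\alpha-2} + o(1)\Big)\Big]a_i + o(1) \tag{38}$$

can be derived. Then, as in (25),

$$a_n \le \Big(2^d\,\mathbb{E}\Big[\prod_{i=1}^{s}\mathbf{1}_{\{Y_i<U_i\}}U_i^{2\alpha-2}\prod_{i=s+1}^{d}U_i^{2\alpha-2}\Big] + o(1)\Big)\sup_{1\le i\le n-1}a_i + o(1) = \big(\xi^2 + o(1)\big)\sup_{1\le i\le n-1}a_i + o(1) \tag{39}$$

with $\xi$ given in (36). This implies that $(a_n)_{n\in\mathbb{N}}$ is bounded. The convergence then follows as in (26). ✷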

Convergence in the $\ell_2$-metric implies convergence of the second moments [cf. (15)]. Therefore a first-order asymptotic expansion of the variance of the number $C_n$ of traversed nodes can be derived in dimension $d$ with $1 \le s \le d-1$ components of the search pattern specified. We use the simplified form (37) of the fixed-point equation.

Corollary 3.3 (Variance of partial match queries in quadtrees). The variance of the limiting distribution for the normalized number of nodes traversed during a partial match query in a random $d$-dimensional quadtree with $1 \le s \le d-1$ components specified is given by

$$\beta_{s,d} = \left(\frac{\big((2\alpha-1)B(\alpha,\alpha) + 1\big)^{d-s} - 1}{\alpha^{s}(\alpha-1/2)^{d-s} - 1} - 1\right)\gamma_{s,d}^2.$$

The variance of the number $C_n$ of nodes traversed satisfies

$$\operatorname{Var} C_n \sim \beta_{s,d}\, n^{2\alpha-2}.$$

The constants $\gamma_{s,d}$ and $\alpha$ are given by (30) and (31), respectively, and $B(\cdot,\cdot)$ denotes the Eulerian beta integral.

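Proof. Note that

$$\operatorname{Var}(X) = \operatorname{Var}(\tilde X) = \mathbb{E}\tilde X^2 - \gamma_{s,d}^2, \tag{40}$$

where $\tilde X$ is the fixed point of the operator in (37). From (37) we obtain

$$\begin{aligned}
\mathbb{E}\tilde X^2 &= \mathbb{E}\Big[\prod_{i=1}^{s}U_i^{\alpha-1}\sum_{j_{s+1}, \ldots, j_d = 0}^{1}\ \sum_{k_{s+1}, \ldots, k_d = 0}^{1}U^{\alpha-1}_{j_{s+1}\ldots j_d}U^{\alpha-1}_{k_{s+1}\ldots k_d}\tilde X^{(j_{s+1}\ldots j_d)}\tilde X^{(k_{s+1}\ldots k_d)}\Big] \\
&= \alpha^{-s}\Big(\sum_{\forall i:\ j_i = k_i}\mathbb{E}\,U^{2\alpha-2}_{j_{s+1}\ldots j_d}\,\mathbb{E}\tilde X^2 + \sum_{\exists i:\ j_i \ne k_i}\mathbb{E}\big[U^{\alpha-1}_{j_{s+1}\ldots j_d}U^{\alpha-1}_{k_{s+1}\ldots k_d}\big]\big(\mathbb{E}\tilde X\big)^2\Big).
\end{aligned} \tag{41}$$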

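The expectations of the occurring $U$'s can be calculated explicitly: $\mathbb{E}\,U^{2\alpha-2}_{j_{s+1}\ldots j_d} = (2\alpha-1)^{-(d-s)}$ and, for $(j_{s+1}, \ldots, j_d) \ne (k_{s+1}, \ldots, k_d)$ and $h = \operatorname{card}\{s+1 \le i \le d : j_i \ne k_i\}$,

$$\mathbb{E}\big[U^{\alpha-1}_{j_{s+1}\ldots j_d}U^{\alpha-1}_{k_{s+1}\ldots k_d}\big] = \big(\mathbb{E}\,U_1^{2\alpha-2}\big)^{d-s-h}\big(\mathbb{E}\,(U_1(1-U_1))^{\alpha-1}\big)^{h} = (2\alpha-1)^{-(d-s-h)}B(\alpha,\alpha)^{h}. \tag{42}$$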

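With (42) in (41) we derive

$$\mathbb{E}\tilde X^2 = \alpha^{-s}\Big(2^{d-s}(2\alpha-1)^{-(d-s)}\,\mathbb{E}\tilde X^2 + \sum_{h=1}^{d-s}2^{d-s}\binom{d-s}{h}(2\alpha-1)^{-(d-s-h)}B(\alpha,\alpha)^{h}\,\gamma_{s,d}^2\Big).$$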

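Using the binomial formula it follows that

$$\mathbb{E}\tilde X^2\Big(1 - \alpha^{-s}\Big(\frac{2}{2\alpha-1}\Big)^{d-s}\Big) = \alpha^{-s}2^{d-s}\Big[\Big(B(\alpha,\alpha) + \frac{1}{2\alpha-1}\Big)^{d-s} - \Big(\frac{1}{2\alpha-1}\Big)^{d-s}\Big]\gamma_{s,d}^2.$$

A simplification leads to

$$\mathbb{E}\tilde X^2 = \frac{\big((2\alpha-1)B(\alpha,\alpha) + 1\big)^{d-s} - 1}{\alpha^{s}(\alpha-1/2)^{d-s} - 1}\,\gamma_{s,d}^2.$$

Together with (40) this implies the first assertion.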

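By convergence of the second moments of $X_n$ we finally conclude

$$\operatorname{Var} C_n = \operatorname{Var}\big(n^{\alpha-1}X_n\big) = \operatorname{Var}(X_n)\,n^{2\alpha-2} = \big(\operatorname{Var}(X) + o(1)\big)\,n^{2\alpha-2} \sim \beta_{s,d}\, n^{2\alpha-2}. \qquad ✷$$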
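Since $\gamma_{s,d}$ is only known numerically, a convenient quantity to tabulate is the ratio $\beta_{s,d}/\gamma_{s,d}^2$ from Corollary 3.3. A minimal sketch (hypothetical helper names, bisection root finder as before); for $d = 2$, $s = 1$ the ratio should match $\beta/\gamma^2$ computed from (1) and (2).

```python
import math

# Illustrative sketch; helper names are hypothetical.
def alpha_sd(s: int, d: int) -> float:
    """Root alpha in (1, 2) of alpha^(d-s) (alpha+1)^s = 2^d, cf. (31), by bisection."""
    lo, hi = 1.0, 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if (d - s) * math.log(mid) + s * math.log(mid + 1) < d * math.log(2):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def beta_over_gamma_sq(s: int, d: int) -> float:
    """beta_{s,d} / gamma_{s,d}^2 as given in Corollary 3.3."""
    a = alpha_sd(s, d)
    b = math.gamma(a) ** 2 / math.gamma(2 * a)        # Euler beta integral B(a, a)
    num = ((2 * a - 1) * b + 1) ** (d - s) - 1
    den = a ** s * (a - 0.5) ** (d - s) - 1            # equals 1/xi^2 - 1 > 0, cf. (36)
    return num / den - 1

if __name__ == "__main__":
    for d in range(2, 5):
        for s in range(1, d):
            print(d, s, round(beta_over_gamma_sq(s, d), 5))
```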


The weak convergence in Theorem 3.2 also leads to results on exponential moments, using the tools of Rösler (1991, 1992).

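Theorem 3.4 (Convergence of Laplace transforms). The limit $X$ of the normalized number $X_n$ of nodes traversed during a partial match query in a random $d$-dimensional quadtree with $1 \le s \le d-1$ components specified has a finite Laplace transform in some neighborhood of $0$: there is a $\lambda_0 > 0$ with

$$\mathbb{E}\exp(\lambda X) < \infty \quad\text{for all } \lambda \in (-\lambda_0, \lambda_0).$$

For

$$0 < \frac{s}{d} < \frac{\ln(4/3)}{\ln(5/3)} = 0.563\ldots \tag{43}$$

existence and convergence of the Laplace transform hold on the whole real line:

$$\mathbb{E}\exp(\lambda X_n) \to \mathbb{E}\exp(\lambda X) \quad\text{for all } \lambda \in \mathbb{R}.$$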
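Condition (43) is equivalent to $\alpha > 3/2$: the left side of (31) is increasing in $\alpha$, and at $\alpha = 3/2$ the equation $(3/2)^{d-s}(5/2)^{s} = 2^d$ reads $s\ln(5/3) = d\ln(4/3)$ after taking logarithms. The sketch below (hypothetical helper names) checks this equivalence numerically for small $d$.

```python
import math

# Illustrative sketch; helper names are hypothetical.
THRESHOLD = math.log(4 / 3) / math.log(5 / 3)     # right-hand side of (43)

def alpha_sd(s: int, d: int) -> float:
    """Root alpha in (1, 2) of alpha^(d-s) (alpha+1)^s = 2^d, cf. (31), by bisection."""
    lo, hi = 1.0, 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if (d - s) * math.log(mid) + s * math.log(mid + 1) < d * math.log(2):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def condition_43(s: int, d: int) -> bool:
    """True iff 0 < s/d < ln(4/3)/ln(5/3)."""
    return 0 < s / d < THRESHOLD

if __name__ == "__main__":
    for d in range(2, 8):
        print(d, [s for s in range(1, d) if condition_43(s, d)])
```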

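Proof. Note that the recursions for $X_n$ and $X$ given in (32) and (33) can be written in the form

$$X_n \stackrel{\mathcal{D}}{=} \sum_{j_1, \ldots, j_d = 0}^{1}\mathbf{1}_{j_1\ldots j_s}(U, Y)\Big(\frac{I^{(n)}_j}{n}\Big)^{\alpha-1}X^{(j)}_{I^{(n)}_j} + C_n\big(U, Y, I^{(n)}\big) \tag{44}$$

and

$$X \stackrel{\mathcal{D}}{=} \sum_{j_1, \ldots, j_d = 0}^{1}\mathbf{1}_{j_1\ldots j_s}(U, Y)\,U^{\alpha-1}_{j_1\ldots j_d}X^{(j)} + C(U, Y) \tag{45}$$

with

$$C_n\big(U, Y, I^{(n)}\big) = \sum_{j_1, \ldots, j_d = 0}^{1}\mathbf{1}_{j_1\ldots j_s}(U, Y)\Big(\frac{I^{(n)}_j}{n}\Big)^{\alpha-1}\gamma_{s,d} - \gamma_{s,d} + o(1) \tag{46}$$

and

$$C(U, Y) = \sum_{j_1, \ldots, j_d = 0}^{1}\mathbf{1}_{j_1\ldots j_s}(U, Y)\,U^{\alpha-1}_{j_1\ldots j_d}\gamma_{s,d} - \gamma_{s,d}.$$

The distributions and (in)dependencies are as in (32) and (33). The recursion (45) satisfies the conditions of Theorem 6 in Rösler (1992) with

$$T_j = \mathbf{1}_{j_1\ldots j_s}(U, Y)\,U^{\alpha-1}_{j_1\ldots j_d}.$$

This implies the existence of a neighborhood $(-\lambda_0, \lambda_0)$ of zero where $X$ has a finite Laplace transform.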

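For the second assertion note that

$$\mathbb{E}\,C_n\big(U, Y, I^{(n)}\big) = 0 \quad\text{for all } n \in \mathbb{N}, \tag{47}$$

since the variables $X_n$ and $X^{(j)}_{I^{(n)}_j}$ in (44) are centered. Define

$$V_n = \sum_{j_1, \ldots, j_d = 0}^{1}\mathbf{1}_{j_1\ldots j_s}(U, Y)\Big(\frac{I^{(n)}_j}{n}\Big)^{2\alpha-2} - 1.$$

We have $\sum_{j=0}^{2^d-1}I^{(n)}_j = n-1$. Condition (43) and the indicial equation (31) imply $\alpha \ge 3/2$, hence $2\alpha - 2 \ge 1$, so that $(I^{(n)}_j/n)^{2\alpha-2} \le I^{(n)}_j/n$ and the sum in $V_n$ is at most $(n-1)/n$. Thus

$$V_n < 0 \quad\text{for all } n \in \mathbb{N}. \tag{48}$$

The convergence of the coefficients in (27) implies

$$\mathbb{E}V_n \to \mathbb{E}\sum_{j_1, \ldots, j_d = 0}^{1}\mathbf{1}_{j_1\ldots j_s}(U, Y)\,U^{2\alpha-2}_{j_1\ldots j_d} - 1 = \xi^2 - 1 < 0$$

with $\xi$ given in (36). This yields

$$\sup_{n\in\mathbb{N}}\mathbb{E}V_n < 0. \tag{49}$$

From the representation (46) of $C_n(U, Y, I^{(n)})$ it is obvious that

$$\sup_{n\in\mathbb{N}}\big\|C_n\big\|_\infty < \infty. \tag{50}$$

The properties (47), (48), (49) and (50) are sufficient to obtain

$$\mathbb{E}\exp(\lambda X_n) \to \mathbb{E}\exp(\lambda X) \tag{51}$$

for all $\lambda \in \mathbb{R}$, as in Lemma 4.1 and Theorem 4.2 in Rösler (1991). ✷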

In particular, Theorem 3.4 implies exponential tails and the existence of all moments of the limiting distributions. Under condition (43), additionally, convergence of all moments follows, and a bound for large deviations of the (unscaled) cost $C_n$ can be established: for all $\lambda > 0$ there exists a $c_\lambda > 0$ such that for any sequence $(a_n)$ of positive real numbers

$$\mathbb{P}(C_n \ge a_n) \le c_\lambda \exp\Big(-\lambda\,\frac{a_n}{n^{\alpha-1}}\Big). \tag{52}$$

The existence of densities of the limiting distributions with respect to Lebesgue measure can be deduced following the scheme of Theorem 2.1 in Tan and Hadjicostas (1995) for the limiting distribution of the running time of the Quicksort algorithm. The translated limit distributions [given by the operators (37)] are supported by $[0, \infty)$. Their densities are positive almost everywhere on $[0, \infty)$ [cf. Theorem 2.4 in Tan and Hadjicostas (1995)].


Acknowledgment. The authors thank P. Flajolet for introducing them to the problem of partial match query and for his suggestion to analyze it by the contraction method.

REFERENCES

Bickel, P. J. and Freedman, D. A. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9 1196–1217.

Dobrow, R. P. and Fill, J. A. (1999). Total path length for random recursive trees. Combin. Probab. Comput. 8 317–333.

Finkel, R. and Bentley, J. (1974). Quad trees, a data structure for retrieval on composite keys. Acta Inform. 4 1–9.

Flajolet, P., Gonnet, G., Puech, C. and Robson, J. (1993). Analytic variations on quadtrees. Algorithmica 10 473–500.

Flajolet, P. and Puech, C. (1986). Partial match retrieval of multidimensional data. J. ACM 33 371–407.

Knuth, D. E. (1998). The Art of Computer Programming 3: Sorting and Searching, 2nd ed. Addison-Wesley, Reading, MA.

Mahmoud, H. (1992). Evolution of Random Search Trees. Wiley, New York.

Mahmoud, H., Modarres, R. and Smythe, R. (1995). Analysis of quickselect: an algorithm for order statistics. RAIRO Inform. Théor. Appl. 28 299–310.

Martínez, C., Panholzer, A. and Prodinger, H. (2000). Partial match queries in relaxed multidimensional search trees. Algorithmica 29 181–204.

Neininger, R. (2000). Asymptotic distributions for partial match queries in K-d trees. Random Structures Algorithms. To appear.

Neininger, R. and Rüschendorf, L. (1999). On the internal path length of d-dimensional quad trees. Random Structures Algorithms 15 25–41.

Rachev, S. T. (1991). Probability Metrics and the Stability of Stochastic Models. Wiley, New York.

Rachev, S. T. and Rüschendorf, L. (1995). Probability metrics and recursive algorithms. Adv. in Appl. Probab. 27 770–799.

Rösler, U. (1991). A limit theorem for "Quicksort." RAIRO Inform. Théor. Appl. 25 85–100.

Rösler, U. (1992). A fixed point theorem for distributions. Stochastic Process. Appl. 42 195–214.

Rösler, U. (2000). On the analysis of stochastic divide and conquer algorithms. Algorithmica 29 238–261.

Rösler, U. and Rüschendorf, L. (2000). The contraction method for recursive algorithms. Algorithmica 29 3–33.

Samet, H. (1990a). Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS. Addison-Wesley, Reading, MA.

Samet, H. (1990b). The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, MA.

Schachinger, W. (2000). Limiting distributions for the costs of partial match retrievals in multidimensional tries. Random Structures Algorithms 17 428–460.

Tan, K. H. and Hadjicostas, P. (1995). Some properties of a limiting distribution in Quicksort. Statist. Probab. Lett. 25 87–94.

Institut für Mathematische Stochastik, Universität Freiburg, Eckerstr. 1, 79104 Freiburg, Germany

E-mail: rn@stochastik.uni-freiburg.de, ruschen@stochastik.uni-freiburg.de