
2001, Vol. 11, No. 2, 452–469

LIMIT LAWS FOR PARTIAL MATCH QUERIES IN QUADTREES

By Ralph Neininger and Ludger Rüschendorf

Universität Freiburg

It is proved that in an idealized uniform probabilistic model the cost of a partial match query in a multidimensional quadtree after normalization converges in distribution. The limiting distribution is given as a fixed point of a random affine operator. Also a first-order asymptotic expansion for the variance of the cost is derived and results on exponential moments are given. The analysis is based on the contraction method.

1. Introduction. A partial match query is one of several types of queries in a file which maintains the organization of multidimensional data. Databases for multidimensional data are of special interest for applications in geographical information systems, computer graphics and computational geometry.

Structures maintaining multiattribute keys should support the usual dictionary operations as well as some associative queries. Examples of such associative queries are nearest neighbor queries, partial match queries and convex or orthogonal range queries. Relevant data structures which support associative queries are considered in the books of Knuth (1998) and Samet (1990a, b). These structures can be divided into comparison based algorithms and methods based on digital techniques. The digital techniques use binary representations of the keys. Examples are tries and digital search trees. Examples of comparison based structures are quadtrees and multidimensional binary search trees (K-d trees). These algorithms work with comparisons of whole keys instead of binary representations. For an analysis of the performance of basic parameters for these structures see Mahmoud (1992).

In this paper we give an asymptotic probabilistic analysis of the cost of partial match queries in quadtrees. We assume the data to belong to some $d$-dimensional domain $D = D_1 \times \cdots \times D_d$, which using binary encodings we can assimilate into the unit cube $[0,1]^d$. For a partial match query a query $q = (q_1, \ldots, q_d)$ is given where $q_i \in [0,1] \cup \{\ast\}$ for $1 \le i \le d$. Here $\ast$ denotes that this component is left unspecified. Then all data in the file which match the query $q$ have to be retrieved. This means reporting all the keys which are identical to $q$ in all the components where $q$ is specified, that is, the components with $q_i \neq \ast$. For the probabilistic analysis of partial match retrieval we assume the uniform probabilistic model following Flajolet and Puech (1986). The uniform probabilistic model assumes all components in the data and the specified components in the query to be independent and uniformly distributed on $[0,1]$. For comparison based algorithms this is equivalent to the more general model where the components are assumed to be drawn independently from any continuous distribution. However, we assume throughout this work the idealization that queries in subtrees are independent.

Received September 1999; revised June 2000.

AMS 2000 subject classifications. Primary 68Q25, 60F05; secondary 68P10.

Key words and phrases. Quadtree, partial match query, contraction method, multidimensional data structure, analysis of algorithms.

The quadtree structure is due to Finkel and Bentley (1974). It extends the classical idea of binary search trees to multidimensional data. For the construction of the quadtree we refer to Mahmoud (1992). Essentially a data point partitions the search space by the hyperplanes perpendicular to the axes. Used recursively this principle leads to a decomposition of the search space into quadrants. The quadtree corresponds to this partitioning. For a partial match query in a quadtree we have to start at the root of the tree.

According to the comparisons of the specified components of the query with the corresponding components of the root, some of the subtrees of the root have to be considered recursively for the further search. The cost of a partial match query in a quadtree is measured by the number of nodes traversed during this search. We denote this cost in a quadtree containing $n$ nodes by $C_n$.
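The search procedure just described can be illustrated by a small simulation. The following is a minimal sketch in Python for dimension 2 with the first coordinate specified; the names `Node`, `insert`, `partial_match_cost` and `build_random_quadtree` are hypothetical helpers introduced here for illustration and do not come from the paper. The sketch inserts keys one by one and counts the nodes visited by a query.

```python
import random


class Node:
    """One key of a two-dimensional quadtree (illustrative helper)."""
    __slots__ = ("x", "y", "children")

    def __init__(self, x, y):
        self.x, self.y = x, y
        # Slots 0, 1 hold keys with first coordinate < x (west),
        # slots 2, 3 hold keys with first coordinate >= x (east).
        self.children = [None, None, None, None]


def quadrant(node, x, y):
    """Index of the quadrant of `node` that contains the point (x, y)."""
    return 2 * int(x >= node.x) + int(y >= node.y)


def insert(root, x, y):
    """Insert the key (x, y) and return the (possibly new) root."""
    if root is None:
        return Node(x, y)
    node = root
    while True:
        k = quadrant(node, x, y)
        if node.children[k] is None:
            node.children[k] = Node(x, y)
            return root
        node = node.children[k]


def partial_match_cost(root, qx):
    """Number of nodes traversed when only the first coordinate qx is specified."""
    cost = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        cost += 1
        side = 2 * int(qx >= node.x)  # 0: west pair of subtrees, 2: east pair
        stack.append(node.children[side])
        stack.append(node.children[side + 1])
    return cost


def build_random_quadtree(n, rng):
    """Quadtree built from n independent uniform keys in the unit square."""
    root = None
    for _ in range(n):
        root = insert(root, rng.random(), rng.random())
    return root
```

Averaging `partial_match_cost(build_random_quadtree(n, rng), rng.random())` over many independent trees gives an empirical estimate of the mean cost for a given $n$. Note that the sketch simulates the actual tree, so it does not rely on the idealization that queries in subtrees are independent.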

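The cost has already been studied in the uniform probabilistic model. In dimension $d = 2$ and with $s = 1$ component specified, the first-order asymptotic expansions for the mean and variance are known. Flajolet, Gonnet, Puech and Robson (1993) derived

$$\mathbb{E}\,C_n \sim \gamma n^{\alpha-1} \quad\text{with}\quad \alpha = \frac{\sqrt{17}-1}{2} \quad\text{and}\quad \gamma = \frac{\Gamma(2\alpha)}{2\Gamma^3(\alpha)}. \tag{1}$$

Martínez, Panholzer and Prodinger (2000) recently found $\operatorname{Var}(C_n) \sim \beta n^{2\alpha-2}$ with

$$\beta = \frac{(2\alpha-1)\Gamma(2\alpha)}{3\alpha(\alpha-1)\Gamma^4(\alpha)} - \frac{\Gamma^2(2\alpha)}{4\Gamma^6(\alpha)}. \tag{2}$$

In arbitrary dimension $d$ with $1 \le s \le d-1$ components specified Flajolet, Gonnet, Puech and Robson (1993) derived

$$\mathbb{E}\,C_n \sim \gamma_{s,d}\, n^{\alpha-1}, \tag{3}$$

where $\gamma_{s,d}$ is an (unknown) positive constant which can be approximated numerically and $\alpha \in (1,2)$ is the unique solution of the indicial equation,

$$\alpha^{d-s}(\alpha+1)^s = 2^d. \tag{4}$$

An expansion for the variance of $C_n$ was not known up to now.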
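The exponent $\alpha$ in (4) is easy to evaluate numerically. The following is a minimal sketch, assuming plain bisection on the logarithm of (4); the function names `indicial_alpha` and `gamma_2d` are hypothetical and chosen here for illustration. For $d = 2$, $s = 1$ the root agrees with the closed form in (1).

```python
import math


def indicial_alpha(s, d, tol=1e-14):
    """Root in (1, 2) of alpha**(d-s) * (alpha+1)**s == 2**d, found by bisection."""
    if not 1 <= s <= d - 1:
        raise ValueError("need 1 <= s <= d - 1")

    def f(a):
        # Logarithm of the left-hand side minus logarithm of the right-hand side.
        return (d - s) * math.log(a) + s * math.log(a + 1) - d * math.log(2)

    lo, hi = 1.0, 2.0  # f(1) < 0 < f(2) and f is increasing on (1, 2)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gamma_2d():
    """Leading constant of (1) for d = 2, s = 1."""
    a = (math.sqrt(17) - 1) / 2
    return math.gamma(2 * a) / (2 * math.gamma(a) ** 3)
```

For example, `indicial_alpha(1, 2)` reproduces $(\sqrt{17}-1)/2$ up to the bisection tolerance. The constants $\gamma_{s,d}$ for general $s$, $d$ are not computed here, since the text gives no closed form for them.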

In this paper we give limit laws for $C_n$ in any dimension and derive the first-order asymptotic expansion of the variance of $C_n$, and results on exponential moments. The normalized cost

$$X_n = \frac{C_n - \mathbb{E}\,C_n}{n^{\alpha-1}} \tag{5}$$

converges weakly to a random variable which is characterized as the fixed point of a random affine operator. For the proof we use the contraction method.

This method was introduced by Rösler (1991) for the analysis of Quicksort. The contraction method has been further developed independently in Rösler (1992) and Rachev and Rüschendorf (1995). For a recent survey of this method see also Rösler and Rüschendorf (2000).

Limit laws for the cost of partial match queries in the uniform probabilistic model for the K-d tries and some variants of K-d trees were recently derived in Schachinger (2000) and Neininger (2000), respectively. In all these data structures the mean and standard deviation for the cost of a partial match query are known to be of the same order of magnitude. Therefore we do not need the second-order term in the expansion of the mean of $C_n$ in order to derive a limiting operator. The second-order term for the mean has turned out to be crucial for the problem of the internal path length of related random trees [see Dobrow and Fill (1999), Rösler (2000) and Neininger and Rüschendorf (1999)]. From this point of view the problem of partial match query bears some similarity with the running time of the FIND-algorithm in the model of Mahmoud, Modarres and Smythe (1995). Nevertheless, for the FIND problem it is easier to derive information on the limit distribution from the fixed-point equation owing to the purely one-sided character of the FIND-algorithm.

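2. Standard quadtrees in dimension $d = 2$. We denote by $W = (U, V)$ the first key to be inserted, which is stored in the root of the random two-dimensional quadtree. The variables $U$ and $V$ are independent and uniformly distributed on $[0,1]$. The root $W = w = (u, v)$ partitions the unit square into four quadrants with volumes given by $\langle w \rangle = (uv,\ u(1-v),\ (1-u)v,\ (1-u)(1-v))$. We denote by $I^{(n)}$ the vector of the cardinalities of the subtrees of the root of a random quadtree with $n$ nodes. Then conditionally given $W = w$ the vector $I^{(n)}$ is multinomial $M(n-1, \langle w \rangle)$ distributed:

$$P^{I^{(n)} \mid W = w} = M(n-1, \langle w \rangle).$$

The weak law of large numbers for $I^{(n)}$ then implies

$$\frac{I^{(n)}}{n} \stackrel{P}{\longrightarrow} \langle W \rangle = \bigl(UV,\ U(1-V),\ (1-U)V,\ (1-U)(1-V)\bigr) \tag{6}$$

with $(U, V)$ uniformly distributed on $[0,1]^2$.

This implies $L_1$-convergence of bounded continuous functionals of $I^{(n)}/n$, in particular,

$$
\begin{aligned}
\mathbb{E}\Bigl(\frac{I_k^{(n)}}{n}\Bigr)^{2\alpha-2} &\longrightarrow \mathbb{E}(UV)^{2\alpha-2} = \frac{1}{(2\alpha-1)^2}, \\
\mathbb{E}\Bigl[\mathbf{1}_{\{Y<U\}}\Bigl(\frac{I_1^{(n)}}{n}\Bigr)^{2\alpha-2}\Bigr] &\longrightarrow \mathbb{E}\bigl[\mathbf{1}_{\{Y<U\}}(UV)^{2\alpha-2}\bigr] = \frac{1}{2\alpha(2\alpha-1)}
\end{aligned}
\tag{7}
$$

if $Y$ is uniformly distributed on $[0,1]$ and independent of $I^{(n)}$ and $U$.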


For a partial match query in dimension 2 one component of the search pattern is specified. W.l.o.g. we can assume the first component is specified, so the pattern is of the form $(S, \ast)$. For the distributional analysis of partial match query let $C_n$ denote the number of nodes traversed in the quadtree during a partial match retrieval. We assume that the first component of the search pattern $Y$ is uniformly distributed on $[0,1]$ and independent of the random quadtree according to the uniform probabilistic model. Then $C_1 = 1$; we define $C_0 = 0$. Conditionally given $I^{(n)}$ the subtrees are mutually independent and distributed as quadtrees. For this reason the number of traversed nodes $C_n$ satisfies the distributional recursive equation

$$C_n \stackrel{\mathcal{D}}{=} \mathbf{1}_{\{Y<U\}}\Bigl(C^{(1)}_{I_1^{(n)}} + C^{(2)}_{I_2^{(n)}}\Bigr) + \mathbf{1}_{\{Y\ge U\}}\Bigl(C^{(3)}_{I_3^{(n)}} + C^{(4)}_{I_4^{(n)}}\Bigr) + 1. \tag{8}$$

Here $(Y, U, V)$ and the sequences $(C^{(1)}_i), \ldots, (C^{(4)}_i)$ are independent, $Y$, $U$, $V$ are uniformly distributed on $[0,1]$, $C^{(k)}_i \stackrel{\mathcal{D}}{=} C_i$ for $k = 1, \ldots, 4$, $i \in \mathbb{N}_0$, and $I^{(n)}$ is multinomial $M(n-1, \langle w \rangle)$ distributed given $(U, V) = w$. Related independence properties are used throughout the paper in recursive equations of a similar form without stating them explicitly in each case.

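Recall the first-order expansions for the mean and variance of $C_n$. Flajolet, Gonnet, Puech and Robson (1993) derived

$$\mathbb{E}\,C_n \sim \gamma n^{\alpha-1} \tag{9}$$

with $\alpha$ and $\gamma$ given in (1).

In Martínez, Panholzer and Prodinger (2000) it is shown that

$$\operatorname{Var}(C_n) \sim \beta n^{2\alpha-2} \tag{10}$$

holds with $\beta$ as in (2).

Therefore a normalized version $X_n$ of $C_n$ is given by

$$X_n = \frac{C_n - \mathbb{E}\,C_n}{n^{\alpha-1}}.$$

The modified recursion for $X_n$ is given by

$$
\begin{aligned}
X_n \stackrel{\mathcal{D}}{=}{}& \mathbf{1}_{\{Y<U\}} \sum_{k=1}^{2} \Bigl(\frac{I_k^{(n)}}{n}\Bigr)^{\alpha-1}\Bigl(X^{(k)}_{I_k^{(n)}} + \gamma + o(1)\Bigr) \\
&+ \mathbf{1}_{\{Y\ge U\}} \sum_{k=3}^{4} \Bigl(\frac{I_k^{(n)}}{n}\Bigr)^{\alpha-1}\Bigl(X^{(k)}_{I_k^{(n)}} + \gamma + o(1)\Bigr) - \gamma + o(1).
\end{aligned}
\tag{11}
$$

This recursion and (6) suggest that a limit $X$ of $X_n$ is a solution of the limiting equation,

$$
\begin{aligned}
X \stackrel{\mathcal{D}}{=}{}& \mathbf{1}_{\{Y<U\}}\Bigl((UV)^{\alpha-1}\bigl(X^{(1)}+\gamma\bigr) + (U(1-V))^{\alpha-1}\bigl(X^{(2)}+\gamma\bigr)\Bigr) \\
&+ \mathbf{1}_{\{Y\ge U\}}\Bigl(((1-U)V)^{\alpha-1}\bigl(X^{(3)}+\gamma\bigr) + ((1-U)(1-V))^{\alpha-1}\bigl(X^{(4)}+\gamma\bigr)\Bigr) - \gamma.
\end{aligned}
\tag{12}
$$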


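Here $Y$, $U$, $V$, $X^{(1)}, \ldots, X^{(4)}$ are independent, $Y$, $U$, $V$ are uniformly distributed on $[0,1]$, and $X^{(k)} \stackrel{\mathcal{D}}{=} X$ for $k = 1, \ldots, 4$.

We define

$$M_{0,2} = \bigl\{\mu \in M^1(\mathbb{R}^1, \mathcal{B}^1) : \mathbb{E}\mu = 0,\ \operatorname{Var}\mu < \infty\bigr\},$$

where $\mathbb{E}\mu$ and $\operatorname{Var}\mu$ are defined, respectively, as the expectation and variance of a corresponding random variable and $M^1(\mathbb{R}^1, \mathcal{B}^1)$ denotes the space of probability measures on the real line. We define the random affine operator corresponding to (12) by

$$
\begin{aligned}
T : M^1(\mathbb{R}^1, \mathcal{B}^1) &\to M^1(\mathbb{R}^1, \mathcal{B}^1), \\
T(\mu) \stackrel{\mathcal{D}}{=}{}& \mathbf{1}_{\{Y<U\}}\Bigl((UV)^{\alpha-1}\bigl(Z^{(1)}+\gamma\bigr) + (U(1-V))^{\alpha-1}\bigl(Z^{(2)}+\gamma\bigr)\Bigr) \\
&+ \mathbf{1}_{\{Y\ge U\}}\Bigl(((1-U)V)^{\alpha-1}\bigl(Z^{(3)}+\gamma\bigr) + ((1-U)(1-V))^{\alpha-1}\bigl(Z^{(4)}+\gamma\bigr)\Bigr) - \gamma,
\end{aligned}
\tag{13}
$$

where $Y$, $U$, $V$, $Z^{(1)}, \ldots, Z^{(4)}$ are independent, $Y$, $U$, $V$ are uniformly distributed on $[0,1]$ and $Z^{(k)} \sim \mu$ for $k = 1, \ldots, 4$.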

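Our aim is to show that $T$ is the limiting operator of the recursive sequence $(X_n)$ in (11). We supply $M_{0,2} \subset M^1(\mathbb{R}^1, \mathcal{B}^1)$ with the minimal $\ell_2$-metric,

$$\ell_2(\mu, \nu) = \inf\bigl\{(\mathbb{E}(X-Y)^2)^{1/2} : X \sim \mu,\ Y \sim \nu\bigr\}. \tag{14}$$

For random variables $X$, $Y$ we use synonymously $\ell_2(X, Y) = \ell_2(P^X, P^Y)$. Then $(M_{0,2}, \ell_2)$ is a complete metric space and $\ell_2(\mu_n, \mu) \to 0$ is equivalent to

$$\mu_n \stackrel{w}{\longrightarrow} \mu \quad\text{and}\quad \int x^2\, d\mu_n(x) \to \int x^2\, d\mu(x). \tag{15}$$

The infimum in (14) is attained. Random variables $X$, $Y$ with $X \sim \mu$, $Y \sim \nu$ and $\ell_2(\mu, \nu) = (\mathbb{E}(X-Y)^2)^{1/2}$ are called optimal couplings of $(\mu, \nu)$. See Rachev (1991) and Bickel and Freedman (1981) for basic facts on the minimal $\ell_2$-metric.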

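Lemma 2.1. $T : M_{0,2} \to M_{0,2}$, with $T$ given in (13), is a contraction w.r.t. $\ell_2$,

$$\ell_2(T(\mu), T(\nu)) \le \xi\, \ell_2(\mu, \nu) \quad\text{for all } \mu, \nu \in M_{0,2}, \qquad \xi = \frac{2}{\sqrt{19 - 3\sqrt{17}}} = 0.776\ldots \tag{16}$$

Proof. Obviously $\operatorname{Var}(T(\mu)) < \infty$. Furthermore, $\mathbb{E}\,T(\mu) = 0$ follows from $\mathbb{E}[\mathbf{1}_{\{Y<U\}} U^{\alpha-1} V^{\alpha-1}] = 1/4$. So $T$ is a well-defined mapping $T : M_{0,2} \to M_{0,2}$. To prove contractivity let $\mu, \nu \in M_{0,2}$ and let $(W^{(k)}, Z^{(k)})$, $Y$, $U$, $V$ be independent, and $Y$, $U$, $V$ uniformly distributed on $[0,1]$. Let $(W^{(k)}, Z^{(k)})$ be optimal couplings of $(\mu, \nu)$; that is, $W^{(k)} \sim \mu$, $Z^{(k)} \sim \nu$ and $\ell_2^2(\mu, \nu) = \mathbb{E}(W^{(k)} - Z^{(k)})^2$ for $k = 1, \ldots, 4$. Then using the independence properties and $\mathbb{E}\,W^{(k)} = \mathbb{E}\,Z^{(k)} = 0$,

$$
\begin{aligned}
\ell_2^2(T(\mu), T(\nu))
\le{}& \mathbb{E}\Bigl[\mathbf{1}_{\{Y<U\}}\Bigl((UV)^{\alpha-1}\bigl(W^{(1)}-Z^{(1)}\bigr) + (U(1-V))^{\alpha-1}\bigl(W^{(2)}-Z^{(2)}\bigr)\Bigr) \\
&\quad + \mathbf{1}_{\{Y\ge U\}}\Bigl(((1-U)V)^{\alpha-1}\bigl(W^{(3)}-Z^{(3)}\bigr) + ((1-U)(1-V))^{\alpha-1}\bigl(W^{(4)}-Z^{(4)}\bigr)\Bigr)\Bigr]^2 \\
={}& \mathbb{E}\Bigl[\mathbf{1}_{\{Y<U\}}\Bigl((UV)^{2\alpha-2}\bigl(W^{(1)}-Z^{(1)}\bigr)^2 + (U(1-V))^{2\alpha-2}\bigl(W^{(2)}-Z^{(2)}\bigr)^2\Bigr) \\
&\quad + \mathbf{1}_{\{Y\ge U\}}\Bigl(((1-U)V)^{2\alpha-2}\bigl(W^{(3)}-Z^{(3)}\bigr)^2 + ((1-U)(1-V))^{2\alpha-2}\bigl(W^{(4)}-Z^{(4)}\bigr)^2\Bigr)\Bigr] \\
={}& 4\,\mathbb{E}\Bigl[\mathbf{1}_{\{Y<U\}}(UV)^{2\alpha-2}\bigl(W^{(1)}-Z^{(1)}\bigr)^2\Bigr] \\
={}& 4\,\mathbb{E}\bigl[\mathbf{1}_{\{Y<U\}}(UV)^{2\alpha-2}\bigr]\, \ell_2^2(\mu, \nu).
\end{aligned}
\tag{17}
$$

Now, from

$$\mathbb{E}\bigl[\mathbf{1}_{\{Y<U\}}(UV)^{2\alpha-2}\bigr] = \frac{1}{2\alpha(2\alpha-1)} = \frac{1}{19 - 3\sqrt{17}}$$

the assertion follows.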
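The contraction constant in (16) can be checked numerically. The sketch below is illustrative only: it compares the closed form of $\xi$ with the expression $4/(2\alpha(2\alpha-1))$ for $\xi^2$ and with a Monte Carlo estimate of $4\,\mathbb{E}[\mathbf{1}_{\{Y<U\}}(UV)^{2\alpha-2}]$. The function name `mc_xi_squared` and the sample size are assumptions made for this example.

```python
import math
import random

alpha = (math.sqrt(17) - 1) / 2

# Closed form from (16) and the same constant expressed through alpha.
xi_closed = 2 / math.sqrt(19 - 3 * math.sqrt(17))
xi_from_alpha = math.sqrt(4 / (2 * alpha * (2 * alpha - 1)))


def mc_xi_squared(samples, rng):
    """Monte Carlo estimate of 4 * E[1{Y < U} (U V)^(2 alpha - 2)]."""
    total = 0.0
    for _ in range(samples):
        y, u, v = rng.random(), rng.random(), rng.random()
        if y < u:
            total += (u * v) ** (2 * alpha - 2)
    return 4 * total / samples
```

Calling `mc_xi_squared(200_000, random.Random(1))` should return a value close to `xi_closed ** 2`, up to Monte Carlo error.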

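By Banach's fixed point theorem, $T$ has a unique fixed point $\rho$ in $M_{0,2}$ and

$$\ell_2(T^n(\mu), \rho) \to 0$$

exponentially fast for any $\mu \in M_{0,2}$.

We call a random variable $X$ with distribution $\rho$ also a fixed point of $T$ [compare (12)].

The representation of the limiting operator $T$ can be simplified. We have

$$T(\mu) \stackrel{\mathcal{D}}{=} U^{(\alpha-1)/2}\Bigl(V^{\alpha-1}\bigl(Z^{(1)}+\gamma\bigr) + (1-V)^{\alpha-1}\bigl(Z^{(2)}+\gamma\bigr)\Bigr) - \gamma \tag{18}$$

with $U$, $V$, $Z^{(1)}$, $Z^{(2)}$ being independent, $U$, $V$ uniformly distributed on $[0,1]$, and $Z^{(1)}, Z^{(2)} \sim \mu$. The proof follows from an elementary calculation observing that the sets of the indicator functions in (13) are disjoint and $\sqrt{U}$ has the density $2x$ for $0 \le x \le 1$. By an additional translation it follows that $X$ is a fixed point of $T$ in $M_{0,2}$ if and only if $\tilde{X} = X + \gamma$ is a fixed point of

$$\tilde{T}(\mu) \stackrel{\mathcal{D}}{=} U^{(\alpha-1)/2}\Bigl(V^{\alpha-1} Z^{(1)} + (1-V)^{\alpha-1} Z^{(2)}\Bigr) \tag{19}$$

in $M_{\gamma,2} = \{\mu \in M^1(\mathbb{R}^1, \mathcal{B}^1) : \mathbb{E}\mu = \gamma,\ \operatorname{Var}\mu < \infty\}$.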

Theorem 2.2 (Limit theorem for partial match query in two-dimensional quadtrees). The normalized number of nodes traversed during a partial match query in a random two-dimensional quadtree $X_n$ converges w.r.t. $\ell_2$ to the unique fixed point $X$ in $M_{0,2}$ of the limiting operator $T$, that is,

$$\ell_2(X_n, X) \to 0.$$

The translated limiting distribution $\tilde{X} = X + \gamma$ is the unique solution in $M_{\gamma,2}$ of the limiting equation

$$Z \stackrel{\mathcal{D}}{=} U^{(\alpha-1)/2}\Bigl(V^{\alpha-1} Z^{(1)} + (1-V)^{\alpha-1} Z^{(2)}\Bigr) \tag{20}$$

with $U$, $V$, $Z^{(1)}$, $Z^{(2)}$ independent, $U$, $V$ uniformly distributed on $[0,1]$, and $Z^{(1)}, Z^{(2)} \stackrel{\mathcal{D}}{=} Z$.

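Proof. We use random variables $X_n^{(k)} \stackrel{\mathcal{D}}{=} X_n$, $X^{(k)} \stackrel{\mathcal{D}}{=} X$ for $k = 1, \ldots, 4$ such that $(X_n^{(k)}, X^{(k)})$ are optimal couplings of $X_n$, $X$; that is, $\ell_2^2(X_n, X) = \mathbb{E}(X_n^{(k)} - X^{(k)})^2$. Furthermore, let $I^{(n)}$ be, conditionally given $(U, V) = w$, multinomial $M(n-1, \langle w \rangle)$ distributed. Then by (6) it holds that $I^{(n)}/n \to \langle (U, V) \rangle$ in probability. Furthermore, let $U$, $V$ and $Y$ be independent and uniformly distributed on $[0,1]$, and assume that $((X_n^{(1)})_{n\in\mathbb{N}}, X^{(1)}), \ldots, ((X_n^{(4)})_{n\in\mathbb{N}}, X^{(4)})$, $(I^{(n)}, U, V)$, $Y$ are independent.

For the estimate of $\ell_2(X_n, X)$ we use the $L_2$-distance of the special representation of $X_n$ and $X$ given by (11) and (13), respectively. Then using the independence properties and $\mathbb{E}\,X^{(k)} = \mathbb{E}\,X_n^{(k)} = 0$ we obtain

$$
\begin{aligned}
\ell_2^2(X_n, X)
\le{}& \mathbb{E}\Bigl[\mathbf{1}_{\{Y<U\}}\Bigl\{\Bigl(\Bigl(\tfrac{I_1^{(n)}}{n}\Bigr)^{\alpha-1}\bigl(X^{(1)}_{I_1^{(n)}}+\gamma\bigr) - (UV)^{\alpha-1}\bigl(X^{(1)}+\gamma\bigr)\Bigr) \\
&\qquad + \Bigl(\Bigl(\tfrac{I_2^{(n)}}{n}\Bigr)^{\alpha-1}\bigl(X^{(2)}_{I_2^{(n)}}+\gamma\bigr) - (U(1-V))^{\alpha-1}\bigl(X^{(2)}+\gamma\bigr)\Bigr)\Bigr\} \\
&\quad + \mathbf{1}_{\{Y\ge U\}}\Bigl\{\Bigl(\Bigl(\tfrac{I_3^{(n)}}{n}\Bigr)^{\alpha-1}\bigl(X^{(3)}_{I_3^{(n)}}+\gamma\bigr) - ((1-U)V)^{\alpha-1}\bigl(X^{(3)}+\gamma\bigr)\Bigr) \\
&\qquad + \Bigl(\Bigl(\tfrac{I_4^{(n)}}{n}\Bigr)^{\alpha-1}\bigl(X^{(4)}_{I_4^{(n)}}+\gamma\bigr) - ((1-U)(1-V))^{\alpha-1}\bigl(X^{(4)}+\gamma\bigr)\Bigr)\Bigr\} + o(1)\Bigr]^2 \\
={}& \mathbb{E}\Bigl[\mathbf{1}_{\{Y<U\}}\Bigl\{\Bigl(\Bigl(\tfrac{I_1^{(n)}}{n}\Bigr)^{\alpha-1}\bigl(X^{(1)}_{I_1^{(n)}}+\gamma\bigr) - (UV)^{\alpha-1}\bigl(X^{(1)}+\gamma\bigr)\Bigr)^2 \\
&\qquad + \Bigl(\Bigl(\tfrac{I_2^{(n)}}{n}\Bigr)^{\alpha-1}\bigl(X^{(2)}_{I_2^{(n)}}+\gamma\bigr) - (U(1-V))^{\alpha-1}\bigl(X^{(2)}+\gamma\bigr)\Bigr)^2\Bigr\} \\
&\quad + \mathbf{1}_{\{Y\ge U\}}\Bigl\{\Bigl(\Bigl(\tfrac{I_3^{(n)}}{n}\Bigr)^{\alpha-1}\bigl(X^{(3)}_{I_3^{(n)}}+\gamma\bigr) - ((1-U)V)^{\alpha-1}\bigl(X^{(3)}+\gamma\bigr)\Bigr)^2 \\
&\qquad + \Bigl(\Bigl(\tfrac{I_4^{(n)}}{n}\Bigr)^{\alpha-1}\bigl(X^{(4)}_{I_4^{(n)}}+\gamma\bigr) - ((1-U)(1-V))^{\alpha-1}\bigl(X^{(4)}+\gamma\bigr)\Bigr)^2\Bigr\}\Bigr] + o(1),
\end{aligned}
\tag{21}
$$

where the mixed terms are $o(1)$ using independence and $\mathbb{E}[(I_1^{(n)}/n)^{\alpha-1} - (UV)^{\alpha-1}] = o(1)$ [analogously for $I_2^{(n)}$, $I_3^{(n)}$, $I_4^{(n)}$]. The four occurring summands in (21) are identically distributed. This implies

$$
\begin{aligned}
\ell_2^2(X_n, X)
\le{}& 4\,\mathbb{E}\Bigl[\mathbf{1}_{\{Y<U\}}\Bigl(\Bigl(\tfrac{I_1^{(n)}}{n}\Bigr)^{\alpha-1}\bigl(X^{(1)}_{I_1^{(n)}}+\gamma\bigr) - (UV)^{\alpha-1}\bigl(X^{(1)}+\gamma\bigr)\Bigr)^2\Bigr] + o(1) \\
={}& 4\,\mathbb{E}\Bigl[\mathbf{1}_{\{Y<U\}}\Bigl(\Bigl(\tfrac{I_1^{(n)}}{n}\Bigr)^{\alpha-1}\bigl(X^{(1)}_{I_1^{(n)}} - X^{(1)}\bigr) + \Bigl(\Bigl(\tfrac{I_1^{(n)}}{n}\Bigr)^{\alpha-1} - (UV)^{\alpha-1}\Bigr)\bigl(X^{(1)}+\gamma\bigr)\Bigr)^2\Bigr] + o(1) \\
={}& 4\,\mathbb{E}\Bigl[\mathbf{1}_{\{Y<U\}}\Bigl(\tfrac{I_1^{(n)}}{n}\Bigr)^{2\alpha-2}\bigl(X^{(1)}_{I_1^{(n)}} - X^{(1)}\bigr)^2\Bigr] \\
&+ 4\,\mathbb{E}\Bigl[\mathbf{1}_{\{Y<U\}}\Bigl(\Bigl(\tfrac{I_1^{(n)}}{n}\Bigr)^{\alpha-1} - (UV)^{\alpha-1}\Bigr)^2\bigl(X^{(1)}+\gamma\bigr)^2\Bigr] \\
&+ 8\,\mathbb{E}\Bigl[\mathbf{1}_{\{Y<U\}}\Bigl(\tfrac{I_1^{(n)}}{n}\Bigr)^{\alpha-1}\bigl(X^{(1)}_{I_1^{(n)}} - X^{(1)}\bigr)\Bigl(\Bigl(\tfrac{I_1^{(n)}}{n}\Bigr)^{\alpha-1} - (UV)^{\alpha-1}\Bigr)\bigl(X^{(1)}+\gamma\bigr)\Bigr] + o(1).
\end{aligned}
\tag{22}
$$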

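As a consequence of (6) we obtain

$$\mathbb{E}\Bigl(\Bigl(\frac{I_1^{(n)}}{n}\Bigr)^{\alpha-1} - (UV)^{\alpha-1}\Bigr)^2 \to 0 \quad\text{as } n \to \infty. \tag{23}$$

Therefore the second summand of (22) converges to 0. With the Cauchy–Schwarz inequality and (23) the third term in its absolute value is bounded from above by

$$
\begin{aligned}
&8\,\Bigl(\mathbb{E}\Bigl[\Bigl(\Bigl(\tfrac{I_1^{(n)}}{n}\Bigr)^{\alpha-1} - (UV)^{\alpha-1}\Bigr)^2\bigl(X^{(1)}+\gamma\bigr)^2\Bigr]\Bigr)^{1/2}\Bigl(\mathbb{E}\bigl(X^{(1)}_{I_1^{(n)}} - X^{(1)}\bigr)^2\Bigr)^{1/2} \\
&\quad = o(1)\Bigl(\mathbb{E}\bigl(X^{(1)}_{I_1^{(n)}} - X^{(1)}\bigr)^2\Bigr)^{1/2} \\
&\quad \le o(1)\,\mathbb{E}\bigl(X^{(1)}_{I_1^{(n)}} - X^{(1)}\bigr)^2 + o(1).
\end{aligned}
$$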

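For the last inequality observe that if the expectation is less than 1 then both sides are $o(1)$. Therefore, from (22) we derive with $a_n = \ell_2^2(X_n, X)$,

$$
\begin{aligned}
a_n \le{}& 4\,\mathbb{E}\Bigl[\Bigl(\mathbf{1}_{\{Y<U\}}\Bigl(\tfrac{I_1^{(n)}}{n}\Bigr)^{2\alpha-2} + o(1)\Bigr)\bigl(X^{(1)}_{I_1^{(n)}} - X^{(1)}\bigr)^2\Bigr] + o(1) \\
={}& 4\sum_{i=0}^{n-1}\mathbb{E}\Bigl[\mathbf{1}_{\{I_1^{(n)} = i\}}\Bigl(\mathbf{1}_{\{Y<U\}}\Bigl(\tfrac{i}{n}\Bigr)^{2\alpha-2} + o(1)\Bigr)\bigl(X^{(1)}_i - X^{(1)}\bigr)^2\Bigr] + o(1) \\
={}& 4\sum_{i=0}^{n-1}\mathbb{E}\Bigl[\mathbf{1}_{\{I_1^{(n)} = i\}}\Bigl(\mathbf{1}_{\{Y<U\}}\Bigl(\tfrac{i}{n}\Bigr)^{2\alpha-2} + o(1)\Bigr)\Bigr]\,\ell_2^2(X_i, X) + o(1).
\end{aligned}
\tag{24}
$$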


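Thus from (7),

$$
\begin{aligned}
a_n \le{}& 4\sum_{i=0}^{n-1}\mathbb{E}\Bigl[\mathbf{1}_{\{I_1^{(n)} = i\}}\Bigl(\mathbf{1}_{\{Y<U\}}\Bigl(\tfrac{i}{n}\Bigr)^{2\alpha-2} + o(1)\Bigr)\Bigr]\sup_{1\le i\le n-1} a_i + o(1) \\
={}& \Bigl(4\,\mathbb{E}\bigl[\mathbf{1}_{\{Y<U\}}(UV)^{2\alpha-2}\bigr] + o(1)\Bigr)\sup_{1\le i\le n-1} a_i + o(1) \\
={}& \bigl(\xi^2 + o(1)\bigr)\sup_{1\le i\le n-1} a_i + o(1),
\end{aligned}
\tag{25}
$$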

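where $\xi$ is defined in (16); in particular $\xi^2 < 1$. Thus $(a_n)_{n\in\mathbb{N}}$ is bounded. We denote $a = \limsup_{n\to\infty} a_n$. Now we can conclude as in Rösler (1991). For a given $\varepsilon > 0$ there exist $n_0 \in \mathbb{N}$ and $\xi_+ < 1$ with $a_n \le a + \varepsilon$ and $\xi^2 + o(1) \le \xi_+ < 1$ for all $n \ge n_0$. Then from (24) it follows that

$$
\begin{aligned}
a_n \le{}& 4\sum_{i=0}^{n_0-1}\mathbb{E}\Bigl[\mathbf{1}_{\{I_1^{(n)} = i\}}\Bigl(\mathbf{1}_{\{Y<U\}}\Bigl(\tfrac{i}{n}\Bigr)^{2\alpha-2} + o(1)\Bigr)\Bigr]\, a_i \\
&+ 4\sum_{i=n_0}^{n-1}\mathbb{E}\Bigl[\mathbf{1}_{\{I_1^{(n)} = i\}}\Bigl(\mathbf{1}_{\{Y<U\}}\Bigl(\tfrac{i}{n}\Bigr)^{2\alpha-2} + o(1)\Bigr)\Bigr](a + \varepsilon) + o(1) \\
\le{}& \xi_+(a + \varepsilon) + o(1).
\end{aligned}
\tag{26}
$$

Now, $n \to \infty$ yields $a \le \xi_+(a + \varepsilon)$, which implies $a = 0$.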

Convergence in the $\ell_2$-metric implies weak convergence and convergence of the second moments. For this reason the constant $\beta$ in (2) can be rederived from the limiting equation (20). We will give this argument in detail in the general $d$-dimensional case with $1 \le s \le d-1$ components specified in Section 3. In this case the first-order expansion for the variance was not known up to now.

We cannot solve the limiting equation (20) explicitly. From a simulation we get an estimate for the density of the translated limiting distribution in (20) (see Fig. 1).

The plot was produced by iterating the translated limiting operator $\tilde{T}$ ten times starting with $\delta_\gamma$, the Dirac measure in $\gamma$. We produced 15,000 samples of $\tilde{T}^{10}(\delta_\gamma)$ and applied a standard smoothing routine of S-Plus on the histogram of the data.
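The sampling step of this simulation can be sketched as follows. This is an illustrative reimplementation in Python, not the authors' S-Plus code: the names `sample_iterate` and `sample_many` are hypothetical, and the smoothing of the histogram is omitted. One sample of the $k$-fold iterate of $\tilde{T}$ applied to $\delta_\gamma$ is obtained by applying (19) recursively to two independent samples of the $(k-1)$-fold iterate.

```python
import math
import random

ALPHA = (math.sqrt(17) - 1) / 2
GAMMA = math.gamma(2 * ALPHA) / (2 * math.gamma(ALPHA) ** 3)


def sample_iterate(depth, rng):
    """One sample of the depth-fold iterate of the translated operator
    (19), started from the Dirac measure in GAMMA."""
    if depth == 0:
        return GAMMA
    u, v = rng.random(), rng.random()
    z1 = sample_iterate(depth - 1, rng)  # independent copies of the
    z2 = sample_iterate(depth - 1, rng)  # previous iterate
    return u ** ((ALPHA - 1) / 2) * (
        v ** (ALPHA - 1) * z1 + (1 - v) ** (ALPHA - 1) * z2
    )


def sample_many(count, depth, seed=0):
    """Independent samples of the depth-fold iterate (seed is an assumption
    added here for reproducibility)."""
    rng = random.Random(seed)
    return [sample_iterate(depth, rng) for _ in range(count)]
```

Each sample at depth 10 requires $2^{10}$ leaf evaluations, so `sample_many(15000, 10)` is feasible but not instantaneous in pure Python. Since the operator in (19) preserves the mean $\gamma$, the sample mean offers a simple sanity check.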

Fig. 1. Estimated density of the translated limiting distribution.

3. The multidimensional quadtree. We consider a partial match query for a $d$-dimensional random quadtree with $1 \le s \le d-1$ components specified. By symmetry we assume w.l.o.g. these are the first $s$ coordinates. Then after comparing the search pattern with an internal node of the quadtree we have to inspect $2^{d-s}$ subtrees at this node for the subsequent search. A node $w \in [0,1]^d$ partitions the quadrant it belongs to into $2^d$ subquadrants. Let the index of a subquadrant be given by

$$\sum_{i=1}^{d} 2^{d-i}\,\mathbf{1}_{\{w_i \le p_i\}}, \qquad w = (w_i),\ p = (p_i),$$

if $p$ is a point in this subquadrant. A key $p$ is inserted in the $k$th subtree if it belongs to the $k$th subquadrant. For the binary representation of $0 \le k \le 2^d - 1$,

$$k = \sum_{i=1}^{d} a_i 2^{d-i}, \qquad a_i = a_i(k) \in \{0,1\},$$

let

$$E_k = \{i \in \{1, \ldots, d\} : a_i(k) = 1\}, \qquad N_k = \{i \in \{1, \ldots, d\} : a_i(k) = 0\}.$$

Then equivalently, $p$ is inserted in the $k$th subtree of a node $w$ if $p_i \ge w_i$ for all $i \in E_k$ and $p_i < w_i$ for all $i \in N_k$.

The volumes of the quadrants generated by the root $u \in [0,1]^d$ of the tree are given by

$$\langle u \rangle_k = \prod_{i\in N_k} u_i \prod_{i\in E_k} (1 - u_i);$$

here $\langle u \rangle = (\langle u \rangle_0, \ldots, \langle u \rangle_{2^d-1})$ denotes the vector of the generated volumes.
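The indexing of subquadrants and their volumes can be made concrete with a short sketch. The function names `subquadrant_index` and `volumes` are hypothetical and introduced only for this illustration; coordinates are stored in 0-based Python sequences, while the comments refer to the 1-based indices of the text.

```python
def subquadrant_index(w, p):
    """Index sum_{i=1}^{d} 2^(d-i) * 1{w_i <= p_i} of the subquadrant of
    node w that contains the point p."""
    d = len(w)
    return sum(2 ** (d - i) for i in range(1, d + 1) if w[i - 1] <= p[i - 1])


def volumes(u):
    """Vector (<u>_0, ..., <u>_{2^d - 1}) of subquadrant volumes for root u."""
    d = len(u)
    result = []
    for k in range(2 ** d):
        vol = 1.0
        for i in range(1, d + 1):
            a_i = (k >> (d - i)) & 1  # i-th binary digit a_i(k)
            # i in E_k contributes 1 - u_i, i in N_k contributes u_i
            vol *= (1 - u[i - 1]) if a_i else u[i - 1]
        result.append(vol)
    return result
```

For $d = 2$ and $u = (u_1, u_2)$ the function `volumes` returns the four quadrant volumes in the same order as $\langle w \rangle$ in Section 2, and the entries always sum to 1.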

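The vector $I^{(n)}$ of the cardinalities of the subtrees of a random $d$-dimensional quadtree with $n$ nodes is, conditionally given the root $U$, multinomial distributed:

$$P^{I^{(n)} \mid U = u} = M(n-1, \langle u \rangle),$$

where $U$, the first key to be inserted, is uniformly distributed on $[0,1]^d$. As in the two-dimensional case, convergence in probability of $I^{(n)}/n$ follows:

$$\frac{I^{(n)}}{n} \stackrel{P}{\longrightarrow} \langle U \rangle = \bigl(\langle U \rangle_0, \ldots, \langle U \rangle_{2^d-1}\bigr). \tag{27}$$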


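We denote the $s$ specified components of the search pattern by $Y = (Y_1, \ldots, Y_s)$. The variables $Y_i$ are independent, uniformly distributed on $[0,1]$, and independent of the random quadtree. In order to give a concise form of the recursive distributional equation for the number $C_n$ of nodes traversed during a partial match query in a random $d$-dimensional quadtree with $n$ nodes and $1 \le s \le d-1$ components specified we define for $j_1, \ldots, j_s \in \{0,1\}$,

$$\mathbf{1}_{j_1 \ldots j_s}(U, Y) = \prod_{\substack{1\le i\le s \\ j_i = 0}} \mathbf{1}_{\{Y_i < U_i\}} \prod_{\substack{1\le i\le s \\ j_i = 1}} \mathbf{1}_{\{Y_i \ge U_i\}}.$$

Then analogously to (8) it holds that

$$C_n \stackrel{\mathcal{D}}{=} \sum_{j_1, \ldots, j_d = 0,1} \mathbf{1}_{j_1 \ldots j_s}(U, Y)\, C^{(j)}_{I_j^{(n)}} + 1. \tag{28}$$

Here and in the following $j_1, \ldots, j_d$ is the binary representation of $j$, that is,

$$j = \sum_{i=1}^{d} j_i 2^{d-i}. \tag{29}$$

In (28) the variables $Y = (Y_1, \ldots, Y_s)$, $U = (U_1, \ldots, U_d)$, $(C^{(0)}_i), \ldots, (C^{(2^d-1)}_i)$ are independent, $Y$ and $U$ are uniformly distributed on $[0,1]^s$ and $[0,1]^d$, respectively, $C^{(k)}_i \stackrel{\mathcal{D}}{=} C_i$ and $I^{(n)}$ is, conditionally given $U = u$, multinomial $M(n-1, \langle u \rangle)$ distributed.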

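For $d$-dimensional quadtrees a first-order expansion is only known for the mean of $C_n$. In Flajolet, Gonnet, Puech and Robson (1993) the asymptotic expansion

$$\mathbb{E}\,C_n \sim \gamma_{s,d}\, n^{\alpha-1} \tag{30}$$

is proved. Here $\gamma_{s,d}$ is a positive constant which can in principle be approximated numerically and $\alpha \in (1,2)$ is the unique solution of the indicial equation

$$\alpha^{d-s}(\alpha+1)^s = 2^d. \tag{31}$$

An asymptotic expansion for the variance of $C_n$ was unknown up to now. We will prove later that

$$\operatorname{Var}(C_n) \sim \beta_{s,d}\, n^{2\alpha-2},$$

where $\beta_{s,d} > 0$ has an explicit representation in terms of $\alpha$ and $\gamma_{s,d}$. The normalized number of traversed nodes,

$$X_n = \frac{C_n - \mathbb{E}\,C_n}{n^{\alpha-1}}$$

with $\alpha$ given by (31), satisfies the modified recursion,

$$X_n \stackrel{\mathcal{D}}{=} \sum_{j_1, \ldots, j_d = 0,1} \mathbf{1}_{j_1 \ldots j_s}(U, Y)\Bigl(\frac{I_j^{(n)}}{n}\Bigr)^{\alpha-1}\Bigl(X^{(j)}_{I_j^{(n)}} + \gamma_{s,d}\Bigr) - \gamma_{s,d} + o(1). \tag{32}$$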


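Note that here and in the following we use (29) in our notation. We define

$$\langle U \rangle_{j_1 \ldots j_d} = \prod_{\substack{1\le i\le d \\ j_i = 0}} U_i \prod_{\substack{1\le i\le d \\ j_i = 1}} (1 - U_i)$$

for $j_1, \ldots, j_d \in \{0,1\}$. By the convergence of the coefficients in (27) it seems reasonable that a distributional limit $X$ of $X_n$ is a solution of the limiting equation,

$$X \stackrel{\mathcal{D}}{=} \sum_{j_1, \ldots, j_d = 0,1} \mathbf{1}_{j_1 \ldots j_s}(U, Y)\, \langle U \rangle^{\alpha-1}_{j_1 \ldots j_d}\bigl(X^{(j)} + \gamma_{s,d}\bigr) - \gamma_{s,d}. \tag{33}$$

Therefore, we define the limiting operator

$$
\begin{aligned}
T : M^1(\mathbb{R}^1, \mathcal{B}^1) &\to M^1(\mathbb{R}^1, \mathcal{B}^1), \\
T(\mu) &\stackrel{\mathcal{D}}{=} \sum_{j_1, \ldots, j_d = 0,1} \mathbf{1}_{j_1 \ldots j_s}(U, Y)\, \langle U \rangle^{\alpha-1}_{j_1 \ldots j_d}\bigl(Z^{(j)} + \gamma_{s,d}\bigr) - \gamma_{s,d},
\end{aligned}
\tag{34}
$$

where $Y$, $U$, $Z^{(0)}, \ldots, Z^{(2^d-1)}$ are independent, $Y$ and $U$ are uniformly distributed on $[0,1]^s$ and $[0,1]^d$, respectively, and $Z^{(j)} \sim \mu$ for $j = 0, \ldots, 2^d-1$.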

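Lemma 3.1. The limiting operator $T : M_{0,2} \to M_{0,2}$ is a contraction w.r.t. $\ell_2$,

$$\ell_2(T(\mu), T(\nu)) \le \xi\, \ell_2(\mu, \nu) \quad\text{for all } \mu, \nu \in M_{0,2}, \tag{35}$$

$$\xi = \frac{1}{\sqrt{\alpha^s(\alpha - 1/2)^{d-s}}} < 1. \tag{36}$$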

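Proof. Obviously $\operatorname{Var}(T(\mu)) < \infty$. Since the summands in (34) are identically distributed we derive

$$
\begin{aligned}
\mathbb{E}\,T(\mu) &= 2^d\, \mathbb{E}\Bigl[\prod_{i=1}^{s} \mathbf{1}_{\{Y_i < U_i\}} \prod_{i=1}^{d} U_i^{\alpha-1}\Bigr]\gamma_{s,d} - \gamma_{s,d} \\
&= 2^d \Bigl(\mathbb{E}\bigl[\mathbf{1}_{\{Y_1 < U_1\}} U_1^{\alpha-1}\bigr]\Bigr)^{s}\Bigl(\mathbb{E}\,U_1^{\alpha-1}\Bigr)^{d-s}\gamma_{s,d} - \gamma_{s,d} \\
&= 2^d (\alpha+1)^{-s}\alpha^{-(d-s)}\gamma_{s,d} - \gamma_{s,d} \\
&= 2^d\, 2^{-d}\gamma_{s,d} - \gamma_{s,d} = 0,
\end{aligned}
$$

where the indicial equation (31) is used. So $T : M_{0,2} \to M_{0,2}$ is well defined.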

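To prove contractivity let $\mu, \nu \in M_{0,2}$ and let $(W^{(k)}, Z^{(k)})$, $Y$, $U$ be independent, $Y$, $U$ uniformly distributed on $[0,1]^s$ and $[0,1]^d$, respectively, and let $(W^{(k)}, Z^{(k)})$ be optimal couplings of $(\mu, \nu)$, that is $W^{(k)} \sim \mu$, $Z^{(k)} \sim \nu$ and $\ell_2^2(\mu, \nu) = \mathbb{E}(W^{(k)} - Z^{(k)})^2$ for $k = 0, \ldots, 2^d - 1$. Then using the independence properties and $\mathbb{E}\,W^{(k)} = \mathbb{E}\,Z^{(k)} = 0$ we conclude similarly to (17),

$$
\begin{aligned}
\ell_2^2(T(\mu), T(\nu)) &\le \mathbb{E}\sum_{j_1, \ldots, j_d = 0,1} \mathbf{1}_{j_1 \ldots j_s}(U, Y)\, \langle U \rangle^{2\alpha-2}_{j_1 \ldots j_d}\bigl(Z^{(j)} - W^{(j)}\bigr)^2 \\
&= 2^d\, \mathbb{E}\Bigl[\prod_{i=1}^{s} \mathbf{1}_{\{Y_i < U_i\}} U_i^{2\alpha-2} \prod_{i=s+1}^{d} U_i^{2\alpha-2}\Bigr]\ell_2^2(\mu, \nu) \\
&= 2^{d-s}\alpha^{-s}(2\alpha-1)^{-(d-s)}\,\ell_2^2(\mu, \nu) \\
&= \frac{1}{\alpha^s(\alpha - 1/2)^{d-s}}\,\ell_2^2(\mu, \nu).
\end{aligned}
$$

This implies assertion (35). Since $\alpha \in (1,2)$ we have $2\alpha > \alpha + 1$ and $2\alpha - 1 > \alpha$ which together with (31) yields

$$\xi = \bigl(\alpha^s(\alpha - 1/2)^{d-s}\bigr)^{-1/2} = \Bigl(\frac{2^d}{(2\alpha)^s(2\alpha-1)^{d-s}}\Bigr)^{1/2} < \Bigl(\frac{2^d}{(\alpha+1)^s\alpha^{d-s}}\Bigr)^{1/2} = 1.$$

As in the two-dimensional case, a simplification of the limiting operator $T$ is possible [cf. (18), (19)]. We denote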

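$$\langle U \rangle_{j_{s+1} \ldots j_d} = \prod_{\substack{s+1\le i\le d \\ j_i = 0}} U_i \prod_{\substack{s+1\le i\le d \\ j_i = 1}} (1 - U_i)$$

for $j_{s+1}, \ldots, j_d \in \{0,1\}$. Then

$$T(\mu) \stackrel{\mathcal{D}}{=} \prod_{i=1}^{s} U_i^{(\alpha-1)/2} \sum_{j_{s+1}, \ldots, j_d = 0,1} \langle U \rangle^{\alpha-1}_{j_{s+1} \ldots j_d}\bigl(X^{(j_{s+1} \ldots j_d)} + \gamma_{s,d}\bigr) - \gamma_{s,d},$$

where $\{U, X^{(j_{s+1} \ldots j_d)} : j_{s+1}, \ldots, j_d = 0,1\}$ is an independent family, $U$ is uniformly distributed on $[0,1]^d$, and $X^{(j_{s+1} \ldots j_d)} \sim \mu$ for all $j_{s+1}, \ldots, j_d = 0,1$.

With an additional translation it follows that $X$ is a fixed point of $T$ in $M_{0,2}$ if and only if $\tilde{X} = X + \gamma_{s,d}$ is a fixed point of the operator $\tilde{T}$ in $M_{\gamma_{s,d},2}$ given by

$$\tilde{T}(\mu) \stackrel{\mathcal{D}}{=} \prod_{i=1}^{s} U_i^{(\alpha-1)/2} \sum_{j_{s+1}, \ldots, j_d = 0,1} \langle U \rangle^{\alpha-1}_{j_{s+1} \ldots j_d}\, X^{(j_{s+1} \ldots j_d)}. \tag{37}$$

In (37) again $\{U, X^{(j_{s+1} \ldots j_d)} : j_{s+1}, \ldots, j_d = 0,1\}$ is independent, $U$ uniformly distributed on $[0,1]^d$, and $X^{(j_{s+1} \ldots j_d)} \sim \mu \in M_{\gamma_{s,d},2}$.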

Theorem 3.2 (Limit theorem for partial match query in quadtrees). The normalized number $X_n$ of nodes traversed during a partial match query in a random $d$-dimensional quadtree with $1 \le s \le d-1$ components specified converges w.r.t. $\ell_2$ to the unique fixed point $X$ in $M_{0,2}$ of the limiting operator $T$ given in (34), that is,

$$\ell_2(X_n, X) \to 0.$$

The translated limiting distribution $\tilde{X} = X + \gamma_{s,d}$ is the unique fixed point in $M_{\gamma_{s,d},2}$ of the operator

$$\tilde{T}(\mu) \stackrel{\mathcal{D}}{=} \prod_{i=1}^{s} U_i^{(\alpha-1)/2} \sum_{j_{s+1}, \ldots, j_d = 0,1} \langle U \rangle^{\alpha-1}_{j_{s+1} \ldots j_d}\, X^{(j_{s+1} \ldots j_d)}$$

given in (37).

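Proof. Using (27), (32) and (35) the proof of the two-dimensional case can be extended to the multidimensional case. With $a_n = \ell_2^2(X_n, X)$, analogously to (21)–(24) the recursion

$$a_n \le 2^d \sum_{j=0}^{n-1} \mathbb{E}\Bigl[\prod_{i=1}^{s} \mathbf{1}_{\{Y_i < U_i\}}\, \mathbf{1}_{\{I_1^{(n)} = j\}}\Bigl(\frac{j}{n}\Bigr)^{2\alpha-2} + o(1)\Bigr] a_j + o(1) \tag{38}$$

can be derived. Then as in (25),

$$
\begin{aligned}
a_n &\le 2^d\Bigl(\mathbb{E}\Bigl[\prod_{i=1}^{s} \mathbf{1}_{\{Y_i < U_i\}} U_i^{2\alpha-2} \prod_{i=s+1}^{d} U_i^{2\alpha-2}\Bigr] + o(1)\Bigr)\sup_{1\le i\le n-1} a_i + o(1) \\
&= \bigl(\xi^2 + o(1)\bigr)\sup_{1\le i\le n-1} a_i + o(1)
\end{aligned}
\tag{39}
$$

with $\xi$ given in (36). This implies that $(a_n)_{n\in\mathbb{N}}$ is bounded. The convergence then follows as in (26).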

Convergence in the $\ell_2$-metric implies convergence of the second moments [cf. (15)]. Therefore a first-order asymptotic of the variance of the number of traversed nodes $C_n$ can be derived in dimension $d$ with $1 \le s \le d-1$ components in the search pattern specified. We use the simplified form of the fixed-point equation (37).

Corollary 3.3 (Variance of partial match query in quadtrees). The variance of the limiting distribution for the normalized number of nodes traversed during a partial match query in a random $d$-dimensional quadtree with $1 \le s \le d-1$ components specified is given by

$$\beta_{s,d} = \Bigl(\frac{\bigl((2\alpha-1)B(\alpha,\alpha) + 1\bigr)^{d-s} - 1}{\alpha^s(\alpha - 1/2)^{d-s} - 1} - 1\Bigr)\gamma_{s,d}^2.$$

The variance of the number $C_n$ of nodes traversed satisfies

$$\operatorname{Var}(C_n) \sim \beta_{s,d}\, n^{2\alpha-2}.$$

The constants $\alpha$ and $\gamma_{s,d}$ are given by (30) and (31), and $B(\cdot,\cdot)$ denotes the Eulerian beta integral.
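As a numerical illustration of the corollary, the ratio $\beta_{s,d}/\gamma_{s,d}^2$ depends only on $\alpha$ and can be evaluated directly. The following sketch is an assumption-laden illustration: the helper names are hypothetical, $\alpha$ is obtained by bisection on (31), and $B(\alpha,\alpha) = \Gamma^2(\alpha)/\Gamma(2\alpha)$ is used. For $d = 2$, $s = 1$ the result can be compared with the constants in (1) and (2).

```python
import math


def indicial_alpha(s, d, tol=1e-14):
    """Root in (1, 2) of alpha**(d-s) * (alpha+1)**s == 2**d (bisection)."""
    def f(a):
        return (d - s) * math.log(a) + s * math.log(a + 1) - d * math.log(2)

    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta_over_gamma_squared(s, d):
    """beta_{s,d} / gamma_{s,d}^2 according to Corollary 3.3."""
    a = indicial_alpha(s, d)
    m = d - s
    beta_integral = math.gamma(a) ** 2 / math.gamma(2 * a)  # B(a, a)
    numerator = ((2 * a - 1) * beta_integral + 1) ** m - 1
    denominator = a ** s * (a - 0.5) ** m - 1
    return numerator / denominator - 1


def beta_over_gamma_squared_2d():
    """The same ratio for d = 2, s = 1 computed from (1) and (2)."""
    a = (math.sqrt(17) - 1) / 2
    g = math.gamma(2 * a) / (2 * math.gamma(a) ** 3)
    beta = ((2 * a - 1) * math.gamma(2 * a)
            / (3 * a * (a - 1) * math.gamma(a) ** 4)
            - math.gamma(2 * a) ** 2 / (4 * math.gamma(a) ** 6))
    return beta / g ** 2
```

The two functions should agree for $d = 2$, $s = 1$ up to rounding, which reflects the remark that $\beta$ in (2) can be rederived from the limiting equation.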


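Proof. Note that

$$\operatorname{Var}(X) = \operatorname{Var}(\tilde{X}) = \mathbb{E}\,\tilde{X}^2 - \gamma_{s,d}^2, \tag{40}$$

where $\tilde{X}$ is the fixed point of the operator in (37). From (37) we obtain

$$
\begin{aligned}
\mathbb{E}\,\tilde{X}^2 &= \mathbb{E}\Bigl[\prod_{i=1}^{s} U_i^{\alpha-1} \sum_{j_{s+1}, \ldots, j_d = 0,1}\ \sum_{k_{s+1}, \ldots, k_d = 0,1} \langle U \rangle^{\alpha-1}_{j_{s+1} \ldots j_d} \langle U \rangle^{\alpha-1}_{k_{s+1} \ldots k_d}\, X^{(j_{s+1} \ldots j_d)} X^{(k_{s+1} \ldots k_d)}\Bigr] \\
&= \alpha^{-s}\Bigl(\sum_{\forall i:\, j_i = k_i} \mathbb{E}\,\langle U \rangle^{2\alpha-2}_{j_{s+1} \ldots j_d}\, \mathbb{E}\,\tilde{X}^2 + \sum_{\exists i:\, j_i \ne k_i} \mathbb{E}\bigl[\langle U \rangle^{\alpha-1}_{j_{s+1} \ldots j_d} \langle U \rangle^{\alpha-1}_{k_{s+1} \ldots k_d}\bigr]\bigl(\mathbb{E}\,\tilde{X}\bigr)^2\Bigr).
\end{aligned}
\tag{41}
$$

The expectations of the occurring $U$'s can be calculated explicitly:

$$\mathbb{E}\,\langle U \rangle^{2\alpha-2}_{j_{s+1} \ldots j_d} = (2\alpha-1)^{-(d-s)}$$

and for $(j_{s+1}, \ldots, j_d) \ne (k_{s+1}, \ldots, k_d)$ and $h = \operatorname{card}\{s+1 \le i \le d : j_i \ne k_i\}$,

$$
\begin{aligned}
\mathbb{E}\bigl[\langle U \rangle^{\alpha-1}_{j_{s+1} \ldots j_d} \langle U \rangle^{\alpha-1}_{k_{s+1} \ldots k_d}\bigr] &= \bigl(\mathbb{E}\,U_1^{2\alpha-2}\bigr)^{d-s-h}\bigl(\mathbb{E}\,(U_1(1-U_1))^{\alpha-1}\bigr)^{h} \\
&= (2\alpha-1)^{-(d-s-h)} B(\alpha,\alpha)^h.
\end{aligned}
\tag{42}
$$

With (42) in (41) we derive

$$\mathbb{E}\,\tilde{X}^2 = \alpha^{-s}\Bigl(2^{d-s}(2\alpha-1)^{-(d-s)}\, \mathbb{E}\,\tilde{X}^2 + \sum_{h=1}^{d-s} 2^{d-s}\binom{d-s}{h}(2\alpha-1)^{-(d-s-h)} B(\alpha,\alpha)^h\, \gamma_{s,d}^2\Bigr).$$

Using the binomial formula it follows,

$$\mathbb{E}\,\tilde{X}^2\Bigl(1 - \alpha^{-s}\Bigl(\frac{2}{2\alpha-1}\Bigr)^{d-s}\Bigr) = \alpha^{-s} 2^{d-s}\Bigl(\Bigl(B(\alpha,\alpha) + \frac{1}{2\alpha-1}\Bigr)^{d-s} - \Bigl(\frac{1}{2\alpha-1}\Bigr)^{d-s}\Bigr)\gamma_{s,d}^2.$$

A simplification leads to

$$\mathbb{E}\,\tilde{X}^2 = \frac{\bigl((2\alpha-1)B(\alpha,\alpha) + 1\bigr)^{d-s} - 1}{\alpha^s(\alpha - 1/2)^{d-s} - 1}\,\gamma_{s,d}^2.$$

Together with (40) this implies the first assertion.

By convergence of the second moments of $X_n$ we finally conclude

$$\operatorname{Var}(C_n) = \operatorname{Var}(n^{\alpha-1} X_n) = \operatorname{Var}(X_n)\, n^{2\alpha-2} = (\operatorname{Var}(X) + o(1))\, n^{2\alpha-2} \sim \beta_{s,d}\, n^{2\alpha-2}.$$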


The weak convergence in Theorem 2.2 also leads to results on exponential moments using the tools of Rösler (1991, 1992).

Theorem 3.4 (Convergence of Laplace transforms). The limit $X$ of the normalized number $X_n$ of nodes traversed during a partial match query in a random $d$-dimensional quadtree with $1 \le s \le d-1$ components specified has a finite Laplace transform in some neighborhood of 0,

$$\mathbb{E}\exp(\lambda X) < \infty \quad\text{for all } \lambda \in (-\lambda_0, \lambda_0).$$

For

$$0 < \frac{s}{d} < \frac{\ln(4/3)}{\ln(5/3)} = 0.563\ldots \tag{43}$$

existence and convergence of the Laplace transform holds on the whole real line,

$$\mathbb{E}\exp(\lambda X_n) \longrightarrow \mathbb{E}\exp(\lambda X) \quad\text{for all } \lambda \in \mathbb{R}.$$

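Proof. Note that the recursions for $X_n$ and $X$ given in (32) and (33) can be written in the form

$$X_n \stackrel{\mathcal{D}}{=} \sum_{j_1, \ldots, j_d = 0,1} \mathbf{1}_{j_1 \ldots j_s}(U, Y)\Bigl(\frac{I_j^{(n)}}{n}\Bigr)^{\alpha-1} X^{(j)}_{I_j^{(n)}} + C_n(U, Y, I^{(n)}) \tag{44}$$

and

$$X \stackrel{\mathcal{D}}{=} \sum_{j_1, \ldots, j_d = 0,1} \mathbf{1}_{j_1 \ldots j_s}(U, Y)\, \langle U \rangle^{\alpha-1}_{j_1 \ldots j_d}\, X^{(j)} + C(U, Y) \tag{45}$$

with

$$C_n(U, Y, I^{(n)}) = \Bigl(\sum_{j_1, \ldots, j_d = 0,1} \mathbf{1}_{j_1 \ldots j_s}(U, Y)\Bigl(\frac{I_j^{(n)}}{n}\Bigr)^{\alpha-1}\Bigr)\gamma_{s,d} - \gamma_{s,d} + o(1) \tag{46}$$

and

$$C(U, Y) = \Bigl(\sum_{j_1, \ldots, j_d = 0,1} \mathbf{1}_{j_1 \ldots j_s}(U, Y)\, \langle U \rangle^{\alpha-1}_{j_1 \ldots j_d}\Bigr)\gamma_{s,d} - \gamma_{s,d}.$$

The distributions and (in-)dependencies are as in (32) and (33). The recursion (45) satisfies the conditions of Theorem 6 in Rösler (1992) with

$$T_j = \mathbf{1}_{j_1 \ldots j_s}(U, Y)\Bigl(\frac{I_j^{(n)}}{n}\Bigr)^{\alpha-1}.$$

This implies the existence of a neighborhood $(-\lambda_0, \lambda_0)$ of zero where $X$ has a finite Laplace transform.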

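For the second assertion note that

$$\mathbb{E}\,C_n(U, Y, I^{(n)}) = 0 \quad\text{for all } n \in \mathbb{N}, \tag{47}$$

since the variables $X_n$ and $X^{(j)}_{I_j^{(n)}}$ in (44) are centered. Define

$$V_n = \sum_{j_1, \ldots, j_d = 0,1} \mathbf{1}_{j_1 \ldots j_s}(U, Y)\Bigl(\frac{I_j^{(n)}}{n}\Bigr)^{2\alpha-2} - 1.$$

We have $\sum_{j=0}^{2^d-1} I_j^{(n)} = n - 1$. Condition (43) and the indicial equation (31) imply $\alpha \ge 3/2$. Thus

$$V_n < 0 \quad\text{for all } n \in \mathbb{N}. \tag{48}$$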
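The link between condition (43) and $\alpha \ge 3/2$ can be checked numerically: at $\alpha = 3/2$ the indicial equation (31) reads $(5/3)^s = (4/3)^d$, which corresponds to $s/d = \ln(4/3)/\ln(5/3)$. The sketch below, with hypothetical helper names and a bisection solver chosen only for illustration, tabulates for small $d$ whether (43) holds together with the root of (31).

```python
import math

THRESHOLD = math.log(4 / 3) / math.log(5 / 3)  # right-hand side of (43)


def indicial_alpha(s, d, tol=1e-14):
    """Root in (1, 2) of alpha**(d-s) * (alpha+1)**s == 2**d (bisection)."""
    def f(a):
        return (d - s) * math.log(a) + s * math.log(a + 1) - d * math.log(2)

    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def condition_43_holds(s, d):
    """True if 0 < s/d < ln(4/3)/ln(5/3)."""
    return 0 < s / d < THRESHOLD


def check_pairs(max_d):
    """Rows (s, d, condition (43) holds, alpha) for 2 <= d <= max_d."""
    return [
        (s, d, condition_43_holds(s, d), indicial_alpha(s, d))
        for d in range(2, max_d + 1)
        for s in range(1, d)
    ]
```

In every row returned by `check_pairs`, condition (43) should hold exactly when the computed $\alpha$ exceeds $3/2$. In that case the exponent $2\alpha - 2$ is at least 1, so each $(I_j^{(n)}/n)^{2\alpha-2}$ is at most $I_j^{(n)}/n$ and the sum in $V_n$ stays below 1.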

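The convergence of the coefficients in (27) implies

$$\mathbb{E}\,V_n \longrightarrow \mathbb{E}\Bigl[\sum_{j_1, \ldots, j_d = 0,1} \mathbf{1}_{j_1 \ldots j_s}(U, Y)\, \langle U \rangle^{2\alpha-2}_{j_1 \ldots j_d}\Bigr] - 1 = \xi^2 - 1 < 0$$

with $\xi$ given in (36). This yields

$$\sup_{n\in\mathbb{N}} \mathbb{E}\,V_n < 0. \tag{49}$$

From the representation (46) of $C_n(U, Y, I^{(n)})$ it is obvious that

$$\sup_{n\in\mathbb{N}} \|C_n\|_\infty < \infty. \tag{50}$$

The properties (47), (48), (49) and (50) are sufficient to obtain

$$\mathbb{E}\exp(\lambda X_n) \longrightarrow \mathbb{E}\exp(\lambda X) \tag{51}$$

for all $\lambda \in \mathbb{R}$ as in Lemma 4.1 and Theorem 4.2 in Rösler (1991).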

In particular, Theorem 3.4 implies exponential tails and the existence of all moments of the limiting distributions. Under condition (43) additionally convergence of all moments follows and a bound for large deviations of the (unscaled) cost $C_n$ can be established: for all $\lambda > 0$ there exists a $c_\lambda > 0$ so that for any sequence $(a_n)$ of positive real numbers

$$\mathbb{P}(C_n \ge a_n) \le c_\lambda \exp\Bigl(-\lambda\, \frac{a_n}{n^{\alpha-1}}\Bigr). \tag{52}$$

The existence of densities of the limiting distributions with respect to the Lebesgue measure can be deduced following the scheme of Theorem 2.1 in Tan and Hadjicostas (1995) for the limiting distribution of the running time of the Quicksort algorithm. The translated limit distributions [given by the operators (37)] are supported by $[0, \infty)$. The densities are positive almost everywhere on $[0, \infty)$ [cf. Theorem 2.4 in Tan and Hadjicostas (1995)].


Acknowledgment. The authors thank P. Flajolet for introducing them to the problem of partial match query and for his suggestion to analyze it by the contraction method.

REFERENCES

Bickel, P. J. and Freedman, D. A. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9 1196–1217.

Dobrow, R. P. and Fill, J. A. (1999). Total path length for random recursive trees. Combin. Probab. Comput. 8 317–333.

Finkel, R. and Bentley, J. (1974). Quad trees, a data structure for retrieval on composite keys. Acta Inform. 4 1–9.

Flajolet, P., Gonnet, G., Puech, C. and Robson, J. (1993). Analytic variations on quadtrees. Algorithmica 10 473–500.

Flajolet, P. and Puech, C. (1986). Partial match retrieval of multidimensional data. J. ACM 33 371–407.

Knuth, D. E. (1998). The Art of Computer Programming: Sorting and Searching 3, 2nd ed. Addison-Wesley, Reading, MA.

Mahmoud, H. (1992). Evolution of Random Search Trees. Wiley, New York.

Mahmoud, H., Modarres, R. and Smythe, R. (1995). Analysis of quickselect: an algorithm for order statistics. RAIRO Inform. Théor. Appl. 28 299–310.

Martínez, C., Panholzer, A. and Prodinger, H. (2000). Partial match queries in relaxed multidimensional search trees. Algorithmica 29 181–204.

Neininger, R. (2000). Asymptotic distributions for partial match queries in K-d trees. Random Structures Algorithms. To appear.

Neininger, R. and Rüschendorf, L. (1999). On the internal path length of d-dimensional quad trees. Random Structures Algorithms 15 25–41.

Rachev, S. T. (1991). Probability Metrics and the Stability of Stochastic Models. Wiley, New York.

Rachev, S. T. and Rüschendorf, L. (1995). Probability metrics and recursive algorithms. Adv. in Appl. Probab. 27 770–799.

Rösler, U. (1991). A limit theorem for "quicksort." RAIRO Inform. Théor. Appl. 25 85–100.

Rösler, U. (1992). A fixed point theorem for distributions. Stochastic Process. Appl. 42 195–214.

Rösler, U. (2000). On the analysis of stochastic divide and conquer algorithms. Algorithmica 29 238–261.

Rösler, U. and Rüschendorf, L. (2000). The contraction method for recursive algorithms. Algorithmica 29 3–33.

Samet, H. (1990a). Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS. Addison-Wesley, Reading, MA.

Samet, H. (1990b). The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, MA.

Schachinger, W. (2000). Limiting distributions for the costs of partial match retrievals in multidimensional tries. Random Structures Algorithms 17 428–460.

Tan, K. H. and Hadjicostas, P. (1995). Some properties of a limiting distribution in quicksort. Statist. Probab. Lett. 25 87–94.

Institut für Mathematische Stochastik
Universität Freiburg
Eckerstr. 1
79104 Freiburg
Germany
E-mail: rn@stochastik.uni-freiburg.de
ruschen@stochastik.uni-freiburg.de
