Space-Efficient Online Computation of Quantile Summaries


Michael Greenwald*

Computer & Information Science Department University of Pennsylvania

200 South 33rd Street Philadelphia, PA 19104

greenwald@cis.upenn.edu

Sanjeev Khanna†

Computer & Information Science Department University of Pennsylvania

200 South 33rd Street Philadelphia, PA 19104

sanjeev@cis.upenn.edu

ABSTRACT

An $\epsilon$-approximate quantile summary of a sequence of $N$ elements is a data structure that can answer quantile queries about the sequence to within a precision of $\epsilon N$.

We present a new online algorithm for computing $\epsilon$-approximate quantile summaries of very large data sequences. The algorithm has a worst-case space requirement of $O(\frac{1}{\epsilon}\log(\epsilon N))$. This improves upon the previous best result of $O(\frac{1}{\epsilon}\log^2(\epsilon N))$. Moreover, in contrast to earlier deterministic algorithms, our algorithm does not require a priori knowledge of the length of the input sequence.

Finally, the actual space bounds obtained on experimental data are significantly better than the worst case guarantees of our algorithm as well as the observed space requirements of earlier algorithms.

1. INTRODUCTION

We study the problem of space-efficient computation of quantile summaries of very large data sets in a single pass. A quantile summary consists of a small number of points from the input data sequence, and uses those quantile estimates to give approximate responses to any arbitrary quantile query.

Summaries of large data sets have long been used by programmers motivated by limited memory resources. Elementary summaries, such as running averages or standard deviation, are typically sufficient only for simple applications. The mean and variance are often either insufficiently descriptive, or are too sensitive to outliers and other anomalous data. For such cases, online algorithms are necessary to generate quantile summaries that use little space and provide reasonably accurate approximations to the distribution function induced by the input data sequence [6, 1, 5, 13, 2].

* Supported in part by DARPA under Contract #F39502-99-1-0512, and by the National Science Foundation under Grant ANI-00-81901.

† Supported in part by an Alfred P. Sloan Research Fellowship.

1.1 Quantile Estimation for Database Applications

Recent work (e.g. [8, 9, 12]) has highlighted the importance of quantile estimators for database users and implementors. Quantile estimates are used to estimate the size of intermediate results, to allow query optimizers to estimate the cost of competing plans to resolve database queries. Parallel databases attempt to partition the data into value ranges such that the sizes of all partitions are roughly equal. Quantile estimates can be used to choose the ranges without inspecting the actual data. Quantile estimates have several other uses in databases as well. User interfaces may estimate result sizes of queries, and provide feedback to users. This feedback may prevent expensive and incorrect queries from being issued, and may flag discrepancies between the user's model of the database and its actual content. Quantile estimates are also used by database users to characterize the distribution of real-world data sets.

The existing body of work has also identified particular properties that quantile estimators require in order to be useful for these database applications, properties that may not be strictly necessary when estimating quantiles in other domains. Some of the desirable properties are as follows. (1) The algorithm should provide tunable and explicit a priori guarantees on the precision of the approximation. We say that a quantile summary is $\epsilon$-approximate if it can be used to answer any quantile query to within a precision of $\epsilon N$. In other words, for any given rank $r$, an $\epsilon$-approximate quantile summary returns a value whose rank $r'$ is guaranteed to be within the interval $[r - \epsilon N, r + \epsilon N]$. (2) The algorithm should be data independent. Its guarantees should not be affected by the arrival order or distribution of values, nor should it require a priori knowledge of the size of the data set. (3) The algorithm should execute in a single pass over the data. (4) The algorithm should have as small a memory footprint as possible. We note here that the memory footprint applies to temporary storage during the computation. We can always construct an $\epsilon$-approximate summary of size $O(1/\epsilon)$ as follows. We first construct an $\epsilon/2$-approximate summary. For $i$ from 0 to $2/\epsilon$, query this summary for each $i\frac{\epsilon}{2}$ quantile. It is easy to see that the set of responses constitutes an $\epsilon$-approximate summary.

1.2 Previous Work

Several earlier works have made progress towards meeting the above-mentioned requirements. Manku, Rajagopalan, and Lindsay [8] present a single-pass algorithm that constructs an $\epsilon$-approximate quantile summary. The algorithm strictly guarantees a precision of $\epsilon N$, but it requires advance knowledge of $N$, the size of the data set. It requires $O(\frac{1}{\epsilon}\log^2(\epsilon N))$ space. In [8] the same authors present an algorithm that does not require advance knowledge of $N$. However, they must give up the deterministic guarantee on accuracy. Instead, they provide only a probabilistic guarantee that the quantile estimates are within the desired precision.

Gibbons, Matias, and Poosala [4] estimate quantiles under a different error metric, but their algorithm requires multiple passes over the data. Similarly, Chaudhuri, Motwani, and Narasayya [3] require multiple passes and only provide probabilistic guarantees.

Munro and Paterson [10], building on the earlier work of Pohl [7], showed that any algorithm that exactly computes the $\phi$-quantile of a sequence of $N$ data elements in only $p$ passes requires a space of $\Omega(N^{1/p})$. Thus the notion of approximate quantiles is inherently necessary for obtaining sub-linear space algorithms.

Many researchers have also addressed the problem of determining the smallest number of comparisons that are necessary for computing a $\phi$-quantile. We refer the reader to a nice survey article by Paterson [11] for an overview of results in this area.

1.3 Our Results

We design and analyze a new online algorithm for computing an $\epsilon$-approximate quantile summary of large data sequences. The algorithm has a worst-case space requirement of $O(\frac{1}{\epsilon}\log(\epsilon N))$, thus improving upon the previous best result of $O(\frac{1}{\epsilon}\log^2(\epsilon N))$. Moreover, in contrast to earlier deterministic algorithms, our algorithm does not require a priori knowledge of the length of the input sequence.

Our approach is based on a novel data structure that effectively maintains the range of possible ranks for each quantile that we store. This differs from previous approaches that implicitly assumed that the error in stored quantiles was distributed roughly uniformly throughout the distribution of observed values. By explicitly maintaining the possible range of rank values for each quantile, our algorithm is able to adaptively handle new observations: values observed near tightly constrained quantiles are more likely to be dropped and new values observed near loosely constrained quantiles are more likely to be stored. Intuitively speaking, the improved behavior of our algorithm is based on the fact (which we prove) that no input sequence can be "bad" across the entire distribution at once. In other words, an input sequence cannot persistently present new observations that must be stored without allowing us to safely delete old stored observations.

We also note here that our algorithm can be parallelized in a straightforward manner to deal with the scenario where a system of $P$ independent processors analyzes $P$ disjoint streams derived from a parent sequence. Due to space considerations, we will omit the details of this implementation in this version.

Finally, we study the performance of our algorithm from an empirical perspective. The actual space bounds obtained on experimental data are significantly better than both the worst case guarantees of our algorithm and the observed space requirements of earlier algorithms. For example, when summarizing uniformly random data with $\epsilon = 0.001$ and $N = 10^7$, our algorithm used an order of magnitude less memory than the best previously known algorithm.

2. THE NEW ALGORITHM

We will assume without any loss of generality that a new observation arrives after each unit of time and thus we will use $n$ to denote both the number of observations (elements of the data sequence) that have been seen so far as well as the current time. Our algorithm maintains a summary data structure $S = S(n)$ at all times, and we denote by $s = s(n)$ the total space used by it. Finally, we denote the given precision requirement by $\epsilon$.

2.1 The Summary Data Structure

At any point in time $n$, the data structure $S(n)$ consists of an ordered sequence of tuples which correspond to a subset of the observations seen thus far. For each observation $v$ in $S$, we maintain implicit bounds on the minimum and the maximum possible rank of the observation $v$ among the first $n$ observations. Let $r_{\min}(v)$ and $r_{\max}(v)$ denote respectively the lower and upper bounds on the rank of $v$ among the observations seen so far. Specifically, $S$ consists of tuples $t_0, t_1, \ldots, t_{s-1}$ where each tuple $t_i = (v_i, g_i, \Delta_i)$ consists of three components: (i) a value $v_i$ that corresponds to one of the elements in the data sequence seen thus far, (ii) the value $g_i$ equals $r_{\min}(v_i) - r_{\min}(v_{i-1})$, and (iii) $\Delta_i$ equals $r_{\max}(v_i) - r_{\min}(v_i)$. We ensure that, at all times, the maximum and the minimum values are part of the summary. In other words, $v_0$ and $v_{s-1}$ always correspond to the minimum and the maximum elements seen so far. It is easy to see that $r_{\min}(v_i) = \sum_{j \le i} g_j$ and $r_{\max}(v_i) = \sum_{j \le i} g_j + \Delta_i$. Thus $g_i + \Delta_i - 1$ is an upper bound on the total number of observations that may have fallen between $v_{i-1}$ and $v_i$. Finally, observe that $\sum_i g_i$ equals $n$, the total number of observations seen so far.
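The bookkeeping described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the names `Entry` and `rank_bounds`, and the sample values, are hypothetical and do not come from the paper.

```python
from itertools import accumulate
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    v: float     # stored observation v_i
    g: int       # g_i = r_min(v_i) - r_min(v_{i-1})
    delta: int   # Delta_i = r_max(v_i) - r_min(v_i)


def rank_bounds(summary: List[Entry]) -> List[Tuple[int, int]]:
    """Return (r_min, r_max) for every tuple; r_min is the prefix sum of g."""
    r_mins = accumulate(e.g for e in summary)
    return [(r_min, r_min + e.delta) for r_min, e in zip(r_mins, summary)]


# Hypothetical summary of n = 7 observations, ordered by value.
summary = [Entry(10.0, 1, 0), Entry(20.0, 3, 2), Entry(35.0, 2, 1), Entry(50.0, 1, 0)]
bounds = rank_bounds(summary)      # [(1, 1), (4, 6), (6, 7), (7, 7)]
n = sum(e.g for e in summary)      # 7, the number of observations seen
```

In this example the minimum and maximum carry exact ranks (1 and $n$), while the interior tuples only constrain the rank to an interval of width $\Delta_i$.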

Answering Quantile Queries: A summary of the above form can be used in a straightforward manner to provide $\epsilon$-approximate answers to quantile queries. The proposition below forms the basis of our approach.

Proposition 1. Given a quantile summary $S$ in the above form, a $\phi$-quantile can always be identified to within an error of $\max_i (g_i + \Delta_i)/2$.

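Proof. Let $r = \lceil \phi n \rceil$ and let $e = \max_i (g_i + \Delta_i)/2$. We will search for an index $i$ such that $r - e \le r_{\min}(v_i)$ and $r_{\max}(v_i) \le r + e$. Clearly, such a value $v_i$ approximates the $\phi$-quantile to within the claimed error bounds. We now argue that such an index $i$ must always exist. First, consider the case $r > n - e$. We have $r_{\min}(v_{s-1}) = r_{\max}(v_{s-1}) = n$, and therefore $i = s - 1$ has the desired property. Otherwise, when $r \le n - e$, we choose the smallest index $j$ such that $r_{\max}(v_j) > r + e$. It follows that $r - e \le r_{\min}(v_{j-1})$. If $r - e > r_{\min}(v_{j-1})$ then $r_{\max}(v_j) = r_{\min}(v_{j-1}) + g_j + \Delta_j > r_{\min}(v_{j-1}) + 2e$; a contradiction to our assumption that $e = \max_i (g_i + \Delta_i)/2$. By assumption, $r_{\max}(v_{j-1}) \le r + e$, therefore $j - 1$ is an example of an index $i$ with the above described property.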

The following is an immediate corollary.

Corollary 1. If at any time $n$, the summary $S(n)$ satisfies the property that $\max_i (g_i + \Delta_i) \le 2\epsilon n$, then we can answer any $\phi$-quantile query to within an $\epsilon n$ precision.

At a high level, our algorithm for maintaining the quantile summary proceeds as follows. Whenever the algorithm sees a new observation, it inserts in the summary a tuple corresponding to this observation. Periodically, the algorithm performs a sweep over the summary to "merge" some of the tuples into their neighbors so as to free up space. The heart of the algorithm is in the merge phase where we maintain several conditions that allow us to bound the space used by $S$ at any time. By Corollary 1, it suffices to ensure that at all times $\max_i (g_i + \Delta_i) \le 2\epsilon n$. Motivated by this consideration, we will say that an individual tuple is full if $g_i + \Delta_i = \lfloor 2\epsilon n \rfloor$. The capacity of an individual tuple is the maximum number of observations that can be counted by $g_i$ before the tuple becomes full.

Bands: In order to minimize the number of tuples in our summary, our general strategy will be to delete tuples with small capacity and preserve tuples with large capacity. The merge phase will free up space by merging tuples with small capacities into tuples with "similar" or larger capacities. We say that two tuples, $t_i$ and $t_j$, have similar capacities, if $\log \mathrm{capacity}(t_i) \approx \log \mathrm{capacity}(t_j)$.

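This notion of similarity partitions the possible values of $\Delta$ into bands. Roughly speaking, we try to divide the $\Delta$s into bands that lie between elements of $(0, \frac{1}{2} 2\epsilon n, \frac{3}{4} 2\epsilon n, \ldots, \frac{2^i - 1}{2^i} 2\epsilon n, \ldots, 2\epsilon n - 1, 2\epsilon n)$. (These boundaries correspond to capacities of $2\epsilon n, \epsilon n, \frac{1}{2}\epsilon n, \ldots, \frac{1}{2^i}\epsilon n, \ldots, 8, 4, 2, 1$.) As we will see shortly, it is useful to define bands in a way that ensures the property that if two $\Delta$s are ever in the same band, they never appear in different bands as $n$ increases. Therefore, for $\alpha$ from 1 to $\lceil \log 2\epsilon n \rceil$, we let $p = \lfloor 2\epsilon n \rfloor$ and we define $\mathrm{band}_\alpha$ to be the set of all $\Delta$ such that $p - 2^\alpha - (p \bmod 2^\alpha) < \Delta \le p - 2^{\alpha-1} - (p \bmod 2^{\alpha-1})$. The $(p \bmod 2^\alpha)$ term holds the borders between bands static as $n$ increases. We define $\mathrm{band}_0$ to simply be $p$. As a special case, we consider the first $1/(2\epsilon)$ observations, with $\Delta = 0$, to be in a band of their own. Figure 1 shows the band boundaries as $2\epsilon n$ goes from 24 to 34. We will denote by $\mathrm{band}(t_i, n)$ the band of $\Delta_i$ at time $n$, and by $\mathrm{band}_\alpha(n)$ all tuples (or equivalently, the $\Delta$ values associated with these tuples) that have a band value of $\alpha$.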
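The band definition above can be turned into a short Python sketch. The function name `band` and its integer interface are hypothetical, and the treatment of $\Delta = 0$ is an assumption: the text only says those tuples form a band of their own, so the sketch gives them an index one above the largest regular band.

```python
def band(delta: int, p: int) -> int:
    """Band of a tuple with Delta == delta when floor(2*eps*n) == p."""
    if delta == p:
        return 0                                   # band_0 is simply p
    top = (p - 1).bit_length() if p > 0 else 0     # ceil(log2(p)) for p >= 1
    if delta == 0:
        return top + 1                             # assumption: own band for Delta = 0
    for alpha in range(1, top + 1):
        lower = p - 2 ** alpha - (p % 2 ** alpha)
        upper = p - 2 ** (alpha - 1) - (p % 2 ** (alpha - 1))
        if lower < delta <= upper:
            return alpha
    raise ValueError("no band found; expected 0 <= delta <= p")


# Bands for a few Delta values when floor(2*eps*n) = 24.
bands_at_24 = [band(d, 24) for d in (24, 23, 22, 21, 20, 17, 16, 1)]
# -> [0, 1, 2, 2, 3, 3, 4, 4]
```

For $p = 24$ this yields the bands $\{24\}$, $\{23\}$, $\{21, 22\}$, $\{17, \ldots, 20\}$ and $\{1, \ldots, 16\}$. Moving to $p = 25$ places 23 and 24 in the same band without splitting any existing band, which is the stability property the $(p \bmod 2^\alpha)$ term is meant to provide.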

Figure 1: Band boundaries as $2\epsilon n$ progresses from 24 to 34. The rightmost band in each row is band 0.

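Proposition 2. At any point in time $n$ and for any $\alpha \ge 1$, $\mathrm{band}_\alpha(n)$ contains either $2^\alpha$ or $2^{\alpha-1}$ distinct values of $\Delta$.

Proof. The $\mathrm{band}_\alpha(n)$ is bounded below by $2\epsilon n - 2^\alpha - (2\epsilon n \bmod 2^\alpha)$ and above by $2\epsilon n - 2^{\alpha-1} - (2\epsilon n \bmod 2^{\alpha-1})$. If $2\epsilon n \bmod 2^\alpha < 2^{\alpha-1}$, then $2\epsilon n \bmod 2^\alpha = 2\epsilon n \bmod 2^{\alpha-1}$, and $\mathrm{band}_\alpha(n)$ contains $2^\alpha - 2^{\alpha-1} = 2^{\alpha-1}$ distinct values of $\Delta$. If $2\epsilon n \bmod 2^\alpha \ge 2^{\alpha-1}$, then $2\epsilon n \bmod 2^\alpha = 2^{\alpha-1} + (2\epsilon n \bmod 2^{\alpha-1})$, and $\mathrm{band}_\alpha(n)$ contains $2^{\alpha-1} + 2^{\alpha-1} = 2^\alpha$ distinct values of $\Delta$.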

A Tree Representation: We will find it useful to impose a tree structure over the tuples. Given a summary $S = \langle t_0, t_1, \ldots, t_{s-1} \rangle$, the tree $T$ associated with $S$ contains a node $V_i$ for each $t_i$ and a special root node $R$. The parent of a node $V_i$ is the node $V_j$ such that $j$ is the least index greater than $i$ with $\mathrm{band}(t_j) > \mathrm{band}(t_i)$. If no such index exists, then the node $R$ is set to be the parent. All children (and all descendants) of a given node $V_i$ have $\Delta$ values larger than $\Delta_i$. The following two properties of $T$ can be easily verified.

Proposition 3. The children of any node in $T$ are always arranged in non-increasing order of band in $S$.

Proposition 4. For any node $V$, the set of all its descendants in $T$ forms a contiguous segment in $S$.

2.2 Operations

We now describe the various operations that we perform on our summary data structure. We start with a description of external operations:

2.2.1 External Operations

QUANTILE($\phi$) To compute an $\epsilon$-approximate $\phi$-quantile from the summary $S(n)$ after $n$ observations, compute the rank, $r = \lceil \phi n \rceil$. Find $i$ such that both $r - r_{\min}(v_i) \le \epsilon n$ and $r_{\max}(v_i) - r \le \epsilon n$ and return $v_i$.

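INSERT($v$) Find the smallest $i$ such that $v_{i-1} \le v < v_i$, and insert the tuple $(v, 1, \lfloor 2\epsilon n \rfloor)$ between $t_{i-1}$ and $t_i$. Increment $s$. As a special case, if $v$ is the new minimum or the maximum observation seen, then insert $(v, 1, 0)$.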

INSERT($v$) maintains correct relationships between $g_i$, $\Delta_i$, $r_{\min}(v_i)$ and $r_{\max}(v_i)$. Consider that if $v$ is inserted before $v_i$, the value of $r_{\min}(v)$ may be as small as $r_{\min}(v_{i-1}) + 1$, and hence $g_i = 1$. Similarly, $r_{\max}(v)$ may be as large as the current $r_{\max}(v_i)$, which in turn is bounded by $\lfloor 2\epsilon n \rfloor$. Note that $r_{\min}(v_i)$ and $r_{\max}(v_i)$ get increased by 1 after insertion.
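The QUANTILE operation of Section 2.2.1 can be sketched in Python as a single scan that accumulates $r_{\min}$ from the $g$ values. The function name `quantile`, the plain `(v, g, delta)` triples, and the sample summary are hypothetical choices made for this illustration.

```python
import math
from typing import List, Tuple

Triple = Tuple[float, int, int]   # (v, g, delta), ordered by v


def quantile(summary: List[Triple], phi: float, eps: float) -> float:
    """Return some v_i with r - r_min(v_i) <= eps*n and r_max(v_i) - r <= eps*n."""
    n = sum(g for _, g, _ in summary)      # sum of g_i equals n
    r = math.ceil(phi * n)
    r_min = 0
    for v, g, delta in summary:
        r_min += g                          # r_min(v_i) = sum of g_j for j <= i
        r_max = r_min + delta
        if r - r_min <= eps * n and r_max - r <= eps * n:
            return v
    raise ValueError("no tuple satisfies the QUANTILE condition")


# Hypothetical summary with n = 8 and max(g + delta) = 3 <= 2*eps*n for eps = 0.25.
summary = [(1, 1, 0), (3, 2, 0), (5, 2, 1), (8, 2, 0), (9, 1, 0)]
median = quantile(summary, 0.5, 0.25)     # 3: its rank is 3, target rank is 4
top = quantile(summary, 1.0, 0.25)        # 8: its rank is 7, target rank is 8
```

The scan returns the first qualifying tuple, so the answer need not be the closest stored value; it only has to lie within $\epsilon n$ ranks of $r$, which Corollary 1 guarantees whenever $\max_i (g_i + \Delta_i) \le 2\epsilon n$.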

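COMPRESS()
  for $i$ from $s - 2$ to 0 do
    if ((BAND($\Delta_i$, $2\epsilon n$) $\le$ BAND($\Delta_{i+1}$, $2\epsilon n$)) &&
        ($g^*_i + g_{i+1} + \Delta_{i+1} < 2\epsilon n$)) then
      DELETE all descendants of $t_i$ and the tuple $t_i$ itself;
    end if
  end for
end COMPRESS

Figure 2: Pseudo-code for COMPRESS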

2.2.2 Internal Operations

DELETE($v_i$) To delete the tuple $(v_i, g_i, \Delta_i)$ from $S$, replace $(v_i, g_i, \Delta_i)$ and $(v_{i+1}, g_{i+1}, \Delta_{i+1})$ by the new tuple $(v_{i+1}, g_i + g_{i+1}, \Delta_{i+1})$, and decrement $s$.

DELETE() correctly maintains the relationships between $g_i$, $\Delta_i$, $r_{\min}(v_i)$ and $r_{\max}(v_i)$. Deleting $v_i$ has no effect on $r_{\min}(v_{i+1})$ and $r_{\max}(v_{i+1})$, so DELETE($v_i$) should simply preserve $r_{\min}(v_{i+1})$ and $r_{\max}(v_{i+1})$. The relationship between $r_{\min}(v_{i+1})$ and $r_{\max}(v_{i+1})$ is preserved as long as $\Delta_{i+1}$ is unchanged. Since $r_{\min}(v_{i+1}) = \sum_{j \le i+1} g_j$, and we delete $g_i$, we must increase $g_{i+1}$ by $g_i$ to keep $r_{\min}(v_{i+1})$. All other entries are unaltered by this operation.

COMPRESS() The operation COMPRESS tries to merge together a node and all its descendants into either its parent node or into its right sibling. The property that we must ensure is that the tuple that results after this merging is not full. By Proposition 4, we know that a node and its children always form a contiguous sequence of tuples in $S(n)$. Let $g^*_i$ denote the sum of $g$-values of the tuple $t_i$ and all its descendants in $T$. It is easy to see that merging $t_i$ and its descendants (by DELETEing them) into $t_{i+1}$ would result in $t_{i+1}$ being updated to $(v_{i+1}, g^*_i + g_{i+1}, \Delta_{i+1})$. We would like to ensure that this resulting tuple is not full. We say that a pair of adjacent tuples $t_i, t_{i+1} \in S(n)$ is mergeable if $g^*_i + g_{i+1} + \Delta_{i+1} < 2\epsilon n$ and $\mathrm{band}(t_i, n) \le \mathrm{band}(t_{i+1}, n)$. At a high level, the COMPRESS operation iterates over the tuples in $S(n)$ from right to left, and whenever it finds a mergeable pair $t_i, t_{i+1}$, it merges $t_i$ as well as all tuples that are descendants of $t_i$ in $T(n)$ into $t_{i+1}$. Note that pairs of tuples that are not mergeable at some point in time may become so at a later point in time as the term $\lfloor 2\epsilon n \rfloor$ increases over time. Figure 2 gives pseudo-code describing this [...] the $\Delta$ of surviving tuples, it follows that $\Delta_i$ of any quantile entry remains unchanged once it has been inserted.

COMPRESS() inspects tuples from right (highest index) to left. Therefore, it first combines children (and their entire subtree of descendants) into parents. It combines siblings only when no more children can be combined into the parent.
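The DELETE operation described in this section amounts to folding $g_i$ into the right neighbor. A minimal Python sketch follows, assuming the summary is held as a list of `(v, g, delta)` triples; the helper names `delete` and `r_min` and the sample data are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[float, int, int]   # (v, g, delta), ordered by v


def delete(summary: List[Triple], i: int) -> None:
    """Replace t_i and t_{i+1} by (v_{i+1}, g_i + g_{i+1}, delta_{i+1})."""
    _, g_i, _ = summary[i]
    v_next, g_next, d_next = summary[i + 1]
    summary[i:i + 2] = [(v_next, g_i + g_next, d_next)]


def r_min(summary: List[Triple], i: int) -> int:
    """Lower rank bound of tuple i: the prefix sum of g up to and including i."""
    return sum(g for _, g, _ in summary[: i + 1])


s = [(1, 1, 0), (3, 2, 0), (5, 2, 1), (8, 2, 0), (9, 1, 0)]
before = r_min(s, 2)    # r_min of the tuple holding v = 5
delete(s, 1)            # drop the tuple holding v = 3
after = r_min(s, 1)     # the tuple holding v = 5 now sits at index 1
```

Here `before` and `after` are both 5: the surviving tuple keeps its $r_{\min}$ and its $\Delta$, and only its $g$ grows, which is the invariant argued in the DELETE paragraph.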

Initial State
  $S \leftarrow \emptyset$; $s = 0$; $n = 0$.

Algorithm
To add the $(n+1)$st observation, $v$, to summary $S(n)$:
  if ($n \equiv 0 \bmod \frac{1}{2\epsilon}$) then
    COMPRESS();
  end if
  INSERT($v$);
  $n = n + 1$;

Figure 3: Pseudo-code for the algorithm
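The loop of Figure 3 can be exercised end to end with a deliberately simplified Python sketch. It is not the algorithm analyzed in this paper: it drops the band test and the tree of descendants and merges only on the condition $g_i + g_{i+1} + \Delta_{i+1} < 2\epsilon n$, it inserts new tuples with $\Delta = g_i + \Delta_i - 1$ (the variant mentioned in Section 3), and it never deletes the minimum. The class name `GKSummary` and all method names are hypothetical. The precision guarantee of Corollary 1 should still hold under these assumptions, but the space analysis of Section 2.3 does not apply to this simplification.

```python
import bisect
import math
import random


class GKSummary:
    """Simplified sketch; tuples are [v, g, delta] lists kept ordered by v."""

    def __init__(self, eps):
        self.eps = eps
        self.n = 0
        self.tuples = []
        self.period = max(1, int(1 / (2 * eps)))

    def insert(self, v):
        if self.n % self.period == 0:           # Figure 3: n = 0 mod 1/(2 eps)
            self.compress()
        i = bisect.bisect_right([t[0] for t in self.tuples], v)
        if i == 0 or i == len(self.tuples):
            delta = 0                           # new minimum or maximum
        else:
            succ = self.tuples[i]
            delta = succ[1] + succ[2] - 1       # assumption: g_i + delta_i - 1
        self.tuples.insert(i, [v, 1, delta])
        self.n += 1

    def compress(self):
        limit = 2 * self.eps * self.n
        # Right to left; index 0 (the minimum) is never deleted.
        for i in range(len(self.tuples) - 2, 0, -1):
            g = self.tuples[i][1]
            nxt = self.tuples[i + 1]
            if g + nxt[1] + nxt[2] < limit:     # merged tuple stays below 2*eps*n
                nxt[1] += g                     # DELETE: fold g_i into t_{i+1}
                del self.tuples[i]

    def quantile(self, phi):
        """Return the stored value whose rank interval is tightest around r."""
        r = max(1, math.ceil(phi * self.n))
        best, best_err, r_min = None, None, 0
        for v, g, delta in self.tuples:
            r_min += g
            err = max(r - r_min, r_min + delta - r)
            if best_err is None or err < best_err:
                best, best_err = v, err
        return best


random.seed(7)
data = random.sample(range(10**6), 20000)       # distinct values, random order
s = GKSummary(eps=0.01)
for x in data:
    s.insert(x)
approx_median = s.quantile(0.5)
```

The `quantile` method picks the tuple minimizing $\max(r - r_{\min}, r_{\max} - r)$, which satisfies the QUANTILE condition whenever any tuple does. Rebuilding the key list on every insert keeps the sketch short but is not efficient; a real implementation would use a search structure over the tuples.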

2.3 Analysis

It is easy to see that the data structure above maintains an $\epsilon$-approximate quantile summary at each point in time. The INSERT as well as COMPRESS operations always ensure that $g_i + \Delta_i \le 2\epsilon n$ at any point in time. We will now establish that the total number of tuples in the summary $S$ after $n$ observations have been seen is bounded by $\frac{11}{2\epsilon}\log(2\epsilon n)$.

We start by defining a notion of coverage. We say that a tuple $t_i$ in the quantile summary $S$ covers an observation $v$ at any time $n$ if either the tuple for $v$ has been directly merged into $t_i$ or a tuple $t$ that covered $v$ has been merged into $t_i$. Moreover, a tuple always covers itself. It is easy to see that the total number of observations covered by $t_i$ is exactly given by $g_i = g_i(n)$. The lemmas below give some simple properties concerning coverage of observations by various tuples.

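Lemma 1. At no point in time, a tuple from band $\alpha$ covers an observation from a band $> \alpha$.

Proof. Suppose at some time $n$, the event described in the lemma occurs. The COMPRESS subroutine never merges a tuple $t_i$ into an adjacent tuple $t_{i+1}$ if the band of $t_i$ is greater than the band of $t_{i+1}$. Thus the only way in which this event can occur is if at some point in time, say $m$, we have $\mathrm{band}(t_i, m) \le \mathrm{band}(t_{i+1}, m)$ and at the current time $n$, we have $\mathrm{band}(t_i, n) > \mathrm{band}(t_{i+1}, n)$. We now argue that this cannot occur since if at any point in time $\ell$, $\mathrm{band}(t_i, \ell) = \mathrm{band}(t_{i+1}, \ell)$, then for all $n \ge \ell$, we must have $\mathrm{band}(t_i, n) = \mathrm{band}(t_{i+1}, n)$. The borders between bands are static, except when two bands combine (forever). Band 0 is always new. If $2\epsilon n \equiv 2^{\alpha-1} \bmod 2^\alpha$, then $\alpha$ and $\alpha + 1$ combine into the $\alpha + 1$ band ($\alpha$ is a unique band for given $n$). All bands $\beta > \alpha + 1$ remain the same. Because band 0 is always new, all bands $\beta < \alpha$ become $\beta + 1$. In other words, borders [...]

Lemma 2. At any point in time $n$ and for any $\alpha$, the total number of observations covered cumulatively by all tuples with band values in $[0..\alpha]$ is bounded by $2^\alpha/\epsilon$.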

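Proof. By Proposition 2, each $\mathrm{band}_\beta(n)$ contains at most $2^\beta$ distinct values of $\Delta$. There are no more than $1/(2\epsilon)$ observations with any given $\Delta$, so at most $2^\beta/(2\epsilon)$ observations were inserted with $\Delta \in \mathrm{band}_\beta$. By Lemma 1, no observations from bands $> \alpha$ will be covered by a node from $\alpha$. Therefore the nodes in question can cover, at most, the total number of observations from all bands $\le \alpha$. Summing over all $\beta \le \alpha$ yields an upper bound of $2^{\alpha+1}/(2\epsilon) = 2^\alpha/\epsilon$.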

The next lemma shows that for any given band value, only a small number of nodes can have a child with that band value.

Lemma 3. At any time $n$ and for any given $\alpha$, there are at most $3/(2\epsilon)$ nodes in $T(n)$ that have a child with band value of $\alpha$. In other words, there are at most $3/(2\epsilon)$ parents of nodes from $\mathrm{band}_\alpha(n)$.

Proof. Let $m_{\min}$ and $m_{\max}$, respectively, denote the earliest and the latest times at which an observation in $\mathrm{band}_\alpha(n)$ could be seen. It is easy to verify that $m_{\min} = (2\epsilon n - 2^\alpha - (2\epsilon n \bmod 2^\alpha))/(2\epsilon)$ and $m_{\max} = (2\epsilon n - 2^{\alpha-1} - (2\epsilon n \bmod 2^{\alpha-1}))/(2\epsilon)$. Thus, any parent of a node in $\mathrm{band}_\alpha(n)$ must have $\Delta_i < 2\epsilon m_{\min}$.

Fix a parent node $V_i$ with at least one child in $\mathrm{band}_\alpha(n)$ and let $V_j$ be the rightmost such child. Denote by $m_j$ the time at which the observation corresponding to $V_j$ was seen.

We will show that at least a $(2\epsilon/3)$-fraction of all observations that arrived after time $m_{\min}$ can be uniquely mapped to the pair $(V_i, V_j)$. This in turn implies that no more than $3/(2\epsilon)$ such $V_i$'s can exist, thus establishing the lemma. The main idea underlying our proof is that the fact that COMPRESS() did not merge $V_j$ into $V_i$ implies there must be a large number of observations that can be associated with the parent-child pair $(V_i, V_j)$.

We first argue that $g^*_j(n) + \sum_{k=j+1}^{i-1} g_k(n) \ge g^*_{i-1}(n)$. If $j = i - 1$, it is trivially true. Otherwise, observe that any tuple $t_k$ that lies between $t_j$ and $t_i$ must belong to a band less than or equal to $\alpha$; else $V_k$, and not $V_i$, would be the parent of $V_j$. Therefore, $\sum_{k=j+1}^{i-1} g_k(n) \ge g^*_{i-1}(n)$ and the claim follows.

Now since COMPRESS() did not merge $V_j$ into $V_i$, it must be the case that $g^*_{i-1}(n) + g_i(n) + \Delta_i > 2\epsilon n$. Using the claim above, we can conclude that $g^*_j(n) + \sum_{k=j+1}^{i-1} g_k(n) + g_i(n) + \Delta_i > 2\epsilon n$. Also, at time $m_j$, we had $g_i(m_j) + \Delta_i < 2\epsilon m_j$. Since $m_j$ is at most $m_{\max}$, it must be that

$$g^*_j(n) + \sum_{k=j+1}^{i-1} g_k(n) + (g_i(n) - g_i(m_j)) > 2\epsilon (n - m_{\max}).$$

Finally observe that for any other such parent-child pair $V'_i$ and $V'_j$, the observations counted above by $(V_j, V_i)$ and $(V'_j, V'_i)$ are distinct. Since there are at most $n - m_{\min}$ observations that arrived after $m_{\min}$, we can bound the total number of such pairs by $(n - m_{\min})/(2\epsilon (n - m_{\max}))$, which is easily verified to be at most $3/(2\epsilon)$.

Given a full pair of tuples $(t_{i-1}, t_i)$, we say that the tuple $t_{i-1}$ is a left partner and $t_i$ is a right partner in this full pair.

Lemma 4. At any time $n$ and for any given $\alpha$, there are at most $4/\epsilon$ tuples from $\mathrm{band}_\alpha(n)$ that are right partners in a full tuple pair.

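Proof. Let $X$ be the set of tuples in $\mathrm{band}_\alpha(n)$ that participate as a right partner in some full pair. We first consider the case when tuples in $X$ form a single contiguous segment in $S(n)$. Let $t_i, \ldots, t_{i+p-1}$ be a maximal contiguous segment of $\mathrm{band}_\alpha(n)$ tuples in $S(n)$. Since these tuples are alive in $S(n)$, it must be the case that

$$g^*_{j-1} + g_j + \Delta_j > 2\epsilon n, \quad i \le j < i + p.$$

Adding over all $j$, we get

$$\sum_{j=i}^{i+p-1} g^*_{j-1} + \sum_{j=i}^{i+p-1} g_j + \sum_{j=i}^{i+p-1} \Delta_j > 2p\epsilon n.$$

In particular, we can conclude that

$$2 \sum_{j=i-1}^{i+p-1} g^*_j + \sum_{j=i}^{i+p-1} \Delta_j > 2p\epsilon n.$$

The first term in the LHS of the above inequality counts twice the number of observations covered by nodes in $\mathrm{band}_\alpha(n)$ or by one of its descendants in the tree $T(n)$. Using Lemma 2, this sum can be bounded by $2(2^\alpha/\epsilon)$. The second term can be bounded by $p(2\epsilon n - 2^{\alpha-1})$ since the largest possible $\Delta$ value for a tuple with a band value of $\alpha$ or less is $(2\epsilon n - 2^{\alpha-1})$. Substituting these bounds, we get

$$\frac{2^{\alpha+1}}{\epsilon} + p(2\epsilon n - 2^{\alpha-1}) > 2p\epsilon n.$$

Simplifying above, we get $p < 4/\epsilon$ as claimed by the lemma. Finally, the same argument applies when nodes in $X$ induce multiple segments in $S(n)$; we simply consider the above summation over all such segments.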

Lemma 5. At any time $n$ and for any given $\alpha$, the maximum number of tuples possible from each $\mathrm{band}_\alpha(n)$ is $11/(2\epsilon)$.

Proof. By Lemma 4 we know that the number of $\mathrm{band}_\alpha(n)$ nodes that are right partners in some full pair can be bounded by $4/\epsilon$. Any other $\mathrm{band}_\alpha(n)$ node either does not participate in any full pair or occurs only as a left partner. We first claim that each parent of a $\mathrm{band}_\alpha(n)$ node can have at most one such node in $\mathrm{band}_\alpha(n)$. To see this, observe that if a pair of non-full adjacent tuples $t_i, t_{i+1}$, where $t_{i+1} \in \mathrm{band}_\alpha(n)$, is not merged then it must be because $\mathrm{band}(t_i, n)$ is greater than $\alpha$. But Proposition 3 tells us that this event can occur only once for any $\alpha$, and therefore, $V_{i+1}$ must be the unique $\mathrm{band}_\alpha(n)$ child of its parent that does not participate in a full pair. It is also easy to verify that for each parent node, at most one $\mathrm{band}_\alpha(n)$ node can participate only as a left partner in a full pair. Finally, observe that only one of the above two events can occur for each parent node. By Lemma 3, there are at most $3/(2\epsilon)$ parents of such nodes, and thus the total number of $\mathrm{band}_\alpha(n)$ nodes can be bounded by $11/(2\epsilon)$.

             Our algorithm: tuples stored      Our algorithm: 3 x tuples stored     MRL: space
N \ epsilon  .1    .05   .01    .005   .001    .1    .05   .01    .005   .001       .1    .05    .01    .005    .001
10^5         61    120   496    902    3290    183   360   1488   2706   9870       275   468    1519   2859    8334
10^6         76    156   664    1230   4983    228   468   1992   3690   14949      378   702    2748   4664    15155
10^7         94    185   835    1578   6662    282   555   2505   4734   19986      600   1032   3708   7000    27475
10^8         110   224   1067   2063   9148    330   672   3201   6189   27444      765   1477   5960   10320   37026
10^9         124   266   1249   2407   11074   372   798   3747   7221   33222      924   1880   7650   14742   59540

Table 1: Number of tuples stored and space requirements for "hard input" sequences. For the MRL algorithm, we assume that each quantile stored takes only one unit of space.

Theorem 1. At any time $n$, the total number of tuples stored in $S(n)$ is at most $\frac{11}{2\epsilon}\log(2\epsilon n)$.

Proof. There are at most $1 + \lfloor \log 2\epsilon n \rfloor$ bands at time $n$. There can be at most $3/(2\epsilon)$ total tuples in $S(n)$ from bands 0 and 1. For the remaining bands, Lemma 5 bounds the maximum number of tuples in each band. The result follows.

3. EMPIRICAL MEASUREMENTS

We now describe some empirical results concerning the performance of our algorithm in practice. We experimented with three different classes of input data: (1) a "hard case" for our algorithm, (2) "sorted" input data, and (3) "random" input data. The "sorted" and "random" input sequences were chosen for two reasons. First, "random" should yield some insight into the behavior of this algorithm on "average" inputs, or after some randomization. Second, these two scenarios were used to produce the experimental results in [8]. The MRL algorithm [8] is the best previously known algorithm.

We observed during these runs that, in practice, the algorithm used substantially less space than indicated by our analysis from the previous section. The observed space requirements also turn out to be better than those required by the MRL algorithm. Moreover, when we run our algorithm with the same space as used by the MRL algorithm, the observed error is significantly better than that of the MRL algorithm. We will refer to this latter variant as the pre-allocated variant of our algorithm. In contrast, we will refer to the basic version of the algorithm, where we allocate a new quantile entry only when the observed error is about to exceed the desired $\epsilon$, as the adaptive variant.

Our implementation of the algorithm differed slightly from that described in Section 2 in two ways. First, new observations were inserted as a tuple $(v, 1, g_i + \Delta_i - 1)$ rather than as $(v, 1, \lfloor 2\epsilon n \rfloor)$. The latter approach is used in the previous section strictly to simplify theoretical analysis of the space complexity. Second, rather than running COMPRESS after every $1/(2\epsilon)$ observations, instead, for each observation inserted into $S$, one tuple was deleted, when possible. When no tuple could be deleted without causing its successor to become overfull, the size of $S$ grew by 1. Note that by preallocating a large enough number of stored quantiles, no increase in space need ever take place, assuming you know $N$ in advance.

For each experiment we measured both the maximum space used to produce the summary, and the observed precision of the results. We measured space consumption by counting the number of stored tuples. When comparing our space consumption to the MRL algorithm, we pessimistically multiplied the number of stored tuples by 3 to account for our recording the value and both the min and max rank of each stored element.

3.1 Hard Input

We construct here data sequences in an adversarial manner for our algorithm. At each time step, we generate the next observation so that it falls in the largest current "gap" in our quantile summary.

We successively fed observations to our summary, with no advance hint about the total number of observations to be seen. We measured the maximum amount of space required as the size of the input sequence increased to $10^9$. Table 1 reports the results of this experiment for $N$ ranging over powers of 10 from $10^5$ to $10^9$.

Note that the required number of quantiles stored is approximately a factor of 11 lower than the worst-case bound we computed in the previous section of this paper. Also note that the number of quantiles we store is significantly lower than the number used by the MRL algorithm. Even after multiplying our tuple count by a factor of 3, we almost always require less space than MRL. The only exception is $\epsilon = .001$ and $N = 10^5$, where the space cost of our algorithm exceeds that of the MRL algorithm.
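The comparison with the worst-case bound can be reproduced with a few lines of Python. The sketch below evaluates the Theorem 1 bound $\frac{11}{2\epsilon}\log(2\epsilon n)$, assuming the logarithm is base 2 (consistent with the band count $1 + \lfloor \log 2\epsilon n \rfloor$), and divides it by two tuple counts taken from Table 1. The function name `worst_case_tuples` is hypothetical.

```python
import math


def worst_case_tuples(eps: float, n: float) -> float:
    """Theorem 1 bound (11 / (2*eps)) * log2(2*eps*n); base 2 is an assumption."""
    return (11 / (2 * eps)) * math.log2(2 * eps * n)


# Observed tuple counts for eps = .001, copied from Table 1.
observed = {10**5: 3290, 10**9: 11074}

bound_1e5 = worst_case_tuples(0.001, 10**5)     # about 42,000 tuples
ratio_1e5 = bound_1e5 / observed[10**5]         # about 12.8
ratio_1e9 = worst_case_tuples(0.001, 10**9) / observed[10**9]   # about 10.4
```

For these two entries the bound exceeds the observed count by a factor of roughly 10 to 13, which is in line with the "approximately a factor of 11" stated above.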

3.2 Sorted Input

The second scenario, "sorted", measures the behavior of the summary when the data arrives in sorted order. We fixed $\epsilon = .001$ and constructed summaries of sorted sequences of $10^5$, $10^6$, and $10^7$ elements. We computed the maximum error over all possible quantile queries, and chose to query 15 quantiles at rank $\frac{q_i}{16} N$, for $q_i = [1..15]$, to study the behavior at specific quantiles.

         MRL                              Our algorithm, Preallocated      Our algorithm, Adaptive
q_i \ N  10^5      10^6       10^7        10^5      10^6       10^7        10^5      10^6       10^7
|S|      8334      15155      27475       2778      5052       9158        756       756        756
Max      0.00035   0.000194   0.000167    0.00027   0.000128   0.000090    0.00095   0.000899   0.000819
1        0.00015   0.000199   0.000091    0.00021   0.000020   0.000077    0.00074   0.000057   0.000618
2        0.00006   0.000050   0.000120    0.00024   0.000056   0.000009    0.00039   0.000259   0.000203
3        0.00006   0.000210   0.000062    0.00010   0.000052   0.000031    0.00010   0.000744   0.000665
4        0.00024   0.000161   0.000001    0.00001   0.000016   0.000005    0.00040   0.000860   0.000002
5        0.00002   0.000033   0.000070    0.00002   0.000092   0.000050    0.00016   0.000494   0.000230
6        0.00022   0.000166   0.000053    0.00012   0.000048   0.000014    0.00027   0.000716   0.000632
7        0.00000   0.000037   0.000085    0.00024   0.000060   0.000066    0.00007   0.000388   0.000488
8        0.00010   0.000084   0.000043    0.00012   0.000096   0.000035    0.00021   0.000829   0.000090
9        0.00019   0.000207   0.000095    0.00006   0.000124   0.000014    0.00033   0.000000   0.000038
10       0.00013   0.000060   0.000100    0.00012   0.000088   0.000050    0.00055   0.000036   0.000354
11       0.00005   0.000098   0.000013    0.00002   0.000000   0.000014    0.00005   0.000542   0.000185
12       0.00004   0.000096   0.000001    0.00008   0.000004   0.000022    0.00017   0.000093   0.000010
13       0.00006   0.000107   0.000045    0.00014   0.000008   0.000044    0.00039   0.000263   0.000220
14       0.00002   0.000116   0.000038    0.00020   0.000008   0.000056    0.00022   0.000732   0.000665
15       0.00003   0.000098   0.000049    0.00023   0.000028   0.000041    0.00008   0.000316   0.000425

Table 2: Space and precision measurements for "sorted" case.

We compared three algorithms for constructing the summary. First, we used the MRL algorithm to compute a summary where we preallocated the storage required by MRL as a function of $N$ and $\epsilon$. Second, we pre-allocated the same amount of storage required by MRL (1/3 as many stored quantiles as MRL, though), and ran our algorithm without allocating any more quantiles. Finally, we ran our algorithm in the adaptive mode; we started with $\frac{1}{2\epsilon}$ stored quantiles and only allocated extra storage if it was impossible to delete existing quantiles without exceeding a precision of $.001n$.

Table 2 reports the results of this experiment. $|S|$ reports the number of stored quantiles needed to achieve the desired precision. The row labeled "max" reports the maximum error of all possible quantile queries on the summary. In order to give an indication of the behavior of this algorithm for specific quantiles, the remaining rows list the approximation error of the response to the query for the $q_i/16$th quantile.

To interpret the entries in Table 2, consider the .5 quantile (50th percentile, or 8/16). For a sequence of $10^5$ elements, the adaptive algorithm uses only 756 tuples, but returns a value with an approximation error of .00021. MRL stores over eight times as many quantiles, and returns a value with error .00010, almost twice as accurate. Our preallocated algorithm stores only one third as many tuples as MRL, but returns a value with an approximation error of .00012: comparable accuracy but using only one third the number of tuples.

In fact, however, the error on any individual quantile is not representative of the error as a whole. Had we chosen to inspect the 1/4 quantile instead of 1/2, then our algorithm would have been 24 times as accurate as MRL! Had we chosen 3/4, then MRL would have been twice as accurate as ours. Of the 15 quantiles we sampled, we outperformed MRL on 6 out of 15 for a sequence of size $10^5$, 10 out of 15 for size $10^6$, and 11 out of 15 for $10^7$. Individual queries are highly sensitive to how close the quantile query happens to be to some single stored quantile. On average, in comparison to MRL using the same storage, our algorithm reported better worst-case observed error, and comparable observed error (we perform slightly worse for $N = 10^5$, but slightly better for $N = 10^6$ and $10^7$). Both algorithms achieved higher precision than demanded by the a priori specification.

The most interesting result is that our adaptive algorithm seems to require only 756 stored quantiles, regardless of the size of the input sequence. Closer experimentation revealed that the algorithm only needs all 756 stored quantiles at a fairly early stage in the computation; the excess storage reduces the observed error, slightly. One can see this by observing the maximum error in Table 2. For a desired $\epsilon = .001$, one would expect that the maximum observed error would be approximately equal to .001, too. However, for $10^5$ the maximum error is only .000955 and as $N$ gets larger the maximum error gets smaller.

The maximum error offers another interesting insight into the behavior of our algorithm. Note that the optimal value for maximum error in all cases is $1/(2|S|)$ (this occurs only if the stored quantiles are distributed evenly among all values, and we know their rank precisely). For example, for 756 quantiles, the optimal max error is .00066. For 2778 quantiles, the ideal maximum error is .00018. Our algorithm delivers a maximum error within a factor of 2 of optimal. In contrast, the optimal max error of 8334 stored quantiles is $5.99 \times 10^{-5}$, yet the MRL algorithm delivers a max error 6 times as large. In fact, for MRL, the discrepancy between the ideal max error and observed max error seems to grow as $N$ (and $|S|$) gets larger; for $N = 10^7$, the observed max error is more than 9 times the optimal value.
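The optimal figures quoted above follow directly from the $1/(2|S|)$ formula. The short Python sketch below reproduces that arithmetic; the function name `optimal_max_error` is a hypothetical helper introduced only for illustration, and the observed error used in the ratio is the .000955 figure quoted in the text for the adaptive algorithm at $N = 10^5$.

```python
def optimal_max_error(num_stored: int) -> float:
    """Best achievable maximum rank error (as a fraction of N) for a summary
    of `num_stored` evenly spaced quantiles whose ranks are known exactly.
    Hypothetical helper name; the formula 1/(2|S|) is the one in the text."""
    if num_stored <= 0:
        raise ValueError("num_stored must be positive")
    return 1.0 / (2 * num_stored)


# Summary sizes quoted in the text for N = 10^5.
for size in (756, 2778, 8334):
    print(f"|S| = {size:5d}: optimal max error = {optimal_max_error(size):.6f}")

# How far an observed maximum error is from the optimum, using the adaptive
# algorithm's observed maximum error for N = 10^5 quoted in the text.
ratio = 0.000955 / optimal_max_error(756)
print(f"observed / optimal = {ratio:.2f}")
```

The ratio stays below 2 for this case, which is consistent with the "within a factor of 2 of optimal" statement for this configuration.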

3.3 Random Input

The third scenario, "random", selects each datum by selecting an element (without replacement) from a uniform [...] skewed distribution, but the order in which the values are observed by the summary is chosen by the uniform random process.

As in the sorted case, we fixed $\epsilon = .001$ and summarized sequences of lengths $10^5$, $10^6$, and $10^7$. We again computed the maximum error, the quantiles at rank $\frac{q_i}{16} N$, for $q_i = [1..15]$, and measured the actual maximum storage requirement to compute the summary. In contrast to the sorted input case, where a single experiment was sufficient to determine the expected behavior, random input requires running several trials to illuminate expected behavior. We ran each experiment 50 times and report the min, max, mean and standard deviation for every measurement. Tables 3 through 5 report these results.
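The measurement protocol described above (repeated trials on randomly ordered input, queries at the 15 ranks $\frac{q_i}{16} N$, and min/max/mean/standard deviation per query) can be sketched in Python as follows. This is only an illustration of the protocol, not of the algorithms compared in the paper: as an explicitly labeled stand-in for a quantile summary it uses a plain reservoir sample, and the names `reservoir_sample` and `run_trials` as well as all parameter values are assumptions made for the example.

```python
import random
import statistics


def reservoir_sample(stream, m, rng):
    """Stand-in one-pass summary (NOT the paper's algorithm):
    a uniform random sample of m items, returned in sorted order."""
    sample = []
    for i, x in enumerate(stream):
        if i < m:
            sample.append(x)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < m:
                sample[j] = x
    return sorted(sample)


def run_trials(n, m, trials, seed=0):
    """For each q_i in 1..15, return (min, max, mean, stdev) of the observed
    rank error, normalized by n, over `trials` randomly ordered inputs."""
    rng = random.Random(seed)
    errors = {qi: [] for qi in range(1, 16)}
    for _ in range(trials):
        data = list(range(n))   # distinct values; value v has rank v + 1
        rng.shuffle(data)       # uniformly random arrival order
        summary = reservoir_sample(data, m, rng)
        for qi in range(1, 16):
            target_rank = qi * n // 16
            answer = summary[qi * len(summary) // 16]
            observed_rank = answer + 1
            errors[qi].append(abs(observed_rank - target_rank) / n)
    return {
        qi: (min(e), max(e), statistics.mean(e), statistics.stdev(e))
        for qi, e in errors.items()
    }


if __name__ == "__main__":
    stats = run_trials(n=10_000, m=500, trials=50)
    lo, hi, mean, stdev = stats[8]  # the median query, q_i = 8
    print(f"q_i = 8: range [{lo:.5f}, {hi:.5f}], avg {mean:.5f}, stdev {stdev:.5f}")
```

Because the input is a permutation of `0..n-1`, the true rank of every returned value is known without sorting, which keeps the error computation exact.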

The observed $\epsilon$ of our preallocated algorithm is roughly twice as accurate as that of MRL, although our advantage seems to increase steadily as $N$ gets larger. Not surprisingly, the observed $\epsilon$ of our adaptive algorithm stays close to 0.001 regardless of how large $N$ gets. The observed storage requirements, however, may be surprising. These are once again the most interesting results of our "random" scenario. It appears that for uniformly random input the required space is independent of $N$, the size of the data set, and dependent only upon $\epsilon$. In all our experiments, a .001-approximate summary of a random input was achieved with roughly 920 tuples.

4. CONCLUDING REMARKS

We presented a new online algorithm for computing quantile summaries of very large sequences of data in a space-efficient manner. Our algorithm improves upon the earlier results in two significant ways. First, it improves the space complexity by a factor of $\Omega(\log(\epsilon N))$. Second, it does not require a priori knowledge of the parameter $N$; that is, it allocates more space dynamically as the data sequence grows in size.
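To give a feel for the size of this improvement, the sketch below evaluates the two asymptotic expressions from the abstract, $\frac{1}{\epsilon}\log(\epsilon N)$ and $\frac{1}{\epsilon}\log^2(\epsilon N)$, at the parameter values used in the experiments. It is a rough illustration only: the constants hidden by the $O$-notation are ignored and base-2 logarithms are assumed, so the numbers are not predictions of actual tuple counts. The function names are hypothetical.

```python
import math


def new_bound(eps: float, n: int) -> float:
    """(1/eps) * log(eps * N), ignoring constant factors (base-2 log assumed)."""
    return (1.0 / eps) * math.log2(eps * n)


def previous_bound(eps: float, n: int) -> float:
    """(1/eps) * log^2(eps * N), ignoring constant factors (base-2 log assumed)."""
    return (1.0 / eps) * math.log2(eps * n) ** 2


eps = 0.001
for n in (10**5, 10**6, 10**7):
    ratio = previous_bound(eps, n) / new_bound(eps, n)
    print(f"N = {n:>10,d}: previous/new = log2(eps*N) = {ratio:.2f}")
```

The ratio of the two expressions is exactly $\log(\epsilon N)$, which is why the gap widens, slowly, as $N$ grows.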

An obvious question is whether or not the space complexity achieved by our algorithm is asymptotically optimal. We believe that the answer is indeed in the affirmative.

Our empirical study of the new algorithm provides evidence that our algorithm compares favorably with the previous algorithms in practice as well. A curious trend observed in our experiments is that on random inputs, the space requirements of the algorithm seem only to depend on the error parameter $\epsilon$ and become independent of the sequence length $N$. It will be interesting to analytically verify this behavior and to understand the minimal characteristics of the data sequences that lead to such improved space requirements.

5. REFERENCES

[1] Rakesh Agrawal and Arun Swami. A one-pass space-efficient algorithm for finding quantiles. Proc. 7th Int. Conf. Management of Data, COMAD, 28-30 December 1995.

[2] Khaled Alsabti, Sanjay Ranka, and Vineet Singh. A one-pass algorithm for accurately estimating quantiles for disk-resident data. Proceedings of the 23rd Intl. [...] CA 94022, USA, 1997. Morgan Kaufmann Publishers.

[3] Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya. Random sampling for histogram construction: how much is enough? In ACM SIGMOD '98, volume 28, pages 436-447, Seattle, WA, June 1-4, 1998.

[4] Phillip B. Gibbons, Yossi Matias, and Viswanath Poosala. Fast incremental maintenance of approximate histograms. In Proceedings of the 23rd Intl. Conf. Very Large Data Bases, VLDB, pages 466-475. Morgan Kaufmann, 25-27 August 1997.

[5] Michael B. Greenwald. Practical algorithms for self scaling histograms or better than average data collection. Performance Evaluation, 27&28:19-40, October 1996.

[6] R. Jain and I. Chlamtac. The P² algorithm for dynamic calculation of quantile and histograms without storing observations. Communications of the ACM, 28(10):1076-1085, October 1986.

[7] I. Pohl. A minimum storage algorithm for computing the median. IBM Research Report RC 2701, November 1969.

[8] Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G. Lindsay. Approximate medians and other quantiles in one pass and with limited memory. ACM SIGMOD '98, volume 28, pages 426-435, Seattle, WA, June 1998.

[9] Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large data sets. In ACM SIGMOD '99, volume 29, pages 251-262. Philadelphia, PA, June 1999.

[10] J. I. Munro and M. S. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, vol. 12: 315-323; 1980.

[11] M. S. Paterson. Progress in selection. Technical Report, University of Warwick, Coventry, UK, 1997.

[12] Viswanath Poosala, Venkatesh Ganti, and Yannis E. Ioannidis. Approximate query answering using histograms. Bulletin of the IEEE Technical Committee on Data Engineering, 22(4):6-15, December 1999.

[13] Viswanath Poosala, Peter J. Haas, Yannis E. Ioannidis, and Eugene J. Shekita. Improved histograms for selectivity estimation of range predicates. In ACM SIGMOD 96, volume 26, pages 294-305, Montreal, Quebec, Canada, June 4-6, 1996.


| $q_i$ | MRL | Our Algorithm, Preallocated | Our Algorithm, Adaptive |
|---|---|---|---|
| $\lvert S \rvert$ | 8334 | 2778 | [898-939], 919.18 ± 8.63 |
| | [range ($\times 10^{-4}$)] avg ± stdev | [range ($\times 10^{-4}$)] avg ± stdev | [range ($\times 10^{-4}$)] avg ± stdev |
| Max | [4.3-5.2] 0.0004698 ± 2.02e-05 | [2.9-2.95] 0.0002920 ± 0.24e-05 | [8.25-8.70] 0.0008487 ± 0.91e-05 |
| 1 | [0.0-3.2] 0.0000928 ± 7.38e-05 | [0.1-2.5] 0.0001074 ± 7.19e-05 | [0.1-7.8] 0.0003222 ± 1.88e-04 |
| 2 | [0.0-3.0] 0.0001130 ± 7.58e-05 | [0.2-2.5] 0.0001216 ± 6.42e-05 | [0.1-7.0] 0.0003216 ± 1.88e-04 |
| 3 | [0.0-3.5] 0.0001104 ± 8.86e-05 | [0.0-2.7] 0.0001220 ± 7.36e-05 | [0.2-7.7] 0.0003406 ± 2.07e-04 |
| 4 | [0.0-2.8] 0.0001040 ± 6.93e-05 | [0.0-2.7] 0.0001236 ± 7.44e-05 | [0.1-7.6] 0.0002952 ± 1.98e-04 |
| 5 | [0.0-3.7] 0.0001172 ± 8.81e-05 | [0.0-2.6] 0.0000844 ± 6.07e-05 | [0.1-6.6] 0.0003102 ± 1.88e-04 |
| 6 | [0.1-3.0] 0.0001046 ± 7.69e-05 | [0.0-3.3] 0.0000912 ± 7.41e-05 | [0.2-6.7] 0.0002986 ± 1.64e-04 |
| 7 | [0.2-3.6] 0.0001346 ± 7.97e-05 | [0.0-2.5] 0.0001078 ± 6.45e-05 | [0.0-6.9] 0.0003090 ± 1.89e-04 |
| 8 | [0.1-3.8] 0.0000982 ± 8.86e-05 | [0.0-3.1] 0.0001134 ± 7.08e-05 | [0.0-7.7] 0.0002910 ± 1.94e-04 |
| 9 | [0.0-2.7] 0.0001222 ± 7.37e-05 | [0.0-2.5] 0.0001074 ± 7.62e-05 | [0.0-6.6] 0.0002910 ± 1.75e-04 |
| 10 | [0.0-3.4] 0.0001278 ± 7.68e-05 | [0.0-2.3] 0.0000912 ± 6.01e-05 | [0.0-7.0] 0.0002740 ± 1.69e-04 |
| 11 | [0.1-3.1] 0.0001204 ± 7.87e-05 | [0.0-2.8] 0.0000954 ± 7.31e-05 | [0.1-6.9] 0.0002790 ± 1.84e-04 |
| 12 | [0.1-2.4] 0.0001040 ± 6.83e-05 | [0.0-2.4] 0.0000940 ± 6.71e-05 | [0.2-8.2] 0.0003566 ± 2.32e-04 |
| 13 | [0.0-3.0] 0.0000878 ± 6.83e-05 | [0.0-2.3] 0.0001114 ± 6.49e-05 | [0.2-7.6] 0.0003446 ± 2.01e-04 |
| 14 | [0.0-3.1] 0.0000982 ± 8.05e-05 | [0.0-2.5] 0.0001196 ± 6.80e-05 | [0.4-8.2] 0.0003424 ± 1.99e-04 |
| 15 | [0.0-2.8] 0.0001000 ± 7.12e-05 | [0.0-2.8] 0.0001330 ± 8.24e-05 | [0.1-6.2] 0.0002952 ± 1.86e-04 |

Table 3: N = 100,000; Samples = 50; random order.

| $q_i$ | MRL | Our Algorithm, Preallocated | Our Algorithm, Adaptive |
|---|---|---|---|
| $\lvert S \rvert$ | 15155 | 5052 | [900-939], 919.38 ± 8.92 |
| | [range ($\times 10^{-4}$)] avg ± stdev | [range ($\times 10^{-4}$)] avg ± stdev | [range ($\times 10^{-4}$)] avg ± stdev |
| Max | [3.02-3.63] 0.0003275 ± 1.44e-05 | [1.495-1.520] 15.04e-05 ± 0.06e-05 | [7.835-8.215] 0.0008004 ± 0.82e-05 |
| 1 | [0.02-3.00] 0.0001194 ± 7.88e-05 | [0.05-1.41] 5.41e-05 ± 3.37e-05 | [0.00-7.78] 0.0003173 ± 2.12e-04 |
| 2 | [0.09-3.19] 0.0001248 ± 7.69e-05 | [0.04-1.41] 5.79e-05 ± 3.65e-05 | [0.06-6.94] 0.0003259 ± 1.80e-04 |
| 3 | [0.01-2.90] 0.0001253 ± 7.27e-05 | [0.01-1.28] 5.73e-05 ± 3.71e-05 | [0.15-7.11] 0.0003172 ± 1.87e-04 |
| 4 | [0.01-2.71] 0.0001092 ± 7.47e-05 | [0.02-1.43] 5.57e-05 ± 3.46e-05 | [0.07-7.04] 0.0003546 ± 1.97e-04 |
| 5 | [0.12-2.84] 0.0001260 ± 7.44e-05 | [0.03-1.36] 5.45e-05 ± 3.59e-05 | [0.02-7.06] 0.0002907 ± 1.78e-04 |
| 6 | [0.01-3.20] 0.0000984 ± 7.68e-05 | [0.01-1.22] 5.89e-05 ± 3.26e-05 | [0.29-6.57] 0.0002972 ± 1.76e-04 |
| 7 | [0.01-2.79] 0.0001256 ± 7.52e-05 | [0.01-1.38] 5.03e-05 ± 3.58e-05 | [0.09-6.30] 0.0002951 ± 1.60e-04 |
| 8 | [0.05-3.27] 0.0001299 ± 6.03e-05 | [0.01-1.21] 4.55e-05 ± 3.37e-05 | [0.11-7.10] 0.0002892 ± 1.73e-04 |
| 9 | [0.22-3.27] 0.0001268 ± 7.75e-05 | [0.05-1.24] 5.88e-05 ± 3.57e-05 | [0.04-7.15] 0.0003015 ± 2.04e-04 |
| 10 | [0.13-3.74] 0.0001389 ± 8.64e-05 | [0.03-1.61] 7.14e-05 ± 3.88e-05 | [0.02-7.07] 0.0002924 ± 2.04e-04 |
| 11 | [0.09-3.01] 0.0001431 ± 7.67e-05 | [0.00-1.38] 5.81e-05 ± 3.58e-05 | [0.11-6.43] 0.0002989 ± 2.01e-04 |
| 12 | [0.03-3.32] 0.0001446 ± 8.64e-05 | [0.00-1.46] 4.86e-05 ± 3.33e-05 | [0.20-6.71] 0.0003378 ± 1.66e-04 |
| 13 | [0.04-2.84] 0.0001339 ± 7.25e-05 | [0.00-1.34] 5.30e-05 ± 3.42e-05 | [0.04-6.69] 0.0003128 ± 1.70e-04 |
| 14 | [0.04-2.74] 0.0001288 ± 8.91e-05 | [0.03-1.43] 5.65e-05 ± 3.60e-05 | [0.02-7.03] 0.0003146 ± 1.86e-04 |
| 15 | [0.02-2.92] 0.0001284 ± 8.82e-05 | [0.02-1.67] 5.45e-05 ± 3.86e-05 | [0.05-6.46] 0.0002797 ± 1.72e-04 |

Table 4: N = 1,000,000; Samples = 50; random order.

| $q_i$ | MRL | Our Algorithm, Preallocated | Our Algorithm, Adaptive |
|---|---|---|---|
| $\lvert S \rvert$ | 27475 | 9158 | [899-939], 918.42 ± 8.71 |
| | [range ($\times 10^{-4}$)] avg ± stdev | [range ($\times 10^{-4}$)] avg ± stdev | [range ($\times 10^{-4}$)] avg ± stdev |
| Max | [2.032-2.641] 2.35e-04 ± 1.18e-05 | [0.799-0.806] 8.01e-05 ± 1.8e-07 | [7.628-8.016] 7.82e-04 ± 9.75e-06 |
| 1 | [0.026-1.466] 4.98e-05 ± 3.29e-05 | [0.002-0.712] 2.74e-05 ± 1.96e-05 | [0.187-6.123] 2.87e-04 ± 1.65e-04 |
| 2 | [0.022-1.922] 6.32e-05 ± 4.98e-05 | [0.001-0.764] 2.94e-05 ± 2.22e-05 | [0.166-6.814] 3.04e-04 ± 1.80e-04 |
| 3 | [0.019-1.750] 5.90e-05 ± 4.62e-05 | [0.002-0.656] 2.93e-05 ± 1.80e-05 | [0.008-7.040] 3.68e-04 ± 1.91e-04 |
| 4 | [0.024-1.953] 6.19e-05 ± 4.37e-05 | [0.003-0.615] 2.98e-05 ± 1.65e-05 | [0.096-7.149] 2.98e-04 ± 1.81e-04 |
| 5 | [0.022-1.892] 7.02e-05 ± 5.03e-05 | [0.011-0.722] 2.99e-05 ± 1.63e-05 | [0.111-7.297] 2.56e-04 ± 1.80e-04 |
| 6 | [0.026-1.766] 6.61e-05 ± 4.65e-05 | [0.008-0.655] 2.60e-05 ± 1.86e-05 | [0.021-6.618] 3.27e-04 ± 1.72e-04 |
| 7 | [0.038-1.987] 5.75e-05 ± 4.33e-05 | [0.025-0.688] 3.30e-05 ± 1.63e-05 | [0.009-5.620] 2.14e-04 ± 1.47e-04 |
| 8 | [0.004-1.801] 5.69e-05 ± 4.29e-05 | [0.006-0.712] 2.69e-05 ± 2.01e-05 | [0.043-7.718] 3.17e-04 ± 1.96e-04 |
| 9 | [0.012-2.252] 6.47e-05 ± 4.19e-05 | [0.003-0.675] 2.90e-05 ± 1.83e-05 | [0.116-7.167] 2.83e-04 ± 1.93e-04 |
| 10 | [0.011-1.840] 6.11e-05 ± 4.28e-05 | [0.006-0.649] 2.64e-05 ± 1.67e-05 | [0.050-7.225] 3.09e-04 ± 1.83e-04 |
| 11 | [0.010-1.640] 6.67e-05 ± 4.41e-05 | [0.005-0.727] 2.99e-05 ± 1.78e-05 | [0.231-6.606] 2.60e-04 ± 1.66e-04 |
| 12 | [0.013-1.847] 6.09e-05 ± 4.69e-05 | [0.013-0.686] 2.68e-05 ± 1.71e-05 | [0.018-6.639] 2.95e-04 ± 1.51e-04 |
| 13 | [0.005-1.747] 5.80e-05 ± 3.87e-05 | [0.015-0.680] 2.82e-05 ± 1.93e-05 | [0.014-6.518] 3.06e-04 ± 1.90e-04 |
| 14 | [0.026-1.853] 7.12e-05 ± 5.07e-05 | [0.000-0.671] 3.43e-05 ± 1.84e-05 | [0.051-7.385] 2.69e-04 ± 1.99e-04 |
| 15 | [0.022-1.510] 5.57e-05 ± 3.56e-05 | [0.019-0.775] 2.91e-05 ± 1.83e-05 | [0.029-6.415] 2.74e-04 ± 1.80e-04 |

Table 5: N = 10,000,000; Samples = 50; random order.
