Prediction of 0-1-events for short- and long-memory time series


Jan Beran

Abstract. The problem of predicting 0-1-events is considered under general conditions, including stationary processes with short and long memory as well as processes with changing distribution patterns. Nonparametric estimates of the probability function and prediction intervals are obtained.

Keywords: 0-1-events, long-range dependence, short-range dependence, antipersistence, kernel smoothing, bandwidth, prediction

1. The general problem

In time series applications, the main concern is sometimes to predict whether a certain event will occur or not. For instance, in finance, a decision may be based on the probability that a stock price stays within certain bounds; in meteorology, we may want to know whether certain disastrous weather conditions are likely to occur or not, etc. This motivates the following problem: Let $X_t$ ($t \in N$) be a stochastic process on a probability space $(\Omega, \mathcal{A}, P)$ where $\Omega \subset R^N$ is a subspace of real valued functions on $N$ and $\mathcal{A}$ is a suitable $\sigma$-algebra. For a fixed $k \in N_+$ and $\tau_1, \ldots, \tau_k \in N$, let $A_{\tau_1,\ldots,\tau_k}(i) \in \mathcal{A}$ be such that

$A_{\tau_1,\ldots,\tau_k}(i) = \{\omega : (X_{i+\tau_1}, X_{i+\tau_2}, \ldots, X_{i+\tau_k}) \in B\}$

for some $B \subset R^k$, and let

$p = p(i; \tau_1, \ldots, \tau_k) = P(A_{\tau_1,\ldots,\tau_k}(i))$

be the probability of this event. The general question is now: Given observations $X_1, \ldots, X_n$, how can we estimate $p$ without making too strong assumptions on the unknown underlying probability distribution $P$?

2. Specific assumptions

The following assumptions will be used: Let

(1)   $Y_i = 1\{A_{\tau_1,\ldots,\tau_k}(i)\}$

and

(2)   $p = p(t_i) = P(Y_i = 1)$.

The process $Y_i$ is assumed to have the following properties:

(A1) The process

(3)   $Z_i = \dfrac{Y_i - p(t_i)}{\sqrt{p(t_i)(1 - p(t_i))}}$

is a second order stationary process with autocovariances $\gamma(k)$ and spectral density $f(\lambda) = (2\pi)^{-1} \sum_{k=-\infty}^{\infty} \exp(ik\lambda)\gamma(k)$.

(A2) The spectral density is continuous in $[-\pi, 0) \cup (0, \pi]$ and at the origin we have

(4)   $f(\lambda) \sim c_f |\lambda|^{-2d}$   $(|\lambda| \to 0)$

for a constant $c_f > 0$ and $d \in (-\frac{1}{2}, \frac{1}{2})$, where "$\sim$" means that the ratio of the left and right hand side converges to one.

(A3) $p \in C^2[0,1]$.

(A4) $\sup_{0<x<1} \max_{j=0,1,2} |p^{(j)}(x)| \le C_1 < \infty$, where $p^{(j)}$ denotes the j'th derivative of $p$.

(A5) $|p''(x) - p''(y)| \le C_2 |x - y|^{\beta}$ for all $x, y \in [0,1]$, constants $C_1, C_2 < \infty$, and some $\beta \in (2, 3]$.

(A6) For a given $\Delta \in (0, \frac{1}{2})$, $\sup_{t \in [\Delta, 1-\Delta]} |p^{(l+1)}(t)| > 0$ for at least one $l \in \{0, 1\}$ and $p^{(l)}$ achieves an absolute maximum or minimum in $[\Delta, 1-\Delta]$.

Remarks:

1. Since $Y_i$ is a 0-1-process, we have $var(Y_i) = p(t_i)(1 - p(t_i))$, so that $Y_i - p(t_i)$ cannot be stationary unless $p$ is constant. Therefore, the standardized process $Z_i$ is considered.

2. $Z_i$ can be second order stationary even if neither $X_i$ nor $X_i - E(X_i)$ are stationary. For instance, let $X_i$ be independent with a fixed quantile $q_\alpha$ but arbitrary distributions $F_i$ that differ, for instance, in their variance. Then $X_i$ is not second order stationary, in contrast to the 0-1-process $Y_i = 1\{A_{\tau_1}(i)\}$ with $A_{\tau_1}(i) = \{X_{i+\tau_1} > q_\alpha\}$.

3. Three cases can be distinguished (see e.g. Beran 1994 and references therein):

(a) Short memory: $d = 0$; $f$ is continuous in the whole interval $[-\pi, \pi]$ and $0 \ne \sum \gamma(k) < \infty$;

(b) Long memory: $d > 0$; $f$ is infinite at zero and $\sum \gamma(k) = \infty$;

(c) Antipersistence: $d < 0$; $f(0) = 0$ and $\sum \gamma(k) = 0$.

4. The assumptions include in particular the special case where the original process itself consists of 0-1-variables, i.e. where $Y_i = X_i$.

3. Estimation of p

Under the assumptions given above, the estimation problem consists of estimating a smooth function $p(t)$, where $0 \le p \le 1$. If the distribution of the process $X_i$ is known, except for a finite dimensional parameter vector $\theta$, then the optimal method is to estimate $\theta$ from the original observations $X_i$ (for instance by maximum likelihood) and set $\hat p(t) = p(t; \hat\theta)$. Here, we address the problem of estimating $p$ when only the assumptions given in the previous section are known. Note that these are assumptions on the process $Y_i$; no knowledge about the distribution of the original process $X_i$ is needed. Thus, we consider estimation of $p(t)$ ($t \in [0,1]$) where $p(t_i) = E(Y_i)$, $Y_i \in \{0, 1\}$ and

(5)   $Y_i = p(t_i) + \sqrt{p(t_i)(1 - p(t_i))}\, Z_i$

where $Z_i$ is a stationary zero mean process as defined in (A1). We will consider kernel estimation of $p$: Let $K : [-1, 1] \to R_+$ be a positive symmetric function with support $[-1, 1]$ and $b > 0$ a bandwidth. Then we define

(6)   $\hat p(t; b) = \dfrac{1}{nb} \sum_{i=1}^{n} K\!\left(\dfrac{t - t_i}{b}\right) Y_i$.
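The estimator (6) can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes equally spaced design points $t_i = i/n$ (as suggested by the notation $p(i/n)$ in the figure captions), and the names `box_kernel` and `kernel_estimate` are chosen here for illustration only.

```python
import numpy as np

def box_kernel(u):
    """Box kernel K(u) = 1/2 on [-1, 1] and zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.5, 0.0)

def kernel_estimate(y, t, b, kernel=box_kernel):
    """Evaluate p_hat(t; b) = (nb)^{-1} sum_i K((t - t_i)/b) y_i.

    Assumption: the design points are t_i = i/n, i = 1, ..., n.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    t_i = np.arange(1, n + 1) / n
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # One row of kernel weights per evaluation point t.
    weights = kernel((t[:, None] - t_i[None, :]) / b)
    return weights @ y / (n * b)
```

For example, `kernel_estimate(y, 0.5, 0.1)` returns a one-element array with the estimate at $t = 0.5$. Close to the boundaries $t = 0$ and $t = 1$ part of the kernel window falls outside the data, so the sketch is only meaningful for interior points.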

The general problem of estimating a smooth function from data of the form

(7)   $Y_i = \mu(t_i) + Z_i$

has been considered by various authors for the case where the error process $Z_i$ is stationary with (i) short-range dependence (see e.g. Chiu, 1989; Altman, 1990; Hall and Hart, 1990; Herrmann, Gasser and Kneip, 1992) or (ii) long-range dependence, i.e. $0 < \alpha < 1$ (see e.g. Hall and Hart, 1990; Csörgő and Mielniczuk, 1995; Ray and Tsay, 1997) or (iii) antipersistence (Beran and Feng 2002a). Beran and Feng (2002a,b,c) consider the more general case where it is not known a priori whether $Z_i$ is stationary (including antipersistence as well as short- and long-range dependence) or nonstationary.

The essential question to be solved is how to choose the bandwidth $b$ optimally. Note that, in contrast to the usual setup, for 0-1-processes the variance of the error process is related to the mean function and the mean function is bounded from below and above. One may therefore either estimate $p$ itself, under the constraint $0 \le p \le 1$, or one may instead estimate a suitable transformation of $p$. Obvious transformations are, for instance, the logistic transformation $g(p) = \log[p/(1-p)]$ or the variance stabilizing transformation $g(p) = \arcsin\sqrt{p}$. Asymptotically, the choice of $g$ does not influence bandwidth selection, if the criterion is the mean squared error. This follows from standard arguments: Assume that $\hat p(t; b_n)$ ($n \in N$) is a (weakly) consistent sequence of estimates of $p(t)$. Then $g(\hat p) = g(p) + g'(p)(\hat p - p) + o_p(\hat p - p)$ so that (under suitable regularity conditions on the sequence $\hat p$) we have $MSE(g(\hat p)) = E\{[g(\hat p) - g(p)]^2\} = [g'(p)]^2 MSE(\hat p) + r$ where $r$ is of smaller order than $MSE(\hat p)$. Since $g'(p)$ is a constant, independent of $n$ and $b_n$, minimizing $MSE(g(\hat p))$ is asymptotically equivalent to minimizing $MSE(\hat p)$. In the following, we thus use the mean squared error of $\hat p$ as a criterion for choosing $b$.
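The approximation $MSE(g(\hat p)) \approx [g'(p)]^2 MSE(\hat p)$ can be checked numerically in a simplified setting. The sketch below is only an illustration under a strong assumption that is not part of the paper: $\hat p$ is replaced by the sample proportion of $n$ independent Bernoulli($p$) variables, so that both mean squared errors can be computed exactly from the binomial distribution. The arcsine transformation is used because the logistic transformation is infinite at $\hat p = 0$ and $\hat p = 1$. All function names are hypothetical.

```python
import math

def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k (stable for large n)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def mse_ratio(n, p, g, g_prime):
    """Exact MSE(g(p_hat)) divided by [g'(p)]^2 MSE(p_hat).

    Assumption: p_hat = S/n with S ~ Binomial(n, p), i.e. independent data.
    """
    mse_g = 0.0
    for k in range(n + 1):
        prob = math.exp(binom_logpmf(k, n, p))
        mse_g += prob * (g(k / n) - g(p)) ** 2
    mse_p = p * (1.0 - p) / n
    return mse_g / (g_prime(p) ** 2 * mse_p)

def arcsin_sqrt(p):
    """Variance stabilizing transformation g(p) = arcsin(sqrt(p))."""
    return math.asin(math.sqrt(p))

def arcsin_sqrt_prime(p):
    """Derivative g'(p) = 1 / (2 sqrt(p (1 - p)))."""
    return 1.0 / (2.0 * math.sqrt(p * (1.0 - p)))
```

Calling `mse_ratio(n, 0.3, arcsin_sqrt, arcsin_sqrt_prime)` for increasing `n` gives ratios that approach one, in line with the remainder $r$ being of smaller order than $MSE(\hat p)$.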

4. Asymptotically optimal bandwidth choice

In this section, asymptotic expressions for the mean squared error and the asymptotically optimal bandwidth are given. Using the notations $I(p'') = \int_{\Delta}^{1-\Delta} [p''(t)]^2\, dt$ and $I(K) = \int_{-1}^{1} x^2 K(x)\, dx$, the following results can be derived in a similar way as in Beran and Feng (2002a) by taking into account the heteroskedasticity factor proportional to $w(t) = p(t)(1 - p(t))$.

Theorem 1. Let $b_n > 0$ be a sequence of bandwidths such that $b_n \to 0$ and $nb_n \to \infty$. Then we have

(i) Bias:

(8)   $E[\hat p(t) - p(t)] = b_n^2\, \dfrac{p''(t) I(K)}{2} + o(b_n^2)$

uniformly in $\Delta < t < 1 - \Delta$;

(ii) Variance:

(9)   $(nb_n)^{1-2d}\, var(\hat p(t)) = w(t) V(\theta) + o(1)$

uniformly in $\Delta < t < 1 - \Delta$, where $0 < V(\theta) < \infty$ is a constant;

(iii) IMSE: The integrated mean squared error in $[\Delta, 1 - \Delta]$ is given by

$\int_{\Delta}^{1-\Delta} E\{[\hat p(t) - p(t)]^2\}\, dt = IMSE_{asympt}(n, b_n) + o(\max(b_n^4, (nb_n)^{2d-1}))$

(10)   $= b_n^4\, \dfrac{I(p'') I^2(K)}{4} + (nb_n)^{2d-1} V(\theta) \int_{\Delta}^{1-\Delta} w(t)\, dt + o(\max(b_n^4, (nb_n)^{2d-1}))$;

(iv) Optimal bandwidth: The bandwidth that minimizes the asymptotic IMSE is given by

(11)   $b_{opt} = C_{opt}\, n^{(2d-1)/(5-2d)}$

where

(12)   $C_{opt} = C_{opt}(\theta) = \left[ \dfrac{(1 - 2d) V(\theta) \int_{\Delta}^{1-\Delta} w(t)\, dt}{I(p'') I^2(K)} \right]^{1/(5-2d)}$.
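To make the trade-off between the two leading terms in (10) concrete, the following sketch evaluates the asymptotic IMSE and the closed-form minimiser (11)-(12). It is an illustration only: the constants $V(\theta)$, $W = \int w(t)\,dt$, $I(p'')$ and $I(K)$ are treated as known inputs, whereas in practice they have to be estimated, and the function names are hypothetical.

```python
def imse_asymptotic(b, n, d, V, W, I_p2, I_K):
    """Leading terms of (10): squared-bias part plus variance part.

    V, W, I_p2, I_K stand for V(theta), the integral of w(t), I(p'') and I(K).
    """
    bias_part = b ** 4 * I_p2 * I_K ** 2 / 4.0
    variance_part = (n * b) ** (2.0 * d - 1.0) * V * W
    return bias_part + variance_part

def optimal_bandwidth(n, d, V, W, I_p2, I_K):
    """Closed-form minimiser b_opt = C_opt * n^((2d-1)/(5-2d)) from (11)-(12)."""
    c_opt = ((1.0 - 2.0 * d) * V * W / (I_p2 * I_K ** 2)) ** (1.0 / (5.0 - 2.0 * d))
    return c_opt * n ** ((2.0 * d - 1.0) / (5.0 - 2.0 * d))
```

For $d = 0$ the rate reduces to the familiar $n^{-1/5}$; for $d > 0$ the exponent $(2d-1)/(5-2d)$ is closer to zero, so the optimal bandwidth shrinks more slowly under long memory.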

Similar results can be obtained for kernel estimates of derivatives of $p$. For instance, the second derivative can be estimated by $\hat p''(t) = n^{-1} b^{-3} \sum K((t_j - t)/b) Y_j$ where $K$ is a symmetric kernel such that $\int K(x)\, dx = 0$ and $\int K(x) x^2\, dx = 2$. The optimal bandwidth for estimating the second derivative is of the order $O(n^{(2d-1)/(9-2d)})$. The asymptotic expression $V(\theta)$ can be given explicitly for $d = 0$ and $d > 0$:

(13)   $V(\theta) = 2\pi c_f \int_{-1}^{1} K^2(x)\, dx$,   $(d = 0)$;

(14)   $V(\theta) = 2 c_f\, \Gamma(1 - 2d) \sin(\pi d) \int_{-1}^{1} \int_{-1}^{1} K(x) K(y) |x - y|^{2d-1}\, dx\, dy$,   $(d > 0)$.

For $d < 0$, a general simple formula for $V$ does not seem to be available, except in special cases. For the box-kernel, we obtain (see Beran and Feng 2002a)

Corollary 1. Let $K(x) = \frac{1}{2} 1\{x \in [-1, 1]\}$. Define

(15)   $\nu(d) = \dfrac{2^{2d}\, \Gamma(1 - 2d) \sin(\pi d)}{d(2d + 1)}$

with $\nu(0) = \lim_{d \to 0} \nu(d) = \pi$. Then, under the assumptions of Theorem 1, we have

(i) Bias:

(16)   $E[\hat p(t) - p(t)] = b_n^2\, \dfrac{p''(t)}{6} + o(b_n^2)$;

(ii) Variance:

(17)   $var(\hat p(t)) = (nb_n)^{2d-1} \nu(d) c_f w(t) + o((nb_n)^{2d-1})$;

(iii) IMSE:

(18)   $\int_{\Delta}^{1-\Delta} E\{[\hat p(t) - p(t)]^2\}\, dt = b_n^4\, \dfrac{I(p'')}{36} + (nb_n)^{2d-1} \nu(d) c_f W + o(\max(b_n^4, (nb_n)^{2d-1}))$

where $W = \int_{\Delta}^{1-\Delta} w(t)\, dt$;

(iv) Optimal bandwidth:

(19)   $b_{opt} = C_{opt}\, n^{(2d-1)/(5-2d)}$

with

(20)   $C_{opt} = \left[ \dfrac{9 (1 - 2d) \nu(d) c_f W}{I(p'')} \right]^{1/(5-2d)}$.
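As a check on how the box-kernel constant in (15) relates to the general variance formula (14), the double integral can be evaluated in closed form. The derivation below is a sketch for $0 < d < 1/2$; it writes $\nu(d)$ for the constant defined in (15) and uses the substitution $u = |x - y|$, which is introduced here for illustration.

```latex
\begin{align*}
\int_{-1}^{1}\int_{-1}^{1} |x-y|^{2d-1}\,dx\,dy
  &= \int_{0}^{2} 2(2-u)\,u^{2d-1}\,du
   = 2\left[\frac{2\cdot 2^{2d}}{2d} - \frac{2^{2d+1}}{2d+1}\right]
   = \frac{2^{2d+1}}{d(2d+1)}, \\
V(\theta)
  &= 2c_f\,\Gamma(1-2d)\sin(\pi d)\cdot\frac{1}{4}\cdot\frac{2^{2d+1}}{d(2d+1)}
   = \frac{2^{2d}\,\Gamma(1-2d)\sin(\pi d)}{d(2d+1)}\,c_f
   = \nu(d)\,c_f .
\end{align*}
```

The factor $1/4$ comes from $K(x)K(y) = 1/4$ on $[-1,1]^2$. In the same way, $I(K) = \int_{-1}^{1} x^2/2\, dx = 1/3$ turns the bias constant $I(K)/2$ of (8) into $1/6$ in (16) and $I^2(K)/4$ of (10) into $1/36$ in (18).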

5. Data driven bandwidth choice

An iterative algorithm for choosing the bandwidth for a model with a smooth trend function and a stationary or nonstationary error process $Z_i$ is defined in Beran and Feng (2002a,b). The error process is modelled by a (possibly integrated) Gaussian fractional ARIMA process (Granger and Joyeux 1980, Hosking 1981). Beran and Feng prove convergence of the algorithm and provide finite sample modifications to improve its performance for short series. Convergence of the algorithm relies on consistency of the estimate of the spectral distribution $f$. For a 0-1-process $Y_i$, the spectral distribution function can be estimated consistently by [...] identical with the spectral density of a fractional ARIMA process. We thus make the following additional assumption (A7):

(21)   $f(\lambda) = \dfrac{\sigma_\varepsilon^2}{2\pi} \left| \dfrac{\psi(e^{i\lambda})}{\phi(e^{i\lambda})} \right|^2 |1 - e^{i\lambda}|^{-2d}$

for some $-\frac{1}{2} < d < \frac{1}{2}$. Here, $\psi(x)$ and $\phi(x)$ are polynomials of finite orders $m_1$ and $m_2$ respectively with roots outside the unit circle.

A suitable modification of the algorithm in Beran and Feng (2002a,b,c) can now be defined. (Note that in Beran and Feng (2002c), $m_1$ is set equal to zero.) The main steps of the algorithm are as follows:

Algorithm:

Step 1: Set $j = 1$; define a maximal autoregressive order $M$ and an initial bandwidth $b_0$; and carry out Steps 2 to 5 for each $m_2 \in \{0, 1, \ldots, M\}$.

Step 2: Estimate $p$ and $Z_i$ using the bandwidth $b_{j-1}$.

Step 3: Estimate $f$ by maximum likelihood (Beran 1995).

Step 4: Given the estimated spectral density $f$, calculate a new optimal bandwidth $b_j$ and a new optimal bandwidth for estimating $p''$; set $j = j + 1$.

Step 5: Stop, if the change in the bandwidth does not exceed a certain bound. Otherwise go to Step 2.

Step 6: Select the solution that minimizes a consistent model choice criterion such as the BIC (Schwarz 1978, Beran et al. 1998).
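The inner loop of the algorithm (Steps 2 to 5 for one fixed autoregressive order) can be sketched as a generic fixed-point iteration. This is a skeleton under stated assumptions, not the algorithm of Beran and Feng: the smoother, the maximum likelihood fit of the spectral density and the plug-in bandwidth formula are passed in as user-supplied callables (`smooth`, `fit_spectrum` and `plug_in_bandwidth` are hypothetical names), the relative stopping rule and the clipping of $\hat p$ are choices made here, and the second bandwidth for $p''$ as well as Steps 1 and 6 are left out.

```python
import numpy as np

def standardise(y, p_hat, eps=1e-6):
    """Residuals Z_i = (Y_i - p_hat_i) / sqrt(p_hat_i (1 - p_hat_i)), cf. (3).

    Clipping p_hat away from 0 and 1 is an added safeguard, not part of the paper.
    """
    p = np.clip(np.asarray(p_hat, dtype=float), eps, 1.0 - eps)
    return (np.asarray(y, dtype=float) - p) / np.sqrt(p * (1.0 - p))

def iterate_bandwidth(y, b0, smooth, fit_spectrum, plug_in_bandwidth,
                      tol=1e-3, max_iter=50):
    """Iterate Steps 2-5 until the relative bandwidth change is at most tol.

    smooth(y, b)                      -> estimate of p at the design points
    fit_spectrum(z)                   -> fitted spectral model for the residuals
    plug_in_bandwidth(model, p_hat, b) -> new bandwidth
    Returns the final bandwidth and the number of iterations used.
    """
    b = b0
    for j in range(1, max_iter + 1):
        p_hat = smooth(y, b)                          # Step 2: estimate p
        z_hat = standardise(y, p_hat)                 # Step 2: estimate Z_i
        model = fit_spectrum(z_hat)                   # Step 3: estimate f
        b_new = plug_in_bandwidth(model, p_hat, b)    # Step 4: new bandwidth
        if abs(b_new - b) <= tol * b:                 # Step 5: stopping rule
            return b_new, j
        b = b_new
    return b, max_iter
```

In a full implementation this loop would be run once for each order $m_2 \in \{0, \ldots, M\}$ and the resulting fits compared by a criterion such as the BIC.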

6. Testing and Prediction

An approximate pointwise test for testing the null hypothesis $H_0 : p(t) \equiv const$ can be obtained by defining the rejection region $|\hat p(t)| > c_{\alpha/2}\, \hat v$, where $\hat v$ is equal to the square root of $(nb_n)^{2\hat d - 1} \nu(\hat d)\, \hat c_f\, \hat p(t)(1 - \hat p(t))$ and $c_{\alpha/2}$ is the $(1 - \frac{\alpha}{2})$ quantile of the standard normal distribution. Note that, alternatively, we may test the null hypothesis $H_0 : p'(t) \equiv 0$.
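The critical bound $c_{\alpha/2}\hat v$ is straightforward to compute once $\hat d$, $\hat c_f$ and $\hat p(t)$ are available. The sketch below assumes the box kernel, so that the constant is $\nu(d) = 2^{2d}\Gamma(1-2d)\sin(\pi d)/(d(2d+1))$ with $\nu(0) = \pi$ as in Corollary 1; the estimates are treated as given inputs and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def nu(d):
    """Box-kernel variance constant of Corollary 1, with the limit pi at d = 0."""
    if abs(d) < 1e-8:
        return math.pi
    return (2.0 ** (2.0 * d) * math.gamma(1.0 - 2.0 * d) * math.sin(math.pi * d)
            / (d * (2.0 * d + 1.0)))

def critical_bound(p_hat, n, b, d_hat, c_f_hat, alpha=0.05):
    """Return c_{alpha/2} * v_hat for the pointwise test.

    v_hat^2 = (n b)^(2 d_hat - 1) * nu(d_hat) * c_f_hat * p_hat * (1 - p_hat).
    """
    v_hat = math.sqrt((n * b) ** (2.0 * d_hat - 1.0) * nu(d_hat) * c_f_hat
                      * p_hat * (1.0 - p_hat))
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return c * v_hat
```

As a sanity check, for uncorrelated $Z_i$ with unit variance one has $d = 0$ and $c_f = 1/(2\pi)$, and $\hat v^2$ reduces to $\hat p(1-\hat p)/(2nb)$, the binomial variance of an average over the $2nb$ observations in the kernel window.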

Prediction of $Y_{n+k}$ from $Y_1, \ldots, Y_n$ reduces to predicting the success probability $p(1 + k/n)$. Beran and Ocker (1999) propose a prediction method in the context of general nonparametric trend functions that is based on Taylor expansion and optimal linear prediction of the stochastic component. This approach can, in principle, be carried over to forecasts of $p(1 + k/n)$. Note, however, that this may lead to values outside of the interval $[0, 1]$. As an alternative, one may extrapolate a suitable transformation of $p$. More specifically, let $g : [0, 1] \to R$ be a one-to-one monotonic function such that $\lim_{x \to 1} g(x) = \infty$ and $\lim_{x \to 0} g(x) = -\infty$. Then $g(p(1 + k/n))$ may be approximated by $\tilde g(1) + \tilde g'(1) \frac{k}{n}$, where $\tilde g(t) = g(p(t))$. The predicted value of $p$ is $\hat p(1 + k/n) = g^{-1}[\tilde g(1) + \tilde g'(1) \frac{k}{n}]$. In the context of quantile estimation for certain long-memory processes, this approach is used, for instance, in Ghosh and Draghicescu (2001).
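The extrapolation on a transformed scale can be illustrated with the logistic transformation. The sketch assumes that estimates of $p(1)$ and of the derivative $p'(1)$ at the right end point are available (how these are obtained is not covered here), and it interprets the linear approximation as a first-order Taylor expansion of $t \mapsto g(p(t))$ at $t = 1$, whose slope is $p'(1)/[p(1)(1 - p(1))]$ by the chain rule. Function and argument names are hypothetical.

```python
import math

def logit(p):
    """Logistic transformation g(p) = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of the logistic transformation."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_probability(p_end, dp_end, k, n):
    """Predict p(1 + k/n) by linear extrapolation of logit(p(t)) from t = 1.

    p_end  : estimate of p(1), strictly between 0 and 1
    dp_end : estimate of the derivative p'(1)
    """
    slope = dp_end / (p_end * (1.0 - p_end))   # d/dt logit(p(t)) at t = 1
    return inv_logit(logit(p_end) + slope * k / n)
```

With `p_end = 0.9`, `dp_end = 2.0`, `k = 100` and `n = 1000`, a linear extrapolation of $p$ itself would give $0.9 + 2.0 \cdot 0.1 = 1.1$, outside $[0, 1]$, whereas the transformed forecast stays strictly below one.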

7. Data examples

The method introduced here can be used to explore various linear and nonlinear properties of time series. This is illustrated by the following application to daily values of three stock market indices between January 1, 1992 and November 10, 1995. The indices are: FTSE 100 (Figure 1), CAC (Figure 2) and the Swiss Bank Corporation Index (Figure 3). We consider the event

(22)   $A(i) = \{\omega : X_{i+20} > X_i \text{ and } \min_{s=1,\ldots,20} X_{i+s} > 0.9 X_i\}$.

The event $A(i)$ means that in one month (20 workdays) the index will be higher than the initial value $X_i$ and during this one-month period it never drops below 90% of $X_i$. The estimated probability functions $p(t_i) = P(A(i))$ are displayed in Figures 1c, 2c and 3c respectively. The shaded areas correspond to time stretches where $p$ is significantly different from a constant (fully shaded area for "high level" and area shaded with lines for "low level"). Although the critical limits to test for non-constant $p(t)$ are pointwise limits only, the similar patterns for all three series support the conjecture that $p$ is not constant. In particular, there is a period (around observation 400) where $p$ is considerably higher for all three series.
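The 0-1-series $Y_i = 1\{A(i)\}$ for the event (22) can be computed directly from an index series. The following is a minimal sketch; the function name and the default arguments `horizon=20` and `floor=0.9` mirror the constants in (22) but are otherwise illustrative, and no claim is made that the original analysis was coded this way.

```python
import numpy as np

def event_indicator(x, horizon=20, floor=0.9):
    """Return Y_i = 1{X_{i+horizon} > X_i and min_{s=1..horizon} X_{i+s} > floor * X_i}.

    The last `horizon` observations have no complete future window,
    so the output has len(x) - horizon entries.
    """
    x = np.asarray(x, dtype=float)
    n = x.size - horizon
    y = np.zeros(n, dtype=int)
    for i in range(n):
        future = x[i + 1 : i + horizon + 1]    # X_{i+1}, ..., X_{i+horizon}
        higher_at_end = future[-1] > x[i]
        never_below_floor = future.min() > floor * x[i]
        y[i] = int(higher_at_end and never_below_floor)
    return y
```

The resulting series `y` is the input to the kernel estimate of $p$; note that consecutive windows overlap, so the $Y_i$ are dependent even if the increments of the index are not.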


Acknowledgment

This research was supported in part by the Center of Finance and Econometrics, University of Konstanz, Germany. I would also like to thank Dr. Dirk Ocker (Swiss Union of Raiffeisenbanks) and Dr. Elke M. Hennig (Citibank, Frankfurt) for providing me with the data. Thanks go also to a referee for comments that helped to improve the presentation of the results.

References

Altman, N.S. (1990). Kernel smoothing of data with correlated errors. J. Am. Statist. Assoc. 85, 749-759.

Beran, J. (1994). Statistics for long-memory processes. Chapman and Hall, New York.

Beran, J. (1995). Maximum likelihood estimation of the differencing parameter for invertible short- and long-memory ARIMA models. J. Roy. Statist. Soc. B 57, No. 4, 659-672.

Beran, J., Bhansali, R.J. and Ocker, D. (1998). On unified model selection for stationary and nonstationary short- and long-memory autoregressive processes. Biometrika 85, No. 4, 921-934.

Beran, J. and Feng, Y. (2002a). SEMIFAR models - a semiparametric framework for modelling trends, long-range dependence and nonstationarity. Computational Statistics & Data Analysis (in press).

Beran, J. and Feng, Y. (2002b). Data driven bandwidth choice for SEMIFAR models. Journal of Computational and Graphical Statistics (in press).

Beran, J. and Feng, Y. (2002c). Local polynomial fitting with long memory, short memory and antipersistent errors. Annals of Statistical Mathematics (in press).

Beran, J. and Ocker, D. (1999). SEMIFAR forecasts, with applications to foreign exchange rates. Journal of Statistical Planning and Inference 80, 137-153.

Chiu, S.T. (1989). Bandwidth selection for kernel estimates with correlated noise. Statist. Probab. Lett. 8, 347-354.

Csörgő, S. and Mielniczuk, J. (1995). Nonparametric regression under long-range dependent normal errors. Ann. Statist. 23, 1000-1014.

Ghosh, S. and Draghicescu, D. (2001). Predicting the distribution function for long-memory processes. International Journal of Forecasting (in press).

Granger, C.W.J. and Joyeux, R. (1980). An introduction to long-range time series models and fractional differencing. J. Time Series Anal. 1, 15-30.

Hall, P. and Hart, J. (1990). Nonparametric regression with long-range dependence. Stoch. Processes Appl. 36, 339-351.

Herrmann, E., Gasser, T. and Kneip, A. (1992). Choice of bandwidth for kernel regression when residuals are correlated. Biometrika 79, 783-795.

Ray, B.K. and Tsay, R.S. (1997). Bandwidth selection for kernel regression with long-range dependence. Biometrika 84, 791-802.

Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461-464.

Department of Mathematics and Statistics, University of Konstanz, Universitätsstr. 10, Postfach 5560, 78457 Konstanz, Germany

[Figure 1: three panels plotted against day (0 to 800). Panel a: FTSE 100, original series x(t). Panel b: FTSE 100, differenced series x(t)-x(t-1). Panel c: FTSE 100, fitted p(t), vertical axis from 0 to 1.]

Figure 1. FTSE 100 between January 1, 1992 and November 10, 1995: original series (Figure 1a), differenced series (Figure 1b) and estimated probability function $p(i/n) = P(A(i))$ where $A(i) = \{\omega : X_{i+20} > X_i \text{ and } \min_{s=1,\ldots,20} X_{i+s} > 0.9 X_i\}$. Periods with significant departures from $H_0 : p \equiv const$ are shaded with lines (for $p$ below critical bound) and fully shaded (for $p$ above critical bound) respectively.

[Figure 2: three panels plotted against day (0 to 800). Panel a: CAC, original series x(t). Panel b: CAC, differenced series x(t)-x(t-1). Panel c: CAC, fitted p(t), vertical axis from 0 to 1.]

Figure 2. CAC between January 1, 1992 and November 10, 1995: original series (Figure 2a), differenced series (Figure 2b) and estimated probability function $p(i/n) = P(A(i))$ where $A(i) = \{\omega : X_{i+20} > X_i \text{ and } \min_{s=1,\ldots,20} X_{i+s} > 0.9 X_i\}$. Periods with significant departures from $H_0 : p \equiv const$ are shaded with lines (for $p$ below critical bound) and fully shaded (for $p$ above critical bound) respectively.

[Figure 3: three panels plotted against day (0 to 800). Panel a: SBC, original series x(t). Panel b: SBC, differenced series x(t)-x(t-1). Panel c: SBC, fitted p(t), vertical axis from 0 to 1.]

Figure 3. SBC between January 1, 1992 and November 10, 1995: original series (Figure 3a), differenced series (Figure 3b) and estimated probability function $p(i/n) = P(A(i))$ where $A(i) = \{\omega : X_{i+20} > X_i \text{ and } \min_{s=1,\ldots,20} X_{i+s} > 0.9 X_i\}$. Periods with significant departures from $H_0 : p \equiv const$ are shaded with lines (for $p$ below critical bound) and fully shaded (for $p$ above critical bound) respectively.