Jan Beran
Abstract. The problem of predicting 0-1-events is considered under general conditions, including stationary processes with short and long memory as well as processes with changing distribution patterns. Nonparametric estimates of the probability function and prediction intervals are obtained.
Keywords: 0-1-events, long-range dependence, short-range dependence, antipersistence, kernel smoothing, bandwidth, prediction
1. The general problem
In time series applications, the main concern is sometimes to predict whether a certain event will occur or not. For instance, in finance, a decision may be based on the probability that a stock price stays within certain bounds; in meteorology, we may want to know whether certain disastrous weather conditions are likely to occur. This motivates the following problem: Let $X_t$ ($t \in \mathbb{N}$) be a stochastic process on a probability space $(\Omega, \mathcal{A}, P)$ where $\Omega \subset \mathbb{R}^{\mathbb{N}}$ is a subspace of real valued functions on $\mathbb{N}$ and $\mathcal{A}$ is a suitable $\sigma$-algebra. For a fixed $k \in \mathbb{N}_+$ and $\tau_1, \ldots, \tau_k \in \mathbb{N}$, let $A_{\tau_1,\ldots,\tau_k}(i) \in \mathcal{A}$ be such that
$A_{\tau_1,\ldots,\tau_k}(i) = \{\omega : (X_{i+\tau_1}, X_{i+\tau_2}, \ldots, X_{i+\tau_k}) \in B\}$
for some $B \subset \mathbb{R}^k$, and let
$p = p(i; \tau_1, \ldots, \tau_k) = P(A_{\tau_1,\ldots,\tau_k}(i))$
be the probability of this event. The general question is now: Given observations $X_1, \ldots, X_n$, how can we estimate $p$ without making too strong assumptions on the unknown underlying probability distribution $P$?
2. Specific assumptions
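The following assumptions will be used: Let
(1) $Y_i = 1\{A_{\tau_1,\ldots,\tau_k}(i)\}$
and
(2) $p = p(t_i) = P(Y_i = 1)$.
The process $Y_i$ is assumed to have the following properties:
(A1) The standardized process
(3) $Z_i = \dfrac{Y_i - p(t_i)}{\sqrt{p(t_i)(1 - p(t_i))}}$
is a second order stationary process with autocovariances $\gamma(k)$ and spectral density $f(\lambda) = (2\pi)^{-1} \sum_{k=-\infty}^{\infty} \exp(ik\lambda)\gamma(k)$.
(A2) The spectral density is continuous in $[-\pi, 0) \cup (0, \pi]$ and at the origin we have
(4) $f(\lambda) \sim c_f |\lambda|^{-2d}$ $(|\lambda| \to 0)$
for a constant $c_f > 0$ and $d \in (-\frac{1}{2}, \frac{1}{2})$, where "$\sim$" means that the ratio of the left and right hand side converges to one.
(A3) $p \in C^2[0,1]$.
(A4) $\sup_{0<x<1} \max_{j=0,1,2} |p^{(j)}(x)| \le C_1 < \infty$, where $p^{(j)}$ denotes the $j$'th derivative of $p$.
(A5) $|p''(x) - p''(y)| \le C_2 |x - y|^{\beta}$ for all $x, y \in [0,1]$, constants $C_1, C_2 < \infty$, and some $\beta \in (2, 3]$.
(A6) For a given $\Delta \in (0, \frac{1}{2})$, $\sup_{t \in [\Delta, 1-\Delta]} |p^{(l+1)}(t)| > 0$ for at least one $l \in \{0, 1\}$, and $p^{(l)}$ achieves an absolute maximum or minimum in $[\Delta, 1-\Delta]$.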
Remarks:
1. Since $Y_i$ is a 0-1-process, we have $\mathrm{var}(Y_i) = p(t_i)(1 - p(t_i))$, so that $Y_i - p(t_i)$ cannot be stationary unless $p$ is constant. Therefore, the standardized process $Z_i$ is considered.
2. $Z_i$ can be second order stationary even if neither $X_i$ nor $X_i - E(X_i)$ is stationary. For instance, let the $X_i$ be independent with a fixed quantile $q_\alpha$ but arbitrary distributions $F_i$ that differ, for instance, in their variance. Then $X_i$ is not second order stationary, in contrast to the 0-1-process $Y_i = 1\{A_{\tau_1}(i)\}$ with $A_{\tau_1}(i) = \{X_{i+\tau_1} > q_\alpha\}$.
3. Three cases can be distinguished (see e.g. Beran 1994 and references therein):
(a) Short memory: $d = 0$, $f$ is continuous in the whole interval $[-\pi, \pi]$ and $0 \ne \sum \gamma(k) < \infty$;
(b) Long memory: $d > 0$, $f$ is infinite at zero and $\sum \gamma(k) = \infty$;
(c) Antipersistence: $d < 0$, $f(0) = 0$ and $\sum \gamma(k) = 0$.
4. The assumptions include in particular the special case where the original process itself consists of 0-1-variables, i.e. where $Y_i = X_i$.
3. Estimation of p
Under the assumptions given above, the estimation problem consists of estimating a smooth function $p(t)$, where $0 \le p \le 1$. If the distribution of the process $X_i$ is known, except for a finite dimensional parameter vector $\theta$, then the optimal method is to estimate $\theta$ from the original observations $X_i$ (for instance by maximum likelihood) and set $\hat p(t) = p(t; \hat\theta)$. Here, we address the problem of estimating $p$ when only the assumptions given in the previous section are known. Note that these are assumptions on the process $Y_i$ - no knowledge about the distribution of the original process $X_i$ is needed. Thus, we consider estimation of $p(t)$ ($t \in [0,1]$) where $p(t_i) = E(Y_i)$, $Y_i \in \{0, 1\}$ and
(5) $Y_i = p(t_i) + \sqrt{p(t_i)(1 - p(t_i))}\, Z_i$
where $Z_i$ is a stationary zero mean process as defined in (A1). We will consider kernel estimation of $p$: Let $K : [-1, 1] \to \mathbb{R}_+$ be a positive symmetric function with support $[-1, 1]$ and $b > 0$ a bandwidth; then we define
(6) $\hat p(t; b) = \dfrac{1}{nb} \sum_{i=1}^{n} K\!\left(\dfrac{t - t_i}{b}\right) Y_i$.
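The estimator (6) can be sketched in a few lines. The following is a minimal illustration under assumptions that are not taken from the text above: an equidistant design $t_i = i/n$, an Epanechnikov kernel as one possible choice of $K$, and simulated independent 0-1 data (i.e. no dependence in $Z_i$). The function names are hypothetical.

```python
import random

def epanechnikov(u):
    """Symmetric, nonnegative kernel with support [-1, 1] (illustrative choice of K)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def p_hat(t, y, b, kernel=epanechnikov):
    """Kernel estimate (6): (1/(n*b)) * sum_i K((t - t_i)/b) * Y_i, assuming t_i = i/n."""
    n = len(y)
    total = 0.0
    for i, y_i in enumerate(y, start=1):
        total += kernel((t - i / n) / b) * y_i
    return total / (n * b)

# Simulated independent 0-1 observations with p(t) = 0.2 + 0.6 t (assumed for the demo).
random.seed(1)
n = 5000
y = [1 if random.random() < 0.2 + 0.6 * i / n else 0 for i in range(1, n + 1)]
estimate = p_hat(0.5, y, b=0.2)  # the true value is p(0.5) = 0.5
```

Since (6) is not renormalized, the sketch is only meaningful for interior points $b \le t \le 1 - b$, where the kernel window lies fully inside $[0, 1]$.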
The general problem of estimating a smooth function from data of the form
(7) $Y_i = \mu(t_i) + Z_i$
has been considered by various authors for the case where the error process $Z_i$ is stationary with (i) short-range dependence (see e.g. Chiu, 1989; Altman, 1990; Hall and Hart, 1990; Herrmann, Gasser and Kneip, 1992) or (ii) long-range dependence, i.e. $0 < \alpha < 1$, where $\alpha = 1 - 2d$ is the decay exponent of the autocovariances (see e.g. Hall and Hart, 1990; Csörgő and Mielniczuk, 1995; Ray and Tsay, 1997) or (iii) antipersistence (Beran and Feng 2002a). Beran and Feng (2002a,b,c) consider the more general case where it is not known a priori whether $Z_i$ is stationary (including antipersistence as well as short- and long-range dependence) or nonstationary.
The essential question to be solved is how to choose the bandwidth $b$ optimally. Note that, in contrast to the usual setup, for 0-1-processes the variance of the error process is related to the mean function, and the mean function is bounded from below and above. One may therefore either estimate $p$ itself, under the constraint $0 \le p \le 1$, or one may instead estimate a suitable transformation of $p$. Obvious transformations are, for instance, the logistic transformation $g(p) = \log[p/(1-p)]$ or the variance stabilizing transformation $g(p) = \arcsin\sqrt{p}$. Asymptotically, the choice of $g$ does not influence bandwidth selection if the criterion is the mean squared error. This follows from standard arguments: Assume that $\hat p(t; b_n)$ ($n \in \mathbb{N}$) is a (weakly) consistent sequence of estimates of $p(t)$. Then $g(\hat p) = g(p) + g'(p)(\hat p - p) + o_p(\hat p - p)$, so that (under suitable regularity conditions on the sequence $\hat p$) we have $\mathrm{MSE}(g(\hat p)) = E\{[g(\hat p) - g(p)]^2\} = [g'(p)]^2 \mathrm{MSE}(\hat p) + r$, where $r$ is of smaller order than $\mathrm{MSE}(\hat p)$. Since $g'(p)$ is a constant, independent of the bandwidth, minimizing $\mathrm{MSE}(g(\hat p))$ is asymptotically equivalent to minimizing $\mathrm{MSE}(\hat p)$. In the following, we thus use the mean squared error of $\hat p$ as a criterion for choosing $b$.
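As a worked instance of this argument (an illustration added here, using only the two transformations named above), the first-order factor $[g'(p)]^2$ can be written out explicitly. In both cases it depends on $p(t)$ only, not on the bandwidth:

```latex
% Logistic transformation
g(p) = \log\frac{p}{1-p}, \qquad g'(p) = \frac{1}{p(1-p)}, \qquad
\mathrm{MSE}\bigl(g(\hat p(t))\bigr) \approx \frac{\mathrm{MSE}(\hat p(t))}{[p(t)(1-p(t))]^{2}}

% Variance stabilizing transformation
g(p) = \arcsin\sqrt{p}, \qquad g'(p) = \frac{1}{2\sqrt{p(1-p)}}, \qquad
\mathrm{MSE}\bigl(g(\hat p(t))\bigr) \approx \frac{\mathrm{MSE}(\hat p(t))}{4\,p(t)(1-p(t))}
```

Both approximations require $0 < p(t) < 1$; at the boundary values the derivative $g'$ is unbounded and the expansion breaks down.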
4. Asymptotically optimal bandwidth choice
In this section, asymptotic expressions for the mean squared error and the asymptotically optimal bandwidth are given. Using the notations $I(p'') = \int_{\Delta}^{1-\Delta} [p''(t)]^2\,dt$ and $I(K) = \int_{-1}^{1} x^2 K(x)\,dx$, the following results can be derived in a similar way as in Beran and Feng (2002a) by taking into account the heteroskedasticity factor $w(t) = p(t)(1 - p(t))$.
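Theorem 1. Let $b_n > 0$ be a sequence of bandwidths such that $b_n \to 0$ and $n b_n \to \infty$. Then we have
(i) Bias:
(8) $E[\hat p(t) - p(t)] = b_n^2\, \dfrac{p''(t)\, I(K)}{2} + o(b_n^2)$
uniformly in $\Delta < t < 1 - \Delta$;
(ii) Variance:
(9) $(n b_n)^{1-2d}\, \mathrm{var}(\hat p(t)) = w(t) V(\theta) + o(1)$
uniformly in $\Delta < t < 1 - \Delta$, where $0 < V(\theta) < \infty$ is a constant;
(iii) IMSE: The integrated mean squared error in $[\Delta, 1 - \Delta]$ is given by
$\int_{\Delta}^{1-\Delta} E\{[\hat p(t) - p(t)]^2\}\,dt = \mathrm{IMSE}_{asympt}(n, b_n) + o(\max(b_n^4, (n b_n)^{2d-1}))$
(10) $= b_n^4\, \dfrac{I(p'')\, I^2(K)}{4} + (n b_n)^{2d-1}\, V(\theta) \int_{\Delta}^{1-\Delta} w(t)\,dt + o(\max(b_n^4, (n b_n)^{2d-1}))$
(iv) Optimal bandwidth: The bandwidth that minimizes the asymptotic IMSE is given by
(11) $b_{opt} = C_{opt}\, n^{(2d-1)/(5-2d)}$
where
(12) $C_{opt} = C_{opt}(\theta) = \left[ \dfrac{(1 - 2d)\, V(\theta) \int_{\Delta}^{1-\Delta} w(t)\,dt}{I(p'')\, I^2(K)} \right]^{1/(5-2d)}$.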
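Equations (11) and (12) follow from setting the derivative of the two leading terms of (10) with respect to $b_n$ to zero. The sketch below evaluates both the leading IMSE terms and the resulting bandwidth; the numerical inputs are hypothetical values chosen only to exercise the formulas, and the function names are not from the paper.

```python
def asymptotic_imse(b, n, d, V, W, I_p2, I_K):
    """Leading terms of (10): b^4 * I(p'') * I(K)^2 / 4 + (n*b)^(2d-1) * V * W."""
    return b ** 4 * I_p2 * I_K ** 2 / 4.0 + (n * b) ** (2 * d - 1) * V * W

def optimal_bandwidth(n, d, V, W, I_p2, I_K):
    """Equations (11)-(12): b_opt = C_opt * n^((2d-1)/(5-2d))."""
    c_opt = ((1 - 2 * d) * V * W / (I_p2 * I_K ** 2)) ** (1.0 / (5 - 2 * d))
    return c_opt * n ** ((2 * d - 1) / (5 - 2 * d))

# Hypothetical inputs: W stands for the integral of w(t) over [Delta, 1 - Delta].
args = dict(n=1000, d=0.2, V=1.5, W=0.2, I_p2=4.0, I_K=1.0 / 3.0)
b_opt = optimal_bandwidth(**args)
```

For $d = 0$ the exponent reduces to the familiar rate $n^{-1/5}$, and the rate becomes slower as $d$ increases towards $\frac{1}{2}$.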
Similar results can be obtained for kernel estimates of derivatives of $p$. For instance, the second derivative can be estimated by $\hat p''(t) = n^{-1} b^{-3} \sum_j K((t_j - t)/b)\, Y_j$, where $K$ is a symmetric kernel such that $\int K(x)\,dx = 0$ and $\int K(x) x^2\,dx = 2$. The optimal bandwidth for estimating the second derivative is of the order $O(n^{(2d-1)/(9-2d)})$. The asymptotic expression $V(\theta)$ can be given explicitly for $d = 0$ and $d > 0$:
(13) $V(\theta) = 2\pi c_f \int_{-1}^{1} K^2(x)\,dx$, $(d = 0)$;
(14) $V(\theta) = 2 c_f\, \Gamma(1 - 2d) \sin(\pi d) \int_{-1}^{1}\int_{-1}^{1} K(x) K(y) |x - y|^{2d-1}\,dx\,dy$, $(d > 0)$.
For $d < 0$, a general simple formula for $V$ does not seem to be available, except in special cases. For the box-kernel, we obtain (see Beran and Feng 2002a)
Corollary 1. Let $K(x) = \frac{1}{2} 1\{x \in [-1, 1]\}$. Define
(15) $\nu(d) = \dfrac{2^{2d}\, \Gamma(1 - 2d) \sin(\pi d)}{d(2d + 1)}$
with $\nu(0) = \lim_{d \to 0} \nu(d) = \pi$. Then, under the assumptions of Theorem 1, we have
(i) Bias:
(16) $E[\hat p(t) - p(t)] = b_n^2\, \dfrac{p''(t)}{6} + o(b_n^2)$;
(ii) Variance:
(17) $\mathrm{var}(\hat p(t)) = (n b_n)^{2d-1}\, \nu(d)\, c_f\, w(t) + o((n b_n)^{2d-1})$;
(iii) IMSE:
(18) $\int_{\Delta}^{1-\Delta} E\{[\hat p(t) - p(t)]^2\}\,dt = b_n^4\, \dfrac{I(p'')}{36} + (n b_n)^{2d-1}\, \nu(d)\, c_f\, W + o(\max(b_n^4, (n b_n)^{2d-1}))$
where $W = \int_{\Delta}^{1-\Delta} w(t)\,dt$.
(iv) Optimal bandwidth:
(19) $b_{opt} = C_{opt}\, n^{(2d-1)/(5-2d)}$
with
(20) $C_{opt} = \left[ \dfrac{9 (1 - 2d)\, \nu(d)\, c_f\, W}{I(p'')} \right]^{1/(5-2d)}$.
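The constant (15) and the leading variance term (17) are straightforward to evaluate numerically. The following sketch assumes the box kernel of Corollary 1 and uses hypothetical function names; the case $d = 0$ is handled separately by the limit $\nu(0) = \pi$.

```python
import math

def nu(d):
    """Constant (15) for the box kernel; nu(0) is defined as the limit pi."""
    if d == 0:
        return math.pi
    return (2.0 ** (2 * d) * math.gamma(1 - 2 * d) * math.sin(math.pi * d)
            / (d * (2 * d + 1)))

def box_kernel_variance(n, b, d, c_f, p_t):
    """Leading term of (17): (n*b)^(2d-1) * nu(d) * c_f * p(t) * (1 - p(t))."""
    return (n * b) ** (2 * d - 1) * nu(d) * c_f * p_t * (1 - p_t)

# nu(d) for an antipersistent, a short-memory and a long-memory case.
values = {d: nu(d) for d in (-0.25, 0.0, 0.25)}
```

For $d = 0$ and $c_f = 1/(2\pi)$ (white noise with unit variance), the variance term reduces to $p(t)(1-p(t))/(2 n b_n)$, the usual variance of a moving average over roughly $2 n b_n$ independent observations.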
5. Data driven bandwidth choice
An iterative algorithm for choosing the bandwidth for a model with a smooth trend function and a stationary or nonstationary error process $Z_i$ is defined in Beran and Feng (2002a,b). The error process is modelled by a (possibly integrated) Gaussian fractional ARIMA process (Granger and Joyeux 1980, Hosking 1981). Beran and Feng prove convergence of the algorithm and provide finite sample modifications to improve its performance for short series. Convergence of the algorithm relies on consistency of the estimate of the spectral density $f$. For a 0-1-process $Y_i$, the spectral density can be estimated consistently by maximum likelihood, provided that it is identical with the spectral density of a fractional ARIMA process. We thus make the following additional assumption (A7):
(21) $f(\lambda) = \dfrac{\sigma_\varepsilon^2}{2\pi}\, \dfrac{|\psi(e^{i\lambda})|^2}{|\varphi(e^{i\lambda})|^2}\, |1 - e^{i\lambda}|^{-2d}$
for some $-\frac{1}{2} < d < \frac{1}{2}$. Here, $\psi(x)$ and $\varphi(x)$ are polynomials of finite orders $m_1$ and $m_2$ respectively with roots outside the unit circle.
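To make (A7) concrete, the sketch below evaluates a fractional ARIMA spectral density of the form (21) and the constant $c_f$ of (4) that it implies. The representation of the two polynomials by coefficient lists (constant term first), the parameter names and the AR(1) example are assumptions made for illustration.

```python
import cmath
import math

def poly(coeffs, z):
    """Evaluate coeffs[0] + coeffs[1]*z + coeffs[2]*z**2 + ..."""
    return sum(c * z ** k for k, c in enumerate(coeffs))

def farima_spectral_density(lam, d, sigma2=1.0, ma=(1.0,), ar=(1.0,)):
    """Spectral density (21) at a frequency lam != 0."""
    z = cmath.exp(1j * lam)
    ratio = abs(poly(ma, z)) ** 2 / abs(poly(ar, z)) ** 2
    return sigma2 / (2 * math.pi) * ratio * abs(1 - z) ** (-2 * d)

def c_f(sigma2=1.0, ma=(1.0,), ar=(1.0,)):
    """Constant in (4): f(lam) ~ c_f * |lam|^(-2d) as lam -> 0."""
    return sigma2 / (2 * math.pi) * abs(poly(ma, 1.0)) ** 2 / abs(poly(ar, 1.0)) ** 2

# Example: d = 0.3 with autoregressive polynomial 1 - 0.5 x (hypothetical values).
lam = 1e-3
ratio = farima_spectral_density(lam, 0.3, ar=(1.0, -0.5)) / (c_f(ar=(1.0, -0.5)) * lam ** (-0.6))
```

The ratio is close to one for small frequencies, which is the behaviour required by (A2); the constant $c_f$ is obtained by evaluating both polynomials at $x = 1$.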
A suitable modification of the algorithm in Beran and Feng (2002a,b,c) can now be defined. (Note that in Beran and Feng (2002c), $m_1$ is set equal to zero.) The main steps of the algorithm are as follows:
Algorithm:
Step 1: Set $j = 1$, define a maximal autoregressive order $M$ and an initial bandwidth $b_0$, and carry out Steps 2 to 5 for each $m_2 \in \{0, 1, \ldots, M\}$.
Step 2: Estimate $p$ and $Z_i$ using the bandwidth $b_{j-1}$.
Step 3: Estimate $f$ by maximum likelihood (Beran 1995).
Step 4: Given the estimated spectral density $f$, calculate a new optimal bandwidth $b_j$ and a new optimal bandwidth for estimating $p''$; set $j = j + 1$.
Step 5: Stop if the change in the bandwidth does not exceed a certain bound. Otherwise go to Step 2.
Step 6: Select the solution that minimizes a consistent model choice criterion such as the BIC (Schwarz 1978, Beran et al. 1998).
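The control flow of Steps 2 to 5 for one fixed autoregressive order can be sketched as a fixed-point iteration. This is only a skeleton under strong simplifications: the two estimation routines are hypothetical callbacks supplied by the user (the maximum likelihood fit of Step 3 is not implemented here), the box-kernel formulas (19) and (20) are used for the bandwidth update, and the separate bandwidth for $p''$ in Step 4 as well as the model choice in Step 6 are omitted.

```python
import math

def nu(d):
    """Constant (15) for the box kernel, with nu(0) = pi."""
    if d == 0:
        return math.pi
    return (2.0 ** (2 * d) * math.gamma(1 - 2 * d) * math.sin(math.pi * d)
            / (d * (2 * d + 1)))

def iterate_bandwidth(y, b0, estimate_spectrum, estimate_w_and_curvature,
                      tol=1e-6, max_iter=50):
    """Iterate Steps 2-5; returns (bandwidth, number of iterations used).

    estimate_spectrum(y, b)        -> (d_hat, c_f_hat)   hypothetical callback
    estimate_w_and_curvature(y, b) -> (W_hat, I_p2_hat)  hypothetical callback
    """
    n = len(y)
    b = b0
    for j in range(1, max_iter + 1):
        # Steps 2 and 3: fit p and the spectral density at the current bandwidth.
        d_hat, c_f_hat = estimate_spectrum(y, b)
        w_hat, i_p2_hat = estimate_w_and_curvature(y, b)
        # Step 4: plug the estimates into (19) and (20).
        c_opt = (9 * (1 - 2 * d_hat) * nu(d_hat) * c_f_hat * w_hat
                 / i_p2_hat) ** (1.0 / (5 - 2 * d_hat))
        b_new = c_opt * n ** ((2 * d_hat - 1) / (5 - 2 * d_hat))
        # Step 5: stop when the bandwidth no longer changes noticeably.
        if abs(b_new - b) <= tol:
            return b_new, j
        b = b_new
    return b, max_iter
```

With callbacks that ignore the bandwidth, the iteration stops after two passes; in practice the estimates depend on $b$, which is what makes the iteration necessary.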
6. Testing and Prediction
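An approximate pointwise test of the null hypothesis $H_0 : p(t) \equiv$ const can be obtained by defining the rejection region $|\hat p(t) - p_0| > c_{\alpha/2}\, \hat v$, where $p_0$ denotes the constant value under $H_0$, $\hat v$ is equal to the square root of $(n b_n)^{2\hat d - 1}\, \nu(\hat d)\, \hat c_f\, \hat p(t)(1 - \hat p(t))$ and $c_{\alpha/2}$ is the $(1 - \frac{\alpha}{2})$ quantile of the standard normal distribution. Note that, alternatively, we may test the null hypothesis $H_0 : p'(t) \equiv 0$.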
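The pointwise test can be sketched as follows. The centring of the test statistic at a hypothesized constant `p0` is an assumption of this sketch, as are the function names; the standard error uses the box-kernel variance of Corollary 1 with plug-in estimates of $d$ and $c_f$.

```python
import math
from statistics import NormalDist

def nu(d):
    """Constant (15) for the box kernel, with nu(0) = pi."""
    if d == 0:
        return math.pi
    return (2.0 ** (2 * d) * math.gamma(1 - 2 * d) * math.sin(math.pi * d)
            / (d * (2 * d + 1)))

def critical_halfwidth(p_hat_t, n, b, d_hat, c_f_hat, alpha=0.05):
    """c_{alpha/2} * v_hat, with v_hat^2 = (n*b)^(2d-1) * nu(d) * c_f * p_hat * (1 - p_hat)."""
    v_hat = math.sqrt((n * b) ** (2 * d_hat - 1) * nu(d_hat) * c_f_hat
                      * p_hat_t * (1 - p_hat_t))
    return NormalDist().inv_cdf(1 - alpha / 2) * v_hat

def rejects_constant(p_hat_t, p0, n, b, d_hat, c_f_hat, alpha=0.05):
    """True if p_hat(t) deviates from the hypothesized constant p0 by more than the bound."""
    return abs(p_hat_t - p0) > critical_halfwidth(p_hat_t, n, b, d_hat, c_f_hat, alpha)
```

Because $(n b_n)^{2d-1}$ decays more slowly for larger $d$, the same deviation that is significant under short memory may be well within the bounds under long memory.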
Prediction of $Y_{n+k}$ from $Y_1, \ldots, Y_n$ reduces to predicting the success probability $p(1 + k/n)$. Beran and Ocker (1999) propose a prediction method in the context of general nonparametric trend functions that is based on Taylor expansion and optimal linear prediction of the stochastic component. This approach can, in principle, be carried over to forecasts of $p(1 + k/n)$. Note, however, that this may lead to values outside of the interval $[0, 1]$. As an alternative, one may extrapolate a suitable transformation of $p$. More specifically, let $g : [0, 1] \to \mathbb{R}$ be a one-to-one monotonic function such that $\lim_{x \to 1} g(x) = \infty$ and $\lim_{x \to 0} g(x) = -\infty$. Writing $g(t)$ for $g(p(t))$, the value $g(p(1 + k/n))$ may be approximated by $g(1) + g'(1)\frac{k}{n}$. The predicted value of $p$ is $\hat p(1 + k/n) = g^{-1}[g(1) + g'(1)\frac{k}{n}]$. In the context of quantile estimation for certain long-memory processes, this approach is used, for instance, in Ghosh and Draghicescu (2001).
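The extrapolation on the transformed scale can be illustrated with the logistic transformation. In this sketch the inputs `p_at_1` and `dp_at_1` stand for estimates of $p(1)$ and $p'(1)$, which are assumed to be available (for instance from kernel estimates); the names are hypothetical and the stochastic component is not predicted.

```python
import math

def logit(p):
    """g(p) = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def expit(x):
    """Inverse of the logistic transformation."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_p(p_at_1, dp_at_1, k, n):
    """First-order extrapolation of g(p(t)) from t = 1 to t = 1 + k/n, mapped back to (0, 1)."""
    g_at_1 = logit(p_at_1)
    # Chain rule: d/dt g(p(t)) = p'(t) / (p(t) * (1 - p(t))) for the logistic g.
    slope = dp_at_1 / (p_at_1 * (1 - p_at_1))
    return expit(g_at_1 + slope * k / n)

# A direct linear extrapolation 0.95 + 2 * 0.1 = 1.15 would leave [0, 1];
# the transformed forecast stays inside.
forecast = predict_p(0.95, 2.0, k=100, n=1000)
```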
7. Data examples
The method introduced here can be used to explore various linear and nonlinear properties of time series. This is illustrated by the following application to daily values of three stock market indices between January 1, 1992 and November 10, 1995. The indices are: FTSE 100 (figure 1), CAC (figure 2) and the Swiss Bank Corporation Index (figure 3). We consider the event
(22) $A(i) = \{\omega : X_{i+20} > X_i \text{ and } \min_{s=1,\ldots,20} X_{i+s} > 0.9\, X_i\}$.
The event $A(i)$ means that in one month (20 working days) the index will be higher than the initial value $X_i$ and during this one-month period it never drops below 90% of $X_i$. The estimated probability functions $p(t_i) = P(A(i))$ are displayed in figures 1c, 2c and 3c respectively. The shaded areas correspond to time stretches where $p$ is significantly different from a constant (fully shaded area for "high level" and area shaded with lines for "low level"). Although the critical limits to test for non-constant $p(t)$ are pointwise limits only, the similar patterns for all three series support the conjecture that $p$ is not constant. In particular, there is a period (around observation 400) where $p$ is considerably higher for all three series.
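The 0-1-series for event (22) can be computed directly from an index series. The sketch below uses 0-based list positions and hypothetical parameter names `horizon` and `floor`; it is an illustration on synthetic numbers, not the data used in this section.

```python
def event_indicator(x, horizon=20, floor=0.9):
    """Y_i = 1 if x[i+horizon] > x[i] and min(x[i+1..i+horizon]) > floor * x[i], else 0.

    The last `horizon` observations have no complete future window and are dropped.
    """
    y = []
    for i in range(len(x) - horizon):
        future = x[i + 1 : i + horizon + 1]
        higher_at_end = x[i + horizon] > x[i]
        never_below_floor = min(future) > floor * x[i]
        y.append(1 if higher_at_end and never_below_floor else 0)
    return y

# Synthetic example: a steadily rising series, and the same series with one sharp dip.
rising = [100 + i for i in range(30)]
dipped = list(rising)
dipped[5] = 50
```

Every window that contains the dip at position 5 fails the 90% condition, so only the starting points before the dip are affected.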
Acknowledgment
This research was supported in part by the Center of Finance and Econometrics, University of Konstanz, Germany. I would also like to thank Dr. Dirk Ocker (Swiss Union of Raiffeisenbanks) and Dr. Elke M. Hennig (Citibank, Frankfurt) for providing me with the data. Thanks go also to a referee for comments that helped to improve the presentation of the results.
References
Altman, N.S. 1990. Kernel smoothing of data with correlated errors. J. Am. Statist. Assoc. 85, 749-759.
Beran, J. 1994. Statistics for long-memory processes. Chapman and Hall, New York.
Beran, J. 1995. Maximum likelihood estimation of the differencing parameter for invertible short- and long-memory ARIMA models. J. Roy. Statist. Soc. B 57, No. 4, 659-672.
Beran, J., Bhansali, R.J. and Ocker, D. 1998. On unified model selection for stationary and nonstationary short- and long-memory autoregressive processes. Biometrika 85, No. 4, 921-934.
Beran, J. and Feng, Y. 2002a. SEMIFAR models - a semiparametric framework for modelling trends, long-range dependence and nonstationarity. Computational Statistics & Data Analysis (in press).
Beran, J. and Feng, Y. 2002b. Data driven bandwidth choice for SEMIFAR models. Journal of Computational and Graphical Statistics (in press).
Beran, J. and Feng, Y. 2002c. Local polynomial fitting with long memory, short memory and antipersistent errors. Annals of Statistical Mathematics (in press).
Beran, J. and Ocker, D. 1999. SEMIFAR forecasts, with applications to foreign exchange rates. Journal of Statistical Planning and Inference 80, 137-153.
Chiu, S.T. 1989. Bandwidth selection for kernel estimates with correlated noise. Statist. Probab. Lett. 8, 347-354.
Csörgő, S. and Mielniczuk, J. 1995. Nonparametric regression under long-range dependent normal errors. Ann. Statist. 23, 1000-1014.
Ghosh, S. and Draghicescu, D. 2001. Predicting the distribution function for long-memory processes. International Journal of Forecasting (in press).
Granger, C.W.J. and Joyeux, R. 1980. An introduction to long-range time series models and fractional differencing. J. Time Series Anal. 1, 15-30.
Hall, P. and Hart, J. 1990. Nonparametric regression with long-range dependence. Stoch. Processes Appl. 36, 339-351.
Herrmann, E., Gasser, T. and Kneip, A. 1992. Choice of bandwidth for kernel regression when residuals are correlated. Biometrika 79, 783-795.
Ray, B.K. and Tsay, R.S. 1997. Bandwidth selection for kernel regression with long-range dependence. Biometrika 84, 791-802.
Schwarz, G. 1978. Estimating the dimension of a model. Ann. Statist. 6, 461-464.
Department of Mathematics and Statistics, University of Konstanz, Universitätsstr. 10, Postfach 5560, 78457 Konstanz, Germany
[Figure 1, three panels plotted against day (0 to 800): (a) "FTSE 100 - original series", x(t); (b) "FTSE 100 - differenced series", x(t)-x(t-1); (c) "Figure 1 c: FTSE 100", fitted p(t) on a scale from 0 to 1.]
Figure 1. FTSE 100 between January 1, 1992 and November 10, 1995 - original series (figure 1a), differenced series (figure 1b) and estimated probability function $p(i/n) = P(A(i))$ (figure 1c), where $A(i) = \{\omega : X_{i+20} > X_i \text{ and } \min_{s=1,\ldots,20} X_{i+s} > 0.9\, X_i\}$. Periods with significant departures from $H_0 : p \equiv$ const are shaded with lines (for $p$ below the critical bound) and fully shaded (for $p$ above the critical bound) respectively.
[Figure 2, three panels plotted against day (0 to 800): (a) original series, x(t); (b) differenced series, x(t)-x(t-1); (c) "Figure 2 c: CAC", fitted p(t) on a scale from 0 to 1.]
Figure 2. CAC between January 1, 1992 and November 10, 1995 - original series (figure 2a), differenced series (figure 2b) and estimated probability function $p(i/n) = P(A(i))$ (figure 2c), where $A(i) = \{\omega : X_{i+20} > X_i \text{ and } \min_{s=1,\ldots,20} X_{i+s} > 0.9\, X_i\}$. Periods with significant departures from $H_0 : p \equiv$ const are shaded with lines (for $p$ below the critical bound) and fully shaded (for $p$ above the critical bound) respectively.
[Figure 3, three panels plotted against day (0 to 800): (a) original series, x(t); (b) differenced series, x(t)-x(t-1); (c) "Figure 3 c: SBC", fitted p(t) on a scale from 0 to 1.]
Figure 3. SBC between January 1, 1992 and November 10, 1995 - original series (figure 3a), differenced series (figure 3b) and estimated probability function $p(i/n) = P(A(i))$ (figure 3c), where $A(i) = \{\omega : X_{i+20} > X_i \text{ and } \min_{s=1,\ldots,20} X_{i+s} > 0.9\, X_i\}$. Periods with significant departures from $H_0 : p \equiv$ const are shaded with lines (for $p$ below the critical bound) and fully shaded (for $p$ above the critical bound) respectively.