Bandwidth choice in nonparametric quantile regression

Klaus Abberger, University of Konstanz, Germany
Abstract: In nonparametric mean regression various methods for bandwidth choice exist. These methods can roughly be divided into plug-in methods and methods based on penalizing functions. This paper uses the approach based on penalizing functions and adapts it to nonparametric quantile regression estimation, where bandwidth choice is still an unsolved problem. Various criteria for bandwidth choice are defined and compared in some simulation examples.

Key Words: nonparametric quantile regression, bandwidth choice, cross-validation, penalizing functions
1 Introduction
Although most regression investigations are concerned with the regression mean function, other aspects of the conditional distribution of Y given X are also often of interest. For fixed α ∈ (0,1), the quantile regression function gives the αth quantile q_α(x) of the conditional distribution of a response variable Y given the value X = x. It can be used to measure the effect of covariates not only in the center of a population, but also in the lower and upper tails. Of special interest is the case where the data pattern shows heteroscedasticities and asymmetries.
In the literature a number of nonparametric approaches to quantile regression have been discussed. These methods include spline smoothing, kernel estimation, nearest-neighbour estimation and locally weighted polynomial regression. Yu and Jones (1998) propose two kinds of local linear quantile regression. They also develop a rule-of-thumb bandwidth choice procedure based on the plug-in idea. Starting point is the asymptotically optimal bandwidth minimizing the MSE. Since this bandwidth depends on unknown quantities, the authors introduce some simplifying assumptions. These assumptions result in the bandwidth selection strategy

    h_α = h_mean {α(1−α) / φ(Φ⁻¹(α))²}^{1/5},   (1)

where φ and Φ are the standard normal density and distribution function and h_mean is a bandwidth choice for regression mean estimation with one of the several existing methods. As can be seen, this procedure leads to identical bandwidths for the α and (1−α) quantiles. Although this strategy might work very well in some situations, our special interest lies in asymmetric data patterns where the above rule is too restrictive.
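For illustration, rule (1) can be evaluated directly; the sketch below (a hypothetical helper, not from the paper) uses the standard library's normal distribution for φ and Φ⁻¹:

```python
from math import pi, exp, sqrt
from statistics import NormalDist

def h_rule_of_thumb(h_mean, alpha):
    """Yu-Jones rule of thumb, eq. (1):
    h_alpha = h_mean * {alpha(1-alpha) / phi(Phi^{-1}(alpha))^2}^{1/5}."""
    z = NormalDist().inv_cdf(alpha)        # Phi^{-1}(alpha)
    phi = exp(-z * z / 2.0) / sqrt(2.0 * pi)  # standard normal density at z
    return h_mean * (alpha * (1.0 - alpha) / phi ** 2) ** 0.2
```

The symmetry criticized in the text is immediate: the factor depends on α only through α(1−α) and φ(Φ⁻¹(α))², both symmetric around 1/2, so h_α = h_{1−α}.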
Abberger (1998) adapts the cross-validation idea to kernel quantile regression and presents some simulation examples. Also asymmetric data patterns based on the lognormal distribution are included.
This paper tries to use penalizing function based criteria to choose the bandwidth in nonparametric quantile regression. In the next section these criteria are presented, and simulation examples are discussed in Section 3.
2 Quantile estimation and bandwidth choice
A locally weighted linear quantile regression estimator is defined by setting q̂_α(x) = â, where â and b̂ minimize

    Σ_{i=1}^n ρ_α(Y_i − a − b(X_i − x)) K((x − X_i)/h)   (2)

with the check function

    ρ_α(u) = α 1_{u≥0}(u) u + (α − 1) 1_{u<0}(u) u   (3)

introduced by Koenker and Bassett (1978) in connection with parametric quantile regression. For a discussion of this estimator see Heiler (2000) or Yu and Jones (1998), who also derive the MSE of this estimator. To calculate q̂_α we use an iteratively reweighted least squares algorithm. Initial estimates are conditional quantiles calculated with a kernel estimator of the Nadaraya-Watson type (see Heiler (2000)).
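The minimization of (2) by iteratively reweighted least squares can be sketched as follows. This is a minimal illustration under our own assumptions (Gaussian kernel, a locally weighted least squares start instead of the Nadaraya-Watson initialisation, fixed iteration count, hypothetical function names), not the paper's implementation:

```python
import numpy as np

def rho(u, alpha):
    """Check function rho_alpha(u), eq. (3)."""
    return np.where(u >= 0, alpha * u, (alpha - 1.0) * u)

def local_linear_quantile(x0, X, Y, alpha, h, n_iter=50, eps=1e-8):
    """Locally weighted linear quantile fit at x0 via IRLS, eq. (2).
    Returns a_hat, the estimated alpha-quantile at x0."""
    k = np.exp(-0.5 * ((x0 - X) / h) ** 2)          # Gaussian kernel weights
    Z = np.column_stack([np.ones_like(X), X - x0])  # local linear design
    # start from a locally weighted least squares (mean regression) fit
    beta = np.linalg.lstsq(Z * np.sqrt(k)[:, None], Y * np.sqrt(k), rcond=None)[0]
    for _ in range(n_iter):
        r = Y - Z @ beta
        # IRLS weights rho_alpha(r)/r^2, i.e. alpha/|r| or (1-alpha)/|r|,
        # guarded against residuals near zero
        w = k * np.where(r >= 0, alpha, 1.0 - alpha) / np.maximum(np.abs(r), eps)
        beta = np.linalg.lstsq(Z * np.sqrt(w)[:, None], Y * np.sqrt(w), rcond=None)[0]
    return beta[0]
```

At a fixed point of the weight update the weighted least squares objective matches Σ ρ_α(r_i) k_i, which is why the iteration approximates the minimizer of (2).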
Figure 1: Estimated density of log differences between ASE and AAWE optimal bandwidths for simulated data
Estimation by minimizing (2) can be interpreted as an M-estimator or, in the notation of Bickel and Doksum (2001), as a minimum contrast estimate with contrast function ρ_α. In general they define a discrepancy function

    D(θ₀, θ) = E_{θ₀} ρ(Y, θ),   (4)

which is minimized by the true value θ₀.
Nonparametric estimation of quantile regression requires the choice of a bandwidth. In nonparametric mean regression, procedures for bandwidth choice are usually grounded on the MSE. Various definitions of the optimal bandwidth are available. One candidate is the bandwidth that minimizes MISE (mean integrated squared error) for the given sample size and design. This bandwidth is optimal with respect to the average performance over all possible data sets for a given population, rather than for the performance on the observed data set. Another choice is the bandwidth that minimizes the average squared error (ASE) for the observed data set. Between these two concepts we choose the latter one. For further discussion of this issue see e.g. Mammen (1990), Grund et al. (1994), Härdle et al. (1988).
Another natural choice in quantile regression is based on the discrepancy function (4). It is

    E[ρ_α(Y − m(x))] = α(μ_Y(x) − m(x)) + ∫_{−∞}^{m(x)} F(y|x) dy,   (5)

and thus the optimal bandwidth is the one for which the corresponding quantile estimator minimizes

    (1/n) Σ_{i=1}^n { ∫_{−∞}^{q̂_α(X_i)} F(y|X_i) dy − α q̂_α(X_i) }.   (6)

In the sequel this criterion will be called average alpha weighted error (AAWE).
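To make (6) concrete: each summand needs only the conditional distribution function F and the fitted quantile. The sketch below (our own helper names; the standard normal F, the truncation point and the grid size are illustrative choices, not from the paper) evaluates it by trapezoidal quadrature:

```python
from statistics import NormalDist

def aawe_term(q_hat, F, alpha, lower=-10.0, n_grid=20000):
    """One summand of criterion (6): integral_{-inf}^{q_hat} F(y) dy - alpha*q_hat.
    The lower tail is truncated at `lower` (assumption: F(lower) is ~0);
    the integral is approximated by the trapezoidal rule."""
    step = (q_hat - lower) / n_grid
    ys = [lower + i * step for i in range(n_grid + 1)]
    integral = step * (sum(F(y) for y in ys) - 0.5 * (F(ys[0]) + F(ys[-1])))
    return integral - alpha * q_hat

def aawe(q_hats, Fs, alpha):
    """AAWE criterion (6): average of the per-observation terms."""
    return sum(aawe_term(q, F, alpha) for q, F in zip(q_hats, Fs)) / len(q_hats)
```

Since the derivative of the per-point term with respect to q̂ is F(q̂) − α, the criterion is minimized exactly at the true conditional α-quantile.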
The difference between ASE (which in quantile regression is (1/n) Σ (q_α(X_i) − q̂_α(X_i))²) and AAWE is demonstrated for a data pattern which is further considered in the simulation examples section. The true underlying distribution is exponential with density

    f(y) = a e^{−ay−1} 1_{{y > −1/a}}(y),  a > 0.   (7)
Figure 2: Estimated 0.75 quantiles with ASE and AAWE optimal bandwidths for a simulated data set (curves: true, AAWE, ASE)
This density is asymmetric and has expectation zero for all a > 0. With x = 1,...,600 we chose a = 1.5 + sin(2πx/100). For 1000 repetitions the bandwidths minimizing the average errors in estimating the 0.75-quantiles with a kernel estimator are calculated. Figure 1 shows the density of log(h_ASE/h_AAWE). There is a high peak at 0, indicating that the chosen bandwidths coincide quite often. But there is also a slight left skewness observable. This indicates a tendency of the AAWE method to smooth more strongly than the ASE procedure. Figure 2 shows a "typical" example where the ASE bandwidth is smaller than the AAWE bandwidth. Since the conditional distribution in the peaks is much flatter than in the valleys, where the conditional distribution is very steep, deviations in the valleys are in the AAWE sense more important than errors in the peaks. This perspective is quite natural for quantile estimation, especially when we think of doing quantile forecasts.
Many estimators in nonparametric mean regression are linear in the observations: they are of the form ŷ = m̂(x) = Hy, where the matrix H is commonly called the smoother matrix and depends on x but not on y. The trace of H can be interpreted as the effective number of parameters used in the smoothing (e.g. Hastie and Tibshirani (1990), sec. 3.5). One possible strategy to find a suitable smoothing parameter is to choose the bandwidth which is the minimizer of

    log(σ̂²) + Ψ(H),   (8)

where

    σ̂² = (1/n) Σ_{i=1}^n {y_i − m̂_h(X_i)}²   (9)

and Ψ(·) is a penalty function designed to decrease with increasing smoothness of m̂_h. Common choices of Ψ lead to GCV (Ψ(H) = −2 log{1 − tr(H)/n}), Rice's T (Ψ(H) = −log{1 − 2 tr(H)/n}) and AIC_c (Hurvich et al. (1998)) (Ψ(H) = {1 + tr(H)/n}/{1 − [tr(H)+2]/n}).
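Written out as code, the three penalties depend on the data only through tr(H) and n; a small sketch (function names are ours):

```python
from math import log

def psi_gcv(trH, n):
    """GCV penalty: Psi(H) = -2*log(1 - tr(H)/n)."""
    return -2.0 * log(1.0 - trH / n)

def psi_rice(trH, n):
    """Rice's T penalty: Psi(H) = -log(1 - 2*tr(H)/n)."""
    return -log(1.0 - 2.0 * trH / n)

def psi_aicc(trH, n):
    """AIC_c penalty (Hurvich et al., 1998):
    Psi(H) = {1 + tr(H)/n} / {1 - [tr(H) + 2]/n}."""
    return (1.0 + trH / n) / (1.0 - (trH + 2.0) / n)
```

All three grow with tr(H), so each penalizes rough fits (small bandwidths); they differ in how aggressively they do so as tr(H)/n grows.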
These smoothing parameter selectors can be adapted to quantile regression estimation. The first modification concerns log(σ̂²). Since the quantile estimator (2) falls into the class of M-estimators, we can proceed as usual in M-estimation (see e.g. Hampel et al. (1986)) and interpret the ρ_α function as a "−log-likelihood". So the AIC criterion and all the other above mentioned criteria can be adapted by using (1/n) Σ_{i=1}^n ρ_α(y_i − q̂_α(x_i)) instead of σ̂.

The second modification concerns the smoother matrix H. Estimator (2) does not lead to a linear estimator ŷ = Hy. Because the actual estimation is carried out by iteratively reweighted least squares, the smoother matrix H can be approximated by the implied smoother matrix from the last iteration of the iteratively reweighted least squares fit of the model.
With these modifications we arrive at the following strategy to find a suitable smoothing parameter for local linear quantile regression: choose the bandwidth minimizing

    2 log{ (1/n) Σ_{i=1}^n ρ_α(y_i − q̂_α(x_i)) } + Ψ(H),   (10)

where Ψ(·) is one of the above mentioned penalizing functions and H the approximate smoother matrix.
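Given fitted conditional quantiles and a penalty value, criterion (10) reduces to a few lines; the sketch below (our own naming) takes Ψ(H) as a precomputed number:

```python
from math import log

def rho(u, alpha):
    """Check function (3)."""
    return alpha * u if u >= 0 else (alpha - 1.0) * u

def quantile_criterion(y, q_hat, alpha, psi_H):
    """Criterion (10): 2*log{(1/n) sum rho_alpha(y_i - q_hat_i)} + Psi(H).
    `psi_H` is the penalty evaluated at the approximate smoother matrix."""
    n = len(y)
    avg_loss = sum(rho(yi - qi, alpha) for yi, qi in zip(y, q_hat)) / n
    return 2.0 * log(avg_loss) + psi_H
```

The bandwidth search then amounts to evaluating this number over a grid of h values and picking the minimizer.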
3 Simulation examples
In this section, some simulation results are presented. The underlying density functions were of the exponential type shown in equation (7). The two models

    Model I:  a = 1.5 + sin(2πx/100)   (11)
    Model II: a = 10 exp(−x/200)   (12)

with x = 1,...,400 are considered. For each setting 100 repetitions were calculated. The 0.25- and 0.75-quantiles were estimated for both models. Bandwidths are chosen with the help of the above discussed methods based on penalizing functions and in addition with the cross-validation method, which chooses

    h_CV = arg min_h Σ_{i=1}^n ρ_α(Y_i − q̂_α^{(−i)}(X_i)),   (13)

where q̂_α^{(−i)}(X_i) is the so called leave-one-out estimator. This is the estimator for the conditional quantile at X_i which is calculated without the observation (Y_i, X_i). To avoid boundary effects only the 200 observations x = 101,...,300 in the middle are used for bandwidth choice.
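Criterion (13) can be sketched as a grid search. For brevity this illustration stands a Nadaraya-Watson-type weighted-quantile estimator in for the full local linear fit (the paper uses such a kernel estimator only for initialisation), and all helper names are ours:

```python
import numpy as np

def rho(u, alpha):
    """Check function (3), vectorized."""
    return np.where(u >= 0, alpha * u, (alpha - 1.0) * u)

def weighted_quantile(y, w, alpha):
    """Quantile of y under weights w: smallest y whose cumulative
    normalized weight reaches alpha."""
    order = np.argsort(y)
    cw = np.cumsum(w[order]) / np.sum(w)
    idx = min(int(np.searchsorted(cw, alpha)), len(y) - 1)  # guard rounding
    return y[order][idx]

def cv_score(X, Y, alpha, h):
    """Leave-one-out criterion (13) with a kernel conditional-quantile
    estimator; observation i gets zero weight in its own fit."""
    loss = 0.0
    for i in range(len(Y)):
        k = np.exp(-0.5 * ((X[i] - X) / h) ** 2)
        k[i] = 0.0                      # leave observation i out
        q_i = weighted_quantile(Y, k, alpha)
        loss += float(rho(Y[i] - q_i, alpha))
    return loss

def choose_h_cv(X, Y, alpha, grid):
    """h_CV = argmin over the grid of the leave-one-out score."""
    scores = [cv_score(X, Y, alpha, h) for h in grid]
    return grid[int(np.argmin(scores))]
```

In practice one would restrict the sum in cv_score to interior design points, as the paper does, to avoid boundary effects.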
The estimated densities of log(h_CV/h_AAWE), log(h_GCV/h_AAWE) and log(h_AICc/h_AAWE) for the two quantiles and both models are shown in Figures 3-6. We also calculated Rice's T, but the results are quite similar to the AIC_c criterion, so these results are not shown in the graphs. The mean of h_AAWE for the 0.25 quantiles differs from the mean for the 0.75 quantiles, which is 58.1. This difference in the means confirms the need for methods which can handle asymmetric data patterns.
Figure 3: Estimated densities of log(h/h_AAWE) for 0.25 quantiles of Model I (panels: CV, GCV, AIC_c)
Figure 3 shows the results for the 0.25 quantiles of Model I. The three estimated densities all have modes around zero. But the peaks for the penalizing methods are higher and sharper than for the cross-validation method, where the density is flatter.

Also in Figure 4, which presents the results for the 0.75 quantiles of Model I, the cross-validation density is relatively flat. But in this case it is the only density with mode around zero. The penalizing methods tend to undersmooth.

Figure 4: Estimated densities of log(h/h_AAWE) for 0.75 quantiles of Model I (panels: CV, GCV, AIC_c)

A similar behaviour can be observed for Model II, visualized in Figures 5 and 6. The mean of h_AAWE for α = 0.25 is 109.6 and for α = 0.75 the mean is 245.6. This difference is again a result of the asymmetric density. And just as for Model I, the penalizing methods tend to undersmooth the upper quantile.
These results remain unchanged when h_ASE is used as reference bandwidth instead of h_AAWE, because in these examples the difference between h_ASE and h_AAWE is not that large.
Figure 7 shows the estimated densities for the 0.75 quantiles of Model I, but now 100 observations are used for bandwidth choice instead of 200. The penalizing methods still undersmooth, but the smaller sample size leads to stronger differences between the methods. The AIC_c method undersmooths less than the GCV method.
Figure 5: Estimated densities of log(h/h_AAWE) for 0.25 quantiles of Model II (panels: CV, GCV, AIC_c)

To sum up the simulation results, it can be stated that the penalizing function based methods for bandwidth choice can lead to a reduction in variability compared with the cross-validation method. But for this we have to take into account the tendency of penalizing methods to undersmooth when large bandwidths are appropriate. Maybe this disadvantage can be brought under control with the development of adapted penalizing functions. Simulations based on smaller sample sizes show that the AIC_c penalizing function undersmooths less than some other penalizing functions.
Figure 6: Estimated densities of log(h/h_AAWE) for 0.75 quantiles of Model II (panels: CV, GCV, AIC_c)
Figure 7: Estimated densities of log(h/h_AAWE) for 0.75 quantiles of Model I (bandwidth choice with 100 observations; panels: CV, GCV, AIC_c)
References

Abberger K. (1998): Cross-validation in nonparametric quantile regression. Allgemeines Statistisches Archiv, 82, 149-161.

Bickel P.J., Doksum K.A. (2001): Mathematical Statistics. Longman Higher Education, New Jersey.

Grund B., Hall P., Marron J.S. (1994): Loss and risk in smoothing parameter selection. Journal of Nonparametric Statistics, 4, 107-132.

Hampel F.R., Ronchetti E.M., Rousseeuw P.J., Stahel W.A. (1986): Robust Statistics. Wiley, New York.

Härdle W., Hall P., Marron J.S. (1988): How far are automatically chosen regression smoothing parameters from their optimum? Journal of the American Statistical Association, 83, 86-101.

Hastie T.J., Tibshirani R.J. (1990): Generalized Additive Models. Chapman and Hall, New York.

Heiler S. (2000): Nonparametric Time Series Analysis. In: A Course in Time Series Analysis, edited by D. Pena and G.C. Tiao. John Wiley, London.

Hurvich C.M., Simonoff J.S., Tsai C.-L. (1998): Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society, Ser. B, 60, 271-293.

Hurvich C.M., Tsai C.-L. (1989): Regression and time series model selection in small samples. Biometrika, 76, 297-307.

Koenker R., Bassett G. (1978): Regression quantiles. Econometrica, 46, 33-50.

Koenker R., Portnoy S., Ng P. (1992): Nonparametric estimation of conditional quantile functions. In: L1-Statistical Analysis and Related Methods (ed. Y. Dodge), North-Holland, New York.

Mammen E. (1990): A short note on optimal bandwidth selection for kernel estimators. Statistics and Probability Letters, 9, 23-25.

Yu K., Jones M.C. (1998): Local linear quantile regression. Journal of the American Statistical Association, 93, 228-237.