Bandwidth choice in nonparametric quantile regression

Klaus Abberger, University of Konstanz, Germany
Abstract: In nonparametric mean regression various methods for bandwidth choice exist. These methods can roughly be divided into plug-in methods and methods based on penalizing functions. This paper uses the approach based on penalizing functions and adapts it to nonparametric quantile regression estimation, where bandwidth choice is still an unsolved problem. Various criteria for bandwidth choice are defined and compared in some simulation examples.

Key Words: nonparametric quantile regression, bandwidth choice, cross-validation, penalizing functions
1 Introduction
Although most regression investigations are concerned with the regression mean function, other aspects of the conditional distribution of Y given X are also often of interest. For fixed α ∈ (0,1), the quantile regression function gives the αth quantile q_α(x) of the conditional distribution of a response variable Y given the value X = x. It can be used to measure the effect of covariates not only in the center of a population, but also in the lower and upper tails. Of special interest is the case where the data pattern shows heteroscedasticities and asymmetries.
In the literature a number of nonparametric approaches to quantile regression have been discussed. These methods include spline smoothing, kernel estimation, nearest-neighbour estimation and locally weighted polynomial regression. Yu and Jones (1998) propose two kinds of local linear quantile regression. They also develop a rule-of-thumb bandwidth choice procedure based on the plug-in idea. Starting point is the asymptotically optimal bandwidth minimizing the MSE. Since this bandwidth depends on unknown quantities, the authors introduce some simplifying assumptions. These assumptions result in the bandwidth selection strategy

    h_α = h_mean {α(1−α) / φ(Φ⁻¹(α))²}^{1/5},   (1)

where φ and Φ are the standard normal density and distribution function and h_mean is a bandwidth choice for regression mean estimation with one of the several existing methods. As can be seen, this procedure leads to identical bandwidths for the α and (1−α) quantiles. Although this strategy might work very well in some situations, our special interest lies in asymmetric data patterns where the above rule is too restrictive.
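For illustration, rule (1) can be evaluated directly; the sketch below (a hypothetical helper, not from the paper) uses the standard library's normal distribution for φ and Φ⁻¹:

```python
from math import pi, exp, sqrt
from statistics import NormalDist

def h_rule_of_thumb(h_mean, alpha):
    """Yu-Jones rule of thumb, eq. (1):
    h_alpha = h_mean * {alpha(1-alpha) / phi(Phi^{-1}(alpha))^2}^{1/5}."""
    z = NormalDist().inv_cdf(alpha)        # Phi^{-1}(alpha)
    phi = exp(-z * z / 2.0) / sqrt(2.0 * pi)  # standard normal density at z
    return h_mean * (alpha * (1.0 - alpha) / phi ** 2) ** 0.2
```

The symmetry criticized in the text is immediate: the factor depends on α only through α(1−α) and φ(Φ⁻¹(α))², both symmetric around 1/2, so h_α = h_{1−α}.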
Abberger (1998) adapts the cross-validation idea to kernel quantile regression and presents some simulation examples. Also asymmetric data patterns based on the lognormal distribution are included.
This paper tries to use penalizing function based criteria to choose the bandwidth in nonparametric quantile regression. In the next section these criteria are presented, and simulation examples are discussed in Section 3.
2 Quantile estimation and bandwidth choice
A locally weighted linear quantile regression estimator is defined by setting q̂_α(x) = â, where â and b̂ minimize

    Σ_{i=1}^n ρ_α(Y_i − a − b(X_i − x)) K((x − X_i)/h)   (2)

with the check function

    ρ_α(u) = α 1_{u≥0}(u) u + (α − 1) 1_{u<0}(u) u   (3)

introduced by Koenker and Bassett (1978) in connection with parametric quantile regression. For a discussion of this estimator see Heiler (2000) or Yu and Jones (1998), who also derive the MSE of this estimator. To calculate q̂_α we use an iteratively reweighted least squares algorithm. Initial estimates are conditional quantiles calculated with a kernel estimator of the Nadaraya-Watson type (see Heiler (2000)).
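The minimization of (2) by iteratively reweighted least squares can be sketched as follows. This is a minimal illustration under our own assumptions (Gaussian kernel, a locally weighted least squares start instead of the Nadaraya-Watson initialisation, fixed iteration count, hypothetical function names), not the paper's implementation:

```python
import numpy as np

def rho(u, alpha):
    """Check function rho_alpha(u), eq. (3)."""
    return np.where(u >= 0, alpha * u, (alpha - 1.0) * u)

def local_linear_quantile(x0, X, Y, alpha, h, n_iter=50, eps=1e-8):
    """Locally weighted linear quantile fit at x0 via IRLS, eq. (2).
    Returns a_hat, the estimated alpha-quantile at x0."""
    k = np.exp(-0.5 * ((x0 - X) / h) ** 2)          # Gaussian kernel weights
    Z = np.column_stack([np.ones_like(X), X - x0])  # local linear design
    # start from a locally weighted least squares (mean regression) fit
    beta = np.linalg.lstsq(Z * np.sqrt(k)[:, None], Y * np.sqrt(k), rcond=None)[0]
    for _ in range(n_iter):
        r = Y - Z @ beta
        # IRLS weights rho_alpha(r)/r^2, i.e. alpha/|r| or (1-alpha)/|r|,
        # guarded against residuals near zero
        w = k * np.where(r >= 0, alpha, 1.0 - alpha) / np.maximum(np.abs(r), eps)
        beta = np.linalg.lstsq(Z * np.sqrt(w)[:, None], Y * np.sqrt(w), rcond=None)[0]
    return beta[0]
```

At a fixed point of the weight update the weighted least squares objective matches Σ ρ_α(r_i) k_i, which is why the iteration approximates the minimizer of (2).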
Figure 1: Estimated density of log differences between ASE and AAWE optimal bandwidths for simulated data
Estimation by minimizing (2) can be interpreted as an M-estimator or, in the notation of Bickel and Doksum (2001), as a minimum contrast estimate with contrast function ρ_α. In general they define a discrepancy function

    D(θ₀, θ) = E_{θ₀} ρ(Y, θ),   (4)

which is minimized by the true value θ₀.
Nonparametric estimation of quantile regression requires the choice of a bandwidth. In nonparametric mean regression, procedures for bandwidth choice are usually grounded on the MSE. Various definitions of the optimal bandwidth are available. One candidate is the bandwidth that minimizes MISE (mean integrated squared error) for the given sample size and design. This bandwidth is optimal with respect to the average performance over all possible data sets for a given population, rather than for the performance on the observed data set. Another choice is the bandwidth that minimizes the average squared error (ASE) for the observed data set. Between these two concepts we choose the latter one. For further discussion of this issue see e.g. Mammen (1990), Grund et al. (1994), Härdle et al. (1988).
Another natural choice in quantile regression is based on the discrepancy function (4). It is

    E[ρ_α(Y − m(x))] = α(μ_Y(x) − m(x)) + ∫_{−∞}^{m(x)} F(y|x) dy,   (5)

and thus the optimal bandwidth is the one for which the corresponding quantile estimator minimizes

    (1/n) Σ_{i=1}^n { ∫_{−∞}^{q̂_α(X_i)} F(y|X_i) dy − α q̂_α(X_i) }.   (6)

In the sequel this criterion will be called average alpha weighted error (AAWE).
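To make (6) concrete: each summand needs only the conditional distribution function F and the fitted quantile. The sketch below (our own helper names; the standard normal F, the truncation point and the grid size are illustrative choices, not from the paper) evaluates it by trapezoidal quadrature:

```python
from statistics import NormalDist

def aawe_term(q_hat, F, alpha, lower=-10.0, n_grid=20000):
    """One summand of criterion (6): integral_{-inf}^{q_hat} F(y) dy - alpha*q_hat.
    The lower tail is truncated at `lower` (assumption: F(lower) is ~0);
    the integral is approximated by the trapezoidal rule."""
    step = (q_hat - lower) / n_grid
    ys = [lower + i * step for i in range(n_grid + 1)]
    integral = step * (sum(F(y) for y in ys) - 0.5 * (F(ys[0]) + F(ys[-1])))
    return integral - alpha * q_hat

def aawe(q_hats, Fs, alpha):
    """AAWE criterion (6): average of the per-observation terms."""
    return sum(aawe_term(q, F, alpha) for q, F in zip(q_hats, Fs)) / len(q_hats)
```

Since the derivative of the per-point term with respect to q̂ is F(q̂) − α, the criterion is minimized exactly at the true conditional α-quantile.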
The difference between ASE (which in quantile regression is (1/n) Σ (q_α(X_i) − q̂_α(X_i))²) and AAWE is demonstrated for a data pattern which is further considered in the simulation examples section. The true underlying distribution is exponential with density

    f(y) = a e^{−ay−1} 1_{{y > −1/a}}(y),  a > 0.   (7)
Figure 2: Estimated 0.75 quantiles with ASE and AAWE optimal bandwidths for a simulated data set (curves: true, AAWE, ASE)
This density is asymmetric and has expectation zero for all a > 0. With x = 1,...,600 we chose a = 1.5 + sin(2πx/100). For 1000 repetitions the bandwidths minimizing the average errors in estimating the 0.75-quantiles with a kernel estimator are calculated. Figure 1 shows the density of log(h_ASE/h_AAWE). There is a high peak at 0, indicating that the chosen bandwidths coincide quite often. But there is also a slight left skewness observable. This indicates a tendency of the AAWE method to smooth more strongly than the ASE procedure. Figure 2 shows a "typical" example where the ASE bandwidth is smaller than the AAWE bandwidth. Since the conditional distribution in the peaks is much flatter than in the valleys, where the conditional distribution is very steep, deviations in the valleys are in the AAWE sense more important than errors in the peaks. This perspective is quite natural for quantile estimation, especially when we think of doing quantile forecasts.
Many estimators in nonparametric mean regression are linear in the observations: they are of the form ŷ = m̂(x) = Hy, where the matrix H is commonly called the smoother matrix and depends on x but not on y. The trace of H can be interpreted as the effective number of parameters used in the smoothing (e.g. Hastie and Tibshirani (1990), sec. 3.5). One possible strategy to find a suitable smoothing parameter is to choose the bandwidth which is the minimizer of

    log(σ̂²) + Ψ(H),   (8)

where

    σ̂² = (1/n) Σ_{i=1}^n {y_i − m̂_h(X_i)}²   (9)

and Ψ(·) is a penalty function designed to decrease with increasing smoothness of m̂_h. Common choices of Ψ lead to GCV (Ψ(H) = −2 log{1 − tr(H)/n}), Rice's T (Ψ(H) = −log{1 − 2 tr(H)/n}) and AIC_c (Hurvich et al. (1998)) (Ψ(H) = {1 + tr(H)/n}/{1 − [tr(H)+2]/n}).
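Written out as code, the three penalties depend on the data only through tr(H) and n; a small sketch (function names are ours):

```python
from math import log

def psi_gcv(trH, n):
    """GCV penalty: Psi(H) = -2*log(1 - tr(H)/n)."""
    return -2.0 * log(1.0 - trH / n)

def psi_rice(trH, n):
    """Rice's T penalty: Psi(H) = -log(1 - 2*tr(H)/n)."""
    return -log(1.0 - 2.0 * trH / n)

def psi_aicc(trH, n):
    """AIC_c penalty (Hurvich et al., 1998):
    Psi(H) = {1 + tr(H)/n} / {1 - [tr(H) + 2]/n}."""
    return (1.0 + trH / n) / (1.0 - (trH + 2.0) / n)
```

All three grow with tr(H), so each penalizes rough fits (small bandwidths); they differ in how aggressively they do so as tr(H)/n grows.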
These smoothing parameter selectors can be adapted to quantile regression estimation. The first modification concerns log(σ̂²). Since the quantile estimator (2) falls into the class of M-estimators, we can proceed as usual in M-estimation (see e.g. Hampel et al. (1986)) and interpret the ρ_α function as a "−log-likelihood". So the AIC criterion and all the other above mentioned criteria can be adapted by using (1/n) Σ_{i=1}^n ρ_α(y_i − q̂_α(x_i)) instead of σ̂.

The second modification concerns the smoother matrix H. Estimator (2) does not lead to a linear estimator ŷ = Hy. Because the actual estimation is carried out by iteratively reweighted least squares, the smoother matrix H can be approximated by the implied smoother matrix from the last iteration of the iteratively reweighted least squares fit of the model.
With these modifications we arrive at the following strategy to find a suitable smoothing parameter for local linear quantile regression: choose the bandwidth minimizing

    2 log{ (1/n) Σ_{i=1}^n ρ_α(y_i − q̂_α(x_i)) } + Ψ(H),   (10)

where Ψ(·) is one of the above mentioned penalizing functions and H the approximate smoother matrix.
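Given fitted conditional quantiles and a penalty value, criterion (10) reduces to a few lines; the sketch below (our own naming) takes Ψ(H) as a precomputed number:

```python
from math import log

def rho(u, alpha):
    """Check function (3)."""
    return alpha * u if u >= 0 else (alpha - 1.0) * u

def quantile_criterion(y, q_hat, alpha, psi_H):
    """Criterion (10): 2*log{(1/n) sum rho_alpha(y_i - q_hat_i)} + Psi(H).
    `psi_H` is the penalty evaluated at the approximate smoother matrix."""
    n = len(y)
    avg_loss = sum(rho(yi - qi, alpha) for yi, qi in zip(y, q_hat)) / n
    return 2.0 * log(avg_loss) + psi_H
```

The bandwidth search then amounts to evaluating this number over a grid of h values and picking the minimizer.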
3 Simulation examples
In this section, some simulation results are presented. The underlying density functions were of the exponential type shown in equation (7). The two models

    Model I:  a = 1.5 + sin(2πx/100)   (11)
    Model II: a = 10 exp(−x/200)   (12)

with x = 1,...,400 are considered. For each setting 100 repetitions were calculated. The 0.25- and 0.75-quantiles were estimated for both models. Bandwidths are chosen with the help of the above discussed methods based on penalizing functions and in addition with the cross-validation method, which chooses

    h_CV = arg min_h Σ_{i=1}^n ρ_α(Y_i − q̂_α^{(−i)}(X_i)),   (13)

where q̂_α^{(−i)}(X_i) is the so called leave-one-out estimator. This is the estimator for the conditional quantile at X_i which is calculated without the observation (Y_i, X_i). To avoid boundary effects only the 200 observations x = 101,...,300 in the middle are used for bandwidth choice.
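Criterion (13) can be sketched as a grid search. For brevity this illustration stands a Nadaraya-Watson-type weighted-quantile estimator in for the full local linear fit (the paper uses such a kernel estimator only for initialisation), and all helper names are ours:

```python
import numpy as np

def rho(u, alpha):
    """Check function (3), vectorized."""
    return np.where(u >= 0, alpha * u, (alpha - 1.0) * u)

def weighted_quantile(y, w, alpha):
    """Quantile of y under weights w: smallest y whose cumulative
    normalized weight reaches alpha."""
    order = np.argsort(y)
    cw = np.cumsum(w[order]) / np.sum(w)
    idx = min(int(np.searchsorted(cw, alpha)), len(y) - 1)  # guard rounding
    return y[order][idx]

def cv_score(X, Y, alpha, h):
    """Leave-one-out criterion (13) with a kernel conditional-quantile
    estimator; observation i gets zero weight in its own fit."""
    loss = 0.0
    for i in range(len(Y)):
        k = np.exp(-0.5 * ((X[i] - X) / h) ** 2)
        k[i] = 0.0                      # leave observation i out
        q_i = weighted_quantile(Y, k, alpha)
        loss += float(rho(Y[i] - q_i, alpha))
    return loss

def choose_h_cv(X, Y, alpha, grid):
    """h_CV = argmin over the grid of the leave-one-out score."""
    scores = [cv_score(X, Y, alpha, h) for h in grid]
    return grid[int(np.argmin(scores))]
```

In practice one would restrict the sum in cv_score to interior design points, as the paper does, to avoid boundary effects.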
The estimated densities of log(h_CV/h_AAWE), log(h_GCV/h_AAWE) and log(h_AICc/h_AAWE) for the two quantiles and both models are shown in Figures 3-6. We also calculated Rice's T, but the results are quite similar to the AIC_c criterion, so these results are not shown in the graphs. The mean of h_AAWE for the 0.25 quantiles differs from the mean for the 0.75 quantiles, which is 58.1. This difference in the means confirms the need for methods which can handle asymmetric data patterns.
Figure 3: Estimated densities of log(h/h_AAWE) for 0.25 quantiles of Model I (panels: CV, GCV, AIC_c)
Figure 3 shows the results for the 0.25 quantiles of Model I. The three estimated densities all have modes around zero. But the peaks for the penalizing methods are higher and sharper than for the cross-validation method, where the density is flatter.

Also in Figure 4, which presents the results for the 0.75 quantiles of Model I, the cross-validation density is relatively flat. But in this case it is the only density with mode around zero. The penalizing methods tend to undersmooth.

Figure 4: Estimated densities of log(h/h_AAWE) for 0.75 quantiles of Model I (panels: CV, GCV, AIC_c)

A similar behaviour can be observed for Model II, visualized in Figures 5 and 6. The mean of h_AAWE for α = 0.25 is 109.6 and for α = 0.75 the mean is 245.6. This difference is again a result of the asymmetric density. And just as for Model I, the penalizing methods tend to undersmooth the upper quantile.
These results remain unchanged when h_ASE is used as reference bandwidth instead of h_AAWE, because in these examples the difference between h_ASE and h_AAWE is not that large.
Figure 7 shows the estimated densities for the 0.75 quantiles of Model I, but now 100 observations are used for bandwidth choice instead of 200. The penalizing methods still undersmooth, but the smaller sample size leads to stronger differences between the methods. The AIC_c method undersmooths less than the GCV method.
Figure 5: Estimated densities of log(h/h_AAWE) for 0.25 quantiles of Model II (panels: CV, GCV, AIC_c)

To sum up the simulation results, it can be stated that the penalizing function based methods for bandwidth choice can lead to a reduction in variability compared with the cross-validation method. But for this we have to take into account the tendency of penalizing methods to undersmooth when large bandwidths are appropriate. Maybe this disadvantage can be brought under control with the development of adapted penalizing functions. Simulations based on smaller sample sizes show that the AIC_c penalizing function undersmooths less than some other penalizing functions.
Figure 6: Estimated densities of log(h/h_AAWE) for 0.75 quantiles of Model II (panels: CV, GCV, AIC_c)
Figure 7: Estimated densities of log(h/h_AAWE) for 0.75 quantiles of Model I (bandwidth choice with 100 observations; panels: CV, GCV, AIC_c)
References

Abberger K. (1998): Cross-validation in nonparametric quantile regression. Allgemeines Statistisches Archiv, 82, 149-161.

Bickel P.J., Doksum K.A. (2001): Mathematical Statistics. Longman Higher Education, New Jersey.

Grund B., Hall P., Marron J.S. (1994): Loss and risk in smoothing parameter selection. Journal of Nonparametric Statistics, 4, 107-132.

Hampel F.R., Ronchetti E.M., Rousseeuw P.J., Stahel W.A. (1986): Robust Statistics. Wiley, New York.

Härdle W., Hall P., Marron J.S. (1988): How far are automatically chosen regression smoothing parameters from their optimum? Journal of the American Statistical Association, 83, 86-101.

Hastie T.J., Tibshirani R.J. (1990): Generalized Additive Models. Chapman and Hall, New York.

Heiler S. (2000): Nonparametric Time Series Analysis. In: A Course in Time Series Analysis, edited by D. Pena and G.C. Tiao. John Wiley, London.

Hurvich C.M., Simonoff J.S., Tsai C.-L. (1998): Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society, Ser. B, 60, 271-293.

Hurvich C.M., Tsai C.-L. (1989): Regression and time series model selection in small samples. Biometrika, 76, 297-307.

Koenker R., Bassett G. (1978): Regression quantiles. Econometrica, 46, 33-50.

Koenker R., Portnoy S., Ng P. (1992): Nonparametric estimation of conditional quantile functions. In: L1-Statistical Analysis and Related Methods (ed. Y. Dodge), North-Holland, New York.

Mammen E. (1990): A short note on optimal bandwidth selection for kernel estimators. Statistics and Probability Letters, 9, 23-25.

Yu K., Jones M.C. (1998): Local linear quantile regression. Journal of the American Statistical Association, 93, 228-237.