
2.3 Testing for a Parametric Regression Function via the Conditional Quantile Function


In Section 1.2, some approaches for testing for a parametric regression function were presented. The main idea in Section 2.2 was then to take these approaches and modify them in order to obtain valid tests in semiparametric transformation models. Although the approach presented here follows the same spirit, the tools used in this section differ slightly from those in Section 2.2. There, the influence of estimating the transformation parameter in semiparametric models was described by an appropriate Taylor expansion, where asymptotic negligibility of higher-order terms of the expansion is ensured by appropriate integrability conditions on the parametric transformation function and its derivatives.

In the infinite dimensional nonparametric setting, one has to proceed differently since, on the one hand, Taylor expansions cannot be applied as easily as for parametric transformations and, on the other hand, the available estimators of the transformation in general only yield satisfactory uniform convergence rates on compact sets; see for example the results of Chiappori et al. (2015) and Colling and Van Keilegom (2019). Therefore, a new testing approach, which is extended to nonparametric transformation models in Section 2.4, is presented in this section.

Although the aim is to test for a parametric regression function, the method provided here is related to testing for a parametric quantile function. Chiappori et al. (2015) already suggested the estimation of conditional quantiles. See Zheng (1998), Bierens and Ginther (2001), Horowitz and Spokoiny (2002), He and Zhu (2003) and Horowitz and Lee (2009) for some testing approaches in the context of quantile regression, or Zheng (2000) for the related question of testing for a parametric conditional distribution function. The test in this section uses a Cramér-von-Mises-type test statistic based on the inverse function of a kernel estimator of the conditional distribution function. In this regard, the testing approach differs from the tests mentioned above and, to the author's knowledge, also from other

tests in the literature. Hence, the asymptotic behaviour is examined in detail in Subsection 2.3.2.

2.3.1 The Test Statistic

From now on, the regression function is allowed to have an arbitrary intercept under the null hypothesis. Usually in regression models, the intercept is estimated as a part of the regression function anyway, so that this does not severely reduce the generality of the model. Although assuming G_B in (2.2) to be closed with respect to the addition of constants would be possible as well, the adjusted null hypothesis

H_0 : g \in \mathcal{G}_B + \mathbb{R} = \{ x \mapsto g_\beta(x) + c : \beta \in B,\ c \in \mathbb{R} \}    (2.6)

will be considered for reasons of comprehensibility. Here, β and c are identified under assumption (A7) from Section 2.7, which will be introduced and discussed later.

Let (Y_i, X_i), i = 1, ..., n, be realisations of model (2.1) and let τ ∈ (0,1). Let F_ε be the distribution function of ε, and denote the τ-quantile of the distribution of a random variable Z and of its conditional distribution given X = x by F_Z^{-1}(τ) and F_{Z|X}^{-1}(τ|x), respectively. Due to

F_{Y|X}^{-1}(\tau|x) = E[Y|X=x] + F_\varepsilon^{-1}(\tau) = g(x) + F_\varepsilon^{-1}(\tau),    (2.7)

there is a strong connection between the conditional τ-quantile and the conditional expectation. Many Cramér-von-Mises-type tests like that of Härdle and Mammen (1993) take advantage of the fact that g ∈ G_B is equivalent to (E[Y|X = x] − g_{β_0}(x))² = 0 for all x ∈ ℝ^{d_X} and some β_0 ∈ B. Referring to (2.7), another condition, which is equivalent to (2.2), is

\big(F_{Y|X}^{-1}(\tau|x) - g_{\beta_0}(x) - F_\varepsilon^{-1}(\tau)\big)^2 = 0 \quad \text{for all } x \in \mathbb{R}^{d_X} \text{ and some } \beta_0 \in B.    (2.8)

This condition can be translated to the context of (2.6) as

\big(F_{Y|X}^{-1}(\tau|x) - g_{\beta_0}(x) - c\big)^2 = 0 \quad \text{for all } x \in \mathbb{R}^{d_X} \text{ and some } \beta_0 \in B,\ c \in \mathbb{R}.    (2.9)

Let v be a weighting function with compact support in ℝ^{d_X}, such that for all τ ∈ (0,1) condition (2.9) and

v(x)\big(F_{Y|X}^{-1}(\tau|x) - g_{\beta_0}(x) - c\big)^2 = 0 \quad \text{for all } x \in \mathbb{R}^{d_X} \text{ and some } \beta_0 \in B,\ c \in \mathbb{R}    (2.10)

are equivalent. Thanks to (2.7), for all τ, χ ∈ (0,1) the function x ↦ F_{Y|X}^{-1}(τ|x) − F_{Y|X}^{-1}(χ|x) is constant, so that equation (2.10) can be extended to multiple quantiles. For this purpose, let μ be a finite measure with compact support in (0,1). Then, (2.10) is equivalent to

\min_{c \in \mathbb{R}} \sup_{x \in \mathbb{R}^{d_X}} v(x)\big(F_{Y|X}^{-1}(\tau|x) - g_{\beta_0}(x) - c\big)^2 = 0 \quad \text{for all } \tau \in (0,1) \text{ and some } \beta_0 \in B,

so that

\min_{\beta \in B} \int \min_{c \in \mathbb{R}} \int v(x)\big(F_{Y|X}^{-1}(\tau|x) - g_\beta(x) - c\big)^2 \, dx \, \mu(d\tau) = 0.    (2.11)
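As an illustration, the population criterion in (2.11) can be approximated on grids: the inner minimum over c is attained at the v-weighted average of the residuals (cf. Remark 2.3.1). The following Python sketch assumes known inputs — the quantile function `q_fun`, the parametric family `g_fun` and the weight `v_fun` are hypothetical stand-ins, not objects defined in the text.

```python
import numpy as np

def criterion(beta, q_fun, g_fun, x_grid, tau_grid, v_fun):
    """Grid approximation of the double integral in (2.11) for a fixed beta.

    q_fun(tau, x): conditional tau-quantile F_{Y|X}^{-1}(tau|x)
    g_fun(beta, x): parametric candidate g_beta(x)
    v_fun(x): weight function v with compact support
    (all three are hypothetical stand-ins for illustration)
    """
    v = v_fun(x_grid)
    total = 0.0
    for tau in tau_grid:
        r = q_fun(tau, x_grid) - g_fun(beta, x_grid)   # residuals at level tau
        c = np.sum(v * r) / np.sum(v)                  # inner minimum over c
        total += np.mean(v * (r - c) ** 2)             # approximates the dx-integral
    return total / len(tau_grid)                       # approximates the mu(dtau)-integral
```

Under the null with, say, q(τ|x) = g_{β_0}(x) + F_ε^{-1}(τ), the criterion vanishes at β_0 regardless of the τ-dependent shift, reflecting that the intercept c absorbs F_ε^{-1}(τ).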

Equation (2.11) will be the basis of the test statistic. Recall the definitions of Section 1.1, let K, h_x and h_y be some kernel function and some bandwidths, respectively, and define K_{h_y}(y) = \frac{1}{h_y} K\big(\frac{y}{h_y}\big) as well as

\mathcal{K}(y) = \int_{-\infty}^{y} K(u)\,du, \qquad \mathcal{K}_{h_y}(y) = \int_{-\infty}^{y} K_{h_y}(u)\,du,

K(x_1, \dots, x_{d_X}) = \prod_{i=1}^{d_X} K(x_i), \qquad K_{h_x}(x_1, \dots, x_{d_X}) = \prod_{i=1}^{d_X} K_{h_x}(x_i)

and

\hat{p}(y,x) = \frac{1}{n} \sum_{i=1}^{n} \mathcal{K}_{h_y}(y - Y_i) K_{h_x}(x - X_i),

\hat{f}_X(x) = \frac{1}{n} \sum_{i=1}^{n} K_{h_x}(x - X_i), \qquad \hat{F}_{Y|X}(y|x) = \frac{\hat{p}(y,x)}{\hat{f}_X(x)}.    (2.12)
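The estimator F̂_{Y|X} in (2.12) can be sketched directly. The snippet below is a one-dimensional illustration assuming a Gaussian kernel K (the text leaves K generic), for which the integrated kernel 𝒦 is the standard normal distribution function; all names and bandwidth values are illustrative.

```python
import numpy as np
from math import erf, sqrt, pi

def cond_cdf_hat(y, x, Y, X, hy, hx):
    """Smoothed kernel estimator of F_{Y|X}(y|x) as in (2.12) for d_X = 1.

    Assumes a Gaussian kernel, so the integrated kernel evaluated at
    (y - Y_i)/hy is the standard normal cdf."""
    Ky = np.array([0.5 * (1.0 + erf(u / sqrt(2.0))) for u in (y - Y) / hy])
    Kx = np.exp(-0.5 * ((x - X) / hx) ** 2) / (sqrt(2.0 * pi) * hx)
    # hat p(y, x) / hat f_X(x); the common factor 1/n cancels
    return float(np.sum(Ky * Kx) / np.sum(Kx))
```

Since 𝒦 is non-decreasing and the weights K_{h_x}(x − X_i) are non-negative, the resulting estimate is automatically non-decreasing in y and takes values in [0, 1] — exactly what the quantile inversion in the next step requires.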

Now, estimate the conditional τ-quantile F_{Y|X}^{-1}(τ|x) = g(x) + F_ε^{-1}(τ) via \hat{F}_{Y|X}^{-1}(τ|x) and define the test statistic as

T_n = n h_x^{d_X/2} \min_{\beta \in B} \int \min_{c \in \mathbb{R}} \int v(x) \big(\hat{F}_{Y|X}^{-1}(\tau|x) - g_\beta(x) - c\big)^2 \, dx \, \mu(d\tau).    (2.13)

Here and in the following, F_{Y|X}^{-1} is assumed to be the quantile function if the inverse function of F_{Y|X} does not exist.
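When F̂_{Y|X} is computed on a grid of y-values, the generalized inverse needed for F̂_{Y|X}^{-1}(τ|x) in (2.13) is a one-liner; a minimal sketch (grid and variable names are illustrative):

```python
import numpy as np

def cond_quantile_hat(tau, cdf_vals, y_grid):
    """Quantile function of an estimated conditional cdf on a grid:
    the smallest grid point y with cdf_vals >= tau, which coincides
    with the inverse function whenever the latter exists.
    cdf_vals must be non-decreasing, as is guaranteed for (2.12)."""
    idx = int(np.searchsorted(cdf_vals, tau, side="left"))
    return y_grid[min(idx, len(y_grid) - 1)]
```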

Remark 2.3.1 The inner minimization in (2.13) can be done analytically by solving

\frac{\partial}{\partial c} \int v(x)\big(\hat{F}_{Y|X}^{-1}(\tau|x) - g_\beta(x) - c\big)^2 \, dx = -2 \int v(x)\big(\hat{F}_{Y|X}^{-1}(\tau|x) - g_\beta(x) - c\big) \, dx = 0,

that is,

T_n = n h_x^{d_X/2} \min_{\beta \in B} \int\!\!\int v(x)\big(\hat{F}_{Y|X}^{-1}(\tau|x) - g_\beta(x) - \hat{c}_{\beta,\tau}\big)^2 \, dx \, \mu(d\tau)

    = n h_x^{d_X/2} \min_{\beta \in B} \int\!\!\int v(x) \left( \hat{F}_{Y|X}^{-1}(\tau|x) - g_\beta(x) - \frac{\int v(w)\big(\hat{F}_{Y|X}^{-1}(\tau|w) - g_\beta(w)\big)\,dw}{\int v(w)\,dw} \right)^2 dx \, \mu(d\tau)

with

\hat{c}_{\beta,\tau} = \frac{\int v(w)\big(\hat{F}_{Y|X}^{-1}(\tau|w) - g_\beta(w)\big)\,dw}{\int v(w)\,dw}.    (2.14)
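Numerically, the closed form (2.14) is just a v-weighted average; a quick sketch with equal quadrature weights (an assumption of this illustration) confirms that it beats every other constant:

```python
import numpy as np

def c_hat(resid, v):
    """Closed-form inner minimizer (2.14): the v-weighted average of the
    residuals hat F_{Y|X}^{-1}(tau|.) - g_beta(.) evaluated on a grid."""
    return float(np.sum(v * resid) / np.sum(v))
```

Because the objective is quadratic in c, this weighted mean is the unique minimizer; replacing c by ĉ_{β,τ} therefore reduces the outer problem to a minimization over β only.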

2.3.2 Asymptotic Behaviour of the Test Statistic

In the following, the asymptotic behaviour of T_n is examined. Consider the local alternatives

H_{1,n} : g(x) = g_{\beta_0}(x) + c_0 + c_n \Delta_n(x) \quad \text{for all } x \in \operatorname{supp}(v)    (2.15)

for some fixed β_0 ∈ B, c_0 ∈ ℝ, and define

Y_0 = g_{\beta_0}(X) + c_0 + \varepsilon.

Here, c_n = n^{-1/2} h_x^{-d_X/4} and Δ_n is assumed to be uniformly bounded in x and n. Similarly to (2.14), define

c_{\beta,\tau} = \frac{\int v(x)\big(F_{Y_0|X}^{-1}(\tau|x) - g_\beta(x)\big)\,dx}{\int v(x)\,dx}.    (2.16)

As before, the conditional distribution function of Y_0 given X, the (integrated) joint density of (Y_0, X) etc. are denoted by F_{Y_0|X}, f_{Y_0,X} (p_0) etc. Let D_β and Hess denote the derivative and the Hessian of a function with respect to β. Let

\Omega = \int v(x) \left( \frac{\int v(w)\big(D_\beta g_{\beta_0}(x) - D_\beta g_{\beta_0}(w)\big)\,dw}{\int v(w)\,dw} \right)^{\!t} \left( \frac{\int v(w)\big(D_\beta g_{\beta_0}(x) - D_\beta g_{\beta_0}(w)\big)\,dw}{\int v(w)\,dw} \right) dx    (2.17)

be positive definite (this will be assumption (A7) in Section 2.7). Moreover, define

\kappa(x, \tau) = \frac{v(x)}{f_{Y_0|X}\big(F_{Y_0|X}^{-1}(\tau|x)\,\big|\,x\big)^2 f_X(x)^2}.    (2.18)

Before the main result is presented, an auxiliary lemma is given. The assumptions are listed in Section 2.7.

Lemma 2.3.2 Let τ ∈ (0,1). Assume model (2.1) under (A1), (A3)–(A6) from Section 2.7. Further, let

h_x^{3 - d_X/2} \to 0, \quad h_y^2 h_x^{-d_X/2} \to 0, \quad h_y h_x^{1 - d_X/2} \to 0    (2.19)

or let \int K(w)^2 w \, dw := \big( \int K(w)^2 w_j \, dw \big)_{j=1,\dots,d_X} = 0 \in \mathbb{R}^{d_X} and

h_x^{3 - d_X/2} \to 0, \quad h_y^2 h_x^{-d_X/2} \to 0, \quad h_y h_x^{2 - d_X/2} \to 0.    (2.20)

Then, one has

n h_x^{d_X/2} \int\!\!\int v(x) \big(\hat{F}_{Y_0|X}^{-1}(\tau|x) - F_{Y_0|X}^{-1}(\tau|x)\big)^2 \, dx \, \mu(d\tau) - b \xrightarrow{\;D\;} Z    (2.21)

with Z ∼ N(0, V),

V = 2 \int \left( \int K(x) K(x+s)\,dx \right)^{\!2} ds \int \frac{v(w)^2}{f_X(w)^2}\,dw \int_0^1\!\!\int_0^1 \left( \int \frac{(I\{u_1 \le \tau\} - \tau)(I\{u_2 \le \tau\} - \tau)}{f_\varepsilon(F_\varepsilon^{-1}(\tau))^2}\, \mu(d\tau) \right)^{\!2} du_1\,du_2    (2.22)

and

b = h_x^{-d_X/2} \int K(w)^2\,dw \int\!\!\int \kappa(x,\tau)\, p_0\big(F_{Y_0|X}^{-1}(\tau|x), x\big) \left( 1 - \frac{p_0\big(F_{Y_0|X}^{-1}(\tau|x), x\big)}{f_X(x)} \right) dx\,\mu(d\tau)

  \; - \; h_y h_x^{-d_X/2} \int K(w)^2\,dw \int 2 z K(z) \mathcal{K}(z)\,dz \int\!\!\int \kappa(x,\tau)\, f_{Y_0,X}\big(F_{Y_0|X}^{-1}(\tau|x), x\big) \left( 1 - \frac{p_0\big(F_{Y_0|X}^{-1}(\tau|x), x\big)}{f_X(x)} \right) dx\,\mu(d\tau)

  \; + \; h_x^{1-d_X/2} \int\!\!\int \kappa(x,\tau)\, \frac{\partial}{\partial u} \left[ p_0\big(F_{Y_0|X}^{-1}(\tau|x), u\big) \left( 1 - \frac{p_0\big(F_{Y_0|X}^{-1}(\tau|x), u\big)}{f_X(u)} \right) \right]_{u=x} dx\,\mu(d\tau) \int K(w)^2 w\,dw

  \; + \; h_x^{2-d_X/2} \int K(w)^2 w^t \int\!\!\int \kappa(x,\tau) \left[ \frac{\partial^2}{\partial u^2} p_0\big(F_{Y_0|X}^{-1}(\tau|x), u\big)\Big|_{u=x} - 2\, \frac{p_0\big(F_{Y_0|X}^{-1}(\tau|x), x\big)}{f_X(x)}\, \frac{\partial^2}{\partial u^2} p_0\big(F_{Y_0|X}^{-1}(\tau|x), u\big)\Big|_{u=x} + \frac{p_0\big(F_{Y_0|X}^{-1}(\tau|x), x\big)^2}{f_X(x)^2}\, \frac{\partial^2}{\partial x^2} f_X(x) \right] dx\,\mu(d\tau)\, w\,dw.

Here, \frac{\partial^2}{\partial x^2} f_X(x) denotes the Hessian of f_X. If

h_x^{1 - d_X/2} \to 0, \quad h_y h_x^{-d_X/2} \to 0    (2.23)

or \int K(w)^2 w\,dw = 0 and

h_x^{2 - d_X/2} \to 0, \quad h_y h_x^{-d_X/2} \to 0,    (2.24)

b simplifies to

b = h_x^{-d_X/2} \int K(w)^2\,dw \int \frac{v(x)}{f_X(x)}\,dx \int \frac{\tau(1-\tau)}{f_\varepsilon(F_\varepsilon^{-1}(\tau))^2}\,\mu(d\tau) + o(1).    (2.25)

The proof is given in Section 2.8.2.
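For a concrete kernel, the leading term of the simplified bias (2.25) is elementary to evaluate. The sketch below assumes a Gaussian product kernel, for which ∫K(w)² dw = (2√π)^(−d_X); the two remaining integrals are passed precomputed and are hypothetical inputs.

```python
from math import sqrt, pi

def b_leading_term(hx, v_over_fX_int, quantile_int, dX=1):
    """Leading term of (2.25):
    b ~ hx**(-dX/2) * int K^2 * int v/f_X dx * int tau(1-tau)/f_eps(F_eps^{-1}(tau))^2 mu(dtau),
    with int K^2 = (2*sqrt(pi))**(-dX) for the Gaussian product kernel (an assumption)."""
    K2 = (2.0 * sqrt(pi)) ** (-dX)
    return hx ** (-dX / 2.0) * K2 * v_over_fX_int * quantile_int
```

Note that b grows like h_x^{−d_X/2}, so it must be subtracted from T_n as a centring term rather than treated as asymptotically negligible.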

Remark 2.3.3 1. Condition (2.19) requires 3 − d_X/2 > 0, that is, d_X ≤ 5; (2.24) leads to d_X ≤ 3 and (2.23) even to d_X = 1. In principle, b can alternatively be defined without any of these conditions as

b = h_x^{d_X/2} \int\!\!\int \kappa(x,\tau)\, E\!\left[ \left( \mathcal{K}_{h_y}\big(F_{Y_0|X}^{-1}(\tau|x) - g_{\beta_0}(X_1) - c_0 - \varepsilon_1\big) - \frac{p_0\big(F_{Y_0|X}^{-1}(\tau|x), x\big)}{f_X(x)} \right)^{\!2} K_{h_x}(x - X_1)^2 \right] dx\,\mu(d\tau).    (2.26)

There is a trade-off between how simple b is and how strict the bandwidth conditions are.

2. Let α ∈ ℝ^{d_B}. Then,

\alpha^t \Omega \alpha = \int v(x) \left( \left( D_\beta g_{\beta_0}(x) - \frac{\int v(w) D_\beta g_{\beta_0}(w)\,dw}{\int v(w)\,dw} \right) \alpha \right)^{\!2} dx

 = \int v(x) \left( D_\beta g_{\beta_0}(x)\alpha - \frac{\int v(w) D_\beta g_{\beta_0}(w)\alpha\,dw}{\int v(w)\,dw} \right)^{\!2} dx,

that is, positive definiteness of Ω is only violated if there exists some α ∈ ℝ^{d_B}, α ≠ 0, such that the map x ↦ D_β g_{β_0}(x)α is constant with respect to x. For example, this is the case if G_B already contains constant functions (e.g. polynomial functions with intercepts). Such a case will be excluded by assumption (A7). However, the test can be applied for the general class as well (see the explanation below).

In the following, the assumptions (2.19) and (2.20) are relaxed and expression (2.26) is used for b. The minimization with respect to c may cause the existence of multiple minimizing parameters β ∈ B, especially if G_B is already closed with respect to the addition of constants. To derive the asymptotic behaviour, it will be assumed that

\int\!\!\int v(x)\big(F_{Y_0|X}^{-1}(\tau|x) - g_\beta(x) - c_{\beta,\tau}\big)^2\,dx\,\mu(d\tau) > 0 \quad \text{for all } \beta \in B \text{ with } \beta \neq \beta_0,

that is, the class G_B is possibly shrunk to avoid multiple solutions β of the outer minimization. Nevertheless, since the value of the test statistic is not influenced by this shrinkage, the resulting test can be applied in the general case as well.

Theorem 2.3.4 Assume model (2.1). Further, let (A1)–(A8) from Section 2.7 hold and let b, V and Z be defined as in Lemma 2.3.2. Then,

T_n - b - \delta_n \xrightarrow{\;D\;} Z,

where

\delta_n = \mu([0,1]) \int v(x) \Bigg( \Delta_n(x) - \frac{\int v(w_1)\Delta_n(w_1)\,dw_1}{\int v(w_2)\,dw_2} - \left( D_\beta g_{\beta_0}(x) - \frac{\int v(w_3) D_\beta g_{\beta_0}(w_3)\,dw_3}{\int v(w_4)\,dw_4} \right) \Omega^{-1} \left( \int v(w_5)\Delta_n(w_5) \left( D_\beta g_{\beta_0}(w_5) - \frac{\int v(w_6) D_\beta g_{\beta_0}(w_6)\,dw_6}{\int v(w_7)\,dw_7} \right) dw_5 \right)^{\!t} \Bigg)^{\!2} dx.

Under H_0 (that is, Δ_n ≡ 0 and thus δ_n = 0), this leads to T_n - b \xrightarrow{\;D\;} Z.

The proof can be found in Section 2.8.3. Later, a hypothesis test will be deduced from Theorem 2.3.4. To see whether δ_n lies above some threshold δ > 0, that is, whether a test based on the asymptotic distribution of T_n would detect the local alternative, define

\tilde{\Delta}_n(x) = \Delta_n(x) - \frac{\int v(w_1)\Delta_n(w_1)\,dw_1}{\int v(w_2)\,dw_2}

as well as

\tilde{D}(x) = D_\beta g_{\beta_0}(x) - \frac{\int v(w_1) D_\beta g_{\beta_0}(w_1)\,dw_1}{\int v(w_2)\,dw_2}.

Then,

\Omega = \int v(x)\, \tilde{D}(x)^t \tilde{D}(x)\,dx.

Moreover, it can be shown by arguments similar to those in the proof of Remark 2.3.5 below that

\delta_n = \mu([0,1]) \int v(x) \left( \tilde{\Delta}_n(x) - \left( \int v(w) \tilde{\Delta}_n(w) \tilde{D}(w)\,dw \right) \Omega^{-1} \tilde{D}(x)^t \right)^{\!2} dx.

Let \tilde{\beta} be the minimizer

\tilde{\beta} = \arg\min_{\beta \in B} \int v(x) \big( \tilde{\Delta}_n(x) - \tilde{D}(x)\beta \big)^2\,dx.

By standard calculations, it can be shown that

\tilde{\beta} = \Omega^{-1} \int v(x)\, \tilde{\Delta}_n(x)\, \tilde{D}(x)^t\,dx

and

\delta_n = \mu([0,1]) \int v(x) \big( \tilde{\Delta}_n(x) - \tilde{D}(x)\tilde{\beta} \big)^2\,dx,

that is, δ_n is greater than zero if \tilde{\Delta}_n as a function is linearly independent of the components of \tilde{D}. For some fixed Δ_n ≡ Δ, \tilde{\beta} and δ_n are independent of n, so that

\delta_n = \delta = \mu([0,1]) \int v(x) \big( \tilde{\Delta}(x) - \tilde{D}(x)\tilde{\beta} \big)^2\,dx > 0,

if \tilde{\Delta} is linearly independent of the components of \tilde{D}. Such an orthogonality condition is quite intuitive and is often assumed explicitly, e.g., by Härdle and Mammen (1993). See the proof of the following remark for more details.

Remark 2.3.5 1. δ_n can alternatively be expressed as

\delta_n = \mu([0,1]) \int v(x)\Delta_n(x)^2\,dx \; - \; \mu([0,1]) \frac{\big( \int v(x)\Delta_n(x)\,dx \big)^2}{\int v(w)\,dw}

 \; - \; \mu([0,1]) \int v(x)\Delta_n(x) \left( D_\beta g_{\beta_0}(x) - \frac{\int v(w_1) D_\beta g_{\beta_0}(w_1)\,dw_1}{\int v(w_2)\,dw_2} \right) dx \; \Omega^{-1} \left( \int v(x)\Delta_n(x) \left( D_\beta g_{\beta_0}(x) - \frac{\int v(w_1) D_\beta g_{\beta_0}(w_1)\,dw_1}{\int v(w_2)\,dw_2} \right) dx \right)^{\!t}

 =: \delta_{1,n} - \delta_{2,n} - \delta_{3,n}.

2. Let model (2.15) hold for some sequence β_n with ‖β_n − β_0‖ = c_n and

\Delta_n(x) = \frac{g_{\beta_n}(x) - g_{\beta_0}(x)}{c_n}.

Then, δ_n = o(1), that is, the asymptotic behaviour of the test statistic is the same as for g(x) = g_{β_0}(x) + c_0. This is consistent with g = g_{β_0} + c_0 + c_n Δ_n = g_{β_n} + c_0 ∈ G_B + ℝ. The proof can be found on page 71.

The easiest way to construct a test with asymptotic level α for a given α ∈ (0,1) may consist in estimating b and V by some estimators \hat{b} and \hat{V} and rejecting H_0 if T_n > \hat{b} + \sqrt{\hat{V}}\, u_{1-\alpha}, where u_{1-α} denotes the (1−α)-quantile of the standard normal distribution. The corresponding test is

\Phi(Y_1, X_1, \dots, Y_n, X_n) = I\big\{ T_n > \hat{b} + \sqrt{\hat{V}}\, u_{1-\alpha} \big\}.    (2.27)

See Section 2.5 for more details.
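The decision rule (2.27) itself is straightforward once T_n, b̂ and V̂ are available; a minimal sketch using the standard normal quantile from the Python standard library:

```python
from statistics import NormalDist

def phi_reject(Tn, b_hat, V_hat, alpha):
    """Decision rule (2.27): reject H0 at asymptotic level alpha iff
    T_n > hat b + sqrt(hat V) * u_{1-alpha}, where u_{1-alpha} is the
    (1 - alpha)-quantile of the standard normal distribution."""
    u = NormalDist().inv_cdf(1.0 - alpha)
    return Tn > b_hat + (V_hat ** 0.5) * u
```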

Theorem 2.3.6 1. Assume (A1), (A3), (A5) from Section 2.7 as well as

\inf_{\tau \in \operatorname{supp}(\mu)} f_\varepsilon\big(F_\varepsilon^{-1}(\tau)\big) \inf_{x \in \operatorname{supp}(v)} f_X(x) > 0.    (2.28)

Moreover, let f_X, f_ε and g_{β_0} be uniformly continuous and assume

h_x,\; h_y,\; \frac{\log(n)}{n h_x^{d_X}},\; \frac{\log(n)}{n h_y} \to 0.

If \hat{b} and \hat{V} are some estimators of b and V such that \hat{b} = o_p\big(n h_x^{d_X/2}\big) and \sqrt{\hat{V}} = o_p\big(n h_x^{d_X/2}\big), one has

P\big(\Phi(Y_1, X_1, \dots, Y_n, X_n) = 1\big) \to 1

under fixed alternatives. In particular, the test is consistent under (A1)–(A8) with fixed c_n, Δ_n.

2. Assume (A1)–(A8) for Δ_n ≡ 0 and let \hat{b}, \hat{V} be some estimators of b and V with

\hat{b} - b = o_p(1) \quad \text{and} \quad \hat{V} - V = o_p(1).    (2.29)

Then,

P\big(\Phi(Y_1, X_1, \dots, Y_n, X_n) = 1\big) \to \alpha.

The proof is given in Section 2.8.5.

Remark 2.3.7 When considering a quantile regression model

Y = g(X) + \tilde{\varepsilon}

for some fixed τ ∈ (0,1), with \tilde{ε} not necessarily independent of X and F_{\tilde{ε}|X}^{-1}(τ|X) = 0 almost surely, the test (without minimizing with respect to c and with μ being the Dirac measure in τ) can still be applied to test the null hypothesis

\tilde{H}_0 : g \in \mathcal{G}_B.

It is conjectured that, after replacing the product density of (X, ε) by the joint density f_{X,\tilde{ε}} in (A1)–(A8) and assuming

\inf_{x \in \operatorname{supp}(v)} f_{X,\tilde{\varepsilon}}(x, 0) > 0,

the presented results remain valid for testing \tilde{H}_0, although in general with different b and V.

2.4 Extending the Test to Nonparametric Transformation Models
