
Definition 4.2.1. Let $x(t) = \mu\big(\tfrac{t}{n}, \theta\big) + \xi(t)$ $(t = 1, \ldots, n)$ as before. Then, the least squares estimator $\hat{\theta}_n$ of $\theta$ is defined by the equation
\[
L_n\big(\hat{\theta}_n\big) = \inf_{\tau \in \Theta} L_n(\tau), \quad \text{where } L_n(\tau) = \sum_{t=1}^{n} \Big( x(t) - \mu\big(\tfrac{t}{n}, \tau\big) \Big)^2
\]
and $\Theta = \mathbb{R}^{\nu} \times S_l$.

To improve the asymptotic variance, we consider the more general weighted least squares estimator $\hat{\theta}_n^w$, which is defined by
\[
L_n^w\big(\hat{\theta}_n^w\big) = \inf_{\tau \in \Theta} L_n^w(\tau), \quad \text{where } L_n^w(\tau) = \sum_{t=1}^{n} w\big(\tfrac{t}{n+1}\big) \Big( x(t) - \mu\big(\tfrac{t}{n}, \tau\big) \Big)^2.
\]
Here, $w : (0,1) \to \mathbb{R}_+$ denotes a continuously differentiable weight function satisfying the regularity assumptions

\[
\int_0^1 |w(s)|^2 \, ds < \infty \tag{4.4}
\]
and
\[
\frac{1}{n} \sum_{t=1}^{n} w\big(\tfrac{t}{n+1}\big)\, h\big(\tfrac{t}{n}\big) \xrightarrow{\;n\to\infty\;} \int_0^1 h(t)\, w(t)\, dt < \infty \quad \forall\, h \in C([0,1]), \tag{4.5}
\]
where the integral on the right-hand side of (4.5) is assumed to exist in the sense of Lebesgue. Moreover, we assume that $w$ satisfies the assumptions of theorem 3.2.14.
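To make the definition concrete, here is a minimal numerical sketch of the weighted least squares estimator. The broken-line trend $\mu$, the starting value, the noise, and the constant weight are hypothetical choices for illustration only, and scipy's general-purpose Nelder-Mead minimiser merely stands in for whatever optimisation method is actually used:

```python
import numpy as np
from scipy.optimize import minimize

def weighted_lse(x, mu, tau0, w):
    """Minimise L_n^w(tau) = sum_t w(t/(n+1)) * (x(t) - mu(t/n, tau))^2."""
    n = len(x)
    t = np.arange(1, n + 1)
    weights = w(t / (n + 1))  # w is evaluated away from the endpoints 0 and 1
    loss = lambda tau: np.sum(weights * (x - mu(t / n, tau)) ** 2)
    return minimize(loss, tau0, method="Nelder-Mead").x

# Hypothetical example: broken-line trend mu(s, tau) = tau_0 + tau_1 * (s - 1/2)_+
mu = lambda s, tau: tau[0] + tau[1] * np.maximum(s - 0.5, 0.0)
rng = np.random.default_rng(0)
n = 500
s = np.arange(1, n + 1) / n
x = mu(s, [1.0, 2.0]) + rng.standard_normal(n)  # iid noise as a stand-in for xi(t)
theta_hat = weighted_lse(x, mu, np.array([0.0, 0.0]), lambda u: np.ones_like(u))
```

Here the knot is held fixed; in the setting of this section the knot locations are part of $\theta$ and are estimated as well.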

4.2.2. As said above, we consider a fully parametric regression setting. To avoid potential confusion of this approach with related nonparametric spline models, we briefly recall the basic ideas of these conceptually different techniques.

A popular nonparametric method is that of so-called smoothing splines, as introduced by Whittaker [Whi23], Schoenberg [Sch64] and Reinsch [Rei67]: let

\[
y_i = f(t_i) + \varepsilon_i \quad (i = 1, \ldots, n)
\]
with $0 < t_1 < \ldots < t_n < 1$ denoting fixed design points, $f : [0,1] \to \mathbb{R}$ a real function such that $\int_0^1 \big(f^{(m)}\big)^2(t)\, dt < \infty$ and $\varepsilon_i$ some stochastic noise. Then, for fixed $\lambda > 0$ the smoothing spline $s_\lambda$ is defined as the solution of the problem

\[
\min_{\substack{s : [0,1] \to \mathbb{R},\\ \int_0^1 |s^{(m)}|^2(t)\, dt < \infty}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \big( y_i - s(t_i) \big)^2 + \lambda \int_0^1 \big(s^{(m)}\big)^2(t)\, dt \right\}. \tag{4.6}
\]

The parameter $\lambda$ controls the tradeoff between smoothness and goodness of fit. If $\lambda \to \infty$, the solution $s_\lambda$ converges to a polynomial of order $m$. If $\lambda \to 0$, $s_\lambda$ will almost interpolate the data. Although this optimisation problem is initially defined over an infinite dimensional vector space, one can show that in the case $n > m$, the unique solution of this problem is given by a natural spline of order $2m$ with knots at $t_1, \ldots, t_n$, see e.g. [Eub88, lemma 5.2, page 283]. Recall that a spline of order $2m$ with knots at $t_1, \ldots, t_n$ is called a natural spline if it is polynomial of order $m$ outside $[t_1, t_n]$. One can show that the space of all natural splines of order $2m$ with knots at $t_1, \ldots, t_n$ has dimension $n$. Hence, a solution to (4.6) can be given explicitly: denote by $x_1, \ldots, x_n$ a basis of the vector space of the natural splines and define

\[
X_n = [x_j(t_i)]_{i,j=1,\ldots,n} \in \mathbb{R}^{n \times n}.
\]
Furthermore, define
\[
\Omega = \left[ \int_0^1 x_i^{(m)}(t)\, x_j^{(m)}(t)\, dt \right]_{i,j=1,\ldots,n}.
\]
Then, the solution to (4.6) can be written as $s_\lambda = \sum_{i=1}^{n} c_i x_i$, where $c = (c_1, \ldots, c_n)'$ satisfies the equation system
\[
\big( X_n' X_n + n\lambda\, \Omega \big)\, c = X_n'\, y,
\]
which can be considered as a ridge regression problem. The parameter $\lambda$ is usually referred to as the penalising, smoothing or regularisation parameter. The estimated trend function can now be written as

\[
s_\lambda(t) = \sum_{i=1}^{n} y_i\, K_n(t, t_i; \lambda)
\]
with
\[
K_n(t, s; \lambda) = \frac{e^{-|t-s|/\sqrt{\lambda}} + e^{-2/\sqrt{\lambda}}\, e^{|t-s|/\sqrt{\lambda}} + e^{-(t+s)/\sqrt{\lambda}} + e^{(t+s-2)/\sqrt{\lambda}}}{2\sqrt{\lambda}\, \big(1 - e^{-2/\sqrt{\lambda}}\big)} + O\big((n\lambda)^{-1}\big)
\]
uniformly in $s$ and $t$ for $\lambda \to 0$ and $n\lambda \to \infty$, see [Eub88, theorem 5.4, page 247]. Based on this asymptotic kernel representation, asymptotic values of the integrated mean squared error can be derived. Note that $\lambda$ corresponds to the bandwidth of this asymptotic kernel. For more information, including estimation procedures for $\lambda$, see e.g. [Eub88] and in particular [Wah92]. For smoothing splines with autocorrelated short-memory errors, see [OWY01, Wan98, RKW92].
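As a concrete illustration of this ridge-regression structure, the sketch below solves $(X_n' X_n + n\lambda\,\Omega)\,c = X_n'\,y$ for an assumed basis. To keep the example self-contained, the natural spline basis is replaced by a plain monomial basis with the $m = 2$ roughness penalty, so this is not the natural-spline solution itself, only the same linear algebra:

```python
import numpy as np

def monomial_design(t, p):
    # Design matrix [x_j(t_i)] for the illustrative basis x_j(t) = t^(j-1)
    return np.vander(t, p, increasing=True)

def monomial_penalty(p):
    # Omega_ij = int_0^1 x_i''(t) x_j''(t) dt for x_j(t) = t^(j-1)  (m = 2)
    Omega = np.zeros((p, p))
    for i in range(3, p + 1):
        for j in range(3, p + 1):
            Omega[i - 1, j - 1] = (i - 1) * (i - 2) * (j - 1) * (j - 2) / (i + j - 5)
    return Omega

def smoothing_fit(t, y, p, lam):
    X, Omega = monomial_design(t, p), monomial_penalty(p)
    n = len(t)
    # Ridge-type system (X'X + n*lam*Omega) c = X'y from the text
    c = np.linalg.solve(X.T @ X + n * lam * Omega, X.T @ y)
    return lambda s: monomial_design(np.atleast_1d(s), p) @ c

# Usage on synthetic data
t = np.linspace(0.01, 0.99, 200)
y = np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(1).standard_normal(200)
s_hat = smoothing_fit(t, y, 8, 1e-4)
```

For the actual natural-spline smoother one would assemble $X_n$ and $\Omega$ from a natural spline basis; ready-made routines such as scipy.interpolate.make_smoothing_spline solve an equivalent penalised problem.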

Agarwal and Studden [AS80, AS78] consider the regression model
\[
y_i = g(x_i) + \varepsilon_i,
\]
where $g : [0,1] \to \mathbb{R}$ is $d$ times differentiable and the $\varepsilon_i$ are iid with finite variance. To estimate $g$, they fit a spline of order $d$ to the data. Based on the asymptotic mean squared error, the optimal number and placement of the knots are determined.

4.2.3. In our context, weighted regression is used to improve the asymptotic variance of the estimator. It is well known that the usual least squares estimator is asymptotically not efficient when applied to long-memory data, see [Yaj88, Yaj91]. Dahlhaus [Dah95] introduced the following class of weight functions, which lead in the case of polynomial regression to an asymptotically efficient estimator: let $\xi(\cdot)$ be either a long-memory or antipersistent process in the sense of definition 1.1.5. Define
\[
w_d(x) = x^{-d}(1-x)^{-d},
\]
where $d = \frac{1-\alpha}{2}$ with $\alpha$ as in definition 1.1.5. Then, $w_d$ satisfies the conditions stated in definition 4.2.1. In fact, (4.4) follows by definition. As for (4.5), note that

\[
\frac{1}{n} \sum_{t=1}^{n} w_d\big(\tfrac{t}{n+1}\big)\, h\big(\tfrac{t}{n}\big) = \frac{1}{n} \sum_{t=1}^{\lfloor n/2 \rfloor} w_d\big(\tfrac{t}{n+1}\big)\, h\big(\tfrac{t}{n}\big) + \frac{1}{n} \sum_{t=\lceil n/2 \rceil + 1}^{n} w_d\big(\tfrac{t}{n+1}\big)\, h\big(\tfrac{t}{n}\big) + o(1).
\]

Define the sequence $(f_n)_{n \in \mathbb{N}}$ by
\[
f_n(u) = \sum_{t=1}^{\lfloor n/2 \rfloor} \mathbf{1}_{\left[\frac{t-1}{n+1},\, \frac{t}{n+1}\right)}(u)\; w_d\big(\tfrac{t}{n+1}\big)\, h\big(\tfrac{t}{n}\big), \quad u \in [0, 1/2].
\]
Then $f_n(u) \to w_d(u)\, h(u)$ for all $u \in (0, \tfrac12)$, and $f_n \leq \|h\|_\infty\, w_d$ if $d > 0$ and $f_n \leq \|h\, w_d\|_\infty$ if $d < 0$. Thus, by Lebesgue's theorem,
\[
\frac{1}{n} \sum_{t=1}^{\lfloor n/2 \rfloor} w_d\big(\tfrac{t}{n+1}\big)\, h\big(\tfrac{t}{n}\big) \to \int_0^{\frac12} w_d(u)\, h(u)\, du.
\]

Likewise,
\[
\frac{1}{n} \sum_{t=\lceil n/2 \rceil + 1}^{n} w_d\big(\tfrac{t}{n+1}\big)\, h\big(\tfrac{t}{n}\big) \to \int_{\frac12}^{1} w_d(u)\, h(u)\, du,
\]
which proves (4.5). The assumptions of theorem 3.2.14 are more demanding to prove.
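The convergence (4.5) is easy to check numerically. The following sketch evaluates the weighted Riemann sums for $w_d$ against the limiting Lebesgue integral; the concrete values of $d$ and the test function $h$ are arbitrary choices made for this check:

```python
import numpy as np
from scipy.integrate import quad

d = 0.3
w_d = lambda x: x ** (-d) * (1 - x) ** (-d)
h = lambda x: np.cos(2 * np.pi * x)  # arbitrary continuous test function

# Right-hand side of (4.5); quad copes with the integrable endpoint singularities
rhs, _ = quad(lambda s: h(s) * w_d(s), 0, 1)

for n in (10**3, 10**4, 10**5, 10**6):
    t = np.arange(1, n + 1)
    lhs = np.mean(w_d(t / (n + 1)) * h(t / n))  # (1/n) sum_t w_d(t/(n+1)) h(t/n)
    print(n, lhs, rhs)
```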

Assume that $\xi(\cdot)$ has long memory. Then, we need to show that
\[
\frac{1}{n^2} \sum_{\substack{s,t=1 \\ s \neq t}}^{n} w\big(\tfrac{t}{n+1}\big)\, w\big(\tfrac{s}{n+1}\big) \left|\tfrac{t-s}{n}\right|^{-\beta} \xrightarrow{\;n\to\infty\;} \int_0^1 \int_0^1 \frac{w(s)\, w(t)}{|s-t|^\beta}\, ds\, dt < \infty \quad \forall\, \beta \in (0,1), \tag{4.7}
\]

where the integral on the right-hand side in (4.7) is defined in the sense of Lebesgue. To verify (4.7), observe first that the integral
\[
\int_0^1 \int_0^1 \frac{s^{-d}(1-s)^{-d}(1-t)^{-d} t^{-d}}{|s-t|^\beta}\, ds\, dt
\]
exists for all $d \in (-0.5, 0.5)$ and $\beta \in (0,1)$. Indeed, by the substitution $s = ut$,
\[
\int_0^1 t^{-d} \int_0^t s^{-d}(t-s)^{-\beta}\, ds\, dt = \int_0^1 t^{1-2d-\beta} \int_0^1 u^{-d}(1-u)^{-\beta}\, du\, dt < \infty.
\]

By symmetry,
\[
\int_0^1 \int_0^1 \frac{s^{-d} t^{-d}}{|s-t|^\beta}\, ds\, dt < \infty \quad \text{and} \quad \int_0^1 \int_0^1 \frac{(1-s)^{-d}(1-t)^{-d}}{|s-t|^\beta}\, ds\, dt < \infty.
\]

In particular,
\[
\int_0^{\frac12} \int_0^{\frac12} \frac{s^{-d}(1-s)^{-d}(1-t)^{-d} t^{-d}}{|s-t|^\beta}\, ds\, dt < \infty \quad \text{and} \quad \int_{\frac12}^{1} \int_{\frac12}^{1} \frac{s^{-d}(1-s)^{-d}(1-t)^{-d} t^{-d}}{|s-t|^\beta}\, ds\, dt < \infty.
\]

Finally, we have
\[
\int_{\frac12}^{1} \int_0^{\frac12} \frac{(1-s)^{-d} s^{-d} t^{-d} (1-t)^{-d}}{|s-t|^\beta}\, ds\, dt \leq K \int_{\frac12}^{1} (1-t)^{-d} \int_0^{\frac12} s^{-d}\big(\tfrac12 - s\big)^{-\beta}\, ds\, dt < \infty,
\]

which proves our claim. To show convergence of the Riemann sums in (4.7), define for $u, v \in [0, 1/2]$ the function $g_n(u,v)$ by
\[
g_n(u,v) = \sum_{\substack{s,t=1 \\ s \neq t}}^{\lfloor n/2 \rfloor} \mathbf{1}_{\left[\frac{s-1}{n+1},\, \frac{s}{n+1}\right)}(u)\; \mathbf{1}_{\left[\frac{t-1}{n+1},\, \frac{t}{n+1}\right)}(v)\; w_d\big(\tfrac{t}{n+1}\big)\, w_d\big(\tfrac{s}{n+1}\big) \left|\tfrac{t-s}{n}\right|^{-\beta}.
\]

Then
\[
|g_n(u,v)| \leq w_d(u)\, w_d(v) \left|\tfrac{u-v}{2}\right|^{-\beta}
\]
and thus
\[
\frac{1}{n^2} \sum_{\substack{s,t=1 \\ s \neq t}}^{\lfloor n/2 \rfloor} w\big(\tfrac{t}{n+1}\big)\, w\big(\tfrac{s}{n+1}\big) \left|\tfrac{t-s}{n}\right|^{-\beta} = \frac{(n+1)^2}{n^2} \int_0^{\frac12} \int_0^{\frac12} g_n(u,v)\, du\, dv \longrightarrow \int_0^{\frac12} \int_0^{\frac12} \frac{w(s)\, w(t)}{|s-t|^\beta}\, ds\, dt.
\]

Extending this argument to the whole summation index gives the desired result, i.e.
\[
\frac{1}{n^2} \sum_{\substack{s,t=1 \\ s \neq t}}^{n} w\big(\tfrac{t}{n+1}\big)\, w\big(\tfrac{s}{n+1}\big) \left|\tfrac{t-s}{n}\right|^{-\beta} \longrightarrow \int_0^1 \int_0^1 \frac{w(s)\, w(t)}{|s-t|^\beta}\, ds\, dt.
\]

We conclude the long-memory case by noting that the argument still holds if w in (4.7) is replaced by 1.
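A direct numerical check of (4.7) is equally straightforward. The sketch below evaluates the double Riemann sum for $w = w_d$, with illustrative values of $d$ and $\beta$, so that one can watch the sums stabilise as $n$ grows:

```python
import numpy as np

d, beta = 0.3, 0.5  # illustrative values, beta in (0,1)
w_d = lambda x: x ** (-d) * (1 - x) ** (-d)

for n in (250, 500, 1000, 2000):
    t = np.arange(1, n + 1)
    W = w_d(t / (n + 1))
    diff = np.abs(t[:, None] - t[None, :]) / n   # |t-s|/n
    np.fill_diagonal(diff, 1.0)                  # placeholder; s = t excluded below
    kernel = diff ** (-beta)
    np.fill_diagonal(kernel, 0.0)                # drop the diagonal s = t
    print(n, (W[:, None] * W[None, :] * kernel).sum() / n**2)
```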

If $\xi(\cdot)$ is antipersistent, then $d < 0$ and $w$ is absolutely continuous by definition.

Furthermore, we need to show that
\[
-\frac{1}{n^2} \sum_{t=1}^{n} f\big(\tfrac{t}{n+1}\big)\, w\big(\tfrac{t}{n+1}\big) \sum_{\substack{s=1 \\ s \neq t}}^{n} \frac{g\big(\tfrac{s}{n+1}\big)\, w\big(\tfrac{s}{n+1}\big) - g\big(\tfrac{t}{n+1}\big)\, w\big(\tfrac{t}{n+1}\big)}{\left|\tfrac{s-t}{n}\right|^{\beta}} \longrightarrow -\int_0^1 f(t)\, w(t) \int_0^1 \frac{g(s)\, w(s) - g(t)\, w(t)}{|s-t|^\beta}\, ds\, dt \tag{4.8}
\]
for all $\beta \in (1,2)$ and all (not necessarily continuous) piecewise polynomials $f, g : [0,1] \to \mathbb{R}$.

To verify (4.8), we first show that the integral on the right-hand side is indeed well-defined. To start with, consider the special case $g = f = 1$ and integration over the set $[0,\frac12] \times [0,\frac12]$. Since $w_d'(u) \leq \mathrm{const} \cdot s^{|d|-1}$ for all $0 < s \leq u \leq t < \frac12$, we find that
\begin{align*}
\int_0^{\frac12} w_d(t) \int_0^{\frac12} \frac{|w_d(s) - w_d(t)|}{|s-t|^\beta}\, ds\, dt &= 2 \int_0^{\frac12} w_d(t) \int_0^{t} \frac{w_d(t) - w_d(s)}{(t-s)^\beta}\, ds\, dt \\
&= 2 \int_0^{\frac12} w_d(t) \int_0^{t} \frac{\int_s^t w_d'(u)\, du}{(t-s)^{\beta}}\, ds\, dt \\
&\leq \mathrm{const} \int_0^{\frac12} w_d(t) \int_0^{t} \frac{s^{|d|-1}}{(t-s)^{\beta-1}}\, ds\, dt < \infty,
\end{align*}

by the same arguments used to establish (4.7). Likewise,
\[
\int_{\frac12}^{1} w_d(t) \int_{\frac12}^{1} \frac{|w_d(s) - w_d(t)|}{|s-t|^\beta}\, ds\, dt < \infty
\]
and
\begin{align*}
\int_0^{\frac12} w_d(t) \int_{\frac12}^{1} \frac{|w_d(s) - w_d(t)|}{|s-t|^\beta}\, ds\, dt &\leq \int_0^{\frac12} w_d(t) \int_{\frac12}^{1} \frac{1}{(s-t)^\beta}\, ds\, dt \\
&= \int_0^{\frac12} w_d(t)\, \frac{\big(\tfrac12 - t\big)^{1-\beta} - (1-t)^{1-\beta}}{\beta - 1}\, dt < \infty.
\end{align*}

So for $f = g = 1$, the integral in (4.8) is indeed finite. Now, let $g(s) = (s-\eta)_+^k$ for some $\eta \in [0,1)$ and some integer $k \geq 0$. Then
\begin{align*}
\int_0^1 w_d(t) \int_0^1 \frac{g(s)\, w_d(s) - g(t)\, w_d(t)}{|s-t|^\beta}\, ds\, dt &= \int_0^1 w_d(t) \int_0^1 \frac{g(s)\, w_d(s) - g(s)\, w_d(t)}{|s-t|^\beta}\, ds\, dt + \int_0^1 w_d^2(t) \int_0^1 \frac{g(s) - g(t)}{|s-t|^\beta}\, ds\, dt \\
&= I_1 + I_2,
\end{align*}
say. But
\[
I_1 \leq \|g\|_\infty \int_0^1 w_d(t) \int_0^1 \frac{|w_d(s) - w_d(t)|}{|s-t|^\beta}\, ds\, dt < \infty
\]
and, for $k \geq 1$,
\[
I_2 \leq \mathrm{const} \int_0^1 w_d^2(t) \int_0^1 |s-t|^{1-\beta}\, ds\, dt < \infty.
\]

On the other hand, we obtain for $k = 0$
\[
I_2 = -\int_0^{\eta} w_d^2(t) \int_{\eta}^{1} \frac{1}{|s-t|^\beta}\, ds\, dt + \int_{\eta}^{1} w_d^2(t) \int_0^{\eta} \frac{1}{|s-t|^\beta}\, ds\, dt < \infty.
\]

By linearity, we conclude that
\[
\int_0^1 w_d(t) \int_0^1 \frac{|g(s)\, w_d(s) - g(t)\, w_d(t)|}{|s-t|^\beta}\, ds\, dt < \infty
\]
for all piecewise polynomials $g$. It follows that
\[
\left| \int_0^1 f(t)\, w_d(t) \int_0^1 \frac{g(s)\, w_d(s) - g(t)\, w_d(t)}{|s-t|^\beta}\, ds\, dt \right| \leq \|f\|_\infty \int_0^1 w_d(t) \int_0^1 \frac{|g(s)\, w_d(s) - g(t)\, w_d(t)|}{|s-t|^\beta}\, ds\, dt < \infty
\]

for all piecewise polynomials $f$. To show that the Riemann sums in (4.8) converge, we argue similarly as in the proof of (4.7): let $g(u) = (u-\eta)_+^k$ be as above and let $f$ be a piecewise polynomial. Define for $u, v \in [0,1/2]$ the function $h_n(u,v)$ by
\[
h_n(u,v) = \sum_{\substack{s,t=1 \\ t > s}}^{\lfloor n/2 \rfloor - 1} \mathbf{1}_{\left[\frac{s-1}{n+1},\, \frac{s}{n+1}\right)}(u)\; \mathbf{1}_{\left[\frac{t}{n+1},\, \frac{t+1}{n+1}\right)}(v)\; f\big(\tfrac{t}{n+1}\big)\, w_d\big(\tfrac{t}{n+1}\big) \left[ \frac{g\big(\tfrac{s}{n+1}\big)\, w_d\big(\tfrac{s}{n+1}\big) - g\big(\tfrac{t}{n+1}\big)\, w_d\big(\tfrac{t}{n+1}\big)}{\left|\tfrac{t-s}{n}\right|^{\beta}} \right].
\]
Then, for $s, t = 1, \ldots, \lfloor n/2 \rfloor$ with $t > s$,

\[
\left| \frac{g\big(\tfrac{s}{n+1}\big)\, w_d\big(\tfrac{s}{n+1}\big) - g\big(\tfrac{t}{n+1}\big)\, w_d\big(\tfrac{t}{n+1}\big)}{\left|\tfrac{t-s}{n}\right|^{\beta}} \right| \leq g\big(\tfrac{s}{n+1}\big)\, \frac{\left| w_d\big(\tfrac{t}{n+1}\big) - w_d\big(\tfrac{s}{n+1}\big) \right|}{\left|\tfrac{t-s}{n}\right|^{\beta}} + w_d\big(\tfrac{t}{n+1}\big)\, \frac{\left| g\big(\tfrac{t}{n+1}\big) - g\big(\tfrac{s}{n+1}\big) \right|}{\left|\tfrac{t-s}{n}\right|^{\beta}}.
\]

Therefore, for all $u \in (0, \tfrac12)$ and $v \in (u, \tfrac12)$,
\[
|h_n(u,v)| \leq \|f\|_\infty\, \|w_d\|_\infty\, \|g\|_\infty\, \frac{|w_d(v) - w_d(u)|}{\left|\tfrac{u-v}{2}\right|^{\beta}} + \|f\|_\infty\, \|w_d\|_\infty^2\, \frac{|g(v) - g(u)|}{\left|\tfrac{u-v}{2}\right|^{\beta}}
\]
and at the same time
\[
h_n(u,v) \to f(v)\, w_d(v)\, \frac{g(u)\, w_d(u) - g(v)\, w_d(v)}{|u-v|^\beta}.
\]

Thus, by Lebesgue's theorem and symmetry,
\[
-\frac{1}{n^2} \sum_{t=1}^{\lfloor n/2 \rfloor - 1} f\big(\tfrac{t}{n+1}\big)\, w\big(\tfrac{t}{n+1}\big) \sum_{\substack{s=1 \\ s \neq t}}^{\lfloor n/2 \rfloor - 1} \frac{g\big(\tfrac{s}{n+1}\big)\, w\big(\tfrac{s}{n+1}\big) - g\big(\tfrac{t}{n+1}\big)\, w\big(\tfrac{t}{n+1}\big)}{\left|\tfrac{s-t}{n}\right|^{\beta}} \longrightarrow -\int_0^{\frac12} f(t)\, w(t) \int_0^{\frac12} \frac{g(s)\, w(s) - g(t)\, w(t)}{|s-t|^\beta}\, ds\, dt.
\]

Similar arguments apply to the other subsets of the summation index and/or the area of integration; consider for example the functions
\[
\tilde{h}_n(u,v) = \sum_{\substack{s,t=\lceil n/2 \rceil + 1 \\ t > s}}^{n} \mathbf{1}_{\left[\frac{s-1}{n+1},\, \frac{s}{n+1}\right)}(u)\; \mathbf{1}_{\left[\frac{t}{n+1},\, \frac{t+1}{n+1}\right)}(v)\; f\big(\tfrac{t}{n+1}\big)\, w_d\big(\tfrac{t}{n+1}\big) \left[ \frac{g\big(\tfrac{s}{n+1}\big)\, w_d\big(\tfrac{s}{n+1}\big) - g\big(\tfrac{t}{n+1}\big)\, w_d\big(\tfrac{t}{n+1}\big)}{\left|\tfrac{t-s}{n}\right|^{\beta}} \right]
\]
and
\[
\hat{h}_n(u,v) = \sum_{t=\lceil n/2 \rceil + 1}^{n} \sum_{\substack{s=1 \\ s > n+1-t}}^{\lfloor n/2 \rfloor - 1} \mathbf{1}_{\left[\frac{s}{n+1},\, \frac{s+1}{n+1}\right)}(u)\; \mathbf{1}_{\left[\frac{t}{n+1},\, \frac{t+1}{n+1}\right)}(v)\; f\big(\tfrac{t}{n+1}\big)\, w_d\big(\tfrac{t}{n+1}\big) \left[ \frac{g\big(\tfrac{s}{n+1}\big)\, w_d\big(\tfrac{s}{n+1}\big) - g\big(\tfrac{t}{n+1}\big)\, w_d\big(\tfrac{t}{n+1}\big)}{\left|\tfrac{t-s}{n}\right|^{\beta}} \right].
\]

Finally, for the antipersistent case we need to show that $w \in I_{|d|,p}(\mathbb{R})$ and $w \in I_{|2d|,q}(\mathbb{R})$ for some $1 < p < \frac{1}{|d|}$, $1 < q < \frac{1}{|2d|}$ such that $\frac{1}{p} + \frac{1}{q} = 1 + |d|$ with $d = \frac{1-\alpha}{2}$. This can be done by successively applying the results of section 2.5: define $\tilde{w}_d(x) = \mathbf{1}_{]0,1/2]}(x)\, x^{-d}$. Applying the reflection operator $Q$ and proposition 2.5.5 to example 2.5.1 ($a = 0$, $b = 1$) shows that
\[
\tilde{w}_d \in I_{|2d|,p} \quad \text{for all } p \leq \tfrac{1}{|d|}.
\]
Since the restriction of $h(x) = (1-x)^{|d|}$ to $]0,\tfrac12]$ can be extended to $\mathbb{R}$ such that it satisfies the assumptions of proposition 2.5.4, we conclude that
\[
w_d \in I_{|2d|,p} \quad \text{for all } p \leq \tfrac{1}{|d|}.
\]
Similar arguments for the interval $[\tfrac12, 1]$ lead to $w_d \in I_{|2d|,p}$ for all $p \leq \tfrac{1}{|d|}$. Likewise, we can show that $w_d \in I_{|d|,p}$ for all $p \geq 1$.

As in the long-memory case, note that all arguments still hold if $w$ is replaced by 1.

4.2.4. If we replace the general $h$ in (4.5) by a (not necessarily continuous) piecewise polynomial $\mu(\cdot, \theta)$, then the assertion holds (in some sense) uniformly in $\theta$. More precisely, set
\[
\nu_n = \frac{1}{\sum_{t=1}^{n} w\big(\tfrac{t}{n+1}\big)} \sum_{t=1}^{n} w\big(\tfrac{t}{n+1}\big)\, \delta_{\frac{t}{n}}
\]
with $\delta_{\frac{t}{n}}$ denoting the Dirac measure at $\tfrac{t}{n}$. By (4.5),
\[
\int_0^1 h(s)\, d\nu_n(s) \to \frac{1}{\int_0^1 w(s)\, ds} \int_0^1 h(s)\, w(s)\, ds
\]
for all $h \in C([0,1])$, i.e. $(\nu_n)_{n \in \mathbb{N}}$ converges weakly to the measure $\nu$ defined by
\[
\nu(]a,b]) := \int_a^b \mathbf{1}_{[0,1]}(s)\, \frac{w(s)}{\int_0^1 w(u)\, du}\, ds.
\]

Denote by $\mathcal{F}_B = \{\mu(\cdot,\theta) : \theta \in B\}$ a family of (not necessarily continuous) piecewise polynomials for some bounded set $B \subset \Theta$. It then follows from [BT67, theorem 1, p. 2] that
\[
\lim_n \sup_{\theta \in B} \left| \int \mu(s, \theta)\, d\nu_n(s) - \int \mu(s, \theta)\, d\nu(s) \right| = 0, \tag{4.9}
\]
which entails
\[
\lim_n \sup_{\theta \in B} \left| \frac{1}{n} \sum_{t=1}^{n} w\big(\tfrac{t}{n+1}\big)\, \mu\big(\tfrac{t}{n}, \theta\big) - \int \mu(s, \theta)\, w(s)\, ds \right| = 0. \tag{4.10}
\]
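As a plausibility check of the uniform convergence (4.10), the following sketch evaluates the gap between the weighted Riemann sum and the limit integral over a finite parameter grid standing in for the bounded set $B$; the trend $\mu$ and the weight $w_d$ are again purely illustrative choices:

```python
import numpy as np
from scipy.integrate import quad

d = 0.3
w_d = lambda x: x ** (-d) * (1 - x) ** (-d)
mu = lambda s, theta: theta[0] + theta[1] * np.maximum(s - theta[2], 0.0)

def sup_gap(n, thetas):
    """Max over the grid of |(1/n) sum_t w(t/(n+1)) mu(t/n, theta) - int mu w|."""
    t = np.arange(1, n + 1)
    gaps = []
    for theta in thetas:
        riemann = np.mean(w_d(t / (n + 1)) * mu(t / n, theta))
        # Pass the kink location to quad so the piecewise trend is handled well
        integral, _ = quad(lambda s: mu(s, theta) * w_d(s), 0, 1, points=[theta[2]])
        gaps.append(abs(riemann - integral))
    return max(gaps)

grid = [(a, b, eta) for a in (-1, 0, 1) for b in (-2, 2) for eta in (0.25, 0.5, 0.75)]
for n in (10**3, 10**4, 10**5):
    print(n, sup_gap(n, grid))
```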

4.2.5. We need to fix some additional notation. Define $m_n^{j,k} \in \mathbb{R}^n$ by
\[
m_n^{j,k} = \left[ \big( \tfrac{t}{n} - \eta_k \big)_+^{b_{j,k}} \right]_{t=1}^{n} \quad (j = 1, \ldots, p_k,\ k = 0, \ldots, l).
\]
Given a set of knots $k$, the design matrix $M_{n,l,(p_k),(b_{j,k})}(k)$ is then defined by
\[
M_{n,l,(p_k),(b_{j,k})}(k) = \left[ m_n^{1,0}, \ldots, m_n^{p_0,0},\ m_n^{1,1}, \ldots, m_n^{p_1,1},\ \ldots,\ m_n^{1,l}, \ldots, m_n^{p_l,l} \right] \in \mathbb{R}^{n \times \nu}.
\]
To avoid overly cumbersome notation, we often omit the subscripts and write only $M_n(k)$ or $M_n$.

Moreover, we denote
\[
W_n = \mathrm{diag}\Big( w\big(\tfrac{1}{n+1}\big), \ldots, w\big(\tfrac{n}{n+1}\big) \Big) \in \mathbb{R}^{n \times n}
\]
and
\[
\langle x, y \rangle_{\mathbb{R}^n, w} = \sum_{t=1}^{n} x_t\, w\big(\tfrac{t}{n+1}\big)\, y_t = x' W_n y.
\]
If $w$ is the identity mapping, we just write $\langle x, y \rangle_{\mathbb{R}^n}$. It follows from (4.10) that for each $k \in S_l$, there exists $N(k)$ such that $M_n(k)' W_n M_n(k)$ is invertible for all $n > N(k)$.

For $a, b \in [0,1]$ such that $a < b$ and $p, n \in \mathbb{N} \setminus \{0\}$, define the matrix $F_{p,n}^{a,b}$ by
\[
F_{p,n}^{a,b} = \left[ \Big( \tfrac{t - \lfloor an \rfloor}{n} \Big)^{j-1}\, \mathbf{1}_{\left] \frac{\lfloor an \rfloor}{n},\, \frac{\lfloor bn \rfloor}{n} \right]}\big( \tfrac{t}{n} \big) \right]_{t=1,\ldots,n;\ j=1,\ldots,p} \in \mathbb{R}^{n \times p}.
\]

The columns will be denoted by $f_{j,n}^{a,b}$ $(j = 1, \ldots, p)$. Moreover, define
\[
v_{j,n}^{a,b} = \begin{cases} \dfrac{f_{j,n}^{a,b}}{\big\| f_{j,n}^{a,b} \big\|} & \text{if } f_{j,n}^{a,b} \neq 0, \\[2mm] 0 & \text{otherwise.} \end{cases}
\]
We define the matrices $V_{p,n}^{a,b}$ by
\[
V_{p,n}^{a,b} = \left[ v_{1,n}^{a,b}, \ldots, v_{p,n}^{a,b} \right] \in \mathbb{R}^{n \times p}.
\]

Note that $\lfloor an \rfloor < t \leq \lfloor bn \rfloor$ if and only if $a < t/n \leq b$, and $\lceil an \rceil \leq t < \lceil bn \rceil$ if and only if $a \leq t/n < b$.

For $q = \lfloor bn \rfloor - \lfloor an \rfloor \geq 1$, define $U_n^{a,b}$ by
\[
U_n^{a,b} = [\delta_{tj}]_{t=1,\ldots,n;\ j=\lfloor an \rfloor + 1, \ldots, \lfloor bn \rfloor} \in \mathbb{R}^{n \times q}.
\]
Then, for $a < b \leq c < d$, the following orthogonality relations hold:
\[
U_n^{a,b} \perp F_{p,n}^{c,d}, \quad U_n^{c,d} \perp F_{p,n}^{a,b}, \quad U_n^{a,b} \perp U_n^{c,d}, \quad \text{and} \quad F_{p,n}^{a,b} \perp F_{p,n}^{c,d}.
\]

For a given matrix $V \in \mathbb{R}^{n \times s}$, denote by $\mathrm{sp}(V)$ the corresponding column space and by $\mathrm{pr}_{V,w}$ the orthogonal projection onto $\mathrm{sp}(V)$ with respect to $\langle \cdot, \cdot \rangle_{\mathbb{R}^n, w}$. If $V$ has full rank, $\mathrm{pr}_{V,w}$ may be written as $\mathrm{pr}_{V,w} = V (V' W_n V)^{-1} V' W_n$, see [Har97, page 260].
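The projection formula translates directly into code. A minimal sketch, assuming $V$ has full column rank and representing $W_n$ by the vector of weights $w(t/(n+1))$:

```python
import numpy as np

def weighted_projection(V, weights, x):
    """Apply pr_{V,w} = V (V' W_n V)^{-1} V' W_n to x, with W_n = diag(weights)."""
    WV = weights[:, None] * V                 # W_n V
    G = V.T @ WV                              # Gram matrix V' W_n V
    return V @ np.linalg.solve(G, WV.T @ x)   # V (V' W_n V)^{-1} V' W_n x
```

In the proof of lemma 4.2.6 below, the relevant quantity $n^{-1}\big\|\mu_n(\theta) - \mathrm{pr}_{M_n(\tilde{k}),w}\,\mu_n(\theta)\big\|^2_{\mathbb{R}^n,w}$ can be evaluated with this routine by taking $V = M_n(\tilde{k})$ and $x = \mu_n(\theta)$.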

Lemma 4.2.6. Let $\mu$ be identifiable and let $\Delta > 0$. Denote
\[
B(k, \Delta) = \left\{ \tilde{k} \in S_l : \big\| \tilde{k} - k \big\| \leq \Delta \right\}.
\]
Then, there exists a $\delta > 0$ such that
\[
\liminf_n\; \inf_{\tilde{k} \notin B(k,\Delta)}\; n^{-1} \left\| \mu_n(\theta) - \mathrm{pr}_{M_n(\tilde{k}),\, w}\, \mu_n(\theta) \right\|^2_{\mathbb{R}^n, w} > \delta.
\]

Proof. We shall prove this lemma by contradiction: assume there exists $\Delta > 0$ such that
\[
\liminf_n\; \inf_{\tilde{k} \notin B(k,\Delta)}\; n^{-1} \left\| \mu_n(\theta) - \mathrm{pr}_{M_n(\tilde{k}),\, w}\, \mu_n(\theta) \right\|^2_{\mathbb{R}^n, w} = 0.
\]

In this case, we can find $k_\infty \in S_l$ and a subsequence $\big(k_{n(k)}\big)_{k \in \mathbb{N}}$ such that $\lim_k k_{n(k)} = k_\infty$, $|k_{n(k)} - k| > \Delta$ and
\[
\lim_k\; n^{-1}(k) \left\| \mu_{n(k)}(\theta) - \mathrm{pr}_{M_{n(k)}(k_{n(k)}),\, w}\, \mu_{n(k)}(\theta) \right\|^2_{\mathbb{R}^{n(k)}, w} = 0.
\]
By Pythagoras' theorem, this is equivalent to
\[
\lim_k \left( n^{-1}(k) \left\| \mu_{n(k)}(\theta) \right\|^2_{\mathbb{R}^{n(k)}, w} - n^{-1}(k) \left\| \mathrm{pr}_{M_{n(k)}(k_{n(k)}),\, w}\, \mu_{n(k)}(\theta) \right\|^2_{\mathbb{R}^{n(k)}, w} \right) = 0
\]
and so, by (4.10),
\[
\lim_k\; n^{-1}(k) \left\| \mathrm{pr}_{M_{n(k)}(k_{n(k)}),\, w}\, \mu_{n(k)}(\theta) \right\|^2_{\mathbb{R}^{n(k)}, w} = \int_0^1 \mu^2(s, \theta)\, w(s)\, ds. \tag{4.11}
\]
Denote by $\eta_{n(k),j}$ the $j$-th component of $k_{n(k)}$ $(j = 1, \ldots, l)$ and define $\eta_{n(k),0} = 0$ and $\eta_{n(k),l+1} = 1$ for all $k$. Then, for $b = \max_{k=0}^{l} \max_{j=1}^{p_k} b_{j,k} + 1$,

\[
\mathrm{sp}\Big( M_{n(k)}\big(k_{n(k)}\big) \Big) \subseteq \bigoplus_{\nu=0}^{l} \mathrm{sp}\Big( F^{\eta_{n(k),\nu},\, \eta_{n(k),\nu+1}}_{b,\, n(k)} \Big)
\]
and thus
\[
\frac{1}{n(k)} \left\| \mathrm{pr}_{M_{n(k)}(k_{n(k)}),\, w}\, \mu_{n(k)}(\theta) \right\|^2_{\mathbb{R}^{n(k)}, w} \leq \frac{1}{n(k)} \sum_{\nu=0}^{l} \left\| \mathrm{pr}_{F^{\eta_{n(k),\nu},\, \eta_{n(k),\nu+1}}_{b,\, n(k)},\, w}\, \mu_{n(k)}(\theta) \right\|^2_{\mathbb{R}^{n(k)}, w}. \tag{4.12}
\]

Assume for the moment that
\[
n^{-1}(k) \left\| \mathrm{pr}_{F^{\eta_{n(k),\nu},\, \eta_{n(k),\nu+1}}_{b,\, n(k)},\, w}\, \mu_{n(k)}(\theta) \right\|^2_{\mathbb{R}^{n(k)}, w} \longrightarrow \left\| \mathrm{pr}_{\mathcal{F}^{\eta_{\infty,\nu},\, \eta_{\infty,\nu+1}}_{\infty,\, b},\, w}\, \mu(\cdot, \theta) \right\|^2_{L^2([0,1], w)}, \tag{4.13}
\]
where $\mathrm{pr}_{\mathcal{F}^{\eta_{\infty,\nu},\, \eta_{\infty,\nu+1}}_{\infty,\, b},\, w}\, \mu(\cdot, \theta)$ denotes the orthogonal projection within $L^2([0,1], w)$ of $\mu(\cdot, \theta)$ onto the subspace
\[
\mathcal{F}^{\eta_{\infty,\nu},\, \eta_{\infty,\nu+1}}_{\infty,\, b} = \begin{cases} \mathrm{sp}\Big( s^0\, \mathbf{1}_{]\eta_{\infty,\nu},\, \eta_{\infty,\nu+1}]}(s),\ \ldots,\ s^b\, \mathbf{1}_{]\eta_{\infty,\nu},\, \eta_{\infty,\nu+1}]}(s) \Big) & \text{if } \eta_{\infty,\nu} < \eta_{\infty,\nu+1}, \\ \{0\} & \text{if } \eta_{\infty,\nu} = \eta_{\infty,\nu+1}. \end{cases}
\]

It then follows from (4.11), (4.12) and (4.13) that
\[
\| \mu(\cdot, \theta) \|^2_{L^2([0,1], w)} \leq \left\| \mathrm{pr}_{\bigoplus_{\nu=0}^{l} \mathcal{F}^{\eta_{\infty,\nu},\, \eta_{\infty,\nu+1}}_{\infty,\, b},\, w}\, \mu(\cdot, \theta) \right\|^2_{L^2([0,1], w)}.
\]
But $k_\infty \neq k$ and thus $\mu(\cdot, \theta) \notin \bigoplus_{\nu=0}^{l} \mathcal{F}^{\eta_{\infty,\nu},\, \eta_{\infty,\nu+1}}_{\infty,\, b}$, a contradiction.

It remains to prove (4.13): since $k_\infty \in S_l$, we have either $|\eta_{n(k),\nu} - \eta_{n(k),\nu+1}| \to 0$ or $\eta_{\infty,\nu} < \eta_{\infty,\nu+1}$ $(\nu = 0, \ldots, l)$. If $|\eta_{n(k),\nu} - \eta_{n(k),\nu+1}| \to 0$, then
\[
n^{-1}(k) \left\| \mathrm{pr}_{F^{\eta_{n(k),\nu},\, \eta_{n(k),\nu+1}}_{b,\, n(k)},\, w}\, \mu_{n(k)}(\theta) \right\|^2_{\mathbb{R}^{n(k)}, w} \leq n^{-1}(k) \sum_{t=\lfloor \eta_{n(k),\nu}\, n(k) \rfloor}^{\lfloor \eta_{n(k),\nu+1}\, n(k) \rfloor} \mu^2\big( \tfrac{t}{n(k)}, \theta \big)\, w\big( \tfrac{t}{n(k)+1} \big) \to 0.
\]

If $\eta_{\infty,\nu} < \eta_{\infty,\nu+1}$, then $F^{\eta_{n(k),\nu},\, \eta_{n(k),\nu+1}}_{b,\, n(k)}$ has full column rank for $k$ large enough. For such $k$ we obtain
\[
\mathrm{pr}_{F^{\eta_{n(k),\nu},\, \eta_{n(k),\nu+1}}_{b,\, n(k)},\, w}\, \mu_{n(k)}(\theta) = F^{\eta_{n(k),\nu},\, \eta_{n(k),\nu+1}}_{b,\, n(k)}\, a_{n(k)}
\]
with
\[
a_{n(k)} = \Big[ \big( F^{\eta_{n(k),\nu},\, \eta_{n(k),\nu+1}}_{b,\, n(k)} \big)'\, W_{n(k)}\, F^{\eta_{n(k),\nu},\, \eta_{n(k),\nu+1}}_{b,\, n(k)} \Big]^{-1} \big( F^{\eta_{n(k),\nu},\, \eta_{n(k),\nu+1}}_{b,\, n(k)} \big)'\, W_{n(k)}\, \mu_{n(k)}(\theta).
\]
By 4.2.4, $a_{n(k)} \to a$ with $a$ denoting the regression coefficients of $\mu(\cdot, \theta)$ onto the subspace $\mathcal{F}^{\eta_{\infty,\nu},\, \eta_{\infty,\nu+1}}_{\infty,\, b}$ within $L^2([0,1], w)$. Thus

\[
n^{-1}(k) \left\| \mathrm{pr}_{F^{\eta_{n(k),\nu},\, \eta_{n(k),\nu+1}}_{b,\, n(k)},\, w}\, \mu_{n(k)}(\theta) \right\|^2_{\mathbb{R}^{n(k)}, w} = n^{-1}(k) \left\| F^{\eta_{n(k),\nu},\, \eta_{n(k),\nu+1}}_{b,\, n(k)}\, a_{n(k)} \right\|^2_{\mathbb{R}^{n(k)}, w} \longrightarrow \left\| \mathrm{pr}_{\mathcal{F}^{\eta_{\infty,\nu},\, \eta_{\infty,\nu+1}}_{\infty,\, b},\, w}\, \mu(\cdot, \theta) \right\|^2_{L^2([0,1], w)}.
\]

5 Limit theorems