
In this section, we show that the results obtained for the probabilistic loss in the known-variance case continue to hold, up to small correction terms, for the bootstrap version of the method. We begin by defining the quantities that govern how well the bootstrap method performs. We write $\Psi_{m,i}$ for the $i$th column of $\Psi_m$.

We measure the \emph{design regularity} by the value $\delta_\Psi$,
\[
\delta_\Psi \stackrel{\operatorname{def}}{=} \max_{m \in \mathcal{M}} \max_{1 \le i \le n} \sigma_i \bigl\| S_m^{-1/2} \Psi_{m,i} \bigr\|, \tag{4.3.1}
\]
where $S_m \stackrel{\operatorname{def}}{=} \Psi_m \Sigma \Psi_m^{\top}$.

The \emph{presmoothing bias} for the presmoothing operator $\Pi$ is measured by the vector
\[
B = \Sigma^{-1/2}(f - \Pi f).
\]
It is the approximation bias of $f$ by $\Pi f$, weighted by the standard deviations of the noise terms.

We also measure the \emph{presmoothing stochastic noise} in terms of the covariance matrix $\operatorname{Var}(\breve\varepsilon)$ of the smoothed noise $\breve\varepsilon = \varepsilon - \Pi_\Sigma \varepsilon$, where $\Pi_\Sigma \stackrel{\operatorname{def}}{=} \Sigma^{-1/2} \Pi \Sigma^{1/2}$. Namely, this matrix is assumed to be sufficiently close to the unit matrix $\mathbb{1}_n$; in particular, its diagonal elements should be close to one. This is measured by the operator norm of $\operatorname{Var}(\breve\varepsilon) - \mathbb{1}_n$ and by the deviation of the individual variances $\mathbb{E}\breve\varepsilon_i^2$ from one. We also need control of the maximal variance of the differences $\breve\varepsilon_i - \varepsilon_i$. We write:
\[
c_1 \stackrel{\operatorname{def}}{=} \bigl\| \operatorname{Var}(\breve\varepsilon) - \mathbb{1}_n \bigr\|_{\operatorname{op}}, \qquad
\delta_1 \stackrel{\operatorname{def}}{=} \max_{1 \le i \le n} \operatorname{Var}(\breve\varepsilon_i - \varepsilon_i)^{1/2}, \qquad
\delta_2 \stackrel{\operatorname{def}}{=} \max_{1 \le i \le n} \bigl| \mathbb{E}\breve\varepsilon_i^2 - 1 \bigr|^{1/2}.
\]

In particular, consider the case of homogeneous errors $\Sigma = \sigma^2 \mathbb{1}_n$, with the presmoothing operator $\Pi$ a $p$-dimensional projector of the form $\Pi = \Psi_m^{\top}\bigl(\Psi_m \Psi_m^{\top}\bigr)^{-1}\Psi_m$ for some model $m \in \mathcal{M}$. We then have $\operatorname{Var}(\breve\varepsilon) = (\mathbb{1}_n - \Pi)^2 = \mathbb{1}_n - \Pi \le \mathbb{1}_n$, and
\[
c_1 = \bigl\| \operatorname{Var}(\breve\varepsilon) - \mathbb{1}_n \bigr\|_{\operatorname{op}} = \|\Pi\|_{\operatorname{op}} = 1,
\]
\[
\delta_1^2 = \max_{1 \le i \le n} \operatorname{Var}(\breve\varepsilon_i - \varepsilon_i) = \max_{1 \le i \le n} \Psi_{m,i}^{\top}\bigl(\Psi_m \Psi_m^{\top}\bigr)^{-1}\Psi_{m,i},
\qquad
\delta_2^2 = \max_{1 \le i \le n} \bigl| \mathbb{E}\breve\varepsilon_i^2 - 1 \bigr| = \max_{1 \le i \le n} \Psi_{m,i}^{\top}\bigl(\Psi_m \Psi_m^{\top}\bigr)^{-1}\Psi_{m,i},
\]
and $\delta_1, \delta_2$ will usually be of order $\sqrt{p/n}$ if $\Psi_m$ satisfies a Lindeberg-type condition similar to (4.3.1).

In the following theorem, we show that when the bootstrap-calibrated bounds $z^{\flat}_{m,m'}(x + q_m)$ are used instead of $z_{m,m'}(x + q_m)$, we obtain almost the same probability statements for the stochastic noise terms $\xi_{m,m'}$. For the statement of the theorem, we assume that each model can be written as the projection of a larger model onto a smaller feature set. We write $\Psi = \Psi_{m_{\max}} \in \mathbb{R}^{p \times n}$ for the design matrix of the largest model $m_{\max} \in \mathcal{M}$ with feature dimension $p$ and assume that the $\Psi_m$ from (4.1.2) can be written as
\[
\Psi_m = \Pi_m \Psi \quad \text{for a projector } \Pi_m.
\]
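As a quick numerical check of the homogeneous-error example above, the following sketch computes $\delta_1^2$ and $\delta_2^2$ for a projection presmoother built from a hypothetical random design (all sizes and names are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 8

# hypothetical random design Psi_m in R^{p x n}
Psi_m = rng.standard_normal((p, n))

# p-dimensional projection presmoother Pi = Psi_m^T (Psi_m Psi_m^T)^{-1} Psi_m
G = Psi_m @ Psi_m.T                       # p x p Gram matrix
Pi = Psi_m.T @ np.linalg.solve(G, Psi_m)  # n x n projector of rank p

# for homogeneous noise, Var(smoothed noise) = 1_n - Pi, so both delta_1^2
# and delta_2^2 reduce to the maximal leverage max_i Psi_{m,i}^T G^{-1} Psi_{m,i}
leverage = np.diag(Pi)
delta1_sq = leverage.max()
delta2_sq = np.abs((1.0 - leverage) - 1.0).max()

# the average leverage is exactly p/n (the trace of a rank-p projector is p),
# so delta_1 and delta_2 are of order sqrt(p/n) for a regular design
print(delta1_sq, delta2_sq, p / n)
```

For a regular design the maximal leverage stays within a constant factor of the average $p/n$, which is the Lindeberg-type regularity invoked above.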

Theorem 4.3.1. Let $Y = f + \varepsilon$ be a Gaussian vector in $\mathbb{R}^n$ with independent components, $Y \sim \mathcal{N}(f, \Sigma)$ for $\Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2)$. Let the design matrix $\Psi \in \mathbb{R}^{p \times n}$ of the largest model in $\mathcal{M}$ be such that $S = \Psi \Sigma \Psi^{\top} \in \mathbb{R}^{p \times p}$ is invertible. For a given presmoothing operator $\Pi \colon \mathbb{R}^n \to \mathbb{R}^n$, let the values $z^{\flat}_{m,m'}(x)$ and $q_m$ for all $m' > m$ be defined by (4.2.2) and (4.2.1). Then it holds that

\[
\mathbb{P}\biggl( \bigcup_{\substack{m' > m \\ m, m' \in \mathcal{M}}} \bigl\{ \|\xi_{m,m'}\| - z^{\flat}_{m,m'}(x + q_m) \ge 0 \bigr\} \biggr) \le 8\exp(-x) + \Delta_\xi(x)
\]

with
\[
\Delta_\xi(x) \stackrel{\operatorname{def}}{=} p^{1/2}\Bigl( 4\delta_1^2 x_n + 4 c_1 x_n + 4\sqrt{2}\,\|B\|\,\delta_1 \sqrt{x_n} + 2\delta_\Psi \sqrt{x + \log(2p)} + 2\delta_\Psi^2 \bigl(x + \log(2p)\bigr) + \|B\|^2 \delta_\Psi \Bigr) + 2\|B\|\sqrt{2x},
\]
provided that $\Delta_\xi / p^{1/2} \le 1/2$.

To give a more readable version of the bound, assume that $\delta_\Psi$ and $\delta_1$ can be bounded by some common $\delta$ and that $2p \le n$. Then we obtain the simpler bound
\[
\Delta_\xi(x) \le C\, x_n\, p^{1/2} \bigl( \delta + \|B\|\delta + \|B\|^2 + \|B\|\delta_2 \bigr)
\]
for some numerical constant $C > 0$.

The SmA procedure also involves the values $p_{m,m'}$, which are unknown and depend on the noise structure $\Sigma$. The next result shows that the bootstrap counterparts $p^{\flat}_{m,m'}$ can be used in place of $p_{m,m'}$.
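The precise construction of $p^{\flat}_{m,m'}$ follows the definitions of Section 4.2 and is not reproduced here. Purely as an illustration of the multiplier-bootstrap principle, the following sketch estimates $\mathbb{E}\|\xi\|^2$ for a single-model score $\xi = S^{-1/2}\Psi\varepsilon$, for which the true value equals $p$ when $\Sigma = \mathbb{1}_n$; the design, the residuals, and the Gaussian multiplier scheme are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, n_boot = 300, 6, 2000

# hypothetical design with homogeneous noise, Sigma = 1_n
Psi = rng.standard_normal((p, n))
S = Psi @ Psi.T                           # S = Psi Sigma Psi^T
w_eig, V = np.linalg.eigh(S)
S_inv_half = V @ np.diag(w_eig ** -0.5) @ V.T

# observed residuals stand in for the unobserved noise in this toy setup
eps_hat = rng.standard_normal(n)

# multiplier bootstrap: reweight residuals with i.i.d. N(0,1) multipliers and
# average the squared norm of the resampled score over n_boot replications
boot = np.empty(n_boot)
for b in range(n_boot):
    w = rng.standard_normal(n)
    xi_boot = S_inv_half @ (Psi @ (w * eps_hat))
    boot[b] = np.sum(xi_boot ** 2)

p_boot = boot.mean()   # bootstrap analogue of IE ||xi||^2, close to p here
print(p_boot, p)
```

Conditionally on the residuals, the bootstrap mean equals $\operatorname{tr}\bigl(S^{-1/2}\Psi \operatorname{diag}(\hat\varepsilon_i^2) \Psi^{\top} S^{-1/2}\bigr)$, which concentrates around $p$; this is the mechanism that makes $p^{\flat}_{m,m'}/p_{m,m'}$ close to one in Theorem 4.3.2.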

Theorem 4.3.2. Assume the conditions of Theorem 4.3.1. Then for $p_{m,m'} = \mathbb{E}\|\xi_{m,m'}\|^2$ it holds that

\[
\mathbb{P}\biggl( \bigcup_{\substack{m' > m \\ m, m' \in \mathcal{M}}} \biggl\{ \biggl| \frac{p^{\flat}_{m,m'}}{p_{m,m'}} - 1 \biggr| > \Delta_p(x) \biggr\} \biggr) \le 3\exp(-x),
\]
where

\[
\Delta_p(x) \stackrel{\operatorname{def}}{=} 4 x_M^{1/2} \delta_\Psi + 2 x_M \delta_\Psi^2 + \|B\|^2 + 4 x_M^{1/2} \delta_\Psi^2 \|B\| + \delta_2
\]
with $x_M = x + 2\log(|\mathcal{M}|)$.

Again we give a simplified bound, under the assumption that some $\delta$ bounds $\delta_2$ and $\delta_\Psi$ from above and that $x \ge 1$:
\[
\Delta_p \le C\, x_M \bigl( \delta + \delta^2 \|B\| + \|B\|^2 \bigr)
\]
for some numerical constant $C > 0$.

Finally, we show that the bounds from Theorem 3.4.2 used in that section to bound the payment for adaptation carry over to the bootstrap case up to a correction term:

Proposition 4.3.3. Under the conditions of Theorem 4.3.1,
\[
\mathbb{P}\biggl( \bigcup_{\substack{m' > m \\ m, m' \in \mathcal{M}}} \Bigl\{ z^{\flat}_{m,m'} - K_z(x)\Bigl( (1 + \beta)\sqrt{p_{m,m'}} + \sqrt{2 x_M} \Bigr) \ge 0 \Bigr\} \biggr) \le 3\exp(-x) + \Delta_\xi(x),
\]
where

\[
K_z(x) = \max\bigl\{ 1 + \Delta_\xi(x),\; 1 + \Delta_p(x) \bigr\}
\]
with $\Delta_\xi, \Delta_p$ from Theorems 4.3.1 and 4.3.2 and again $x_M = x + 2\log(|\mathcal{M}|)$. The above results allow us to extend all the oracle bounds for the probabilistic loss of Chapter 3 with the obvious corrections of the error probability. We give one example of such an extension in Theorem 4.3.5 further below.

But first we discuss the meaning of the conditions required for bootstrap validity. In typical situations, we have $\delta = \max\{\delta_\Psi, \delta_2, \delta_1\} \le C\sqrt{p/n}$. One can see that the bootstrap approximation is accurate if the values $\Delta_\xi$ and $\Delta_p$ are small. This requires that the values $\delta^2 p$, $\|B\|^4 p$, and $\delta^2 \|B\|$ are sufficiently small. It is easy to see that the last term is of smaller order than the other two. We have
\[
\delta^2 p = O(p^2/n).
\]

Further, the bias component does not damage the bootstrap validity result if $\|B\|^4 p$ is small. If $f$ is Hölder-smooth with parameter $s$, that is, if
\[
\|B\| \le C p^{-s}, \tag{4.3.2}
\]
then the bootstrap procedure is justified for $s > 1/4$, provided that $p = p_n \to \infty$ as $n \to \infty$ and $p^2 \log(n)/n$ goes to zero. We state one asymptotic result of this sort.
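Spelling out the rate calculation behind this claim:

```latex
% under \delta \le C\sqrt{p/n} and \|B\| \le C p^{-s}:
\delta^2 p \;\le\; C^2 \frac{p^2}{n} \;\to\; 0
   \quad\text{(implied by } p^2 \log(n)/n \to 0\text{)},
\qquad
\|B\|^4 p \;\le\; C^4\, p^{\,1-4s} \;\to\; 0
   \quad\text{for } s > 1/4,\; p \to \infty.
```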

Corollary 4.3.4. Assume that $\delta = \max\{\delta_\Psi, \delta_2, \delta_1\} \le C\sqrt{p/n}$. Let also $p = p_n$ satisfy $p_n^2 \log(n)/n \to 0$ as $n \to \infty$, and let (4.3.2) hold for some $s > 1/4$. Then the results of Theorems 4.3.1 and 4.3.2 and Proposition 4.3.3 apply with a small value $\Delta_\xi = \Delta_{\xi,n} \to 0$ as $n \to \infty$.

We now give a bootstrap version of Theorem 3.4.1. We have to change the definition of $m^{\flat}$ slightly for this. We define
\[
m^{\flat} \stackrel{\operatorname{def}}{=} \min\Bigl\{ m \in \mathcal{M} \colon \max_{m' > m} \bigl\{ \|b_{m,m'}\|^2 - \beta^2 (1 + \Delta_p)\, p_{m,m'} \bigr\} \le 0 \Bigr\}. \tag{4.3.3}
\]
In a situation where $\Delta_p$ is small, this amounts to a slight change of the value of $\beta$. If $\Delta_p$ tends to zero as $n$ tends to infinity, this definition asymptotically coincides with our original one. We are now ready to give the following probabilistic oracle result.
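Definition (4.3.3) can be read as an algorithm: pick the smallest model that no larger model rejects. A minimal sketch, assuming precomputed dictionaries `bias_sq` for $\|b_{m,m'}\|^2$ and `p_vals` for $p_{m,m'}$ (both names and the toy numbers below are hypothetical):

```python
def select_m_flat(bias_sq, p_vals, beta, delta_p):
    """Smallest m with max_{m'>m} { ||b_{m,m'}||^2 - beta^2 (1+Delta_p) p_{m,m'} } <= 0.

    bias_sq, p_vals: dicts keyed by pairs (m, m_prime) with m < m_prime
    in the ordered model list.
    """
    candidates = sorted({m for m, _ in bias_sq})
    for m in candidates:
        larger = [mp for mm, mp in bias_sq if mm == m]
        if all(bias_sq[m, mp] - beta ** 2 * (1.0 + delta_p) * p_vals[m, mp] <= 0
               for mp in larger):
            return m
    # the largest model has no larger competitor, so it is always admissible
    return max(mp for _, mp in bias_sq)

# toy example: model 1 is rejected by a larger competitor, model 2 is not
bias_sq = {(1, 2): 5.0, (1, 3): 5.0, (2, 3): 0.1}
p_vals = {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0}
print(select_m_flat(bias_sq, p_vals, beta=1.0, delta_p=0.0))  # -> 2
```

Taking `delta_p = 0` recovers the original known-variance definition, which is the asymptotic coincidence noted above.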

Theorem 4.3.5. Assume the conditions of Theorem 4.3.1. Given $x$ and $\beta$, let the critical values for the SmA method be given by $z^{\flat}_{m,m'}$ from (4.2.3), and let $m^{\flat} \in \mathcal{M}$ satisfy (4.3.3). Then the SmA estimator $\hat\phi = \hat\phi_{\hat m}$ satisfies the following bound:
\[
\mathbb{P}\Bigl( \bigl\| \hat\phi - \hat\phi_{m^{\flat}} \bigr\| > z^{\flat}_{m^{\flat}} \Bigr) \le 11\exp(-x) + \Delta_\xi(x), \tag{4.3.4}
\]
where $z^{\flat}_{m^{\flat}}$ is defined as
\[
z^{\flat}_{m^{\flat}} \stackrel{\operatorname{def}}{=} \max_{m' \in \mathcal{M}_{+}(m^{\flat})} z^{\flat}_{m^{\flat}, m'}.
\]

This implies the probabilistic oracle bound: with probability at least $1 - 11\exp(-x) - \Delta_\xi(x)$,
\[
\bigl\| \hat\phi - \phi \bigr\| \le \bigl\| \hat\phi_{m^{\flat}} - \phi \bigr\| + z^{\flat}_{m^{\flat}}. \tag{4.3.5}
\]
In the next section, we present some simulations for the method.