3.7 Additional proofs
3.7.3 Proof of Theorem 3.3
Denote by $g_h$ the joint density of $(X_h, X_0)^T$ and by $g_0$ the density of $X_0$. We need some intermediate results to prove the theorem.
Lemma 3.7 Assume that condition (D2) holds. Then we have
$n^{-2}$
Here $\zeta$ denotes the Riemann zeta function.
Proof: Recall that by condition (D2) we have $|\rho(h)| \le (h+1)^{-q}$, $h = 0, \dots, n-1$, for some $q > 0$.
First assume $q \in (0,1]$. The integral test for series convergence gives lower and upper bounds for the hyperharmonic series as
$$\int_1^{n+1} x^{-q}\,dx \;\le\; \sum_{h=1}^{n} h^{-q} \;\le\; 1 + \int_1^{n} x^{-q}\,dx. \qquad (3.17)$$
We need to separate two cases. First let $q \in (0,1)$; then the claim follows from (3.17) and the fact that $n^{-2} \le n^{-1} \le n^{-q}$.
For $q = 1$ we evaluate the limit
Finally, the case $q > 1$ is immediate, as the zeta function $\zeta(q)$ is defined as the limit of the hyperharmonic series with exponent $q$.
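The three regimes of this proof can be checked numerically. The sketch below is an illustration only: it assumes the hyperharmonic series takes the form $\sum_{h=1}^n h^{-q}$, verifies the integral-test bounds for $q \le 1$, and checks convergence to $\zeta(q)$ for $q > 1$ via the classical value $\zeta(2) = \pi^2/6$.

```python
import math

def partial_sum(n, q):
    """Partial sum of the hyperharmonic series: sum_{h=1}^n h^{-q}."""
    return sum(h ** (-q) for h in range(1, n + 1))

def integral(a, b, q):
    """Closed form of the integral of x^{-q} from a to b."""
    if q == 1.0:
        return math.log(b) - math.log(a)
    return (b ** (1 - q) - a ** (1 - q)) / (1 - q)

n = 10_000
for q in (0.5, 1.0):
    s = partial_sum(n, q)
    lower = integral(1, n + 1, q)   # integral test, lower bound
    upper = 1 + integral(1, n, q)   # integral test, upper bound
    assert lower <= s <= upper

# For q > 1 the series converges to the Riemann zeta function,
# e.g. zeta(2) = pi^2 / 6.
assert abs(partial_sum(n, 2.0) - math.pi ** 2 / 6) < 1e-3
```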
The next lemma and the subsequent corollary show that the quantities appearing in the sums of Theorem 3.2 can be linked to the autocorrelation function $\rho$:
Lemma 3.8 Under the assumptions (K1), (K2) and (D1) it holds for $h > 0$ with $\rho_h = \tau_0^{-1}\tau_h$
Proof: We will only prove the first inequality; the second one follows in the same way.
By Jensen’s inequality and (K2) we know
∫
The first and third integral terms can readily be calculated as
completing the proof by multiplying all integrals with $\tau_0^{-d}\tau_0^{d}$.

Corollary 3.2 Under the assumptions (K1), (K2), (D1) and (D2) it holds for all $h > 0$ and $q > 0$
Proof: Recall that $\theta(\rho) = 1 + \{1-\rho^2\}^{-d/2} - 2^{d+1}\{4-\rho^2\}^{-d/2}$ for $\rho \in [0,1)$. We seek to find bounds on $\theta$; the corollary then follows by an application of Lemma 3.8.
By assumption (D2) we know there is a $\rho_*$ such that $\rho_h^2 \le \rho_*^2 < 1$ for all $h > 0$. Thus consider $\rho \in [0, \rho_*]$. We start by finding a constant $C > 0$ with
$$\theta'(\rho) = d\,\rho\left\{(1-\rho^2)^{-d/2-1} - 2^{d+1}(4-\rho^2)^{-d/2-1}\right\} \le 2C\rho.$$
Thus $C$ can be taken as $C = d\left\{(1-\rho_*^2)^{-d/2-1} - 2^{d+1}(4-\rho_*^2)^{-d/2-1}\right\}$.
Thus we know that the slope of $\theta$ is always less than that of $C\rho^2$. Finally, it holds that $\theta(0) = 0$, and thus $0 \le \theta(\rho) \le C\rho^2$ for $\rho \in [0, \rho_*]$.
Under condition (D2) it holds that $\{1-\rho_*^2\}^{-d/2} \le \{1-2^{-2q}\}^{-d/2}$, completing the proof by using Lemma 3.8.
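The bound $0 \le \theta(\rho) \le C\rho^2$ can be sanity-checked numerically. The sketch below takes the expressions for $\theta$ and $C$ as displayed in the proof, with $d$ and $q$ chosen arbitrarily for illustration, and $\rho_* = 2^{-q}$, which is admissible under (D2) since $\rho_h^2 \le (h+1)^{-2q} \le 2^{-2q}$ for $h > 0$.

```python
def theta(rho, d):
    """theta(rho) = 1 + (1 - rho^2)^(-d/2) - 2^(d+1) (4 - rho^2)^(-d/2)."""
    return 1 + (1 - rho**2) ** (-d / 2) - 2 ** (d + 1) * (4 - rho**2) ** (-d / 2)

def C_const(rho_star, d):
    """C = d { (1 - rho*^2)^(-d/2-1) - 2^(d+1) (4 - rho*^2)^(-d/2-1) }."""
    return d * ((1 - rho_star**2) ** (-d / 2 - 1)
                - 2 ** (d + 1) * (4 - rho_star**2) ** (-d / 2 - 1))

d, q = 2, 0.5                   # illustrative choices, not from the text
rho_star = 2.0 ** (-q)          # admissible under (D2)
C = C_const(rho_star, d)

assert abs(theta(0.0, d)) < 1e-12          # theta(0) = 0
for i in range(1, 101):                    # grid on (0, rho_*]
    rho = rho_star * i / 100
    assert 0.0 <= theta(rho, d) <= C * rho**2
```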
Proof of the theorem: First note that the operator norm is dominated by the Hilbert–Schmidt norm. By Markov’s inequality we have for $\nu \in (0,1]$
$$P\left(\|S_n - S\|_{HS}^2 \le \nu^{-1}\, E\|S_n - S\|_{HS}^2\right) \ge 1-\nu, \qquad P\left(\|T_n^* y - Sf\|_{H}^2 \le \nu^{-1}\, E\|T_n^* y - Sf\|_{H}^2\right) \ge 1-\nu.$$
An application of Theorem 3.2, Corollary 3.2 and Lemma 3.7 completes the proof.
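The step above is an instance of Markov's inequality for a nonnegative random variable $X$: $P(X \le \nu^{-1}EX) \ge 1-\nu$. A generic numerical illustration follows, with squared Gaussian draws standing in for the squared norms; this is a hypothetical standalone example, not the estimator of the theorem.

```python
import random

random.seed(0)

# Markov's inequality: for X >= 0 and nu in (0, 1],
# P(X <= nu^{-1} E[X]) >= 1 - nu.
samples = [random.gauss(0.0, 1.0) ** 2 for _ in range(100_000)]
mean = sum(samples) / len(samples)

for nu in (0.25, 0.5, 1.0):
    frac = sum(x <= mean / nu for x in samples) / len(samples)
    assert frac >= 1 - nu
```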
Curriculum vitae

Name: Marco Singer
Address: Eisenbahnstraße 15, 37073 Göttingen
Email: msinger@gwdg.de
Date of birth: 24.07.1986, Braunschweig, Germany
Marital status: single

10.2006–09.2009  Study of Mathematics (Bachelor), Leibniz-Universität Hannover
09.2009  Bachelor degree
  Thesis name: SQP methods for equality constrained nonlinear optimization problems
10.2009–11.2012  Study of Mathematics (Master), Leibniz-Universität Hannover
11.2012  Master degree
  Thesis name: Goodness of fit tests for nonlinear time series models
01.2013–  PhD studies in Mathematics in the GAUSS program, Georg-August-Universität Göttingen
  Thesis name: Partial least squares for serially dependent data