
3.7 Additional proofs

3.7.3 Proof of Theorem 3.3

Denote by $g_h$ the joint density of $(X_h, X_0)^T$ and by $g_0$ the density of $X_0$. We need some intermediate results to prove the theorem.

Lemma 3.7 Assume that condition (D2) holds. Then we have
$$\frac{1}{n^2}\sum_{h=1}^{n-1}(n-h)\,|\rho(h)| \;\le\;
\begin{cases}
\dfrac{n^{-q}}{1-q}, & q \in (0,1),\\[6pt]
\dfrac{\log n}{n}, & q = 1,\\[6pt]
\dfrac{\zeta(q)}{n}, & q > 1.
\end{cases}$$
Here $\zeta$ denotes the Riemann zeta function.

Proof: Recall that by condition (D2) we have $|\rho(h)| \le (h+1)^{-q}$, $h = 0, \dots, n-1$, for some $q > 0$.

First assume $q \in (0,1]$. The integral test for series convergence gives lower and upper bounds for the hyperharmonic series as
$$\int_{2}^{n+1} x^{-q}\,\mathrm dx \;\le\; \sum_{h=1}^{n-1} (h+1)^{-q} \;\le\; \int_{1}^{n} x^{-q}\,\mathrm dx. \tag{3.17}$$

We need to separate two cases. First let $q \in (0,1)$; then the claim follows from (3.17) and the fact that $n^{-2} \le n^{-1} \le n^{-q}$.

For $q = 1$ we evaluate the limit directly: the partial sums of the harmonic series grow like $\log n$, which yields the stated bound.

Finally, the case $q > 1$ is trivial, as the zeta function $\zeta(q)$ is defined as the infinite hyperharmonic series with exponent $q$.
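For concreteness, the three regimes can be obtained from (3.17) by a standard computation. The following chain is only a sketch; it uses $(n-h)/n \le 1$, condition (D2), and the integral bounds, with the left-hand side as in the statement of Lemma 3.7 above.
$$\frac{1}{n^2}\sum_{h=1}^{n-1}(n-h)\,|\rho(h)|
\;\le\; \frac{1}{n}\sum_{h=1}^{n-1}(h+1)^{-q}
\;\le\;
\begin{cases}
\dfrac{1}{n}\displaystyle\int_{1}^{n} x^{-q}\,\mathrm dx \;\le\; \dfrac{n^{-q}}{1-q}, & q \in (0,1),\\[10pt]
\dfrac{1}{n}\displaystyle\int_{1}^{n} x^{-1}\,\mathrm dx \;=\; \dfrac{\log n}{n}, & q = 1,\\[10pt]
\dfrac{1}{n}\displaystyle\sum_{k=2}^{\infty} k^{-q} \;\le\; \dfrac{\zeta(q)}{n}, & q > 1.
\end{cases}$$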

The next lemma and the subsequent corollary show that the quantities appearing in the sums of Theorem 3.2 can be linked to the autocorrelation function $\rho$:

Lemma 3.8 Under the assumptions (K1), (K2) and (D1) it holds for $h > 0$ with $\rho_h = \tau_0^{-1}\tau_h$

Proof: We will only prove the first inequality; the second one follows in the same way.

By Jensen’s inequality and (K2) we know

The first and third integral terms can readily be calculated as

completing the proof by multiplying all integrals by $\tau_0^{-d}\tau_0^{-d}$.

Corollary 3.2 Under the assumptions (K1), (K2), (D1) and (D2) it holds for all $h > 0$ and $q > 0$

Proof: Recall that $\theta(\rho) = 1 + \{1-\rho^2\}^{-d/2} - 2^{d+1}\{4-\rho^2\}^{-d/2}$ for $\rho \in [0,1)$. We seek to find bounds on $\theta$, and the corollary can then be proven by an application of Lemma 3.8.

By assumption (D2) we know there is a $\bar\rho$ such that $\rho_h^2 \le \bar\rho^2 < 1$ for all $h > 0$. Thus consider $\rho \in [0, \bar\rho]$. We start by finding a constant $C > 0$ with

$$\theta'(\rho) \;=\; \rho\left\{(1-\rho^2)^{-d/2-1} - 2^{d+1}(4-\rho^2)^{-d/2-1}\right\}d \;\le\; 2\rho C.$$

Thus $C$ can be taken as
$$C = d\left\{(1-\bar\rho^2)^{-d/2-1} - 2^{d+1}(4-\bar\rho^2)^{-d/2-1}\right\}.$$

Thus we know that the slope of $\theta$ is always less than that of $C\rho^2$. Finally it holds that $\theta(0) = 0$ and thus $0 \le \theta(\rho) \le C\rho^2$ for $\rho \in [0,\bar\rho]$.
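Two routine checks lie behind the last two displays; the following sketch assumes the form of $\theta$ recalled at the beginning of the proof. The derivative of $\theta$ is obtained term by term,
$$\frac{\mathrm d}{\mathrm d\rho}\,(1-\rho^2)^{-d/2} = \rho\,(1-\rho^2)^{-d/2-1}\,d,
\qquad
\frac{\mathrm d}{\mathrm d\rho}\left[-2^{d+1}(4-\rho^2)^{-d/2}\right] = -\rho\,2^{d+1}(4-\rho^2)^{-d/2-1}\,d,$$
and the two terms add up to the expression for $\theta'(\rho)$ above. Moreover,
$$\theta(0) \;=\; 1 + 1 - 2^{d+1}\cdot 4^{-d/2} \;=\; 2 - 2^{d+1}\cdot 2^{-d} \;=\; 0.$$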

Under condition (D2) it holds that $\{1-\rho^2\}^{-d/2} \le \{1-2^{-2q}\}^{-d/2}$, completing the proof by using Lemma 3.8.
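The inequality used in the last step is a short monotonicity argument; a sketch, using only condition (D2):
$$\rho_h^2 \;\le\; (h+1)^{-2q} \;\le\; 2^{-2q} \;<\; 1 \qquad\text{for all } h \ge 1,$$
and since $t \mapsto (1-t)^{-d/2}$ is nondecreasing on $[0,1)$, it follows that $\{1-\rho_h^2\}^{-d/2} \le \{1-2^{-2q}\}^{-d/2}$.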

Proof of the theorem: First note that the operator norm is dominated by the Hilbert–Schmidt norm. By Markov's inequality we have for $\nu \in (0,1]$

$$P\left(\|S_n - S\|_{HS}^2 \le \nu^{-1}\,E\|S_n - S\|_{HS}^2\right) \;\ge\; 1-\nu,
\qquad
P\left(\|T_n y - Sf\|_{H}^2 \le \nu^{-1}\,E\|T_n y - Sf\|_{H}^2\right) \;\ge\; 1-\nu.$$
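Both displayed inequalities are instances of the same elementary bound: for a nonnegative random variable $Z$ with $0 < EZ < \infty$ (here $Z = \|S_n - S\|^2_{HS}$ and $Z = \|T_n y - Sf\|^2_{H}$, respectively), Markov's inequality gives
$$P\left(Z > \nu^{-1} EZ\right) \;\le\; \frac{EZ}{\nu^{-1} EZ} \;=\; \nu,
\qquad\text{and hence}\qquad
P\left(Z \le \nu^{-1} EZ\right) \;\ge\; 1 - \nu.$$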

An application of Theorem 3.2, Corollary 3.2 and Lemma 3.7 completes the proof.


Name: Marco Singer

Address: Eisenbahnstraße 15, 37073 Göttingen.

Email: msinger@gwdg.de

Date and place of birth: 24.07.1986, Braunschweig, Germany.

Marital status: single

Curriculum vitae

10.2006–09.2009 Study of Mathematics (Bachelor), Leibniz-Universität Hannover

09.2009 Bachelor degree

Thesis name: SQP-Methods for equality constrained nonlinear optimization problems

10.2009–11.2012 Study of Mathematics (Master), Leibniz-Universität Hannover

11.2012 Master degree

Thesis name: Goodness of fit tests for nonlinear time series models

01.2013– PhD studies in Mathematics in the GAUSS program, Georg-August-Universität Göttingen

Thesis name: Partial least squares for serially dependent data.