3.7 Additional proofs
3.7.3 Proof of Theorem 3.3
Denote by $g_h$ the joint density of $(X_h, X_0)^T$ and by $g_0$ the density of $X_0$. We need some intermediate results to prove the theorem.
Lemma 3.7 Assume that condition (D2) holds. Then we have
$n^{-2}$
Here $\zeta$ denotes the Riemann zeta function.
Proof: Recall that by condition (D2) we have $|\rho(h)| \le (h+1)^{-q}$, $h = 0, \dots, n-1$, for some $q > 0$.
First assume $q \in (0,1]$. The integral test for series convergence gives lower and upper bounds for the hyperharmonic series as
$$\int_1^{n+1} x^{-q}\,dx \;\le\; \sum_{h=1}^{n} h^{-q} \;\le\; 1 + \int_1^{n} x^{-q}\,dx. \qquad (3.17)$$
We need to separate two cases. First let $q \in (0,1)$; then the claim follows from (3.17) and the fact that $n^{-2} \le n^{-1} \le n^{-q}$.
For $q = 1$ we evaluate the limit
Finally, the case $q > 1$ is immediate, as the zeta function $\zeta(q)$ is defined as the limit of the hyperharmonic series with exponent $q$.
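The three regimes of this proof can be checked numerically. The sketch below is an illustration only: it assumes the hyperharmonic series takes the form $\sum_{h=1}^n h^{-q}$, verifies the integral-test bounds for $q \le 1$, and checks convergence to $\zeta(q)$ for $q > 1$ via the classical value $\zeta(2) = \pi^2/6$.

```python
import math

def partial_sum(n, q):
    """Partial sum of the hyperharmonic series: sum_{h=1}^n h^{-q}."""
    return sum(h ** (-q) for h in range(1, n + 1))

def integral(a, b, q):
    """Closed form of the integral of x^{-q} from a to b."""
    if q == 1.0:
        return math.log(b) - math.log(a)
    return (b ** (1 - q) - a ** (1 - q)) / (1 - q)

n = 10_000
for q in (0.5, 1.0):
    s = partial_sum(n, q)
    lower = integral(1, n + 1, q)   # integral test, lower bound
    upper = 1 + integral(1, n, q)   # integral test, upper bound
    assert lower <= s <= upper

# For q > 1 the series converges to the Riemann zeta function,
# e.g. zeta(2) = pi^2 / 6.
assert abs(partial_sum(n, 2.0) - math.pi ** 2 / 6) < 1e-3
```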
The next lemma and the subsequent corollary show that the quantities appearing in the sums of Theorem 3.2 can be linked to the autocorrelation function $\rho$:
Lemma 3.8 Under the assumptions (K1), (K2) and (D1) it holds for $h > 0$ with $\rho_h = \tau_0^{-1}\tau_h$
Proof: We will only prove the first inequality; the second one follows in the same way.
By Jensen’s inequality and (K2) we know
∫
The first and third integral terms can readily be calculated as
completing the proof by multiplying all integrals with $\tau_0^{-d}\tau_0^{d}$.

Corollary 3.2 Under the assumptions (K1), (K2), (D1) and (D2) it holds for all $h > 0$ and $q > 0$
Proof: Recall that $\theta(\rho) = 1 + \{1-\rho^2\}^{-d/2} - 2^{d+1}\{4-\rho^2\}^{-d/2}$ for $\rho \in [0,1)$. We seek to find bounds on $\theta$; the corollary then follows by an application of Lemma 3.8.
By assumption (D2) we know there is a $\rho_*$ such that $\rho_h^2 \le \rho_*^2 < 1$ for all $h > 0$. Thus consider $\rho \in [0, \rho_*]$. We start by finding a constant $C > 0$ with
$$\theta'(\rho) = d\,\rho\left\{(1-\rho^2)^{-d/2-1} - 2^{d+1}(4-\rho^2)^{-d/2-1}\right\} \le 2C\rho.$$
Thus $C$ can be taken as $C = d\left\{(1-\rho_*^2)^{-d/2-1} - 2^{d+1}(4-\rho_*^2)^{-d/2-1}\right\}$.
Thus we know that the slope of $\theta$ is always less than that of $C\rho^2$. Finally, it holds that $\theta(0) = 0$, and thus $0 \le \theta(\rho) \le C\rho^2$ for $\rho \in [0, \rho_*]$.
Under condition (D2) it holds that $\{1-\rho_*^2\}^{-d/2} \le \{1-2^{-2q}\}^{-d/2}$, completing the proof by using Lemma 3.8.
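The bound $0 \le \theta(\rho) \le C\rho^2$ can be sanity-checked numerically. The sketch below takes the expressions for $\theta$ and $C$ as displayed in the proof, with $d$ and $q$ chosen arbitrarily for illustration, and $\rho_* = 2^{-q}$, which is admissible under (D2) since $\rho_h^2 \le (h+1)^{-2q} \le 2^{-2q}$ for $h > 0$.

```python
def theta(rho, d):
    """theta(rho) = 1 + (1 - rho^2)^(-d/2) - 2^(d+1) (4 - rho^2)^(-d/2)."""
    return 1 + (1 - rho**2) ** (-d / 2) - 2 ** (d + 1) * (4 - rho**2) ** (-d / 2)

def C_const(rho_star, d):
    """C = d { (1 - rho*^2)^(-d/2-1) - 2^(d+1) (4 - rho*^2)^(-d/2-1) }."""
    return d * ((1 - rho_star**2) ** (-d / 2 - 1)
                - 2 ** (d + 1) * (4 - rho_star**2) ** (-d / 2 - 1))

d, q = 2, 0.5                   # illustrative choices, not from the text
rho_star = 2.0 ** (-q)          # admissible under (D2)
C = C_const(rho_star, d)

assert abs(theta(0.0, d)) < 1e-12          # theta(0) = 0
for i in range(1, 101):                    # grid on (0, rho_*]
    rho = rho_star * i / 100
    assert 0.0 <= theta(rho, d) <= C * rho**2
```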
Proof of the theorem: First note that the operator norm is dominated by the Hilbert–Schmidt norm. By Markov’s inequality we have for $\nu \in (0,1]$
$$P\left(\|S_n - S\|_{HS}^2 \le \nu^{-1}\, E\|S_n - S\|_{HS}^2\right) \ge 1-\nu, \qquad P\left(\|T_n^* y - Sf\|_{H}^2 \le \nu^{-1}\, E\|T_n^* y - Sf\|_{H}^2\right) \ge 1-\nu.$$
An application of Theorem 3.2, Corollary 3.2 and Lemma 3.7 completes the proof.
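The step above is an instance of Markov's inequality for a nonnegative random variable $X$: $P(X \le \nu^{-1}EX) \ge 1-\nu$. A generic numerical illustration follows, with squared Gaussian draws standing in for the squared norms; this is a hypothetical standalone example, not the estimator of the theorem.

```python
import random

random.seed(0)

# Markov's inequality: for X >= 0 and nu in (0, 1],
# P(X <= nu^{-1} E[X]) >= 1 - nu.
samples = [random.gauss(0.0, 1.0) ** 2 for _ in range(100_000)]
mean = sum(samples) / len(samples)

for nu in (0.25, 0.5, 1.0):
    frac = sum(x <= mean / nu for x in samples) / len(samples)
    assert frac >= 1 - nu
```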
Curriculum vitae

Name: Marco Singer
Address: Eisenbahnstraße 15, 37073 Göttingen
Email: msinger@gwdg.de
Date of birth: 24.07.1986, Braunschweig, Germany
Marital status: single

10.2006–09.2009  Study of Mathematics (Bachelor), Leibniz-Universität Hannover
09.2009  Bachelor degree
  Thesis name: SQP methods for equality constrained nonlinear optimization problems
10.2009–11.2012  Study of Mathematics (Master), Leibniz-Universität Hannover
11.2012  Master degree
  Thesis name: Goodness of fit tests for nonlinear time series models
01.2013–  PhD studies in Mathematics in the GAUSS program, Georg-August-Universität Göttingen
  Thesis name: Partial least squares for serially dependent data