
[Figure 3.6: two log-scale panels titled "relative absolute bound for bounded kernel functions", plotting the approximation error and the bounds for r = 10, 20, 30, 50 against the eigenvalue index; (a) first 35 eigenvalues, (b) first 150 eigenvalues.]

Figure 3.6: Approximation error and the relative-absolute bound for different truncation points $r$ for the example with a bounded kernel function. In this plot, the general bound from Theorem 3.135 for bounded kernel functions is used. The sample size was $n = 1000$ and the eigenvalues were $\lambda_i = \exp(-i/5)/Z$, where $Z$ is a normalization constant such that the eigenvalues sum to 1.

derivation, but that there actually exist kernels whose eigenvalues converge with the slower rate of $\sqrt{\lambda_r}$.

We also again see that the error stagnates at around $10^{-18}$. This is again the effect of finite-precision arithmetic. Furthermore, we see that with increasing $r$, the bound becomes larger for small $i$. This effect is more prominent for the general bound, because its relative-error term scales much faster with $r$ ($r^2$ as opposed to $\sqrt{r}$).

In summary, the relative-absolute bound reflects quite well the fact that the estimation error for small eigenvalues is much smaller than that for large eigenvalues. We have also seen that these tight bounds can only be obtained by using the whole family of bounds (that is, all bounds for $1 \le r \le n$). If for some reason one wants to use just a single bound, we suggest either setting $r = i$, or fixing $r$ at a level such that the absolute error term is small enough for whatever application one has in mind. Using $r = i$ shows that the error can be bounded roughly by $\lambda_i C + E$ in the case of bounded eigenfunctions, and by $\sqrt{\lambda_i}\, C + E$ for bounded kernel functions, where $E$ is governed by the sum of all eigenvalues smaller than $\lambda_i$.
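To make the use of the whole family of bounds concrete, the following sketch computes the pointwise-best bound by minimizing over all truncation points $r$. This is a schematic illustration, not the precise statement of the theorems: the function name, the callables C and E, and the form $\lambda_{\min(i,r)}\, C(r,n) + E(r,n)$ of a single family member are assumptions distilled from the rough summary above.

```python
import numpy as np

def bound_envelope(lam, C, E, n):
    """Pointwise minimum of the bound family over all truncation points r.

    lam : true eigenvalues, sorted in decreasing order
    C, E: callables C(r, n) and E(r, n) for the relative and absolute error
          terms; their concrete form depends on the kernel class (schematic).
    """
    d = len(lam)
    best = np.full(d, np.inf)
    for r in range(1, d + 1):
        # Schematic single member of the family: the error of the i-th
        # eigenvalue is roughly lam[min(i, r)] * C(r, n) + E(r, n)
        # (relative part plus absolute part).
        member = lam[np.minimum(np.arange(d), r - 1)] * C(r, n) + E(r, n)
        best = np.minimum(best, member)
    return best
```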


[Figure 3.7: the same two log-scale panels as in Figure 3.6, plotting the approximation error and the bounds for r = 10, 20, 30, 50 against the eigenvalue index; (a) first 35 eigenvalues, (b) first 150 eigenvalues.]

Figure 3.7: The same plot as Figure 3.6, but here the more specialized estimate (3.192) is used for the relative error term, resulting in a better estimate of the approximation error.
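As a rough guide, the simulation setup behind these figures can be sketched as follows. This is a minimal reconstruction, not the original experiment code: only the eigenvalue profile $\lambda_i = \exp(-i/5)/Z$ and the sample size $n = 1000$ are taken from the captions, while the sine eigenfunctions, the truncation level d, and all variable names are assumptions made for illustration.

```python
import numpy as np

# Minimal sketch of the simulation setup. Assumed details: a sine basis as
# orthonormal eigenfunctions on [0, 1], spectrum truncated at d terms; only
# lambda_i = exp(-i/5)/Z and n = 1000 are taken from the figure captions.
rng = np.random.default_rng(0)
n, d = 1000, 200                                  # sample size, retained terms
lam = np.exp(-np.arange(1, d + 1) / 5.0)
lam /= lam.sum()                                  # normalize: eigenvalues sum to 1

x = rng.uniform(0.0, 1.0, size=n)                 # i.i.d. sample points
Phi = np.sqrt(2.0) * np.sin(np.pi * np.outer(np.arange(1, d + 1), x))  # d x n
K = Phi.T @ (lam[:, None] * Phi)                  # K_ij = sum_k lam_k phi_k(x_i) phi_k(x_j)

approx = np.sort(np.linalg.eigvalsh(K / n))[::-1] # empirical eigenvalues of K/n
err = np.abs(approx[:d] - lam)                    # pairwise approximation error
```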

Compared to the existing approaches, the current approach is perhaps most similar to the one taken by Koltchinskii and Giné (2000). There, convergence is also considered in a finite-dimensional setting. However, that work uses the Hoffman–Wielandt inequality, which measures the error on an absolute scale. Moreover, it aims at proving asymptotic results in the form of laws of large numbers and central limit theorems, whereas in this work we aim at finite sample size confidence bounds.

As already stated, a significant feature of the approximation error bounds developed in this chapter is that they scale with the magnitude of the true eigenvalue. All current finite sample size bounds treat the error on a fixed scale, so that the error estimate is governed by the largest error, which is usually incurred by the largest eigenvalues. The new bounds therefore provide a much more accurate picture of the approximation errors.

Finally, note that the different approaches lead to different measures of the approximation error. The approach based on the variational characterization of the eigenvalues naturally leads to differences of sums of eigenvalues. The approach based on functional analysis leads to the distance of a perturbed eigenvalue from the whole true spectrum. In the approach of Koltchinskii and Giné (2000), the distance is measured by a 2-norm between the sorted eigenvalues. Lastly, our approach considers the maximum over the pairwise errors of the first n approximate and true eigenvalues. Which kind of distance measure should be preferred certainly depends on the application, although the measure resulting from the functional analytic approach is rather weak.

The approximation bound depends on the norms of two error matrices, which have been studied under fairly general assumptions in this chapter. For more specific settings, it might be possible to derive much more accurate error estimates, leading to improved bounds. Thus, the results presented here do not conclude this approach; rather, there exists a well-defined interface for adapting these theoretical results to new applications: all that is required is providing accurate estimates of the error matrices.

We would like to stress again that the absolute factor is not merely an artifact, but accurately reflects the structure of eigenvalues computed with finite-precision arithmetic. Since every matrix is already stored with a small perturbation, the eigenvalues have already been affected by a small additive perturbation, and a fully relative approximation error bound is not possible.

Obviously, the most interesting cases are those where the eigenvalues decay quickly. Otherwise, the approximation errors do not vary much between individual eigenvalues and there is no need to treat individual eigenvalues differently. For rapidly decaying eigenvalues, however, the truncation error term also decays quickly, and the truncation point $r$ can be chosen such that the absolute error term becomes small.

Generally, it seems that the bounds are most interesting for fixed, or at least bounded, $r$. In principle, it is possible to increase $r$ with $n$, but depending on the type of kernel, the admissible growth of $r$ can be rather slow and the resulting error bound large. On the other hand, for fixed $r$, the relative error term in general decays as $O(n^{-1/2})$, the typical stochastic convergence rate. Also note that there is no single $r$ such that the bound is minimal for all eigenvalues at once: for small $r$, the bound tends to be smaller for large eigenvalues, whereas for large $r$, the bound becomes better for small eigenvalues.
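This trade-off can be made visible with the bound_envelope sketch from above, using schematic stand-ins for the two error terms (the concrete constants and functional forms are hypothetical, not those of the theorems):

```python
import numpy as np

# Toy illustration of the r trade-off; requires bound_envelope from above.
n = 1000
lam = np.exp(-np.arange(1, 151) / 5.0)
lam /= lam.sum()

C = lambda r, n: r * np.sqrt(np.log(r + 1) / n)   # relative term, O(n^{-1/2}) for fixed r
E = lambda r, n: lam[r - 1:].sum()                # truncation term: tail sum of the spectrum

env = bound_envelope(lam, C, E, n)                # pointwise best over all r
```

Plotting a few individual members of the family against env should reproduce the crossover described above: small $r$ wins for the leading eigenvalues, large $r$ for the tail.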

Finally, it should also be stressed that the bounds are strong enough to prove that convergence of the eigenvalues takes place uniformly over all approximate eigenvalues.

3.13.2 Kernels with Bounded Eigenfunctions

The first class we have studied was that of Mercer kernels whose eigenfunctions are uniformly bounded. An example was given by a kernel constructed from a sine basis. Sometimes, such a kernel is also referred to as a Fourier kernel.

For this setting, we have been able to derive a rather accurate finite sample size bound. In particular, it is possible to bound the truncation error $E(r, n)$ in a deterministic fashion. The relative error term $C(r, n)$ scales rather moderately with $r$, namely as $r\sqrt{\log r}$ (see Section 3.11.1). In this case, it also turns out that, assuming the best rate for adjusting $r$ with respect to $n$, $E(r, n)$ decays quickly, depending on the rate of decay of the eigenvalues, for both polynomial and exponential decay. Finally, if $r$ is allowed to grow as $n \to \infty$, one obtains a bound which vanishes asymptotically. Its speed depends on the rate of decay of the eigenvalues. In the worst case, for polynomially decaying eigenvalues with $\lambda_i = O(i^{-2})$, this rate is $O(n^{-1/4} \log n)$, while in the best case the rate is $O(n^{-1/2} (\log n)^{3/2})$, which is only slightly slower than the (non-relative) absolute bounds.
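Schematically, and with all constants and lower-order terms suppressed, the scalings for the bounded-eigenfunction case can be collected as follows; the combined form of $C(r,n)$ is an inference from the separate statements about its $r$- and $n$-dependence, not a quoted result:

```latex
% Schematic summary for bounded eigenfunctions (constants omitted):
C(r,n) = O\!\left( r \sqrt{\log r / n} \right),
\qquad
\text{overall rate} =
\begin{cases}
O\!\left( n^{-1/4} \log n \right) & \text{worst case, } \lambda_i = O(i^{-2}),\\[2pt]
O\!\left( n^{-1/2} (\log n)^{3/2} \right) & \text{best case.}
\end{cases}
```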

All of these observations are also nicely reflected by the numerical simulations (see Figure 3.4), where the bounds decay as expected and provide good upper bounds on the approximation error.

It should be stressed that in these cases, using an absolute error estimate would lead to an overestimation of the error for smaller eigenvalues by several orders of magnitude, although the absolute error bounds are asymptotically faster.

3.13.3 Bounded Kernel Functions

The second class of kernel functions we have studied was that of kernels which are uniformly bounded. This class includes the important radial basis function kernels (rbf-kernels). In this case, the eigenfunctions can in principle grow unboundedly as the eigenvalues become smaller, leading to considerably larger error estimates.

Still, as discussed in Section 3.11.2, the truncation error tends to 0 as $r \to \infty$, although the rate is slightly slower than in the case of bounded eigenfunctions. More importantly, the relative error term also depends on the eigenvalues themselves, and scales with the factor $1/\sqrt{\lambda_r}$. Therefore, having smaller eigenvalues can lead to a much larger relative error term (which will nevertheless ultimately decay to zero). This is also reflected in the achievable asymptotic rate, which is slower than in the case of bounded eigenfunctions.
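To get a feeling for how severe this factor is, the following snippet evaluates $1/\sqrt{\lambda_r}$ for the exponentially decaying spectrum used in the simulations; the concrete numbers are only illustrative:

```python
import numpy as np

# How the 1/sqrt(lambda_r) factor inflates the relative error term for a
# quickly decaying spectrum (lambda_r = exp(-r/5)/Z, as in the simulations).
r = np.arange(1, 51)
lam = np.exp(-r / 5.0)
lam /= lam.sum()
inflation = 1.0 / np.sqrt(lam)     # grows like exp(r/10)
print(inflation[[9, 29, 49]])      # r = 10, 30, 50: rapid blow-up with r
```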

In Section 3.12.2, we have discussed a concrete example which shows that there exist settings in which the relative error term actually scales with $r$ as predicted. We conclude that in this case, convergence is actually much slower than in the case of bounded eigenfunctions.

Nevertheless, the numerical simulations reveal that the general structure of the bound matches the observed behavior, although the bounds are best used for finite $r$, as explained above.