
5.1.1 Local Lipschitz constants

… keeping in mind that one central goal of the nonlinear kernel approximation schemes introduced in Chapter 2 is a sparse function representation.

Moreover, in the case of SVMs, various results on learning rates have been established [163, 188, 164], of which the "oracle inequality"-type results [161, 160] are probably closest to our setting. But besides being valid only for certain types of kernels, the inherent statistical perspective causes those rates to include noise assumptions on the training data, which are not necessary in our deterministic setting.

To this end, while the above approach is an analytical result in its own interest, it remains an open question whether a-priori error bounds that either are not overly conservative or do not depend on a tight coverage of the considered domain can be devised for practical purposes. Even so, due to the modularity of the estimator in Theorem 5.1.1, any a-priori estimation technique can be applied and substituted for $E_A$. Additionally, whatever estimation for $E_A$ is used, the overall estimator still involves a global Lipschitz constant $L_K$ of the kernel, which contributes to the estimation result in an exponential way.

As this might even be the more crucial part of the overall estimation, we will focus on ways to improve it in the following.

In the estimator of Theorem 5.1.1, the global constant $L_K$ is used. But as $x_r(t)$ is known during the reduced simulation, a variant like

$$
\left\|\hat{f}(x(t)) - \hat{f}(x_r(t))\right\|_G \le L_{\hat{f}}(x_r(t))\,\|x(t) - x_r(t)\|_G
$$

should be possible, where $L_{\hat{f}}$ utilizes information from the reduced system.

As those estimations use local information, we call them "local Lipschitz constant estimations". Moreover, assuming that

$$
\|e(t)\|_G = \|x(t) - x_r(t)\|_G < \Theta
$$

for some $\Theta > 0$ (maybe a coarse upper bound, see Section 5.1.2 for details) could further improve the local Lipschitz constant estimation, as it would mean

$$
x(t) \in B_{\Theta}(x_r(t)),
$$

which allows for an even more localized Lipschitz constant estimation

$$
\left\|\hat{f}(x(t)) - \hat{f}(x_r(t))\right\|_G \le L_{\hat{f}}(x_r(t), \Theta)\,\|x(t) - x_r(t)\|_G.
$$

Therefore, we propose an approach using local secant gradients involving an a-priori error bound $\|e(t)\|_G \le \Theta$, which enables efficient computation of those secant gradients utilizing the kernel expansion center locations. The derived theory and results for a-posteriori error estimation of kernel-based dynamical systems have been published in [182] in similar form.

The key to our local Lipschitz constant computations is to use a subclass of the RBF kernels introduced in Section 3.2.1, (1.1.4), induced by a certain class of scalar functions $\varphi$:

Definition 5.1.2 (Bell functions). A function $\varphi: \mathbb{R}^+ \to \mathbb{R}^+$ is called a "bell function" if

i) $\varphi \in C^2(\mathbb{R}^+)$,

ii) $\|\varphi\|_{\infty} \le B$ for some $B > 0$, (5.9a)

iii) $\varphi'(0) \le 0$, (5.9b)

iv) $\exists\, r_0 > 0:\ \varphi''(r)(r - r_0) > 0\ \ \forall r \neq r_0$. (5.9c)

Condition iv) means that $r_0$ denotes a unique turning point, where $\varphi$ is strictly concave on $[0, r_0]$ and strictly convex on $[r_0, \infty[$. We will denote the set of all bell functions by $\mathcal{B}$ and assume $\varphi \in \mathcal{B}$ for the remainder of this section. For example, the Gaussian kernel is induced by a bell function $\varphi(r) = e^{-(r/\beta)^2}$. Then we have $r_0 = \beta/\sqrt{2}$ and $L_K = |\varphi'(r_0)|$ in the context of Theorem 5.1.1.
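To make the Gaussian example concrete, the following minimal Python sketch evaluates the turning point $r_0 = \beta/\sqrt{2}$ and the resulting global constant $L_K = |\varphi'(r_0)|$; the helper names `phi`, `dphi` and the parameter `beta` are our own choices, not notation from the text.

```python
import numpy as np

beta = 2.0  # Gaussian shape parameter (the value also used in Figure 5.1 below)

def phi(r):
    """Gaussian bell function phi(r) = exp(-(r/beta)^2)."""
    return np.exp(-(r / beta) ** 2)

def dphi(r):
    """First derivative phi'(r) = -(2r/beta^2) * exp(-(r/beta)^2)."""
    return -2.0 * r / beta**2 * np.exp(-(r / beta) ** 2)

r0 = beta / np.sqrt(2.0)   # unique turning point of phi (zero of phi'')
L_K = abs(dphi(r0))        # global Lipschitz constant of phi, cf. Theorem 5.1.1
print(r0, L_K)             # for beta = 2: r0 ~ 1.4142, L_K ~ 0.4289
```

For $\beta = 2$ this gives $r_0 = \sqrt{2} \approx 1.414$, consistent with the caption of Figure 5.1 below.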

We will state results regarding bell functions first and connect these to the multidimensional case later.

Lemma 5.1.3 (Basic properties of bell functions). Let $\varphi \in \mathcal{B}$. Then we have

$$
\min_{x \in \mathbb{R}^+} \varphi'(x) = \varphi'(r_0), \tag{5.10a}
$$

$$
\varphi'(x) \xrightarrow{\;x \to \infty\;} 0, \tag{5.10b}
$$

$$
\varphi'(x) < 0 \quad \forall x \in \mathbb{R}^+, \tag{5.10c}
$$

where (5.10c) especially means that $\varphi$ is strictly monotonically decreasing and attains its maximum value $\varphi(0)$ at zero.

Proof. Property (5.10a) follows directly from (5.9b) and (5.9c); in particular we have $\varphi'(r_0) < 0$. Now we show (5.10b) by contradiction. Assume

$$
\exists\, \varepsilon > 0,\ z_0 > 0\ \forall z \ge z_0: \ |\varphi'(z)| \ge \varepsilon.
$$

Then by the mean value theorem we see that

$$
\forall n \in \mathbb{N}\ \exists\, \xi_n \in\, ]z_0, z_0 + n[\;: \quad \frac{\varphi(z_0 + n) - \varphi(z_0)}{n} = \varphi'(\xi_n).
$$

But this means

$$
|\varphi(z_0 + n) - \varphi(z_0)| = n\,|\varphi'(\xi_n)| \ge n\varepsilon \xrightarrow{\;n \to \infty\;} \infty,
$$

which contradicts the boundedness (5.9a) of $\varphi$. Finally, we can see statement (5.10c) as follows: Conditions (5.9b) and (5.9c) guarantee

$$
\varphi'(x) < 0 \quad \forall x \in\, ]0, r_0].
$$

Condition (5.9c) also says that $\varphi'$ is strictly monotonically increasing for $x > r_0$; but this already means

$$
\varphi'(x) < 0 \quad \forall x > r_0,
$$

since if we had an $x'$ with $\varphi'(x') = 0$, this would imply $\varphi'(x) > 0\ \forall x > x'$, in contradiction with condition (5.10b).

The notion of secant gradients will prove useful for our further investigations.

Definition 5.1.4 (Secant gradient function). For any $\varphi \in \mathcal{B}$, $s \in \mathbb{R}^+$, we define the secant gradient function $S_s: \mathbb{R}^+ \to \mathbb{R}$ as

$$
S_s(r) := \begin{cases} \dfrac{\varphi(r) - \varphi(s)}{r - s}, & r \neq s,\\[2mm] \varphi'(s), & r = s. \end{cases} \tag{5.11}
$$

Its derivative is given by

$$
S_s'(r) = \begin{cases} \dfrac{\varphi'(r) - \frac{\varphi(r)-\varphi(s)}{r-s}}{r-s} = \dfrac{\varphi'(r) - S_s(r)}{r-s}, & r \neq s,\\[2mm] \varphi''(r)/2, & r = s. \end{cases}
$$

(The value $\varphi''(r)/2$ at $r = s$ is the continuous limit of the difference quotient; see the case $r = s$ in the proof of Lemma 5.1.6 below.)
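As an illustration of (5.11), here is a small Python sketch of the secant gradient function; the Gaussian helpers `phi`/`dphi` and the factory name `secant_gradient` are our own naming, not notation from the text.

```python
import numpy as np

beta = 2.0
phi  = lambda r: np.exp(-(r / beta) ** 2)       # Gaussian bell function
dphi = lambda r: -2.0 * r / beta**2 * phi(r)    # its first derivative

def secant_gradient(phi, dphi, s):
    """Return S_s from (5.11): the difference quotient for r != s,
    continuously extended by phi'(s) at r = s."""
    def S(r):
        if np.isclose(r, s):
            return dphi(s)
        return (phi(r) - phi(s)) / (r - s)
    return S

S = secant_gradient(phi, dphi, s=0.5)
print(S(0.5) == dphi(0.5))   # True: S_s(s) = phi'(s) by definition
print(S(2.0))                # ordinary secant slope between r = 2 and s = 0.5
```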

Well-definedness and continuous differentiability of $S_s$ can easily be verified by elementary analysis. The following lemma characterizes the position of the minimum secant gradient.

Lemma 5.1.5 (Minimum secant gradient). Let $\varphi \in \mathcal{B}$, $s \in \mathbb{R}^+$. Then

$$
r_s := \operatorname*{arg\,min}_{r \in \mathbb{R}^+} S_s(r) \tag{5.12}
$$

exists, is unique, and exactly one of the following cases holds:

$$
s < r_0 \le r_s, \tag{5.13a}
$$

$$
r_s \le r_0 < s, \tag{5.13b}
$$

$$
s = r_0 = r_s. \tag{5.13c}
$$

Proof. It is clear that $\varphi'(r_0) \le S_s(r) \le 0\ \forall r \in \mathbb{R}^+$. Using the boundedness of $\varphi$ we see that

$$
|S_s(r)| \le \frac{|\varphi(r) - \varphi(s)|}{|r - s|} \le \frac{B + \varphi(s)}{|r - s|} \xrightarrow{\;r \to \infty\;} 0.
$$

As $S_s$ is continuous and $S_s(r_0) < 0$ for every $s$, we see that $r_s < \infty$ and hence obtain existence of a minimizer. Next we show the inequalities (5.13) and note that the necessary condition for a minimizer is

$$
0 = S_s'(r_s) = \frac{\varphi'(r_s) - S_s(r_s)}{r_s - s} \;\Leftrightarrow\; \varphi'(r_s) = S_s(r_s). \tag{5.14}
$$

Recall that every bell function has a unique $r_0 > 0$ as given by property (5.9c). Then we distinguish the following cases:

• $s < r_0$: We proceed by contradiction and assume $r_s < r_0$. Choose an $r \in \mathbb{R}^+$ with $\max(r_s, s) < r < r_0$. Then we have $S_s(r) = \frac{\varphi(r) - \varphi(s)}{r - s} = \varphi'(\xi)$ for some $\xi \in\, ]s, r[$ by the mean value theorem. Now, as $\varphi$ is strictly concave on $[0, r_0]$ by the bell function condition (5.9c) and $\varphi'(r) \le 0$ by (5.9b), we obtain $S_s(r_s) = \varphi'(r_s) > \varphi'(\xi) = S_s(r)$, which contradicts the minimizing property of $r_s$. Hence $r_0 \le r_s$, which is (5.13a).

• $r_0 < s$: This case is obtained by an analogous argument using the strict convexity of $\varphi$ on $[r_0, \infty[$ instead of the concavity as in the former case.


• $r_0 = s$: This case is clear when considering the limiting process $s \to r_0$ in both other cases.

At last, we prove the uniqueness of the minimizer. Assume there are two minimizers $r_1, r_2 \in \mathbb{R}^+$ with $r_1 \neq r_2$; without loss of generality, say $r_1 < r_2$. Consider the case $s < r_0$. Then we know by inequality (5.13a) that we must have $s < r_0 \le r_1 < r_2$. Now, $\varphi$ is strictly convex on $[r_0, \infty[$, and in combination with the necessary condition (5.14) we get

$$
S_s(r_1) = \varphi'(r_1) < \varphi'(r_2) = S_s(r_2),
$$

which contradicts the assumption that both $r_1$ and $r_2$ minimize $S_s$. The case $r_0 < s$ follows by the same arguments using (5.13b) and concavity. Finally, $r_0 = s$ directly implies $r_1 = r_2$ by (5.13c), which shows well-definedness of (5.12) and concludes the proof.

Figure 5.1 shows two examples for minimum secant gradients and the position of $r_s$, once for $s < r_0$ (left) and once for $s > r_0$ (right), using the Gaussian-inducing bell function $\varphi(r) = \exp(-r^2/\beta^2)$ with $\beta = 2$.

[Figure 5.1: two panels showing $\varphi$ with the points $r_0$, $r_m$, $s$, $r_s$ marked together with the minimal secant through $(s, \varphi(s))$ and $(r_s, \varphi(r_s))$; left panel: "Bell function demo, s = 0.282843, r_s = 2.058702"; right panel: "Bell function demo, s = 5.391323, r_s = 0.400439".]

Figure 5.1: Gaussian $\varphi$ with $\beta = 2$, $r_0 = \sqrt{2}$ and $r_m = \sqrt{2}/(1 - e^{-1/2}) \approx 3.5942$
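The annotated values of $r_s$ in Figure 5.1 can be reproduced numerically. Below is a minimal sketch, assuming the Gaussian bell function with $\beta = 2$ and exploiting that $S_s$ decreases up to $r_s$ and increases afterwards (cf. Lemma 5.1.6 below), so that a bounded scalar minimization finds $r_s$; all helper names are ours.

```python
import numpy as np
from scipy.optimize import minimize_scalar

beta = 2.0
phi  = lambda r: np.exp(-(r / beta) ** 2)
dphi = lambda r: -2.0 * r / beta**2 * phi(r)

def S_s(s):
    """Secant gradient function (5.11) for the Gaussian bell function above."""
    return lambda r: dphi(s) if np.isclose(r, s) else (phi(r) - phi(s)) / (r - s)

for s in (0.282843, 5.391323):          # the two values used in Figure 5.1
    S = S_s(s)
    r_s = minimize_scalar(S, bounds=(0.0, 20.0), method="bounded").x
    # check the necessary condition (5.14): phi'(r_s) = S_s(r_s)
    print(f"s = {s:.6f}:  r_s ~ {r_s:.4f},  residual = {abs(dphi(r_s) - S(r_s)):.1e}")
```

This reproduces $r_s \approx 2.0587$ for $s = 0.282843$ and $r_s \approx 0.4004$ for $s = 5.391323$; in both cases the turning point $r_0 = \sqrt{2}$ lies between $s$ and $r_s$, as stated by (5.13a)/(5.13b).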

The following Lemma establishes a helpful result in order to include the aforementioned a-priori bound Θ.

Lemma 5.1.6 (Monotonicity of secant gradient derivatives for bell functions). Let $s \in \mathbb{R}^+ \setminus \{r_0\}$ and let $S_s$ be given as in (5.11). Then

$$
S_s'(r)(r - r_s) > 0 \quad \forall r \in \mathbb{R}^+ \setminus \{r_s\}.
$$

Proof. Choose $s \in \mathbb{R}^+ \setminus \{r_0\}$ and recall $S_s'(r) = \frac{\varphi'(r) - S_s(r)}{r - s}$ for $r \neq s$ from Definition 5.1.4. First consider the case $s < r_0$ and distinguish the following locations of $r$:

• $r < s < r_0 \le r_s$: We have $r - r_s < 0$, $r - s < 0$ and $S_s(r) < \varphi'(r)$ by concavity, so that $S_s'(r) < 0$ and hence $S_s'(r)(r - r_s) > 0$.

• $r = s < r_0 \le r_s$: By l'Hôpital's rule we have

$$
S_s'(s) = \lim_{t \to s} S_s'(t) = \lim_{t \to s} \frac{\varphi'(t) - S_s(t)}{t - s} = \lim_{t \to s} \frac{\varphi''(t) - S_s'(t)}{1} = \varphi''(s) - \lim_{t \to s} S_s'(t).
$$

This means $S_s'(s) = \frac{\varphi''(s)}{2} < 0$, and as $r = s < r_s$ we have $S_s'(r)(r - r_s) > 0$.

• $s < r \le r_0 \le r_s$: We have $r - r_s < 0$, $r - s > 0$ and $S_s(r) > \varphi'(r)$ by concavity, so that $S_s'(r) < 0$ and thus $S_s'(r)(r - r_s) > 0$.

• $s < r_0 \le r < r_s$: Since $\varphi'(r_0) < S_s(r_0)$ and $\varphi'(r_s) = S_s(r_s)$, we must have $\varphi'(r) < S_s(r)$ for all $r \in [r_0, r_s[$. Otherwise, since $\varphi'(\cdot) - S_s(\cdot)$ is continuous, there would exist an $r' \in [r_0, r_s[$ with

$$
S_s(r') = \varphi'(r') < \varphi'(r_s) = S_s(r_s),
$$

as $r' < r_s$ and $\varphi$ is strictly convex on $[r_0, \infty[$. But this is a contradiction to the minimality of $r_s$. Together with $r - r_s < 0$ and $r - s > 0$ we obtain $S_s'(r)(r - r_s) > 0$.

• $s < r_0 \le r_s < r$: At $r_s$ we have $S_s'(r_s) = 0$ and thus

$$
S_s''(r_s) = \frac{\varphi''(r_s) - 2 S_s'(r_s)}{r_s - s} = \frac{\varphi''(r_s)}{r_s - s} > 0,
$$

as $\varphi$ is strictly convex for $r > r_0$. So there exists an $\varepsilon > 0$ with $S_s'(r_s + \varepsilon) > 0$. In fact, for any point $r > r_0$ with $S_s'(r) = 0$ we have $S_s''(r) > 0$, meaning all such points are local minima of $S_s$. But as $S_s'$ is continuous, there cannot be two consecutive local minima without a local maximum in between. Hence, $S_s'(r)$ cannot equal zero for $r > r_s$, and since $S_s'(r_s + \varepsilon) > 0$ we must have $S_s'(r) > 0\ \forall r > r_s$, which yields $S_s'(r)(r - r_s) > 0$.

Similar to the previous proofs, the case $r_0 < s$ is obtained using an analogous argumentation with concavity and convexity interchanged.

Now assume further that the admissible values of $r$ are restricted to $r \in \overline{B_\Theta(s)} \cap \mathbb{R}^+$ for some $\Theta > 0$, where $B_\Theta(s)$ denotes the open ball of radius $\Theta$ around $s$ and $\overline{B_\Theta(s)}$ its closure. With this condition we can state a result regarding local Lipschitz constant computations.

Proposition 5.1.7 (Local Lipschitz estimation using secant gradients). Let $\Theta > 0$, $s \in \mathbb{R}^+$, $\varphi \in \mathcal{B}$ and $\Gamma := \overline{B_\Theta(s)} \cap \mathbb{R}^+$. Then

$$
|\varphi(r) - \varphi(s)| \le |S_s(r_{\Theta,s})|\,|r - s| \quad \forall r \in \Gamma, \qquad \text{with} \qquad
r_{\Theta,s} = \begin{cases} s + \operatorname{sign}(r_s - s)\,\Theta, & r_s \notin \Gamma,\\ r_s, & r_s \in \Gamma. \end{cases} \tag{5.15}
$$

Proof. Assume $s \neq r_0$. Then Lemma 5.1.6 implies that $S_s$ is monotonically decreasing on $[0, r_s]$ and increasing on $[r_s, \infty[$. For $r_s \notin \Gamma$, as $S_s$ is negative, $|S_s|$ takes its maximum value over $\Gamma$ at the boundary point of $\Gamma$ that is closest to $r_s$, which is $s + \operatorname{sign}(r_s - s)\,\Theta$. Since $\operatorname{arg\,min}_{r \in \mathbb{R}^+} S_s(r) = \operatorname{arg\,max}_{r \in \mathbb{R}^+} |S_s(r)|$, the case $r_s \in \Gamma$ follows by definition. The case $s = r_0$ implies $r_s = r_0$ and hence the second case of (5.15), as of course $r_0 \in \Gamma = \overline{B_\Theta(r_0)} \cap \mathbb{R}^+$ for $\Theta > 0$. We finally conclude $|\varphi(r) - \varphi(s)| \le \|S_s\|_{L^\infty(\Gamma)}\,|r - s| = |S_s(r_{\Theta,s})|\,|r - s|$.
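A small Python sketch of (5.15), again for the Gaussian bell function; the function names (`local_lipschitz`, `S_s`), the search interval for $r_s$, and the handling of the closed ball $\Gamma$ via a simple distance check are our own choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

beta = 2.0
phi  = lambda r: np.exp(-(r / beta) ** 2)
dphi = lambda r: -2.0 * r / beta**2 * phi(r)

def S_s(s):
    return lambda r: dphi(s) if np.isclose(r, s) else (phi(r) - phi(s)) / (r - s)

def local_lipschitz(s, theta, r_max=50.0):
    """Local Lipschitz constant |S_s(r_{Theta,s})| on Gamma = closure(B_theta(s)) ∩ R+,
    following (5.15)."""
    S = S_s(s)
    r_s = minimize_scalar(S, bounds=(0.0, r_max), method="bounded").x
    if abs(r_s - s) <= theta:                     # r_s lies in Gamma
        r_theta_s = r_s
    else:                                         # boundary point of Gamma closest to r_s
        r_theta_s = s + np.sign(r_s - s) * theta
    return abs(S(r_theta_s))

r0 = beta / np.sqrt(2.0)
L_K = abs(dphi(r0))                               # global constant from Theorem 5.1.1
print(L_K, local_lipschitz(s=5.0, theta=0.5))     # local constant is much smaller far from r0
```

In this example, for $s = 5$ and $\Theta = 0.5$, the local constant comes out roughly a factor of 50 below the global $L_K \approx 0.43$, which indicates the gain to be expected from the localization.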

Next, we generalize these results to the multidimensional kernel setting.

Corollary 5.1.8 (Local Lipschitz constants for bell function kernels). Let the conditions of Proposition 5.1.7 hold and choose $y, z \in \Gamma$. Further, let an RBF kernel $K$ induced by $\varphi$ be given and set $s := \|y - z\|_G$. Then we have

$$
|K(x, z) - K(y, z)| \le |S_s(r_{\Theta,s})|\,\|x - y\|_G \quad \forall x \in \overline{B_\Theta(y)}.
$$

Proof. Fix $y, z \in \Gamma$, $\Theta > 0$ and let $R := \{x \in \Gamma \mid \|x - z\|_G \in \overline{B_\Theta(s)}\}$. Then using Proposition 5.1.7 we estimate

$$
|K(x, z) - K(y, z)| = |\varphi(\|x - z\|_G) - \varphi(\|y - z\|_G)| \le |S_s(r_{\Theta,s})|\,\bigl|\,\|x - z\|_G - \|y - z\|_G\,\bigr| \le |S_s(r_{\Theta,s})|\,\|x - y\|_G \quad \forall x \in R,
$$

where the last step uses the reverse triangle inequality. The fact that $\overline{B_\Theta(y)} \subset R$ finishes the proof.
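As a plausibility check of Corollary 5.1.8, the following sketch samples random points from the ball $\overline{B_\Theta(y)}$ and verifies the bound for the Gaussian RBF kernel; it reuses the `phi` and `local_lipschitz` helpers from the previous sketch and takes $G$ to be the identity, both of which are simplifying assumptions of ours.

```python
import numpy as np
# reuses phi and local_lipschitz from the sketch after Proposition 5.1.7; G = identity

rng = np.random.default_rng(0)
theta = 0.5
y, z = rng.normal(size=3), rng.normal(size=3)
s = np.linalg.norm(y - z)                       # s = ||y - z||_G with G = I
L_loc = local_lipschitz(s, theta)               # |S_s(r_{Theta,s})| from (5.15)

K = lambda a, b: phi(np.linalg.norm(a - b))     # RBF kernel induced by phi
for _ in range(1000):                           # random x in the closed ball B_theta(y)
    d = rng.normal(size=3)
    x = y + theta * rng.uniform() * d / np.linalg.norm(d)
    assert abs(K(x, z) - K(y, z)) <= L_loc * np.linalg.norm(x - y) + 1e-12
print("bound of Corollary 5.1.8 verified on random samples")
```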

Finally, those results can be used to state an improved version of the error estimator derived in Theorem 5.1.1. As the above results hold true at any location and for any bound $\Theta$, we will assume the a-priori error bound to be time-dependent, i.e.

$$
\|e(t)\|_G \le \Theta(t) \quad \forall t \in [0, T]. \tag{5.16}
$$

If no further knowledge is available, $\Theta(t) \equiv \infty$ is a valid option, in which case we obtain $r_{\Theta,s} \equiv r_s$ in the context of Proposition 5.1.7. We also introduce the notation

$$
d_i(t) := \|x_r(t) - x_i\|_G, \quad i = 1 \ldots N, \tag{5.17}
$$

for the distance of the reduced state $x_r(t)$ to each expansion center $x_i$ during the reduced simulation. In fact, the coarse error bound means nothing but

$x(t) \in \overline{B_{\Theta(t)}(x_r(t))}\ \forall t \in [0, T]$. Consequently, for any $t \in [0, T]$, $i \in \{1 \ldots N\}$, we identify $x, y, z \in \mathbb{R}^d$ and $\Theta > 0$ from Corollary 5.1.8 with $x(t)$, $x_r(t)$ of the full/reduced solution, the kernel expansion centers $x_i$ and $\Theta(t)$, respectively. This allows us to obtain local Lipschitz constant estimations at each time $t$, using the current reduced simulation's state $x_r(t)$ as "center" of locality:

Theorem 5.1.9 (Local secant gradient Lipschitz error estimator (LSLE)). Let the error system be given as in (5.2), where the kernel $K$ of the expansion (5.8) is induced by a bell function $\varphi$. Further, let $\Theta(t)$ be an a-priori error bound. Then the state space error is bounded via

$$
\|e(t)\|_G \le \Delta^\Theta_{\mathrm{LSLE}}(t) \quad \forall t \in [0, T],
$$

with

$$
\Delta^\Theta_{\mathrm{LSLE}}(t) := \int_0^t \alpha(s)\, e^{\int_s^t \beta(r)\,\mathrm{d}r}\,\mathrm{d}s + e^{\int_0^t \beta(s)\,\mathrm{d}s}\, E_0,
$$

$$
\alpha(t) := \|E_A(x(t))\|_G + \left\|\left(I - VW^T\right)\hat{f}(Vz(t))\right\|_G,
$$

$$
\beta(t) := \sum_{i=1}^{N} \|c_i\|_G\, \bigl|S_{d_i(t)}\bigl(r_{\Theta(t),\,d_i(t)}\bigr)\bigr|.
$$

Proof. The $\alpha$ term is derived exactly as in the proof of Theorem 5.1.1. Next, using the kernel expansion and Corollary 5.1.8 yields

$$
\left\|\hat{f}(x(t)) - \hat{f}(x_r(t))\right\|_G \le \sum_{i=1}^{N} \|c_i\|_G\, \bigl|K(x(t), x_i) - K(x_r(t), x_i)\bigr| \le \sum_{i=1}^{N} \|c_i\|_G\, \bigl|S_{d_i(t)}\bigl(r_{\Theta(t),\,d_i(t)}\bigr)\bigr|\, \|e(t)\|_G = \beta(t)\,\|e(t)\|_G.
$$

Application of Lemma 5.0.1 to the ODE $\Delta'(t) = \beta(t)\Delta(t) + \alpha(t)$, $\Delta(0) = E_0$, yields the result.