Optimization methods for mathematical models for geophysical flows

Dissertation approved by the Faculty of Mathematics, Computer Science and Natural Sciences of RWTH Aachen University in fulfillment of the requirements for the academic degree of Doktorin der Naturwissenschaften (Doctor of Natural Sciences)

submitted by

Dipl.-Math. Kateryna Graf

from Kherson, Ukraine

Reviewers: Prof. Dr. Michael Herty, Prof. Dr. Martin Bücker

Date of the oral examination: 25.11.2020

This dissertation is available on the websites of the University Library.


Contents

List of Figures v

List of Tables vii

Nomenclature viii

Introduction 3

1 Methods for model identification and parameter estimation 7

1.0 Introduction to basic probability theory . . . 8

1.1 Methods for identifying models . . . 12

1.1.1 Akaike criterion and related problems . . . 12

1.1.1.1 Akaike information criterion (AIC) . . . 12

1.1.1.2 Related criteria to AIC . . . 14

1.1.1.3 Akaike weights design criterion (AWDC) . . . 16

1.1.2 Method for model identification and parameter estimation (MIPE) . . . 19

1.1.3 Combined method of MIPE and AWDC . . . 22

1.1.3.1 Description of the method . . . 22

1.1.3.2 Test example . . . 24

1.1.3.3 Numerical illustration . . . 25

1.2 Optimal experimental design (OED) . . . 28

1.3 Space mapping (SM) . . . 32

1.3.1 Aggressive space mapping (ASM) . . . 35

1.3.2 Trust region aggressive space mapping (TRASM) . . . 35

1.3.3 Linear example of ASM and TRASM . . . 36

1.3.4 Comparison of ASM and TRASM . . . 38

1.3.5 Explanation of the used methods . . . 39

2 Mathematical modeling and numerical simulation 41

2.1 Mathematical modeling of the geothermal reservoir . . . 41

2.1.1 Continuity and incompressible Navier-Stokes equations . . . 41

2.1.2 Flow in porous media and Darcy’s law . . . 42

2.1.3 Oberbeck-Boussinesq flows . . . 43

2.1.4 Heat transport . . . 44

2.1.5 Species transport . . . 44

2.2 Numerical simulations . . . 46

2.2.1 Finite difference approximation for Boussinesq flows . . . 46

2.2.2 Finite volume methods for advection dominated transport problems . . . 46

2.2.2.1 Stability and convergence results . . . 48

2.2.2.2 Upwind Differencing Scheme . . . 49

2.2.3 Heat and species transport . . . 50


2.2.3.1 Finite differences and upwind-discretization. . . 50

2.2.3.2 Il’in flux blending scheme . . . 50

2.2.3.3 Smolarkiewicz advection scheme . . . 51

2.2.4 Time discretization . . . 51

2.2.4.1 Approximation of general ordinary differential equations . . . 53

2.2.4.2 Runge-Kutta methods . . . 56

3 Applications 63

3.1 Introduction . . . 63

3.1.1 Introduction to geothermal systems . . . 63

3.1.2 SHEMAT . . . 65

3.1.2.1 Physical properties in SHEMAT. . . 66

3.1.2.2 Stationary flows . . . 68

3.2 Optimal experimental design . . . 70

3.3 Case studies . . . 72

3.3.1 Reservoir in Australia . . . 72

3.3.1.1 OED on two dimensional cases . . . 73

3.3.1.2 Space mapping . . . 78

3.3.2 Reservoir in Tuscany, Italy . . . 80

3.3.2.1 Optimal experimental design . . . 83

Summary and outlook 91

Acknowledgements 93

Bibliography 95


List of Figures

1.1 Approach of combining MIPE and AWDC. . . 22

1.2 Influence of the Fisher matrix and the distance $\gamma$ from the parameters $h$ and $t_{step}$, respectively. . . 26

1.3 D-, A-, T- and E-optimal designs calculated for Example 1.2.1. Here $\sigma_1 \in [0.01, 1]$ and $\sigma_2 \in [0.01, 1]$. (All results are shown from the $(-4, -4)$ viewpoint; only $\Psi_A$ is shown from the $(0, 0)$ viewpoint.) . . . 30

1.4 Illustration of the original SM optimization algorithm: (a) a new point $x_f^{(j+1)}$ is obtained using the current mapping approximation $p_{SM}^{(j)}$; (b) the point $x_f^{(j+1)}$ does not satisfy the stopping criterion, and the sets $S_f^{(j)}$ and $S_c^{(j)}$ are augmented by $x_f^{(j+1)}$ and $x_c^{(j+1)}$, respectively; (c) a new mapping $p_{SM}^{(j+1)}$ is estimated and used to obtain a new iterate $x_f^{(j+1)}$. . . . 34

1.5 The result of Example 1.3.1 for the (a) ASM and (b) TRASM methods. . . 36

1.6 The result of Example 1.3.2 for the (a) ASM and (b) TRASM methods. . . 37

1.7 The result of Example 1.3.3 for the (a) ASM and (b) TRASM methods. . . 37

1.8 Comparison of the ASM (red line) and TRASM (blue dashed line) methods for Example 1.3.3. . . 38

2.1 Illustration of finite volume methods [86]. . . 47

2.2 Schematic illustration of the CFL-condition: The CFL-condition is satisfied in the first panel and violated in the second one [86]. . . 49

2.3 Stability domains, gray shaded, for Euler (a) and trapezoidal (b) methods. . 60

3.1 Overview of different types of geothermal systems: shallow (№ 1), medium (№ 2, underground thermal storage) and deep systems (№ 3 and № 4, hydrothermal and petrothermal) [90] . . . 65

3.2 Structure of the Earth . . . 65

3.3 Node numbering convention used in SHEMAT [23] . . . 69

3.4 Geological model of the study area in Australia. . . 73

3.5 There are six spatial layers identifiable by different colors, each characterized by a constant hydraulic permeability $\kappa_1, \kappa_2, \ldots, \kappa_6$. The second layer is the thin blue layer taking a diagonal course in the range from $x = 5.8 \cdot 10^3$ to $x = 7.3 \cdot 10^3$. . . 73

3.6 Temperature distribution of the model shown in Figure 3.5. High temperatures are given in red while lower temperatures are colored blue. . . 74

3.7 A borehole of a certain depth at a fixed location, which gives us the temperature measurement values for the underground system. . . 74

3.8 Result of the computation of the A-, D- and E-optimal design criteria for the model shown in Figure 3.5. In each of Figures 3.8a, 3.8b and 3.8c, the vertical axis (the Y-dimension) is the corresponding value of $\Psi_{i \in \{A, D, E\}}$, and the horizontal axis (the X-dimension) is the position on the model. . . 75



3.9 The optimal borehole location $x = 6875$ computed for a fixed $\kappa_2 = 2 \cdot 10^{-12}$. . . 75

3.10 Values of the OED criterion $\Psi_D(\theta, x)$ plotted versus the horizontal position $x$ for three different values of $\kappa_2$, representing the permeability of rock unit 2 in Figure 3.4. . . 76

3.11 Result of the computation of different optimality criteria. . . 77

3.12 Model with six units . . . 77

3.13 Model with four units . . . 78

3.14 Forward simulation to get the difference between the temperatures of the models from Figure 3.12 and Figure 3.13. . . 78

3.15 Geological model of the study area in Italy. . . 80

3.16 Arrows show the direction and magnitude of Darcy flow, and colors corre- spond to the temperature at 1.25 km below sea level. Black lines indicate the borders between different rock units. . . 82

3.17 Sub-model of the reservoir model shown in Figure 3.15, used for the optimal design study. Color denotes the permeability of the starting model. The hatched area indicates the Burano unit, for which the permeability has to be estimated based on temperature observations in a borehole with a maximum depth of 1 km bsl. Yellow lines indicate existing boreholes in the real field. . . 83

3.18 In this model different colors refer to different permeability values. . . 83

3.19 Plot of the function $\max_\theta \Psi_D(FI(\theta, f))$ for varying experimental conditions $f$. The permeability $\theta = \kappa_j$ in fault zone $j$ is unknown. A borehole location (experimental design $f$) may be chosen freely, but only along the X-axis component. . . 84

3.20 Cut from the 3D model (Figure 3.15) with two fault zones. . . 85

3.21 Two-dimensional cut of the model in Figure 3.15 with one fault zone. . . 86

3.22 Normalized D-optimal design measure $D(\theta, f)$. The dark blue color corresponds to desirable areas; orange and yellow correspond to undesirable borehole positions. . . 88

3.23 Normalized D-optimal design measure D(θ, f) on the reservoir map as shown in Figure 3.16. . . 89


List of Tables

1.1 The result of the calculation for different parameter values $h$; influence of the Fisher matrix $FI$ and influence of $t_{step}$ on $\gamma_{ij}$. . . 26

1.2 Influence of $\sigma$ on the distance between the models. Small values of the parameter $\sigma$ come along with too large distances between the models. Hence, the weights $w_1(\hat\theta, f, x, \sigma^2)$ cannot be determined numerically (indicated as "NaN"), see Table 1.2c. . . 27

1.3 The result of Example 1.1.1 calculated in Matlab. . . 27

1.4 Mean iteration count of the space mapping algorithms for different dimensions $m$. . . 38

3.1 Corresponding values of the rock compressibility $\alpha$ for different rock types [73]. . . 67

3.2 Thermal conductivity at room temperature of different rock types [3]. "mean" denotes the mean value of the thermal conductivity and "sd" denotes its standard deviation. . . 68

3.3 Units and their parameters of the Perth model. Information about the parameters is taken from [2, 33, 61, 127]. . . 72

3.4 ASM algorithm on the reservoir. . . 79

3.5 Computational cost of the result. . . 79

3.6 Simulation parameters of the lithological units in Figure 3.15. The symbol "∗" denotes a parameter that has to be determined. . . 81


Nomenclature

Abbreviations

AIC Akaike information criterion

AWDC Akaike weights design criterion

ASM aggressive space mapping

BIC Bayesian information criterion

bsl below sea level

DIC deviance information criterion

FIC focused information criterion

K-L information Kullback-Leibler information

MIPE method of model identification and parameter estimation

MLE maximum likelihood estimate

NIC network information criterion

OED optimal experimental design

SM space mapping

TIC Takeuchi information criterion

TRASM trust region aggressive space mapping

Symbols

$\emptyset$ empty set

$(S, \mathcal{S}, \mu)$ measure space

$(S, \mathcal{S})$ measurable space

$(\Omega, \mathcal{F}, P)$ probability space

$U(y), V(y)$ score function (components)

$A(x, y, \theta)$ loss function

$\bar{c}$ concentration of particles in a finite volume

$\bar{C}_i$ finite volume cell

$C$ observation operator in the MIPE method

$C$ constant in AIC, defined as $C := \int_{\mathbb{R}} f(x) \ln(f(x))\,dx$

$c$ constant

$c_m$ rock specific heat capacity [J kg$^{-1}$ K$^{-1}$]

$c_w$ fluid specific heat capacity [J kg$^{-1}$ K$^{-1}$]

$\mathrm{Dev}(y, \theta)$ deviance

$D(f, p(\theta))$ discrepancy function

$D$ diffusion coefficient

$D^{obs}$ observed data

$E[X|\mathcal{G}]$ conditional expectation

$E[X]$ mean (expected value)

$E, I$ sets of indices

$f_{coarse}(\cdot)$ coarse function

$F_{coarse}(\cdot)$ coarse model cost function

$F_{fine}(\cdot)$ fine model cost function

$FI(\cdot)$ Fisher information matrix

$f_{fine}(\cdot)$ fine function

$\mathcal{F}$ $\sigma$-field of subsets of $\Omega$

$F_n(x, \omega)$ empirical distribution function

$F_X(x)$ distribution function

$F_{X,Y}(x, y)$ joint distribution

$\mathcal{G}$ sub-$\sigma$-field of $\mathcal{F}$

$Q$ source term

$g$ gravity [m/s$^2$]

$\tilde{\alpha}$ compressibility of the rock phase [Pa$^{-1}$]

$\beta$ parameters

$\tilde{\beta}$ compressibility of the fluid phase [Pa$^{-1}$]

$\Delta$ difference

$\delta, \delta_{full}$ mean squared error

$\epsilon_{ij}$ measurement errors

$\eta, \eta_S$ vectors in the FIC definition

$\phi$ porosity

$\gamma_{ij}$ measure of the distance between models $i$ and $j$

$\kappa$ permeability [m$^2$]

$\kappa, \kappa_S$ components of risk

$\lambda_e$ effective thermal conductivity [W/(m K)]

$\lambda_m$ rock thermal conductivity [W m$^{-1}$ K$^{-1}$]

$\lambda_w$ fluid thermal conductivity [W m$^{-1}$ K$^{-1}$]

$\mu$ measure

$\mu_w$ dynamic viscosity of water [kg/(m s)]

$\mu_{ij}$ optimal value of Equation (1.19)

$\nabla^2 f(x)$ Hessian matrix

$\nabla$ gradient of a function

$\Omega$ nonempty set

$\Psi_{i \in \{A,D,E,L,T\}}(\cdot, \cdot)$ optimal design

$(\rho c)_e$ effective heat capacity [J/(m$^3$ K)]

$(\rho c)_w$ fluid volumetric heat capacity [J/(m$^3$ K)]

$\rho_w$ water density [kg/m$^3$]

$\tau_{ij}$ upper bound for the measurement errors $\epsilon_{ij}$

$\theta$ parameters

$\hat{\theta}$ MLE of $\theta$

$\theta_0$ best approximation parameter value

$\zeta(\theta, \beta)$ parameters of interest in FIC

$H$ radiogenic heat production [W/m$^3$]

$h$ hydraulic head [m]

$H_S$ projection matrix

$h_w$ specific enthalpy of water [J/kg]

$1_r$ indicator function on $r \in R$

$I$ identity matrix

$i, j, k, \ell, m, n, p$ indices

$\hat{J}, \hat{G}$ estimators of the information matrix $J$

$J$ Jacobian matrix

$J$ information matrix

$KL(f, y(\cdot, \theta))$ Kullback-Leibler information function

$K$ number of estimated parameters in AIC

$L(\cdot)$ likelihood function

$L^2(\Omega, \mathcal{F}, P)$ Hilbert space with inner product

$\tilde{m}$ prognosis

$M_i$ hierarchical series of models

$P(A|\mathcal{G})$ conditional probability

$\pi_S$ linear mapping, projection matrix

$P$ probability measure

$p_{SM}$ space-mapping function

$P$ hydraulic pressure [Pa]

$P(z, h_w)$ pore water pressure

$p_{Dev}$ penalty function of the deviance

$q$ conditional density

$r(S)$ risk

$\mathcal{S}$ collection of subsets of $S$, referred to as a $\sigma$-field ($\sigma$-algebra)

$S$ nonempty set

$S_s$ specific storage coefficient [m$^{-1}$]

$|S|$ number of elements in $S$

$T$ temperature [°C]

$u(y, \theta)$ score vector of data $y$ and parameters $\theta$

$\bar{v}_S, \bar{v}$ vectors in the FIC definition

$\mathrm{Var}[X]$ variance

$v$ Darcy flux [m/s]

$W$ source term that describes mass [kg m$^{-3}$ s$^{-1}$]

$w_{ik}$ AIC weight

$X$ random variable

$Y = (Y_1, \ldots, Y_n)^T$ data observations

$Z$ Banach space of observations in MIPE


Introduction

Regenerative energy resources such as wind, water, and sun are the sources of choice that are ecologically compatible and socially acceptable. Regenerative energy is considered to significantly contribute to sustainability because it does not involve combustion of fossil fuel and does not directly influence global warming. As the earthquake and tsunami disaster of Fukushima in 2011 showed us, nuclear power can no longer serve as a reliable source to overcome the rising demand for energy. Also, non-regenerative energy resources such as coal, oil, or gas are limited and may damage the environment massively.

Here, we work with an essential sustainable resource: geothermal energy. It can be used as thermal energy, i.e., for generating electrical energy or for heating houses or swimming pools. Moreover, deep geothermal energy is independent of meteorological conditions and is capable of providing baseload energy supply. Before we can use geothermal energy, it is crucial to find a suitable location for drilling into a geothermal reservoir. To this end, we can use, for example, interpretations of seismic reflection surveys of potential areas that generate a rough model of the subsurface. If the results are promising, the next step is to drill exploratory holes at locations that deliver as much information as possible from the measured data. This data can then be passed to a numerical simulation of the subsurface to understand the characteristics of suitable rocks in more detail. It can also be used to determine the optimal position of the boreholes that will be used in production mode. Geothermal reservoir development includes drilling deep boreholes, which is extremely expensive and risky, so working out suitable borehole locations in advance is of great importance. Finally, given a set of existing boreholes, we demonstrate how a sophisticated numerical technique helps to find a location for an additional exploratory borehole that reduces risk and, ultimately, saves cost.

The following questions are among the most crucial aspects of geothermal engineering:

1. Given the subsurface model with its parameters such as matrix thermal conductivity, heat capacity, permeability, porosity etc., what is the value of a resulting physical quantity of interest, say temperature, at a certain location and over time?

2. From given measurements of the temperature, is it possible to estimate the parameters of a certain rock unit?

3. What are the experimental conditions, say the location of a borehole where measurements of the temperature are taken, under which these parameters can be estimated with minimum uncertainty?

While solutions to the first two questions are given in one way or another in many publications, solutions to the third question are not routinely available; they are the topic of the research described in the remainder of this work. When used in an iterative fashion, these three questions aim at gradually improving the subsurface model toward capturing the underlying geothermal processes more and more realistically.


This work is devoted to mathematical aspects of reservoir development. Its results are introduced in [114, 129–131].

This thesis consists of two parts: theoretical background and practical usage. In Chapter 1, we give a review of basic probability theory. Many sources provide information on this topic; here we collect material from [29, 31, 50, 53, 54, 74, 101, 104, 134].

This review is the starting point for the Akaike information criterion and the criteria based on it. All of these criteria approximate the distance between a fitted model and the "best" model. Since the information criteria measure a distance, we can apply them to model selection by choosing the model with the minimum criterion value. These criteria are analytic approximations to the types of validation errors that cross-validation attempts to estimate through a computational experiment. They can be used if a cross-validation experiment is expensive to compute or if Monte Carlo based methods are not directly applicable.

Information about these criteria and different ways to use them is introduced in [4, 5, 36, 45, 55, 56, 88, 103, 107, 108, 110, 128, 133, 135, 136, 142]. We combine the MIPE method [16] with the Akaike weights design criterion to allow for a better estimation of a model.

A core part of this work is the optimal experimental design (OED) methodology and its application to geothermal reservoirs. General OED methods are presented in [9, 11, 13, 14, 22, 26, 47–49, 52, 67, 99, 100, 102, 117, 118, 138, 141]. Today, despite its high significance and relevance to geothermal engineering, it is not at all common to study OED for problems arising from geothermal energy. However, OED is used to a greater extent in other areas of the geosciences. Work related to OED in geoscientific questions involves groundwater modeling and management [106]. This study addresses the determination of the location of observation points as well as the location of pumping wells and the underlying pumping rate. The overall goal is to analyze simulation models in decision making for groundwater management. In another study [143], the authors aim at finding locations for a network of observation wells to estimate pumping rates from observations of head or drawdown.

Similar applications of OED include the design of optimal observation networks for confined aquifer parameter estimation [89], transport parameter estimation [59], unconfined parameter estimation [6], and dispersion parameters [51].

We solve an OED problem for geothermal reservoirs in western Australia and in the Tuscany region, Italy. The engineering problem consists of determining the location of a borehole such that the uncertainty in estimating the hydraulic permeability from temperature measurements is minimized. The solution procedure, however, is not restricted to this particular OED problem; it is also applicable to a wide range of different OED problems arising from engineering applications. Applying the OED methodology to the geothermal models raises a new challenge: the computation time and numerical cost of the models. The complexity of the problem requires extensive computational efforts and parallelized execution of the computer codes. Parallelization is needed, since a large range of the parameters in question has to be analyzed for the multiple OED problems.

We solve this problem independently. For the given problem, the time to evaluate the parallelized forward model using SHEMAT-Suite is about 1 min on a single shared-memory cluster node with eight cores. A single execution of the automatic differentiation code to evaluate the forward model with its first-order derivatives also takes about 1 min, parallelized on that same node. To obtain the results for the ten different permeability values, the resulting parallel runtime would be on the order of 30 days on a single shared-memory node. However, using a different shared-memory node of the compute cluster for each value of permeability, the overall parallel runtime is reduced to just 3 days with the help of EFCOSS. Information on the software mentioned before is available in [23, 58, 82, 84, 120, 122, 129].

This complexity brings us to another research area: derivative-free optimization methods, introduced in [64, 85]. The space mapping approach is our method of choice. This method is based on surrogate models and allows us to tackle the optimization problem with less expensive models. It was described in [15, 20, 70].

In Chapter 2, we refer to the processes behind the "black-box" tool SHEMAT-Suite. We introduce the Navier-Stokes equations, Darcy's law, and flow processes in porous media, and give a review of flow models of physical systems. The second part of this chapter is a summary of the finite difference approximation for Boussinesq flows and finite volume methods for advection-dominated flows. The chapter also contains a discussion of time discretization methods for partial differential equations. Chapter 2 is based on the sources [23, 83, 86, 92, 98, 115, 146].

In Chapter 3, we introduce the reservoirs and the results for our main engineering task: solving the problem of positioning a borehole according to data from the geosciences.

We refer the interested reader to [1, 2, 25, 27, 28, 33, 37–40, 42, 43, 58, 66, 69, 76, 90, 95, 111, 121, 125–127, 139] for further literature on geoscience.

Chapter 3 reviews different types of geothermal systems. We distinguish them by the available ground heat into the following types:

Shallow systems have low heat. The usage of this energy may still be economical, since these systems are relatively close to the surface.

Medium systems are well-suited for storing thermal energy underground.

Deep systems have the highest temperatures and are often related to volcanic regions. The energy of the hot stream can be used directly if it is reached by a borehole.

Simulating the underground flow pattern is imperative for finding an optimal borehole position.

The SHEMAT software package, which is used for simulating heat and mass transport, is based on mathematical models that we introduce in this chapter. Here, we assume that the possibly multidimensional fluid-dynamic equations are in a steady state. Then, appropriate model parameters must be determined using the tools from Chapter 1. We consider various case studies:

For a reservoir in Australia, we investigate two- and three-dimensional models. Based on experimental data, we determine unknown model parameters to find an optimal location of a borehole.

We compare different optimal design criteria. Furthermore, we discuss a possible drawback of space mapping.

We consider a three-dimensional problem for a reservoir in Italy, which was obtained from seismic data. Since this problem is computationally extremely expensive, we introduce a sub-model of reduced size. In particular, optimal experimental designs are compared for two- and three-dimensional systems.


Methods for model identification and parameter estimation

Chapter 1 provides some theoretical results on mathematical methods for the identification of mathematical models. It also includes information on theoretical and numerical results for the sensitivity analysis of such models as well as on observational errors.

One of our goals is to identify the "correct" model among some given models. As basic tools we consider weight-based methods, namely the Akaike information criterion (AIC), the Akaike weights design criterion (AWDC), and the method of model identification and parameter estimation (MIPE), as well as a combined method of MIPE and AWDC.

On the other hand, we want to determine how parameters influence the model. We can achieve this, e.g., by a sensitivity analysis or by using the Fisher information matrix (see Definition 1.0.11). The discipline of optimal experimental design (OED) summarizes these mathematical tools. We also show the limitations of OED. In particular, when the computation of derivatives is costly, we might use alternative methods. For this case, we do the parameter estimation with derivative-free methods; we choose space mapping, which uses a surrogate model to find the solution of the optimization problem for a given model. Observed data introduce measurement or computation errors into the models. This is the reason why we use the MIPE method and why we try to determine the observation errors.

We organize Chapter 1 as follows:

Section 1.0 We briefly recall results on probability theory based on the literature [29, 53, 54, 74, 104].

Section 1.1.1 We present the Akaike information criterion (Section 1.1.1.1) and the Akaike weights design criterion (Section 1.1.1.3). These criteria are based on the Kullback-Leibler information. In particular, the Akaike information criterion is based on [4, 5, 45] and the Akaike weights design criterion on [45, 107]. Furthermore, related criteria are introduced. We consider the Bayesian information criterion [45, 128], the focused information criterion [55, 88, 142], the network information criterion [110], the Takeuchi information criterion [136, 142] and the deviance information criterion [133].

Section 1.1.2 We distinguish between weight-based and combined models. We present a method for model identification and parameter estimation based on the model’s weight [16].


Section 1.1.3 This section presents the result of combining the methods from Section 1.1.1.3 and Section 1.1.2.

Section 1.2 We introduce different optimal experimental designs such as the D-, E-, L-, T- and A-optimal designs. For more details, we refer the interested reader to [9, 13, 14, 22, 26, 100, 117, 141].

Section 1.3 Non-gradient methods for the model optimization problem, such as aggressive space mapping (Section 1.3.1) and trust region aggressive space mapping (Section 1.3.2), are considered. These methods allow optimizing models that are difficult to evaluate with the help of surrogate models. This section is based on the results in [15, 20, 70].

These methods will be used in practical applications presented in Chapter 3.

1.0 Introduction to basic probability theory

The mathematical discussion of probability theory began with a discussion of dice in the works of Gerolamo Cardano [50], Blaise Pascal and Pierre de Fermat [116]. Cardano was working on a game whose outcome is strongly influenced by some randomizing device (such as cards, dice), and upon which participants may bet anything of monetary value. This game came to us under the name “game of chance”. In the correspondence of Pascal and Fermat, we can see the discussion of a game of chance, in which two players have equal chances of winning each round. In this game, the player who first wins a certain amount of rounds takes the prize. However, what happens if the game is interrupted before some player wins? This problem is also known as the “problem of points”.

Kolmogorov introduced an axiom system for probability theory in 1933, in particular for infinitely many possible outcomes [101]. In this work, he founded modern probability theory by combining the notion of a sample space with measure theory.

We start by recalling the necessary results from probability theory.

Definition 1.0.1. A measure space [29] is a triple $(S, \mathcal{S}, \mu)$, where $S$ is a nonempty set and $\mathcal{S}$ is a collection of subsets of $S$, referred to as a $\sigma$-field ($\sigma$-algebra), which includes the empty set $\emptyset$ and is closed under complements and countable unions. The measure $\mu: \mathcal{S} \to [0, \infty]$ satisfies

• $\mu(\emptyset) = 0$,

• (countable additivity) $\mu\left(\cup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n)$ if $A_1, A_2, \ldots$ is a sequence of pairwise disjoint sets in $\mathcal{S}$.

The pair $(S, \mathcal{S})$ is called a measurable space.

Example 1.0.1. Assume we have some tosses of a balanced coin, where falling on tails is denoted by T and on heads by H. Then the set $S$ is $S = \{H, T\}$ and the smallest possible collection of subsets $\mathcal{S}_1$ is $\mathcal{S}_1 = \{S, \emptyset\}$. In this case the measurable space is $(S, \mathcal{S}_1)$.

Definition 1.0.2. If $(S_i, \mathcal{S}_i)$, $i = 1, 2$, is a pair of measurable spaces, then a function $f: S_1 \to S_2$ is said to be a measurable map [29] if $f^{-1}(B) := \{x \in S_1 : f(x) \in B\} \in \mathcal{S}_1$ for all $B \in \mathcal{S}_2$.

Definition 1.0.3. A probability space [29, 134] is a triple $(\Omega, \mathcal{F}, P)$, where $\Omega$ is a nonempty set, $\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$, and $P$ is a finite measure on the measurable space $(\Omega, \mathcal{F})$ with $P(\Omega) = 1$. The measure $P$ is referred to as a probability measure.


Example 1.0.2. Let a fair coin be tossed. Then the sample space is $\Omega = \{H, T\}$, the $\sigma$-field is $\mathcal{F} = \{\emptyset, \{H\}, \{T\}, \Omega\}$ and the probabilities are $P(\emptyset) = 0$, $P(\{H\}) = P(\{T\}) = 0.5$, $P(\Omega) = 1$.

Definition 1.0.4. A random variable [29] $X$ is a measurable map from a probability space $(\Omega, \mathcal{F}, P)$ into a measurable space $(S, \mathcal{S})$. A continuous random variable [134] takes values from a given real interval.

Example 1.0.3. Let a fair coin be tossed twice and let the random variable $X$ be the number of tails. Then the space $\Omega$ is defined by $\Omega = \{0, 1, 2\}$, the $\sigma$-field $\mathcal{F}_1$ is $\mathcal{F}_1 = \{\emptyset, \Omega\}$ and the probability is $P(X) = \frac{3}{4}$.

Definition 1.0.5. The cumulative distribution function [134] $F_X(x)$ of a random variable $X$ on $(\Omega, \mathcal{F}, P)$ with values in $\mathbb{R}$ is

$$F_X(x) := P\{X \le x\} \quad \text{for } -\infty < x < \infty.$$

Definition 1.0.6. A function $f(x)$ is a probability density [74, 134] of a continuous random variable $X$ and is defined as

$$f(x) := \frac{dF_X(x)}{dx}.$$

Definition 1.0.7. Let $X$ be a random variable; then the mean (expected value) [134] of $X$ is defined as

$$E[X] := \int_{-\infty}^{\infty} x \, dF_X(x).$$

Definition 1.0.8. The variance [29] $\mathrm{Var}[X]$ of a random variable $X \in L^2(\Omega, \mathcal{F}, P)$¹ is defined as the average squared deviation of $X$ from its mean $E[X]$:

$$\mathrm{Var}[X] := E\left[\left(X - E[X]\right)^2\right] = E\left[X^2\right] - E[X]^2.$$

We introduce the following notation:

• the gradient of a function $f(x): \mathbb{R}^n \to \mathbb{R}$, $x \in \mathbb{R}^n$, is $\nabla f(x)$,

• the Jacobian matrix $J$ of a function $\tilde{f}(x): \mathbb{R}^n \to \mathbb{R}^m$ is $J := \left[\frac{\partial \tilde{f}_i(x)}{\partial x_j}\right]$, $i = 1, \ldots, m$, $j = 1, \ldots, n$,

• the Hessian matrix $\nabla^2 f(x)$ of a function $f(x): \mathbb{R}^n \to \mathbb{R}$ is $\nabla^2 f(x) := \left[\frac{\partial^2 f(x)}{\partial x_i \partial x_j}\right]$, $i, j = 1, \ldots, n$.

Definition 1.0.9. For data $Y = (Y_1, \ldots, Y_n)^T$ with a positive density $f(y, \theta)$ and parameters $\theta = (\theta_1, \ldots, \theta_p)^T$, the score vector [56] is a vector in $\mathbb{R}^p$ of the form

$$u(y, \theta) := \nabla_\theta \ln(f(y, \theta)). \qquad (1.1)$$

Definition 1.0.10. For data $Y = (Y_1, \ldots, Y_n)^T$ with a density and parameters $\theta = (\theta_1, \ldots, \theta_p)^T$, the information matrix [56] is the $p \times p$ matrix

$$J(y, \theta) := \nabla_\theta^2 f(y, \theta). \qquad (1.2)$$

¹ The space $L^2(\Omega, \mathcal{F}, P)$ is a Hilbert space with inner product.


Definition 1.0.11. Let a model $f(y, \theta)$, data observations $Y = (Y_1, \ldots, Y_n)^T$, parameters $\theta = (\theta_1, \ldots, \theta_p)^T$, the best approximation parameter value $\theta_0$ and the true density of the data $g(y) = f(y, \theta_0)$, for all $y$, be given. Then the Fisher information matrix of the given model [56] is the $p \times p$ matrix

$$FI(\theta_0) := \int f(y, \theta_0)\, u(y, \theta_0)\, u(y, \theta_0)^T\, dy = -\int f(y, \theta_0)\, J(y, \theta_0)\, dy,$$

where $u(y, \theta)$ and $J(y, \theta)$ are defined in Equations (1.1) and (1.2), respectively.
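As an illustration (ours, not part of the thesis), the following minimal Python sketch approximates the Fisher information of the Gaussian model $N(\theta, 1)$ by Monte Carlo, using the fact that its score is $u(y, \theta) = y - \theta$; the exact value is 1.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 2.0                     # "true" parameter of the model N(theta, 1)

# Score of N(theta, 1): u(y, theta) = d/dtheta ln f(y, theta) = y - theta
y = rng.normal(theta0, 1.0, size=100_000)
fi_mc = np.mean((y - theta0) ** 2)   # Monte Carlo estimate of E[u^2]

print(fi_mc)   # close to the exact Fisher information FI(theta0) = 1
```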

Definition 1.0.12. Let $X$ be a real-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$, $F_X(x)$ the distribution of $X$, and $\mathcal{G}$ a sub-$\sigma$-field of $\mathcal{F}$. The conditional expectation [74] of $X$ given $\mathcal{G}$ is the function

$$E[X|\mathcal{G}] := \int x\, F_X(dx|\mathcal{G})(\omega),$$

where it is understood that, evaluated at any particular value of $\omega$, the conditional expectation can be any value of $\mathbb{R}$.

Definition 1.0.13. The conditional probability [74] of $A$ given $\mathcal{G}$, denoted by $P(A|\mathcal{G})$, is

$$P(A|\mathcal{G}) := E[1_A|\mathcal{G}], \quad A \in \mathcal{F}.$$

Definition 1.0.14. Let $X$ be an $(S, \mathcal{S})$-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$, $\mu$ a $\sigma$-finite measure on $(S, \mathcal{S})$, and $\mathcal{G}$ a sub-$\sigma$-field of $\mathcal{F}$. A nonnegative measurable function $q$ on $(\Omega \times S, \mathcal{F} \times \mathcal{S})$ is called a conditional density [74] of $X$ with respect to $\mu$ given $\mathcal{G}$ if the map from $\Omega$ to $\mathbb{R}$ defined by

$$B \mapsto \int_B q(\cdot, x)\, \mu(dx)$$

is a conditional distribution of $X$ given $\mathcal{G}$.

Definition 1.0.15. Let $X = (X_1, \ldots, X_n)$ be real-valued random variables on $(\Omega, \mathcal{F}, P)$. Then the joint cumulative distribution [134] of the $X_i$, $i = 1, \ldots, n$, is

$$F_X(x) := P\{X_i \le x_i,\ i = 1, \ldots, n\}, \quad x_i \in \mathbb{R}.$$

Definition 1.0.16. For random variables $X_1, \ldots, X_n$, the empirical distribution function [31] is

$$F_n(x, \omega) := \frac{1}{n} \sum_{i=1}^{n} 1_{(-\infty, x]}(X_i(\omega)).$$
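A short Python sketch of this definition (our own illustration; the sample and evaluation points are arbitrary), evaluating $F_n$ for a standard normal sample:

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(size=1000)       # X_1, ..., X_n

def ecdf(sample, x):
    # F_n(x) = (1/n) * #{ i : X_i <= x }
    return np.mean(sample <= x)

for x in (-1.0, 0.0, 1.0):
    print(x, ecdf(sample, x))        # approaches the normal CDF as n grows
```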

Definition 1.0.17. The likelihood $L(\cdot)$ [53, 77] is the density of the observed data. The likelihood may depend on a parameter $\theta$ with $\theta \in \Theta$, where $\Theta$ is the parameter space. The likelihood $L(\theta)$ of $\theta$ is the model density of the observed data $D^{obs}$, viewed as a function of $\theta$ given $D^{obs}$:

$$L(\theta) := \prod_{i=1}^{n} f(\theta, D_i^{obs}),$$

where $f(\theta, D_i^{obs})$ is the probability density of a joint distribution.


Example 1.0.4. Assume that a symmetric coin is tossed twice and lands on tails twice, so the observation is $D^{obs} = $ "TT". The probability of $X = $ "landing on tails" is $P(X) = 0.5$. According to $D^{obs}$, the probability of getting tails twice is $P^2(X) = 0.25$. The likelihood of $P(X) = 0.5$ given the observation $D^{obs} = $ "TT" is

$$L(\cdot) = L(P(X) = 0.5\,|\,D^{obs}) = P(D^{obs}\,|\,P(X) = 0.5) = 0.25.$$

Definition 1.0.18. The maximum likelihood estimate (MLE) [53] of $\theta$ is a value $\hat\theta$ that maximizes the log-likelihood,

$$\hat\theta \in \arg\max_{\theta \in \Theta} \ln(L(\theta)),$$

over the parameter space $\Theta$.
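A hedged numerical sketch of this definition for the coin of Example 1.0.4: with $D^{obs} = $ "TT", the binomial log-likelihood is maximized numerically; since only tails were observed, the MLE is pushed to the boundary $p = 1$. (This snippet is our own illustration, not from the thesis.)

```python
import numpy as np
from scipy.optimize import minimize_scalar

n_tails, n_tosses = 2, 2             # observed data D_obs = "TT"

def neg_log_likelihood(p):
    # binomial log-likelihood of n_tails out of n_tosses, up to a constant
    return -(n_tails * np.log(p) + (n_tosses - n_tails) * np.log(1.0 - p))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6),
                      method="bounded")
print(res.x)   # MLE near 1: observing only tails pushes p-hat to the boundary
```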

The observed score function is the derivative with respect to $\theta$ of the log-likelihood (assuming that the likelihood is a differentiable function of $\theta$):

$$u(\cdot, \theta) = \nabla_\theta \ln(L(\theta)). \qquad (1.3)$$

A simple computation shows: if the likelihood is a sufficiently smooth function of $\theta$, a maximum likelihood estimate $\hat\theta$ is a solution of the equation

$$0 = u(\cdot, \theta),$$

where $u(\cdot, \theta)$ is defined by (1.3). The observed information function is minus the second derivative of the log-likelihood (assuming that the likelihood is a twice differentiable function of $\theta$):

$$J(\cdot, \theta) = -J_\theta\{u(\cdot, \theta)\} = -\nabla_\theta^2 \ln(L(\theta)). \qquad (1.4)$$

We assume that the observed information $J(\cdot, \hat\theta)$, defined by (1.4) and evaluated at $\hat\theta$, is positive definite. If the model is correct, then the parameter $\theta_0$ that identifies the distribution in the model that actually generated the observed data $D^{obs}$ satisfies

$$0 = E[u(\cdot, \theta_0)]. \qquad (1.5)$$

We assume that the likelihood $L(\theta)$ has the density $f(y, \theta)$ and that the random variable $Y$ has the density $f(y, \theta_0)$. Equation (1.5) can then be verified as

$$E[u(\cdot, \theta_0)] = E[\nabla_\theta \ln(L(\theta))]\big|_{\theta=\theta_0} = \int \nabla_\theta \ln(f(y, \theta))\big|_{\theta=\theta_0}\, f(y, \theta_0)\, dy = \int \frac{1}{f(y, \theta_0)}\, f(y, \theta_0)\, \nabla_\theta f(y, \theta)\big|_{\theta=\theta_0}\, dy = \nabla_\theta \int f(y, \theta)\, dy\,\Big|_{\theta=\theta_0} = \nabla_\theta 1 = 0.$$
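The identity (1.5) can also be checked by simulation; a minimal sketch (ours) for the Gaussian model $N(\theta_0, 1)$, whose score is $y - \theta_0$:

```python
import numpy as np

rng = np.random.default_rng(2)
theta0 = -0.5

# Score of N(theta, 1) is u(y, theta) = y - theta; check E[u(., theta0)] = 0
y = rng.normal(theta0, 1.0, size=200_000)
print(np.mean(y - theta0))           # approximately 0, as Equation (1.5) states
```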


1.1 Methods for identifying models

“All models are wrong but some are useful”

George E. P. Box (1919–2013)

Model identification starts with a system of parameter-dependent models. It is motivated by the need to find the "true" model along with the corresponding parameters. Here, the models are (partial) differential equations with unknown parameters. To distinguish between competing models, we need appropriate experimental tests. For each experiment, we prescribe a system input and observe a part of the system state. Identifying the correct model and the corresponding parameters is crucial in many applications. If the model is known, the problem of identifying parameters in the model has been studied extensively in the literature: for problems with a finite number of parameters [8, 24], and for problems with distributed parameters [71, 107, 124, 137, 145].

Here, we present different approaches for model identification. These approaches are based on distances between the models and their weights.

In [16], an approach for model identification and parameter estimation is presented. This method is based on experiments and model distances. For each model pair, the authors conduct an experiment with computed inputs and observations. They use the results of the experiment and the previously computed model distances to determine the true model. Next, the parameters in the identified model are estimated.

The approach presented in [12] assumes that a guess of the true model is available. Then the distances between all other models and the "true" model are computed, and the next experiment is designed to maximize the sum of these distances. If necessary, the guess of the true model is updated based on the new experiment and the process is repeated.

The approaches in [35, 47, 75, 91, 107] differ from each other and from [12] in the measure of model distance. For a discussion of the various proposed discrimination criteria, we refer to [21, 107]. For example, the model distances proposed in [107] build on the so-called Akaike information criterion (AIC) [4]. The AIC measures the loss of information when the true model is replaced by an approximating model [46]. From the AIC, the authors of [107] derive weights for each of the models in the system under consideration and compute a design by minimizing the sum of the Akaike weights. The Akaike weights, however, depend on current parameter estimates and on statistical information about the measurement errors, which are updated based on experiments using the new design. The result is a sequential procedure. In each step, designs are computed by minimizing the sum of Akaike weights. Then the experiments are conducted, the parameter estimates are updated and the set of candidate models is updated.

1.1.1 Akaike criterion and related problems

1.1.1.1 Akaike information criterion (AIC)

Hirotugu Akaike published his work on statistical model identification in the 1970s [4]. In his work, Akaike introduced classical maximum likelihood estimation (MLE) (see Definition 1.0.18) as a procedure of statistical model estimation and the minimum information theoretical criterion (named the Akaike information criterion (AIC)). MLE is used, for example, for choosing the highest dimension of a polynomial regression or of a multi-step Markov chain. AIC is based on the Kullback-Leibler information [36, 103], which can briefly be explained as the information loss when a model $y$ that depends on the parameters $\theta$ is used to approximate observations $f$, or as a "distance" between the full reality $f$ and a model $y$. Akaike suggested the AIC for solving the problem of selecting between different models with different numbers of parameters.

Definition 1.1.1. Let $f: \mathbb{R} \to \mathbb{R}$, $y: \mathbb{R} \to \mathbb{R}$ be given smooth non-negative functions. Then the Kullback-Leibler information (K-L information) $KL(f, y(\cdot|\theta))$ is defined as

$$KL(f, y(\cdot|\theta)) := \int_{\mathbb{R}} f(x) \ln\left(\frac{f(x)}{y(x|\theta)}\right) dx, \qquad (1.6)$$

where $y(\cdot|\theta)$ is a parametric family of density functions.

We expand Equation (1.6):

$$KL(f, y(\cdot|\theta)) = \int_{\mathbb{R}} f(x) \ln\left(\frac{f(x)}{y(x|\theta)}\right) dx = \int_{\mathbb{R}} f(x) \ln(f(x))\, dx - \int_{\mathbb{R}} f(x) \ln(y(x|\theta))\, dx = E_f[\ln(f(x))] - E_f[\ln(y(\cdot|\theta))]. \qquad (1.7)$$

In Equation (1.7), the expectations (see Definition 1.0.7) are taken with respect to the observations $f$, and $E_f[\ln(f(x))]$ depends neither on the measurement $\theta$ nor on the model $y(\cdot|\theta)$. Therefore, we define the constant

$$C := \int_{\mathbb{R}} f(x) \ln(f(x))\, dx. \qquad (1.8)$$

From Equations (1.7) and (1.8) it follows that

$$KL(f, y(\cdot|\theta)) = C - E_f[\ln(y(\cdot|\theta))].$$
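As a numerical illustration (ours, not from the thesis), the integral (1.6) can be evaluated by quadrature for two Gaussian densities and compared against the well-known closed form for Gaussians:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu1, s1 = 0.0, 1.0    # "truth" f
mu2, s2 = 1.0, 2.0    # approximating model y(.|theta)

def integrand(x):
    fx = norm.pdf(x, mu1, s1)
    yx = norm.pdf(x, mu2, s2)
    return fx * np.log(fx / yx)

kl_quad, _ = quad(integrand, -20, 20)                      # Equation (1.6)
kl_exact = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5
print(kl_quad, kl_exact)                                   # both about 0.443
```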

Akaike built a relationship between the K-L information $KL(f, y(\cdot, \theta))$ and likelihood theory through the estimation of $E_{D^{obs}} E_f[\ln(y(\cdot|\hat\theta(D^{obs})))]$, where the inner part of $E_f[\ln(y(\cdot|\theta))]$ is replaced by the MLE of $\theta$, based on the model $y$ and the data $D^{obs}$. He found that the maximized log-likelihood value is a biased estimate of $E_{D^{obs}} E_f[\ln(y(\cdot|\hat\theta(D^{obs})))]$, with a bias approximately equal to the number of estimated parameters $K$ in the model $y(\cdot, \theta)$. The unbiased estimator of $E_{D^{obs}} E_f[\ln(y(\cdot|\hat\theta(D^{obs})))]$ is $\ln(L(y(\hat\theta)|D^{obs})) - K$. Akaike obtained the relation

$$\ln(L(y(\hat\theta)|D^{obs})) - K = C - E_f[\ln(y(\hat\theta))],$$

where $L(y(\hat\theta)|D^{obs})$ is the likelihood function (see Definition 1.0.17) of the model $y(\cdot, \theta)$ with parameter values $\hat\theta$ and measurements $D^{obs}$. The approximating model $y(\hat\theta)$ depends on the estimated parameter $\hat\theta$, and $D^{obs}$ is the measurement data for that model. According to this, Akaike defined his information criterion by

AIC := (−2) ln(maximum likelihood) + 2 (number of independently adjusted parameters within the model).

Definition 1.1.2. We define the AIC as

$$\mathrm{AIC} := -2\ln(L(y(\hat\theta)|D^{obs})) + 2K. \qquad (1.9)$$

A low AIC value indicates that the model is the "best" one, i.e., the one with the smallest loss of information; we denote the minimum AIC value among the candidates by $\mathrm{AIC}_c$. In some of the literature, for example in [56], $\mathrm{AIC}_c$ denotes instead a corrected AIC, obtained with a calibrated penalty factor replacing $2K$ in Equation (1.9).
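A minimal sketch of Equation (1.9) in Python; the log-likelihood values and parameter counts below are made up purely for illustration:

```python
import numpy as np

# Hypothetical maximized log-likelihoods and parameter counts of 3 models
log_lik = np.array([-120.3, -118.9, -118.7])
K = np.array([2, 3, 5])

aic = -2 * log_lik + 2 * K       # Equation (1.9)
delta = aic - aic.min()          # Delta_i = AIC_i - AIC_c, cf. Equation (1.11)
print(aic, delta, np.argmin(aic))   # the model with Delta_i = 0 is selected
```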


1.1.1.2 Related criteria to AIC

There are related criteria based on the K-L information, for example the Bayesian information criterion [45], the focused information criterion [55, 88], the Takeuchi information criterion [136], the deviance information criterion [133], the network information criterion [110] and others.

Definition 1.1.3. According to Schwarz's definition [128], the Bayesian information criterion (BIC) is

$$\mathrm{BIC} := -2\ln(L(y(\hat\theta)|D^{obs})) + K\ln(n).$$

It is thus a penalized log-likelihood function in which the penalty equals the logarithm of the sample size $n$ times the number of estimated parameters $K$ in the model. It is worth mentioning that BIC in the large-sample limit will converge to the MLE (Definition 1.0.18) [53].
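Continuing the sketch above with the same hypothetical values, BIC replaces the penalty $2K$ by $K \ln(n)$, so for $\ln(n) > 2$ it tends to select more parsimonious models than AIC (the sample size $n = 50$ is an assumption for illustration):

```python
import numpy as np

log_lik = np.array([-120.3, -118.9, -118.7])   # same hypothetical values
K = np.array([2, 3, 5])
n = 50                                          # assumed sample size

aic = -2 * log_lik + 2 * K
bic = -2 * log_lik + K * np.log(n)             # Definition 1.1.3
print(np.argmin(aic), np.argmin(bic))          # BIC picks a smaller model here
```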

Definition 1.1.4. For given parameters $\theta \in \mathbb{R}^m$, $\beta \in \mathbb{R}^q$, we consider independent data $Y_1, \ldots, Y_n$ with a density of the form $f(y; \theta, \beta)$. The true model parameters are denoted by $\theta_0 \in \mathbb{R}^m$, $\beta_0 \in \mathbb{R}^q$, with the model density $f_0 := f(y; \theta_0, \beta_0 + \delta/\sqrt{n})$, where $\delta \in \mathbb{R}^q$ expresses the degrees and directions of the model change and $n$ is the sample size.

We define the parameters of interest

$$\zeta(\theta, \beta) := \zeta[f(\cdot; \theta, \beta)], \quad \zeta_0 := \zeta[f_0].$$

For any selected index set $S \subseteq \{1, \ldots, q\}$ and some $\tilde\beta_{0,j} \in \mathbb{R}$, we set $\beta_j := \beta_{0,j}$ for all $j \in S$ and $\beta_j := \tilde\beta_{0,j}$ for all $j \in S^C$, where $S^C$ is the complement of $S$.

The parameter estimate, based on the MLE $(\hat\theta_S, \hat\beta_S)$, reads $\hat\zeta_S := \zeta(f(\cdot; \hat\theta_S, \hat\beta_S, \beta_{0,S^C}))$, $S^C \subset \{1, \ldots, q\}$.

We assume that $\zeta(\theta, \beta)$ is smooth in a neighborhood of $(\theta_0, \beta_0)$ and define the information matrix (see Definition 1.0.10)

$$J_{full} := \begin{pmatrix} J_{00} & J_{01} \\ J_{10} & J_{11} \end{pmatrix}$$

at the model $f_0$ of the score function

$$\begin{pmatrix} U(y) \\ V(y) \end{pmatrix} = \begin{pmatrix} \nabla_\theta \ln(f(y; \theta_0, \beta_0)) \\ \nabla_\beta \ln(f(y; \theta_0, \beta_0)) \end{pmatrix}.$$

The linear mapping $\pi_S : \bar{v} \mapsto \bar{v}_S$ is given by the projection matrix $\pi_S \in \mathbb{R}^{|S| \times q}$, where $|S|$ is the number of elements in $S$. Then the risk $r(S)$ is

$$r(S) := (\nabla_\theta \zeta(\theta, \beta))^T J_{00}^{-1} (\nabla_\theta \zeta(\theta, \beta)) + \eta^T (I - \kappa^{1/2} H_S \kappa^{1/2})\, \delta \delta^T (I - \kappa^{1/2} H_S \kappa^{1/2})\, \eta + \eta^T \kappa^{1/2} H_S \kappa^{1/2} \eta.$$


The components $\eta$, $\delta$, $\kappa$ and the projection matrix $H_S \in \mathbb{R}^{q \times q}$ are defined in (1.10) as

$$\pi_{full} := I, \quad \pi_S \bar{v} := \bar{v}_S, \quad \kappa := (J_{11} - J_{10} J_{00}^{-1} J_{01})^{-1}, \quad \kappa_S := (\pi_S \kappa^{-1} \pi_S^T)^{-1}, \quad H_S := \kappa^{-1/2} \pi_S^T \kappa_S \pi_S \kappa^{-1/2}, \quad \eta := J_{10} J_{00}^{-1} \nabla_\theta \zeta(\theta, \beta) - \nabla_\beta \zeta(\theta, \beta), \quad \eta_S := \pi_S \eta, \quad \delta_{full} := \sqrt{n}\,(\beta_{full} - \beta_0). \qquad (1.10)$$

Definition 1.1.5. The focused information criterion (FIC) (see [55, 56, 88]) is an unbiased estimator for the risk $r(S)$, up to an additive constant not depending on $S$, and is given by

$$\mathrm{FIC} := \eta^T (I - \kappa^{1/2} H_S \kappa^{-1/2})\, \delta_{full} \delta_{full}^T\, (I - \kappa^{1/2} H_S \kappa^{-1/2})^T \eta + 2\,\eta_S^T \kappa_S \eta_S,$$

where $H_S$, $\kappa$, $\kappa_S$, $\delta_{full}$, $\eta_S$ are defined in (1.10).

Definition 1.1.6. The Takeuchi information criterion (TIC) (see [56, 136]) is

$$\mathrm{TIC} := 2\ln(L(y(\hat\theta)|D^{obs})) - 2\,\mathrm{Tr}(\hat{J}^{-1}\hat{G}),$$

where

$$\hat{J} = -\frac{1}{n}\,\nabla_\theta^2 \ln(L(y(\hat\theta)|D^{obs})), \qquad \hat{G} = \frac{1}{n}\sum_{i=1}^{n} \left(\nabla_\theta \ln(y(x_i, \hat\theta))\right)\left(\nabla_\theta \ln(y(x_i, \hat\theta))\right)^T.$$
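A sketch of the TIC for the scalar-parameter model $N(\theta, 1)$, where $\hat{J} = 1$ and $\hat{G}$ reduces to the mean squared score (assumptions of this illustration: known unit variance, simulated data; not an example from the thesis):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
y = rng.normal(0.3, 1.0, size=500)    # data; model family N(theta, 1)

theta_hat = y.mean()                   # MLE of theta
log_lik = norm.logpdf(y, theta_hat, 1.0).sum()

# For N(theta, 1): J-hat = -(1/n) * d^2/dtheta^2 ln L = 1, and
# G-hat = (1/n) * sum over i of the squared scores (y_i - theta_hat)^2
J_hat = 1.0
G_hat = np.mean((y - theta_hat) ** 2)

tic = 2 * log_lik - 2 * (G_hat / J_hat)    # Definition 1.1.6, scalar case
print(tic)
```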

We consider a stochastic system which receives an input vector $x \in \mathbb{R}^n$ with probability distribution $F_X(x)$ (Definition 1.0.5) and density $q(x)$ (Definition 1.0.6). An output vector $y \in \mathbb{R}^\ell$ is described by a conditional probability $q(y|x)$ (Definition 1.0.13).

Since the true system is unknown, we consider the family of parameterized conditional distributions $\{p(y|x, \theta),\ \theta \in \mathbb{R}^m\}$. A faithful model should allow the realization $q(y|x) = p(y|x, \theta)$.

The empirical distribution of a training set yields a reasonable approximation of the joint density $F_{X,Y}(x, y) = q(y|x)\, q(x)$. This motivates identifying the corresponding output vector $y^* \in \mathbb{R}^\ell$ as the correct one, which is in fact unknown.

To evaluate estimates, we introduce a loss function $A$ and a loss function with regularization term $A_R$. The loss is specified when the input $x$ is known instead of the true solution $y^*$. Possible choices without regularization would be:

• the mean squared error

$$A(x, y, \theta) = \frac{1}{2} \int \|y^* - y\|^2\, q(y|x, \theta)\, dy,$$

which in the deterministic case reads

$$A(x, y, \theta) = \frac{1}{2}\, \|y^* - y\|^2,$$

• the log-likelihood ratio

$$A(x, y, \theta) = \ln\left(\frac{P(y|x)}{q(y|x, \theta)}\right).$$

More generally, a regularization term $\xi(\theta)$ [108] can be added to make the parameter $\theta$ stay in an appropriate region. Thus, we define $A_R = A + \xi(\theta)$. The loss function is used to define a discrepancy function that describes the expected loss between two distributions. In the following, we assume this setting.

Definition 1.1.7. The discrepancy function of a target $q$ and a machine $p(\theta)$ is

$$D(q, p(\theta)) := \int A_R(x, y, \theta)\, q(x)\, q(y|x)\, dx\, dy.$$

Definition 1.1.8. Let $M_i = \{q_i(y|x, \theta_i),\ \theta_i \in \mathbb{R}^{m_i}\}$ be a hierarchical series of models with $M_i \subset M_j$ for $i < j$, let $\tilde\theta_i$ be the parameter of the model $M_i$ obtained by learning, and let $t$ be the number of observed examples in the training set. Then the network information criterion (NIC) (see [110]) is

$$\mathrm{NIC}(q_i) := D(p, q_i(\tilde\theta_i)) + \frac{1}{t}\,\mathrm{Tr}\left(G_i(\tilde\theta_i)\, Q_i(\tilde\theta_i)^{-1}\right)$$

with $G_i(\tilde\theta_i) = \mathrm{Var}_q\left[\nabla_\theta A_R(x, y, \tilde\theta_i)\right]$ and $Q_i(\tilde\theta_i) = E_q\left[\nabla_\theta^2 A_R(x, y, \tilde\theta_i)\right]$.

The function $D(p, q_i(\tilde\theta_i))$ is defined in Definition 1.1.7 and $A_R(x, y, \tilde\theta_i)$ denotes the loss function with regularization term.

Definition 1.1.9. For the parameters $\theta \in \mathbb{R}^m$ and data $y$ of the given model $f(y, \theta)$, we define the deviance $\mathrm{Dev}(y, \theta)$ [56] as the bivariate function

$$\mathrm{Dev}(y, \theta) := -2 \ln f(y, \theta).$$

The properties of the deviance function are:

• $\mathrm{Dev}(y, y) = 0$ for all $y$,

• $\mathrm{Dev}(y_1, y_2) > 0$ for all $y_1 \ne y_2$.

Definition 1.1.10. For the parameters $\theta \in \mathbb{R}^m$ and the estimate $\bar\theta = E[\theta|D^{obs}]$, we define the penalty function

$$p_{Dev} := E[(\mathrm{Dev}(y, \theta) - \mathrm{Dev}(y, \bar\theta))\,|\,D^{obs}]$$

and the deviance information criterion (DIC) [56] as

$$\mathrm{DIC} := \mathrm{Dev}(y, \bar\theta) + 2 p_{Dev}.$$

1.1.1.3 Akaike weights design criterion (AWDC)

We consider the following problem. We have $m \in \mathbb{N}$ different models, and the AIC values are large. Assume that $X_i$ is a normally distributed random variable with $E[X_i] = 0$ and $\mathrm{Var}(X_i) = \sigma_i^2$. Since the individual AIC values contain the constant $C$, it is much more effective to calculate the difference

$$\Delta_i := \mathrm{AIC}_i - \mathrm{AIC}_c, \qquad (1.11)$$

where $\mathrm{AIC}_c$ is the minimum of the $m$ different $\mathrm{AIC}_i$, $i = 1, 2, \ldots, m$, values. Due to Equation (1.11), we have the inequality $\Delta_i \ge 0$ for all models. A best model $i$ satisfies the equality $\Delta_i = 0$. The interpretation is: the larger the difference $\Delta_i$, the less likely it is that the selected model $i$ will be chosen as the best approximating model from the set of "candidate models". With the small transformation $\Delta_i \mapsto \exp(-0.5 \Delta_i)$, the evidence ratio is

$$\frac{L(y_i(\theta)|D^{obs})}{L(y_j(\theta)|D^{obs})}.$$

It shows the relative likelihood of model $i$ with respect to model $j$ and can be interpreted as follows: if the ratio is large, then the model $y_j$ is poorer than the model $y_i$, based on the data $d$. The likelihood $L(y_j(\hat\theta)|D^{obs})$ for the $j$-th model can be written as

$$L(y_j(\hat\theta)|D^{obs}) = \prod_{i=1}^{n} \left(\frac{1}{2\pi\sigma_i^2}\right)^{1/2} \exp\left(-\frac{(x_i - \tilde{m}_j(\theta, f))^2}{2\sigma_i^2}\right). \qquad (1.12)$$

Based on this ratio, we build the Akaike weights $w_i$ as

$$w_i(\hat\theta, f, x, \sigma^2) = \frac{\exp(-0.5 \Delta_i)}{\sum_{j=1}^{m} \exp(-0.5 \Delta_j)}. \qquad (1.13)$$

The weights satisfy $0 \le w_i(\hat\theta, f, x, \sigma^2) \le 1$ and $\sum_{i=1}^{m} w_i = 1$, where $m$ is the number of model candidates and $\hat\theta \in \Theta$ is the parameter set which should be estimated for all models. The weights state the probability of model $i$ being the correct one. The difference between model $i$ and the "best" model $c$ is $\Delta_i = \mathrm{AIC}_i - \mathrm{AIC}_c$.
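A minimal sketch of the weights (1.13); the AIC values are the hypothetical ones used earlier, not results from the thesis:

```python
import numpy as np

aic = np.array([244.6, 243.8, 247.4])   # hypothetical AIC values, m = 3 models
delta = aic - aic.min()                 # Equation (1.11)

w = np.exp(-0.5 * delta)
w /= w.sum()                            # Equation (1.13); the weights sum to 1
print(w)                                # probability of each model being correct
```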

For the best model $c$ and the likelihood $L(y_j(\hat\theta)|D^{obs})$, Equation (1.13) yields the weight

$$w_c(\hat\theta, f, x, \sigma^2) = \left(\sum_{j=1}^{m} \prod_{i=1}^{n} \exp\left(-\frac{(\tilde{m}_{c,i}(\theta, f) - \tilde{m}_{j,i}(\theta, f))^2}{2\sigma_i^2} + K_c + K_j\right)\right)^{-1}, \qquad (1.14)$$

where $\tilde{m}_{j,i}(\theta, f)$ is the prognosis of model $j$ for the $i$-th measurement, $\sigma_i^2$ is the variance of the measured values, and $K_j$ and $K_c$ are the numbers of estimated parameters in models $j$ and $c$, respectively.

We can repeat the previous computation for the difference $\Delta_{i,k} = \mathrm{AIC}_i - \mathrm{AIC}_k$ if the best model $c$ is unknown. Then the weight $w_{ik}$ is built through

$$w_{ik}(\hat\theta, f, x, \sigma^2) = \frac{\exp(-0.5 \Delta_{i,k})}{\sum_{j=1}^{m} \sum_{\ell=1}^{m} \exp(-0.5 \Delta_{j,\ell})}. \qquad (1.15)$$

We plug Equations (1.9) and (1.12) into Equation (1.15) to obtain the weight

$$w_{ik}(\hat\theta, f, x, \sigma^2) = \frac{\exp(K_k - K_i) \prod_{s=1}^{n} \exp\left(\frac{2 x_s (\tilde{m}_i(\theta, f) - \tilde{m}_k(\theta, f)) - (\tilde{m}_i^2(\theta, f) - \tilde{m}_k^2(\theta, f))}{2\sigma_s^2}\right)}{\sum_{j=1}^{m} \sum_{\ell=1}^{m} \exp(K_\ell - K_j) \prod_{s=1}^{n} \exp\left(\frac{2 x_s (\tilde{m}_j(\theta, f) - \tilde{m}_\ell(\theta, f)) - (\tilde{m}_j^2(\theta, f) - \tilde{m}_\ell^2(\theta, f))}{2\sigma_s^2}\right)}.$$

Then we determine a best model by

$$i^* \in \arg\min_{i} \sum_{k=1}^{m} w_{ik}(\hat\theta, f, x, \sigma^2).$$

We take a look at the following example.

Example 1.1.1. The model (1.16) is given by

$$y_i = A_i^{-1}(\theta) f, \quad A_i(\theta) = I + \theta e_i e_i^T, \quad f_\ell > 0, \quad f_\ell \in F = \left\{f \in \mathbb{R}^n \mid \|f\|_2^2 = 1\right\}, \qquad (1.16)$$

where $e_i$ is a unit vector.

Assume that $i = 1$ is the index of the best model, $\theta = 1$ is the optimal parameter of this model, and the observations are $f_\ell = \frac{1}{2}(e_i + e_j)$. Then we have $A_1(1) = \mathrm{diag}(2, 1, \ldots, 1)$ and

$$A_1(1)^{-1} f_\ell = \begin{cases} \frac{1}{2}(e_i + e_j) & \text{if } i \ne 1 \text{ and } j \ne 1, \\ \frac{1}{4} e_i + \frac{1}{2} e_j & \text{if } i = 1 \text{ and } j \ne 1, \\ \frac{1}{2} e_i + \frac{1}{4} e_j & \text{if } i \ne 1 \text{ and } j = 1. \end{cases}$$

It follows that $y = A_i^{-1}(\theta) f_\ell = \frac{1}{2(1+\theta)} e_i + \frac{1}{2} e_j$. For the model (1.16), the prognosis is $\tilde{m}_j(\hat\theta, f) = y_j(\hat\theta, f)$ with parameter $\hat\theta \in \arg\min_{\theta \in \Theta} \{y_i(\theta, f), y_j(\theta, f)\}$. According to this assumption, the weight (1.14) is calculated through the model distances $y_1(\hat\theta, f) - y_j(\hat\theta, f)$:

$$w_1(\hat\theta, f, x, \sigma^2) = \left(\sum_{j=1}^{m} \prod_{i=1}^{n} \exp\left(-\frac{(y_1(\hat\theta, f) - y_j(\hat\theta, f))^2}{2\sigma_i^2} + K_1 + K_j\right)\right)^{-1}. \qquad (1.17)$$

The numerical results for Example 1.1.1, calculated in Matlab, are presented in Table 1.2 and Table 1.3.
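A rough Python sketch of Example 1.1.1 and the weight (1.17), under simplifying assumptions (constant noise variance, dimensions and variance chosen for illustration, and the K-terms applied once per model pair instead of per measurement); it illustrates the computation and does not reproduce the Matlab results of Tables 1.2 and 1.3:

```python
import numpy as np

n, m = 5, 5                    # state dimension and number of candidate models
sigma2 = 0.1                   # assumed constant measurement variance
K = np.ones(m)                 # one estimated parameter per model

i, j = 0, 1
f = 0.5 * (np.eye(n)[i] + np.eye(n)[j])    # observation f = (e_i + e_j)/2

def y(model, theta, f):
    # y_model = A_model(theta)^{-1} f with A_model(theta) = I + theta e e^T
    e = np.eye(n)[model]
    return np.linalg.solve(np.eye(n) + theta * np.outer(e, e), f)

theta_hat = 1.0                # optimal parameter of the best model (model 1)
y1 = y(0, theta_hat, f)

# Weight (1.17) of model 1, built from the model distances y_1 - y_j
terms = [np.exp(-np.sum((y1 - y(jj, theta_hat, f)) ** 2) / (2 * sigma2)
                + K[0] + K[jj]) for jj in range(m)]
print(1.0 / sum(terms))        # w_1: probability that model 1 is correct
```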
