DOI: 10.1111/mafi.12289
ORIGINAL ARTICLE
Markov chains under nonlinear expectation
Max Nendel
Center for Mathematical Economics, Bielefeld University, Bielefeld, Germany
Correspondence
Max Nendel, Center for Mathematical Economics, Bielefeld University, 33615 Bielefeld, Germany.
Email: Max.Nendel@uni-bielefeld.de
Funding information
Deutsche Forschungsgemeinschaft, Grant/Award Number: CRC 1283
Abstract
In this paper, we consider continuous-time Markov chains with a finite state space under nonlinear expectations. We define so-called Q-operators as an extension of Q-matrices or rate matrices to a nonlinear setup, where the nonlinearity is due to model uncertainty. The main result gives a full characterization of convex Q-operators in terms of a positive maximum principle, a dual representation by means of Q-matrices, time-homogeneous Markov chains under convex expectations, and a class of nonlinear ordinary differential equations. This extends a classical characterization of generators of Markov chains to the case of model uncertainty in the generator. We further derive an explicit primal and dual representation of convex semigroups arising from Markov chains under convex expectations via the Fenchel–Legendre transformation of the generator. We illustrate the results with several numerical examples, where we compute price bounds for European contingent claims under model uncertainty in terms of the rate matrix.
K E Y W O R D S
generator of nonlinear semigroup, imprecise Markov chain, model uncertainty, nonlinear expectation, nonlinear ODE
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2020 The Authors. Mathematical Finance published by Wiley Periodicals LLC
1 INTRODUCTION AND MAIN RESULT
In mathematical finance, model uncertainty or ambiguity is an almost omnipresent phenomenon, which, for example, appears due to incomplete information about certain aspects of an underlying asset or insufficient data in order to perform reliable statistical estimation methods for the parameters of a stochastic process. The latter typically leads to so-called parameter uncertainty in the generator of a stochastic process. Prominent examples for this type of uncertainty include a Black–Scholes model with uncertain volatility, the so-called uncertain volatility model, cf. Avellaneda, Levy, and Parás (1995), Avellaneda and Parás (1996), and Vorbrink (2014), and a Brownian motion under drift or volatility uncertainty, leading to the g-framework, see, for example, Coquet, Hu, Mémin, and Peng (2002), or the G-framework by Peng (2007) and Peng (2008), respectively. Lately, these approaches have been generalized to Lévy processes with uncertainty in the Lévy triplet, cf. Denk, Kupper, and Nendel (2020), Hu and Peng (2009), and Neufeld and Nutz (2017), and uncertainty in the generator of Feller processes, cf. Nendel and Röckner (2019). While these works give sufficient conditions in order to guarantee the existence of stochastic processes under model uncertainty and to establish a connection to nonlinear partial differential equations, there is no necessary condition that determines the maximal degree of ambiguity that can be captured by an uncertain process.
In the present paper, we address this issue in a simplified setup, where we consider a finite state space. We provide sufficient and necessary conditions in terms of the generators of time-homogeneous continuous-time Markov chains that guarantee the existence of a continuous-time Markov chain under a convex expectation. We further establish a one-to-one relation between the transition operators of convex Markov chains and a class of nonlinear ordinary differential equations. In particular, we extend a classical relation between Markov chains, rate matrices, and ordinary differential equations to the case of model uncertainty. The ordinary differential equation related to a convex Markov chain is a spatially discretized version of a Hamilton–Jacobi–Bellman equation, and the nonlinear transition operators are related, via a dual representation, to a control problem where, roughly speaking, "nature" tries to control the system into the worst possible scenario (see Remark 4.18). The explicit description of the transition operators gives rise to a numerical scheme, different from Runge–Kutta methods, for the computation of price bounds for European contingent claims under model uncertainty. We illustrate this method and other numerical methods in several examples, where we consider an underlying Markov chain, which is a discrete version (more precisely, its generator is a finite difference discretization of the generator) of a Brownian motion with uncertain drift, cf. Coquet et al. (2002), and uncertain volatility, cf. Peng (2007) and Peng (2008). The main tools we use in our analysis are convex duality, a semigroup-theoretic approach to control problems due to Nisio (1976/77), see also Denk et al. (2020) and Nendel and Röckner (2019), and a convex version of Kolmogorov's extension theorem due to Denk, Kupper, and Nendel (2018), which allows us to extend the expectation to functionals that depend on the whole path. Restricting the time parameter, in the present work, to the set of natural numbers leads to a discrete-time Markov chain in the sense of Denk et al. (2018, Example 5.3).
The concept we use to describe ambiguity is the notion of a nonlinear expectation introduced by Peng (2005). Nonlinear expectations closely relate to other concepts describing model uncertainty, such as backward stochastic differential equations (BSDEs), cf. Cohen (2012) and Coquet et al. (2002), and 2BSDEs, cf. Cheridito, Soner, Touzi, and Victoir (2007) and Denis, Hu, and Peng (2011). We refer to Pardoux and Peng (1992), Pardoux and Peng (1990), and El Karoui, Peng, and Quenez (1997) for a detailed study of BSDEs and their applications within the field of mathematical finance. If a nonlinear expectation $\mathcal{E}$ is sublinear, then $\rho(X) := \mathcal{E}(-X)$ defines a coherent monetary risk measure as introduced by Artzner, Delbaen, Eber, and Heath (1999), Delbaen (2000), and Delbaen (2002), see also Föllmer and Schied (2011) for an overview of monetary risk measures. Moreover, if $\mathcal{E}$ is a sublinear expectation, then $\mathcal{E}$ is a coherent upper prevision, cf. Walley (1991), and vice versa. There is a similar one-to-one relation between convex expectations, convex upper previsions, cf. Pelessoni and Vicig (2003) and Pelessoni and Vicig (2005), and convex risk measures, cf. Föllmer and Schied (2002) and Frittelli and Rosazza Gianin (2002). Further concepts, which are closely related to nonlinear expectations and describe model uncertainty, are Choquet capacities (see, e.g., Dellacherie & Meyer, 1978), game-theoretic probability by Vovk and Shafer (2014), and niveloids, see, for example, Cerreia-Vioglio, Maccheroni, Marinacci, and Rustichini (2014).
Our setup is inspired by Peng (2005), where Markov chains under nonlinear expectations are considered in an axiomatic way. However, the existence of stochastic processes under nonlinear expectations has only been considered in terms of finite-dimensional nonlinear marginal distributions, whereas completely path-dependent functionals could not be regarded. Markov chains under model uncertainty have been considered, among others, by Avellaneda and Buff (1999), De Cooman, Hermans, and Quaeghebeur (2009), Hartfiel (1998), and Škulj (2009). Avellaneda and Buff (1999) study a finite difference discretization of the uncertain volatility model leading to a Markov chain setting. Hartfiel (1998) considers so-called Markov set-chains in discrete time, using matrix intervals in order to describe model uncertainty in the transition matrices. Later, Škulj (2009) approached Markov chains under model uncertainty using Choquet capacities, which results in higher dimensional matrices on the power set, while De Cooman et al. (2009) considered imprecise Markov chains using an operator-theoretic approach with upper and lower expectations. In Denk et al. (2018, Example 5.3), Denk et al. describe model uncertainty in the transition matrix via a nonlinear transition operator, which, together with the results obtained in Denk et al. (2018), allows the construction of discrete-time Markov chains on the canonical path space. In continuous time, in particular, computational aspects of sublinear imprecise Markov chains have been studied, amongst others, by Krak, De Bock, and Siebes (2017) and Škulj (2015).
Another concept that is closely related to Markov chains under nonlinear expectations, as discussed in the present paper, is that of BSDEs on Markov chains by Cohen and Elliott (2008) and Cohen and Elliott (2010a), see also Cohen and Szpruch (2012), Cohen and Hu (2013), and Cohen and Elliott (2010b) for the discrete-time case. Here, a reference Markov chain $X = (X_t)_{t\ge 0}$ with generator $(A_t)_{t\ge 0}$ is fixed, and one considers BSDEs driven by $X$. This can be viewed as a discretization of the classical BSDE setup, where the state space is $\mathbb{R}$, the driving process is a Brownian motion, and the generator is $\frac{1}{2}\partial_{xx}$. Cohen and Szpruch (2012) show that Markovian solutions to BSDEs on Markov chains are related via their driver to a system
$$u'(t) = f(t, u(t)) + A(t)u(t)\quad\text{for all } t\ge 0,\qquad u(0) = u_0$$
of nonlinear ordinary differential equations with a nonlinear function $f$ that is assumed to be globally Lipschitz in the variable $u$. In the present paper, $f(t,u) = \mathcal{Q}u$ for a convex operator $\mathcal{Q}$. The biggest difference between our approach and the theory of BSDEs on Markov chains lies in the fact that we do not consider a fixed reference Markov chain that drives the model. On the other hand, our approach is restricted to considering Markovian solutions to BSDEs on Markov chains.
From a technical standpoint, further differences are that the theory of BSDEs allows for more generality in terms of nonlinearity of the driver, while we do not require global Lipschitz continuity of the generator, allowing for a possibly unbounded convex conjugate. Additionally, we only focus on the time-homogeneous case. However, regarding the existence of Markov chains under convex expectations and their connection to nonlinear ordinary differential equations (ODEs), this restriction could easily be overcome with a slight modification of the construction of the transition operators.
Dentcheva and Ruszczyński (2018) consider Markov risk measures for a countable state space, see also Fan and Ruszczyński (2018a), Fan and Ruszczyński (2018b), and Ruszczyński (2010) for the discrete-time case. Here, the focus lies on time-consistent risk measurement related to a fixed reference continuous-time Markov chain $X = (X_t)_{t\ge 0}$. Using so-called semiderivatives in the direction of the generator $A$, the authors derive, in the case of a coherent risk measure, a sublinear ordinary differential equation related to the risk measure, where the dual representation of the nonlinear generator depends on the generator $A$ of the baseline model $X$. Clearly, in the theory of Markov risk measures, the focus lies more on law-invariant risk measures, such as the average value at risk, and is therefore not directly comparable with our approach, where we explicitly avoid fixing a baseline model but rather try to capture very general forms of uncertainty in the generator. However, on a technical level, our approach also allows us to consider risk evaluations related to convex generators that do not depend on a fixed reference generator.
In view of the aforementioned existing literature on imprecise versions of Markov chains, the contribution of this paper can be summarized as follows (see Remark 2.6 for further details):
• We propose a framework describing Markov chains under model uncertainty in terms of the rate matrix. Our approach complements the existing literature on BSDEs on Markov chains and Markov risk measures, covering a different range of examples and applications in a consistent way. The key difference between our framework and the aforementioned existing approaches lies in the fact that we do not consider a fixed reference Markov chain describing the dynamics of an underlying asset. Moreover, our approach relies on analytic rather than stochastic methods, using distributional rather than pathwise properties, thus leading to restrictions in certain directions but advantages in others.
• We show that, as in the linear case, Markov chains under convex expectations with certain regularity at time 0 are linked via a one-to-one relation to certain convex functions (their generator) and to solutions to convex differential equations, which can be solved, for example, by using an explicit Euler method or any other Runge–Kutta method. In particular, we prove the global existence of solutions to a class of convex differential equations with unbounded convex conjugate, that is, without a global Lipschitz condition on the generator.
• We show that the transition semigroup of a convex Markov chain can be explicitly constructed using any (!) dual representation of the generator. In particular, for numerical computations, a "minimal" dual representation in terms of certain "corner points" can be used to solve the nonlinear Kolmogorov equation. Based on the explicit construction of the semigroup, we propose a novel algorithm for the numerical computation of solutions to a class of nonlinear ODEs. Moreover, we show that every convex transition semigroup is the least upper bound (in the sense of semigroups) of a family of linear transition semigroups, and vice versa.
• The convex expectations we consider are defined on the whole path space without fixing any reference measure. We show that the nonlinear expectation, although possibly undominated, always admits a dual representation in terms of countably additive probability measures. Moreover, we derive an explicit dual representation in terms of an optimal control problem, where nature tries to control the system into the worst possible scenario, giving a control-theoretic interpretation to Markov chains under convex expectations.
1.1 Structure of the paper
In Section 2, we fix the notation, introduce our setup and basic definitions, and state the main result (Theorem 2.5). In Section 3, we prove the first part of Theorem 2.5 (implications (v) ⇒ (ii) ⇒ (i) ⇒ (iii)). The main tool we use in this part is convex duality in $\mathbb{R}^d$. Moreover, we discuss how, in the sublinear case, computational efficiency can be improved by reducing compact and suitably convex sets of generator matrices to their "corner points." The effectiveness of this reduction is demonstrated in Section 5. In Section 4, we prove the remaining implications (iii) ⇒ (iv) ⇒ (v) of Theorem 2.5. Here, we use a combination of so-called Nisio semigroups, as introduced in Nisio (1976/77), the theory of ordinary differential equations, and a Kolmogorov-type extension theorem for convex expectations derived in Denk et al. (2018). We conclude this section by showing that the semigroup envelope admits a dual representation as a cost functional related to an optimal control problem. In Section 5, we use and compare two different numerical methods, based on the results from Sections 3 and 4, in order to compute price bounds for European contingent claims, where the underlying is a discrete version of a Brownian motion with drift uncertainty (g-framework) and volatility uncertainty (G-framework).
2 NOTATION, BASIC DEFINITIONS, AND MAIN RESULT
Given a measurable space $(\Omega,\mathcal{F})$, we denote the space of all bounded measurable functions $\Omega\to\mathbb{R}$ by $\mathcal{B}_b(\Omega,\mathcal{F})$. A nonlinear expectation is then a functional $\mathcal{E}\colon\mathcal{B}_b(\Omega,\mathcal{F})\to\mathbb{R}$, which satisfies
• $\mathcal{E}(X)\le\mathcal{E}(Y)$ whenever $X(\omega)\le Y(\omega)$ for all $\omega\in\Omega$,
• $\mathcal{E}(\alpha 1_\Omega)=\alpha$ for all $\alpha\in\mathbb{R}$.
If $\mathcal{E}$ is additionally convex, that is, for all $X,Y\in\mathcal{B}_b(\Omega,\mathcal{F})$ and $\lambda\in[0,1]$,
$$\mathcal{E}(\lambda X+(1-\lambda)Y)\le\lambda\,\mathcal{E}(X)+(1-\lambda)\,\mathcal{E}(Y),$$
we say that $\mathcal{E}$ is a convex expectation. It is well known (see, e.g., Denk et al., 2018 or Föllmer & Schied, 2011) that every convex expectation $\mathcal{E}$ admits a dual representation in terms of finitely additive probability measures. If $\mathcal{E}$, however, even admits a dual representation in terms of (countably additive) probability measures, we say that $(\Omega,\mathcal{F},\mathcal{E})$ is a convex expectation space. More precisely, we say that $(\Omega,\mathcal{F},\mathcal{E})$ is a convex expectation space if there exists a set $\mathcal{P}$ of probability measures on $(\Omega,\mathcal{F})$ and a family $(\alpha_{\mathbb{P}})_{\mathbb{P}\in\mathcal{P}}\subset[0,\infty)$ with $\inf_{\mathbb{P}\in\mathcal{P}}\alpha_{\mathbb{P}}=0$ such that
$$\mathcal{E}(X)=\sup_{\mathbb{P}\in\mathcal{P}}\big(\mathbb{E}_{\mathbb{P}}(X)-\alpha_{\mathbb{P}}\big)$$
for all $X\in\mathcal{B}_b(\Omega,\mathcal{F})$. Here, $\mathbb{E}_{\mathbb{P}}$ denotes the expectation w.r.t. a probability measure $\mathbb{P}$ on $(\Omega,\mathcal{F})$. If $\alpha_{\mathbb{P}}=0$ for all $\mathbb{P}\in\mathcal{P}$, we say that $(\Omega,\mathcal{F},\mathcal{E})$ is a sublinear expectation space. Here, the set $\mathcal{P}$ represents the set of all models that are relevant under the expectation $\mathcal{E}$. In the case of a sublinear expectation space, the functional $\mathcal{E}$ is the best case among all plausible models $\mathcal{P}$. In the case of a convex expectation space, the functional $\mathcal{E}$ is a weighted best case among all plausible models $\mathcal{P}$ with an additional penalization term $\alpha_{\mathbb{P}}$ for every $\mathbb{P}\in\mathcal{P}$. Intuitively, $\alpha_{\mathbb{P}}$ can be seen as a measure for how much importance we give to the prior $\mathbb{P}\in\mathcal{P}$ under the expectation $\mathcal{E}$. For example, a low penalization, that is, $\alpha_{\mathbb{P}}$ close or equal to 0, gives more importance to the model $\mathbb{P}\in\mathcal{P}$ than a high penalization.
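For a finite sample space and finitely many priors, the dual representation above can be evaluated directly. The following Python sketch (our own illustration; the function and variable names are not from the paper) computes $\mathcal{E}(X)=\sup_{\mathbb{P}\in\mathcal{P}}(\mathbb{E}_{\mathbb{P}}(X)-\alpha_{\mathbb{P}})$ for priors given as probability vectors:

```python
import numpy as np

def convex_expectation(X, priors, penalties):
    """Evaluate E(X) = max_P (E_P[X] - alpha_P) for finitely many priors.

    X         : payoff vector with one entry per state of a finite Omega
    priors    : list of probability vectors P on Omega
    penalties : list of penalties alpha_P >= 0, with minimum equal to 0
    """
    return max(P @ X - alpha for P, alpha in zip(priors, penalties))

# Two plausible models on a three-point Omega; the second model is penalized.
priors = [np.array([0.5, 0.3, 0.2]), np.array([0.1, 0.1, 0.8])]
penalties = [0.0, 0.05]
X = np.array([1.0, 0.0, -1.0])
print(convex_expectation(X, priors, penalties))
```

With all penalties equal to zero, the same function evaluates a sublinear expectation, that is, the best case over the plausible models.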
Throughout, we consider a finite nonempty state space $S$ with cardinality $d:=|S|\in\mathbb{N}$. We endow $S$ with the discrete topology $2^S$ and w.l.o.g. assume that $S=\{1,\dots,d\}$. The space of all bounded measurable functions $S\to\mathbb{R}$ can therefore be identified with $\mathbb{R}^d$ via
$$u=(u_1,\dots,u_d)^T\quad\text{with } u_i:=u(i)\ \text{ for all } i\in\{1,\dots,d\}.$$
Therefore, we denote bounded measurable functions $u$ as vectors of the form $u=(u_1,\dots,u_d)^T\in\mathbb{R}^d$, where $u_i$ represents the value of $u$ in the state $i\in\{1,\dots,d\}$. On $\mathbb{R}^d$, we consider the norm
$$\|u\|_\infty:=\max_{i=1,\dots,d}|u_i|=\max_{i\in\{1,\dots,d\}}|u(i)|$$
for a vector $u\in\mathbb{R}^d$. Moreover, for $\alpha\in\mathbb{R}$, the vector $\alpha\in\mathbb{R}^d$ denotes the constant vector $u\in\mathbb{R}^d$ with $u_i=\alpha$ for all $i\in\{1,\dots,d\}$. For an arbitrary matrix $q=(q_{ij})_{1\le i,j\le d}\in\mathbb{R}^{d\times d}$, we denote by $\|q\|$ the operator norm of $q\colon\mathbb{R}^d\to\mathbb{R}^d$ w.r.t. the norm $\|\cdot\|_\infty$, that is,
$$\|q\|=\sup_{v\in\mathbb{R}^d\setminus\{0\}}\frac{\|qv\|_\infty}{\|v\|_\infty}=\max_{i=1,\dots,d}\Bigg(\sum_{j=1}^d|q_{ij}|\Bigg).$$
Inequalities of vectors are always understood componentwise, that is, for $u,v\in\mathbb{R}^d$,
$$u\le v\iff u_i\le v_i\ \text{ for all } i\in\{1,\dots,d\}.$$
In the same way, all concepts in $\mathbb{R}^d$ that include inequalities are to be understood componentwise. For example, a vector field $F\colon\mathbb{R}^d\to\mathbb{R}^d$ is called convex if
$$F_i(\lambda u+(1-\lambda)v)\le\lambda F_i(u)+(1-\lambda)F_i(v)$$
for all $i\in\{1,\dots,d\}$, $u,v\in\mathbb{R}^d$ and $\lambda\in[0,1]$. A vector field $F$ is called sublinear if it is convex and positive homogeneous (of degree 1). Moreover, for a set $M\subset\mathbb{R}^d$ of vectors, we write $u=\sup M$ if $u_i=\sup_{v\in M}v_i$ for all $i\in\{1,\dots,d\}$, and $u=\max M$ if $u=\sup M$ and, for all $i\in\{1,\dots,d\}$, there exists some $v\in M$ with $u_i=v_i$.
In the following, we briefly recall the basic definitions and concepts from the theory of (time-homogeneous) Markov chains. A (time-homogeneous) Markov chain is a quadruple $(\Omega,\mathcal{F},(\mathbb{P}_1,\dots,\mathbb{P}_d),(X_t)_{t\ge 0})$, where:
(M1) $(\Omega,\mathcal{F})$ is a measurable space.
(M2) $X_t\colon\Omega\to\{1,\dots,d\}$ is $\mathcal{F}$-measurable for all $t\ge 0$.
(M3) $(\mathbb{P}_1,\dots,\mathbb{P}_d)$ is a collection of probability measures, where, for $i\in\{1,\dots,d\}$, $\mathbb{P}_i(X_0=i)=1$, that is, $\mathbb{P}_i$ denotes the probability distribution under which the Markov chain starts in the state $i$. Moreover, we use the notation
$$\mathbb{E}_i(Y):=\mathbb{E}_{\mathbb{P}_i}(Y)\quad\text{and}\quad\mathbb{E}(Y):=(\mathbb{E}_1(Y),\dots,\mathbb{E}_d(Y))^T$$
for $i\in\{1,\dots,d\}$ and all random variables $Y\colon\Omega\to\mathbb{R}$.
(M4) For all $s,t\ge 0$ and $i\in\{1,\dots,d\}$,
$$\mathbb{E}_i\big(u(X_{s+t})\,\big|\,\mathcal{F}_s\big)=\mathbb{E}_i\big(u(X_{t+s})\,\big|\,X_s\big)=\mathbb{E}_{X_s}\big(u(X_t)\big).$$
In particular, $\mathbb{E}_i(u(X_{t+s})\,|\,X_s=j)=\mathbb{E}_j(u(X_t))$ for all $i,j\in\{1,\dots,d\}$.
A matrix $q=(q_{ij})_{1\le i,j\le d}\in\mathbb{R}^{d\times d}$ is called a Q-matrix or rate matrix if it satisfies the following conditions:
(Q1) $q_{ii}\le 0$ for all $i\in\{1,\dots,d\}$,
(Q2) $q_{ij}\ge 0$ for all $i,j\in\{1,\dots,d\}$ with $i\ne j$,
(Q3) $\sum_{j=1}^d q_{ij}=0$ for all $i\in\{1,\dots,d\}$.
It is well known that every continuous-time Markov chain with certain regularity properties at time $t=0$ can be related to a Q-matrix and vice versa. More precisely, for a matrix $q\in\mathbb{R}^{d\times d}$, the following statements are equivalent:
(i) $q$ is a Q-matrix.
(ii) There is a Markov chain $(\Omega,\mathcal{F},(\mathbb{P}_1,\dots,\mathbb{P}_d),(X_t)_{t\ge 0})$ such that
$$qu_0=\lim_{h\downarrow 0}\frac{\mathbb{E}(u_0(X_h))-u_0}{h}$$
for all $u_0\in\mathbb{R}^d$, where $u_0(i)$ is the $i$th component of $u_0$ for $i\in\{1,\dots,d\}$.
In this case, for each vector $u_0\in\mathbb{R}^d$, the function $u\colon[0,\infty)\to\mathbb{R}^d$, $t\mapsto\mathbb{E}(u_0(X_t))$ is the unique classical solution $u\in C^1([0,\infty);\mathbb{R}^d)$ to the initial value problem
$$u'(t)=qu(t),\quad t\ge 0,\qquad u(0)=u_0,$$
that is, $u(t)=e^{tq}u_0$ for all $t\ge 0$, where $e^{tq}$ is the matrix exponential of $tq$. We refer to Norris (1998) for a detailed illustration of this relation.
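In the linear case, this solution can be computed directly from the matrix exponential. The following sketch (our own illustration, not from the paper) evaluates $\mathbb{E}(u_0(X_t))=e^{tq}u_0$ for a hypothetical Q-matrix on three states:

```python
import numpy as np
from scipy.linalg import expm

# A Q-matrix on {1, 2, 3}: nonpositive diagonal, nonnegative
# off-diagonal entries, and zero row sums, i.e., (Q1)-(Q3).
q = np.array([[-2.0,  1.0,  1.0],
              [ 0.5, -0.5,  0.0],
              [ 0.0,  2.0, -2.0]])

u0 = np.array([1.0, 0.0, 0.0])   # payoff u_0(i) = 1 if i = 1, else 0

t = 0.75
u_t = expm(t * q) @ u0           # u(t) = e^{tq} u_0, one entry per starting state
print(u_t)
```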
We say that a (possibly nonlinear) operator $\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d$ satisfies the positive maximum principle if, for every $u=(u_1,\dots,u_d)^T\in\mathbb{R}^d$ and $i\in\{1,\dots,d\}$,
$$(\mathcal{Q}u)_i\le 0\quad\text{whenever } u_i\ge u_j\ \text{ for all } j\in\{1,\dots,d\}.$$
This notion is motivated by the positive maximum principle for generators of Feller processes, see, for example, Jacob (2001, Equation (0.8)). Notice that a matrix $q\in\mathbb{R}^{d\times d}$ is a Q-matrix if and only if it satisfies the positive maximum principle and $q1=0$, where $1:=(1,\dots,1)^T\in\mathbb{R}^d$ denotes the constant 1 vector. In fact, Property (Q3) is just a reformulation of $q1=0$. Moreover, if $q$ satisfies the positive maximum principle, then $q_{ii}=(qe_i)_i\le 0$ for all $i\in\{1,\dots,d\}$ and $-q_{ij}=(q(-e_j))_i\le 0$ for all $i,j\in\{1,\dots,d\}$ with $i\ne j$, where $e_i$ denotes the $i$th standard unit vector of $\mathbb{R}^d$. That is, $q$ fulfills (Q1) and (Q2). On the other hand, if $q$ is a Q-matrix, $u=(u_1,\dots,u_d)^T\in\mathbb{R}^d$ and $i\in\{1,\dots,d\}$ with $u_i\ge u_j$ for all $j\in\{1,\dots,d\}$, then
$$(qu)_i=\sum_{j=1}^d q_{ij}u_j\le u_i\sum_{j=1}^d q_{ij}=0,$$
which shows that $q$ satisfies the positive maximum principle.
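Both characterizations are easy to test numerically. The sketch below (our own illustration) checks (Q1)–(Q3) directly and, alternatively, probes the positive maximum principle together with $q1=0$ on random test vectors:

```python
import numpy as np

def is_Q_matrix(q, tol=1e-12):
    """Check (Q1)-(Q3): nonpositive diagonal, nonnegative off-diagonal, zero row sums."""
    off_diag = q - np.diag(np.diag(q))
    return (np.all(np.diag(q) <= tol)
            and np.all(off_diag >= -tol)
            and np.allclose(q.sum(axis=1), 0.0, atol=tol))

def pmp_and_kills_constants(q, trials=1000, tol=1e-9):
    """Sampled test of the positive maximum principle plus q1 = 0."""
    d = q.shape[0]
    if not np.allclose(q @ np.ones(d), 0.0, atol=tol):
        return False
    rng = np.random.default_rng(0)
    for _ in range(trials):
        u = rng.normal(size=d)
        i = int(np.argmax(u))    # component where u attains its maximum
        if (q @ u)[i] > tol:     # (qu)_i must be <= 0 at a maximizing component
            return False
    return True
```

For a linear map, a finite random test can of course only falsify the positive maximum principle; the equivalence with (Q1)–(Q3) is the content of the argument above.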
To state the main result, we introduce the following definitions.
Definition 2.1. A (possibly nonlinear) map $\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d$ is called a Q-operator if the following conditions are satisfied:
(i) $(\mathcal{Q}(\beta e_i))_i\le 0$ for all $\beta>0$ and all $i\in\{1,\dots,d\}$,
(ii) $(\mathcal{Q}(-\beta e_j))_i\le 0$ for all $\beta>0$ and all $i,j\in\{1,\dots,d\}$ with $i\ne j$,
(iii) $\mathcal{Q}\alpha=0$ for all $\alpha\in\mathbb{R}$, where we identify $\alpha$ with $(\alpha,\dots,\alpha)^T\in\mathbb{R}^d$.
Definition 2.2. A convex Markov chain is a quadruple $(\Omega,\mathcal{F},\mathcal{E},(X_t)_{t\ge 0})$ that satisfies the following conditions:
(i) $(\Omega,\mathcal{F})$ is a measurable space.
(ii) $X_t\colon\Omega\to\{1,\dots,d\}$ is $\mathcal{F}$-measurable for all $t\ge 0$.
(iii) $\mathcal{E}=(\mathcal{E}_1,\dots,\mathcal{E}_d)^T$, where $(\Omega,\mathcal{F},\mathcal{E}_i)$ is a convex expectation space for all $i\in\{1,\dots,d\}$ and $\mathcal{E}(u_0(X_0))=u_0$ for all $u_0\in\mathbb{R}^d$. Here and in the following, we use the notation
$$\mathcal{E}(X):=(\mathcal{E}_1(X),\dots,\mathcal{E}_d(X))^T\in\mathbb{R}^d$$
for $X\in\mathcal{B}_b(\Omega,\mathcal{F})$.
(iv) The following version of the Markov property is satisfied: for all $s,t\ge 0$, $n\in\mathbb{N}$, $0\le t_1<\dots<t_n\le s$, and $v_0\colon\{1,\dots,d\}^{n+1}\to\mathbb{R}$,
$$\mathcal{E}\big(v_0(Y,X_{s+t})\big)=\mathcal{E}\Big[\mathcal{E}_{X_s,t}\big(v_0(Y,\,\cdot\,)\big)\Big],\tag{1}$$
where $Y:=(X_{t_1},\dots,X_{t_n})$ and $\mathcal{E}_{i,t}(u_0):=\mathcal{E}_i(u_0(X_t))$ for all $u_0\in\mathbb{R}^d$ and $i\in\{1,\dots,d\}$.
We say that the Markov chain $(\Omega,\mathcal{F},\mathcal{E},(X_t)_{t\ge 0})$ is linear or sublinear if the mapping $\mathcal{E}\colon\mathcal{B}_b(\Omega,\mathcal{F})\to\mathbb{R}^d$ is, additionally, linear or sublinear, respectively.
Notice that the properties (i)–(iii) in the previous definition are a one-to-one translation of (M1)–(M3) to a convex setup. The Markov property given in (iv) of the previous definition is the nonlinear analog of the classical Markov property (M4) without using conditional expectations. Due to the nonlinearity of the expectation, the definition and, in particular, the existence of a conditional (nonlinear) expectation are quite involved, which is why we avoid introducing this concept. In order to get the idea behind the formulation in (iv), choose $v_0(y,x):=u(x)1_B(y)$ for a measurable function $u\colon\{1,\dots,d\}\to\mathbb{R}$ and arbitrary $B\subset\{1,\dots,d\}^n$. Then, if $\mathcal{E}$ is linear, Equation (1) reads as
$$\mathcal{E}\big(u(X_{s+t})1_B(Y)\big)=\mathcal{E}\big(\mathcal{E}_{X_s,t}(u)1_B(Y)\big),$$
which is equivalent to (M4). On the other hand, for every linear Markov chain, Property (M4) implies Property (iv). Hence, in the linear case, Definition 2.2 is consistent with the classical definition of a Markov chain.
In line with Denk et al. (2018, Definition 5.1), we say that a (possibly nonlinear) map $\mathcal{E}\colon\mathbb{R}^d\to\mathbb{R}^d$ is a kernel if $\mathcal{E}$ is monotone, that is, $\mathcal{E}(u)\le\mathcal{E}(v)$ for all $u,v\in\mathbb{R}^d$ with $u\le v$, and $\mathcal{E}$ preserves constants, that is, $\mathcal{E}(\alpha)=\alpha$ for all $\alpha\in\mathbb{R}$.
Definition 2.3. A family $S=(S(t))_{t\ge 0}$ of (possibly nonlinear) operators $S(t)\colon\mathbb{R}^d\to\mathbb{R}^d$ is called a semigroup if
(i) $S(0)=I$, where $I=I_d$ is the $d$-dimensional identity matrix,
(ii) $S(s+t)=S(s)S(t)$ for all $s,t\ge 0$.
Here and throughout, we make use of the notation $S(s)S(t):=S(s)\circ S(t)$. If, additionally, $S(h)\to I$ uniformly on compact sets as $h\downarrow 0$, we say that the semigroup $S$ is uniformly continuous. We call $S$ Markovian if $S(t)$ is a kernel for all $t\ge 0$. We say that $S$ is linear, sublinear, or convex if $S(t)$ is linear, sublinear, or convex for all $t\ge 0$, respectively.
Definition 2.4. Let $\Lambda\subset\mathbb{R}^{d\times d}$ be a set of Q-matrices and $b=(b_\lambda)_{\lambda\in\Lambda}$ a family of vectors with $\sup_{\lambda\in\Lambda}b_\lambda=b_{\lambda_0}=0$ for some $\lambda_0\in\Lambda$, that is, $b_\lambda\le 0$ for all $\lambda\in\Lambda$ and there exists some $\lambda_0\in\Lambda$ with $b_{\lambda_0}=0$. We define
$$S_\lambda(t)u_0:=e^{t\lambda}u_0+\int_0^t e^{s\lambda}b_\lambda\,\mathrm{d}s=u_0+\int_0^t e^{s\lambda}\big(\lambda u_0+b_\lambda\big)\,\mathrm{d}s$$
for $t\ge 0$, $u_0\in\mathbb{R}^d$ and $\lambda\in\Lambda$. Then, $S_\lambda=(S_\lambda(t))_{t\ge 0}$ is an affine linear semigroup. We call a semigroup $S$ the (upper) semigroup envelope (later also Nisio semigroup) of $(\Lambda,b)$ if
(i) $S(t)u_0\ge S_\lambda(t)u_0$ for all $t\ge 0$, $u_0\in\mathbb{R}^d$ and $\lambda\in\Lambda$,
(ii) for any other semigroup $T$ satisfying (i), we have that $S(t)u_0\le T(t)u_0$ for all $t\ge 0$ and $u_0\in\mathbb{R}^d$.
That is, the semigroup envelope $S$ is the smallest semigroup that dominates all semigroups $(S_\lambda)_{\lambda\in\Lambda}$.
The following main theorem gives a full characterization of convex Q-operators.
Theorem 2.5. Let $\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d$ be a mapping. Then, the following statements are equivalent:
(i) $\mathcal{Q}$ is a convex Q-operator.
(ii) $\mathcal{Q}$ is convex, satisfies the positive maximum principle, and $\mathcal{Q}\alpha=0$ for all $\alpha\in\mathbb{R}$, where $\alpha:=(\alpha,\dots,\alpha)^T\in\mathbb{R}^d$.
(iii) There exists a set $\Lambda\subset\mathbb{R}^{d\times d}$ of Q-matrices and a family $b=(b_\lambda)_{\lambda\in\Lambda}\subset\mathbb{R}^d$ of vectors with $b_\lambda\le 0$ for all $\lambda\in\Lambda$ and $b_{\lambda_0}=0$ for some $\lambda_0\in\Lambda$, such that
$$\mathcal{Q}u_0=\sup_{\lambda\in\Lambda}\big(\lambda u_0+b_\lambda\big)\tag{2}$$
for all $u_0\in\mathbb{R}^d$, where the supremum is to be understood componentwise.
(iv) There exists a uniformly continuous convex Markovian semigroup $S$ with
$$\mathcal{Q}u_0=\lim_{h\downarrow 0}\frac{S(h)u_0-u_0}{h}$$
for all $u_0\in\mathbb{R}^d$.
(v) There is a convex Markov chain $(\Omega,\mathcal{F},\mathcal{E},(X_t)_{t\ge 0})$ such that
$$\mathcal{Q}u_0=\lim_{h\downarrow 0}\frac{\mathcal{E}(u_0(X_h))-u_0}{h}$$
for all $u_0\in\mathbb{R}^d$.
In this case, for each initial value $u_0\in\mathbb{R}^d$, the function $u\colon[0,\infty)\to\mathbb{R}^d$, $t\mapsto\mathcal{E}(u_0(X_t))$ is the unique classical solution $u\in C^1([0,\infty);\mathbb{R}^d)$ to the initial value problem
$$u'(t)=\mathcal{Q}u(t)=\sup_{\lambda\in\Lambda}\big(\lambda u(t)+b_\lambda\big),\quad t\ge 0,\qquad u(0)=u_0.\tag{3}$$
Moreover, the Markovian semigroup $S$ from (iv) is the (upper) semigroup envelope of $(\Lambda,b)$, and $u(t)=S(t)u_0$ for all $t\ge 0$.
Remark 2.6. Consider the situation of Theorem 2.5.
(a) The dual representation in (iii) gives a model uncertainty interpretation to Q-operators. The set $\Lambda$ can be seen as the set of all plausible rate matrices when considering the Q-operator $\mathcal{Q}$. For every $\lambda\in\Lambda$, the vector $b_\lambda\le 0$ can be interpreted as a penalization, which measures how much importance we give to each rate matrix $\lambda$. The requirement that there exists some $\lambda_0\in\Lambda$ with $b_{\lambda_0}=0$ can be interpreted in the following way: there exists at least one rate matrix $\lambda_0$ within the set of all plausible rate matrices $\Lambda$ to which we assign the maximal importance, which is the minimal penalization.
(b) The semigroup envelope $S$ of $(\Lambda,b)$ can be constructed more explicitly; in particular, an explicit (in terms of $(\Lambda,b)$) dual representation can be derived. For details, we refer to Section 4 (Definition 4.2 and Remark 4.18). Moreover, we would like to highlight that the semigroup envelope $S$ can be constructed w.r.t. any dual representation $(\Lambda,b)$ as in (iii) and results in the unique classical solution to (3), independent of the choice of the dual representation $(\Lambda,b)$ of $\mathcal{Q}$. This gives, in some cases, the opportunity to efficiently compute the semigroup envelope numerically via its primal/dual representation (see Remark 3.3 and Example 5.2).
(c) The same equivalence as in Theorem 2.5 holds if convexity is replaced by sublinearity in (i), (ii), (iv), and (v) and $b_\lambda=0$ for all $\lambda\in\Lambda$ in (iii). In this case, the set $\Lambda$ in (iii) can be chosen to be compact, as we will see in the proof of Theorem 2.5.
(d) Theorem 2.5 extends and includes the well-known relation between (linear) Markov chains, Q-matrices, and ordinary differential equations.
(e) A remarkable consequence of Theorem 2.5 is that every convex Markovian semigroup, which is differentiable at time $t=0$, is the semigroup envelope with respect to the Fenchel–Legendre transformation (or any other dual representation as in (iii)) of its generator, which is a convex Q-operator.
(f) Although $\mathcal{Q}$ has a possibly unbounded convex conjugate, the convex initial value problem
$$u'(t)=\mathcal{Q}u(t)\quad\text{for all } t\ge 0,\qquad u(0)=u_0,\tag{4}$$
has a unique global solution.
(g) Solutions to (4) remain bounded. Therefore, a Picard iteration or Runge–Kutta methods, such as the explicit Euler method, can be used for numerical computations, and the convergence rate (depending on the size of the initial value $u_0$) can be derived from the a priori estimate in Banach's fixed point theorem.
(h) As in the linear case, by solving the differential equation (4), one can (numerically) compute expressions of the form
$$u(t)=\mathcal{E}(u_0(X_t)).$$
We illustrate this computation procedure in Example 5.1.
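As a concrete illustration of (g) and (h), the following sketch (our own, with hypothetical input data) applies the explicit Euler method to the convex Kolmogorov equation (3) for finitely many rate matrices and penalties:

```python
import numpy as np

def solve_convex_kolmogorov(Qs, bs, u0, T, n_steps):
    """Explicit Euler scheme for u'(t) = max_lambda (lambda u(t) + b_lambda), u(0) = u0.

    Qs : list of Q-matrices (the plausible rate matrices in Lambda)
    bs : list of penalty vectors b_lambda <= 0, equal to 0 for at least one lambda
    """
    u = np.asarray(u0, dtype=float).copy()
    dt = T / n_steps
    for _ in range(n_steps):
        # componentwise maximum over the dual representation (2)
        Qu = np.max([q @ u + b for q, b in zip(Qs, bs)], axis=0)
        u += dt * Qu
    return u
```

Since the right-hand side is convex and locally Lipschitz, the scheme converges on compact time intervals in the usual first-order manner as the step size is refined.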
3 PROOF OF (v) ⇒ (ii) ⇒ (i) ⇒ (iii)
We say that a set $\Lambda\subset\mathbb{R}^{d\times d}$ of matrices is row-convex if, for any diagonal matrix $t\in\mathbb{R}^{d\times d}$ with $t_i:=t_{ii}\in[0,1]$ for all $i\in\{1,\dots,d\}$,
$$t\lambda+(I-t)\mu\in\Lambda\quad\text{for all }\lambda,\mu\in\Lambda,$$
where $I=I_d\in\mathbb{R}^{d\times d}$ is the $d$-dimensional identity matrix. Notice that, for all $i\in\{1,\dots,d\}$, the $i$th row of the matrix $t\lambda+(I-t)\mu$ is the convex combination of the $i$th rows of $\lambda$ and $\mu$ with weight $t_i$. Observe also that a set $\Lambda\subset\mathbb{R}^{d\times d}$ is row-convex if and only if it is convex and, for arbitrary $\lambda,\mu\in\Lambda$, the matrix that results from replacing the $i$th row of $\lambda$ by the $i$th row of $\mu$ is again an element of $\Lambda$. For example, the set of all Q-matrices is row-convex.
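The closure of the set of Q-matrices under row mixing is easy to verify numerically; the following sketch (our own illustration) mixes two random Q-matrices row by row and checks (Q1)–(Q3) for the result:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_q_matrix(d, rng):
    """Draw a Q-matrix: nonnegative off-diagonal entries, rows summing to zero."""
    q = rng.random((d, d))
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q

d = 3
lam, mu = random_q_matrix(d, rng), random_q_matrix(d, rng)
t = np.diag(rng.random(d))            # diagonal matrix with entries t_i in [0, 1]
mix = t @ lam + (np.eye(d) - t) @ mu  # the i-th row mixes the i-th rows of lam and mu

off_diag = mix - np.diag(np.diag(mix))
print(np.allclose(mix.sum(axis=1), 0.0),  # (Q3)
      np.all(np.diag(mix) <= 0),          # (Q1)
      np.all(off_diag >= 0))              # (Q2)
```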
Remark 3.1. Let $\mathcal{Q}$ be a convex Q-operator. For every matrix $\lambda\in\mathbb{R}^{d\times d}$, let
$$\mathcal{Q}^*(\lambda):=\sup_{u\in\mathbb{R}^d}\big(\lambda u-\mathcal{Q}(u)\big)\in[0,\infty]^d$$
be the conjugate function of $\mathcal{Q}$. Notice that $0\le\mathcal{Q}^*(\lambda)$ for all $\lambda\in\mathbb{R}^{d\times d}$, since $\mathcal{Q}(0)=0$. Let
$$\Lambda^*:=\big\{\lambda\in\mathbb{R}^{d\times d}\,\big|\,\mathcal{Q}^*(\lambda)\in[0,\infty)^d\big\}$$
and $b^*_\lambda:=-\mathcal{Q}^*(\lambda)$ for all $\lambda\in\Lambda^*$. Then, the following facts are well-known results from convex duality theory in $\mathbb{R}^d$.
(a) The set $\Lambda^*$ is row-convex, and the mapping $\Lambda^*\to\mathbb{R}^d$, $\lambda\mapsto\mathcal{Q}^*(\lambda)$ is lower semicontinuous.
(b) Let $c\ge 0$ and $\Lambda^*_c:=\{\lambda\in\mathbb{R}^{d\times d}\,|\,\mathcal{Q}^*(\lambda)\le c\}$. Then, $\Lambda^*_c\subset\mathbb{R}^{d\times d}$ is compact and row-convex. Therefore,
$$\mathcal{Q}_c\colon\mathbb{R}^d\to\mathbb{R}^d,\quad u\mapsto\max_{\lambda\in\Lambda^*_c}\big(\lambda u+b^*_\lambda\big)\tag{5}$$
defines a convex operator, which is Lipschitz continuous. Notice that the maximum in (5) is to be understood componentwise. However, for fixed $u_0\in\mathbb{R}^d$, the maximum can be attained, simultaneously in every component, by a single element of $\Lambda^*_c$, that is, for all $u_0\in\mathbb{R}^d$, there exists some $\lambda_0\in\Lambda^*_c$ with
$$\mathcal{Q}_c u_0=\lambda_0 u_0+b^*_{\lambda_0}.$$
This is due to the fact that $\Lambda^*_c$ is row-convex and that, for $\lambda\in\Lambda^*$, the $i$th component of the vector $b^*_\lambda$ only depends on the $i$th row of $\lambda$.
(c) Let $r\ge 0$. Then, there exists some $c\ge 0$ such that
$$\mathcal{Q}u_0=\max_{\lambda\in\Lambda^*_c}\big(\lambda u_0+b^*_\lambda\big)=\mathcal{Q}_c u_0$$
for all $u_0\in\mathbb{R}^d$ with $\|u_0\|_\infty\le r$. In particular, $\mathcal{Q}$ is locally Lipschitz continuous and
$$\mathcal{Q}u_0=\max_{\lambda\in\Lambda^*}\big(\lambda u_0+b^*_\lambda\big)\quad\text{for all } u_0\in\mathbb{R}^d,$$
where, for fixed $u_0\in\mathbb{R}^d$, the maximum can be attained, simultaneously in every component, by a single element of $\Lambda^*$. In particular, there exists some $\lambda_0\in\Lambda^*$ with $b^*_{\lambda_0}=\sup_{\lambda\in\Lambda^*}b^*_\lambda=\mathcal{Q}(0)=0$.
Proof of Theorem 2.5. (v) ⇒ (ii): As $\mathcal{E}_i$ is a convex expectation for all $i\in\{1,\dots,d\}$, it follows that the operator $\mathcal{Q}$ is convex with $\mathcal{Q}\alpha=0$ for all $\alpha\in\mathbb{R}$. Now, let $u_0\in\mathbb{R}^d$ and $i\in\{1,\dots,d\}$ with $u_{0,i}\ge u_{0,j}$ for all $j\in\{1,\dots,d\}$. Let $\alpha>0$ be such that
$$\|u_0+\alpha\|_\infty=(u_0+\alpha)_i=u_{0,i}+\alpha,$$
and define $v_0:=u_0+\alpha$. Then,
$$\mathcal{Q}v_0=\lim_{h\downarrow 0}\frac{\mathcal{E}(u_0(X_h)+\alpha)-v_0}{h}=\lim_{h\downarrow 0}\frac{\mathcal{E}(u_0(X_h))-u_0}{h}=\mathcal{Q}u_0.$$
Assume that $(\mathcal{Q}u_0)_i>0$. Then, there exists some $h>0$ such that
$$\mathcal{E}_i(v_0(X_h))-v_{0,i}>0.$$
Hence,
$$\|\mathcal{E}(v_0(X_h))\|_\infty\ge\mathcal{E}_i(v_0(X_h))>v_{0,i}=\|v_0\|_\infty,$$
which is a contradiction to
$$\|\mathcal{E}(v_0(X_h))\|_\infty\le\|v_0\|_\infty.$$
This shows that $\mathcal{Q}$ satisfies the positive maximum principle.
(ii) ⇒ (i): This follows directly from the positive maximum principle, considering the vectors $\beta e_i$ and $-\beta e_j$ for all $\beta>0$ and $i,j\in\{1,\dots,d\}$ with $i\ne j$.
(i) ⇒ (iii): Let $\mathcal{Q}$ be a convex Q-operator. Moreover, let $\Lambda^*$ and $b^*=(b^*_\lambda)_{\lambda\in\Lambda^*}$ be as in Remark 3.1. Then, by Remark 3.1(c), it only remains to show that every $\lambda\in\Lambda^*$ is a Q-matrix. To this end, fix an arbitrary $\lambda\in\Lambda^*$. Then, for all $\alpha\in\mathbb{R}$,
$$\lambda\alpha=\frac{1}{n}\,\lambda(n\alpha)\le\frac{1}{n}\big(\mathcal{Q}(n\alpha)+\mathcal{Q}^*(\lambda)\big)=\frac{1}{n}\,\mathcal{Q}^*(\lambda)\to 0\quad\text{as } n\to\infty.$$
Therefore, $\lambda\alpha\le 0$ for all $\alpha\in\mathbb{R}$. Since $\lambda$ is linear, it follows that $\lambda 1=0$. Now, let $i\in\{1,\dots,d\}$. Then, by definition of a Q-operator, we obtain that
$$\lambda_{ii}\le\frac{1}{n}\big(\mathcal{Q}(ne_i)+\mathcal{Q}^*(\lambda)\big)_i\le\frac{1}{n}\big(\mathcal{Q}^*(\lambda)\big)_i\to 0\quad\text{as } n\to\infty,$$
that is, $\lambda_{ii}\le 0$. Now, let $i,j\in\{1,\dots,d\}$ with $i\ne j$. Then, again by definition of a Q-operator, it follows that
$$-\lambda_{ij}\le\frac{1}{n}\big(\mathcal{Q}(-ne_j)+\mathcal{Q}^*(\lambda)\big)_i\le\frac{1}{n}\big(\mathcal{Q}^*(\lambda)\big)_i\to 0\quad\text{as } n\to\infty,$$
that is, $\lambda_{ij}\ge 0$. Therefore, $\lambda$ is a Q-matrix.
It remains to show the implications (iii) ⇒ (iv) ⇒ (v), which is done in the entire next section. □
Before we start with the proof of the remaining implications (iii) ⇒ (iv) ⇒ (v), we would like to point out how, in the sublinear case, the set $\Lambda^*$ of Q-matrices from Remark 3.1 can be reduced to certain "corner points." This can be done using the concept of row-convexity, introduced at the beginning of this section, together with Minkowski's theorem on extreme points of convex sets in $\mathbb{R}^d$. Let $\mathcal{K}\subset\mathbb{R}^{d\times d}$ be a nonempty set of matrices. Then, we define the row-convex hull of $\mathcal{K}$ by
$$\mathrm{rch}(\mathcal{K}):=\Bigg\{\sum_{k=1}^n t_k\lambda_k\;\Bigg|\;n\in\mathbb{N},\ t_1,\dots,t_n\in[0,\infty)^{d\times d}\text{ diagonal},\ \sum_{k=1}^n t_k=I,\ \lambda_1,\dots,\lambda_n\in\mathcal{K}\Bigg\}.$$
For a convex set $C\subset\mathbb{R}^d$, we denote the set of all extreme points of $C$ by $E(C)$. Recall that an extreme point of a convex set $C\subset\mathbb{R}^d$ is an element $x\in C$ such that $x=ty+(1-t)z$, for $t\in(0,1)$ and $y,z\in C$, implies that $x=y=z$. For a matrix $\lambda\in\mathbb{R}^{d\times d}$ and $i\in\{1,\dots,d\}$, we denote by
$$\lambda_i:=(\lambda_{i1},\dots,\lambda_{id})\in\mathbb{R}^d$$
the $i$th row of $\lambda$. Let $\Lambda\subset\mathbb{R}^{d\times d}$ be a nonempty compact row-convex set of matrices. Then, we say that a set $\mathcal{M}\subset\Lambda$ is $\Lambda$-row-extreme if
$$\{\lambda_i\,|\,\lambda\in\mathcal{M}\}=E\big(\{\lambda_i\,|\,\lambda\in\Lambda\}\big)\quad\text{for all } i\in\{1,\dots,d\}.$$
That is, the set of all $i$th rows of $\mathcal{M}$ is the set of all extreme points of the $i$th rows of $\Lambda$. We say that a set $\mathcal{K}\subset\Lambda$ is minimal $\Lambda$-row-extreme if $\mathcal{K}$ is row-extreme for $\Lambda$ and $\mathcal{M}\subset\mathcal{K}$ implies $\mathcal{M}=\mathcal{K}$ for any $\Lambda$-row-extreme set $\mathcal{M}\subset\Lambda$.
Proposition 3.2. Let $\Lambda\subset\mathbb{R}^{d\times d}$ be nonempty, compact, and row-convex. Then, there exists a minimal $\Lambda$-row-extreme set $\mathcal{K}\subset\Lambda$. Moreover, $\Lambda=\mathrm{rch}(\mathcal{M})$ is the row-convex hull of any (minimal) $\Lambda$-row-extreme set $\mathcal{M}\subset\Lambda$, and
$$\max_{\lambda\in\Lambda}\lambda u_0=\max_{\lambda\in\mathcal{M}}\lambda u_0\quad\text{for all } u_0\in\mathbb{R}^d,\tag{6}$$
where the maxima are to be understood componentwise.
Proof. By Minkowski's theorem, the set of all $\Lambda$-row-extreme sets is nonempty, and one readily verifies that the latter, together with the partial order $\preceq$ given by $\mathcal{M}_1\preceq\mathcal{M}_2$ if and only if $\mathcal{M}_1\supset\mathcal{M}_2$, has the chain property. Hence, by Zorn's lemma, there exists a maximal element $\mathcal{K}$ within the set of all $\Lambda$-row-extreme sets, which, by definition, is a minimal $\Lambda$-row-extreme set. Now, let $\mathcal{M}$ be an arbitrary $\Lambda$-row-extreme set and $u_0\in\mathbb{R}^d$. Then,
$$\max_{\lambda\in\Lambda}(\lambda u_0)_i=\max_{\lambda\in\Lambda}(\lambda_i\cdot u_0)=\max_{\lambda\in\mathcal{M}}(\lambda_i\cdot u_0)=\max_{\lambda\in\mathcal{M}}(\lambda u_0)_i.\qquad\Box$$
Remark 3.3. Let $\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d$ be a sublinear Q-operator, and $\Lambda^*$ as in Remark 3.1. Then,
$$\Lambda^*=\big\{\lambda\in\mathbb{R}^{d\times d}\,\big|\,{-b^*_\lambda}=\mathcal{Q}^*(\lambda)=0\big\}$$
is a nonempty, compact, and row-convex set. By the previous proposition, there exists a minimal $\Lambda^*$-row-extreme set $\mathcal{K}\subset\Lambda^*$, and, for all $u_0\in\mathbb{R}^d$,
$$\mathcal{Q}u_0=\max_{\lambda\in\mathcal{K}}\lambda u_0,$$
where the maximum is to be understood componentwise. Since $b^*=(b^*_\lambda)_{\lambda\in\Lambda^*}=0$, it follows that $(\mathcal{K},0)$ is a dual representation as in Theorem 2.5(iii). Notice that, in many cases, the cardinality of $\mathcal{K}$ is much smaller than the cardinality of $\Lambda^*$. Therefore, concerning computational aspects, the dual representation $(\mathcal{K},0)$ is often far more tractable than the dual representation $(\Lambda^*,0)$, and, by Theorem 2.5, both representations result in the same semigroup envelope, and thus the same solution to the ODE (3).
Example 3.4. Let $\lambda_0,\mu\in\mathbb{R}^{d\times d}$ be two fixed Q-matrices and $c_\ell,c_u\in\mathbb{R}$ with $c_\ell\le c_u$. We define the sublinear Q-operator $\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d$ by
$$\mathcal{Q}u_0:=\lambda_0 u_0+\max_{c\in[c_\ell,c_u]}c\mu u_0\quad\text{for all } u_0\in\mathbb{R}^d.$$
We consider the maximal row-convex set $\Lambda^*\subset\mathbb{R}^{d\times d}$ representing $\mathcal{Q}$, defined as in Remark 3.1. Then,
$$\Lambda^*=\big\{\lambda_0+c\mu\,\big|\,c\in\mathrm{diag}([c_\ell,c_u])\big\},$$
where $\mathrm{diag}([c_\ell,c_u])$ denotes the set of all diagonal matrices $c\in\mathbb{R}^{d\times d}$ with diagonal entries $c_{ii}\in[c_\ell,c_u]$ for all $i\in\{1,\dots,d\}$. Now, let
$$\mathcal{K}:=\{\lambda_0+c_\ell\mu,\ \lambda_0+c_u\mu\}.$$
Then, $\mathcal{K}$ is a minimal $\Lambda^*$-row-extreme set, and thus $\Lambda^*=\mathrm{rch}(\mathcal{K})$. In particular, by the previous remark, the tuple
$$\big(\{\lambda_0+c_\ell\mu,\ \lambda_0+c_u\mu\},\ (0,0)\big)$$
is a dual representation as in Theorem 2.5(iii), which is far more tractable than the dual representation $(\Lambda^*,0)$.
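Numerically, the reduction to the two corner matrices means that evaluating $\mathcal{Q}$ requires only two matrix-vector products. A minimal sketch (our own, with hypothetical data for $\lambda_0$, $\mu$, $c_\ell$, $c_u$):

```python
import numpy as np

# Hypothetical data: lambda0 and mu are Q-matrices; c ranges over [c_lo, c_up].
lambda0 = np.array([[-1.0, 1.0], [1.0, -1.0]])
mu = np.array([[-0.5, 0.5], [0.5, -0.5]])
c_lo, c_up = 0.0, 1.0

corners = [lambda0 + c_lo * mu, lambda0 + c_up * mu]  # the minimal row-extreme set K

def Q_operator(u0):
    """Evaluate Q u0 as the componentwise maximum over the two corner matrices."""
    return np.max([q @ u0 for q in corners], axis=0)

u0 = np.array([1.0, -1.0])
# Agrees with lambda0 @ u0 + max_{c in [c_lo, c_up]} c * (mu @ u0), componentwise.
print(Q_operator(u0))
```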
4 PROOF OF (iii) ⇒ (iv) ⇒ (v)
Throughout, let $\Lambda\subset\mathbb{R}^{d\times d}$ be a set of Q-matrices and $b=(b_\lambda)_{\lambda\in\Lambda}\subset\mathbb{R}^d$ with $b_\lambda\le 0$ for all $\lambda\in\Lambda$ and $b_{\lambda_0}=0$ for some $\lambda_0\in\Lambda$, such that the map
$$\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d,\quad u\mapsto\sup_{\lambda\in\Lambda}\big(\lambda u+b_\lambda\big)$$
is well-defined. For every $\lambda\in\Lambda$, we consider the linear ODE
$$u'(t)=\lambda u(t)+b_\lambda,\quad t\ge 0,\tag{7}$$
with $u(0)=u_0\in\mathbb{R}^d$. Then, by variation of constants, the solution to (7) is given by
$$u(t)=e^{t\lambda}u_0+\int_0^t e^{s\lambda}b_\lambda\,\mathrm{d}s=u_0+\int_0^t e^{s\lambda}\big(\lambda u_0+b_\lambda\big)\,\mathrm{d}s=:S_\lambda(t)u_0\tag{8}$$
for $t\ge 0$, where $e^{t\lambda}\in\mathbb{R}^{d\times d}$ is the matrix exponential of $t\lambda$ for all $t\ge 0$. Then, the family $S_\lambda=(S_\lambda(t))_{t\ge 0}$ defines a uniformly continuous semigroup of affine linear operators (see Definition 2.3).
Remark 4.1. Note that, for all $\lambda\in\Lambda$ and $t\ge 0$, the matrix exponential $e^{t\lambda}\in\mathbb{R}^{d\times d}$ is a stochastic matrix, that is,
(i) $(e^{t\lambda})_{ij}\ge 0$ for all $i,j\in\{1,\dots,d\}$,
(ii) $e^{t\lambda}1=1$.
Therefore, $e^{t\lambda}\in\mathbb{R}^{d\times d}$ is a linear kernel, that is, $e^{t\lambda}u_0\le e^{t\lambda}v_0$ for all $u_0,v_0\in\mathbb{R}^d$ with $u_0\le v_0$ and $e^{t\lambda}\alpha=\alpha$ for all $\alpha\in\mathbb{R}$, which implies that $S_\lambda(t)$ is monotone for all $\lambda\in\Lambda$ and $t\ge 0$.
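For numerical purposes, $S_\lambda(t)u_0$ in (8) can be evaluated with a single matrix exponential of an augmented matrix: the top-right block of the exponential of $t\begin{pmatrix}\lambda & b_\lambda\\ 0 & 0\end{pmatrix}$ equals $\int_0^t e^{s\lambda}b_\lambda\,\mathrm{d}s$. A minimal sketch of this standard trick (our own illustration, not part of the paper):

```python
import numpy as np
from scipy.linalg import expm

def affine_semigroup(q, b, t, u0):
    """Evaluate S_lambda(t) u0 = e^{tq} u0 + int_0^t e^{sq} b ds via one
    exponential of the augmented matrix [[q, b], [0, 0]]."""
    d = q.shape[0]
    M = np.zeros((d + 1, d + 1))
    M[:d, :d] = q
    M[:d, d] = b
    E = expm(t * M)
    # E[:d, :d] equals e^{tq} and E[:d, d] equals int_0^t e^{sq} b ds
    return E[:d, :d] @ u0 + E[:d, d]
```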
For the family $(S_\lambda)_{\lambda\in\Lambda}$ or, more precisely, for $(\Lambda,b)$, we will now construct the Nisio semigroup, and show that it gives rise to the unique classical solution to the nonlinear ODE (3). To this end, we consider the set of finite partitions
$$P:=\big\{\pi\subset[0,\infty)\,\big|\,0\in\pi,\ |\pi|<\infty\big\}.$$
The set of partitions with end point $t\ge 0$ will be denoted by $P_t$, that is, $P_t:=\{\pi\in P\,|\,\max\pi=t\}$. Notice that
$$P=\bigcup_{t\ge 0}P_t.$$
For all $h\ge 0$ and $u_0\in\mathbb{R}^d$, we define
$$\mathcal{E}_h u_0:=\sup_{\lambda\in\Lambda}S_\lambda(h)u_0,$$
where the supremum is taken componentwise. Note that $\mathcal{E}_h$ is well-defined since
$$S_\lambda(h)u_0=e^{h\lambda}u_0+\int_0^h e^{s\lambda}b_\lambda\,\mathrm{d}s\le e^{h\lambda}u_0\le\|u_0\|_\infty$$
for all $\lambda\in\Lambda$, $h\ge 0$ and $u_0\in\mathbb{R}^d$, where we used the fact that $e^{h\lambda}$ is a kernel. Moreover, $\mathcal{E}_h$ is a convex kernel, for all $h\ge 0$, as it is monotone and
$$\mathcal{E}_h\alpha=\alpha+\sup_{\lambda\in\Lambda}\int_0^h e^{s\lambda}b_\lambda\,\mathrm{d}s=\alpha$$
for all $\alpha\in\mathbb{R}$, where we used the fact that there is some $\lambda_0\in\Lambda$ with $b_{\lambda_0}=0$. For a partition $\pi=\{t_0,t_1,\dots,t_n\}\in P$ with $n\in\mathbb{N}$ and $0=t_0<t_1<\dots<t_n$, we set
$$\mathcal{E}_\pi:=\mathcal{E}_{t_1-t_0}\circ\dots\circ\mathcal{E}_{t_n-t_{n-1}}.$$
Moreover, we set $\mathcal{E}_{\{0\}}:=\mathcal{E}_0$. Then, $\mathcal{E}_\pi$ is a convex kernel for all $\pi\in P$ since it is a concatenation of convex kernels.
Definition 4.2. The Nisio semigroup $S=(S(t))_{t\ge 0}$ of $(\Lambda,b)$ is defined by
$$S(t)u_0:=\sup_{\pi\in P_t}\mathcal{E}_\pi u_0\quad\text{for all } u_0\in\mathbb{R}^d\text{ and } t\ge 0.$$
Notice that $S(t)\colon\mathbb{R}^d\to\mathbb{R}^d$ is well-defined and a convex kernel for all $t\ge 0$ since $\mathcal{E}_\pi$ is a convex kernel for all $\pi\in P$. In many of the subsequent proofs, we will first concentrate on the case where the family $b$ is bounded, and then use an approximation of the Nisio semigroup by means of other Nisio semigroups. This approximation procedure is specified in the following remark.
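In practice, $S(t)u_0$ can be approximated by evaluating $\mathcal{E}_\pi u_0$ along uniform partitions $\pi=\{0,t/n,\dots,t\}$ and refining the partition. A minimal sketch (our own illustration; it reuses the affine_semigroup helper from the sketch after Remark 4.1 and assumes finitely many pairs $(\lambda,b_\lambda)$):

```python
import numpy as np
# reuses affine_semigroup(q, b, t, u0) from the sketch after Remark 4.1

def one_step_envelope(h, u0, Qs, bs):
    """E_h u0 = sup_lambda S_lambda(h) u0, taken componentwise."""
    return np.max([affine_semigroup(q, b, h, u0) for q, b in zip(Qs, bs)], axis=0)

def nisio_approx(t, u0, Qs, bs, n):
    """E_pi u0 for the uniform partition pi = {0, t/n, ..., t}; finer
    partitions (e.g., doubling n) improve the approximation of S(t) u0."""
    u = np.asarray(u0, dtype=float)
    for _ in range(n):
        u = one_step_envelope(t / n, u, Qs, bs)
    return u
```

By Theorem 2.5, these values converge, as the partition is refined, to the unique solution of the nonlinear ODE (3), which offers an alternative to Runge–Kutta schemes.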
Remark 4.3. Let $c\ge 0$, $\Lambda_c:=\{\lambda\in\Lambda\,|\,\|b_\lambda\|_\infty\le c\}$ and $b^c:=(b_\lambda)_{\lambda\in\Lambda_c}$. Notice that, by assumption, there exists some $\lambda_0\in\Lambda$ with $b_{\lambda_0}=0$, which implies that $\lambda_0\in\Lambda_c$. Since $\Lambda_c\subset\Lambda$ (and by definition of $b^c$), the operator
$$\mathcal{Q}_c\colon\mathbb{R}^d\to\mathbb{R}^d,\quad v\mapsto\sup_{\lambda\in\Lambda_c}\big(\lambda v+b_\lambda\big)$$
is well-defined. Let $S^c$ be the Nisio semigroup w.r.t. $(\Lambda_c,b^c)$ for all $c\ge 0$. Since
$$\bigcup_{c\ge 0}\Lambda_c=\Lambda,$$
it follows that $\mathcal{Q}_c\nearrow\mathcal{Q}$ and $S^c(t)\nearrow S(t)$, for all $t\ge 0$, as $c\to\infty$. Moreover, for all $\lambda\in\Lambda_c$, $u_0\in\mathbb{R}^d$ with $\|u_0\|_\infty=1$, and $i\in\{1,\dots,d\}$,
$$(\lambda u_0)_i\le\big(\mathcal{Q}u_0-b_\lambda\big)_i\le\|\mathcal{Q}u_0\|_\infty+\|b_\lambda\|_\infty\le c+\max_{v\in\mathbb{S}^{d-1}}\|\mathcal{Q}v\|_\infty,$$
where $\mathbb{S}^{d-1}:=\{v\in\mathbb{R}^d\,|\,\|v\|_\infty=1\}$ and, in the last step, we used the fact that $\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d$ is convex and therefore continuous. This implies that the set $\Lambda_c$ is bounded in the sense that $\sup_{\lambda\in\Lambda_c}\|\lambda\|<\infty$. In particular,
$$\sup_{\lambda\in\Lambda_c}\|\lambda u_0+b_\lambda\|_\infty\le\sup_{\lambda\in\Lambda_c}\big(\|\lambda\|\,\|u_0\|_\infty+\|b_\lambda\|_\infty\big)\le c+\sup_{\lambda\in\Lambda_c}\|\lambda\|\,\|u_0\|_\infty<\infty\tag{9}$$
for all $u_0\in\mathbb{R}^d$.
Lemma 4.4. Assume that the family $b$ is bounded, that is, $(\Lambda,b)=(\Lambda_c,b^c)$ for some $c\ge 0$. Then, for all $u_0\in\mathbb{R}^d$, the mapping $[0,\infty)\to\mathbb{R}^d$, $h\mapsto\mathcal{E}_h u_0$ is Lipschitz continuous.
Proof. Let $u_0\in\mathbb{R}^d$ and $0\le h_1<h_2$. Then, by (8), for all $\lambda\in\Lambda$, we have that
$$\|S_\lambda(h_2)u_0-S_\lambda(h_1)u_0\|_\infty\le\int_{h_1}^{h_2}\big\|e^{s\lambda}\big(\lambda u_0+b_\lambda\big)\big\|_\infty\,\mathrm{d}s\le(h_2-h_1)\,\|\lambda u_0+b_\lambda\|_\infty,$$
which implies that
$$\|\mathcal{E}_{h_2}u_0-\mathcal{E}_{h_1}u_0\|_\infty\le\sup_{\lambda\in\Lambda}\|S_\lambda(h_2)u_0-S_\lambda(h_1)u_0\|_\infty\le(h_2-h_1)\Big(\sup_{\lambda\in\Lambda}\|\lambda u_0+b_\lambda\|_\infty\Big).\tag{10}$$
Note that $\sup_{\lambda\in\Lambda}\|\lambda u_0+b_\lambda\|_\infty<\infty$ by (9). □