DOI: 10.1111/mafi.12289
ORIGINAL ARTICLE
Markov chains under nonlinear expectation
Max Nendel
Center for Mathematical Economics, Bielefeld University, Bielefeld, Germany
Correspondence
Max Nendel, Center for Mathematical Economics, Bielefeld University, 33615 Bielefeld, Germany.
Email: Max.Nendel@uni-bielefeld.de
Funding information
Deutsche Forschungsgemeinschaft, Grant/Award Number: CRC 1283
Abstract
In this paper, we consider continuous-time Markov chains with a finite state space under nonlinear expectations. We define so-called Q-operators as an extension of Q-matrices or rate matrices to a nonlinear setup, where the nonlinearity is due to model uncertainty. The main result gives a full characterization of convex Q-operators in terms of a positive maximum principle, a dual representation by means of Q-matrices, time-homogeneous Markov chains under convex expectations, and a class of nonlinear ordinary differential equations. This extends a classical characterization of generators of Markov chains to the case of model uncertainty in the generator. We further derive an explicit primal and dual representation of convex semigroups arising from Markov chains under convex expectations via the Fenchel–Legendre transformation of the generator. We illustrate the results with several numerical examples, where we compute price bounds for European contingent claims under model uncertainty in terms of the rate matrix.
K E Y W O R D S
generator of nonlinear semigroup, imprecise Markov chain, model uncertainty, nonlinear expectation, nonlinear ODE
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2020 The Authors. Mathematical Finance published by Wiley Periodicals LLC
1 INTRODUCTION AND MAIN RESULT
In mathematical finance, model uncertainty or ambiguity is an almost omnipresent phenomenon, which, for example, appears due to incomplete information about certain aspects of an underlying asset or insufficient data in order to perform reliable statistical estimation methods for the parameters of a stochastic process. The latter typically leads to so-called parameter uncertainty in the generator of a stochastic process. Prominent examples for this type of uncertainty include a Black–Scholes model with uncertain volatility, the so-called uncertain volatility model, cf. Avellaneda, Levy, and Parás (1995), Avellaneda and Parás (1996), and Vorbrink (2014), and a Brownian motion under drift or volatility uncertainty, leading to the g-framework, see, for example, Coquet, Hu, Mémin, and Peng (2002), or the G-framework by Peng (2007) and Peng (2008), respectively. Lately, these approaches have been generalized to Lévy processes with uncertainty in the Lévy triplet, cf. Denk, Kupper, and Nendel (2020), Hu and Peng (2009), and Neufeld and Nutz (2017), and uncertainty in the generator of Feller processes, cf. Nendel and Röckner (2019). While these works give sufficient conditions in order to guarantee the existence of stochastic processes under model uncertainty and to establish a connection to nonlinear partial differential equations, there is no necessary condition that determines the maximal degree of ambiguity that can be captured by an uncertain process.
In the present paper, we address this issue in a simplified setup, where we consider a finite state space. We provide sufficient and necessary conditions in terms of the generators of time-homogeneous continuous-time Markov chains that guarantee the existence of a continuous-time Markov chain under a convex expectation. We further establish a one-to-one relation between the transition operators of convex Markov chains and a class of nonlinear ordinary differential equations. In particular, we extend a classical relation between Markov chains, rate matrices, and ordinary differential equations to the case of model uncertainty. The ordinary differential equation related to a convex Markov chain is a spatially discretized version of a Hamilton–Jacobi–Bellman equation, and the nonlinear transition operators are related, via a dual representation, to a control problem where, roughly speaking, "nature" tries to control the system into the worst possible scenario (see Remark 4.18). The explicit description of the transition operators gives rise to a numerical scheme, different from Runge–Kutta methods, for the computation of price bounds for European contingent claims under model uncertainty. We illustrate this method and other numerical methods in several examples, where we consider an underlying Markov chain, which is a discrete version (more precisely, its generator is a finite difference discretization of the generator) of a Brownian motion with uncertain drift, cf. Coquet et al. (2002), and uncertain volatility, cf. Peng (2007) and Peng (2008). The main tools we use in our analysis are convex duality, a semigroup-theoretic approach to control problems due to Nisio (1976/77), see also Denk et al. (2020) and Nendel and Röckner (2019), and a convex version of Kolmogorov's extension theorem due to Denk, Kupper, and Nendel (2018), which allows us to extend the expectation to functionals that depend on the whole path. Restricting the time parameter, in the present work, to the set of natural numbers leads to a discrete-time Markov chain in the sense of Denk et al. (2018, Example 5.3).
The concept we use to describe ambiguity is the notion of a nonlinear expectation introduced by Peng (2005). Nonlinear expectations closely relate to other concepts describing model uncertainty, such as backward stochastic differential equations (BSDEs), cf. Cohen (2012) and Coquet et al. (2002), and 2BSDEs, cf. Cheridito, Soner, Touzi, and Victoir (2007) and Denis, Hu, and Peng (2011). We refer to Pardoux and Peng (1992), Pardoux and Peng (1990), and El Karoui, Peng, and Quenez (1997) for a detailed study of BSDEs and their applications within the field of mathematical finance. If a nonlinear expectation $\mathcal{E}$ is sublinear, then $\rho(X) := \mathcal{E}(-X)$ defines a coherent monetary risk measure as introduced by Artzner, Delbaen, Eber, and Heath (1999), Delbaen (2000), and Delbaen (2002), see also Föllmer and Schied (2011) for an overview of monetary risk measures. Moreover, if $\mathcal{E}$ is a sublinear expectation, then $\mathcal{E}$ is a coherent upper prevision, cf. Walley (1991), and vice versa. There is a similar one-to-one relation between convex expectations, convex upper previsions, cf. Pelessoni and Vicig (2003) and Pelessoni and Vicig (2005), and convex risk measures, cf. Föllmer and Schied (2002) and Frittelli and Rosazza Gianin (2002). Further concepts, which are closely related to nonlinear expectations and describe model uncertainty, are Choquet capacities (see, e.g., Dellacherie & Meyer, 1978), game-theoretic probability by Vovk and Shafer (2014), and niveloids, see, for example, Cerreia-Vioglio, Maccheroni, Marinacci, and Rustichini (2014).
Our setup is inspired by Peng (2005), where Markov chains under nonlinear expectations are considered in an axiomatic way. However, the existence of stochastic processes under nonlinear expectations has only been considered in terms of finite-dimensional nonlinear marginal distributions, whereas completely path-dependent functionals could not be regarded. Markov chains under model uncertainty have been considered, among others, by Avellaneda and Buff (1999), De Cooman, Hermans, and Quaeghebeur (2009), Hartfiel (1998), and Škulj (2009). Avellaneda and Buff (1999) study a finite difference discretization of the uncertain volatility model leading to a Markov chain setting. Hartfiel (1998) considers so-called Markov set-chains in discrete time, using matrix intervals in order to describe model uncertainty in the transition matrices. Later, Škulj (2009) approached Markov chains under model uncertainty using Choquet capacities, which results in higher dimensional matrices on the power set, while De Cooman et al. (2009) considered imprecise Markov chains using an operator-theoretic approach with upper and lower expectations. In Denk et al. (2018, Example 5.3), Denk et al. describe model uncertainty in the transition matrix via a nonlinear transition operator, which, together with the results obtained in Denk et al. (2018), allows the construction of discrete-time Markov chains on the canonical path space. In continuous time, in particular, computational aspects of sublinear imprecise Markov chains have been studied, amongst others, by Krak, De Bock, and Siebes (2017) and Škulj (2015).
Another concept that is closely related to Markov chains under nonlinear expectations, as discussed in the present paper, is that of BSDEs on Markov chains by Cohen and Elliott (2008) and Cohen and Elliott (2010a), see also Cohen and Szpruch (2012), Cohen and Hu (2013), and Cohen and Elliott (2010b) for the discrete-time case. Here, a reference Markov chain $X = (X_t)_{t\ge 0}$ with generator $(A_t)_{t\ge 0}$ is fixed, and one considers BSDEs driven by $X$. This can be viewed as a discretization of the classical BSDE setup, where the state space is $\mathbb{R}$, the driving process is a Brownian motion, and the generator is $\frac{1}{2}\partial_{xx}$. Cohen and Szpruch (2012) show that Markovian solutions to BSDEs on Markov chains are related via their driver to a system
$$u'(t) = f(t, u(t)) + A(t)u(t)\quad\text{for all } t\ge 0,\qquad u(0) = u_0$$
of nonlinear ordinary differential equations with a nonlinear function $f$ that is assumed to be globally Lipschitz in the variable $u$. In the present paper, $f(t,u) = \mathcal{Q}u$ for a convex operator $\mathcal{Q}$. The biggest difference between our approach and the theory of BSDEs on Markov chains lies in the fact that we do not consider a fixed reference Markov chain that drives the model. On the other hand, our approach is restricted to considering Markovian solutions to BSDEs on Markov chains.
From a technical standpoint, further differences are that the theory of BSDEs allows for more generality in terms of nonlinearity of the driver, while we do not require global Lipschitz continuity of the generator, allowing for a possibly unbounded convex conjugate. Additionally, we only focus on the time-homogeneous case. However, regarding the existence of Markov chains under convex expectations and their connection to nonlinear ordinary differential equations (ODEs), this restriction could easily be overcome with a slight modification of the construction of the transition operators.
Dentcheva and Ruszczyński (2018) consider Markov risk measures for a countable state space, see also Fan and Ruszczyński (2018a), Fan and Ruszczyński (2018b), and Ruszczyński (2010) for the discrete-time case. Here, the focus lies on time-consistent risk measurement related to a fixed reference continuous-time Markov chain $X = (X_t)_{t\ge 0}$. Using so-called semiderivatives in the direction of the generator $A$, the authors derive, in the case of a coherent risk measure, a sublinear ordinary differential equation related to the risk measure, where the dual representation of the nonlinear generator depends on the generator $A$ of the baseline model $X$. Clearly, in the theory of Markov risk measures, the focus lies more on law-invariant risk measures, such as the average value at risk, and is therefore not directly comparable with our approach, where we explicitly avoid fixing a baseline model but rather try to capture very general forms of uncertainty in the generator. However, on a technical level, our approach also allows us to consider risk evaluations related to convex generators that do not depend on a fixed reference generator.
In view of the aforementioned existing literature on imprecise versions of Markov chains, the contribution of this paper can be summarized as follows (see Remark 2.6 for further details):
• We propose a framework describing Markov chains under model uncertainty in terms of the rate matrix. Our approach complements the existing literature on BSDEs on Markov chains and Markov risk measures, covering a different range of examples and applications in a consistent way. The key difference between our framework and the aforementioned existing approaches lies in the fact that we do not consider a fixed reference Markov chain describing the dynamics of an underlying asset. Moreover, our approach relies on analytic rather than stochastic methods, using distributional rather than pathwise properties, thus leading to restrictions in certain directions but advantages in others.
• We show that, as in the linear case, Markov chains under convex expectations with certain regularity at time 0 are linked via a one-to-one relation to certain convex functions (their generator) and to solutions to convex differential equations, which can be solved, for example, by using an explicit Euler method or any other Runge–Kutta method. In particular, we prove the global existence of solutions to a class of convex differential equations with unbounded convex conjugate, that is, without a global Lipschitz condition on the generator.
• We show that the transition semigroup of a convex Markov chain can be explicitly constructed using any (!) dual representation of the generator. In particular, for numerical computations, a "minimal" dual representation in terms of certain "corner points" can be used to solve the nonlinear Kolmogorov equation. Based on the explicit construction of the semigroup, we propose a novel algorithm for the numerical computation of solutions to a class of nonlinear ODEs. Moreover, we show that every convex transition semigroup is the least upper bound (in the sense of semigroups) of a family of linear transition semigroups, and vice versa.
• The convex expectations we consider are defined on the whole path space without fixing any reference measure. We show that the nonlinear expectation, although possibly undominated, always admits a dual representation in terms of countably additive probability measures. Moreover, we derive an explicit dual representation in terms of an optimal control problem, where nature tries to control the system into the worst possible scenario, giving a control-theoretic interpretation to Markov chains under convex expectations.
1.1 Structure of the paper
In Section 2, we fix the notation, introduce our setup and basic definitions, and state the main result (Theorem 2.5). In Section 3, we prove the first part of Theorem 2.5 (implications (v) ⇒ (ii) ⇒ (i) ⇒ (iii)). The main tool we use in this part is convex duality in $\mathbb{R}^d$. Moreover, we discuss how, in the sublinear case, computational efficiency can be improved by reducing compact and suitably convex sets of generator matrices to their "corner points." The effectiveness of this reduction is demonstrated in Section 5. In Section 4, we prove the remaining implications (iii) ⇒ (iv) ⇒ (v) of Theorem 2.5. Here, we use a combination of so-called Nisio semigroups, as introduced in Nisio (1976/77), the theory of ordinary differential equations, and a Kolmogorov-type extension theorem for convex expectations derived in Denk et al. (2018). We conclude this section by showing that the semigroup envelope admits a dual representation as a cost functional related to an optimal control problem. In Section 5, we use and compare two different numerical methods, based on the results from Sections 3 and 4, in order to compute price bounds for European contingent claims, where the underlying is a discrete version of a Brownian motion with drift uncertainty (g-framework) and volatility uncertainty (G-framework).
2 NOTATION, BASIC DEFINITIONS, AND MAIN RESULT
Given a measurable space $(\Omega,\mathcal{F})$, we denote the space of all bounded measurable functions $\Omega\to\mathbb{R}$ by $\mathcal{B}_b(\Omega,\mathcal{F})$. A nonlinear expectation is then a functional $\mathcal{E}\colon\mathcal{B}_b(\Omega,\mathcal{F})\to\mathbb{R}$, which satisfies
• $\mathcal{E}(X)\le\mathcal{E}(Y)$ whenever $X(\omega)\le Y(\omega)$ for all $\omega\in\Omega$,
• $\mathcal{E}(\alpha 1_\Omega)=\alpha$ for all $\alpha\in\mathbb{R}$.
If $\mathcal{E}$ is additionally convex, that is, for all $X,Y\in\mathcal{B}_b(\Omega,\mathcal{F})$ and $\lambda\in[0,1]$,
$$\mathcal{E}(\lambda X+(1-\lambda)Y)\le\lambda\,\mathcal{E}(X)+(1-\lambda)\,\mathcal{E}(Y),$$
we say that $\mathcal{E}$ is a convex expectation. It is well known (see, e.g., Denk et al., 2018 or Föllmer & Schied, 2011) that every convex expectation $\mathcal{E}$ admits a dual representation in terms of finitely additive probability measures. If $\mathcal{E}$, however, even admits a dual representation in terms of (countably additive) probability measures, we say that $(\Omega,\mathcal{F},\mathcal{E})$ is a convex expectation space. More precisely, we say that $(\Omega,\mathcal{F},\mathcal{E})$ is a convex expectation space if there exists a set $\mathcal{P}$ of probability measures on $(\Omega,\mathcal{F})$ and a family $(\alpha_{\mathbb{P}})_{\mathbb{P}\in\mathcal{P}}\subset[0,\infty)$ with $\inf_{\mathbb{P}\in\mathcal{P}}\alpha_{\mathbb{P}}=0$ such that
$$\mathcal{E}(X)=\sup_{\mathbb{P}\in\mathcal{P}}\big(\mathbb{E}_{\mathbb{P}}(X)-\alpha_{\mathbb{P}}\big)$$
for all $X\in\mathcal{B}_b(\Omega,\mathcal{F})$. Here, $\mathbb{E}_{\mathbb{P}}$ denotes the expectation w.r.t. a probability measure $\mathbb{P}$ on $(\Omega,\mathcal{F})$. If $\alpha_{\mathbb{P}}=0$ for all $\mathbb{P}\in\mathcal{P}$, we say that $(\Omega,\mathcal{F},\mathcal{E})$ is a sublinear expectation space. Here, the set $\mathcal{P}$ represents the set of all models that are relevant under the expectation $\mathcal{E}$. In the case of a sublinear expectation space, the functional $\mathcal{E}$ is the best case among all plausible models $\mathcal{P}$. In the case of a convex expectation space, the functional $\mathcal{E}$ is a weighted best case among all plausible models $\mathcal{P}$ with an additional penalization term $\alpha_{\mathbb{P}}$ for every $\mathbb{P}\in\mathcal{P}$. Intuitively, $\alpha_{\mathbb{P}}$ can be seen as a measure for how much importance we give to the prior $\mathbb{P}\in\mathcal{P}$ under the expectation $\mathcal{E}$. For example, a low penalization, that is, $\alpha_{\mathbb{P}}$ close or equal to 0, gives more importance to the model $\mathbb{P}\in\mathcal{P}$ than a high penalization.
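For a finite sample space and finitely many priors, the dual representation above can be evaluated directly. The following Python sketch (our own illustration; the function and variable names are not from the paper) computes $\mathcal{E}(X)=\sup_{\mathbb{P}\in\mathcal{P}}(\mathbb{E}_{\mathbb{P}}(X)-\alpha_{\mathbb{P}})$ for priors given as probability vectors:

```python
import numpy as np

def convex_expectation(X, priors, penalties):
    """Evaluate E(X) = max_P (E_P[X] - alpha_P) for finitely many priors.

    X         : payoff vector with one entry per state of a finite Omega
    priors    : list of probability vectors P on Omega
    penalties : list of penalties alpha_P >= 0, with minimum equal to 0
    """
    return max(P @ X - alpha for P, alpha in zip(priors, penalties))

# Two plausible models on a three-point Omega; the second model is penalized.
priors = [np.array([0.5, 0.3, 0.2]), np.array([0.1, 0.1, 0.8])]
penalties = [0.0, 0.05]
X = np.array([1.0, 0.0, -1.0])
print(convex_expectation(X, priors, penalties))
```

With all penalties equal to zero, the same function evaluates a sublinear expectation, that is, the best case over the plausible models.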
Throughout, we consider a finite nonempty state space $S$ with cardinality $d:=|S|\in\mathbb{N}$. We endow $S$ with the discrete topology $2^S$ and w.l.o.g. assume that $S=\{1,\dots,d\}$. The space of all bounded measurable functions $S\to\mathbb{R}$ can therefore be identified with $\mathbb{R}^d$ via
$$u=(u_1,\dots,u_d)^T\quad\text{with } u_i:=u(i)\ \text{ for all } i\in\{1,\dots,d\}.$$
Therefore, we denote bounded measurable functions $u$ as vectors of the form $u=(u_1,\dots,u_d)^T\in\mathbb{R}^d$, where $u_i$ represents the value of $u$ in the state $i\in\{1,\dots,d\}$. On $\mathbb{R}^d$, we consider the norm
$$\|u\|_\infty:=\max_{i=1,\dots,d}|u_i|=\max_{i\in\{1,\dots,d\}}|u(i)|$$
for a vector $u\in\mathbb{R}^d$. Moreover, for $\alpha\in\mathbb{R}$, the vector $\alpha\in\mathbb{R}^d$ denotes the constant vector $u\in\mathbb{R}^d$ with $u_i=\alpha$ for all $i\in\{1,\dots,d\}$. For an arbitrary matrix $q=(q_{ij})_{1\le i,j\le d}\in\mathbb{R}^{d\times d}$, we denote by $\|q\|$ the operator norm of $q\colon\mathbb{R}^d\to\mathbb{R}^d$ w.r.t. the norm $\|\cdot\|_\infty$, that is,
$$\|q\|=\sup_{v\in\mathbb{R}^d\setminus\{0\}}\frac{\|qv\|_\infty}{\|v\|_\infty}=\max_{i=1,\dots,d}\Bigg(\sum_{j=1}^d|q_{ij}|\Bigg).$$
Inequalities of vectors are always understood componentwise, that is, for $u,v\in\mathbb{R}^d$,
$$u\le v\iff u_i\le v_i\ \text{ for all } i\in\{1,\dots,d\}.$$
In the same way, all concepts in $\mathbb{R}^d$ that include inequalities are to be understood componentwise. For example, a vector field $F\colon\mathbb{R}^d\to\mathbb{R}^d$ is called convex if
$$F_i(\lambda u+(1-\lambda)v)\le\lambda F_i(u)+(1-\lambda)F_i(v)$$
for all $i\in\{1,\dots,d\}$, $u,v\in\mathbb{R}^d$ and $\lambda\in[0,1]$. A vector field $F$ is called sublinear if it is convex and positive homogeneous (of degree 1). Moreover, for a set $M\subset\mathbb{R}^d$ of vectors, we write $u=\sup M$ if $u_i=\sup_{v\in M}v_i$ for all $i\in\{1,\dots,d\}$, and $u=\max M$ if $u=\sup M$ and, for all $i\in\{1,\dots,d\}$, there exists some $v\in M$ with $u_i=v_i$.
In the following, we briefly recall the basic definitions and concepts from the theory of (time-homogeneous) Markov chains. A (time-homogeneous) Markov chain is a quadruple $(\Omega,\mathcal{F},(\mathbb{P}_1,\dots,\mathbb{P}_d),(X_t)_{t\ge 0})$, where:
(M1) $(\Omega,\mathcal{F})$ is a measurable space.
(M2) $X_t\colon\Omega\to\{1,\dots,d\}$ is $\mathcal{F}$-measurable for all $t\ge 0$.
(M3) $(\mathbb{P}_1,\dots,\mathbb{P}_d)$ is a collection of probability measures, where, for $i\in\{1,\dots,d\}$, $\mathbb{P}_i(X_0=i)=1$, that is, $\mathbb{P}_i$ denotes the probability distribution under which the Markov chain starts in the state $i$. Moreover, we use the notation
$$\mathbb{E}_i(Y):=\mathbb{E}_{\mathbb{P}_i}(Y)\quad\text{and}\quad\mathbb{E}(Y):=(\mathbb{E}_1(Y),\dots,\mathbb{E}_d(Y))^T$$
for $i\in\{1,\dots,d\}$ and all random variables $Y\colon\Omega\to\mathbb{R}$.
(M4) For all $s,t\ge 0$ and $i\in\{1,\dots,d\}$,
$$\mathbb{E}_i\big(u(X_{s+t})\,\big|\,\mathcal{F}_s\big)=\mathbb{E}_i\big(u(X_{t+s})\,\big|\,X_s\big)=\mathbb{E}_{X_s}\big(u(X_t)\big).$$
In particular, $\mathbb{E}_i(u(X_{t+s})\,|\,X_s=j)=\mathbb{E}_j(u(X_t))$ for all $i,j\in\{1,\dots,d\}$.
A matrix $q=(q_{ij})_{1\le i,j\le d}\in\mathbb{R}^{d\times d}$ is called a Q-matrix or rate matrix if it satisfies the following conditions:
(Q1) $q_{ii}\le 0$ for all $i\in\{1,\dots,d\}$,
(Q2) $q_{ij}\ge 0$ for all $i,j\in\{1,\dots,d\}$ with $i\ne j$,
(Q3) $\sum_{j=1}^d q_{ij}=0$ for all $i\in\{1,\dots,d\}$.
It is well known that every continuous-time Markov chain with certain regularity properties at time $t=0$ can be related to a Q-matrix and vice versa. More precisely, for a matrix $q\in\mathbb{R}^{d\times d}$, the following statements are equivalent:
(i) $q$ is a Q-matrix.
(ii) There is a Markov chain $(\Omega,\mathcal{F},(\mathbb{P}_1,\dots,\mathbb{P}_d),(X_t)_{t\ge 0})$ such that
$$qu_0=\lim_{h\downarrow 0}\frac{\mathbb{E}(u_0(X_h))-u_0}{h}$$
for all $u_0\in\mathbb{R}^d$, where $u_0(i)$ is the $i$th component of $u_0$ for $i\in\{1,\dots,d\}$.
In this case, for each vector $u_0\in\mathbb{R}^d$, the function $u\colon[0,\infty)\to\mathbb{R}^d$, $t\mapsto\mathbb{E}(u_0(X_t))$ is the unique classical solution $u\in C^1([0,\infty);\mathbb{R}^d)$ to the initial value problem
$$u'(t)=qu(t),\quad t\ge 0,\qquad u(0)=u_0,$$
that is, $u(t)=e^{tq}u_0$ for all $t\ge 0$, where $e^{tq}$ is the matrix exponential of $tq$. We refer to Norris (1998) for a detailed illustration of this relation.
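In the linear case, this solution can be computed directly from the matrix exponential. The following sketch (our own illustration, not from the paper) evaluates $\mathbb{E}(u_0(X_t))=e^{tq}u_0$ for a hypothetical Q-matrix on three states:

```python
import numpy as np
from scipy.linalg import expm

# A Q-matrix on {1, 2, 3}: nonpositive diagonal, nonnegative
# off-diagonal entries, and zero row sums, i.e., (Q1)-(Q3).
q = np.array([[-2.0,  1.0,  1.0],
              [ 0.5, -0.5,  0.0],
              [ 0.0,  2.0, -2.0]])

u0 = np.array([1.0, 0.0, 0.0])   # payoff u_0(i) = 1 if i = 1, else 0

t = 0.75
u_t = expm(t * q) @ u0           # u(t) = e^{tq} u_0, one entry per starting state
print(u_t)
```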
We say that a (possibly nonlinear) operator $\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d$ satisfies the positive maximum principle if, for every $u=(u_1,\dots,u_d)^T\in\mathbb{R}^d$ and $i\in\{1,\dots,d\}$,
$$(\mathcal{Q}u)_i\le 0\quad\text{whenever } u_i\ge u_j\ \text{ for all } j\in\{1,\dots,d\}.$$
This notion is motivated by the positive maximum principle for generators of Feller processes, see, for example, Jacob (2001, Equation (0.8)). Notice that a matrix $q\in\mathbb{R}^{d\times d}$ is a Q-matrix if and only if it satisfies the positive maximum principle and $q1=0$, where $1:=(1,\dots,1)^T\in\mathbb{R}^d$ denotes the constant 1 vector. In fact, Property (Q3) is just a reformulation of $q1=0$. Moreover, if $q$ satisfies the positive maximum principle, then $q_{ii}=(qe_i)_i\le 0$ for all $i\in\{1,\dots,d\}$ and $-q_{ij}=(q(-e_j))_i\le 0$ for all $i,j\in\{1,\dots,d\}$ with $i\ne j$, where $e_i$ denotes the $i$th standard unit vector of $\mathbb{R}^d$. That is, $q$ fulfills (Q1) and (Q2). On the other hand, if $q$ is a Q-matrix, $u=(u_1,\dots,u_d)^T\in\mathbb{R}^d$ and $i\in\{1,\dots,d\}$ with $u_i\ge u_j$ for all $j\in\{1,\dots,d\}$, then
$$(qu)_i=\sum_{j=1}^d q_{ij}u_j\le u_i\sum_{j=1}^d q_{ij}=0,$$
which shows that $q$ satisfies the positive maximum principle.
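Both characterizations are easy to test numerically. The sketch below (our own illustration) checks (Q1)–(Q3) directly and, alternatively, probes the positive maximum principle together with $q1=0$ on random test vectors:

```python
import numpy as np

def is_Q_matrix(q, tol=1e-12):
    """Check (Q1)-(Q3): nonpositive diagonal, nonnegative off-diagonal, zero row sums."""
    off_diag = q - np.diag(np.diag(q))
    return (np.all(np.diag(q) <= tol)
            and np.all(off_diag >= -tol)
            and np.allclose(q.sum(axis=1), 0.0, atol=tol))

def pmp_and_kills_constants(q, trials=1000, tol=1e-9):
    """Sampled test of the positive maximum principle plus q1 = 0."""
    d = q.shape[0]
    if not np.allclose(q @ np.ones(d), 0.0, atol=tol):
        return False
    rng = np.random.default_rng(0)
    for _ in range(trials):
        u = rng.normal(size=d)
        i = int(np.argmax(u))    # component where u attains its maximum
        if (q @ u)[i] > tol:     # (qu)_i must be <= 0 at a maximizing component
            return False
    return True
```

For a linear map, a finite random test can of course only falsify the positive maximum principle; the equivalence with (Q1)–(Q3) is the content of the argument above.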
To state the main result, we introduce the following definitions.
Definition 2.1. A (possibly nonlinear) map $\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d$ is called a Q-operator if the following conditions are satisfied:
(i) $(\mathcal{Q}(\beta e_i))_i\le 0$ for all $\beta>0$ and all $i\in\{1,\dots,d\}$,
(ii) $(\mathcal{Q}(-\beta e_j))_i\le 0$ for all $\beta>0$ and all $i,j\in\{1,\dots,d\}$ with $i\ne j$,
(iii) $\mathcal{Q}\alpha=0$ for all $\alpha\in\mathbb{R}$, where we identify $\alpha$ with $(\alpha,\dots,\alpha)^T\in\mathbb{R}^d$.
Definition 2.2. A convex Markov chain is a quadruple $(\Omega,\mathcal{F},\mathcal{E},(X_t)_{t\ge 0})$ that satisfies the following conditions:
(i) $(\Omega,\mathcal{F})$ is a measurable space.
(ii) $X_t\colon\Omega\to\{1,\dots,d\}$ is $\mathcal{F}$-measurable for all $t\ge 0$.
(iii) $\mathcal{E}=(\mathcal{E}_1,\dots,\mathcal{E}_d)^T$, where $(\Omega,\mathcal{F},\mathcal{E}_i)$ is a convex expectation space for all $i\in\{1,\dots,d\}$ and $\mathcal{E}(u_0(X_0))=u_0$ for all $u_0\in\mathbb{R}^d$. Here and in the following, we use the notation
$$\mathcal{E}(X):=(\mathcal{E}_1(X),\dots,\mathcal{E}_d(X))^T\in\mathbb{R}^d$$
for $X\in\mathcal{B}_b(\Omega,\mathcal{F})$.
(iv) The following version of the Markov property is satisfied: for all $s,t\ge 0$, $n\in\mathbb{N}$, $0\le t_1<\dots<t_n\le s$, and $v_0\colon\{1,\dots,d\}^{n+1}\to\mathbb{R}$,
$$\mathcal{E}\big(v_0(Y,X_{s+t})\big)=\mathcal{E}\Big[\mathcal{E}_{X_s,t}\big(v_0(Y,\,\cdot\,)\big)\Big],\tag{1}$$
where $Y:=(X_{t_1},\dots,X_{t_n})$ and $\mathcal{E}_{i,t}(u_0):=\mathcal{E}_i(u_0(X_t))$ for all $u_0\in\mathbb{R}^d$ and $i\in\{1,\dots,d\}$.
We say that the Markov chain $(\Omega,\mathcal{F},\mathcal{E},(X_t)_{t\ge 0})$ is linear or sublinear if the mapping $\mathcal{E}\colon\mathcal{B}_b(\Omega,\mathcal{F})\to\mathbb{R}^d$ is, additionally, linear or sublinear, respectively.
Notice that the properties (i)–(iii) in the previous definition are a one-to-one translation of (M1)–(M3) to a convex setup. The Markov property given in (iv) of the previous definition is the nonlinear analog of the classical Markov property (M4) without using conditional expectations. Due to the nonlinearity of the expectation, the definition and, in particular, the existence of a conditional (nonlinear) expectation are quite involved, which is why we avoid introducing this concept. In order to get the idea behind the formulation in (iv), choose $v_0(y,x):=u(x)1_B(y)$ for a measurable function $u\colon\{1,\dots,d\}\to\mathbb{R}$ and arbitrary $B\subset\{1,\dots,d\}^n$. Then, if $\mathcal{E}$ is linear, Equation (1) reads as
$$\mathcal{E}\big(u(X_{s+t})1_B(Y)\big)=\mathcal{E}\big(\mathcal{E}_{X_s,t}(u)1_B(Y)\big),$$
which is equivalent to (M4). On the other hand, for every linear Markov chain, Property (M4) implies Property (iv). Hence, in the linear case, Definition 2.2 is consistent with the classical definition of a Markov chain.
In line with Denk et al. (2018, Definition 5.1), we say that a (possibly nonlinear) map $\mathcal{E}\colon\mathbb{R}^d\to\mathbb{R}^d$ is a kernel if $\mathcal{E}$ is monotone, that is, $\mathcal{E}(u)\le\mathcal{E}(v)$ for all $u,v\in\mathbb{R}^d$ with $u\le v$, and $\mathcal{E}$ preserves constants, that is, $\mathcal{E}(\alpha)=\alpha$ for all $\alpha\in\mathbb{R}$.
Definition 2.3. A family $S=(S(t))_{t\ge 0}$ of (possibly nonlinear) operators $S(t)\colon\mathbb{R}^d\to\mathbb{R}^d$ is called a semigroup if
(i) $S(0)=I$, where $I=I_d$ is the $d$-dimensional identity matrix,
(ii) $S(s+t)=S(s)S(t)$ for all $s,t\ge 0$.
Here and throughout, we make use of the notation $S(s)S(t):=S(s)\circ S(t)$. If, additionally, $S(h)\to I$ uniformly on compact sets as $h\downarrow 0$, we say that the semigroup $S$ is uniformly continuous. We call $S$ Markovian if $S(t)$ is a kernel for all $t\ge 0$. We say that $S$ is linear, sublinear, or convex if $S(t)$ is linear, sublinear, or convex for all $t\ge 0$, respectively.
Definition 2.4. Let $\Lambda\subset\mathbb{R}^{d\times d}$ be a set of Q-matrices and $b=(b_\lambda)_{\lambda\in\Lambda}$ a family of vectors with $\sup_{\lambda\in\Lambda}b_\lambda=b_{\lambda_0}=0$ for some $\lambda_0\in\Lambda$, that is, $b_\lambda\le 0$ for all $\lambda\in\Lambda$ and there exists some $\lambda_0\in\Lambda$ with $b_{\lambda_0}=0$. We define
$$S_\lambda(t)u_0:=e^{t\lambda}u_0+\int_0^t e^{s\lambda}b_\lambda\,\mathrm{d}s=u_0+\int_0^t e^{s\lambda}\big(\lambda u_0+b_\lambda\big)\,\mathrm{d}s$$
for $t\ge 0$, $u_0\in\mathbb{R}^d$ and $\lambda\in\Lambda$. Then, $S_\lambda=(S_\lambda(t))_{t\ge 0}$ is an affine linear semigroup. We call a semigroup $S$ the (upper) semigroup envelope (later also Nisio semigroup) of $(\Lambda,b)$ if
(i) $S(t)u_0\ge S_\lambda(t)u_0$ for all $t\ge 0$, $u_0\in\mathbb{R}^d$ and $\lambda\in\Lambda$,
(ii) for any other semigroup $T$ satisfying (i), we have that $S(t)u_0\le T(t)u_0$ for all $t\ge 0$ and $u_0\in\mathbb{R}^d$.
That is, the semigroup envelope $S$ is the smallest semigroup that dominates all semigroups $(S_\lambda)_{\lambda\in\Lambda}$.
The following main theorem gives a full characterization of convex Q-operators.
Theorem 2.5. Let $\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d$ be a mapping. Then, the following statements are equivalent:
(i) $\mathcal{Q}$ is a convex Q-operator.
(ii) $\mathcal{Q}$ is convex, satisfies the positive maximum principle, and $\mathcal{Q}\alpha=0$ for all $\alpha\in\mathbb{R}$, where $\alpha:=(\alpha,\dots,\alpha)^T\in\mathbb{R}^d$.
(iii) There exists a set $\Lambda\subset\mathbb{R}^{d\times d}$ of Q-matrices and a family $b=(b_\lambda)_{\lambda\in\Lambda}\subset\mathbb{R}^d$ of vectors with $b_\lambda\le 0$ for all $\lambda\in\Lambda$ and $b_{\lambda_0}=0$ for some $\lambda_0\in\Lambda$, such that
$$\mathcal{Q}u_0=\sup_{\lambda\in\Lambda}\big(\lambda u_0+b_\lambda\big)\tag{2}$$
for all $u_0\in\mathbb{R}^d$, where the supremum is to be understood componentwise.
(iv) There exists a uniformly continuous convex Markovian semigroup $S$ with
$$\mathcal{Q}u_0=\lim_{h\downarrow 0}\frac{S(h)u_0-u_0}{h}$$
for all $u_0\in\mathbb{R}^d$.
(v) There is a convex Markov chain $(\Omega,\mathcal{F},\mathcal{E},(X_t)_{t\ge 0})$ such that
$$\mathcal{Q}u_0=\lim_{h\downarrow 0}\frac{\mathcal{E}(u_0(X_h))-u_0}{h}$$
for all $u_0\in\mathbb{R}^d$.
In this case, for each initial value $u_0\in\mathbb{R}^d$, the function $u\colon[0,\infty)\to\mathbb{R}^d$, $t\mapsto\mathcal{E}(u_0(X_t))$ is the unique classical solution $u\in C^1([0,\infty);\mathbb{R}^d)$ to the initial value problem
$$u'(t)=\mathcal{Q}u(t)=\sup_{\lambda\in\Lambda}\big(\lambda u(t)+b_\lambda\big),\quad t\ge 0,\qquad u(0)=u_0.\tag{3}$$
Moreover, the Markovian semigroup $S$ from (iv) is the (upper) semigroup envelope of $(\Lambda,b)$, and $u(t)=S(t)u_0$ for all $t\ge 0$.
Remark 2.6. Consider the situation of Theorem 2.5.
(a) The dual representation in (iii) gives a model uncertainty interpretation to Q-operators. The set $\Lambda$ can be seen as the set of all plausible rate matrices when considering the Q-operator $\mathcal{Q}$. For every $\lambda\in\Lambda$, the vector $b_\lambda\le 0$ can be interpreted as a penalization, which measures how much importance we give to each rate matrix $\lambda$. The requirement that there exists some $\lambda_0\in\Lambda$ with $b_{\lambda_0}=0$ can be interpreted in the following way: there exists at least one rate matrix $\lambda_0$ within the set of all plausible rate matrices $\Lambda$ to which we assign the maximal importance, which is the minimal penalization.
(b) The semigroup envelope $S$ of $(\Lambda,b)$ can be constructed more explicitly; in particular, an explicit (in terms of $(\Lambda,b)$) dual representation can be derived. For details, we refer to Section 4 (Definition 4.2 and Remark 4.18). Moreover, we would like to highlight that the semigroup envelope $S$ can be constructed w.r.t. any dual representation $(\Lambda,b)$ as in (iii) and results in the unique classical solution to (3), independent of the choice of the dual representation $(\Lambda,b)$ of $\mathcal{Q}$. This gives, in some cases, the opportunity to efficiently compute the semigroup envelope numerically via its primal/dual representation (see Remark 3.3 and Example 5.2).
(c) The same equivalence as in Theorem 2.5 holds if convexity is replaced by sublinearity in (i), (ii), (iv), and (v) and $b_\lambda=0$ for all $\lambda\in\Lambda$ in (iii). In this case, the set $\Lambda$ in (iii) can be chosen to be compact, as we will see in the proof of Theorem 2.5.
(d) Theorem 2.5 extends and includes the well-known relation between (linear) Markov chains, Q-matrices, and ordinary differential equations.
(e) A remarkable consequence of Theorem 2.5 is that every convex Markovian semigroup, which is differentiable at time $t=0$, is the semigroup envelope with respect to the Fenchel–Legendre transformation (or any other dual representation as in (iii)) of its generator, which is a convex Q-operator.
(f) Although $\mathcal{Q}$ has a possibly unbounded convex conjugate, the convex initial value problem
$$u'(t)=\mathcal{Q}u(t)\quad\text{for all } t\ge 0,\qquad u(0)=u_0,\tag{4}$$
has a unique global solution.
(g) Solutions to (4) remain bounded. Therefore, a Picard iteration or Runge–Kutta methods, such as the explicit Euler method, can be used for numerical computations, and the convergence rate (depending on the size of the initial value $u_0$) can be derived from the a priori estimate in Banach's fixed point theorem.
(h) As in the linear case, by solving the differential equation (4), one can (numerically) compute expressions of the form
$$u(t)=\mathcal{E}(u_0(X_t)).$$
We illustrate this computation procedure in Example 5.1.
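As a concrete illustration of (g) and (h), the following sketch (our own, with hypothetical input data) applies the explicit Euler method to the convex Kolmogorov equation (3) for finitely many rate matrices and penalties:

```python
import numpy as np

def solve_convex_kolmogorov(Qs, bs, u0, T, n_steps):
    """Explicit Euler scheme for u'(t) = max_lambda (lambda u(t) + b_lambda), u(0) = u0.

    Qs : list of Q-matrices (the plausible rate matrices in Lambda)
    bs : list of penalty vectors b_lambda <= 0, equal to 0 for at least one lambda
    """
    u = np.asarray(u0, dtype=float).copy()
    dt = T / n_steps
    for _ in range(n_steps):
        # componentwise maximum over the dual representation (2)
        Qu = np.max([q @ u + b for q, b in zip(Qs, bs)], axis=0)
        u += dt * Qu
    return u
```

Since the right-hand side is convex and locally Lipschitz, the scheme converges on compact time intervals in the usual first-order manner as the step size is refined.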
3 PROOF OF (v) ⇒ (ii) ⇒ (i) ⇒ (iii)
We say that a set $\Lambda\subset\mathbb{R}^{d\times d}$ of matrices is row-convex if, for any diagonal matrix $t\in\mathbb{R}^{d\times d}$ with $t_i:=t_{ii}\in[0,1]$ for all $i\in\{1,\dots,d\}$,
$$t\lambda+(I-t)\mu\in\Lambda\quad\text{for all }\lambda,\mu\in\Lambda,$$
where $I=I_d\in\mathbb{R}^{d\times d}$ is the $d$-dimensional identity matrix. Notice that, for all $i\in\{1,\dots,d\}$, the $i$th row of the matrix $t\lambda+(I-t)\mu$ is the convex combination of the $i$th rows of $\lambda$ and $\mu$ with weight $t_i$. Observe also that a set $\Lambda\subset\mathbb{R}^{d\times d}$ is row-convex if and only if it is convex and, for arbitrary $\lambda,\mu\in\Lambda$, the matrix that results from replacing the $i$th row of $\lambda$ by the $i$th row of $\mu$ is again an element of $\Lambda$. For example, the set of all Q-matrices is row-convex.
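The closure of the set of Q-matrices under row mixing is easy to verify numerically; the following sketch (our own illustration) mixes two random Q-matrices row by row and checks (Q1)–(Q3) for the result:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_q_matrix(d, rng):
    """Draw a Q-matrix: nonnegative off-diagonal entries, rows summing to zero."""
    q = rng.random((d, d))
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q

d = 3
lam, mu = random_q_matrix(d, rng), random_q_matrix(d, rng)
t = np.diag(rng.random(d))            # diagonal matrix with entries t_i in [0, 1]
mix = t @ lam + (np.eye(d) - t) @ mu  # the i-th row mixes the i-th rows of lam and mu

off_diag = mix - np.diag(np.diag(mix))
print(np.allclose(mix.sum(axis=1), 0.0),  # (Q3)
      np.all(np.diag(mix) <= 0),          # (Q1)
      np.all(off_diag >= 0))              # (Q2)
```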
Remark 3.1. Let $\mathcal{Q}$ be a convex Q-operator. For every matrix $\lambda\in\mathbb{R}^{d\times d}$, let
$$\mathcal{Q}^*(\lambda):=\sup_{u\in\mathbb{R}^d}\big(\lambda u-\mathcal{Q}(u)\big)\in[0,\infty]^d$$
be the conjugate function of $\mathcal{Q}$. Notice that $0\le\mathcal{Q}^*(\lambda)$ for all $\lambda\in\mathbb{R}^{d\times d}$, since $\mathcal{Q}(0)=0$. Let
$$\Lambda^*:=\big\{\lambda\in\mathbb{R}^{d\times d}\,\big|\,\mathcal{Q}^*(\lambda)\in[0,\infty)^d\big\}$$
and $b^*_\lambda:=-\mathcal{Q}^*(\lambda)$ for all $\lambda\in\Lambda^*$. Then, the following facts are well-known results from convex duality theory in $\mathbb{R}^d$.
(a) The set $\Lambda^*$ is row-convex, and the mapping $\Lambda^*\to\mathbb{R}^d$, $\lambda\mapsto\mathcal{Q}^*(\lambda)$ is lower semicontinuous.
(b) Let $c\ge 0$ and $\Lambda^*_c:=\{\lambda\in\mathbb{R}^{d\times d}\,|\,\mathcal{Q}^*(\lambda)\le c\}$. Then, $\Lambda^*_c\subset\mathbb{R}^{d\times d}$ is compact and row-convex. Therefore,
$$\mathcal{Q}_c\colon\mathbb{R}^d\to\mathbb{R}^d,\quad u\mapsto\max_{\lambda\in\Lambda^*_c}\big(\lambda u+b^*_\lambda\big)\tag{5}$$
defines a convex operator, which is Lipschitz continuous. Notice that the maximum in (5) is to be understood componentwise. However, for fixed $u_0\in\mathbb{R}^d$, the maximum can be attained, simultaneously in every component, by a single element of $\Lambda^*_c$, that is, for all $u_0\in\mathbb{R}^d$, there exists some $\lambda_0\in\Lambda^*_c$ with
$$\mathcal{Q}_c u_0=\lambda_0 u_0+b^*_{\lambda_0}.$$
This is due to the fact that $\Lambda^*_c$ is row-convex and that, for $\lambda\in\Lambda^*$, the $i$th component of the vector $b^*_\lambda$ only depends on the $i$th row of $\lambda$.
(c) Let $r\ge 0$. Then, there exists some $c\ge 0$ such that
$$\mathcal{Q}u_0=\max_{\lambda\in\Lambda^*_c}\big(\lambda u_0+b^*_\lambda\big)=\mathcal{Q}_c u_0$$
for all $u_0\in\mathbb{R}^d$ with $\|u_0\|_\infty\le r$. In particular, $\mathcal{Q}$ is locally Lipschitz continuous and
$$\mathcal{Q}u_0=\max_{\lambda\in\Lambda^*}\big(\lambda u_0+b^*_\lambda\big)\quad\text{for all } u_0\in\mathbb{R}^d,$$
where, for fixed $u_0\in\mathbb{R}^d$, the maximum can be attained, simultaneously in every component, by a single element of $\Lambda^*$. In particular, there exists some $\lambda_0\in\Lambda^*$ with $b^*_{\lambda_0}=\sup_{\lambda\in\Lambda^*}b^*_\lambda=\mathcal{Q}(0)=0$.
Proof of Theorem 2.5. (v) ⇒ (ii): As $\mathcal{E}_i$ is a convex expectation for all $i\in\{1,\dots,d\}$, it follows that the operator $\mathcal{Q}$ is convex with $\mathcal{Q}\alpha=0$ for all $\alpha\in\mathbb{R}$. Now, let $u_0\in\mathbb{R}^d$ and $i\in\{1,\dots,d\}$ with $u_{0,i}\ge u_{0,j}$ for all $j\in\{1,\dots,d\}$. Let $\alpha>0$ be such that
$$\|u_0+\alpha\|_\infty=(u_0+\alpha)_i=u_{0,i}+\alpha,$$
and define $v_0:=u_0+\alpha$. Then,
$$\mathcal{Q}v_0=\lim_{h\downarrow 0}\frac{\mathcal{E}(u_0(X_h)+\alpha)-v_0}{h}=\lim_{h\downarrow 0}\frac{\mathcal{E}(u_0(X_h))-u_0}{h}=\mathcal{Q}u_0.$$
Assume that $(\mathcal{Q}u_0)_i>0$. Then, there exists some $h>0$ such that
$$\mathcal{E}_i(v_0(X_h))-v_{0,i}>0.$$
Hence,
$$\|\mathcal{E}(v_0(X_h))\|_\infty\ge\mathcal{E}_i(v_0(X_h))>v_{0,i}=\|v_0\|_\infty,$$
which is a contradiction to
$$\|\mathcal{E}(v_0(X_h))\|_\infty\le\|v_0\|_\infty.$$
This shows that $\mathcal{Q}$ satisfies the positive maximum principle.
(ii) ⇒ (i): This follows directly from the positive maximum principle, considering the vectors $\beta e_i$ and $-\beta e_j$ for all $\beta>0$ and $i,j\in\{1,\dots,d\}$ with $i\ne j$.
(i) ⇒ (iii): Let $\mathcal{Q}$ be a convex Q-operator. Moreover, let $\Lambda^*$ and $b^*=(b^*_\lambda)_{\lambda\in\Lambda^*}$ be as in Remark 3.1. Then, by Remark 3.1(c), it only remains to show that every $\lambda\in\Lambda^*$ is a Q-matrix. To this end, fix an arbitrary $\lambda\in\Lambda^*$. Then, for all $\alpha\in\mathbb{R}$,
$$\lambda\alpha=\frac{1}{n}\,\lambda(n\alpha)\le\frac{1}{n}\big(\mathcal{Q}(n\alpha)+\mathcal{Q}^*(\lambda)\big)=\frac{1}{n}\,\mathcal{Q}^*(\lambda)\to 0\quad\text{as } n\to\infty.$$
Therefore, $\lambda\alpha\le 0$ for all $\alpha\in\mathbb{R}$. Since $\lambda$ is linear, it follows that $\lambda 1=0$. Now, let $i\in\{1,\dots,d\}$. Then, by definition of a Q-operator, we obtain that
$$\lambda_{ii}\le\frac{1}{n}\big(\mathcal{Q}(ne_i)+\mathcal{Q}^*(\lambda)\big)_i\le\frac{1}{n}\big(\mathcal{Q}^*(\lambda)\big)_i\to 0\quad\text{as } n\to\infty,$$
that is, $\lambda_{ii}\le 0$. Now, let $i,j\in\{1,\dots,d\}$ with $i\ne j$. Then, again by definition of a Q-operator, it follows that
$$-\lambda_{ij}\le\frac{1}{n}\big(\mathcal{Q}(-ne_j)+\mathcal{Q}^*(\lambda)\big)_i\le\frac{1}{n}\big(\mathcal{Q}^*(\lambda)\big)_i\to 0\quad\text{as } n\to\infty,$$
that is, $\lambda_{ij}\ge 0$. Therefore, $\lambda$ is a Q-matrix.
It remains to show the implications (iii) ⇒ (iv) ⇒ (v), which is done in the entire next section. □
Before we start with the proof of the remaining implications (iii) ⇒ (iv) ⇒ (v), we would like to point out how, in the sublinear case, the set $\Lambda^*$ of Q-matrices from Remark 3.1 can be reduced to certain "corner points." This can be done using the concept of row-convexity, introduced at the beginning of this section, together with Minkowski's theorem on extreme points of convex sets in $\mathbb{R}^d$. Let $\mathcal{K}\subset\mathbb{R}^{d\times d}$ be a nonempty set of matrices. Then, we define the row-convex hull of $\mathcal{K}$ by
$$\mathrm{rch}(\mathcal{K}):=\Bigg\{\sum_{k=1}^n t_k\lambda_k\;\Bigg|\;n\in\mathbb{N},\ t_1,\dots,t_n\in[0,\infty)^{d\times d}\text{ diagonal},\ \sum_{k=1}^n t_k=I,\ \lambda_1,\dots,\lambda_n\in\mathcal{K}\Bigg\}.$$
For a convex set $C\subset\mathbb{R}^d$, we denote the set of all extreme points of $C$ by $E(C)$. Recall that an extreme point of a convex set $C\subset\mathbb{R}^d$ is an element $x\in C$ such that $x=ty+(1-t)z$, for $t\in(0,1)$ and $y,z\in C$, implies that $x=y=z$. For a matrix $\lambda\in\mathbb{R}^{d\times d}$ and $i\in\{1,\dots,d\}$, we denote by
$$\lambda_i:=(\lambda_{i1},\dots,\lambda_{id})\in\mathbb{R}^d$$
the $i$th row of $\lambda$. Let $\Lambda\subset\mathbb{R}^{d\times d}$ be a nonempty compact row-convex set of matrices. Then, we say that a set $\mathcal{M}\subset\Lambda$ is $\Lambda$-row-extreme if
$$\{\lambda_i\,|\,\lambda\in\mathcal{M}\}=E\big(\{\lambda_i\,|\,\lambda\in\Lambda\}\big)\quad\text{for all } i\in\{1,\dots,d\}.$$
That is, the set of all $i$th rows of $\mathcal{M}$ is the set of all extreme points of the $i$th rows of $\Lambda$. We say that a set $\mathcal{K}\subset\Lambda$ is minimal $\Lambda$-row-extreme if $\mathcal{K}$ is row-extreme for $\Lambda$ and $\mathcal{M}\subset\mathcal{K}$ implies $\mathcal{M}=\mathcal{K}$ for any $\Lambda$-row-extreme set $\mathcal{M}\subset\Lambda$.
Proposition 3.2. Let $\Lambda\subset\mathbb{R}^{d\times d}$ be nonempty, compact, and row-convex. Then, there exists a minimal $\Lambda$-row-extreme set $\mathcal{K}\subset\Lambda$. Moreover, $\Lambda=\mathrm{rch}(\mathcal{M})$ is the row-convex hull of any (minimal) $\Lambda$-row-extreme set $\mathcal{M}\subset\Lambda$, and
$$\max_{\lambda\in\Lambda}\lambda u_0=\max_{\lambda\in\mathcal{M}}\lambda u_0\quad\text{for all } u_0\in\mathbb{R}^d,\tag{6}$$
where the maxima are to be understood componentwise.
Proof. By Minkowski's theorem, the set of all $\Lambda$-row-extreme sets is nonempty, and one readily verifies that the latter, together with the partial order $\preceq$ given by $\mathcal{M}_1\preceq\mathcal{M}_2$ if and only if $\mathcal{M}_1\supset\mathcal{M}_2$, has the chain property. Hence, by Zorn's lemma, there exists a maximal element $\mathcal{K}$ within the set of all $\Lambda$-row-extreme sets, which, by definition, is a minimal $\Lambda$-row-extreme set. Now, let $\mathcal{M}$ be an arbitrary $\Lambda$-row-extreme set and $u_0\in\mathbb{R}^d$. Then,
$$\max_{\lambda\in\Lambda}(\lambda u_0)_i=\max_{\lambda\in\Lambda}(\lambda_i\cdot u_0)=\max_{\lambda\in\mathcal{M}}(\lambda_i\cdot u_0)=\max_{\lambda\in\mathcal{M}}(\lambda u_0)_i.\qquad\Box$$
Remark 3.3. Let $\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d$ be a sublinear Q-operator, and $\Lambda^*$ as in Remark 3.1. Then,
$$\Lambda^*=\big\{\lambda\in\mathbb{R}^{d\times d}\,\big|\,{-b^*_\lambda}=\mathcal{Q}^*(\lambda)=0\big\}$$
is a nonempty, compact, and row-convex set. By the previous proposition, there exists a minimal $\Lambda^*$-row-extreme set $\mathcal{K}\subset\Lambda^*$, and, for all $u_0\in\mathbb{R}^d$,
$$\mathcal{Q}u_0=\max_{\lambda\in\mathcal{K}}\lambda u_0,$$
where the maximum is to be understood componentwise. Since $b^*=(b^*_\lambda)_{\lambda\in\Lambda^*}=0$, it follows that $(\mathcal{K},0)$ is a dual representation as in Theorem 2.5(iii). Notice that, in many cases, the cardinality of $\mathcal{K}$ is much smaller than the cardinality of $\Lambda^*$. Therefore, concerning computational aspects, the dual representation $(\mathcal{K},0)$ is often far more tractable than the dual representation $(\Lambda^*,0)$, and, by Theorem 2.5, both representations result in the same semigroup envelope, and thus the same solution to the ODE (3).
Example 3.4. Let $\lambda_0,\mu\in\mathbb{R}^{d\times d}$ be two fixed Q-matrices and $c_\ell,c_u\in\mathbb{R}$ with $c_\ell\le c_u$. We define the sublinear Q-operator $\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d$ by
$$\mathcal{Q}u_0:=\lambda_0 u_0+\max_{c\in[c_\ell,c_u]}c\mu u_0\quad\text{for all } u_0\in\mathbb{R}^d.$$
We consider the maximal row-convex set $\Lambda^*\subset\mathbb{R}^{d\times d}$ representing $\mathcal{Q}$, defined as in Remark 3.1. Then,
$$\Lambda^*=\big\{\lambda_0+c\mu\,\big|\,c\in\mathrm{diag}([c_\ell,c_u])\big\},$$
where $\mathrm{diag}([c_\ell,c_u])$ denotes the set of all diagonal matrices $c\in\mathbb{R}^{d\times d}$ with diagonal entries $c_{ii}\in[c_\ell,c_u]$ for all $i\in\{1,\dots,d\}$. Now, let
$$\mathcal{K}:=\{\lambda_0+c_\ell\mu,\ \lambda_0+c_u\mu\}.$$
Then, $\mathcal{K}$ is a minimal $\Lambda^*$-row-extreme set, and thus $\Lambda^*=\mathrm{rch}(\mathcal{K})$. In particular, by the previous remark, the tuple
$$\big(\{\lambda_0+c_\ell\mu,\ \lambda_0+c_u\mu\},\ (0,0)\big)$$
is a dual representation as in Theorem 2.5(iii), which is far more tractable than the dual representation $(\Lambda^*,0)$.
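Numerically, the reduction to the two corner matrices means that evaluating $\mathcal{Q}$ requires only two matrix-vector products. A minimal sketch (our own, with hypothetical data for $\lambda_0$, $\mu$, $c_\ell$, $c_u$):

```python
import numpy as np

# Hypothetical data: lambda0 and mu are Q-matrices; c ranges over [c_lo, c_up].
lambda0 = np.array([[-1.0, 1.0], [1.0, -1.0]])
mu = np.array([[-0.5, 0.5], [0.5, -0.5]])
c_lo, c_up = 0.0, 1.0

corners = [lambda0 + c_lo * mu, lambda0 + c_up * mu]  # the minimal row-extreme set K

def Q_operator(u0):
    """Evaluate Q u0 as the componentwise maximum over the two corner matrices."""
    return np.max([q @ u0 for q in corners], axis=0)

u0 = np.array([1.0, -1.0])
# Agrees with lambda0 @ u0 + max_{c in [c_lo, c_up]} c * (mu @ u0), componentwise.
print(Q_operator(u0))
```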
4 PROOF OF (iii) ⇒ (iv) ⇒ (v)
Throughout, let $\Lambda\subset\mathbb{R}^{d\times d}$ be a set of Q-matrices and $b=(b_\lambda)_{\lambda\in\Lambda}\subset\mathbb{R}^d$ with $b_\lambda\le 0$ for all $\lambda\in\Lambda$ and $b_{\lambda_0}=0$ for some $\lambda_0\in\Lambda$, such that the map
$$\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d,\quad u\mapsto\sup_{\lambda\in\Lambda}\big(\lambda u+b_\lambda\big)$$
is well-defined. For every $\lambda\in\Lambda$, we consider the linear ODE
$$u'(t)=\lambda u(t)+b_\lambda,\quad t\ge 0,\tag{7}$$
with $u(0)=u_0\in\mathbb{R}^d$. Then, by variation of constants, the solution to (7) is given by
$$u(t)=e^{t\lambda}u_0+\int_0^t e^{s\lambda}b_\lambda\,\mathrm{d}s=u_0+\int_0^t e^{s\lambda}\big(\lambda u_0+b_\lambda\big)\,\mathrm{d}s=:S_\lambda(t)u_0\tag{8}$$
for $t\ge 0$, where $e^{t\lambda}\in\mathbb{R}^{d\times d}$ is the matrix exponential of $t\lambda$ for all $t\ge 0$. Then, the family $S_\lambda=(S_\lambda(t))_{t\ge 0}$ defines a uniformly continuous semigroup of affine linear operators (see Definition 2.3).
Remark 4.1. Note that, for all $\lambda\in\Lambda$ and $t\ge 0$, the matrix exponential $e^{t\lambda}\in\mathbb{R}^{d\times d}$ is a stochastic matrix, that is,
(i) $(e^{t\lambda})_{ij}\ge 0$ for all $i,j\in\{1,\dots,d\}$,
(ii) $e^{t\lambda}1=1$.
Therefore, $e^{t\lambda}\in\mathbb{R}^{d\times d}$ is a linear kernel, that is, $e^{t\lambda}u_0\le e^{t\lambda}v_0$ for all $u_0,v_0\in\mathbb{R}^d$ with $u_0\le v_0$ and $e^{t\lambda}\alpha=\alpha$ for all $\alpha\in\mathbb{R}$, which implies that $S_\lambda(t)$ is monotone for all $\lambda\in\Lambda$ and $t\ge 0$.
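For numerical purposes, $S_\lambda(t)u_0$ in (8) can be evaluated with a single matrix exponential of an augmented matrix: the top-right block of the exponential of $t\begin{pmatrix}\lambda & b_\lambda\\ 0 & 0\end{pmatrix}$ equals $\int_0^t e^{s\lambda}b_\lambda\,\mathrm{d}s$. A minimal sketch of this standard trick (our own illustration, not part of the paper):

```python
import numpy as np
from scipy.linalg import expm

def affine_semigroup(q, b, t, u0):
    """Evaluate S_lambda(t) u0 = e^{tq} u0 + int_0^t e^{sq} b ds via one
    exponential of the augmented matrix [[q, b], [0, 0]]."""
    d = q.shape[0]
    M = np.zeros((d + 1, d + 1))
    M[:d, :d] = q
    M[:d, d] = b
    E = expm(t * M)
    # E[:d, :d] equals e^{tq} and E[:d, d] equals int_0^t e^{sq} b ds
    return E[:d, :d] @ u0 + E[:d, d]
```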
For the family $(S_\lambda)_{\lambda\in\Lambda}$ or, more precisely, for $(\Lambda,b)$, we will now construct the Nisio semigroup, and show that it gives rise to the unique classical solution to the nonlinear ODE (3). To this end, we consider the set of finite partitions
$$P:=\big\{\pi\subset[0,\infty)\,\big|\,0\in\pi,\ |\pi|<\infty\big\}.$$
The set of partitions with end point $t\ge 0$ will be denoted by $P_t$, that is, $P_t:=\{\pi\in P\,|\,\max\pi=t\}$. Notice that
$$P=\bigcup_{t\ge 0}P_t.$$
For all $h\ge 0$ and $u_0\in\mathbb{R}^d$, we define
$$\mathcal{E}_h u_0:=\sup_{\lambda\in\Lambda}S_\lambda(h)u_0,$$
where the supremum is taken componentwise. Note that $\mathcal{E}_h$ is well-defined since
$$S_\lambda(h)u_0=e^{h\lambda}u_0+\int_0^h e^{s\lambda}b_\lambda\,\mathrm{d}s\le e^{h\lambda}u_0\le\|u_0\|_\infty$$
for all $\lambda\in\Lambda$, $h\ge 0$ and $u_0\in\mathbb{R}^d$, where we used the fact that $e^{h\lambda}$ is a kernel. Moreover, $\mathcal{E}_h$ is a convex kernel, for all $h\ge 0$, as it is monotone and
$$\mathcal{E}_h\alpha=\alpha+\sup_{\lambda\in\Lambda}\int_0^h e^{s\lambda}b_\lambda\,\mathrm{d}s=\alpha$$
for all $\alpha\in\mathbb{R}$, where we used the fact that there is some $\lambda_0\in\Lambda$ with $b_{\lambda_0}=0$. For a partition $\pi=\{t_0,t_1,\dots,t_n\}\in P$ with $n\in\mathbb{N}$ and $0=t_0<t_1<\dots<t_n$, we set
$$\mathcal{E}_\pi:=\mathcal{E}_{t_1-t_0}\circ\dots\circ\mathcal{E}_{t_n-t_{n-1}}.$$
Moreover, we set $\mathcal{E}_{\{0\}}:=\mathcal{E}_0$. Then, $\mathcal{E}_\pi$ is a convex kernel for all $\pi\in P$ since it is a concatenation of convex kernels.
Definition 4.2. The Nisio semigroup $S=(S(t))_{t\ge 0}$ of $(\Lambda,b)$ is defined by
$$S(t)u_0:=\sup_{\pi\in P_t}\mathcal{E}_\pi u_0\quad\text{for all } u_0\in\mathbb{R}^d\text{ and } t\ge 0.$$
Notice that $S(t)\colon\mathbb{R}^d\to\mathbb{R}^d$ is well-defined and a convex kernel for all $t\ge 0$ since $\mathcal{E}_\pi$ is a convex kernel for all $\pi\in P$. In many of the subsequent proofs, we will first concentrate on the case where the family $b$ is bounded, and then use an approximation of the Nisio semigroup by means of other Nisio semigroups. This approximation procedure is specified in the following remark.
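In practice, $S(t)u_0$ can be approximated by evaluating $\mathcal{E}_\pi u_0$ along uniform partitions $\pi=\{0,t/n,\dots,t\}$ and refining the partition. A minimal sketch (our own illustration; it reuses the affine_semigroup helper from the sketch after Remark 4.1 and assumes finitely many pairs $(\lambda,b_\lambda)$):

```python
import numpy as np
# reuses affine_semigroup(q, b, t, u0) from the sketch after Remark 4.1

def one_step_envelope(h, u0, Qs, bs):
    """E_h u0 = sup_lambda S_lambda(h) u0, taken componentwise."""
    return np.max([affine_semigroup(q, b, h, u0) for q, b in zip(Qs, bs)], axis=0)

def nisio_approx(t, u0, Qs, bs, n):
    """E_pi u0 for the uniform partition pi = {0, t/n, ..., t}; finer
    partitions (e.g., doubling n) improve the approximation of S(t) u0."""
    u = np.asarray(u0, dtype=float)
    for _ in range(n):
        u = one_step_envelope(t / n, u, Qs, bs)
    return u
```

By Theorem 2.5, these values converge, as the partition is refined, to the unique solution of the nonlinear ODE (3), which offers an alternative to Runge–Kutta schemes.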
Remark 4.3. Let $c\ge 0$, $\Lambda_c:=\{\lambda\in\Lambda\,|\,\|b_\lambda\|_\infty\le c\}$ and $b^c:=(b_\lambda)_{\lambda\in\Lambda_c}$. Notice that, by assumption, there exists some $\lambda_0\in\Lambda$ with $b_{\lambda_0}=0$, which implies that $\lambda_0\in\Lambda_c$. Since $\Lambda_c\subset\Lambda$ (and by definition of $b^c$), the operator
$$\mathcal{Q}_c\colon\mathbb{R}^d\to\mathbb{R}^d,\quad v\mapsto\sup_{\lambda\in\Lambda_c}\big(\lambda v+b_\lambda\big)$$
is well-defined. Let $S^c$ be the Nisio semigroup w.r.t. $(\Lambda_c,b^c)$ for all $c\ge 0$. Since
$$\bigcup_{c\ge 0}\Lambda_c=\Lambda,$$
it follows that $\mathcal{Q}_c\nearrow\mathcal{Q}$ and $S^c(t)\nearrow S(t)$, for all $t\ge 0$, as $c\to\infty$. Moreover, for all $\lambda\in\Lambda_c$, $u_0\in\mathbb{R}^d$ with $\|u_0\|_\infty=1$, and $i\in\{1,\dots,d\}$,
$$(\lambda u_0)_i\le\big(\mathcal{Q}u_0-b_\lambda\big)_i\le\|\mathcal{Q}u_0\|_\infty+\|b_\lambda\|_\infty\le c+\max_{v\in\mathbb{S}^{d-1}}\|\mathcal{Q}v\|_\infty,$$
where $\mathbb{S}^{d-1}:=\{v\in\mathbb{R}^d\,|\,\|v\|_\infty=1\}$ and, in the last step, we used the fact that $\mathcal{Q}\colon\mathbb{R}^d\to\mathbb{R}^d$ is convex and therefore continuous. This implies that the set $\Lambda_c$ is bounded in the sense that $\sup_{\lambda\in\Lambda_c}\|\lambda\|<\infty$. In particular,
$$\sup_{\lambda\in\Lambda_c}\|\lambda u_0+b_\lambda\|_\infty\le\sup_{\lambda\in\Lambda_c}\big(\|\lambda\|\,\|u_0\|_\infty+\|b_\lambda\|_\infty\big)\le c+\sup_{\lambda\in\Lambda_c}\|\lambda\|\,\|u_0\|_\infty<\infty\tag{9}$$
for all $u_0\in\mathbb{R}^d$.
Lemma 4.4. Assume that the family $b$ is bounded, that is, $(\Lambda,b)=(\Lambda_c,b^c)$ for some $c\ge 0$. Then, for all $u_0\in\mathbb{R}^d$, the mapping $[0,\infty)\to\mathbb{R}^d$, $h\mapsto\mathcal{E}_h u_0$ is Lipschitz continuous.
Proof. Let $u_0\in\mathbb{R}^d$ and $0\le h_1<h_2$. Then, by (8), for all $\lambda\in\Lambda$, we have that
$$\|S_\lambda(h_2)u_0-S_\lambda(h_1)u_0\|_\infty\le\int_{h_1}^{h_2}\big\|e^{s\lambda}\big(\lambda u_0+b_\lambda\big)\big\|_\infty\,\mathrm{d}s\le(h_2-h_1)\,\|\lambda u_0+b_\lambda\|_\infty,$$
which implies that
$$\|\mathcal{E}_{h_2}u_0-\mathcal{E}_{h_1}u_0\|_\infty\le\sup_{\lambda\in\Lambda}\|S_\lambda(h_2)u_0-S_\lambda(h_1)u_0\|_\infty\le(h_2-h_1)\Big(\sup_{\lambda\in\Lambda}\|\lambda u_0+b_\lambda\|_\infty\Big).\tag{10}$$
Note that $\sup_{\lambda\in\Lambda}\|\lambda u_0+b_\lambda\|_\infty<\infty$ by (9). □