
A Differential Model for a 2 × 2-Evolutionary Game Dynamics

A. M. Tarasyev
Institute of Mathematics and Mechanics, Russian Academy of Sciences, Ekaterinburg, Russia

WP-94-63 August 1994

IIASA
International Institute for Applied Systems Analysis, A-2361 Laxenburg, Austria
Telephone: 43 2236 807, Fax: 43 2236 71313, E-Mail: info@iiasa.ac.at

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute, its National Member Organizations, or other organizations supporting the work.


Foreword

A dynamical model for an evolutionary nonantagonistic (nonzero-sum) game between two populations is considered. A scheme of a dynamical Nash equilibrium in the class of feedback (discontinuous) controls is proposed. The construction is based on solutions of auxiliary antagonistic (zero-sum) differential games. A method for approximating the corresponding value functions is developed. The method uses approximation schemes for constructing generalized (minimax, viscosity) solutions of first-order partial differential equations of Hamilton-Jacobi type. A numerical realization of a grid procedure is described. Questions of convergence of the approximate solutions to the generalized one (the value function) are discussed, and convergence estimates are pointed out. The method provides equilibrium feedbacks in parallel with the value functions. Implementation of grid approximations for feedback control is justified. Coordination of long- and short-term interests of populations and individuals is indicated. A possible relation of the proposed game model to the classical replicator dynamics is outlined.


Contents

1 The Model of Game Dynamics
1.1 Dynamics, Payoffs, Player's Preferences
1.2 Properties of Dynamical System
1.3 Reduced Dynamical Systems

2 The Model of "Short-Term" and "Long-Term" Payoffs
2.1 "Short-Term" Payoffs
2.2 "Long-Term" Payoffs

3 Nash Equilibria in Dynamical System
3.1 Feedback Controls, Trajectories of Dynamical System
3.2 Nash Equilibria
3.3 Stability Properties of Dynamical System

4 Construction of Nash Equilibria
4.1 Auxiliary Antagonistic (Zero-Sum) Games
4.2 Equilibrium Feedback Controls

5 Value Functions of Differential Games
5.1 Value Functions for Games with Infinite Horizon
5.2 Properties of Value Functions

6 Value Functions and Minimax (Viscosity) Solutions of Hamilton-Jacobi Equations
6.1 Hamilton-Jacobi Equations
6.2 Generalized Derivatives, Differential Inequalities
6.3 Piecewise Smooth Value Function
6.4 Example

7 Approximation Operators and Method of Contraction Mappings for Construction of Generalized Solutions of Hamilton-Jacobi Equations
7.1 Discrete Approximation of Hamilton-Jacobi Equations
7.2 Method of Successive Approximations
7.3 Discrete Approximations for the Second Differential Game
7.4 Optimal Feedback Controls
7.5 Conjecture on the Structure of Optimal Synthesis

8 Numerical Realization of Iterative Procedure for Construction of Value Functions and Synthesis of Controls
8.1 Grid Schemes for Construction of Value Functions
8.2 Grid Approximation of Control Synthesis

9 Alliance of "Long-Term" and "Short-Term" Interests of Populations and Individuals
9.1 "Short-Term" Interests of Individuals and Constraints on Control Parameters in the Game Problem for "Long-Term" Interests of Populations
9.2 Replicator Dynamics and Constraints on Control Parameters

Introduction

We consider a nonantagonistic dynamical game of two large groups (populations) of individuals. The dynamical system describing the population evolution is motivated by differential (see [Isaacs, 1965]) and evolutionary game-theoretical models (see [Friedman, 1991], [Young, 1993]) relevant to problems of economic change (see [Nelson and Winter, 1982]) and population dynamics (see [Hofbauer, Sigmund, 1988]). For a particular class of 2×2 deterministic evolutionary game dynamics, an approach to analysing populations' behaviors via methods of the theory of differential games was proposed in [Kryazhimskii, 1994]. In the present paper we develop some aspects of this approach for a stochastic 2×2 evolutionary game dynamics. Namely, we focus on finding equilibrium populations' behaviors within the totally centralized regulation pattern. The model is reduced to a closed-loop differential game ([Krasovskii, Subbotin, 1988], [Krasovskii, 1985], [Kleimenov, 1993]) and analysed via methods of the theory of generalized (minimax, viscosity) solutions of Hamilton-Jacobi equations ([Crandall, Lions, 1983, 1984], [Subbotin, 1980, 1991]).

It is supposed that at each time instant, the individuals of each population are divided into two parts playing different strategies. Individuals from different populations meet pairwise at random and get their current payoffs determined by the combination of their strategies. The populations' goals are to maximize "long-term" payoffs represented as integrals of mathematical expectations of the current payoffs, with an appropriate discount.

The right-hand side of the considered dynamical system depends on control parameters making individuals change their current strategies in accordance with a chosen feedback.

The nonantagonistic game in question consists in constructing Nash equilibrium feed- backs with respect to the “long-term” dynamical payoff functionals.

We consider the problem within the framework of the theory of positional differential games ([Krasovskii, Subbotin, 1988]). Following [Kleimenov, 1993], we compose a Nash equilibrium with the help of solutions of auxiliary antagonistic (zero-sum) differential games. Solutions of these antagonistic games are based on algorithms for constructing the value functions. It is known ([Crandall, Lions, 1983, 1984], [Subbotin, 1980, 1991]) that the value function is the generalized solution of the Bellman-Isaacs equation, a first-order partial differential equation of Hamilton-Jacobi type. To construct value functions we use appropriate approximation schemes ([Dolcetta, 1983], [Souganidis, 1985], [Tarasyev, 1993], [Adiatulina, Tarasyev, 1987], [Bardi, Osher], [Subbotin, Tarasyev, Ushakov]). The corresponding numerical procedure is reduced to the method of contraction operators.

Along with the value functions, the method provides the equilibrium feedbacks.

We stress once again that the solution is obtained within the centralized scheme, which implies that long-term equilibrium behaviors can, in general, contradict the short-term interests of individuals. We conclude the paper with a discussion of possible problem settings combining long- and short-term principles for constructing dynamical Nash equilibria. In particular, we consider the possibility of linking the proposed dynamical Nash equilibrium approach with the classical replicator dynamics (see [Hofbauer, Sigmund, 1988]).

1 The Model of Game Dynamics

1.1 Dynamics, Payoffs, Player’s Preferences

We consider the following dynamical system, which describes the game interaction of two populations of individuals. We can assume, for example, that one of these populations is an aggregate of sellers and the other is an aggregate of buyers. For clarity, suppose that individuals of both populations can choose at each moment of time one of two simple actions (strategies): buyers can "buy" or "not buy", sellers can sell at a "high price" or a "low price". Actions of individuals of the first population are denoted by the index i: index i = 1 corresponds to the action "buy", index i = 2 corresponds to the action "not buy". Analogously, actions of individuals of the second population are denoted by the index j: index j = 1 corresponds to "high price", index j = 2 corresponds to "low price".

Let us consider an arbitrary pair composed of individuals from different populations. This pair is interpreted as a situation (i, j) in the current game, generated by strategy i of a player from the first population and strategy j of a player from the second population. Assume that the payoff of players of the first population in situation (i, j) is determined by the coefficients a_{ij} of the payoff matrix A = {a_{ij}}. Analogously, the payoff of players of the second population in situation (i, j) is determined by the coefficients b_{ij} of the payoff matrix B = {b_{ij}}.

Let us assume that the first population consists of N individuals, and at the time instant t one part of them, N_1(t), plays the first strategy and the other part, N_2(t), plays the second strategy. Of course, N = N_1(t) + N_2(t). Similarly, assume that the second population consists of M individuals, and at the moment t one part M_1(t) plays the first strategy and the other part M_2(t) plays the second strategy, M = M_1(t) + M_2(t).

Let us suppose that the dynamics of the process in which individuals change their strategies is described by the multistep system of equations

N_1(t+δ) = N_1(t) − n_{12}(t)δ + n_{21}(t)δ
N_2(t+δ) = N_2(t) + n_{12}(t)δ − n_{21}(t)δ
M_1(t+δ) = M_1(t) − m_{12}(t)δ + m_{21}(t)δ
M_2(t+δ) = M_2(t) + m_{12}(t)δ − m_{21}(t)δ    (1.1)

The peculiarity of this dynamics is that the number of individuals in each population who can change their strategies at the moment t is proportional to the time step δ. More precisely,


n_{12}(t)δ is the number of individuals of the first population who change their strategy from the first to the second, 0 ≤ n_{12}(t) ≤ N_1(t);
n_{21}(t)δ is the number of individuals of the first population who change their strategy from the second to the first, 0 ≤ n_{21}(t) ≤ N_2(t);
m_{12}(t)δ is the number of individuals of the second population who change their strategy from the first to the second, 0 ≤ m_{12}(t) ≤ M_1(t);
m_{21}(t)δ is the number of individuals of the second population who change their strategy from the second to the first, 0 ≤ m_{21}(t) ≤ M_2(t).

The fact that at the moment t only a part of individuals proportional to the time step δ can change their strategies admits the following interpretations. For example, such inertia of population behaviour can be explained if we assume that only a "small" part of individuals is active in changing their behaviour. Another explanation is that there are certain restrictions ("queues") when a "large" group of individuals changes actions.

On the other hand, we make a rather natural assumption in supposing that the numbers n_{ik}(t) and m_{jl}(t) of individuals who potentially may wish to change their actions (but do not necessarily change, because the number of those who actually change equals n_{ik}(t)δ or m_{jl}(t)δ) satisfy the restrictions

0 ≤ n_{ik}(t) ≤ N_i(t),  i, k = 1, 2,  i ≠ k
0 ≤ m_{jl}(t) ≤ M_j(t),  j, l = 1, 2,  j ≠ l

Let us suppose that at the moment t players of different populations form pairs randomly with equal probabilities. The probability that a randomly chosen pair plays the situation (i, j) is determined by the formula

p_{ij}(t) = N_i(t)M_j(t)/(NM)    (1.2)

It is easy to verify the standard relations for the probabilities p_{ij}(t)

p_{ij}(t) ≥ 0,   Σ_{i,j} p_{ij}(t) = 1,   i, j = 1, 2    (1.3)

Let us pass from the multistep dynamical system (1.1), which connects the quantities N_i(t+δ) and M_j(t+δ) with the quantities N_i(t) and M_j(t), to a system which connects the probabilities p_{ij}(t+δ) and p_{ij}(t).

Let us derive, for example, the corresponding dynamical equation for the probability p_{11}(t+δ). We have

p_{11}(t+δ) = N_1(t+δ)M_1(t+δ)/(NM)
  = (N_1(t) − n_{12}(t)δ + n_{21}(t)δ)(M_1(t) − m_{12}(t)δ + m_{21}(t)δ)/(NM)
  = N_1(t)M_1(t)/(NM)
    − (N_1(t)M_1(t)/(NM))(m_{12}(t)/M_1(t))δ + (N_1(t)M_2(t)/(NM))(m_{21}(t)/M_2(t))δ
    − (N_1(t)M_1(t)/(NM))(n_{12}(t)/N_1(t))δ + (N_2(t)M_1(t)/(NM))(n_{21}(t)/N_2(t))δ
    + ((−n_{12}(t) + n_{21}(t))(−m_{12}(t) + m_{21}(t))/(NM))δ²

Taking into account the notations for the probabilities p_{ij}(t), we obtain the equation

p_{11}(t+δ) − p_{11}(t) = −p_{11}(t)v_1δ + p_{12}(t)v_2δ − p_{11}(t)u_1δ + p_{21}(t)u_2δ + φ(t)δ²    (1.4)

Here

u_1 = u_1(t) = n_{12}(t)/N_1(t),   u_2 = u_2(t) = n_{21}(t)/N_2(t)
v_1 = v_1(t) = m_{12}(t)/M_1(t),   v_2 = v_2(t) = m_{21}(t)/M_2(t)    (1.5)

0 ≤ u_i ≤ 1,  i = 1, 2
0 ≤ v_j ≤ 1,  j = 1, 2    (1.6)

|φ(t)| ≤ 1

Dividing equation (1.4) by δ > 0 and passing to the limit as δ ↓ 0, we arrive at the differential equation

\dot p_{11}(t) = −p_{11}(t)u_1(t) + p_{21}(t)u_2(t) − p_{11}(t)v_1(t) + p_{12}(t)v_2(t)

Analogously one can deduce differential equations for \dot p_{12}(t), \dot p_{21}(t), \dot p_{22}(t).

Let us write the differential equations which describe the motion of the considered dynamical system, using the standard notations

x_1 = p_{11},  x_2 = p_{12},  x_3 = p_{21},  x_4 = p_{22}    (1.7)

We obtain the following bilinear system of differential equations with respect to the probabilities x_1, x_2, x_3, x_4

\dot x_1 = −x_1u_1 + x_3u_2 − x_1v_1 + x_2v_2 = f_1(x, u, v)
\dot x_2 = −x_2u_1 + x_4u_2 + x_1v_1 − x_2v_2 = f_2(x, u, v)
\dot x_3 = x_1u_1 − x_3u_2 − x_3v_1 + x_4v_2 = f_3(x, u, v)
\dot x_4 = x_2u_1 − x_4u_2 + x_3v_1 − x_4v_2 = f_4(x, u, v)    (1.8)

Here

x = (x_1, x_2, x_3, x_4),  u = (u_1, u_2),  v = (v_1, v_2)


1.2 Properties of Dynamical System

Let us turn our attention to some properties of the controlled dynamical system (1.8). This dynamics conserves the following properties of probabilities.

Lemma 1.1 If

x_1(0) + x_2(0) + x_3(0) + x_4(0) = 1    (1.9)

then

x_1(t) + x_2(t) + x_3(t) + x_4(t) = 1  ∀t    (1.10)

Proof. Indeed,

f_1(x, u, v) + f_2(x, u, v) + f_3(x, u, v) + f_4(x, u, v) = 0

and, hence,

\dot x_1(t) + \dot x_2(t) + \dot x_3(t) + \dot x_4(t) = 0  ∀t

We obtain

x_1(t) + x_2(t) + x_3(t) + x_4(t) = c  ∀t

From (1.9) we have c = 1.

Lemma 1.2 If

x_1(0)x_4(0) − x_2(0)x_3(0) = 0    (1.11)

then

x_1(t)x_4(t) − x_2(t)x_3(t) = 0  ∀t    (1.12)

Proof. Let us note that (1.11) holds for our model because

x_1(0)x_4(0) − x_2(0)x_3(0) = N_1M_1N_2M_2/(NM)² − N_1M_2N_2M_1/(NM)² = 0

Let

z(t) = x_1(t)x_4(t) − x_2(t)x_3(t),  z(0) = 0

We have

\dot z = \dot x_1x_4 + x_1\dot x_4 − \dot x_2x_3 − x_2\dot x_3 = (−x_1x_4 + x_2x_3)(u_1 + u_2 + v_1 + v_2) = −z(u_1 + u_2 + v_1 + v_2)

Hence

z(t) = z(0) exp(−∫_0^t (u_1(s) + u_2(s) + v_1(s) + v_2(s)) ds)

Since z(0) = 0, it follows that z(t) ≡ 0.

Thus the relations

x_1 + x_2 + x_3 + x_4 = 1    (1.13)
x_1x_4 − x_2x_3 = 0    (1.14)

are first integrals of dynamical system (1.8).

One can also prove that system (1.8) conserves the following properties of probabilities.


Lemma 1.3 If

0 ≤ x_i(0) ≤ 1,  i = 1, 2, 3, 4    (1.15)

then

0 ≤ x_i(t) ≤ 1  ∀t,  i = 1, 2, 3, 4    (1.16)

Proof. We prove this fact below for the reduced system.

1.3 Reduced Dynamical Systems

Let us note that, since there exist two first integrals (1.13),(1.14) for dynamical system (1.8), its order can be reduced from four to two. We shall carry out this reduction in a form convenient for us. To this end we introduce the following variables:

y_1 = x_1 + x_2 is the probability that a player from the first population holds the first strategy;
y_2 = x_3 + x_4 is the probability that a player from the first population holds the second strategy;
y_3 = x_1 + x_3 is the probability that a player from the second population holds the first strategy;
y_4 = x_2 + x_4 is the probability that a player from the second population holds the second strategy.

It is obvious that

y_1 + y_2 = 1,  y_3 + y_4 = 1    (1.17)

Using (1.13),(1.14) one can also prove that

y_1y_3 = x_1,  y_1y_4 = x_2,  y_2y_3 = x_3,  y_2y_4 = x_4    (1.18)

Indeed, we have, for example, using x_2x_3 = x_1x_4 from (1.14),

y_1y_3 = (x_1 + x_2)(x_1 + x_3) = x_1(x_1 + x_2 + x_3) + x_2x_3 = x_1(x_1 + x_2 + x_3 + x_4) = x_1

From dynamical system (1.8) we obtain the following control system with respect to the probabilities y_1, y_2, y_3, y_4

\dot y_1 = −y_1u_1 + y_2u_2
\dot y_2 = y_1u_1 − y_2u_2
\dot y_3 = −y_3v_1 + y_4v_2
\dot y_4 = y_3v_1 − y_4v_2    (1.19)

Introducing the notations y_1 = x, y_3 = y, we obtain from (1.17) that y_2 = 1 − x, y_4 = 1 − y. Substituting these relations into (1.19), we arrive at the following bilinear system of differential equations with respect to the probabilities x and y

\dot x = −xu_1 + (1 − x)u_2
\dot y = −yv_1 + (1 − y)v_2    (1.20)

Let us recall that the controls u_1, u_2, v_1, v_2 are pure numbers here. They are determined by relations (1.5) and satisfy restrictions (1.6). The extreme values of the controls, u_i = 0 or u_i = 1, i = 1, 2, and v_j = 0 or v_j = 1, j = 1, 2, can be interpreted as signals to individuals of the corresponding populations to change or not to change their actions. For example, the value u_1 = 0 signals to individuals of the first population to hold the first strategy and not change actions, while the value u_1 = 1 signals the necessity to exchange the first strategy for the second.

Actually, system (1.20) can be replaced by an equivalent system of simpler form. Let us consider, for example, the first equation of system (1.20) and transform it in the following way

\dot x = −xu_1 + (1 − x)u_2 = −x + x(1 − u_1) + (1 − x)u_2

Let us introduce the new control parameter

u = x(1 − u_1) + (1 − x)u_2    (1.21)

We now determine the restrictions on the control parameter u. We have

u ∈ P(x),  P(x) = P_1(x) + P_2(x)
P_1(x) = {x(1 − u_1) : 0 ≤ u_1 ≤ 1} = [0, x]
P_2(x) = {(1 − x)u_2 : 0 ≤ u_2 ≤ 1} = [0, 1 − x]

Hence, the set P(x) = [0, x] + [0, 1 − x] is the segment [0, 1], and it does not depend on x. Thus, the first equation of system (1.20) can be replaced by the equation

\dot x = −x + u,  0 ≤ u ≤ 1

Analogously, if we introduce the new control parameter

v = y(1 − v_1) + (1 − y)v_2    (1.22)

we obtain the differential equation with respect to the probability y

\dot y = −y + v,  0 ≤ v ≤ 1

Thus, we have the following system of differential equations with respect to the probabilities x and y

\dot x = −x + u,  0 ≤ u ≤ 1
\dot y = −y + v,  0 ≤ v ≤ 1    (1.23)

Finally, let us verify that system (1.23) conserves properties of probabilities x and y.

Proof of Lemma 1.3.

Let u(s) : [0,+∞)→[0,1] be a control function measurable in the sense of Lebesgue.

Then accordingto Cauchy formula we have x(t) = (x0+

t 0

exp(s)u(s)ds) exp(−t)

If x(0) =x0 ≥0 then it is obvious that x(t)≥0. Let x(0) = x0 ≤1. Then x(t)≤(1 +

t

0

exp(s)u(s)ds) exp(−t)≤ exp(−t) +

0

t

exp(s)u(s+t)ds≤ exp(−t) +

0

texp(s)ds≤1 (1.24)

Since system (1.23) conserves properties of probabilities then equivalent systems (1.8) and (1.20) also conserve these properties.
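The invariance of the unit square stated in Lemma 1.3 can likewise be observed in simulation. Below is a minimal Python sketch of the reduced system (1.23), with arbitrary constant controls as an assumed example; under constant controls the trajectory converges to the rest point (u, v).

```python
def simulate(x0, y0, u_fn, v_fn, t_end=10.0, dt=1e-3):
    """Explicit Euler integration of the reduced system (1.23):
    x' = -x + u, y' = -y + v, with feedback controls u, v in [0, 1]."""
    x, y = x0, y0
    for k in range(int(t_end / dt)):
        t = k * dt
        u = min(max(u_fn(t, x, y), 0.0), 1.0)   # clip to the admissible range
        v = min(max(v_fn(t, x, y), 0.0), 1.0)
        x += dt * (-x + u)
        y += dt * (-y + v)
    return x, y

# Constant controls u = 0.8, v = 0.3: the state stays in [0,1]^2 and
# approaches the rest point (0.8, 0.3).
print(simulate(0.1, 0.9, lambda t, x, y: 0.8, lambda t, x, y: 0.3))
```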


2 The Model of "Short-Term" and "Long-Term" Payoffs

2.1 "Short-Term" Payoffs

Let us now pass to the question of evaluating the payoffs of the populations. It is natural to assume that the mathematical expectation associated with the corresponding payoff matrix is the payoff of a population at the moment of time t. Namely, the quality of the state x(t) = (x_1(t), x_2(t), x_3(t), x_4(t)) of dynamical process (1.8) is evaluated for the first population by the mathematical expectation

E_A(t) = a_{11}x_1(t) + a_{12}x_2(t) + a_{21}x_3(t) + a_{22}x_4(t)    (2.1)

and for the second population by the mathematical expectation

E_B(t) = b_{11}x_1(t) + b_{12}x_2(t) + b_{21}x_3(t) + b_{22}x_4(t)    (2.2)

Taking into account the notations of the equivalent system (1.23), we can rewrite formulas (2.1),(2.2) by means of the probabilities x(t), y(t) in the following way

E_A(t) = a_{11}x(t)y(t) + a_{12}x(t)(1 − y(t)) + a_{21}(1 − x(t))y(t) + a_{22}(1 − x(t))(1 − y(t))
  = (a_{11} − a_{12} − a_{21} + a_{22})x(t)y(t) − (a_{22} − a_{12})x(t) − (a_{22} − a_{21})y(t) + a_{22}    (2.3)

E_B(t) = b_{11}x(t)y(t) + b_{12}x(t)(1 − y(t)) + b_{21}(1 − x(t))y(t) + b_{22}(1 − x(t))(1 − y(t))
  = (b_{11} − b_{12} − b_{21} + b_{22})x(t)y(t) − (b_{22} − b_{12})x(t) − (b_{22} − b_{21})y(t) + b_{22}    (2.4)

Let us note that in the theory of static bimatrix games (see, for example, [Vorobjev, 1984]) there are special notations for the coefficients of formulas (2.3),(2.4)

C_A = a_{11} − a_{12} − a_{21} + a_{22},  α_1 = a_{22} − a_{12},  α_2 = a_{22} − a_{21}    (2.5)

C_B = b_{11} − b_{12} − b_{21} + b_{22},  β_1 = b_{22} − b_{12},  β_2 = b_{22} − b_{21}    (2.6)

It is natural to regard formulas (2.3)-(2.4) as describing the "short-term" payoffs of the populations, calculated at the fixed moment of time t.
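In code, the passage from the payoff matrices to the coefficients (2.5),(2.6) and to the "short-term" payoffs (2.3),(2.4) takes a few lines. The following Python sketch is illustrative only; the matrices A and B are assumed example data, not taken from the paper.

```python
import numpy as np

def bilinear_coefficients(P):
    """Coefficients of the representation (2.3)/(2.4):
    g(x, y) = C*x*y - c1*x - c2*y + p22, cf. notations (2.5),(2.6)."""
    C = P[0, 0] - P[0, 1] - P[1, 0] + P[1, 1]
    c1 = P[1, 1] - P[0, 1]
    c2 = P[1, 1] - P[1, 0]
    return C, c1, c2, P[1, 1]

def short_term_payoff(P, x, y):
    """Expected current payoff (2.3)/(2.4) at the state (x, y)."""
    C, c1, c2, p22 = bilinear_coefficients(P)
    return C * x * y - c1 * x - c2 * y + p22

# Assumed example payoff matrices for the two populations.
A = np.array([[3.0, 0.0], [2.0, 1.0]])
B = np.array([[2.0, 3.0], [1.0, 0.0]])
print(short_term_payoff(A, 0.5, 0.5))   # 1.5 = mean of the entries of A
print(short_term_payoff(B, 0.5, 0.5))   # 1.5 = mean of the entries of B
```

At x = y = 1/2 all four situations are equally likely, so the result is just the average matrix entry, which is a convenient sanity check.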

2.2 "Long-Term" Payoffs

Let us now consider the dynamical system (1.23) on the infinite interval of time [0, +∞). The infinity of the time interval means that we are interested precisely in the evolutionary character of the behaviour of the trajectories generated by the dynamical system.


Let (x(·), y(·)) = {(x(t), y(t)) : t ∈ [0, +∞)} be an arbitrary trajectory of system (1.23). We shall estimate the quality of this trajectory by integral functionals with discount coefficients. For the first population we determine the quality of a trajectory by the functional

J_1 = J_1(x(·), y(·)) = ∫_0^{+∞} exp(−λt) E_A(t) dt = ∫_0^{+∞} exp(−λt) g_1(x(t), y(t)) dt
  = ∫_0^{+∞} exp(−λt)(C_A x(t)y(t) − α_1 x(t) − α_2 y(t) + a_{22}) dt    (2.7)

and for the second population by the functional

J_2 = J_2(x(·), y(·)) = ∫_0^{+∞} exp(−λt) E_B(t) dt = ∫_0^{+∞} exp(−λt) g_2(x(t), y(t)) dt
  = ∫_0^{+∞} exp(−λt)(C_B x(t)y(t) − β_1 x(t) − β_2 y(t) + b_{22}) dt    (2.8)

Here λ > 0 is the so-called discount coefficient (which discounts the process as time grows). Functionals of this type are rather traditional for mathematical models in economics (see, for example, references in [Dolcetta, 1983]). Let us note that the integrals (2.7),(2.8) always exist, since 0 ≤ x(t) ≤ 1, 0 ≤ y(t) ≤ 1.

Functionals (2.7),(2.8) determine the "long-term" (on the infinite interval of time) payoffs of the populations, in contrast to the "short-term" payoffs (2.1)-(2.4).
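Numerically, the "long-term" payoff (2.7) can be approximated by truncating the discounted integral at a finite horizon; since g_1 is bounded on the unit square, the tail is of order exp(−λ t_end). A hedged Python sketch, reusing the illustrative coefficients assumed above:

```python
import numpy as np

def long_term_payoff(g, u_fn, v_fn, x0, y0, lam=0.1, t_end=80.0, dt=1e-3):
    """Approximation of J = int_0^inf exp(-lam*t) g(x(t), y(t)) dt of (2.7)/(2.8)
    along an Euler trajectory of (1.23), truncated at t_end."""
    x, y, J = x0, y0, 0.0
    for k in range(int(t_end / dt)):
        t = k * dt
        J += np.exp(-lam * t) * g(x, y) * dt    # rectangle quadrature rule
        x += dt * (-x + u_fn(t, x, y))          # dynamics (1.23)
        y += dt * (-y + v_fn(t, x, y))
    return J

# g1 of (2.3) with the assumed example coefficients C_A=2, alpha1=1, alpha2=-1, a22=1.
g1 = lambda x, y: 2*x*y - x + y + 1
print(long_term_payoff(g1, lambda t, x, y: 1.0, lambda t, x, y: 1.0, 0.5, 0.5))
```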

Integrals (2.7),(2.8) can also be interpreted in terms of "average" mathematical expectations. Indeed, let us, for example, consider integral (2.7). We normalize it by multiplying by the discount coefficient λ. We have

λ ∫_0^{+∞} exp(−λt)(C_A x(t)y(t) − α_1 x(t) − α_2 y(t) + a_{22}) dt
  = a_{11} ∫_0^{+∞} λ exp(−λt)x_1(t) dt + a_{12} ∫_0^{+∞} λ exp(−λt)x_2(t) dt
  + a_{21} ∫_0^{+∞} λ exp(−λt)x_3(t) dt + a_{22} ∫_0^{+∞} λ exp(−λt)x_4(t) dt
  = Σ_{i,j} a_{ij} ∫_0^{+∞} λ exp(−λt)p_{ij}(t) dt
  = a_{11}\bar p_{11} + a_{12}\bar p_{12} + a_{21}\bar p_{21} + a_{22}\bar p_{22}
  = \bar E_A(x(·), y(·))    (2.9)

Here

\bar p_{ij} = ∫_0^{+∞} λ exp(−λt) p_{ij}(t) dt,  i, j = 1, 2    (2.10)

It is easy to verify that 0 ≤ \bar p_{ij} ≤ 1 and Σ_{i,j} \bar p_{ij} = 1. Hence, one can interpret the numbers \bar p_{ij} as a special averaging (2.10) of the probabilities p_{ij}(t), t ∈ [0, +∞), on the infinite interval of time. Therefore, it is natural to regard the number \bar p_{ij} as the "average" probability that random pairs of individuals play the situation (i, j) on the infinite interval of time. The functional (2.9) is then interpreted as the "average" mathematical expectation \bar E_A(x(·), y(·)) of the payoff for the first population on the infinite interval of time.


Analogously, the normalized functional (2.8) can be considered as the "average" mathematical expectation \bar E_B(x(·), y(·)) of the payoff for the second population on the infinite horizon

λ ∫_0^{+∞} exp(−λt)(C_B x(t)y(t) − β_1 x(t) − β_2 y(t) + b_{22}) dt
  = b_{11} ∫_0^{+∞} λ exp(−λt)x_1(t) dt + b_{12} ∫_0^{+∞} λ exp(−λt)x_2(t) dt
  + b_{21} ∫_0^{+∞} λ exp(−λt)x_3(t) dt + b_{22} ∫_0^{+∞} λ exp(−λt)x_4(t) dt
  = Σ_{i,j} b_{ij} ∫_0^{+∞} λ exp(−λt)p_{ij}(t) dt
  = b_{11}\bar p_{11} + b_{12}\bar p_{12} + b_{21}\bar p_{21} + b_{22}\bar p_{22}
  = \bar E_B(x(·), y(·))    (2.11)

3 Nash Equilibria in Dynamical System

3.1 Feedback Controls, Trajectories of Dynamical System

Let us assume that the controls u and v of the first and the second populations in system (1.23) can be formed by the feedback principle. We suppose also that the feedback controls (strategies) U = u(t, x, y, ε) and V = v(t, x, y, ε) can be discontinuous functions of the phase variables (x, y). To define the motions of the system generated by the discontinuous controls U = u(t, x, y, ε), V = v(t, x, y, ε), we use the approach proposed in [Krasovskii, Subbotin, 1988]. Namely, let [0, T] be an interval of time, Δ = {t_0 = 0 < t_1 < t_2 < ... < t_N = T} be its partition with time step δ = t_{k+1} − t_k, and ε > 0 be a parameter of accuracy, 0 < δ < β(ε), where β(ε) ↓ 0 as ε ↓ 0. Consider the piecewise differentiable trajectory (x(·), y(·)), called an "Euler spline", which is the step-by-step solution of the following differential equations

\dot x(t) = −x(t) + u(t_k, x(t_k), y(t_k), ε)    (3.1)
\dot y(t) = −y(t) + v(t_k, x(t_k), y(t_k), ε)
t ∈ [t_k, t_{k+1}),  k = 0, 1, ..., N − 1
x(0) = x_0,  y(0) = y_0

The uniform limits of "Euler splines" as N → ∞, ε ↓ 0, δ ↓ 0 are called limit motions, or simply motions, of the system. The set of all these motions is denoted by the symbol X_T(x_0, y_0, U, V). This set is a compactum in the space of continuous functions on [0, T].

Definition 3.1 A continuous function (x(t), y(t)) : [0, +∞) → R² is called a trajectory of system (1.23) on the infinite interval of time generated by strategies U and V from the initial position (x_0, y_0) if for any moment T, 0 < T < ∞, there exists a trajectory (x_T(t), y_T(t)) ∈ X_T(x_0, y_0, U, V) which satisfies the condition (x(t), y(t)) = (x_T(t), y_T(t)), t ∈ [0, T]. The set of all such trajectories (x(t), y(t)) : [0, +∞) → R² is denoted by the symbol X(x_0, y_0, U, V).
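The "Euler spline" construction (3.1) admits a direct numerical rendering: the feedbacks are sampled only at the partition nodes and held constant on each step, which makes discontinuous feedbacks unproblematic. A Python sketch for illustration; the bang-bang feedback at the end is an assumed example, not a strategy from the paper.

```python
def euler_spline(u_fb, v_fb, x0, y0, T=10.0, delta=0.05, eps=0.05, substeps=20):
    """Step-by-step motion of (3.1): on each partition interval [t_k, t_{k+1})
    the (possibly discontinuous) feedbacks are frozen at the node state
    (x(t_k), y(t_k)) and the resulting ODE is integrated inside the interval."""
    x, y, t = x0, y0, 0.0
    traj = [(t, x, y)]
    while t < T:
        u = u_fb(t, x, y, eps)            # feedback sampled at the node only
        v = v_fb(t, x, y, eps)
        h = delta / substeps
        for _ in range(substeps):         # integrate x' = -x + u, y' = -y + v
            x += h * (-x + u)
            y += h * (-y + v)
        t += delta
        traj.append((t, x, y))
    return traj

# Discontinuous bang-bang feedback steering x towards 1/2 (assumed example).
u_fb = lambda t, x, y, eps: 1.0 if x < 0.5 else 0.0
v_fb = lambda t, x, y, eps: 0.0
print(euler_spline(u_fb, v_fb, 0.9, 0.2)[-1])   # final node near (0.5, 0.0)
```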


3.2 Nash Equilibria

Let us introduce the definition of Nash equilibrium for pairs of feedback controls (U = u(t, x, y, ε), V = v(t, x, y, ε)).

Definition 3.2 A pair of feedback controls (U^0, V^0) is called optimal (equilibrium) in the sense of Nash for the fixed initial position (x_0, y_0) ∈ [0,1]×[0,1] if for any other feedback controls U and V the following condition holds: for all trajectories

(x^0(·), y^0(·)) ∈ X(x_0, y_0, U^0, V^0),  (x^1(·), y^1(·)) ∈ X(x_0, y_0, U, V^0),  (x^2(·), y^2(·)) ∈ X(x_0, y_0, U^0, V)

the inequalities

J_1(x^0(·), y^0(·)) ≥ J_1(x^1(·), y^1(·))
J_2(x^0(·), y^0(·)) ≥ J_2(x^2(·), y^2(·))    (3.2)

hold.

Remark 3.1 For the construction of Nash equilibria we shall use the scheme proposed in [Kleimenov, 1993]. A short description of this scheme is given below in Section 4.

We need to modify Definition 3.2 slightly because in reality we construct certain ε-approximations of the optimal feedback controls. Therefore, let us introduce the notion of ε-equilibrium.

Definition 3.3 Let ε > 0 and (x_0, y_0) ∈ [0,1]×[0,1]. A pair of feedback controls (U_ε, V_ε) is called ε-optimal (an ε-equilibrium) in the sense of Nash for the fixed initial position (x_0, y_0) if for any other feedback controls U and V the following condition holds: for all trajectories

(x^0(·), y^0(·)) ∈ X(x_0, y_0, U_ε, V_ε),  (x^1(·), y^1(·)) ∈ X(x_0, y_0, U, V_ε),  (x^2(·), y^2(·)) ∈ X(x_0, y_0, U_ε, V)

the inequalities

J_1(x^0(·), y^0(·)) ≥ J_1(x^1(·), y^1(·)) − ε
J_2(x^0(·), y^0(·)) ≥ J_2(x^2(·), y^2(·)) − ε    (3.3)

hold.

Remark 3.2 The Nash equilibria constructed below possess, in fact, stronger properties than those indicated in Definitions 3.2, 3.3. These stronger properties can be interpreted as dynamical stability of the equilibria. Namely, we say that a Nash equilibrium has the property of dynamical stability if it is not profitable for the populations to deviate from the equilibrium feedback controls even along the whole equilibrium trajectory (x^0(·), y^0(·)), i.e. the following inequalities hold for all t > 0

∫_t^{+∞} exp(−λs) g_1(x^0(s), y^0(s)) ds ≥ ∫_t^{+∞} exp(−λs) g_1(x^1(s), y^1(s)) ds − ε
∫_t^{+∞} exp(−λs) g_2(x^0(s), y^0(s)) ds ≥ ∫_t^{+∞} exp(−λs) g_2(x^2(s), y^2(s)) ds − ε    (3.4)

with ε = 0 for Definition 3.2 and ε > 0 for Definition 3.3.


3.3 Stability Properties of Dynamical System

Let us formulate the stability property of dynamical system (1.23).

Lemma 3.1 Let u(t) : [0, +∞) → [0, 1], v(t) : [0, +∞) → [0, 1] be measurable controls and (x^1(·), y^1(·)), (x^2(·), y^2(·)) be two trajectories of system (1.23) generated by these controls from different initial positions

(x^1(0), y^1(0)) = (x_1, y_1),  (x^2(0), y^2(0)) = (x_2, y_2)

Then

|x^1(t) − x^2(t)| ≤ |x_1 − x_2| exp(−t)
|y^1(t) − y^2(t)| ≤ |y_1 − y_2| exp(−t)    (3.5)

Moreover, there exists a constant C > 0, depending only on the coefficients of the matrices A and B, such that the following estimates hold

|J_k(x^1(·), y^1(·)) − J_k(x^2(·), y^2(·))| ≤ (C/(1 + λ)) max{|x_1 − x_2|, |y_1 − y_2|},  k = 1, 2    (3.6)

Proof. Consider, for example, the differential equations for x^1(·) and x^2(·)

\dot x^1(t) = −x^1(t) + u(t),  x^1(0) = x_1
\dot x^2(t) = −x^2(t) + u(t),  x^2(0) = x_2

Subtracting the second equation from the first, we obtain

Δ\dot x(t) = −Δx(t),  Δx(0) = x_1 − x_2,  where Δx(t) = x^1(t) − x^2(t)

Hence

Δx(t) = Δx(0) exp(−t)

Analogously we obtain

Δy(t) = Δy(0) exp(−t)

For the difference of the functionals we have the estimate

|J_1(x^1(·), y^1(·)) − J_1(x^2(·), y^2(·))|
  ≤ ∫_0^{+∞} exp(−λt) |C_A(x^1(t)y^1(t) − x^2(t)y^2(t)) − α_1(x^1(t) − x^2(t)) − α_2(y^1(t) − y^2(t))| dt
  ≤ C ∫_0^{+∞} exp(−λt) max{|x^1(t) − x^2(t)|, |y^1(t) − y^2(t)|} dt
  ≤ C max{|x_1 − x_2|, |y_1 − y_2|} ∫_0^{+∞} exp(−(1 + λ)t) dt
  = (C/(1 + λ)) max{|x_1 − x_2|, |y_1 − y_2|}

Finally we give the following corollary.

Corollary 3.1 For integral functionals with finite horizon T, 0 ≤ T < +∞, an estimate similar to (3.6) holds

|∫_0^T exp(−λt) g_k(x^1(t), y^1(t)) dt − ∫_0^T exp(−λt) g_k(x^2(t), y^2(t)) dt| ≤ (C/(1 + λ)) max{|x_1 − x_2|, |y_1 − y_2|}    (3.7)


4 Construction of Nash Equilibria

4.1 Auxiliary Antagonistic (Zero-Sum) Games

In order to construct equilibrium feedback controls we use the approach proposed in the theory of differential games (see, for example, [Kleimenov, 1993]).

Let us consider auxiliary antagonistic (zero-sum) differential games Γ_1 and Γ_2 with the functionals J_1 (2.7) and J_2 (2.8), respectively. In the game Γ_1 the first population tries to maximize the functional J_1(x(·), y(·)) using feedback controls U = u(t, x, y, ε). The second population has the opposite aim: it tries to minimize this functional using feedback controls V = v(t, x, y, ε). Conversely, in the game Γ_2 the first population aims to minimize the functional J_2(x(·), y(·)) and the second population wishes to maximize it.

By the symbols w_1(x, y) and w_2(x, y) we denote the value functions of the auxiliary antagonistic games Γ_1 and Γ_2. It is known (see [Krasovskii, Subbotin, 1988], [Krasovskii, 1985]) that optimal feedback controls (control syntheses) U_i = u_i^0(t, x, y, ε), i = 1, 2, and V_j = v_j^0(t, x, y, ε), j = 1, 2, of the first and the second population in these antagonistic games can be constructed from the information given by the value functions w_k(·), k = 1, 2.

The strategies u_1^0(t, x, y, ε) and v_2^0(t, x, y, ε) can be interpreted as strategies of a positive nature (we shall call them "positive" strategies) because they aim at maximizing the population's own quality functional. Let us mention that these strategies are cautious (guaranteeing) feedback controls. The strategies u_2^0(t, x, y, ε) and v_1^0(t, x, y, ε) can be considered as "punishment" strategies because they minimize the payoff functional of the other population.

4.2 Equilibrium Feedback Controls

Let us now construct the pair of feedback strategies which forms a Nash equilibrium by pasting together the "positive" and "punishment" strategies u_i^0(t, x, y, ε) and v_j^0(t, x, y, ε), i, j = 1, 2, of the two populations.

Let (x_0, y_0) ∈ [0,1]×[0,1] be an arbitrary initial position, ε > 0 be an accuracy parameter, and (x(·), y(·)) ∈ X(x_0, y_0, u_1^0(·), v_2^0(·)) be a trajectory generated by the "positive" strategies u_1^0(t, x, y, ε) and v_2^0(t, x, y, ε). Let T_ε > 0 be a moment of time such that

∫_{T_ε}^{+∞} exp(−λt) |g_i(x(t), y(t))| dt < ε,  i = 1, 2

By the symbols u_ε(t) : [0, T_ε) → [0, 1], v_ε(t) : [0, T_ε) → [0, 1] we denote step-by-step realizations of the strategies u_1^0(t, x, y, ε), v_2^0(t, x, y, ε) such that the corresponding step-by-step motion (x_ε(·), y_ε(·)) satisfies the condition

max_{t ∈ [0, T_ε]} ‖(x(t), y(t)) − (x_ε(t), y_ε(t))‖ < ε

It is affirmed that the following pair of feedback controls U^0 = u^0(t, x, y, ε), V^0 = v^0(t, x, y, ε), pasted together from the "positive" strategies u_1^0(t, x, y, ε), v_2^0(t, x, y, ε) and the "punishment" strategies u_2^0(t, x, y, ε), v_1^0(t, x, y, ε), forms an ε-equilibrium in the sense of Nash:

U^0 = u^0(t, x, y, ε) = u_ε(t) if ‖(x, y) − (x_ε(t), y_ε(t))‖ < ε, and u_2^0(t, x, y, ε) otherwise    (4.1)

V^0 = v^0(t, x, y, ε) = v_ε(t) if ‖(x, y) − (x_ε(t), y_ε(t))‖ < ε, and v_1^0(t, x, y, ε) otherwise    (4.2)

Let us recall that the programmed controls u_ε(t), v_ε(t) are realizations of the "positive" strategies u_1^0(t, x, y, ε), v_2^0(t, x, y, ε). In other words, the "acceptable" trajectory (x_ε(·), y_ε(·)) is generated by the "positive interests" of the populations. The number ε can be interpreted as a parameter of the "trust" of the populations in each other, or as a level of "risk" which the populations allow in the game.
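In programming terms, the pasted controls (4.1),(4.2) are a trigger scheme: follow the agreed "positive" realization while the observed state stays ε-close to the nominal trajectory, and switch to the punishment feedback once a deviation is detected. A minimal Python sketch, assuming the nominal trajectory, its control realization, and the punishment feedback are supplied externally (all hypothetical placeholders):

```python
def make_pasted_feedback(nominal, u_positive, u_punish, eps):
    """Feedback of type (4.1)/(4.2).

    nominal(t)        -> (x_eps(t), y_eps(t)): agreed "acceptable" trajectory
    u_positive(t)     -> realization u_eps(t) of the "positive" strategy
    u_punish(t, x, y) -> "punishment" feedback of the auxiliary zero-sum game
    """
    def u0(t, x, y):
        x_eps, y_eps = nominal(t)
        if max(abs(x - x_eps), abs(y - y_eps)) < eps:
            return u_positive(t)       # partner has not deviated: cooperate
        return u_punish(t, x, y)       # deviation detected: punish
    return u0
```

The symmetric construction with v_ε(t) and v_1^0 gives V^0.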

5 Value Functions of Differential Games

5.1 Value Functions for Games with Infinite Horizon

Let us now consider the auxiliary antagonistic (zero-sum) game Γ_1, whose dynamics is described by the equations

\dot x = −x + u
\dot y = −y + v    (5.1)

and whose payoff functional is determined by the relation

J_1(x(·), y(·)) = ∫_0^{+∞} exp(−λt) g_1(x(t), y(t)) dt = ∫_0^{+∞} exp(−λt)(C_A x(t)y(t) − α_1 x(t) − α_2 y(t) + a_{22}) dt    (5.2)

The aim of the first population is to maximize the functional J_1 (5.2) over trajectories (x(·), y(·)) of system (5.1) by disposing of the control parameter U = u(t, x, y, ε). The aim of the second population is the opposite: to minimize the functional J_1 (5.2) over trajectories (x(·), y(·)) of system (5.1) by disposing of the control parameter V = v(t, x, y, ε).

According to the formalization proposed in [Krasovskii, Subbotin, 1988], the antagonistic game (5.1),(5.2) has the value function (x_0, y_0) → w_1(x_0, y_0)

w_1(x_0, y_0) = sup_U inf_{(x(·),y(·)) ∈ X(x_0,y_0,U)} J_1(x(·), y(·)) = inf_V sup_{(x(·),y(·)) ∈ X(x_0,y_0,V)} J_1(x(·), y(·))    (5.3)

Here the symbols X(x_0, y_0, U), X(x_0, y_0, V) denote the sets of trajectories of dynamical system (5.1) generated by the feedback controls U = u(t, x, y, ε) and V = v(t, x, y, ε).

5.2 Properties of Value Functions

Let us indicate some properties of the value function w_1 : [0,1]×[0,1] → R.

Property 5.1 Let w_1(t, x, y) be the value function for the differential game with dynamics (5.1) and payoff functional

J_1(t, x(·), y(·)) = ∫_t^{+∞} exp(−λs) g_1(x(s), y(s)) ds,  x(t) = x,  y(t) = y    (5.4)

Then the value functions w_1(x, y) and w_1(t, x, y) are connected by the relation

w_1(t, x, y) = exp(−λt) w_1(x, y)    (5.5)


Property 5.2 Let w_1(t, T, x, y) be the value function for the differential game with dynamics (5.1) and payoff functional

J_1(t, T, x(·), y(·)) = ∫_t^T exp(−λs) g_1(x(s), y(s)) ds,  x(t) = x,  y(t) = y    (5.6)

Then the value functions w_1(t, x, y) and w_1(t, T, x, y) are connected by the inequality

max_{(t,x,y) ∈ [0,T]×[0,1]×[0,1]} |w_1(t, x, y) − w_1(t, T, x, y)| ≤ (K/λ) exp(−λT)    (5.7)

Here the parameter K depends only on the coefficients a_{ij} of the matrix A = {a_{ij}}.

Property 5.3 The value function w_1 is bounded

max_{(x,y) ∈ [0,1]×[0,1]} |w_1(x, y)| ≤ K/λ    (5.8)

Property 5.4 The value function w_1 satisfies the Lipschitz condition

|w_1(x_1, y_1) − w_1(x_2, y_2)| ≤ (K/(1 + λ))(|x_1 − x_2| + |y_1 − y_2|) ≤ (√2 K/(1 + λ))((x_1 − x_2)² + (y_1 − y_2)²)^{1/2}    (5.9)

Proof. Choose an arbitrary ε > 0. Determine a moment of time T, 0 ≤ T < +∞, from the relation

exp(−λT)K < ε,  i.e.  T > (1/λ) ln(K/ε)

Consider the value function w_1(0, T, x, y). According to Property 5.2 we have

|w_1(x, y) − w_1(0, T, x, y)| ≤ (K/λ) exp(−λT) ≤ ε/λ

Let U^0 and V^0 be feedback controls realizing the external extrema in the relations which determine the value function w_1(0, T, x, y), i.e.

w_1(0, T, x, y) = max_U min_{(x(·),y(·)) ∈ X(x,y,U)} J_1(0, T, x(·), y(·)) = min_{(x(·),y(·)) ∈ X(x,y,U^0)} J_1(0, T, x(·), y(·))
  = min_V max_{(x(·),y(·)) ∈ X(x,y,V)} J_1(0, T, x(·), y(·)) = max_{(x(·),y(·)) ∈ X(x,y,V^0)} J_1(0, T, x(·), y(·))

Consider the difference

w_1(0, T, x_1, y_1) − w_1(0, T, x_2, y_2)
  = min_{(x(·),y(·)) ∈ X(x_1,y_1,U^0)} J_1(0, T, x(·), y(·)) − max_{(x(·),y(·)) ∈ X(x_2,y_2,V^0)} J_1(0, T, x(·), y(·))
  = J_1(0, T, x^0(·), y^0(·)) − J_1(0, T, x^{00}(·), y^{00}(·))
  ≤ J_1(0, T, x^1(·), y^1(·)) − J_1(0, T, x^2(·), y^2(·)) + ε

Here (x^1(·), y^1(·)) and (x^2(·), y^2(·)) are "Euler splines" close enough to the trajectories (x^0(·), y^0(·)) and (x^{00}(·), y^{00}(·)) realizing the corresponding extrema.

According to the stability property of the dynamical system (see Corollary 3.1) we have the estimate

J_1(0, T, x^1(·), y^1(·)) − J_1(0, T, x^2(·), y^2(·)) ≤ (K/(1 + λ))(|x_1 − x_2| + |y_1 − y_2|)

Let us note that the last estimate does not depend on T. Combining all the estimates together, we obtain a one-sided inequality for the Lipschitz condition (5.9). Interchanging the equal "maxmin" and "minmax" in the previous arguments, we arrive at the complementary estimate. Thus condition (5.9) is proved.

Remark 5.1 Properties 5.1-5.4 are also valid for the value function w_2 of the second auxiliary differential game

w_2(x_0, y_0) = sup_V inf_{(x(·),y(·)) ∈ X(x_0,y_0,V)} J_2(x(·), y(·)) = inf_U sup_{(x(·),y(·)) ∈ X(x_0,y_0,U)} J_2(x(·), y(·))    (5.10)

6 Value Functions and Minimax (Viscosity) Solutions of Hamilton-Jacobi Equations

6.1 Hamilton-Jacobi Equations

The most fundamental properties of value functions are the so-called stability properties (u- and v-stability, [Krasovskii, Subbotin, 1988]), which express the optimality principle (suboptimality, superoptimality) of dynamic programming. At points where the value function w_1(x, y) is differentiable, these properties turn into a first-order partial differential equation of Hamilton-Jacobi type, which is called the Bellman-Isaacs equation, or the basic equation, of optimal control problems. For our optimal guaranteed control problem (differential game) (5.1),(5.2) the corresponding Bellman-Isaacs equation has the following form (consistently with the Hamiltonian (6.8) below, which includes the payoff term g_1):

−λw(x, y) − (∂w/∂x)x − (∂w/∂y)y + max_{0≤u≤1} (∂w/∂x)u + min_{0≤v≤1} (∂w/∂y)v + g_1(x, y) = 0    (6.1)


Remark 6.1 Equation (6.1) does not depend on the time t. It is an equation of stationary type.

It is easy to see that

max_{0≤u≤1} (∂w/∂x)u = max{0, ∂w/∂x}
min_{0≤v≤1} (∂w/∂y)v = min{0, ∂w/∂y}    (6.2)

Therefore, equation (6.1) can be rewritten in the form

−λw(x, y) − (∂w/∂x)x − (∂w/∂y)y + max{0, ∂w/∂x} + min{0, ∂w/∂y} + g_1(x, y) = 0    (6.3)
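According to the contents, Sections 7-8 of the paper construct the solution of (6.3) by contraction-operator grid schemes. Purely as an illustration of that idea (not the paper's exact operator), the following Python sketch runs a discounted value iteration for a discrete dynamic-programming analogue of (6.3) on a uniform grid; the payoff coefficients are the assumed example values used above, and nearest-node interpolation keeps the code short at the price of accuracy.

```python
import numpy as np

lam, dt, n = 0.1, 0.05, 41                  # discount, time step, grid size
xs = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")
G = 2*X*Y - X + Y + 1                       # g1(x, y) with assumed coefficients

def shifted(W, u, v):
    """Values of W at the Euler image (x + dt(-x+u), y + dt(-y+v))."""
    xn = np.clip(X + dt * (-X + u), 0.0, 1.0)
    yn = np.clip(Y + dt * (-Y + v), 0.0, 1.0)
    i = np.clip(np.rint(xn * (n - 1)).astype(int), 0, n - 1)
    j = np.clip(np.rint(yn * (n - 1)).astype(int), 0, n - 1)
    return W[i, j]

W = np.zeros((n, n))
for _ in range(5000):
    # The Hamiltonian is linear in u and v, so extreme controls {0, 1} suffice.
    best = np.full((n, n), -np.inf)
    for u in (0.0, 1.0):
        worst = np.minimum(shifted(W, u, 0.0), shifted(W, u, 1.0))
        best = np.maximum(best, worst)
    W_new = G * dt + np.exp(-lam * dt) * best   # contraction, factor e^{-lam*dt}
    if np.max(np.abs(W_new - W)) < 1e-8:
        break
    W = W_new

print(W[n // 2, n // 2])    # grid approximation of w1(1/2, 1/2)
```

The updating operator is a contraction with factor exp(−λ dt) < 1, which is the property the method of successive approximations of Section 7 rests on.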

6.2 Generalized Derivatives, Differential Inequalities

Usually the value function w_1(x, y) is not differentiable everywhere. It satisfies only the Lipschitz condition (5.9), i.e. by Rademacher's theorem it is differentiable only almost everywhere.

It is shown in the theory of minimax (viscosity) solutions [Subbotin, 1980, 1991], [Crandall, Lions, 1983, 1984] that the value function w_1 must satisfy generalized differential inequalities at the points where it is not differentiable (the measure of this set equals zero). These inequalities generalize the Bellman-Isaacs equation and express the optimality principle of dynamic programming in infinitesimal form.

In order to write the dynamic programming principle in infinitesimal form, let us introduce the notions of directional derivatives and conjugate derivatives for functions satisfying the Lipschitz condition.

Let the function w(x, y) : [0,1]×[0,1] → R satisfy the Lipschitz condition.

Definition 6.1 The lower and upper derivatives of the function w at a point (x, y) ∈ (0,1)×(0,1) in a direction h = (h_1, h_2) ∈ R² are determined by the relations

∂_− w(x, y)|(h) = lim inf_{δ↓0} (w(x + δh_1, y + δh_2) − w(x, y))/δ
∂_+ w(x, y)|(h) = lim sup_{δ↓0} (w(x + δh_1, y + δh_2) − w(x, y))/δ    (6.4)

Definition 6.2 The lower and upper conjugate derivatives of the function w at a point (x, y) ∈ (0,1)×(0,1) are determined by the equalities

D_∗ w(x, y)|(s) = sup_{h∈R²} (⟨s, h⟩ − ∂_− w(x, y)|(h))
D^∗ w(x, y)|(s) = inf_{h∈R²} (⟨s, h⟩ − ∂_+ w(x, y)|(h))    (6.5)
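As a simple worked illustration of Definitions 6.1, 6.2 (an example added here for clarity, not taken from the paper), consider the concave piecewise linear function w(x, y) = min{x, y} at a point of the diagonal x = y. The directional derivative exists, so the lower and upper derivatives coincide:

∂_− w(x, y)|(h) = ∂_+ w(x, y)|(h) = min{h_1, h_2}

A short computation with the positively homogeneous function min{h_1, h_2} then gives

D_∗ w(x, y)|(s) = sup_{h∈R²} (⟨s, h⟩ − min{h_1, h_2}) = +∞  for every s ∈ R²
D^∗ w(x, y)|(s) = inf_{h∈R²} (⟨s, h⟩ − min{h_1, h_2}) = 0 if s ∈ co{(1, 0), (0, 1)}, and −∞ otherwise

so at such a "concave kink" inequality (6.6) below holds trivially, and the essential requirement is inequality (6.7) for s in the convex hull of the gradients of the smooth pieces.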


It is proven in the theory of minimax (viscosity) solutions (see, for example, [Subbotin, 1980, 1991], [Crandall, Lions, 1983, 1984], [Subbotin, Tarasyev, 1985], [Dolcetta, 1983], [Adiatulina, Tarasyev, 1987]) that the value function w_1(x, y) is uniquely determined by the pair of differential inequalities which connect the conjugate derivatives with the Hamiltonian of the dynamical system. Let us give this result for differential game (5.1),(5.2).

Theorem 6.1 For a Lipschitz continuous function w : [0,1]×[0,1] → R to be the value function of differential game (5.1),(5.2) it is necessary and sufficient that the following differential inequalities hold for all (x, y, s) ∈ (0,1)×(0,1)×R²

D_∗ w(x, y)|(s) ≥ −λw(x, y) + H(x, y, s)    (6.6)
D^∗ w(x, y)|(s) ≤ −λw(x, y) + H(x, y, s)    (6.7)

Here the symbol H(x, y, s) denotes the Hamiltonian of dynamical system (5.1)

H(x, y, s) = −s_1x − s_2y + max{0, s_1} + min{0, s_2} + g_1(x, y)    (6.8)
s = (s_1, s_2) ∈ R²,  g_1(x, y) = C_A xy − α_1 x − α_2 y + a_{22}

Remark 6.2 Inequalities (6.6),(6.7) turn into the Bellman-Isaacs equation (6.3) at points where the function w is differentiable.

Remark 6.3 Differential inequality (6.7) expresses the so-called property of u-stability of the value function w, which implies the existence of directions of nondecrease. Similarly, differential inequality (6.6) expresses the property of v-stability, which means that there exist directions of nonincrease for the value function w. Thus, relations (6.6),(6.7) can be interpreted as an infinitesimal form of the dynamic programming principle.

6.3 Piecewise Smooth Value Function

The prevalent situation is a piecewise smooth structure of the value function w. In this case the smooth components of the value function must satisfy the Bellman-Isaacs (Hamilton-Jacobi) equation (6.3), and on the surfaces where these smooth components are pasted together continuously, the differential inequalities (6.6),(6.7) must hold. Verification of the differential inequalities (6.6),(6.7) on the pasting surfaces is essential: there exist numerous examples of piecewise smooth functions which satisfy the Hamilton-Jacobi equation at the points of their differentiability but are not the value function because they do not satisfy relations (6.6),(6.7).

For piecewise smooth functions, directional derivatives and conjugate derivatives can be calculated within the framework of nonsmooth and convex analysis. Let us give the corresponding formulas. Assume that for the function w the following equalities are valid in some neighborhood O_ε(x, y) of a point (x, y) ∈ (0,1)×(0,1)

w(x, y) = min_{i∈I} max_{j∈J} φ_{ij}(x, y) = max_{j∈J} min_{i∈I} φ_{ij}(x, y)    (6.9)
w(x, y) = φ_{ij}(x, y),  i ∈ I, j ∈ J
