Topics in Algorithmic Game Theory and Economics
Pieter Kleer
Max Planck Institute for Informatics Saarland Informatics Campus
December 16, 2020
Lecture 6
Finite games III - Computation of CE and CCE
Hierarchy of equilibrium concepts
A finite (cost minimization) game Γ = (N, (S_i)_{i∈N}, (C_i)_{i∈N}) consists of:
- A finite set N of players.
- A finite strategy set S_i for every player i ∈ N.
- A cost function C_i : ×_j S_j → R for every i ∈ N.
- PNE: exists in any congestion game.
- MNE: exists in any finite game, but hard to compute.
- CE: computationally tractable.
- CCE.
Two-player games with mixed strategies (recap)
A two-player game (A, B) is given by matrices A, B ∈ R^{m×n}.
Row player Alice and column player Bob independently choose strategies x ∈ Δ_A and y ∈ Δ_B.
This gives the product distribution σ_{x,y} : S_A × S_B → [0,1] over strategy profiles:

  σ_{x,y}(a_k, b_ℓ) = σ_{kℓ} = x_k y_ℓ   for k = 1,…,m and ℓ = 1,…,n.
Example. Consider the game

         b_1     b_2     b_3
  a_1   (0,2)   (1,0)   (2,1)
  a_2   (3,0)   (0,1)   (1,4)

The distribution over strategy profiles is given by

  ( x_1 y_1   x_1 y_2   x_1 y_3 )
  ( x_2 y_1   x_2 y_2   x_2 y_3 )
Then the expected cost (for Alice) C_A(σ_{x,y}) is

  x^T A y = E_{(a_k,b_ℓ) ∼ σ_{x,y}}[C_A(a_k, b_ℓ)] = Σ_{(a_k,b_ℓ) ∈ S_A × S_B} σ_{kℓ} C_A(a_k, b_ℓ).

Remember that A_{kℓ} = C_A(a_k, b_ℓ).
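As a quick sanity check, the identity x^T A y = Σ_{k,ℓ} σ_{kℓ} A_{kℓ} can be verified numerically for the 2×3 example game; the mixed strategies below are arbitrary illustrative choices, not part of the example.

```python
# Alice's cost matrix A for the 2x3 example game (A_kl = C_A(a_k, b_l)).
A = [[0, 1, 2],
     [3, 0, 1]]

# Arbitrary mixed strategies x in Delta_A, y in Delta_B (illustrative).
x = [0.25, 0.75]
y = [0.5, 0.3, 0.2]

# Expected cost via x^T A y.
xAy = sum(x[k] * A[k][l] * y[l] for k in range(2) for l in range(3))

# Expected cost via the product distribution sigma_{kl} = x_k * y_l.
expected = sum((x[k] * y[l]) * A[k][l] for k in range(2) for l in range(3))

print(xAy, expected)  # both equal 1.45, Alice's expected cost
```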
Beyond mixed strategies

Equilibrium concepts as distributions over S_A × S_B. We have seen the following equilibrium concepts:
- PNE: a strategy profile s = (s_A, s_B) ∈ S_A × S_B. Gives the indicator distribution σ over S_A × S_B with σ(t) = 1 if t = s, and σ(t) = 0 if t ≠ s.
- MNE: mixed strategies (x, y) with x ∈ Δ_A, y ∈ Δ_B. Gives the product distribution σ over S_A × S_B, where σ(a_k, b_ℓ) = σ_{kℓ} = x_k y_ℓ.
- (C)CE: a (coarse) correlated equilibrium will be defined as a general distribution σ over S_A × S_B, i.e., not induced by specific player actions.
Game of Chicken
Alice and Bob both approach an intersection.

                  Bob
               Stop      Go
  Alice Stop  (0,0)    (3,−1)
        Go   (−1,3)    (4,4)

Two PNEs: (Stop, Go) and (Go, Stop).
One MNE: both players randomize uniformly over Stop and Go.

The distributions over strategy profiles (a, b) for these equilibria are

  ( 0  1 )    ( 0  0 )         ( 1/4  1/4 )
  ( 0  0 ),   ( 1  0 ),  and   ( 1/4  1/4 ).
A sensible ‘equilibrium’ would be the strategy profile distribution

  σ = ( 0    1/2 )
      ( 1/2   0  ).

This cannot be achieved as a mixed equilibrium: there are no x ∈ Δ_A, y ∈ Δ_B such that σ_{kℓ} = x_k y_ℓ for all k, ℓ ∈ {1, 2}.
The idea is to introduce a traffic light (mediator, or trusted third party):
- The traffic light samples/draws one of the two strategy profiles from the distribution.
- It gives the realization as a recommendation to the players, i.e., tells Alice to go and Bob to stop (or vice versa).
- Conditioned on this recommendation, the best thing for a player to do is to follow it.
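The last claim can be checked numerically: under the traffic-light distribution, a player who receives a recommendation never lowers their conditional expected cost by deviating. A minimal sketch of such a check (the helper function below is illustrative, not from the lecture):

```python
# Cost matrices for the Game of Chicken (rows/cols: 0 = Stop, 1 = Go).
CA = [[0, 3], [-1, 4]]   # Alice's costs
CB = [[0, -1], [3, 4]]   # Bob's costs

# Traffic-light distribution: (Stop, Go) and (Go, Stop) each w.p. 1/2.
sigma = [[0.0, 0.5], [0.5, 0.0]]

def is_correlated_eq(sigma, CA, CB, tol=1e-9):
    m, n = len(sigma), len(sigma[0])
    # Alice: for every recommended row k with positive probability,
    # following must be (weakly) better than any deviation k2.
    for k in range(m):
        if sum(sigma[k]) == 0:
            continue
        for k2 in range(m):
            follow = sum(CA[k][l] * sigma[k][l] for l in range(n))
            deviate = sum(CA[k2][l] * sigma[k][l] for l in range(n))
            if follow > deviate + tol:
                return False
    # Bob: the symmetric check over recommended columns.
    for l in range(n):
        col = [sigma[k][l] for k in range(m)]
        if sum(col) == 0:
            continue
        for l2 in range(n):
            follow = sum(CB[k][l] * col[k] for k in range(m))
            deviate = sum(CB[k][l2] * col[k] for k in range(m))
            if follow > deviate + tol:
                return False
    return True

print(is_correlated_eq(sigma, CA, CB))  # True
```

For contrast, putting all mass on (Go, Go) fails the check, since Alice would rather deviate to Stop.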
Correlated equilibrium (CE), informal

A correlated equilibrium σ : S_A × S_B → [0,1] can be seen as follows.
- A mediator (third party) draws a sample x = (x_A, x_B) ∼ σ. The distribution σ is known to Alice and Bob, but not the realization x.
- The mediator gives the private recommendation x_A to Alice, and x_B to Bob. Alice and Bob do not know each other's recommendation! (The Game of Chicken is the exception to the rule: there, a player's recommendation reveals the other's.)
- Recommendations give players some information on which x was drawn.
- Each player assumes all other players play their private recommendation, i.e., Alice assumes Bob follows his recommendation (and vice versa).
- In a CE, no player has an incentive to deviate given its recommendation.
Remark. We will later see no-regret algorithms whose output is a coarse correlated equilibrium (similar algorithms exist that converge to CE). Therefore, for (C)CE, it is not always necessary that all players know the distribution σ up front, nor that there is an actual third party that samples from it.
Example. The distribution over strategy profiles is given by

  σ = ( σ_11  σ_12  σ_13 )  =  ( 0    1/8  1/8 )
      ( σ_21  σ_22  σ_23 )     ( 2/8  1/8  3/8 )

for the game

         b_1     b_2     b_3
  a_1   (0,2)   (1,0)   (2,1)
  a_2   (3,0)   (0,1)   (1,4)
Suppose Alice gets the second row a_2 as recommendation. This gives Alice a (conditional) probability distribution ρ for the column privately recommended to Bob:
- Column b_1 with probability (2/8) / (2/8 + 1/8 + 3/8) = 2/6.
- Column b_2 with probability (1/8) / (2/8 + 1/8 + 3/8) = 1/6.
- Column b_3 with probability (3/8) / (2/8 + 1/8 + 3/8) = 3/6.

Assuming distribution ρ over Bob's recommendation, the notion of CE says Alice should have no incentive to deviate to the first row a_1 (in expectation). But

  E_ρ[row a_2] = 3 · 2/6 + 0 · 1/6 + 1 · 3/6 = 9/6,
  E_ρ[row a_1] = 0 · 2/6 + 1 · 1/6 + 2 · 3/6 = 7/6.

Deviating to a_1 lowers Alice's expected cost, so σ as above is not a CE!
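The conditional computation above is mechanical and easy to script; a minimal check using exact fractions:

```python
from fractions import Fraction as F

# Alice's costs and the candidate distribution sigma for the 2x3 example.
CA = [[0, 1, 2],
      [3, 0, 1]]
sigma = [[F(0), F(1, 8), F(1, 8)],
         [F(2, 8), F(1, 8), F(3, 8)]]

# Conditional distribution rho over Bob's recommendation, given row a_2.
row = sigma[1]
rho = [p / sum(row) for p in row]
print(rho)  # [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)], i.e. 2/6, 1/6, 3/6

# Expected cost of following a_2 vs. deviating to a_1, under rho.
follow = sum(r * c for r, c in zip(rho, CA[1]))
deviate = sum(r * c for r, c in zip(rho, CA[0]))
print(follow, deviate)  # 3/2 and 7/6: deviating is cheaper, so sigma is not a CE
```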
(Coarse) correlated equilibrium

Definition (Correlated equilibrium (CE)). A distribution σ on ×_i S_i is a correlated equilibrium if for every i ∈ N and t_i ∈ S_i, and every unilateral deviation t_i' ∈ S_i, it holds that

  E_{x∼σ}[C_i(x) | x_i = t_i] ≤ E_{x∼σ}[C_i(t_i', x_{-i}) | x_i = t_i].

The set-up for a coarse correlated equilibrium is similar, but you do not get a private recommendation from the mediator.

Definition (Coarse correlated equilibrium (CCE)). A distribution σ on ×_i S_i is a coarse correlated equilibrium if for every i ∈ N, and every unilateral deviation t_i' ∈ S_i, it holds that

  E_{x∼σ}[C_i(x)] ≤ E_{x∼σ}[C_i(t_i', x_{-i})].

Exercise: Prove that every CE is also a CCE.
(Hint: Use the law of total expectation, i.e., E[X] = E[E[X | Y]].)
Final remark. Remember that an MNE is a pair of mixed strategies (x, y) that yields a product distribution σ over strategy profiles.

MNE through the lens of CE
- MNE is a special case of CE, where the recommendation of the mediator gives no extra information.
- The conditional distribution ρ that Alice constructs for Bob's private recommendation is the same for every row recommended to her: it is just the mixed strategy y of Bob!
- That is, the recommendation is not relevant for Alice.

Exercise: Check this yourself!
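For a product distribution the check is immediate; a small sketch (the mixed strategies are arbitrary illustrative choices):

```python
from fractions import Fraction as F

# Arbitrary mixed strategies x (Alice) and y (Bob); sigma_{kl} = x_k * y_l.
x = [F(1, 4), F(3, 4)]
y = [F(1, 2), F(3, 10), F(1, 5)]
sigma = [[xk * yl for yl in y] for xk in x]

# Conditional distribution over Bob's recommendation, given Alice's row k.
def conditional(sigma, k):
    row = sigma[k]
    return [p / sum(row) for p in row]

# For a product distribution, every row yields the same conditional: y itself.
print(all(conditional(sigma, k) == y for k in range(len(x))))  # True
```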
Computation of correlated equilibrium

Linear program for computing CE. Once again, linear programming comes to the rescue...

Theorem. For a given finite game Γ, there is a linear program that computes a correlated equilibrium σ : ×_i S_i → [0,1] in time polynomial in |×_i S_i| and the input size of the cost functions.

- The LP has one variable σ_s for every strategy profile s ∈ ×_i S_i.
- This is a polynomial number of variables if the number of players |N| is assumed to be constant.
- For two-player games, note that |S_A × S_B| = mn.
- The conditions in the definition of CE can be modeled as a linear program.
- We will do the two-player case, and focus on Alice.
Definition (Correlated equilibrium (for Alice)). A distribution σ on S_A × S_B is a correlated equilibrium if for every "recommendation" a_k ∈ S_A and deviation a_{k'} ∈ S_A it holds, with x = (x_A, x_B), that

  E_{x∼σ}[C_A(x_A, x_B) | x_A = a_k] ≤ E_{x∼σ}[C_A(a_{k'}, x_B) | x_A = a_k].   (1)

The LP will have variables σ_{kℓ} for k = 1,…,m and ℓ = 1,…,n.
Linear constraints for Alice. Fix a "recommended row" a_k ∈ S_A and a "deviation" a_{k'} ∈ S_A. Now,

  E_{x∼σ}[C_A(x_A, x_B) | x_A = a_k]
    = Σ_{ℓ=1,…,n} C_A(a_k, b_ℓ) · P[x = (a_k, b_ℓ) | x_A = a_k]
    = Σ_{ℓ=1,…,n} C_A(a_k, b_ℓ) · σ_{kℓ} / (Σ_r σ_{kr})
    = (1 / Σ_r σ_{kr}) · Σ_{ℓ=1,…,n} C_A(a_k, b_ℓ) σ_{kℓ}

and

  E_{x∼σ}[C_A(a_{k'}, x_B) | x_A = a_k]
    = Σ_{ℓ=1,…,n} C_A(a_{k'}, b_ℓ) · P[x = (a_k, b_ℓ) | x_A = a_k]
    = (1 / Σ_r σ_{kr}) · Σ_{ℓ=1,…,n} C_A(a_{k'}, b_ℓ) σ_{kℓ}.
The conditions in (1) for Alice and Bob are therefore equivalent to

  Σ_{ℓ=1,…,n} C_A(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{ℓ=1,…,n} C_A(a_{k'}, b_ℓ) σ_{kℓ}   ∀ a_k, a_{k'} ∈ S_A
  Σ_{k=1,…,m} C_B(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{k=1,…,m} C_B(a_k, b_{ℓ'}) σ_{kℓ}   ∀ b_ℓ, b_{ℓ'} ∈ S_B.

Note that these are linear constraints in the variables σ_{kℓ}. The linear program is now as follows:

  max 0
  s.t. Σ_{ℓ=1,…,n} C_A(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{ℓ=1,…,n} C_A(a_{k'}, b_ℓ) σ_{kℓ}   ∀ a_k, a_{k'} ∈ S_A
       Σ_{k=1,…,m} C_B(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{k=1,…,m} C_B(a_k, b_{ℓ'}) σ_{kℓ}   ∀ b_ℓ, b_{ℓ'} ∈ S_B
       Σ_{k,ℓ} σ_{kℓ} = 1
       σ_{kℓ} ≥ 0   ∀ a_k ∈ S_A, b_ℓ ∈ S_B
This is a feasibility LP, i.e., the goal is to find a feasible solution of the linear system above.
- We know at least one solution exists by Nash's theorem: remember that every MNE is also a CE.
- Why not use the LP for computing an MNE? We would need an additional non-linear constraint enforcing that σ is a product distribution.
For a general finite game Γ = (N, (S_i), (C_i)), the linear program is as follows:

  max 0
  s.t. Σ_{s_{-i} ∈ S_{-i}} C_i(s_i, s_{-i}) σ(s_i, s_{-i}) ≤ Σ_{s_{-i} ∈ S_{-i}} C_i(s_i', s_{-i}) σ(s_i, s_{-i})   ∀ i ∈ N and s_i, s_i' ∈ S_i
       Σ_{s ∈ ×_i S_i} σ(s) = 1
       σ(s) ≥ 0   ∀ s ∈ ×_i S_i.

We use the notation S_{-i} = S_1 × ⋯ × S_{i−1} × S_{i+1} × ⋯ × S_{|N|}, and s_{-i} = (s_1,…,s_{i−1}, s_{i+1},…,s_{|N|}) ∈ S_{-i}.
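As an illustration, the two-player version of this feasibility LP can be handed to an off-the-shelf solver. The sketch below assumes SciPy is available and uses the Game of Chicken as input; which CE the solver returns depends on its internals.

```python
import numpy as np
from scipy.optimize import linprog

# Cost matrices for the Game of Chicken (rows/cols: 0 = Stop, 1 = Go).
CA = np.array([[0, 3], [-1, 4]], dtype=float)  # Alice
CB = np.array([[0, -1], [3, 4]], dtype=float)  # Bob
m, n = CA.shape

# Variables sigma_{kl}, flattened; the index of sigma_{kl} is k*n + l.
A_ub, b_ub = [], []
for k in range(m):          # Alice: recommended row k, deviation k2
    for k2 in range(m):
        row = np.zeros(m * n)
        for l in range(n):
            row[k * n + l] = CA[k, l] - CA[k2, l]
        A_ub.append(row)
        b_ub.append(0.0)
for l in range(n):          # Bob: recommended column l, deviation l2
    for l2 in range(n):
        row = np.zeros(m * n)
        for k in range(m):
            row[k * n + l] = CB[k, l] - CB[k, l2]
        A_ub.append(row)
        b_ub.append(0.0)

A_eq = [np.ones(m * n)]     # probabilities sum to one
res = linprog(c=np.zeros(m * n), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=[1.0], bounds=[(0, 1)] * (m * n))
print(res.success, res.x.reshape(m, n))  # a CE of the Game of Chicken
```

The objective is the constant zero, matching the feasibility LP above; any feasible point returned by the solver is a correlated equilibrium.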
No-regret dynamics

The model. Alice, with strategy set S_A = {a_1,…,a_m}, plays a "game" against an adversary.
- The adversary will be used to represent the other players later on.
- (Looking ahead: players will converge to a CCE.)

The game dynamics. The game is repeated for T rounds. In every round t = 1,…,T:
- Alice picks a probability distribution p^{(t)} = (p_1^{(t)},…,p_m^{(t)}) over {a_1,…,a_m}.
- The adversary picks a cost vector c^{(t)} : {a_1,…,a_m} → [0,1].
- Strategy a^{(t)} is drawn according to p^{(t)}.
- Alice incurs cost c^{(t)}(a^{(t)}) and gets to know the cost vector c^{(t)}.

The goal of Alice is to minimize the average cost

  (1/T) Σ_{t=1}^T c^{(t)}(a^{(t)})

against a benchmark. What should the benchmark be?
Best choices in hindsight

It would be natural to compare against the best choices in hindsight, i.e.,

  (1/T) Σ_{t=1}^T min_{a ∈ S_A} c^{(t)}(a).

- This is Alice's cost if she would have put all probability mass on the strategy minimizing the cost vector c^{(t)}, in every step t.
- Said differently, Alice's best choice if the adversary would have to announce the cost vector first.

The "regret" of Alice, for a given realization a^{(1)},…,a^{(T)}, would then be defined as

  α(T) = (1/T) ( Σ_{t=1}^T c^{(t)}(a^{(t)}) − Σ_{t=1}^T min_{a ∈ S_A} c^{(t)}(a) ).

Alice has no (or vanishing) regret if, in expectation, α(T) → 0 when T → ∞.

We next illustrate that, under the definition α(T), vanishing regret cannot be achieved. (We will give an alternative definition afterwards.)
Example. Suppose Alice has two actions a and b. In every round, when Alice chooses p^{(t)} = (p_a^{(t)}, p_b^{(t)}), the adversary sets

  c^{(t)} = (c^{(t)}(a), c^{(t)}(b)) = (1, 0) if p_a^{(t)} ≥ 1/2, and (0, 1) if p_b^{(t)} > 1/2.

- Alice's expected cost in round t is at least 1/2.
- The best choice in hindsight gives cost zero in round t.
- Hence the expected regret α(T) is at least 1/2 for every T.

Is there another "sensible" regret definition yielding non-trivial results?
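This adaptive adversary is easy to simulate in expectation. The sketch below lets Alice play the uniform distribution every round for concreteness (any choice of p^{(t)} suffers similarly):

```python
# Adaptive adversary from the example: two actions, a (index 0) and b (index 1).
def adversary(p):
    # Cost vector (1,0) if p_a >= 1/2, else (0,1).
    return (1.0, 0.0) if p[0] >= 0.5 else (0.0, 1.0)

T = 1000
total_expected_cost = 0.0
total_hindsight_best = 0.0
for t in range(T):
    p = (0.5, 0.5)                      # Alice's distribution this round
    c = adversary(p)
    total_expected_cost += p[0] * c[0] + p[1] * c[1]
    total_hindsight_best += min(c)      # best choice in hindsight costs zero

alpha = (total_expected_cost - total_hindsight_best) / T
print(alpha)  # 0.5 -- the expected regret never vanishes
```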
Best fixed strategy in hindsight

Another possibility is to compare with the best fixed strategy in hindsight, i.e.,

  min_{a ∈ S_A} (1/T) Σ_{t=1}^T c^{(t)}(a).

- We interchange "minimum" and "summation".
- This is Alice's cost if she would have been allowed to switch to the (fixed) strategy a in every step.
- This is still w.r.t. the adversarial cost vectors chosen by the adversary based on the probability distributions p^{(t)}.

Definition (Regret). For given probability distributions p^{(1)},…,p^{(T)} and adversarial cost vectors c^{(1)},…,c^{(T)}, the (time-averaged) regret of Alice is defined as

  ρ(T) = (1/T) ( Σ_{t=1}^T c^{(t)}(a^{(t)}) − min_{a ∈ S_A} Σ_{t=1}^T c^{(t)}(a) ),

where a^{(t)} is sampled according to distribution p^{(t)}. Alice has no regret (w.r.t. the chosen distributions) if ρ(T) → 0 when T → ∞, in expectation.
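The definition translates directly into code; a minimal helper (the example data is illustrative):

```python
def time_averaged_regret(costs, actions):
    """costs: list of per-round cost vectors c^(t); actions: realized a^(t)."""
    T = len(costs)
    incurred = sum(c[a] for c, a in zip(costs, actions))
    m = len(costs[0])
    best_fixed = min(sum(c[a] for c in costs) for a in range(m))
    return (incurred - best_fixed) / T

# Tiny example: two actions, three rounds, Alice always plays action 0.
costs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
actions = [0, 0, 0]
print(time_averaged_regret(costs, actions))  # (2 - 1) / 3
```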
More generally, "Alice" is called an online decision making algorithm.
- Such an algorithm can use the cost vectors c^{(1)},…,c^{(t)}, the distributions p^{(1)},…,p^{(t)}, and the realizations a^{(1)},…,a^{(t)}, to define the distribution p^{(t+1)}.
- The adversary can use the same information, including the chosen p^{(t+1)}, to define the adversarial cost vector c^{(t+1)}.

Theorem. There exist no-regret (online decision making) algorithms with

  ρ(T) = O(√(log(m)/T)),

where m is the number of strategies.

- T = O(log(m)/ε²) steps are enough to get the regret below ε.
- We will later see the Multiplicative Weights (MW) algorithm achieving this.
No-regret dynamics: convergence to (approximate) CCE

Let Γ = (N, (S_i), (C_i)), with C_i : ×_j S_j → [0,1], and assume every i ∈ N is equipped with a no-regret algorithm A_i.
- At this point, consider the A_i as "black-box" algorithms.
- We write m_i = |S_i| for the number of strategies of player i ∈ N.

No-regret (player) dynamics. In every round t = 1,…,T, every player i ∈ N does the following:
- Use A_i to compute a probability distribution p_i^{(t)} = (p_{i,1}^{(t)},…,p_{i,m_i}^{(t)}) over S_i.
- The adversarial cost vector c_i^{(t)} : S_i → [0,1] is defined as

    c_i^{(t)}(a) = E_{s_{-i}^{(t)} ∼ σ_{-i}^{(t)}}[ C_i(a, s_{-i}^{(t)}) ]   ∀ a ∈ S_i,

  where σ_{-i}^{(t)} is the product distribution formed by the p_j^{(t)} with j ∈ N \ {i}.
- Strategy a^{(t)} ∼ p_i^{(t)} is drawn, and player i incurs the corresponding cost.

That is, σ_{-i}^{(t)} : S_{-i} → [0,1] is the probability distribution given by σ_{-i}^{(t)}(s_{-i}) = Π_{j≠i} p_{j,s_j}^{(t)}, where s_{-i} = (s_1,…,s_{i−1}, s_{i+1},…,s_{|N|}) ∈ S_{-i}. Remember S_{-i} = S_1 × ⋯ × S_{i−1} × S_{i+1} × ⋯ × S_{|N|}.
Set σ^{(t)} = Π_j p_j^{(t)} and let σ_T = (1/T) Σ_{t=1}^T σ^{(t)} be the time average of all product distributions obtained in steps t = 1,…,T.

Theorem. The time average σ_T is a ρ_i(T)-approximate CCE, i.e., it satisfies

  E_{s∼σ_T}[C_i(s)] ≤ E_{s∼σ_T}[C_i(s_i', s_{-i})] + ρ_i(T)

for every i ∈ N and every fixed s_i' ∈ S_i.
Proof (sketch): First note that the expected cost E_{t_i ∼ p_i^{(t)}}[c_i^{(t)}(t_i)] incurred by player i in round t boils down to E_{s∼σ^{(t)}}[C_i(s)]. Then

  E_{s∼σ_T}[C_i(s)]
    = (1/T) Σ_{t=1}^T E_{a ∼ p_i^{(t)}}[c_i^{(t)}(a)]                                          (time average)
    = min_{s_i ∈ S_i} (1/T) Σ_{t=1}^T c_i^{(t)}(s_i) + ρ_i(T)                                  (definition of ρ_i(T))
    = min_{s_i ∈ S_i} (1/T) Σ_{t=1}^T E_{s_{-i}^{(t)} ∼ σ_{-i}^{(t)}}[C_i(s_i, s_{-i}^{(t)})] + ρ_i(T)   (definition of c_i^{(t)})
    ≤ (1/T) Σ_{t=1}^T E_{s^{(t)} ∼ σ^{(t)}}[C_i(s_i', s_{-i}^{(t)})] + ρ_i(T)                  (plugging in fixed s_i')
    = E_{s∼σ_T}[C_i(s_i', s_{-i})] + ρ_i(T).                                                   (time average)
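This convergence can also be observed empirically. The sketch below runs the dynamics in the Game of Chicken (costs rescaled to [0,1] via (c+1)/5), using the Multiplicative Weights update introduced later in the lecture as each player's no-regret algorithm A_i, with expected costs in place of sampled ones, and checks the approximate-CCE inequality for both players.

```python
import math

# Game of Chicken, costs rescaled from [-1, 4] to [0, 1] via (c + 1) / 5.
CA = [[(c + 1) / 5 for c in row] for row in [[0, 3], [-1, 4]]]
CB = [[(c + 1) / 5 for c in row] for row in [[0, -1], [3, 4]]]

T = 10_000
eta = math.sqrt(math.log(2) / T)
w = [[1.0, 1.0], [1.0, 1.0]]           # weights for Alice (0) and Bob (1)
avg = [[0.0, 0.0], [0.0, 0.0]]         # running average of sigma^(t)

for t in range(T):
    p = [[wi / sum(pw) for wi in pw] for pw in w]
    for k in range(2):
        for l in range(2):
            avg[k][l] += p[0][k] * p[1][l] / T
    # Expected cost vectors against the opponent's current mixed strategy.
    cA = [sum(CA[k][l] * p[1][l] for l in range(2)) for k in range(2)]
    cB = [sum(CB[k][l] * p[0][k] for k in range(2)) for l in range(2)]
    # Multiplicative Weights update: w <- (1 - eta)^cost * w.
    w[0] = [w[0][k] * (1 - eta) ** cA[k] for k in range(2)]
    w[1] = [w[1][l] * (1 - eta) ** cB[l] for l in range(2)]

# Approximate-CCE check with the regret bound 2 * sqrt(log 2 / T).
bound = 2 * math.sqrt(math.log(2) / T)
for C, player in ((CA, "Alice"), (CB, "Bob")):
    cost = sum(avg[k][l] * C[k][l] for k in range(2) for l in range(2))
    if player == "Alice":
        devs = [sum(avg[k][l] * C[k2][l] for k in range(2) for l in range(2))
                for k2 in range(2)]
    else:
        devs = [sum(avg[k][l] * C[k][l2] for k in range(2) for l in range(2))
                for l2 in range(2)]
    assert all(cost <= d + bound for d in devs)
print("approximate CCE check passed")
```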
No-regret dynamics: Multiplicative Weights algorithm
We next give the promised MW algorithm that can be used for the A_i, and that has the no-regret property. That is, in expectation,

  ρ_i(T) = (1/T) ( Σ_{t=1}^T c_i^{(t)}(a^{(t)}) − min_{a ∈ S_i} Σ_{t=1}^T c_i^{(t)}(a) ) → 0,

where a^{(t)} ∼ p_i^{(t)}.
Multiplicative Weights (MW) algorithm

Fix player i ∈ N. The MW algorithm maintains a weight w_a^{(t)} for every a ∈ S_i and chooses the distribution for round t as

  p_{i,a}^{(t)} = w_a^{(t)} / Σ_{r ∈ S_i} w_r^{(t)}.

Weight update procedure
- Given is an input parameter η ∈ (0, 1/2].
- Initial weights are set at w_a^{(1)} = 1 for a ∈ S_i (uniform distribution over S_i).
- For rounds t = 1,…,T: after seeing the cost vector c_i^{(t)}, the weights for round t+1 are defined as

    w_a^{(t+1)} = (1 − η)^{c_i^{(t)}(a)} · w_a^{(t)}.

High-cost strategies get a smaller (relative) weight in the next round.

Theorem (Littlestone and Warmuth, 1994). The MW algorithm, with η = √(log(m_i)/T), has regret ρ_i(T) ≤ 2√(log(m_i)/T).
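A compact implementation of the update, run here against a sequence of random cost vectors in [0,1] (the adversary, horizon, and number of actions are illustrative choices); the theorem's bound is checked for the algorithm's expected cost.

```python
import math
import random

def multiplicative_weights(cost_vectors):
    """Run MW on a fixed sequence of cost vectors in [0,1]^m.
    Returns the time-averaged regret of the algorithm's expected cost."""
    T, m = len(cost_vectors), len(cost_vectors[0])
    eta = math.sqrt(math.log(m) / T)
    w = [1.0] * m
    expected_cost = 0.0
    for c in cost_vectors:
        total = sum(w)
        p = [wi / total for wi in w]                    # p_a = w_a / sum_r w_r
        expected_cost += sum(pi * ci for pi, ci in zip(p, c))
        w = [wi * (1 - eta) ** ci for wi, ci in zip(w, c)]  # (1-eta)^cost update
    best_fixed = min(sum(c[a] for c in cost_vectors) for a in range(m))
    return (expected_cost - best_fixed) / T

random.seed(0)
T, m = 10_000, 4
costs = [[random.random() for _ in range(m)] for _ in range(T)]
regret = multiplicative_weights(costs)
print(regret, "<=", 2 * math.sqrt(math.log(m) / T))
```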
Overview
Hierarchy of equilibrium concepts
- PNE: exists in any congestion game.
- MNE: exists in any finite game, but hard to compute.
- CE: computationally tractable.
- CCE: easily computable with no-regret dynamics.
Final remarks
- CE can also be obtained through certain player dynamics; see, e.g., Chapter 18 of [R2016].
- Recall that the PoA bounds we derived for PNE extend to CCE by means of the smoothness framework.