Topics in Algorithmic Game Theory and Economics
Pieter Kleer
Max Planck Institute for Informatics Saarland Informatics Campus
December 16, 2020
Lecture 6
Finite games III - Computation of CE and CCE
Hierarchy of equilibrium concepts
A finite (cost minimization) game Γ = (N, (S_i)_{i∈N}, (C_i)_{i∈N}) consists of:
- A finite set N of players.
- A finite strategy set S_i for every player i ∈ N.
- A cost function C_i : ×_j S_j → R for every i ∈ N.
- PNE: exists in any congestion game.
- MNE: exists in any finite game, but hard to compute.
- CE: computationally tractable.
- CCE.
Two-player games with mixed strategies (recap)
A two-player game (A, B) is given by matrices A, B ∈ R^{m×n}.
Row player Alice and column player Bob independently choose strategies x ∈ Δ_A and y ∈ Δ_B.
This gives the product distribution σ_{x,y} : S_A × S_B → [0,1] over strategy profiles:

  σ_{x,y}(a_k, b_ℓ) = σ_{kℓ} = x_k y_ℓ   for k = 1,…,m and ℓ = 1,…,n.
Example. Consider the game

         b_1     b_2     b_3
  a_1   (0,2)   (1,0)   (2,1)
  a_2   (3,0)   (0,1)   (1,4)

The distribution over strategy profiles is given by

  ( x_1 y_1   x_1 y_2   x_1 y_3 )
  ( x_2 y_1   x_2 y_2   x_2 y_3 )
Then the expected cost (for Alice) C_A(σ_{x,y}) is

  x^T A y = E_{(a_k,b_ℓ) ∼ σ_{x,y}}[C_A(a_k, b_ℓ)] = Σ_{(a_k,b_ℓ) ∈ S_A × S_B} σ_{kℓ} C_A(a_k, b_ℓ).

Remember that A_{kℓ} = C_A(a_k, b_ℓ).
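As a quick sanity check, the identity x^T A y = Σ_{k,ℓ} σ_{kℓ} A_{kℓ} can be verified numerically for the 2×3 example game; the mixed strategies below are arbitrary illustrative choices, not part of the example.

```python
# Alice's cost matrix A for the 2x3 example game (A_kl = C_A(a_k, b_l)).
A = [[0, 1, 2],
     [3, 0, 1]]

# Arbitrary mixed strategies x in Delta_A, y in Delta_B (illustrative).
x = [0.25, 0.75]
y = [0.5, 0.3, 0.2]

# Expected cost via x^T A y.
xAy = sum(x[k] * A[k][l] * y[l] for k in range(2) for l in range(3))

# Expected cost via the product distribution sigma_{kl} = x_k * y_l.
expected = sum((x[k] * y[l]) * A[k][l] for k in range(2) for l in range(3))

print(xAy, expected)  # both equal 1.45, Alice's expected cost
```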
Beyond mixed strategies

Equilibrium concepts as distributions over S_A × S_B. We have seen the following equilibrium concepts:
- PNE: a strategy profile s = (s_A, s_B) ∈ S_A × S_B. Gives the indicator distribution σ over S_A × S_B with σ(t) = 1 if t = s, and σ(t) = 0 if t ≠ s.
- MNE: mixed strategies (x, y) with x ∈ Δ_A, y ∈ Δ_B. Gives the product distribution σ over S_A × S_B, where σ(a_k, b_ℓ) = σ_{kℓ} = x_k y_ℓ.
- (C)CE: a (coarse) correlated equilibrium will be defined as a general distribution σ over S_A × S_B, i.e., not induced by specific player actions.
Game of Chicken
Alice and Bob both approach an intersection.

                  Bob
               Stop      Go
  Alice Stop  (0,0)    (3,−1)
        Go   (−1,3)    (4,4)

Two PNEs: (Stop, Go) and (Go, Stop).
One MNE: both players randomize uniformly over Stop and Go.

The distributions over strategy profiles (a, b) for these equilibria are

  ( 0  1 )    ( 0  0 )         ( 1/4  1/4 )
  ( 0  0 ),   ( 1  0 ),  and   ( 1/4  1/4 ).
A sensible ‘equilibrium’ would be the strategy profile distribution

  σ = ( 0    1/2 )
      ( 1/2   0  ).

This cannot be achieved as a mixed equilibrium: there are no x ∈ Δ_A, y ∈ Δ_B such that σ_{kℓ} = x_k y_ℓ for all k, ℓ ∈ {1, 2}.
The idea is to introduce a traffic light (mediator, or trusted third party):
- The traffic light samples/draws one of the two strategy profiles from the distribution.
- It gives the realization as a recommendation to the players, i.e., tells Alice to go and Bob to stop (or vice versa).
- Conditioned on this recommendation, the best thing for a player to do is to follow it.
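The last claim can be checked numerically: under the traffic-light distribution, a player who receives a recommendation never lowers their conditional expected cost by deviating. A minimal sketch of such a check (the helper function below is illustrative, not from the lecture):

```python
# Cost matrices for the Game of Chicken (rows/cols: 0 = Stop, 1 = Go).
CA = [[0, 3], [-1, 4]]   # Alice's costs
CB = [[0, -1], [3, 4]]   # Bob's costs

# Traffic-light distribution: (Stop, Go) and (Go, Stop) each w.p. 1/2.
sigma = [[0.0, 0.5], [0.5, 0.0]]

def is_correlated_eq(sigma, CA, CB, tol=1e-9):
    m, n = len(sigma), len(sigma[0])
    # Alice: for every recommended row k with positive probability,
    # following must be (weakly) better than any deviation k2.
    for k in range(m):
        if sum(sigma[k]) == 0:
            continue
        for k2 in range(m):
            follow = sum(CA[k][l] * sigma[k][l] for l in range(n))
            deviate = sum(CA[k2][l] * sigma[k][l] for l in range(n))
            if follow > deviate + tol:
                return False
    # Bob: the symmetric check over recommended columns.
    for l in range(n):
        col = [sigma[k][l] for k in range(m)]
        if sum(col) == 0:
            continue
        for l2 in range(n):
            follow = sum(CB[k][l] * col[k] for k in range(m))
            deviate = sum(CB[k][l2] * col[k] for k in range(m))
            if follow > deviate + tol:
                return False
    return True

print(is_correlated_eq(sigma, CA, CB))  # True
```

For contrast, putting all mass on (Go, Go) fails the check, since Alice would rather deviate to Stop.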
Correlated equilibrium (CE), informal

A correlated equilibrium σ : S_A × S_B → [0,1] can be seen as follows.
- A mediator (third party) draws a sample x = (x_A, x_B) ∼ σ. The distribution σ is known to Alice and Bob, but not the realization x.
- The mediator gives the private recommendation x_A to Alice, and x_B to Bob. Alice and Bob do not know each other's recommendation! (The Game of Chicken is the exception to the rule: there, a player's recommendation reveals the other's.)
- Recommendations give players some information on which x was drawn.
- Each player assumes all other players play their private recommendation, i.e., Alice assumes Bob follows his recommendation (and vice versa).
- In a CE, no player has an incentive to deviate given its recommendation.
Remark. We will later see no-regret algorithms whose output is a coarse correlated equilibrium (similar algorithms exist that converge to CE). Therefore, for (C)CE, it is not always necessary that all players know the distribution σ up front, nor that there is an actual third party that samples from it.
Example. The distribution over strategy profiles is given by

  σ = ( σ_11  σ_12  σ_13 )  =  ( 0    1/8  1/8 )
      ( σ_21  σ_22  σ_23 )     ( 2/8  1/8  3/8 )

for the game

         b_1     b_2     b_3
  a_1   (0,2)   (1,0)   (2,1)
  a_2   (3,0)   (0,1)   (1,4)
Suppose Alice gets the second row a_2 as recommendation. This gives Alice a (conditional) probability distribution ρ for the column privately recommended to Bob:
- Column b_1 with probability (2/8) / (2/8 + 1/8 + 3/8) = 2/6.
- Column b_2 with probability (1/8) / (2/8 + 1/8 + 3/8) = 1/6.
- Column b_3 with probability (3/8) / (2/8 + 1/8 + 3/8) = 3/6.

Assuming distribution ρ over Bob's recommendation, the notion of CE says Alice should have no incentive to deviate to the first row a_1 (in expectation). But

  E_ρ[row a_2] = 3 · 2/6 + 0 · 1/6 + 1 · 3/6 = 9/6,
  E_ρ[row a_1] = 0 · 2/6 + 1 · 1/6 + 2 · 3/6 = 7/6.

Deviating to a_1 lowers Alice's expected cost, so σ as above is not a CE!
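The conditional computation above is mechanical and easy to script; a minimal check using exact fractions:

```python
from fractions import Fraction as F

# Alice's costs and the candidate distribution sigma for the 2x3 example.
CA = [[0, 1, 2],
      [3, 0, 1]]
sigma = [[F(0), F(1, 8), F(1, 8)],
         [F(2, 8), F(1, 8), F(3, 8)]]

# Conditional distribution rho over Bob's recommendation, given row a_2.
row = sigma[1]
rho = [p / sum(row) for p in row]
print(rho)  # [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)], i.e. 2/6, 1/6, 3/6

# Expected cost of following a_2 vs. deviating to a_1, under rho.
follow = sum(r * c for r, c in zip(rho, CA[1]))
deviate = sum(r * c for r, c in zip(rho, CA[0]))
print(follow, deviate)  # 3/2 and 7/6: deviating is cheaper, so sigma is not a CE
```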
(Coarse) correlated equilibrium

Definition (Correlated equilibrium (CE)). A distribution σ on ×_i S_i is a correlated equilibrium if for every i ∈ N and t_i ∈ S_i, and every unilateral deviation t_i' ∈ S_i, it holds that

  E_{x∼σ}[C_i(x) | x_i = t_i] ≤ E_{x∼σ}[C_i(t_i', x_{-i}) | x_i = t_i].

The set-up for a coarse correlated equilibrium is similar, but you do not get a private recommendation from the mediator.

Definition (Coarse correlated equilibrium (CCE)). A distribution σ on ×_i S_i is a coarse correlated equilibrium if for every i ∈ N, and every unilateral deviation t_i' ∈ S_i, it holds that

  E_{x∼σ}[C_i(x)] ≤ E_{x∼σ}[C_i(t_i', x_{-i})].

Exercise: Prove that every CE is also a CCE.
(Hint: Use the law of total expectation, i.e., E[X] = E[E[X | Y]].)
Final remark. Remember that an MNE is a pair of mixed strategies (x, y) that yields a product distribution σ over strategy profiles.

MNE through the lens of CE
- MNE is a special case of CE, where the recommendation of the mediator gives no extra information.
- The conditional distribution ρ that Alice constructs for Bob's private recommendation is the same for every row recommended to her: it is just the mixed strategy y of Bob!
- That is, the recommendation is not relevant for Alice.

Exercise: Check this yourself!
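For a product distribution the check is immediate; a small sketch (the mixed strategies are arbitrary illustrative choices):

```python
from fractions import Fraction as F

# Arbitrary mixed strategies x (Alice) and y (Bob); sigma_{kl} = x_k * y_l.
x = [F(1, 4), F(3, 4)]
y = [F(1, 2), F(3, 10), F(1, 5)]
sigma = [[xk * yl for yl in y] for xk in x]

# Conditional distribution over Bob's recommendation, given Alice's row k.
def conditional(sigma, k):
    row = sigma[k]
    return [p / sum(row) for p in row]

# For a product distribution, every row yields the same conditional: y itself.
print(all(conditional(sigma, k) == y for k in range(len(x))))  # True
```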
Computation of correlated equilibrium

Linear program for computing CE. Once again, linear programming comes to the rescue...

Theorem. For a given finite game Γ, there is a linear program that computes a correlated equilibrium σ : ×_i S_i → [0,1] in time polynomial in |×_i S_i| and the input size of the cost functions.

- The LP has one variable σ_s for every strategy profile s ∈ ×_i S_i.
- This is a polynomial number of variables if the number of players |N| is assumed to be constant.
- For two-player games, note that |S_A × S_B| = mn.
- The conditions in the definition of CE can be modeled as a linear program.
- We will do the two-player case, and focus on Alice.
Definition (Correlated equilibrium (for Alice)). A distribution σ on S_A × S_B is a correlated equilibrium if for every "recommendation" a_k ∈ S_A and deviation a_{k'} ∈ S_A it holds, with x = (x_A, x_B), that

  E_{x∼σ}[C_A(x_A, x_B) | x_A = a_k] ≤ E_{x∼σ}[C_A(a_{k'}, x_B) | x_A = a_k].   (1)

The LP will have variables σ_{kℓ} for k = 1,…,m and ℓ = 1,…,n.
Linear constraints for Alice. Fix a "recommended row" a_k ∈ S_A and a "deviation" a_{k'} ∈ S_A. Now,

  E_{x∼σ}[C_A(x_A, x_B) | x_A = a_k]
    = Σ_{ℓ=1,…,n} C_A(a_k, b_ℓ) · P[x = (a_k, b_ℓ) | x_A = a_k]
    = Σ_{ℓ=1,…,n} C_A(a_k, b_ℓ) · σ_{kℓ} / (Σ_r σ_{kr})
    = (1 / Σ_r σ_{kr}) · Σ_{ℓ=1,…,n} C_A(a_k, b_ℓ) σ_{kℓ}

and

  E_{x∼σ}[C_A(a_{k'}, x_B) | x_A = a_k]
    = Σ_{ℓ=1,…,n} C_A(a_{k'}, b_ℓ) · P[x = (a_k, b_ℓ) | x_A = a_k]
    = (1 / Σ_r σ_{kr}) · Σ_{ℓ=1,…,n} C_A(a_{k'}, b_ℓ) σ_{kℓ}.
The conditions in (1) for Alice and Bob are therefore equivalent to

  Σ_{ℓ=1,…,n} C_A(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{ℓ=1,…,n} C_A(a_{k'}, b_ℓ) σ_{kℓ}   ∀ a_k, a_{k'} ∈ S_A
  Σ_{k=1,…,m} C_B(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{k=1,…,m} C_B(a_k, b_{ℓ'}) σ_{kℓ}   ∀ b_ℓ, b_{ℓ'} ∈ S_B.

Note that these are linear constraints in the variables σ_{kℓ}. The linear program is now as follows:

  max 0
  s.t. Σ_{ℓ=1,…,n} C_A(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{ℓ=1,…,n} C_A(a_{k'}, b_ℓ) σ_{kℓ}   ∀ a_k, a_{k'} ∈ S_A
       Σ_{k=1,…,m} C_B(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{k=1,…,m} C_B(a_k, b_{ℓ'}) σ_{kℓ}   ∀ b_ℓ, b_{ℓ'} ∈ S_B
       Σ_{k,ℓ} σ_{kℓ} = 1
       σ_{kℓ} ≥ 0   ∀ a_k ∈ S_A, b_ℓ ∈ S_B
This is a feasibility LP, i.e., the goal is to find a feasible solution of the linear system above.
- We know at least one solution exists by Nash's theorem: remember that every MNE is also a CE.
- Why not use the LP for computing an MNE? We would need an additional non-linear constraint enforcing that σ is a product distribution.
For a general finite game Γ = (N, (S_i), (C_i)), the linear program is as follows:

  max 0
  s.t. Σ_{s_{-i} ∈ S_{-i}} C_i(s_i, s_{-i}) σ(s_i, s_{-i}) ≤ Σ_{s_{-i} ∈ S_{-i}} C_i(s_i', s_{-i}) σ(s_i, s_{-i})   ∀ i ∈ N and s_i, s_i' ∈ S_i
       Σ_{s ∈ ×_i S_i} σ(s) = 1
       σ(s) ≥ 0   ∀ s ∈ ×_i S_i.

We use the notation S_{-i} = S_1 × ⋯ × S_{i−1} × S_{i+1} × ⋯ × S_{|N|}, and s_{-i} = (s_1,…,s_{i−1}, s_{i+1},…,s_{|N|}) ∈ S_{-i}.
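As an illustration, the two-player version of this feasibility LP can be handed to an off-the-shelf solver. The sketch below assumes SciPy is available and uses the Game of Chicken as input; which CE the solver returns depends on its internals.

```python
import numpy as np
from scipy.optimize import linprog

# Cost matrices for the Game of Chicken (rows/cols: 0 = Stop, 1 = Go).
CA = np.array([[0, 3], [-1, 4]], dtype=float)  # Alice
CB = np.array([[0, -1], [3, 4]], dtype=float)  # Bob
m, n = CA.shape

# Variables sigma_{kl}, flattened; the index of sigma_{kl} is k*n + l.
A_ub, b_ub = [], []
for k in range(m):          # Alice: recommended row k, deviation k2
    for k2 in range(m):
        row = np.zeros(m * n)
        for l in range(n):
            row[k * n + l] = CA[k, l] - CA[k2, l]
        A_ub.append(row)
        b_ub.append(0.0)
for l in range(n):          # Bob: recommended column l, deviation l2
    for l2 in range(n):
        row = np.zeros(m * n)
        for k in range(m):
            row[k * n + l] = CB[k, l] - CB[k, l2]
        A_ub.append(row)
        b_ub.append(0.0)

A_eq = [np.ones(m * n)]     # probabilities sum to one
res = linprog(c=np.zeros(m * n), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=[1.0], bounds=[(0, 1)] * (m * n))
print(res.success, res.x.reshape(m, n))  # a CE of the Game of Chicken
```

The objective is the constant zero, matching the feasibility LP above; any feasible point returned by the solver is a correlated equilibrium.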
No-regret dynamics

The model. Alice, with strategy set S_A = {a_1,…,a_m}, plays a "game" against an adversary.
- The adversary will be used to represent the other players later on.
- (Looking ahead: players will converge to a CCE.)

The game dynamics. The game is repeated for T rounds. In every round t = 1,…,T:
- Alice picks a probability distribution p^{(t)} = (p_1^{(t)},…,p_m^{(t)}) over {a_1,…,a_m}.
- The adversary picks a cost vector c^{(t)} : {a_1,…,a_m} → [0,1].
- Strategy a^{(t)} is drawn according to p^{(t)}.
- Alice incurs cost c^{(t)}(a^{(t)}) and gets to know the cost vector c^{(t)}.

The goal of Alice is to minimize the average cost

  (1/T) Σ_{t=1}^T c^{(t)}(a^{(t)})

against a benchmark. What should the benchmark be?
Best choices in hindsight

It would be natural to compare against the best choices in hindsight, i.e.,

  (1/T) Σ_{t=1}^T min_{a ∈ S_A} c^{(t)}(a).

- This is Alice's cost if she would have put all probability mass on the strategy minimizing the cost vector c^{(t)}, in every step t.
- Said differently, Alice's best choice if the adversary would have to announce the cost vector first.

The "regret" of Alice, for a given realization a^{(1)},…,a^{(T)}, would then be defined as

  α(T) = (1/T) ( Σ_{t=1}^T c^{(t)}(a^{(t)}) − Σ_{t=1}^T min_{a ∈ S_A} c^{(t)}(a) ).

Alice has no (or vanishing) regret if, in expectation, α(T) → 0 when T → ∞.

We next illustrate that, under the definition α(T), vanishing regret cannot be achieved. (We will give an alternative definition afterwards.)
Example. Suppose Alice has two actions a and b. In every round, when Alice chooses p^{(t)} = (p_a^{(t)}, p_b^{(t)}), the adversary sets

  c^{(t)} = (c^{(t)}(a), c^{(t)}(b)) = (1, 0) if p_a^{(t)} ≥ 1/2, and (0, 1) if p_b^{(t)} > 1/2.

- Alice's expected cost in round t is at least 1/2.
- The best choice in hindsight gives cost zero in round t.
- Hence the expected regret α(T) is at least 1/2 for every T.

Is there another "sensible" regret definition yielding non-trivial results?
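This adaptive adversary is easy to simulate in expectation. The sketch below lets Alice play the uniform distribution every round for concreteness (any choice of p^{(t)} suffers similarly):

```python
# Adaptive adversary from the example: two actions, a (index 0) and b (index 1).
def adversary(p):
    # Cost vector (1,0) if p_a >= 1/2, else (0,1).
    return (1.0, 0.0) if p[0] >= 0.5 else (0.0, 1.0)

T = 1000
total_expected_cost = 0.0
total_hindsight_best = 0.0
for t in range(T):
    p = (0.5, 0.5)                      # Alice's distribution this round
    c = adversary(p)
    total_expected_cost += p[0] * c[0] + p[1] * c[1]
    total_hindsight_best += min(c)      # best choice in hindsight costs zero

alpha = (total_expected_cost - total_hindsight_best) / T
print(alpha)  # 0.5 -- the expected regret never vanishes
```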
Best fixed strategy in hindsight

Another possibility is to compare with the best fixed strategy in hindsight, i.e.,

  min_{a ∈ S_A} (1/T) Σ_{t=1}^T c^{(t)}(a).

- We interchange "minimum" and "summation".
- This is Alice's cost if she would have been allowed to switch to the (fixed) strategy a in every step.
- This is still w.r.t. the adversarial cost vectors chosen by the adversary based on the probability distributions p^{(t)}.

Definition (Regret). For given probability distributions p^{(1)},…,p^{(T)} and adversarial cost vectors c^{(1)},…,c^{(T)}, the (time-averaged) regret of Alice is defined as

  ρ(T) = (1/T) ( Σ_{t=1}^T c^{(t)}(a^{(t)}) − min_{a ∈ S_A} Σ_{t=1}^T c^{(t)}(a) ),

where a^{(t)} is sampled according to distribution p^{(t)}. Alice has no regret (w.r.t. the chosen distributions) if ρ(T) → 0 when T → ∞, in expectation.
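The definition translates directly into code; a minimal helper (the example data is illustrative):

```python
def time_averaged_regret(costs, actions):
    """costs: list of per-round cost vectors c^(t); actions: realized a^(t)."""
    T = len(costs)
    incurred = sum(c[a] for c, a in zip(costs, actions))
    m = len(costs[0])
    best_fixed = min(sum(c[a] for c in costs) for a in range(m))
    return (incurred - best_fixed) / T

# Tiny example: two actions, three rounds, Alice always plays action 0.
costs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
actions = [0, 0, 0]
print(time_averaged_regret(costs, actions))  # (2 - 1) / 3
```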
More generally, "Alice" is called an online decision making algorithm.
- Such an algorithm can use the cost vectors c^{(1)},…,c^{(t)}, the distributions p^{(1)},…,p^{(t)}, and the realizations a^{(1)},…,a^{(t)}, to define the distribution p^{(t+1)}.
- The adversary can use the same information, including the chosen p^{(t+1)}, to define the adversarial cost vector c^{(t+1)}.

Theorem. There exist no-regret (online decision making) algorithms with

  ρ(T) = O(√(log(m)/T)),

where m is the number of strategies.

- T = O(log(m)/ε²) steps are enough to get the regret below ε.
- We will later see the Multiplicative Weights (MW) algorithm achieving this.
No-regret dynamics: convergence to (approximate) CCE

Let Γ = (N, (S_i), (C_i)), with C_i : ×_j S_j → [0,1], and assume every i ∈ N is equipped with a no-regret algorithm A_i.
- At this point, consider the A_i as "black-box" algorithms.
- We write m_i = |S_i| for the number of strategies of player i ∈ N.

No-regret (player) dynamics. In every round t = 1,…,T, every player i ∈ N does the following:
- Use A_i to compute a probability distribution p_i^{(t)} = (p_{i,1}^{(t)},…,p_{i,m_i}^{(t)}) over S_i.
- The adversarial cost vector c_i^{(t)} : S_i → [0,1] is defined as

    c_i^{(t)}(a) = E_{s_{-i}^{(t)} ∼ σ_{-i}^{(t)}}[ C_i(a, s_{-i}^{(t)}) ]   ∀ a ∈ S_i,

  where σ_{-i}^{(t)} is the product distribution formed by the p_j^{(t)} with j ∈ N \ {i}.
- Strategy a^{(t)} ∼ p_i^{(t)} is drawn, and player i incurs the corresponding cost.

That is, σ_{-i}^{(t)} : S_{-i} → [0,1] is the probability distribution given by σ_{-i}^{(t)}(s_{-i}) = Π_{j≠i} p_{j,s_j}^{(t)}, where s_{-i} = (s_1,…,s_{i−1}, s_{i+1},…,s_{|N|}) ∈ S_{-i}. Remember S_{-i} = S_1 × ⋯ × S_{i−1} × S_{i+1} × ⋯ × S_{|N|}.
Set σ^{(t)} = Π_j p_j^{(t)} and let σ_T = (1/T) Σ_{t=1}^T σ^{(t)} be the time average of all product distributions obtained in steps t = 1,…,T.

Theorem. The time average σ_T is a ρ_i(T)-approximate CCE, i.e., it satisfies

  E_{s∼σ_T}[C_i(s)] ≤ E_{s∼σ_T}[C_i(s_i', s_{-i})] + ρ_i(T)

for every i ∈ N and every fixed s_i' ∈ S_i.
Proof (sketch): First note that the expected cost E_{t_i ∼ p_i^{(t)}}[c_i^{(t)}(t_i)] incurred by player i in round t boils down to E_{s∼σ^{(t)}}[C_i(s)]. Then

  E_{s∼σ_T}[C_i(s)]
    = (1/T) Σ_{t=1}^T E_{a ∼ p_i^{(t)}}[c_i^{(t)}(a)]                                          (time average)
    = min_{s_i ∈ S_i} (1/T) Σ_{t=1}^T c_i^{(t)}(s_i) + ρ_i(T)                                  (definition of ρ_i(T))
    = min_{s_i ∈ S_i} (1/T) Σ_{t=1}^T E_{s_{-i}^{(t)} ∼ σ_{-i}^{(t)}}[C_i(s_i, s_{-i}^{(t)})] + ρ_i(T)   (definition of c_i^{(t)})
    ≤ (1/T) Σ_{t=1}^T E_{s^{(t)} ∼ σ^{(t)}}[C_i(s_i', s_{-i}^{(t)})] + ρ_i(T)                  (plugging in fixed s_i')
    = E_{s∼σ_T}[C_i(s_i', s_{-i})] + ρ_i(T).                                                   (time average)
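This convergence can also be observed empirically. The sketch below runs the dynamics in the Game of Chicken (costs rescaled to [0,1] via (c+1)/5), using the Multiplicative Weights update introduced later in the lecture as each player's no-regret algorithm A_i, with expected costs in place of sampled ones, and checks the approximate-CCE inequality for both players.

```python
import math

# Game of Chicken, costs rescaled from [-1, 4] to [0, 1] via (c + 1) / 5.
CA = [[(c + 1) / 5 for c in row] for row in [[0, 3], [-1, 4]]]
CB = [[(c + 1) / 5 for c in row] for row in [[0, -1], [3, 4]]]

T = 10_000
eta = math.sqrt(math.log(2) / T)
w = [[1.0, 1.0], [1.0, 1.0]]           # weights for Alice (0) and Bob (1)
avg = [[0.0, 0.0], [0.0, 0.0]]         # running average of sigma^(t)

for t in range(T):
    p = [[wi / sum(pw) for wi in pw] for pw in w]
    for k in range(2):
        for l in range(2):
            avg[k][l] += p[0][k] * p[1][l] / T
    # Expected cost vectors against the opponent's current mixed strategy.
    cA = [sum(CA[k][l] * p[1][l] for l in range(2)) for k in range(2)]
    cB = [sum(CB[k][l] * p[0][k] for k in range(2)) for l in range(2)]
    # Multiplicative Weights update: w <- (1 - eta)^cost * w.
    w[0] = [w[0][k] * (1 - eta) ** cA[k] for k in range(2)]
    w[1] = [w[1][l] * (1 - eta) ** cB[l] for l in range(2)]

# Approximate-CCE check with the regret bound 2 * sqrt(log 2 / T).
bound = 2 * math.sqrt(math.log(2) / T)
for C, player in ((CA, "Alice"), (CB, "Bob")):
    cost = sum(avg[k][l] * C[k][l] for k in range(2) for l in range(2))
    if player == "Alice":
        devs = [sum(avg[k][l] * C[k2][l] for k in range(2) for l in range(2))
                for k2 in range(2)]
    else:
        devs = [sum(avg[k][l] * C[k][l2] for k in range(2) for l in range(2))
                for l2 in range(2)]
    assert all(cost <= d + bound for d in devs)
print("approximate CCE check passed")
```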
No-regret dynamics: Multiplicative Weights algorithm
We next give the promised MW algorithm that can be used for the A_i, and that has the no-regret property. That is, in expectation,

  ρ_i(T) = (1/T) ( Σ_{t=1}^T c_i^{(t)}(a^{(t)}) − min_{a ∈ S_i} Σ_{t=1}^T c_i^{(t)}(a) ) → 0,

where a^{(t)} ∼ p_i^{(t)}.
Multiplicative Weights (MW) algorithm

Fix player i ∈ N. The MW algorithm maintains a weight w_a^{(t)} for every a ∈ S_i and chooses the distribution for round t as

  p_{i,a}^{(t)} = w_a^{(t)} / Σ_{r ∈ S_i} w_r^{(t)}.

Weight update procedure
- Given is an input parameter η ∈ (0, 1/2].
- Initial weights are set at w_a^{(1)} = 1 for a ∈ S_i (uniform distribution over S_i).
- For rounds t = 1,…,T: after seeing the cost vector c_i^{(t)}, the weights for round t+1 are defined as

    w_a^{(t+1)} = (1 − η)^{c_i^{(t)}(a)} · w_a^{(t)}.

High-cost strategies get a smaller (relative) weight in the next round.

Theorem (Littlestone and Warmuth, 1994). The MW algorithm, with η = √(log(m_i)/T), has regret ρ_i(T) ≤ 2√(log(m_i)/T).
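A compact implementation of the update, run here against a sequence of random cost vectors in [0,1] (the adversary, horizon, and number of actions are illustrative choices); the theorem's bound is checked for the algorithm's expected cost.

```python
import math
import random

def multiplicative_weights(cost_vectors):
    """Run MW on a fixed sequence of cost vectors in [0,1]^m.
    Returns the time-averaged regret of the algorithm's expected cost."""
    T, m = len(cost_vectors), len(cost_vectors[0])
    eta = math.sqrt(math.log(m) / T)
    w = [1.0] * m
    expected_cost = 0.0
    for c in cost_vectors:
        total = sum(w)
        p = [wi / total for wi in w]                    # p_a = w_a / sum_r w_r
        expected_cost += sum(pi * ci for pi, ci in zip(p, c))
        w = [wi * (1 - eta) ** ci for wi, ci in zip(w, c)]  # (1-eta)^cost update
    best_fixed = min(sum(c[a] for c in cost_vectors) for a in range(m))
    return (expected_cost - best_fixed) / T

random.seed(0)
T, m = 10_000, 4
costs = [[random.random() for _ in range(m)] for _ in range(T)]
regret = multiplicative_weights(costs)
print(regret, "<=", 2 * math.sqrt(math.log(m) / T))
```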
Overview
Hierarchy of equilibrium concepts
- PNE: exists in any congestion game.
- MNE: exists in any finite game, but hard to compute.
- CE: computationally tractable.
- CCE: easily computable with no-regret dynamics.
Final remarks
- CE can also be obtained through certain player dynamics; see, e.g., Chapter 18 of [R2016].
- Recall that the PoA bounds we derived for PNE extend to CCE by means of the smoothness framework.