Topics in Algorithmic Game Theory and Economics

Academic year: 2021

(1)

Topics in Algorithmic Game Theory and Economics

Pieter Kleer

Max Planck Institute for Informatics Saarland Informatics Campus

December 16, 2020

Lecture 6

Finite games III - Computation of CE and CCE

1 / 32

(2)

Hierarchy of equilibrium concepts

A finite (cost minimization) game Γ = (N, (S_i)_{i∈N}, (C_i)_{i∈N}) consists of:

A finite set N of players.

A finite strategy set S_i for every player i ∈ N.

A cost function C_i : ×_j S_j → R for every i ∈ N.

PNE ⊆ MNE ⊆ CE ⊆ CCE

PNE: exists in any congestion game.

MNE: exists in any finite game, but hard to compute.

CE: computationally tractable.

CCE

(3)

Two-player games with mixed strategies (recap)

Two-player game (A, B) given by matrices A, B ∈ R^{m×n}.

Row player Alice and column player Bob independently choose strategies x ∈ Δ_A and y ∈ Δ_B.

Gives product distribution σ_{x,y} : S_A × S_B → [0,1] over strategy profiles:

σ_{x,y}(a_k, b_ℓ) = σ_{kℓ} = x_k y_ℓ   for k = 1, …, m and ℓ = 1, …, n.

Example

Cost matrix:

        b_1     b_2     b_3
a_1    (0,2)   (1,0)   (2,1)
a_2    (3,0)   (0,1)   (1,4)

Distribution over strategy profiles is given by

[ x_1 y_1   x_1 y_2   x_1 y_3 ]
[ x_2 y_1   x_2 y_2   x_2 y_3 ]

Then the expected cost (for Alice) C_A(x, y) is

x^T A y = E_{(a_k, b_ℓ) ∼ σ_{x,y}}[C_A(a_k, b_ℓ)] = Σ_{(a_k, b_ℓ) ∈ S_A × S_B} σ_{kℓ} C_A(a_k, b_ℓ).

Remember that A_{kℓ} = C_A(a_k, b_ℓ).

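As a quick numerical sanity check, the identity x^T A y = Σ_{k,ℓ} σ_{kℓ} C_A(a_k, b_ℓ) can be verified for Alice's cost matrix from the example; the mixed strategies below are illustrative values, not from the lecture.

```python
import numpy as np

# Alice's cost matrix from the example: A[k, l] = C_A(a_k, b_l).
A = np.array([[0.0, 1.0, 2.0],
              [3.0, 0.0, 1.0]])

# Illustrative mixed strategies x ∈ Δ_A, y ∈ Δ_B.
x = np.array([0.5, 0.5])
y = np.array([0.2, 0.3, 0.5])

# Expected cost via the bilinear form x^T A y.
bilinear = x @ A @ y

# Same quantity as an explicit sum over strategy profiles,
# with sigma_{kl} = x_k * y_l (product distribution).
explicit = sum(x[k] * y[l] * A[k, l] for k in range(2) for l in range(3))

print(bilinear)  # → 1.2, and it agrees with the explicit sum
```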

(4)

Beyond mixed strategies

(5)

Equilibrium concepts as distributions over S_A × S_B

We have seen the following equilibrium concepts:

PNE: Strategy profile s = (s_A, s_B) ∈ S_A × S_B. Gives indicator distribution σ over S_A × S_B with

σ(t) = 1 if t = s, and σ(t) = 0 if t ≠ s.

MNE: Mixed strategies (x, y) with x ∈ Δ_A, y ∈ Δ_B. Gives product distribution σ over S_A × S_B, where σ(a_k, b_ℓ) = σ_{kℓ} = x_k y_ℓ.

(C)CE: (Coarse) correlated equilibrium will be defined as a general distribution σ over S_A × S_B.

I.e., not induced by specific player actions.


(6)

Game of Chicken

Game of Chicken

Alice and Bob both approach an intersection.

              Bob
          Stop      Go
Alice
  Stop   (0,0)    (3,−1)
  Go     (−1,3)   (4,4)

Two PNEs: (Stop, Go), (Go, Stop).

One MNE: Both players randomize over Stop and Go.

Distributions over strategy profiles (a, b) for these equilibria are

[ 0  1 ]     [ 0  0 ]          [ 1/4  1/4 ]
[ 0  0 ] ,   [ 1  0 ]   and    [ 1/4  1/4 ] .

(7)

A sensible ‘equilibrium’ would be the strategy profile distribution

σ = [ 0    1/2 ]
    [ 1/2   0  ] .

Cannot be achieved as a mixed equilibrium.

There are no x ∈ Δ_A, y ∈ Δ_B such that σ_{kℓ} = x_k y_ℓ for all k, ℓ ∈ {1, 2}.

Idea is to introduce a traffic light (mediator or trusted third party).

Traffic light samples/draws one of the two strategy profiles from the distribution.

Gives realization as recommendation to the players.

Tells Alice to go and Bob to stop (or vice versa).

Conditioned on this recommendation, the best thing for a player to do is to follow it.

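The claim that following the traffic light is optimal can be checked mechanically. The sketch below (cost matrices transcribed from the slide; the helper name `is_ce` is my own) tests the CE inequalities for the Game of Chicken, and contrasts the traffic-light distribution with a non-equilibrium one.

```python
import numpy as np

# Chicken as a cost-minimization game (row = Alice, column = Bob).
CA = np.array([[0.0, 3.0], [-1.0, 4.0]])
CB = np.array([[0.0, -1.0], [3.0, 4.0]])

# Traffic-light distribution: 1/2 each on (Stop, Go) and (Go, Stop).
sigma = np.array([[0.0, 0.5], [0.5, 0.0]])

def is_ce(sigma, CA, CB, tol=1e-9):
    """Check the CE inequalities for a two-player distribution sigma."""
    m, n = sigma.shape
    for k in range(m):            # Alice's recommended row
        if sigma[k].sum() == 0:
            continue              # row never recommended: nothing to check
        for kp in range(m):       # candidate deviation
            if (CA[k] - CA[kp]) @ sigma[k] > tol:
                return False
    for l in range(n):            # Bob's recommended column, symmetrically
        if sigma[:, l].sum() == 0:
            continue
        for lp in range(n):
            if (CB[:, l] - CB[:, lp]) @ sigma[:, l] > tol:
                return False
    return True

sigma_go = np.array([[0.0, 0.0], [0.0, 1.0]])  # both always Go
print(is_ce(sigma, CA, CB), is_ce(sigma_go, CA, CB))  # True False
```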

(8)

Correlated equilibrium (CE), informal

A correlated equilibrium σ : S_A × S_B → [0,1] can be seen as follows.

A mediator (third party) draws a sample x = (x_A, x_B) ∼ σ.

σ is known to Alice and Bob, but not x.

Gives private recommendation x_A to Alice, and x_B to Bob.

Alice and Bob do not know each other’s recommendation! (The Game of Chicken is the exception to the rule: there, the recommendation reveals the other player's recommendation completely.)

Recommendations give players some info on which x was drawn.

Each player assumes all other players play their private recommendation, i.e., Alice assumes Bob follows his recommendation (and vice versa).

In a CE, no player has an incentive to deviate given its recommendation.

Remark

We will later see no-regret algorithms whose output is a coarse correlated equilibrium (similar algorithms exist converging to CE).

Therefore, for (C)CE, it is not always necessary that all players know the distribution σ up front, nor that there is an actual third party that samples from it.

(9)

Example

Distribution over strategy profiles is given by

σ = [ σ_11  σ_12  σ_13 ]  =  [ 0    1/8  1/8 ]
    [ σ_21  σ_22  σ_23 ]     [ 2/8  1/8  3/8 ]

with cost matrix

        b_1     b_2     b_3
a_1    (0,2)   (1,0)   (2,1)
a_2    (3,0)   (0,1)   (1,4)

Suppose Alice gets the second row a_2 as recommendation.

This gives Alice a (conditional) probability distribution ρ for the column privately recommended to Bob:

Column b_1 with probability (2/8) / (2/8 + 1/8 + 3/8) = 2/6.

Column b_2 with probability (1/8) / (2/8 + 1/8 + 3/8) = 1/6.

Column b_3 with probability (3/8) / (2/8 + 1/8 + 3/8) = 3/6.

Assuming distribution ρ over Bob’s recommendation, the notion of CE says Alice should have no incentive to deviate to the first row a_1 (in expectation). But

E_ρ[Row a_2] = 3 × 2/6 + 0 × 1/6 + 1 × 3/6 = 9/6.

E_ρ[Row a_1] = 0 × 2/6 + 1 × 1/6 + 2 × 3/6 = 7/6.

σ as above is not a CE!

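The computation on this slide can be reproduced in a few lines; this is a sketch, and the helper name is mine.

```python
import numpy as np

# Alice's costs and the candidate distribution sigma from the slide.
CA = np.array([[0.0, 1.0, 2.0],
               [3.0, 0.0, 1.0]])
sigma = np.array([[0.0, 1/8, 1/8],
                  [2/8, 1/8, 3/8]])

def conditional_row_costs(sigma, CA, k):
    """Expected cost of each row for Alice, conditioned on row k
    being her private recommendation."""
    rho = sigma[k] / sigma[k].sum()   # conditional distribution over columns
    return CA @ rho                    # entry k' = E_rho[cost of row a_{k'}]

costs = conditional_row_costs(sigma, CA, k=1)  # recommendation a_2
# costs[1] = 9/6 is the cost of obeying, costs[0] = 7/6 of deviating to a_1,
# so deviating pays off: sigma is not a CE.
print(costs)
```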

(10)

(Coarse) correlated equilibrium

Definition (Correlated equilibrium (CE))

A distribution σ on ×_i S_i is a correlated equilibrium if for every i ∈ N and t_i ∈ S_i, and every unilateral deviation t_i′ ∈ S_i, it holds that

E_{x∼σ}[C_i(x) | x_i = t_i] ≤ E_{x∼σ}[C_i(t_i′, x_{−i}) | x_i = t_i].

The set-up for coarse correlated equilibrium is similar, but you do not get a private recommendation from the mediator.

Definition (Coarse correlated equilibrium (CCE))

A distribution σ on ×_i S_i is a coarse correlated equilibrium if for every i ∈ N, and every unilateral deviation t_i′ ∈ S_i, it holds that

E_{x∼σ}[C_i(x)] ≤ E_{x∼σ}[C_i(t_i′, x_{−i})].

Exercise: Prove that every CE is also a CCE.

(Hint: Use the “Law of total expectation”, i.e., E[X] = E[E[X | Y]].)

(11)

Final remark

Remember that an MNE is a pair of mixed strategies (x, y) that yields a product distribution σ over strategy profiles.

MNE through the lens of CE

An MNE is a special case of CE, where the recommendation of the mediator gives no extra information.

The conditional distribution ρ that Alice constructs for Bob’s private recommendation is the same for every row recommended to her.

It is just the mixed strategy y of Bob!

That is, the recommendation is not relevant for Alice.

Exercise: Check this yourself!


(12)

Computation of correlated equilibrium

(13)

Linear program for computing CE

Once again, linear programming comes to the rescue...

Theorem

For a given finite game Γ, there is a linear program that computes a correlated equilibrium σ : ×_i S_i → [0,1] in time polynomial in |×_i S_i| and the input size of the cost functions.

The LP has one variable σ_s for every strategy profile s ∈ ×_i S_i.

Polynomial number of variables if the number of players |N| is assumed to be constant.

For two-player games, note that |S_A × S_B| = mn.

The conditions in the definition of CE can be modeled as a linear program.

We will do the 2-player case, and focus on Alice.


(14)

Definition (Correlated equilibrium (for Alice))

A distribution σ on S_A × S_B is a correlated equilibrium if for every “recommendation” a_k ∈ S_A and deviation a_{k′} it holds, with x = (x_A, x_B), that

E_{x∼σ}[C_A(x_A, x_B) | x_A = a_k] ≤ E_{x∼σ}[C_A(a_{k′}, x_B) | x_A = a_k].    (1)

The LP will have variables σ_{kℓ} for k = 1, …, m and ℓ = 1, …, n.

Linear constraints for Alice

Fix a “recommended row” a_k ∈ S_A and a “deviation” a_{k′} ∈ S_A. Now,

E_{x∼σ}[C_A(x_A, x_B) | x_A = a_k]
  = Σ_{ℓ=1}^{n} C_A(a_k, b_ℓ) · P[x = (a_k, b_ℓ) | x_A = a_k]
  = Σ_{ℓ=1}^{n} C_A(a_k, b_ℓ) · σ_{kℓ} / (Σ_r σ_{kr})
  = (1 / Σ_r σ_{kr}) · Σ_{ℓ=1}^{n} C_A(a_k, b_ℓ) σ_{kℓ}

E_{x∼σ}[C_A(a_{k′}, x_B) | x_A = a_k]
  = Σ_{ℓ=1}^{n} C_A(a_{k′}, b_ℓ) · P[x = (a_k, b_ℓ) | x_A = a_k]
  = (1 / Σ_r σ_{kr}) · Σ_{ℓ=1}^{n} C_A(a_{k′}, b_ℓ) σ_{kℓ}

(15)

The conditions in (1) for Alice and Bob are equivalent to

Σ_{ℓ=1}^{n} C_A(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{ℓ=1}^{n} C_A(a_{k′}, b_ℓ) σ_{kℓ}   ∀ a_k, a_{k′} ∈ S_A

Σ_{k=1}^{m} C_B(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{k=1}^{m} C_B(a_k, b_{ℓ′}) σ_{kℓ}   ∀ b_ℓ, b_{ℓ′} ∈ S_B

Note that these are linear constraints in the variables σ_{kℓ}. The linear program is now as follows:

max 0
s.t.  Σ_{ℓ=1}^{n} C_A(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{ℓ=1}^{n} C_A(a_{k′}, b_ℓ) σ_{kℓ}   ∀ a_k, a_{k′} ∈ S_A
      Σ_{k=1}^{m} C_B(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{k=1}^{m} C_B(a_k, b_{ℓ′}) σ_{kℓ}   ∀ b_ℓ, b_{ℓ′} ∈ S_B
      Σ_{k,ℓ} σ_{kℓ} = 1
      σ_{kℓ} ≥ 0   ∀ a_k ∈ S_A, b_ℓ ∈ S_B


(16)

max 0
s.t.  Σ_{ℓ=1}^{n} C_A(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{ℓ=1}^{n} C_A(a_{k′}, b_ℓ) σ_{kℓ}   ∀ a_k, a_{k′} ∈ S_A
      Σ_{k=1}^{m} C_B(a_k, b_ℓ) σ_{kℓ} ≤ Σ_{k=1}^{m} C_B(a_k, b_{ℓ′}) σ_{kℓ}   ∀ b_ℓ, b_{ℓ′} ∈ S_B
      Σ_{k,ℓ} σ_{kℓ} = 1
      σ_{kℓ} ≥ 0   ∀ a_k ∈ S_A, b_ℓ ∈ S_B

This is a feasibility LP, i.e., the goal is to find a feasible solution of the linear system above.

We know at least one solution exists by Nash’s theorem. Remember that every MNE is also a CE.

Why not use the LP for computing an MNE? We would need an additional non-linear constraint enforcing that σ is a product distribution.

(17)

For a general finite game Γ = (N, (S_i), (C_i)), the linear program is as follows.

max 0
s.t.  Σ_{s_{−i} ∈ S_{−i}} C_i(s_i, s_{−i}) σ(s_i, s_{−i}) ≤ Σ_{s_{−i} ∈ S_{−i}} C_i(s_i′, s_{−i}) σ(s_i, s_{−i})   ∀ i ∈ N and s_i, s_i′ ∈ S_i
      Σ_{s ∈ ×_i S_i} σ(s) = 1
      σ(s) ≥ 0   ∀ s ∈ ×_i S_i.

We use the notation S_{−i} = S_1 × ⋯ × S_{i−1} × S_{i+1} × ⋯ × S_{|N|}, and s_{−i} = (s_1, …, s_{i−1}, s_{i+1}, …, s_{|N|}) ∈ S_{−i}.

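As a sketch of what this LP looks like in code, the two-player version can be handed to `scipy.optimize.linprog` as a feasibility problem; the Game of Chicken costs serve as data, and all variable names are mine.

```python
import numpy as np
from scipy.optimize import linprog

# Cost matrices for the Game of Chicken (Alice = rows, Bob = columns).
CA = np.array([[0.0, 3.0], [-1.0, 4.0]])
CB = np.array([[0.0, -1.0], [3.0, 4.0]])

m, n = CA.shape
nvar = m * n                      # one variable sigma_{kl} per profile
idx = lambda k, l: k * n + l      # flatten (k, l) into a variable index

A_ub, b_ub = [], []
# Alice: for every recommended row k and deviation k',
# sum_l (CA[k,l] - CA[k',l]) * sigma_{kl} <= 0.
for k in range(m):
    for kp in range(m):
        if kp == k:
            continue
        row = np.zeros(nvar)
        for l in range(n):
            row[idx(k, l)] = CA[k, l] - CA[kp, l]
        A_ub.append(row); b_ub.append(0.0)
# Bob: the symmetric constraints over columns.
for l in range(n):
    for lp in range(n):
        if lp == l:
            continue
        row = np.zeros(nvar)
        for k in range(m):
            row[idx(k, l)] = CB[k, l] - CB[k, lp]
        A_ub.append(row); b_ub.append(0.0)

# Probabilities sum to one; sigma >= 0 via the bounds.
A_eq, b_eq = [np.ones(nvar)], [1.0]

res = linprog(c=np.zeros(nvar), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * nvar)
sigma = res.x.reshape(m, n)
print(res.status, sigma)  # status 0: some correlated equilibrium was found
```

The objective is zero because any feasible point is a CE; which one the solver returns depends on the method.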

(18)

No-regret dynamics

(19)

The model

Alice, with strategy setSA={a1, . . . ,am}, plays “game” against adversary.

Adversary will be used to represent other players later on.

(Looking ahead:Players will converge to CCE.)

The game dynamics

Game is repeated forT rounds. In every roundt=1, . . . ,T: Alice picks prob. distr. p(t)= (p(t1), . . . ,p(tm))over{a1, . . . ,am}.

Adversary picks cost vectorc(t):{a1, . . . ,am} →[0,1].

Strategya(t)is drawn according top(t).

Alice incurs costc(t)(a(t))and gets to know cost vectorc(t).

Goal of Alice is to minimizeaverage cost 1 T

T

X

t=1

c(t)(a(t)) against a benchmark.What should the benchmark be?


(20)

Best choices in hindsight

Would be natural to compare againstbest choices in hindsight, i.e., 1

T

T

X

t=1 a∈SminA

c(t)(a(t)).

Alice’s cost if she would have put all prob. mass on strategy minimizing cost vectorc(t), in every stept.

Said differently, Alice’sbest choiceif adversary would have to announce cost vector first.

“Regret” of Alice, for given realizationa(1), . . . ,a(T), would then be defined as

α(T) = 1 T

T

X

t=1

c(t)(a(t))

T

X

t=1

a∈SminAc(t)(a)

!

Alice hasno (or vanishing) regretif, in expectation,α(T)0 when T → ∞.

We next illustrate that, under the definitionα(T), vanishing regret cannot be achieved. (We will give an alternative definition afterwards.)

(21)

α(T) = (1/T) ( Σ_{t=1}^{T} c^(t)(a^(t)) − Σ_{t=1}^{T} min_{a ∈ S_A} c^(t)(a) )

Example

Suppose Alice has two actions a and b. In every round, when Alice chooses p^(t) = (p_a^(t), p_b^(t)), the adversary sets

c^(t) = (c^(t)(a), c^(t)(b)) = (1, 0) if p_a^(t) ≥ 1/2, and (0, 1) if p_b^(t) > 1/2.

Alice's expected cost in round t is at least 1/2.

The best choice in hindsight gives cost zero in round t.

So the expected regret α(T) is at least 1/2 for every T.
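This adversary is easy to simulate. The sketch below uses an arbitrary illustrative sequence of distributions for Alice; the same lower bound of 1/2 holds for any choice of p^(t).

```python
# Regret alpha(T) under the adaptive adversary from the example.
# Whatever distribution Alice uses in a round, her expected cost is >= 1/2,
# while the best choice in hindsight costs 0 in that round.
T = 1000
# An arbitrary sequence of values for p_a^(t) that Alice might try.
p_a_sequence = [0.5, 0.3, 0.9, 0.1] * (T // 4)

total_expected_cost = 0.0
for p_a in p_a_sequence:
    p = (p_a, 1.0 - p_a)
    # Adversary sees p and loads cost 1 onto the likelier action.
    c = (1.0, 0.0) if p[0] >= 0.5 else (0.0, 1.0)
    total_expected_cost += p[0] * c[0] + p[1] * c[1]

alpha = total_expected_cost / T - 0.0  # best-in-hindsight cost is 0 per round
print(alpha)  # 0.75 for this sequence; it is >= 1/2 for every sequence
```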

Is there another “sensible” regret definition yielding non-trivial results?

(22)

Best fixed strategy in hindsight

Another possibility is to compare with the best fixed strategy in hindsight, i.e.,

min_{a ∈ S_A} (1/T) Σ_{t=1}^{T} c^(t)(a).

We interchange “minimum” and “summation”.

This is Alice's cost if she would have been allowed to play one fixed strategy a in every step.

This is still w.r.t. the adversarial cost vectors chosen by the adversary based on the probability distributions p^(t).

Definition (Regret)

For given probability distributions p^(1), …, p^(T) and adversarial cost vectors c^(1), …, c^(T), the (time-averaged) regret of Alice is defined as

ρ(T) = (1/T) ( Σ_{t=1}^{T} c^(t)(a^(t)) − min_{a ∈ S_A} Σ_{t=1}^{T} c^(t)(a) ),

where a^(t) is sampled according to distribution p^(t). Alice has no regret (w.r.t. the chosen distributions) if ρ(T) → 0 when T → ∞, in expectation.

(23)

ρ(T) = (1/T) ( Σ_{t=1}^{T} c^(t)(a^(t)) − min_{a ∈ S_A} Σ_{t=1}^{T} c^(t)(a) )

More generally, “Alice” is called an online decision making algorithm.

Such an algorithm can use the cost vectors c^(1), …, c^(t), distributions p^(1), …, p^(t), and realizations a^(t), to define the distribution p^(t+1). The adversary can use the same information, including the chosen p^(t+1), to define the adversarial cost vector c^(t+1).

Theorem

There exist no-regret (online decision making) algorithms with ρ(T) = O(√(log(m)/T)), where m is the number of strategies.

T = O(log(m)/ε²) steps are enough to get the regret below ε.

We will later see the Multiplicative Weights (MW) algorithm achieving this.


(24)

No-regret dynamics

Convergence to (approximate) CCE

(25)

No-regret player dynamics

LetΓ = (N,(Si),(Ci)), withCi :×jSj [0,1], and assume everyi Nis equipped with no-regret algorithmAi.

At this point, consider theAi as “blackbox” algorithms.

We writemi =|Si|for number of strategies of playeri N.

No-regret (player) dynamics

In every roundt=1, . . . ,T, every playeriN does the following:

UseAi to compute prob. distr.p(t)i = (pi(t),1, . . . ,p(t)i,mi)overSi. The adversarial cost vectorc(t)i :Si [0,1]is defined as

ci(t)(a) =Es(t)

−i∼σ−i(t)Ci

a,s(t)−i

∀a∈ Si

whereσ−i(t)isproduct distributionformed by thepj(t)withjN\ {i}.

Strategya(t)p(t)i is drawn, and playeri incurs corresponding cost.

That is,σ(t)−i :S−i [0,1]is prob. distribution given byσ−i(t)(s−i) =Q

j6=ipj,s(t)j, wheres−i = (s1, . . . ,si−1,si+1, . . . ,s|N|)∈ S−i.

RememberS−i =S1× · · · × Si−1× Si+1× · · · × S|N|.


(26)

No-regret (player) dynamics

In every round t = 1, …, T, every player i ∈ N does the following:

Use A_i to compute probability distribution p_i^(t) = (p_{i,1}^(t), …, p_{i,m_i}^(t)) over S_i.

The adversarial cost vector c_i^(t) : S_i → [0,1] is defined as

c_i^(t)(a) = E_{s_{−i}^(t) ∼ σ_{−i}^(t)} [ C_i(a, s_{−i}^(t)) ]   ∀ a ∈ S_i,

where σ_{−i}^(t) is the product distribution formed by the p_j^(t) with j ∈ N \ {i}.

Strategy a^(t) ∼ p_i^(t) is drawn, and player i incurs the corresponding cost.

Set σ^(t) = Π_j p_j^(t) and let σ_T = (1/T) Σ_{t=1}^{T} σ^(t) be the time average of all product distributions obtained in steps t = 1, …, T.

Theorem

The time average σ_T is a ρ_i(T)-approximate CCE, i.e., it satisfies

E_{s∼σ_T}[C_i(s)] ≤ E_{s∼σ_T}[C_i(s_i′, s_{−i})] + ρ_i(T)   for i ∈ N and fixed s_i′ ∈ S_i.

(27)

Theorem

The time average σ_T is a ρ_i(T)-approximate CCE, i.e., it satisfies

E_{s∼σ_T}[C_i(s)] ≤ E_{s∼σ_T}[C_i(s_i′, s_{−i})] + ρ_i(T)   for i ∈ N and fixed s_i′ ∈ S_i.

Proof (sketch): First note that the expected cost E_{t_i ∼ p_i^(t)}[c_i^(t)(t_i)] incurred by player i in round t boils down to E_{s∼σ^(t)}[C_i(s)]. Then

E_{s∼σ_T}[C_i(s)]
  = (1/T) Σ_{t=1}^{T} E_{a ∼ p_i^(t)}[c_i^(t)(a)]   (time average)
  = min_{s_i ∈ S_i} (1/T) Σ_{t=1}^{T} c_i^(t)(s_i) + ρ_i(T)   (definition of ρ_i(T))
  = min_{s_i ∈ S_i} (1/T) Σ_{t=1}^{T} E_{s_{−i}^(t) ∼ σ_{−i}^(t)}[C_i(s_i, s_{−i}^(t))] + ρ_i(T)   (definition of c_i^(t))
  ≤ (1/T) Σ_{t=1}^{T} E_{s^(t) ∼ σ^(t)}[C_i(s_i′, s_{−i}^(t))] + ρ_i(T)   (plugging in fixed s_i′)
  = E_{s∼σ_T}[C_i(s_i′, s_{−i})] + ρ_i(T)   (time average)


(28)

No-regret dynamics

Multiplicative Weights algorithm

(29)

No-regret dynamics

No-regret (player) dynamics

In every round t = 1, …, T, every player i ∈ N does the following:

Use A_i to compute probability distribution p_i^(t) = (p_{i,1}^(t), …, p_{i,m_i}^(t)) over S_i.

The adversarial cost vector c_i^(t) : S_i → [0,1] is defined as

c_i^(t)(a) = E_{s_{−i}^(t) ∼ σ_{−i}^(t)} [ C_i(a, s_{−i}^(t)) ]   ∀ a ∈ S_i,

where σ_{−i}^(t) is the product distribution formed by the p_j^(t) with j ∈ N \ {i}.

Strategy a^(t) ∼ p_i^(t) is drawn, and player i incurs the corresponding cost.

We next give the promised MW algorithm that can be used for the A_i, and that has the no-regret property. That is, in expectation,

ρ_i(T) = (1/T) ( Σ_{t=1}^{T} c_i^(t)(a^(t)) − min_{a ∈ S_i} Σ_{t=1}^{T} c_i^(t)(a) ) → 0,

where a^(t) ∼ p_i^(t).


(30)

Multiplicative Weights (MW) algorithm

Fix player i ∈ N. The MW algorithm maintains a weight w_a^(t) for every a ∈ S_i and chooses the distribution for round t as

p_{i,a}^(t) = w_a^(t) / Σ_{r ∈ S_i} w_r^(t).

Weight update procedure

Given is an input parameter η ∈ (0, 1/2].

Initial weights are set at w_a^(1) = 1 for a ∈ S_i (uniform distribution over S_i).

For round t = 1, …, T:

After seeing cost vector c_i^(t), the weights for round t+1 are defined as

w_a^(t+1) = (1 − η)^{c_i^(t)(a)} · w_a^(t).

High-cost strategies get smaller (relative) weight in the next round.

Theorem (Littlestone and Warmuth, 1994)

The MW algorithm, with η = √(log(m_i)/T), has regret ρ_i(T) ≤ 2 √(log(m_i)/T).
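A minimal sketch of the MW algorithm (variable names mine). The cost sequence here is random for illustration, but the stated bound on the expected regret holds for any cost sequence in [0,1].

```python
import math
import random

def multiplicative_weights(cost_vectors, eta):
    """Run MW on a fixed sequence of cost vectors in [0,1]^m.
    Returns the sequence of distributions p^(1), ..., p^(T)."""
    m = len(cost_vectors[0])
    w = [1.0] * m                 # w_a^(1) = 1: uniform start
    dists = []
    for c in cost_vectors:
        total = sum(w)
        dists.append([wi / total for wi in w])
        # High-cost strategies get exponentially smaller weight.
        w = [wi * (1 - eta) ** ci for wi, ci in zip(w, c)]
    return dists

random.seed(0)
m, T = 4, 2000
costs = [[random.random() for _ in range(m)] for _ in range(T)]

eta = math.sqrt(math.log(m) / T)
dists = multiplicative_weights(costs, eta)

# Time-averaged expected regret: expected MW cost vs. best fixed strategy.
avg_alg = sum(sum(p[a] * c[a] for a in range(m))
              for p, c in zip(dists, costs)) / T
avg_best = min(sum(c[a] for c in costs) for a in range(m)) / T
regret = avg_alg - avg_best
print(regret)  # well below the bound 2 * sqrt(log(m) / T) ≈ 0.053
```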

(31)

Overview


(32)

Hierarchy of equilibrium concepts

PNE: exists in any congestion game.

MNE: exists in any finite game, but hard to compute.

CE: computationally tractable.

CCE: easily computable with no-regret dynamics.

Final remarks

CE can also be obtained through certain player dynamics.

See, e.g., Chapter 18 of [R2016].

Recall that the PoA bounds that we derived for PNE extend to CCE by means of the smoothness framework.
