Structure and Variation of Signaling Conventions
Roland Mühlenbernd
November 7, 2014
TABLE OF CONTENTS
- Signaling Games
- Reinforcement Learning
- Simulations on Grid Structures
- Simulations on Scale-Free Networks
THE CONVENTION OF SEMANTIC MEANING
"A name is a spoken sound significant by convention... I say 'by convention' because no name is a name naturally but only when it has become a symbol."
Aristotle, De Interpretatione
"[w]e can hardly suppose a parliament of hitherto speechless elders meeting together and agreeing to call a cow a cow and a wolf a wolf."
Russell, The Analysis of Mind
→ Paradox: language is needed for language to emerge
→ Lewis: semantic conventions can emerge in ways different from verbal agreements
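Signaling Game: SG = ⟨{S, R}, T, M, A, Pr, U⟩
[Figure: extensive form of the game. Nature N picks state tL or tS with probability .5 each, sender S sends m1 or m2, receiver R chooses aL or aS; the payoff is 1 if the action matches the state and 0 otherwise.]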
- T = {tL, tS}
- M = {m1, m2}
- A = {aL, aS}
- Pr(tL) = Pr(tS) = .5
- U(ti, aj) = 1 if i = j, 0 else
        aL   aS
  tL     1    0
  tS     0    1
PURE STRATEGIES
Pure strategies are contingency plans according to which players act.
- sender strategy: σ: T → M
- receiver strategy: ρ: M → A
σ1: tL → m1, tS → m2
σ2: tL → m2, tS → m1
σ3: tL → m1, tS → m1
σ4: tL → m2, tS → m2
ρ1: m1 → aL, m2 → aS
ρ2: m1 → aS, m2 → aL
ρ3: m1 → aL, m2 → aL
ρ4: m1 → aS, m2 → aS
SIGNALING SYSTEMS...
- are combinations of pure strategies. The Lewis game has two: L1 = ⟨σ1, ρ1⟩ and L2 = ⟨σ2, ρ2⟩
  L1: tL → m1 → aL, tS → m2 → aS
  L2: tL → m2 → aL, tS → m1 → aS
- are strict Nash equilibria of the EU table:
        ρ1   ρ2   ρ3   ρ4
  σ1     1    0   .5   .5
  σ2     0    1   .5   .5
  σ3    .5   .5   .5   .5
  σ4    .5   .5   .5   .5
- associate messages with states in a unique way
- are evolutionarily stable states
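The EU table and the strict Nash property can be checked mechanically. The following is a minimal Python sketch, assuming the 2×2 Lewis game defined earlier; the function names (`all_maps`, `expected_utility`, `is_strict_nash`) are illustrative and not taken from the talk, and the enumeration order of the strategies differs from the σ1..σ4 / ρ1..ρ4 numbering on the slide.

```python
from itertools import product

STATES = ("tL", "tS")
MESSAGES = ("m1", "m2")
ACTIONS = ("aL", "aS")
PRIOR = {"tL": 0.5, "tS": 0.5}


def utility(state, action):
    """U(t_i, a_j) = 1 if i = j, else 0 (the index is the L/S suffix)."""
    return 1.0 if state[1:] == action[1:] else 0.0


def all_maps(domain, codomain):
    """Enumerate every function domain -> codomain as a dict."""
    return [dict(zip(domain, image))
            for image in product(codomain, repeat=len(domain))]


senders = all_maps(STATES, MESSAGES)      # all pure sender strategies T -> M
receivers = all_maps(MESSAGES, ACTIONS)   # all pure receiver strategies M -> A


def expected_utility(sigma, rho):
    """Expected utility of a strategy pair under the prior over states."""
    return sum(PRIOR[t] * utility(t, rho[sigma[t]]) for t in STATES)


eu = [[expected_utility(s, r) for r in receivers] for s in senders]


def is_strict_nash(i, j):
    """Both players receive the same payoff, so the pair (i, j) is a strict
    equilibrium iff its entry strictly exceeds all others in its row
    (receiver deviations) and in its column (sender deviations)."""
    best = eu[i][j]
    row_ok = all(eu[i][k] < best for k in range(len(receivers)) if k != j)
    col_ok = all(eu[k][j] < best for k in range(len(senders)) if k != i)
    return row_ok and col_ok


signaling_systems = [(senders[i], receivers[j])
                     for i in range(len(senders))
                     for j in range(len(receivers))
                     if is_strict_nash(i, j)]
```

Under these assumptions, `signaling_systems` contains exactly the two one-to-one strategy pairs, while every pair involving a constant (pooling) strategy has expected utility .5 and is therefore not strict.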
CONCLUSION
- signaling systems explain the stability of semantic meaning...
- but not how it might emerge without verbal agreement
- idea: participants learn a signaling system
- they start with unbiased behavior
- they update their behavior after each encounter
- Huttegger and Zollman (2011) asked: “How little cognitive ability is needed to learn a signaling system?”
REINFORCEMENT LEARNING IN SIGNALING GAMES
[Figure: urn model of sender S and receiver R, with states tL, tS, messages m1, m2, and actions aL, aS]
- the sender has an urn f_t for each state t ∈ T
- each urn contains balls of each message m ∈ M
- the sender decides by drawing from urn f_t
- the receiver has an urn f_m for each message m ∈ M
- each urn contains balls of each action a ∈ A
- the receiver decides by drawing from urn f_m
- successful communication → urn update
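The urn scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the simulation code behind the talk: it assumes one ball of each type per urn as the unbiased starting point, a single sender-receiver pair, and a reward of `alpha` balls added to both urns after each success (basic Roth-Erev reinforcement). The names `draw` and `simulate` are hypothetical.

```python
import random

STATES = ["tL", "tS"]
MESSAGES = ["m1", "m2"]
ACTIONS = ["aL", "aS"]
MATCH = {"tL": "aL", "tS": "aS"}  # the action that fits each state


def draw(urn, rng):
    """Draw one ball; a ball type is chosen proportionally to its amount."""
    types = list(urn)
    return rng.choices(types, weights=[urn[b] for b in types])[0]


def simulate(rounds=5000, alpha=1.0, seed=0):
    rng = random.Random(seed)
    # Assumption: unbiased start with one ball of each type in every urn.
    sender = {t: {m: 1.0 for m in MESSAGES} for t in STATES}
    receiver = {m: {a: 1.0 for a in ACTIONS} for m in MESSAGES}
    successes = 0
    for _ in range(rounds):
        t = rng.choice(STATES)       # nature picks a state, Pr = .5 each
        m = draw(sender[t], rng)     # sender draws from urn f_t
        a = draw(receiver[m], rng)   # receiver draws from urn f_m
        if a == MATCH[t]:            # successful communication -> urn update
            sender[t][m] += alpha
            receiver[m][a] += alpha
            successes += 1
    return sender, receiver, successes / rounds
```

Calling `simulate()` returns the final urn contents and the fraction of successful rounds; inspecting which message dominates each sender urn shows which mapping, if any, the pair has settled on.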
REINFORCEMENT LEARNING UPDATE RULES
If communication via ⟨t, m, a⟩ is successful:
- Roth-Erev reinforcement: increase the number of successful balls in the appropriate urn by α ∈ ℝ
- lateral inhibition: additionally decrease the number of all other balls by γ ∈ ℝ
- negative reinforcement: if communication via ⟨t, m, a⟩ is not successful, decrease the number of the appropriate balls by β ∈ ℝ
- Bush-Mosteller reinforcement: reinforce and scale down urn contents to Ω ∈ ℝ
- RL = ⟨α, β, γ, Ω, φ⟩
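The update rules can be combined into one function. The sketch below is an assumption-laden reading of the slide, not the author's implementation: ball amounts are floored at 0, and the Bush-Mosteller step is modeled as rescaling the urn proportionally whenever its total exceeds Ω. The function name `update_urn` and its default parameter values are illustrative.

```python
def update_urn(urn, chosen, success,
               alpha=1.0, beta=1.0, gamma=0.5, omega=20.0):
    """Return a new urn (dict: ball type -> amount) after one round.

    urn     -- current contents of the urn that was drawn from
    chosen  -- the ball type that was drawn
    success -- whether communication via <t, m, a> succeeded
    """
    new = dict(urn)
    if success:
        new[chosen] += alpha                              # Roth-Erev reinforcement
        for ball in new:
            if ball != chosen:
                new[ball] = max(0.0, new[ball] - gamma)   # lateral inhibition
    else:
        new[chosen] = max(0.0, new[chosen] - beta)        # negative reinforcement
    total = sum(new.values())
    if total > omega:
        # Assumed Bush-Mosteller step: scale contents down so they sum to omega.
        new = {ball: amount * omega / total for ball, amount in new.items()}
    return new
```

A practical caveat of this sketch: with negative reinforcement an urn can reach a total of 0, in which case no ball can be drawn; a real simulation would need a floor or a reset rule for that case.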
THE TROUBLE WITH RICHER GAMES
What happens in different n×k signaling games? (n = |A| = |T|, k = |M|)
Game   RFR
2×2    0%
3×3    9.6%
4×4    21.9%
8×8    59.4%
Table: Barrett's results for Roth-Erev reinforcement learning: run failure rates (RFR) for diverse games (1000 runs)
Figure: Run failure rates (RFR, 0% to 60%) for different reinforcement learning accounts (Roth-Erev, Bush-Mosteller, B-M + LI + NRL) in the 3×3, 4×4, and 8×8 games
Source: Barrett, J. A. (2009): The Evolution of Coding in Signaling Games. Theory and Decision 67, 223–237
THE TROUBLE WITH RICHER GAMES
Figure: Experiments with 3 agents: percentage of runs with a particular number of signaling system learners (no learners, 1 learner, 2 learners, 3 learners) for the 2×2 to 8×8 games (averaged over 1000 simulation runs)
Figure: Experiments with 3 and 5 agents: percentage of signaling system learners for the 2×2 to 8×8 games (averaged over 1000 simulation runs)
→ What feature supports the emergence of signaling systems in richer games and larger populations?
INNOVATION
- each sender urn contains black balls that, if drawn, produce a new message
- result: all agents learn a signaling system, even in rich games
- BUT: only in small populations
- solution: a black ball induces a random message from the (fixed) message set
- to keep the innovative nature, choose signaling games with |M| >> |T|
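The black-ball mechanism with a fixed message set can be sketched as follows. This is a minimal Python illustration under stated assumptions: the key `"black"` is a hypothetical label for the innovation ball, the random message is drawn uniformly from the message set, and how the number of black balls changes over time is left out because the slide does not specify it.

```python
import random

BLACK = "black"  # hypothetical label for the innovation ball


def choose_message(urn, message_set, rng):
    """Draw from a sender urn that may contain black balls.

    Returns (message, innovated): if a black ball is drawn, the message is
    picked uniformly at random from the fixed message set.
    """
    types = list(urn)
    ball = rng.choices(types, weights=[urn[b] for b in types])[0]
    if ball == BLACK:
        return rng.choice(message_set), True
    return ball, False
```

For a 3×9 game, `message_set` would hold nine messages while each sender only needs three of them, which is what leaves room for new messages to enter use.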
TEST RUN WITH 3 AGENTS AND A 3×9 GAME
- communicative success (CS): utility value (population average)
- force of innovation (FOI): # black balls (population average)
Figure: Simulation run of a 3×9 signaling game with innovation in a 3-agent population (curves: FOI and CS)
INNOVATION VS. COMMUNICATIVE SUCCESS
Figure: 40,000 data points of 10 simulation runs (axes: FOI and CS): CS and FOI reveal a negative Pearson correlation of −.6
CONCLUSION
- agents learn signaling systems by repeated play and a simple update mechanism: reinforcement learning
- but not necessarily for rich games and/or large populations
- idea: participants can be innovative
- by sending a random message from set M
- when drawing the special black ball from a sender urn
- result: agents learn signaling systems for richer games in large populations.
EXPERIMENTS ON GRID STRUCTURES
Settings:
- network: 10,000 agents on a 100×100 toroid lattice
- game type: 3×30 game
- update: Bush-Mosteller reinforcement learning with negative reinforcement, lateral inhibition and innovation (α = 1, β = 1, γ = 0.5, Ω = 20, φ: all balls uniformly distributed over all urns)
- break condition: after 50,000 simulation steps
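On a toroid lattice the grid wraps around at its edges, so every agent has the same number of neighbors. The following Python sketch shows one way to compute interaction partners; the slide does not state the neighborhood type, so the 4-neighbor (von Neumann) neighborhood used here is an assumption, as is the function name `torus_neighbors`.

```python
def torus_neighbors(x, y, width=100, height=100):
    """Return the four von Neumann neighbors (assumed neighborhood type)
    of cell (x, y) on a lattice whose edges wrap around."""
    return [
        ((x - 1) % width, y),   # left, wraps from column 0 to width - 1
        ((x + 1) % width, y),   # right
        (x, (y - 1) % height),  # up, wraps from row 0 to height - 1
        (x, (y + 1) % height),  # down
    ]
```

With the default 100×100 size this covers the 10,000 agents of the setting above; a corner cell such as (0, 0) has neighbors on the opposite edges rather than fewer neighbors.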
RESULTS ON A TOROID LATTICE
Figure: Structure after 2,000 simulation steps (CS ≈ .8, #RC ≈ 500)
Figure: Structure after 50,000 simulation steps (CS ≥ .9, #RC ≈ 170)
RESULTS ON A TOROID LATTICE
L55:   t1    t2    t3
       m8    m21   m22
       a1    a2    a3
L72:   t1    t2    t3
       m23   m21   m22
       a1    a2    a3
L139:  t1    t2    t3
       m7    m8    m15
       a1    a2    a3
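SIMILARITY MEASURES AND REGIONAL DISTANCE
Lexical Similarity LS:
$$LS(L_1, L_2) = \frac{|\{m \in M \mid \exists t \in T: m = s_1(t)\} \cap \{m \in M \mid \exists t \in T: m = s_2(t)\}|}{|T|}$$
Mutual Intelligibility MI:
$$MI(L_1, L_2) = \frac{\sum_{t} \left( U_x(t, s_1, r_2) + U_x(t, s_2, r_1) \right)}{2 \times |T|}$$
Regional Distance RD:
$$RD(G_1, G_2) = \frac{\sum_{n_i \in N_1} \sum_{n_j \in N_2} SP(n_i, n_j)}{|N_1| \times |N_2|}$$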
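The three measures translate directly into code. The Python sketch below makes several assumptions that are not stated on the slide: a language is a pair of dicts (sender strategy s, receiver strategy r), a message unknown to a receiver yields utility 0, SP is read as shortest path length in the network (computed by breadth-first search), and the example languages L55 and L72 are read column-wise (t1 → m8 → a1, and so on). All function names are illustrative.

```python
from collections import deque

STATES = ["t1", "t2", "t3"]


def utility(state, action):
    """1 if the action index matches the state index, else 0."""
    return 1.0 if action is not None and state[1:] == action[1:] else 0.0


def lexical_similarity(s1, s2, states=STATES):
    """LS: shared messages in use, divided by |T|."""
    return len({s1[t] for t in states} & {s2[t] for t in states}) / len(states)


def mutual_intelligibility(lang1, lang2, states=STATES):
    """MI: average success when each sender talks to the other's receiver."""
    (s1, r1), (s2, r2) = lang1, lang2
    total = sum(utility(t, r2.get(s1[t])) + utility(t, r1.get(s2[t]))
                for t in states)
    return total / (2 * len(states))


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def regional_distance(adjacency, region1, region2):
    """RD: mean shortest path length over all pairs from the two regions."""
    total = 0
    for ni in region1:
        dist = shortest_path_lengths(adjacency, ni)
        total += sum(dist[nj] for nj in region2)
    return total / (len(region1) * len(region2))


# Example languages, assuming a column-wise reading of the slide.
L55 = ({"t1": "m8", "t2": "m21", "t3": "m22"},
       {"m8": "a1", "m21": "a2", "m22": "a3"})
L72 = ({"t1": "m23", "t2": "m21", "t3": "m22"},
       {"m23": "a1", "m21": "a2", "m22": "a3"})
```

Under this reading, L55 and L72 share two of their three messages, so both `lexical_similarity(L55[0], L72[0])` and `mutual_intelligibility(L55, L72)` come out at 2/3.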
RESULTS ON A TOROID LATTICE
Figure: Lexical Similarity (LS) and Mutual Intelligibility (MI) between two language regions as a function of the regional distance between them (x-axis: regional distance), averaged over all pairs of language regions of 10 simulation runs (10 × 170² = 289,000 data points).
EXPERIMENTS ON SCALE-FREE NETWORKS
Settings:
- network: 500 agents on a scale-free network (Holme-Kim algorithm with m = 2, p = .8)
- game type: 3×9 game
- update: Bush-Mosteller reinforcement learning with negative reinforcement, lateral inhibition and innovation (α = 1, β = 1, γ = 0.5, Ω = 20, φ: all balls uniformly distributed over all urns)
- break condition: after 100,000 simulation steps
RESULTS ON SCALE-FREE NETWORKS
Figure: After 100,000 simulation steps, 50 regions of different sizes emerged.
Figure: Histogram of region size bins for 10 simulation runs (ca. 500 data points).
Negative Pearson correlation of −.4 between the size of a language region and its members' degree centrality $DC(n) = \frac{d(n)}{N-1}$
Figure: Data plot of agents' degree centrality (y-axis) in comparison to their region's size (x-axis) for 5,000 data points.
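Degree centrality and the Pearson correlation used above are simple to compute. The following Python sketch assumes the network is given as an adjacency dict (node → list of neighbors) and reads d(n) as the degree of node n and N as the number of nodes; the function names are illustrative, and no data from the talk is reproduced here.

```python
from math import sqrt


def degree_centrality(adjacency):
    """DC(n) = d(n) / (N - 1) for every node n of the network."""
    n_nodes = len(adjacency)
    return {node: len(neighbors) / (n_nodes - 1)
            for node, neighbors in adjacency.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

To reproduce the kind of analysis reported on the slide, one would pair each agent's region size with its degree centrality and pass the two lists to `pearson`; a negative value indicates that agents in larger regions tend to have lower degree centrality.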
CONCLUSION
- repeated signaling games plus update dynamics might simulate the paths along which semantic meaning emerges
- reinforcement learning cannot guarantee the emergence of signaling systems for rich games in large populations
- innovation as an additional feature overcomes this problem
- on lattice structures a regional structure similar to a dialect spectrum emerges, whereby regions coalesce with each other bit by bit
- on scale-free networks regions of different sizes emerge, with region size negatively correlated with the members' degree centrality
- the innovative nature of the learning dynamics reorganizes structures of communities and realizes semantic change