
Foundations of Artificial Intelligence

43. Monte-Carlo Tree Search: Introduction

Malte Helmert

University of Basel

May 17, 2021


43.1 Introduction
43.2 Monte-Carlo Methods
43.3 Monte-Carlo Tree Search
43.4 Summary


Board Games: Overview

chapter overview:

• 40. Introduction and State of the Art
• 41. Minimax Search and Evaluation Functions
• 42. Alpha-Beta Search
• 43. Monte-Carlo Tree Search: Introduction
• 44. Monte-Carlo Tree Search: Advanced Topics

43.1 Introduction

Monte-Carlo Tree Search: Brief History

• starting in the 1930s: first researchers experiment with Monte-Carlo methods
• 1998: Ginsberg's GIB player achieves strong performance playing Bridge (this chapter)
• 2002: Auer et al. present UCB1 action selection for multi-armed bandits (Chapter 44)
• 2006: Coulom coins the term Monte-Carlo Tree Search (MCTS) (this chapter)
• 2006: Kocsis and Szepesvári combine UCB1 and MCTS into the most famous MCTS variant, UCT (Chapter 44)


Monte-Carlo Tree Search: Applications

Examples of successful applications of MCTS in games:

• board games (e.g., Go; Chapter 45)
• card games (e.g., Poker)
• AI for computer games (e.g., for real-time strategy games or Civilization)
• story generation (e.g., for dynamic dialogue generation in computer games)
• General Game Playing

There are also many applications in other areas, e.g.,

• MDPs (planning with stochastic effects) or
• POMDPs (MDPs with partial observability)

43.2 Monte-Carlo Methods

Monte-Carlo Methods: Idea

• subsume a broad family of algorithms
• decisions are based on random samples
• results of samples are aggregated by computing the average
• apart from these points, algorithms differ significantly
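
To make the averaging idea concrete, here is a minimal sketch (not from the slides; the toy game is our own illustration) of a Monte-Carlo estimate obtained by aggregating random samples:

    import random

    def play_random_game() -> float:
        """Toy game (illustrative assumption): we win if the sum of
        two dice rolls is at least 8. Returns 1.0 for a win, else 0.0."""
        roll = random.randint(1, 6) + random.randint(1, 6)
        return 1.0 if roll >= 8 else 0.0

    def monte_carlo_estimate(num_samples: int = 10_000) -> float:
        """Core Monte-Carlo idea: draw random samples and
        aggregate the results by computing the average."""
        return sum(play_random_game() for _ in range(num_samples)) / num_samples

    print(monte_carlo_estimate())  # converges to 15/36 ≈ 0.417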


Aside: Hindsight Optimization vs. the Exam

• As a motivating example for Monte-Carlo methods, we now briefly look at hindsight optimization.
• Hindsight optimization is interesting for settings with randomness and partial observability, which we do not otherwise consider in this course.
• To keep the discussion short, we do not provide formal details on how to model randomness and partial observability.
• Therefore, the slides on hindsight optimization are not relevant for the exam.


Monte-Carlo Methods: Example

Bridge player GIB, based on Hindsight Optimization (HOP):

• perform samples as long as resources (deliberation time, memory) allow:
  • sample hands for all players that are consistent with the current knowledge about the game state
  • for each legal move, compute whether the fully observable game that starts with executing that move is won or lost
• compute the win percentage for each move over all samples
• play the card with the highest win percentage
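
A minimal sketch of this sampling loop (our own illustration, not GIB's actual implementation; the `sample_world` and `solve` callbacks are hypothetical stand-ins for a deal sampler and a double-dummy solver):

    import random
    from collections import defaultdict
    from typing import Callable, List, TypeVar

    Move = TypeVar("Move")
    World = TypeVar("World")

    def hindsight_optimization(
        sample_world: Callable[[], World],     # samples hidden info consistent with current knowledge
        moves: List[Move],                     # legal moves in the current position
        solve: Callable[[World, Move], bool],  # solves the sampled, fully observable game: True iff won
        num_samples: int,
    ) -> Move:
        """Return the move with the highest win percentage over all samples."""
        wins = defaultdict(int)
        for _ in range(num_samples):
            world = sample_world()             # one sampled, fully observable game
            for move in moves:
                if solve(world, move):         # is this move a win in hindsight?
                    wins[move] += 1
        return max(moves, key=lambda m: wins[m])

    # Toy usage (illustrative): a hidden coin; guessing its side wins,
    # and a "safe" move wins in 60% of sampled worlds.
    best = hindsight_optimization(
        sample_world=lambda: (random.random() < 0.5, random.random() < 0.6),
        moves=["heads", "tails", "safe"],
        solve=lambda w, m: w[1] if m == "safe" else ((m == "heads") == w[0]),
        num_samples=1000,
    )
    print(best)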


Hindsight Optimization: Example

South to play, three tricks to win, trump suit ♣

[Figure: a Bridge position analyzed over a growing number of sampled deals; after three samples the three candidate moves have win percentages of 67% (2/3), 100% (3/3), and 33% (1/3).]

Hindsight Optimization: Restrictions

• HOP is well-suited for partially observable games like most card games (Bridge, Skat, Klondike Solitaire)
• it must be possible to solve or approximate the sampled games efficiently
• HOP is often not optimal even if provided with infinite resources

Hindsight Optimization: Suboptimality

[Figure: a small game with actions "gamble" and "safe", where "gamble" leads to the outcomes "hit" and "miss".]
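
Why this leads to suboptimal play (an illustrative reading of the figure; the concrete numbers are our own assumption, not from the slides): suppose "gamble" wins only if a hidden guess turns out correct, with a true win probability of 50%, while "safe" wins with probability 60%. Each sample fixes the hidden information before the sampled game is solved, so the hindsight solver always "guesses" correctly and scores gamble as a win in every sample. HOP therefore prefers gamble (estimated 100%) over safe (60%), although the true values are 50% vs. 60%. This clairvoyance effect does not vanish as the number of samples grows, which is why HOP can remain suboptimal even with infinite resources.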

43.3 Monte-Carlo Tree Search


Monte-Carlo Tree Search: Idea

Monte-Carlo Tree Search (MCTS) ideas:

• perform iterations as long as resources (deliberation time, memory) allow:
  • build a partial game tree, where nodes n are annotated with
    • utility estimate û(n)
    • visit counter N(n)
  • initially, the tree contains only the root node
  • each iteration adds one node to the tree
• after constructing the tree, play the move that leads to the child of the root with the highest utility estimate (as in minimax/alpha-beta)


Monte-Carlo Tree Search: Iterations

Each iteration consists of four phases:

• selection: traverse the tree by applying the tree policy
  • stop when reaching a terminal node (in this case, set n_child to that node and p* to its position, and skip the next two phases) . . .
  • . . . or when reaching a node n_parent for which not all successors are part of the tree
• expansion: add a missing successor n_child of n_parent to the tree
• simulation: apply the default policy from n_child until a terminal position p* is reached
• backpropagation: update the visit counters and utility estimates of the visited nodes with the utility of p*

Monte-Carlo Tree Search

Selection: apply tree policy to traverse the tree

[Figure: a partial game tree in which every node is annotated with its utility estimate û(n) and visit counter N(n); the tree policy selects a path from the root to a node with successors missing from the tree.]

Monte-Carlo Tree Search

Expansion: create a node for the first position beyond the tree

[Figure: the same tree; a new leaf node with visit counter 0 and no utility estimate yet is added below the selected node.]

Monte-Carlo Tree Search

Simulation: apply default policy until a terminal position is reached

[Figure: the same tree; starting from the new node, the default policy simulates a game that ends in a terminal position with utility 39.]

Monte-Carlo Tree Search

Backpropagation: update utility estimates of visited nodes

[Figure: the same tree; the simulation result 39 is propagated back along the visited path, incrementing each visit counter and updating each utility estimate.]

Monte-Carlo Tree Search: Pseudo-Code

Monte-Carlo Tree Search

n_0 := create_root_node()
while time_allows():
    visit_node(n_0)
n_best := argmax_{n ∈ succ(n_0)} û(n)
return n_best.move


Monte-Carlo Tree Search: Pseudo-Code

function visit_node(n):
    if is_terminal(n.position):
        utility := u(n.position)
    else:
        p := n.get_unvisited_successor()
        if p is none:
            n' := apply_tree_policy(n)
            utility := visit_node(n')
        else:
            p* := apply_default_policy_until_end(p)
            utility := u(p*)
            n.add_child_node(p, utility)
    update_visit_count_and_estimate(n, utility)
    return utility
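
For concreteness, the pseudo-code above can be turned into runnable Python. This is our own sketch, not the lecture's reference implementation: the `CountdownGame` interface (`is_terminal`, `utility`, `successors`) is a hypothetical stand-in for a real game, positions are assumed hashable, utilities are taken from a single agent's perspective (a two-player version would negate utilities between levels), and both the tree policy and the default policy are uniformly random, standing in for the policies discussed in Chapter 44:

    import random

    class Node:
        """Search tree node annotated with utility estimate and visit counter."""
        def __init__(self, position):
            self.position = position
            self.children = []        # successors added to the tree so far
            self.visits = 0           # visit counter N(n)
            self.utility_sum = 0.0    # sum of all sampled utilities

        def estimate(self):
            """Utility estimate û(n): average utility over all visits."""
            return self.utility_sum / self.visits if self.visits else 0.0

    def visit_node(game, node):
        """One MCTS iteration: selection, expansion, simulation, backpropagation."""
        if game.is_terminal(node.position):
            utility = game.utility(node.position)
        else:
            in_tree = {child.position for child in node.children}
            unvisited = [p for p in game.successors(node.position) if p not in in_tree]
            if not unvisited:
                # selection: all successors are in the tree; apply the tree policy
                # (uniformly random here, standing in for Chapter 44's UCB1)
                utility = visit_node(game, random.choice(node.children))
            else:
                # expansion: add one missing successor to the tree
                child = Node(unvisited[0])
                node.children.append(child)
                # simulation: apply the default policy (random moves)
                # from the new node until a terminal position is reached
                position = child.position
                while not game.is_terminal(position):
                    position = random.choice(game.successors(position))
                utility = game.utility(position)
                child.visits, child.utility_sum = 1, utility
        # backpropagation: update visit counter and utility estimate of n
        node.visits += 1
        node.utility_sum += utility
        return utility

    def mcts(game, root_position, iterations=1000):
        """Build the tree, then play the move leading to the child
        of the root with the highest utility estimate."""
        root = Node(root_position)
        for _ in range(iterations):
            visit_node(game, root)
        return max(root.children, key=Node.estimate).position

    # Toy usage: subtract 1 or 2 from n; landing exactly on 0 scores 1.
    class CountdownGame:
        def is_terminal(self, n): return n <= 0
        def utility(self, n): return 1.0 if n == 0 else 0.0
        def successors(self, n): return [n - 1, n - 2]

    print(mcts(CountdownGame(), root_position=5))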

43.4 Summary

Summary

• Monte-Carlo methods compute averages over a number of random samples.
• Simple Monte-Carlo methods like Hindsight Optimization perform well in some games, but are suboptimal even with unbounded resources.
• Monte-Carlo Tree Search (MCTS) algorithms iteratively build a search tree, adding one node in each iteration.
