MDPs and External Value Iteration

Academic year: 2021

Stefan Edelkamp

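MDPs

[Figure: example MDP with states I, u1, u2, u3. Edge labels: action a with c(a) = 2, p = 9/10; action c with c(c) = 10, p = 1. State values: h = 3, h = 0, h = 6, h = 1.]

Backup values per action:

Action a: 2 + 1/10 × 3 + 9/10 × 0 = 2.3
Action b: 4 + 1 × 0 = 4
Action c: 10 + 1 × 6 = 16

Updated value: h = 2.3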
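The per-action arithmetic on this slide is a single Bellman backup: each action contributes its cost plus the expected value of its successors, and the state takes the minimum. A minimal Python sketch follows, using only the numbers from the slide. The dictionary layout and the names `actions`, `q_value`, `best_action`, and `new_h` are assumptions made for illustration.

```python
# One Bellman backup with the numbers from the slide.
# Assumed layout: action -> (cost, [(probability, h of successor), ...]).
actions = {
    "a": (2.0, [(0.1, 3.0), (0.9, 0.0)]),
    "b": (4.0, [(1.0, 0.0)]),
    "c": (10.0, [(1.0, 6.0)]),
}


def q_value(cost, outcomes):
    """Action cost plus the expected value of the successor states."""
    return cost + sum(p * h for p, h in outcomes)


# Evaluate every action, then keep the cheapest one.
q = {name: q_value(cost, outcomes) for name, (cost, outcomes) in actions.items()}
best_action = min(q, key=q.get)
new_h = q[best_action]

print(best_action, round(new_h, 6))
```

Action a wins with a value of about 2.3, which agrees with the h = 2.3 shown on the slide.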

Uniform Search Model:

Deterministic

Non-Deterministic

Probabilistic

Internal Memory Value Iteration

• ε-optimal for solving MDPs, AND/OR trees, …

• Problem:

  • Needs to have the whole state space in main memory.
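To make the memory problem concrete, here is a minimal in-memory value iteration sketch in Python. It is not the author's code: the function name `value_iteration`, the `epsilon` stopping rule, and the two-state toy MDP are all assumptions chosen for illustration. The point to notice is that the dictionary `h` holds one entry per state, so the whole state space must fit in RAM.

```python
def value_iteration(mdp, goals, epsilon=1e-6):
    """In-memory value iteration (illustrative sketch).

    mdp: state -> {action: (cost, [(probability, successor), ...])}
    goals: set of states whose value is fixed at 0.
    """
    h = {s: 0.0 for s in mdp}  # one entry per state: the full state space in RAM
    while True:
        residual = 0.0
        new_h = dict(h)
        for s, acts in mdp.items():
            if s in goals or not acts:
                continue
            # Bellman backup: cheapest action by cost plus expected successor value.
            best = min(
                cost + sum(p * h[t] for p, t in outs)
                for cost, outs in acts.values()
            )
            residual = max(residual, abs(best - h[s]))
            new_h[s] = best
        h = new_h
        if residual < epsilon:  # stop once no value changed by epsilon or more
            return h


# Hypothetical toy MDP: action "a" costs 2 and fails (stays in I) with p = 0.1.
toy = {
    "I": {"a": (2.0, [(0.1, "I"), (0.9, "T")]), "b": (4.0, [(1.0, "T")])},
    "T": {},
}
h_star = value_iteration(toy, goals={"T"})
print(h_star)
```

For this toy problem the value of I approaches 2 / 0.9, the fixed point of h = 2 + 0.1 h.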

External-Memory Algorithm for Value Iteration

• What makes value iteration different from the usual external-memory search algorithms?

Answer:

• Propagation of information from states to predecessors!

• Edges are more important than the states.

Ext-VI works on edges:

External Memory Value Iteration

• Phase I: Generate the edge space by External BFS.

    Open(0) = Init; i = 1
    while (Open(i-1) != empty)
        Open(i) = Succ(Open(i-1))
        Externally-Sort-and-Remove-Duplicates(Open(i))
        for loc = 1 to Locality(Graph)
            Open(i) = Open(i) \ Open(i - loc)    // remove previous layers
        i++
    endwhile

• Merge all BFS layers into one edge list on disk!

    Open_t = Open(0) ∪ Open(1) ∪ … ∪ Open(DIAM)
    Temp = Open_t
    Sort Open_t wrt. the successors; sort Temp wrt. the predecessors
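Phase I can be simulated in memory to see what the layers look like. The Python sketch below mirrors the slide's loop under several assumptions: Python sets and `sorted` stand in for external sorting and disk files, `ROOT = 0` stands in for the dummy predecessor Ø, the helper name `bfs_edge_layers` is made up, and `locality=2` is an arbitrary choice rather than a computed graph locality. The graph is the ten-node example whose edge list appears later in the deck.

```python
ROOT = 0  # assumed stand-in for the dummy predecessor Ø of the initial state

# Successor lists of the ten-node example graph (nodes 8 and 10 have none).
SUCC = {1: [2, 3, 4], 2: [3, 5], 3: [4, 8], 4: [6], 5: [6, 7],
        6: [9], 7: [8, 10], 9: [8, 10]}


def bfs_edge_layers(succ, init, locality):
    """Layered BFS over edges (u, v), following the slide's Phase I loop."""
    layers = [[(ROOT, init)]]                      # Open(0) = Init
    while layers[-1]:                              # while Open(i-1) != empty
        i = len(layers)
        frontier = {v for _, v in layers[-1]}
        # Open(i) = Succ(Open(i-1)); the set removes duplicates.
        open_i = {(u, v) for u in frontier for v in succ.get(u, ())}
        # Remove edges already present in the previous `locality` layers.
        for loc in range(1, locality + 1):
            if i - loc >= 0:
                open_i -= set(layers[i - loc])
        layers.append(sorted(open_i))
    return layers[:-1]                             # drop the empty final layer


layers = bfs_edge_layers(SUCC, init=1, locality=2)
edges = [e for layer in layers for e in layer]           # merge all layers
temp = sorted(edges)                                     # by predecessor
open_t = sorted(edges, key=lambda e: (e[1], e[0]))       # by successor
print([len(layer) for layer in layers])
print(open_t)
```

On this graph the merge yields 16 edges. Ties in the successor order are broken by predecessor here, which is a choice of this sketch and not something the slide specifies.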

External Value Iteration [ICAPS-07]

[Figure: example graph with nodes 1 to 10, initial state I, and terminal states T. Node values: h = 3; 2, 2, 2; 1, 1, 1, 1; 0, 0.]

Edges sorted by predecessor:

{(Ø,1), (1,2), (1,3), (1,4), (2,3), (2,5), (3,4), (3,8), (4,6), (5,6), (5,7), (6,9), (7,8), (7,10), (9,8), (9,10)}

h = 3 2 2 2 2 1 2 0 1 1 1 1 0 0 0 0

Edges sorted by successor:

{(Ø,1), (1,2), (1,3), (2,3), (1,4), (3,4), (2,5), (4,6), (5,6), (5,7), (3,8), (7,8), (9,8), (6,9), (7,10), (9,10)}

h = 3 2 2 2 2 2 1 1 1 1 0 0 0 1 0 0

h' = 3 2 1 1 2 2 2 2 2 1 0 0 0 1 0 0

Update rule:

$h'(u) \leftarrow \min_{v \in \mathrm{Succ}(u)} \{\, 1 + h(v) \,\}$
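One backup pass over the two sorted edge lists can be replayed in a few lines of Python. This is a sketch under assumptions, not the published implementation: Python lists stand in for the disk files, each record is assumed to be a triple (u, v, h(v)), `ROOT = 0` replaces Ø, and the function name `backup_pass` is made up. The first scan groups the predecessor-sorted list by u and applies the unit-cost rule from the slide. The second scan walks the successor-sorted list in step with the new values, so no random access is needed.

```python
from itertools import groupby

ROOT = 0  # assumed stand-in for the dummy predecessor Ø

# Records (u, v, h(v)) with the edges and h values from the slide.
temp = [(0, 1, 3), (1, 2, 2), (1, 3, 2), (1, 4, 2), (2, 3, 2), (2, 5, 1),
        (3, 4, 2), (3, 8, 0), (4, 6, 1), (5, 6, 1), (5, 7, 1), (6, 9, 1),
        (7, 8, 0), (7, 10, 0), (9, 8, 0), (9, 10, 0)]      # sorted by predecessor
open_t = [(0, 1, 3), (1, 2, 2), (1, 3, 2), (2, 3, 2), (1, 4, 2), (3, 4, 2),
          (2, 5, 1), (4, 6, 1), (5, 6, 1), (5, 7, 1), (3, 8, 0), (7, 8, 0),
          (9, 8, 0), (6, 9, 1), (7, 10, 0), (9, 10, 0)]    # sorted by successor


def backup_pass(temp, open_t):
    """One synchronous backup using two sequential scans."""
    # Scan 1: group by predecessor u and compute h'(u) = min over v of 1 + h(v).
    updates = [(u, min(1 + h_v for _, _, h_v in group))
               for u, group in groupby(temp, key=lambda r: r[0]) if u != ROOT]
    # Scan 2: merge-join the new values (sorted by u) into the successor-sorted list.
    result, k = [], 0
    for u, v, h_v in open_t:
        while k < len(updates) and updates[k][0] < v:
            k += 1
        if k < len(updates) and updates[k][0] == v:
            h_v = updates[k][1]        # v has outgoing edges: take its new value
        result.append((u, v, h_v))     # otherwise v keeps its old value
    return result


next_open = backup_pass(temp, open_t)
print([h for _, _, h in next_open])
```

The printed values are 3 2 1 1 2 2 2 2 2 1 0 0 0 1 0 0, the same as the h' row in the example.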
