(1)

Planning and Optimization

G2. Heuristic Search: AO* & LAO* Part II

Gabriele Röger and Thomas Keller

Universität Basel

December 3, 2018

(2)

Content of this Course

Planning
- Classical: Tasks, Progression/Regression, Complexity, Heuristics
- Probabilistic: MDPs, Blind Methods, Heuristic Search, Monte-Carlo Methods

(3)

AO*

(4)

From A* with Backward Induction to AO*

A* with backward induction is already very similar to AO*; only support for uncertain outcomes is missing.

We focus on SSPs in these slides; the adaptation to FH-MDPs is simple.

Careful: an admissible heuristic in a reward setting must not underestimate the true reward.

Still two steps ahead:

- restrict to acyclic probabilistic tasks → AO*
- allow general probabilistic tasks → LAO*

(5)

Transition Systems

AO* distinguishes three transition systems:

The acyclic SSP T = ⟨S, L, c, T, s0, S⋆⟩
⇒ given implicitly

The explicated graph T̂_t = ⟨Ŝ_t, L, c, T̂_t, s0, S⋆⟩
⇒ the part of T explicitly considered during search

The partial solution graph T̂⋆_t = ⟨Ŝ⋆_t, L, c, T̂⋆_t, s0, S⋆⟩
⇒ the part of T̂_t that contains the best solution

(Figure: nested containment, with s0 in T̂⋆_t ⊆ T̂_t ⊆ T)

(6)

Explicated Graph

Expanding a state s at time step t explicates all outcomes s′ ∈ succ(s, ℓ) for all ℓ ∈ L(s) by adding them to the explicated graph:

T̂_t = ⟨Ŝ_{t−1} ∪ succ(s), L, c, T̂_t, s0, S⋆⟩, where T̂_t = T̂_{t−1} except that T̂_t(s, ℓ, s′) = T(s, ℓ, s′) for all ℓ ∈ L(s) and s′ ∈ succ(s, ℓ)

Explicated states are annotated with a state-value estimate V̂_t(s) that describes the estimated expected cost to the goal at step t. When a state s′ is explicated and s′ ∉ Ŝ_{t−1}, its state-value estimate is initialized to V̂_t(s′) := h(s′).

We call the leaf states of T̂_t fringe states.

(7)

Partial Solution Graph

The partial solution graph T̂⋆_t is the subgraph of T̂_t that is spanned by the smallest set of states Ŝ⋆_t that satisfies:

- s0 ∈ Ŝ⋆_t
- if s ∈ Ŝ⋆_t, s′ ∈ Ŝ_t and T̂_t(s, a^{V̂_t}(s), s′) > 0, then s′ ∈ Ŝ⋆_t (where a^{V̂_t}(s) is the greedy action in s with respect to V̂_t)

The partial solution graph forms a partial acyclic policy defined in the initial state s0 and all non-leaf states that can be reached by its execution.

Leaf states that can be reached by the policy described by the partial solution graph are the states in the greedy fringe.
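The greedy fringe can be computed by following greedy actions from s0 and collecting the unexpanded leaves. A sketch under the same hypothetical encoding as before (all names are illustrative, not the lecture's code):

```python
# T maps (state, label) -> {successor: probability}; cost maps label -> c(l).

def greedy_action(s, labels, cost, T, V):
    """Label minimizing c(l) + sum over s' of T(s,l,s') * V(s')."""
    return min(labels(s), key=lambda l: cost[l] + sum(
        p * V[s2] for s2, p in T[(s, l)].items()))

def greedy_fringe(s0, goals, expanded, labels, cost, T, V):
    """Leaf (unexpanded) states reachable by executing the greedy partial policy."""
    fringe, stack, seen = [], [s0], {s0}
    while stack:
        s = stack.pop()
        if s in goals:
            continue
        if s not in expanded:      # leaf of the explicated graph
            fringe.append(s)
            continue
        a = greedy_action(s, labels, cost, T, V)
        for s2, p in T[(s, a)].items():
            if p > 0 and s2 not in seen:
                seen.add(s2)
                stack.append(s2)
    return fringe

# usage: s0 is expanded; under V, a1 (value 1) beats a2 (value 6),
# so only the unexpanded state s1 is in the greedy fringe
T = {("s0", "a1"): {"s1": 1.0}, ("s0", "a2"): {"s2": 1.0}}
cost = {"a1": 1.0, "a2": 1.0}
V = {"s0": 0.0, "s1": 0.0, "s2": 5.0}
fringe = greedy_fringe("s0", set(), {"s0"}, lambda s: ["a1", "a2"], cost, T, V)
```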

(8)

Bellman backups

AO* does not maintain a static open list:

- state-value estimates determine the partial solution graph
- the partial solution graph determines which states are candidates for expansion
- different strategies to select among the candidates exist

(Some) state-value estimates are updated in time step t by Bellman backups:

V̂_t(s) = min_{ℓ ∈ L(s)} ( c(ℓ) + Σ_{s′ ∈ Ŝ_t} T̂_t(s, ℓ, s′) · V̂_t(s′) )
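A single Bellman backup is a one-liner over the explicated transitions. A minimal sketch with the same hypothetical encoding as above (names are illustrative):

```python
def bellman_backup(s, labels, cost, T, V):
    """min over l in L(s) of c(l) + sum over s' of T(s,l,s') * V(s')."""
    return min(cost[l] + sum(p * V[s2] for s2, p in T[(s, l)].items())
               for l in labels(s))

# usage: a1 costs 1 and reaches g or b with probability 0.5 each,
# a2 costs 3 and reaches g deterministically
T = {("s", "a1"): {"g": 0.5, "b": 0.5}, ("s", "a2"): {"g": 1.0}}
cost = {"a1": 1.0, "a2": 3.0}
V = {"g": 0.0, "b": 2.0}
v = bellman_backup("s", lambda s: ["a1", "a2"], cost, T, V)
# a1: 1 + 0.5*0 + 0.5*2 = 2, a2: 3 + 0 = 3, so the minimum is 2
```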

(9)

AO*

AO* for acyclic SSP T:
  explicate s0
  while there is a greedy fringe state not in S⋆:
    select a greedy fringe state s ∉ S⋆
    expand s
    perform Bellman backups of the states in T̂⋆_{t−1} in reverse order
  return T̂⋆_t
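The pseudocode above can be sketched end to end. This is a compact, self-contained illustration under assumed data structures (the dictionary encoding, the helper names, and the repeated-sweep backup are my simplifications, not the lecture's implementation; in particular, instead of ordering the states reversely, it sweeps over the expanded states often enough, which converges because the task is acyclic):

```python
# T maps (state, label) -> {successor: probability}; h is an admissible heuristic.

def aostar(s0, goals, labels, cost, T, h):
    V = {s0: h(s0)}
    expanded = set()
    while True:
        fringe = greedy_fringe(s0, goals, expanded, labels, cost, T, V)
        if not fringe:
            return V
        s = fringe[0]              # any selection strategy among candidates works
        expanded.add(s)            # expand: explicate all outcomes of s
        for l in labels(s):
            for s2, p in T[(s, l)].items():
                if p > 0 and s2 not in V:
                    V[s2] = h(s2)
        # backups: sweep |expanded| times instead of ordering states reversely;
        # acyclicity guarantees the values propagate all the way to s0
        for _ in range(len(expanded)):
            for t in expanded:
                V[t] = min(cost[l] + sum(p * V[s2] for s2, p in T[(t, l)].items())
                           for l in labels(t))

def greedy_fringe(s0, goals, expanded, labels, cost, T, V):
    """Unexpanded leaves reachable by executing the greedy partial policy."""
    fringe, stack, seen = [], [s0], {s0}
    while stack:
        s = stack.pop()
        if s in goals:
            continue
        if s not in expanded:
            fringe.append(s)
            continue
        a = min(labels(s), key=lambda l: cost[l] + sum(
            p * V[s2] for s2, p in T[(s, l)].items()))
        for s2, p in T[(s, a)].items():
            if p > 0 and s2 not in seen:
                seen.add(s2)
                stack.append(s2)
    return fringe

# usage on a tiny acyclic SSP: the route via s1 (total cost 2) beats
# the direct action a2 (cost 5)
T = {("s0", "a1"): {"s1": 1.0}, ("s0", "a2"): {"g": 1.0},
     ("s1", "a3"): {"g": 1.0}}
cost = {"a1": 1.0, "a2": 5.0, "a3": 1.0}
labels = {"s0": ["a1", "a2"], "s1": ["a3"]}
V = aostar("s0", {"g"}, lambda s: labels[s], cost, T, lambda s: 0.0)
```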

(10)

AO*: Example (Blackboard)

(Figure: an acyclic SSP with states s0–s8 and actions a1–a8, annotated with action costs and outcome probabilities; worked on the blackboard.)

h(s) = 0 for goal states, otherwise in blue above or below s

(11)

Theoretical properties

Theorem

Using an admissible heuristic, AO* converges to an optimal solution without (necessarily) explicating all states.

Proof omitted.

(12)

LAO*

(13)

LAO*

A* with backward induction finds sequential solutions (a plan) in classical planning tasks.

AO* finds acyclic solutions with branches (an acyclic policy) in acyclic SSPs.

LAO* is the generalization of AO* to cyclic solutions in cyclic SSPs.

(14)

LAO*

From plans to acyclic policies, we only changed the backup procedure from backward induction to Bellman backups. When solutions may be cyclic, we cannot perform updates in reverse order.

Bellman backups are essentially an acyclic version of value iteration:

- replacing Bellman backups with value iteration is the LAO* variant
- the original algorithm of Hansen & Zilberstein (1998) uses policy iteration instead

(17)

LAO*

LAO* for SSP T:
  explicate s0
  while there is a greedy fringe state not in S⋆:
    select a greedy fringe state s ∉ S⋆
    expand s
    perform policy iteration in T̂_t
  return T̂⋆_t
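The value-iteration variant of the backup step mentioned above handles cyclic solutions by iterating Bellman backups over the explicated states to a fixed point. A sketch under assumed structures (the encoding, function names, and the residual threshold `eps` are my illustration, not the lecture's code):

```python
# T maps (state, label) -> {successor: probability}; cost maps label -> c(l).

def value_iteration(states, goals, labels, cost, T, V, eps=1e-10):
    """Repeat Bellman backups over all explicated states until the
    largest change (Bellman residual) drops below eps."""
    while True:
        residual = 0.0
        for s in states:
            if s in goals:
                V[s] = 0.0
                continue
            new = min(cost[l] + sum(p * V[s2] for s2, p in T[(s, l)].items())
                      for l in labels(s))
            residual = max(residual, abs(new - V[s]))
            V[s] = new
        if residual < eps:
            return V

# cyclic example: a1 costs 1 and reaches the goal with probability 0.5,
# otherwise loops back, so V(s0) satisfies V = 1 + 0.5 * V, i.e. V(s0) = 2
T = {("s0", "a1"): {"s0": 0.5, "g": 0.5}}
V = value_iteration(["s0", "g"], {"g"}, lambda s: ["a1"], {"a1": 1.0}, T,
                    {"s0": 0.0, "g": 0.0})
```

Unlike the reverse-order backups of AO*, this converges even though s0 depends on its own value.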

(18)

LAO*: Optimizations

Several optimizations for LAO* have been proposed:

- use value iteration instead of policy iteration
- terminate VI when the partial solution graph changes
- expand all states in the greedy fringe before the backup
- order states (arbitrarily within cycles) and use backward induction for updates

⇒ the last two combine to the famous variant iLAO*

(19)

Theoretical properties

Theorem

Using an admissible heuristic, LAO* converges to an optimal solution without (necessarily) explicating all states.

Proof omitted.

(20)

Summary

(21)

Summary

- AO* finds optimal solutions for acyclic SSPs
- LAO* finds optimal solutions for SSPs
- Both algorithms differ from A* with backward induction in the way backups are performed
- Unlike previous optimal algorithms, both are able to find an optimal solution without explicating all states
