
(1)

Planning and Optimization

G1. Heuristic Search: AO* & LAO* Part I

Gabriele Röger and Thomas Keller

Universität Basel

December 3, 2018

(2)

Heuristic Search | Motivation | A* with Backward Induction | Summary

Content of this Course

Planning

Classical: Tasks, Progression/Regression, Complexity, Heuristics

Probabilistic: MDPs, Blind Methods, Heuristic Search, Monte-Carlo Methods

(3)


Heuristic Search

(4)

Heuristic Search: Recap

Heuristic Search Algorithms

Heuristic search algorithms use heuristic functions to (partially or fully) determine the order of node expansion.

(From Lecture 15 of the AI course last semester)

(5)

Best-first Search: Recap

Best-first Search

A best-first search is a heuristic search algorithm that evaluates search nodes with an evaluation function f and always expands a node n with minimal f(n) value.

(From Lecture 15 of the AI course last semester)

(6)

A* Search: Recap

A* Search

A* is the best-first search algorithm with evaluation function f(n) = g(n) + h(n.state).

(From Lecture 15 of the AI course last semester)
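To make the recap concrete, here is a minimal A* sketch with reopening (the callables `succ`, `h`, `is_goal` and the example graph are illustrative, not from the slides):

```python
import heapq

def a_star(s0, succ, h, is_goal):
    """Best-first search with f(n) = g(n) + h(n.state).

    succ(s) yields (cost, successor) pairs; h estimates cost-to-goal.
    Reopens states when a cheaper path is found (h not assumed consistent).
    """
    g = {s0: 0}
    queue = [(h(s0), s0)]            # entries are (f, state)
    while queue:
        f, s = heapq.heappop(queue)
        if f > g[s] + h(s):          # stale queue entry after reopening
            continue
        if is_goal(s):
            return g[s]
        for cost, t in succ(s):
            if t not in g or g[s] + cost < g[t]:
                g[t] = g[s] + cost
                heapq.heappush(queue, (g[t] + h(t), t))
    return None                      # no solution

# Tiny illustrative graph: s0 -> s1 -> g (cost 1+1) beats the direct edge (cost 4).
graph = {"s0": [(1, "s1"), (4, "g")], "s1": [(1, "g")], "g": []}
cost = a_star("s0", lambda s: graph[s], lambda s: 0, lambda s: s == "g")
```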

(7)

A* Search (With Reopening): Example

[Figure: example state space with states s0–s6, heuristic values h(s0)=18, h(s1)=12, h(s2)=14, h(s3)=12, h(s4)=6, h(s5)=4, h(s6)=0, and edge costs; search nodes annotated with g+h values, e.g. s0: 0+18, s1: 8+12, s2: 5+14, s5: 12+4, s6: 20+0.]

(8)

A* Search (With Reopening): Example

[Figure: continuation of the A* (with reopening) example; same state space and node annotations as on the previous slide.]

(9)


Motivation

(10)

From A* to AO*

The equivalent of A* in (acyclic) probabilistic planning is AO*. Even though we know A* and the foundations of probabilistic planning, the generalization is far from straightforward:

e.g., in A*, g(n) is the cost from root n0 to n
the equivalent in AO* is the expected cost from n0 to n
an alternative could be the expected cost from n0 to n given n is reached

(11)

Expected Cost to Reach State

Consider the following expansion of state s0:

[Figure: s0 with two actions of cost 1 each; a0 leads to s1 (h=100) with probability .99 and to s2 (h=1) with probability .01; a1 leads to s3 (h=2) and s4 (h=2) with probability .5 each.]

The expected cost to reach any of the leaves is infinite or undefined (no leaf is reached with probability 1).

Assuming state-value estimate V̂(s) := h(s), a1 is the greedy action.
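The greedy-action claim can be checked numerically. A small sketch with the transition data read off the figure (probabilities and heuristic values as stated above; variable names are illustrative):

```python
# Expansion of s0: both actions cost 1; outcomes given as (probability, h-value).
outcomes = {
    "a0": [(0.99, 100), (0.01, 1)],   # s1 with h=100, s2 with h=1
    "a1": [(0.5, 2), (0.5, 2)],       # s3 and s4, both h=2
}

def q_value(action, cost=1):
    """Q(a) = c(a) + sum over s' of P(s'|a) * V̂(s'), with V̂(s) := h(s)."""
    return cost + sum(p * h for p, h in outcomes[action])

greedy = min(outcomes, key=q_value)   # a1: Q(a1) = 3 vs Q(a0) = 100.01
```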

(12)

From A* to AO*

The equivalent of A* in (acyclic) probabilistic planning is AO*. Even though we know A* and the foundations of probabilistic planning, the generalization is far from straightforward:

e.g., in A*, g(n) is the cost from root n0 to n
the equivalent in AO* is the expected cost from n0 to n
an alternative could be the expected cost from n0 to n given n is reached

(13)

Expected Cost to Reach State Given It Is Reached

Consider the following expansion of state s0:

[Figure: same expansion as on the previous slide: a0 leads to s1 (h=100) with probability .99 and to s2 (h=1) with probability .01; a1 leads to s3 (h=2) and s4 (h=2) with probability .5 each.]

Conditional probability is misleading: s2 would be expanded, which isn't part of the best looking option:

with state-value estimate V̂(s) := h(s), the greedy action a_V̂(s0) = a1

(14)

The Best Looking Action

Consider the following expansion of state s0:

[Figure: same expansion as before: a0 leads to s1 (h=100) with probability .99 and to s2 (h=1) with probability .01; a1 leads to s3 (h=2) and s4 (h=2) with probability .5 each.]

Conditional probability is misleading: s2 would be expanded, which isn't part of the best looking option:

with state-value estimate V̂(s) := h(s), the greedy action a_V̂(s0) = a1

(15)

Expansion in Best Solution Graph

AO* uses a different idea:

AO* keeps track of the best solution graph
AO* expands a state that can be reached from s0 by only applying greedy actions
⇒ no g-value equivalent required

An equivalent version of A* built on this idea can be derived
⇒ A* with backward induction

Since the change is non-trivial, we focus on the A* variant now and generalize later to acyclic probabilistic tasks (AO*) and probabilistic tasks in general (LAO*)

(16)

Expansion in Best Solution Graph

AO* uses a different idea:

AO* keeps track of the best solution graph
AO* expands a state that can be reached from s0 by only applying greedy actions
⇒ no g-value equivalent required

An equivalent version of A* built on this idea can be derived
⇒ A* with backward induction

Since the change is non-trivial, we focus on the A* variant now and generalize later to acyclic probabilistic tasks (AO*) and probabilistic tasks in general (LAO*)

(17)

A* with Backward Induction

(18)

Transition Systems

A* with backward induction distinguishes three transition systems:

The transition system T = ⟨S, L, c, T, s0, S⋆⟩
⇒ given implicitly

The explicated graph T̂t = ⟨Ŝt, L, c, T̂t, s0, S⋆⟩
⇒ the part of T explicitly considered during search

The partial solution graph T̂t⋆ = ⟨Ŝt⋆, L, c, T̂t⋆, s0, S⋆⟩
⇒ the part of T̂t that contains the best solution

[Figure: nested regions T̂t⋆ ⊆ T̂t ⊆ T, each containing s0.]

(19)

Explicated Graph

Expanding a state s at time step t explicates all successors s′ ∈ succ(s) by adding them to the explicated graph:

T̂t = ⟨Ŝt−1 ∪ succ(s), L, c, T̂t−1 ∪ {⟨s, l, s′⟩ ∈ T}, s0, S⋆⟩

Each explicated state is annotated with a state-value estimate V̂t(s) that describes the estimated cost to a goal at time step t.

When a state s′ is explicated and s′ ∉ Ŝt−1, its state-value estimate is initialized to V̂t(s′) := h(s′).

We call the leaf states of T̂t fringe states.
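A minimal data-structure sketch of explication, with a set, a dict of transition lists, and a dict of estimates standing in for Ŝt, T̂t and V̂t (all names illustrative; the successor costs and h-values below reuse the numbers of the running A* example):

```python
def explicate(s, succ, h, S_hat, T_hat, V_hat):
    """Expand s: add all its successors to the explicated graph and
    initialize V̂ with the heuristic for states not explicated before."""
    for cost, t in succ(s):
        T_hat.setdefault(s, []).append((cost, t))
        if t not in S_hat:
            S_hat.add(t)
            V_hat[t] = h(t)   # V̂t(s') := h(s') for newly explicated states

# Illustrative use: expanding s0, which has successors s1 (cost 8) and s2 (cost 5).
S_hat, T_hat, V_hat = {"s0"}, {}, {"s0": 18}
explicate("s0", lambda s: [(8, "s1"), (5, "s2")],
          lambda s: {"s1": 12, "s2": 14}[s], S_hat, T_hat, V_hat)
```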

(20)

Partial Solution Graph

The partial solution graph T̂t⋆ is the subgraph of T̂t that is spanned by the smallest set of states Ŝt⋆ that satisfies:

s0 ∈ Ŝt⋆
if s ∈ Ŝt⋆, s′ ∈ Ŝt and ⟨s, a_V̂t(s), s′⟩ ∈ T̂t, then s′ ∈ Ŝt⋆

The partial solution graph forms a sequence of states ⟨s0, . . . , sn⟩, starting with the initial state s0 and ending in the greedy fringe state sn.
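In the deterministic case the partial solution graph can be extracted by simply following greedy actions from s0. A sketch under the same illustrative data structures as before:

```python
def greedy_path(s0, T_hat, V_hat):
    """State sequence ⟨s0, ..., sn⟩ obtained by following greedy actions
    (minimizing c(l) + V̂(s')) until a fringe state is reached."""
    path, s = [s0], s0
    while s in T_hat:    # fringe states have no explicated successors
        s = min(T_hat[s], key=lambda e: e[0] + V_hat[e[1]])[1]
        path.append(s)
    return path

# With s0 explicated as in the running example: 8+12 = 20 vs 5+14 = 19,
# so the greedy action leads to s2, which is the greedy fringe state.
path = greedy_path("s0", {"s0": [(8, "s1"), (5, "s2")]}, {"s1": 12, "s2": 14})
```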

(21)

Backward Induction

A* with backward induction does not maintain a static open list:

State-value estimates determine the partial solution graph
The partial solution graph determines which state is expanded

(Some) state-value estimates are updated in time step t by backward induction:

V̂t(s) = min_{⟨s,l,s′⟩ ∈ T̂t} ( c(l) + V̂t(s′) )
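The update rule is a one-liner over the explicated transitions of s (a sketch; `T_hat[s]` holds the ⟨s, l, s′⟩ as (cost, successor) pairs, names illustrative):

```python
def backward_induction(s, T_hat, V_hat):
    """Single update: V̂t(s) = min over ⟨s,l,s'⟩ in T̂t of c(l) + V̂t(s')."""
    V_hat[s] = min(c + V_hat[t] for c, t in T_hat[s])

# Numbers from the running example: after expanding s0,
# V̂(s0) = min(8 + 12, 5 + 14) = 19.
V_hat = {"s0": 18, "s1": 12, "s2": 14}
backward_induction("s0", {"s0": [(8, "s1"), (5, "s2")]}, V_hat)
```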

(22)

A* with Backward Induction

A* with backward induction for classical planning task T:

  explicate s0
  while greedy fringe state s ∉ S⋆:
      expand s
      perform backward induction of states in T̂⋆t−1 in reverse order
  return T̂t⋆
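Putting the pieces together, a runnable sketch of the whole loop. It assumes a deterministic task with no dead ends (every non-goal fringe state has a successor); the API (`succ`, `h`, `is_goal`) and the small test graph are illustrative, not from the slides:

```python
def a_star_backward_induction(s0, succ, h, is_goal):
    """A* with backward induction: repeatedly follow greedy actions from s0
    to a fringe state, expand it, then update V̂ backwards along the path."""
    V_hat, T_hat = {s0: h(s0)}, {}
    while True:
        # Follow greedy actions from s0 until a fringe state is reached.
        path, s = [s0], s0
        while s in T_hat:
            s = min(T_hat[s], key=lambda e: e[0] + V_hat[e[1]])[1]
            path.append(s)
        if is_goal(s):
            return path, V_hat[s0]
        # Expand the greedy fringe state (explicate its successors).
        for c, t in succ(s):
            T_hat.setdefault(s, []).append((c, t))
            V_hat.setdefault(t, h(t))
        # Backward induction along the partial solution graph, reverse order.
        for u in reversed(path):
            if u in T_hat:
                V_hat[u] = min(c + V_hat[t] for c, t in T_hat[u])

# Illustrative task: optimal plan is s0 -> s1 -> g with cost 8 + 4 = 12.
graph = {"s0": [(8, "s1"), (5, "s2")], "s1": [(4, "g")],
         "s2": [(10, "g")], "g": []}
heur = {"s0": 10, "s1": 4, "s2": 9, "g": 0}
plan, cost = a_star_backward_induction(
    "s0", lambda s: graph[s], lambda s: heur[s], lambda s: s == "g")
```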

(23)

A* with backward induction

[Figure: example run on the state space s0–s6 (left: heuristic values and edge costs); right: s0 is explicated, V̂(s0) = 18.]

(24)

A* with backward induction

[Figure: example run continued; after expanding s0, backward induction updates V̂(s0) to 19.]

(25)

A* with backward induction

[Figure: example run continued; further states explicated, V̂(s0) = 19.]

(26)

A* with backward induction

[Figure: example run continued; backward induction updates V̂(s0) to 20.]

(27)

A* with backward induction

[Figure: example run continued; V̂(s0) = 20.]

(28)

A* with backward induction

[Figure: example run concluded; the goal state s6 is reached in the greedy fringe, V̂(s0) = 20.]

(29)

Equivalence of A* and A* with Backward Induction

Theorem

A* and A* with Backward Induction expand the same set of states if run with an identical admissible heuristic h and an identical tie-breaking criterion.

Proof Sketch.

The proof shows that there is always a unique state s in the greedy fringe of A* with backward induction such that:

f(s) = g(s) + h(s) is minimal among all fringe states
g(s) of the fringe node s is encoded in the greedy action choices
h(s) of the fringe node is equal to V̂t(s)

(30)


Summary

(31)

Summary

Non-trivial to generalize A* to probabilistic planning
For a better understanding of AO*, we change A* towards AO*
Derived A* with backward induction, which is similar to AO* and expands identical states as A*
