Planning and Optimization
G1. Heuristic Search: AO∗ & LAO∗ Part I
Gabriele Röger and Thomas Keller
Universität Basel
December 3, 2018
Heuristic Search | Motivation | A∗ with Backward Induction | Summary
Content of this Course
- Planning
  - Classical: Tasks, Progression/Regression, Complexity, Heuristics
  - Probabilistic: MDPs, Blind Methods, Heuristic Search, Monte-Carlo Methods
Heuristic Search
Heuristic Search: Recap
Heuristic Search Algorithms
Heuristic search algorithms use heuristic functions to (partially or fully) determine the order of node expansion.
(From Lecture 15 of the AI course last semester)
Best-first Search: Recap
Best-first Search
A best-first search is a heuristic search algorithm that evaluates search nodes with an evaluation function f and always expands a node n with minimal f(n) value.
(From Lecture 15 of the AI course last semester)
A∗ Search: Recap
A∗ Search
A∗ is the best-first search algorithm with evaluation function f(n) = g(n) + h(n.state).
(From Lecture 15 of the AI course last semester)
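As a reminder of what the later variant is compared against, classical A∗ with reopening can be sketched in a few lines of Python (a minimal sketch; the graph representation and function names are assumptions, not from the lecture):

```python
import heapq

def a_star(s0, goal, succ, h):
    """A* search: always expand a node n with minimal f(n) = g(n) + h(n).

    succ(s) yields (label, cost, successor) triples; h is an
    admissible heuristic. Returns the cost of a cheapest path
    from s0 to goal, or None if the goal is unreachable.
    """
    open_list = [(h(s0), 0, s0)]             # entries: (f, g, state)
    best_g = {s0: 0}
    while open_list:
        f, g, s = heapq.heappop(open_list)
        if g > best_g.get(s, float("inf")):
            continue                          # stale entry; a cheaper path was found
        if s == goal:
            return g
        for _, cost, t in succ(s):
            g_new = g + cost
            if g_new < best_g.get(t, float("inf")):
                best_g[t] = g_new             # may reopen an already expanded state
                heapq.heappush(open_list, (g_new + h(t), g_new, t))
    return None
```

Reopening happens implicitly: a state re-enters the open list whenever a cheaper path to it is found, and outdated queue entries are skipped on pop.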
A∗ Search (With Reopening): Example
[Figure: example graph with states s0–s6 and heuristic values h(s0)=18, h(s1)=12, h(s2)=14, h(s3)=12, h(s4)=6, h(s5)=4, h(s6)=0; the A∗ search tree annotates nodes with f = g + h. State s5 is first reached via s2 with f = 15+4, then reopened via s1 with the cheaper f = 12+4, and the goal s6 is reached with f = 20+0.]
Motivation
From A∗ to AO∗
The equivalent of A∗ in (acyclic) probabilistic planning is AO∗. Even though we know A∗ and the foundations of probabilistic planning, the generalization is far from straightforward:
- e.g., in A∗, g(n) is the cost from the root n0 to n
- the equivalent in AO∗ is the expected cost from n0 to n
- an alternative could be the expected cost from n0 to n given that n is reached
Expected Cost to Reach State
Consider the following expansion of state s0:
[Figure: s0 with actions a0 and a1, each with cost 1; a0 leads to s1 (h = 100) with probability .99 and to s2 (h = 1) with probability .01; a1 leads to s3 (h = 2) and s4 (h = 2), each with probability .5.]
The expected cost to reach any of the leaves is infinite or undefined (no leaf is reached with probability 1).
Assuming state-value estimate V̂(s) := h(s), a1 is the greedy action.
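The greedy choice on this example can be verified with a few lines (a sketch; action costs and outcome probabilities are read off the figure, with V̂(s) := h(s)):

```python
# Expansion of s0 from the figure: both actions cost 1, and the
# state-value estimates of the leaves are V̂(s) := h(s).
cost = {"a0": 1, "a1": 1}
succ = {
    "a0": [(0.99, "s1"), (0.01, "s2")],   # (probability, successor)
    "a1": [(0.50, "s3"), (0.50, "s4")],
}
h = {"s1": 100, "s2": 1, "s3": 2, "s4": 2}

# Q(s0, a) = c(a) + sum of p * V̂(s') over the outcomes of a
q = {a: cost[a] + sum(p * h[t] for p, t in outcomes)
     for a, outcomes in succ.items()}
greedy = min(q, key=q.get)   # a1: Q(s0,a1) = 3 beats Q(s0,a0) = 100.01
```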
Expected Cost to Reach State Given It Is Reached
Consider the following expansion of state s0:
[Figure: the same expansion as before: s0 with actions a0 and a1 (cost 1 each); a0 leads to s1 (h = 100) with probability .99 and to s2 (h = 1) with probability .01; a1 leads to s3 (h = 2) and s4 (h = 2), each with probability .5.]
Conditional probability is misleading: s2 would be expanded, even though it is not part of the best looking option:
with state-value estimate V̂(s) := h(s), the greedy action is aV̂(s0) = a1.
Expansion in Best Solution Graph
AO∗ uses a different idea:
- AO∗ keeps track of the best solution graph
- AO∗ expands a state that can be reached from s0 by only applying greedy actions
⇒ no g-value equivalent required
An equivalent version of A∗ built on this idea can be derived
⇒ A∗ with backward induction
Since the change is non-trivial, we focus on the A∗ variant now and generalize later to acyclic probabilistic tasks (AO∗) and probabilistic tasks in general (LAO∗).
A∗ with Backward Induction
Transition Systems
A∗ with backward induction distinguishes three transition systems:
- The transition system T = ⟨S, L, c, T, s0, S⋆⟩
  ⇒ given implicitly
- The explicated graph T̂t = ⟨Ŝt, L, c, T̂t, s0, S⋆⟩
  ⇒ the part of T explicitly considered during search
- The partial solution graph T̂t⋆ = ⟨Ŝt⋆, L, c, T̂t⋆, s0, S⋆⟩
  ⇒ the part of T̂t that contains the best solution
[Figure: nested sets T̂t⋆ ⊆ T̂t ⊆ T, all containing s0.]
Explicated Graph
Expanding a state s at time step t explicates all successors s′ ∈ succ(s) by adding them to the explicated graph:
T̂t = ⟨Ŝt−1 ∪ succ(s), L, c, T̂t−1 ∪ {⟨s, l, s′⟩ ∈ T}, s0, S⋆⟩
Each explicated state is annotated with a state-value estimate V̂t(s) that describes the estimated cost to a goal at time step t.
When a state s′ is explicated and s′ ∉ Ŝt−1, its state-value estimate is initialized to V̂t(s′) := h(s′).
We call the leaf states of T̂t fringe states.
Partial Solution Graph
The partial solution graph T̂t⋆ is the subgraph of T̂t that is spanned by the smallest set of states Ŝt⋆ that satisfies:
- s0 ∈ Ŝt⋆
- if s ∈ Ŝt⋆, s′ ∈ Ŝt and ⟨s, aV̂t(s), s′⟩ ∈ T̂t, then s′ ∈ Ŝt⋆
In the classical setting, the partial solution graph forms a sequence of states ⟨s0, . . . , sn⟩, starting with the initial state s0 and ending in the greedy fringe state sn.
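In the classical case this greedy state sequence can be extracted directly (a small Python sketch; the data layout for the explicated graph is an assumption):

```python
def greedy_path(s0, explicated, V):
    """Follow greedy actions from s0 through the explicated graph until
    a fringe state (one without explicated transitions) is reached.

    explicated maps a state to its (label, cost, successor) triples;
    V maps a state to its state-value estimate V̂(s).
    """
    path = [s0]
    while path[-1] in explicated:
        s = path[-1]
        # greedy action in s: minimize c(l) + V̂(s') over <s, l, s'>
        _, _, t = min(explicated[s], key=lambda tr: tr[1] + V[tr[2]])
        path.append(t)
    return path
```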
Backward Induction
A∗ with backward induction does not maintain a static open list:
- State-value estimates determine the partial solution graph
- The partial solution graph determines which state is expanded
- (Some) state-value estimates are updated in time step t by backward induction:
V̂t(s) = min_{⟨s,l,s′⟩ ∈ T̂t(s)} (c(l) + V̂t(s′))
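A single backward-induction update then looks like this (a sketch; the data layout is an assumption):

```python
def backward_induction_update(s, explicated, V):
    """V̂t(s) = min over <s, l, s'> in T̂t of c(l) + V̂t(s').

    explicated maps a state to its (label, cost, successor) triples;
    V maps a state to its current state-value estimate and is
    updated in place.
    """
    V[s] = min(cost + V[t] for _, cost, t in explicated[s])
    return V[s]
```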
A∗ with Backward Induction
A∗ with backward induction for classical planning task T:
  explicate s0
  while greedy fringe state s ∉ S⋆:
    expand s
    perform backward induction of the states in T̂t−1⋆ in reverse order
  return T̂t⋆
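The loop above can be sketched in Python for deterministic tasks (a minimal sketch; the representation and helper names are assumptions, not the lecture's code):

```python
def a_star_backward_induction(s0, goals, succ, h):
    """Sketch of A* with backward induction for a classical planning task.

    succ(s) yields (label, cost, successor) triples of the implicit
    transition system T; h is an admissible heuristic. Returns the
    greedy path to a goal and the final estimate V̂(s0).
    """
    V = {s0: h(s0)}        # state-value estimates V̂
    explicated = {}        # state -> explicated (label, cost, successor) triples

    def greedy_path():
        """Greedy path from s0 to the current greedy fringe state."""
        path = [s0]
        while path[-1] in explicated:
            s = path[-1]
            _, _, t = min(explicated[s], key=lambda tr: tr[1] + V[tr[2]])
            path.append(t)
        return path

    path = greedy_path()
    while path[-1] not in goals:
        s = path[-1]                      # expand the greedy fringe state
        explicated[s] = list(succ(s))
        for _, _, t in explicated[s]:
            V.setdefault(t, h(t))         # initialize V̂(s') := h(s')
        # backward induction over the old greedy path in reverse order
        for u in reversed(path):
            if u in explicated:
                V[u] = min(c + V[t] for _, c, t in explicated[u])
        path = greedy_path()
    return path, V[s0]
```

Note that no g-values are stored: the cost from s0 is encoded implicitly in the greedy action choices, and expansion order emerges from the updated state-value estimates.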
A∗ with Backward Induction: Example
[Figure: a sequence of snapshots of A∗ with backward induction on the earlier example graph with states s0–s6 (heuristic values 18, 12, 14, 12, 6, 4, 0). After each expansion, backward induction updates the state-value estimates along the greedy path; V̂(s0) rises from 18 to 19 to 20, and the search terminates when the greedy fringe state is the goal s6 with V̂(s6) = 0.]
Equivalence of A∗ and A∗ with Backward Induction
Theorem
A∗ and A∗ with backward induction expand the same set of states if run with an identical admissible heuristic h and an identical tie-breaking criterion.
Proof Sketch.
The proof shows that:
- there is always a unique state s in the greedy fringe of A∗ with backward induction
- f(s) = g(s) + h(s) is minimal among all fringe states
- g(s) of the fringe state s is encoded in the greedy action choices
- h(s) of the fringe state equals V̂t(s)
Summary
Summary
It is non-trivial to generalize A∗ to probabilistic planning.
For a better understanding of AO∗, we change A∗ towards AO∗.
We derived A∗ with backward induction, which is similar to AO∗ and expands identical states as A∗.