Planning and Optimization
G1. Heuristic Search: AO∗ & LAO∗ Part I
Gabriele Röger and Thomas Keller
Universität Basel
December 3, 2018
Heuristic Search | Motivation | A∗ with Backward Induction | Summary
Content of this Course
- Planning
  - Classical: Tasks, Progression/Regression, Complexity, Heuristics
  - Probabilistic: MDPs, Blind Methods, Heuristic Search, Monte-Carlo Methods
Heuristic Search
Heuristic Search: Recap
Heuristic Search Algorithms
Heuristic search algorithms use heuristic functions to (partially or fully) determine the order of node expansion.
(From Lecture 15 of the AI course last semester)
Best-first Search: Recap
Best-first Search
A best-first search is a heuristic search algorithm that evaluates search nodes with an evaluation function f and always expands a node n with minimal f(n) value.
(From Lecture 15 of the AI course last semester)
A∗ Search: Recap
A∗ Search
A∗ is the best-first search algorithm with evaluation function f(n) = g(n) + h(n.state).
(From Lecture 15 of the AI course last semester)
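As a reminder of what the later variant is compared against, classical A∗ with reopening can be sketched in a few lines of Python (a minimal sketch; the graph representation and function names are assumptions, not from the lecture):

```python
import heapq

def a_star(s0, goal, succ, h):
    """A* search: always expand a node n with minimal f(n) = g(n) + h(n).

    succ(s) yields (label, cost, successor) triples; h is an
    admissible heuristic. Returns the cost of a cheapest path
    from s0 to goal, or None if the goal is unreachable.
    """
    open_list = [(h(s0), 0, s0)]             # entries: (f, g, state)
    best_g = {s0: 0}
    while open_list:
        f, g, s = heapq.heappop(open_list)
        if g > best_g.get(s, float("inf")):
            continue                          # stale entry; a cheaper path was found
        if s == goal:
            return g
        for _, cost, t in succ(s):
            g_new = g + cost
            if g_new < best_g.get(t, float("inf")):
                best_g[t] = g_new             # may reopen an already expanded state
                heapq.heappush(open_list, (g_new + h(t), g_new, t))
    return None
```

Reopening happens implicitly: a state re-enters the open list whenever a cheaper path to it is found, and outdated queue entries are skipped on pop.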
A∗ Search (With Reopening): Example
[Figure: example graph with states s0–s6 and heuristic values h(s0)=18, h(s1)=12, h(s2)=14, h(s3)=12, h(s4)=6, h(s5)=4, h(s6)=0; the A∗ search tree annotates nodes with f = g + h. State s5 is first reached via s2 with f = 15+4, then reopened via s1 with the cheaper f = 12+4, and the goal s6 is reached with f = 20+0.]
Motivation
From A∗ to AO∗
The equivalent of A∗ in (acyclic) probabilistic planning is AO∗. Even though we know A∗ and the foundations of probabilistic planning, the generalization is far from straightforward:
- e.g., in A∗, g(n) is the cost from the root n0 to n
- the equivalent in AO∗ is the expected cost from n0 to n
- an alternative could be the expected cost from n0 to n given that n is reached
Expected Cost to Reach State
Consider the following expansion of state s0:
[Figure: s0 with actions a0 and a1, each with cost 1; a0 leads to s1 (h = 100) with probability .99 and to s2 (h = 1) with probability .01; a1 leads to s3 (h = 2) and s4 (h = 2), each with probability .5.]
The expected cost to reach any of the leaves is infinite or undefined (no leaf is reached with probability 1).
Assuming state-value estimate V̂(s) := h(s), a1 is the greedy action.
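The greedy choice on this example can be verified with a few lines (a sketch; action costs and outcome probabilities are read off the figure, with V̂(s) := h(s)):

```python
# Expansion of s0 from the figure: both actions cost 1, and the
# state-value estimates of the leaves are V̂(s) := h(s).
cost = {"a0": 1, "a1": 1}
succ = {
    "a0": [(0.99, "s1"), (0.01, "s2")],   # (probability, successor)
    "a1": [(0.50, "s3"), (0.50, "s4")],
}
h = {"s1": 100, "s2": 1, "s3": 2, "s4": 2}

# Q(s0, a) = c(a) + sum of p * V̂(s') over the outcomes of a
q = {a: cost[a] + sum(p * h[t] for p, t in outcomes)
     for a, outcomes in succ.items()}
greedy = min(q, key=q.get)   # a1: Q(s0,a1) = 3 beats Q(s0,a0) = 100.01
```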
Expected Cost to Reach State Given It Is Reached
Consider the following expansion of state s0:
[Figure: the same expansion as before: s0 with actions a0 and a1 (cost 1 each); a0 leads to s1 (h = 100) with probability .99 and to s2 (h = 1) with probability .01; a1 leads to s3 (h = 2) and s4 (h = 2), each with probability .5.]
Conditional probability is misleading: s2 would be expanded, even though it is not part of the best looking option:
with state-value estimate V̂(s) := h(s), the greedy action is aV̂(s0) = a1.
Expansion in Best Solution Graph
AO∗ uses a different idea:
- AO∗ keeps track of the best solution graph
- AO∗ expands a state that can be reached from s0 by only applying greedy actions
⇒ no g-value equivalent required
An equivalent version of A∗ built on this idea can be derived
⇒ A∗ with backward induction
Since the change is non-trivial, we focus on the A∗ variant now and generalize later to acyclic probabilistic tasks (AO∗) and probabilistic tasks in general (LAO∗).
A∗ with Backward Induction
Transition Systems
A∗ with backward induction distinguishes three transition systems:
- The transition system T = ⟨S, L, c, T, s0, S⋆⟩
  ⇒ given implicitly
- The explicated graph T̂t = ⟨Ŝt, L, c, T̂t, s0, S⋆⟩
  ⇒ the part of T explicitly considered during search
- The partial solution graph T̂t⋆ = ⟨Ŝt⋆, L, c, T̂t⋆, s0, S⋆⟩
  ⇒ the part of T̂t that contains the best solution
[Figure: nested sets T̂t⋆ ⊆ T̂t ⊆ T, all containing s0.]
Explicated Graph
Expanding a state s at time step t explicates all successors s′ ∈ succ(s) by adding them to the explicated graph:
T̂t = ⟨Ŝt−1 ∪ succ(s), L, c, T̂t−1 ∪ {⟨s, l, s′⟩ ∈ T}, s0, S⋆⟩
Each explicated state is annotated with a state-value estimate V̂t(s) that describes the estimated cost to a goal at time step t.
When a state s′ is explicated and s′ ∉ Ŝt−1, its state-value estimate is initialized to V̂t(s′) := h(s′).
We call the leaf states of T̂t fringe states.
Partial Solution Graph
The partial solution graph T̂t⋆ is the subgraph of T̂t that is spanned by the smallest set of states Ŝt⋆ that satisfies:
- s0 ∈ Ŝt⋆
- if s ∈ Ŝt⋆, s′ ∈ Ŝt and ⟨s, aV̂t(s), s′⟩ ∈ T̂t, then s′ ∈ Ŝt⋆
In the classical setting, the partial solution graph forms a sequence of states ⟨s0, . . . , sn⟩, starting with the initial state s0 and ending in the greedy fringe state sn.
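In the classical case this greedy state sequence can be extracted directly (a small Python sketch; the data layout for the explicated graph is an assumption):

```python
def greedy_path(s0, explicated, V):
    """Follow greedy actions from s0 through the explicated graph until
    a fringe state (one without explicated transitions) is reached.

    explicated maps a state to its (label, cost, successor) triples;
    V maps a state to its state-value estimate V̂(s).
    """
    path = [s0]
    while path[-1] in explicated:
        s = path[-1]
        # greedy action in s: minimize c(l) + V̂(s') over <s, l, s'>
        _, _, t = min(explicated[s], key=lambda tr: tr[1] + V[tr[2]])
        path.append(t)
    return path
```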
Backward Induction
A∗ with backward induction does not maintain a static open list:
- State-value estimates determine the partial solution graph
- The partial solution graph determines which state is expanded
- (Some) state-value estimates are updated in time step t by backward induction:
V̂t(s) = min_{⟨s,l,s′⟩ ∈ T̂t(s)} (c(l) + V̂t(s′))
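A single backward-induction update then looks like this (a sketch; the data layout is an assumption):

```python
def backward_induction_update(s, explicated, V):
    """V̂t(s) = min over <s, l, s'> in T̂t of c(l) + V̂t(s').

    explicated maps a state to its (label, cost, successor) triples;
    V maps a state to its current state-value estimate and is
    updated in place.
    """
    V[s] = min(cost + V[t] for _, cost, t in explicated[s])
    return V[s]
```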
A∗ with Backward Induction
A∗ with backward induction for classical planning task T:
  explicate s0
  while greedy fringe state s ∉ S⋆:
    expand s
    perform backward induction of the states in T̂t−1⋆ in reverse order
  return T̂t⋆
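The loop above can be sketched in Python for deterministic tasks (a minimal sketch; the representation and helper names are assumptions, not the lecture's code):

```python
def a_star_backward_induction(s0, goals, succ, h):
    """Sketch of A* with backward induction for a classical planning task.

    succ(s) yields (label, cost, successor) triples of the implicit
    transition system T; h is an admissible heuristic. Returns the
    greedy path to a goal and the final estimate V̂(s0).
    """
    V = {s0: h(s0)}        # state-value estimates V̂
    explicated = {}        # state -> explicated (label, cost, successor) triples

    def greedy_path():
        """Greedy path from s0 to the current greedy fringe state."""
        path = [s0]
        while path[-1] in explicated:
            s = path[-1]
            _, _, t = min(explicated[s], key=lambda tr: tr[1] + V[tr[2]])
            path.append(t)
        return path

    path = greedy_path()
    while path[-1] not in goals:
        s = path[-1]                      # expand the greedy fringe state
        explicated[s] = list(succ(s))
        for _, _, t in explicated[s]:
            V.setdefault(t, h(t))         # initialize V̂(s') := h(s')
        # backward induction over the old greedy path in reverse order
        for u in reversed(path):
            if u in explicated:
                V[u] = min(c + V[t] for _, c, t in explicated[u])
        path = greedy_path()
    return path, V[s0]
```

Note that no g-values are stored: the cost from s0 is encoded implicitly in the greedy action choices, and expansion order emerges from the updated state-value estimates.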
A∗ with Backward Induction: Example
[Figure: a sequence of snapshots of A∗ with backward induction on the earlier example graph with states s0–s6 (heuristic values 18, 12, 14, 12, 6, 4, 0). After each expansion, backward induction updates the state-value estimates along the greedy path; V̂(s0) rises from 18 to 19 to 20, and the search terminates when the greedy fringe state is the goal s6 with V̂(s6) = 0.]
Equivalence of A∗ and A∗ with Backward Induction
Theorem
A∗ and A∗ with backward induction expand the same set of states if run with an identical admissible heuristic h and an identical tie-breaking criterion.
Proof Sketch.
The proof shows that:
- there is always a unique state s in the greedy fringe of A∗ with backward induction
- f(s) = g(s) + h(s) is minimal among all fringe states
- g(s) of the fringe state s is encoded in the greedy action choices
- h(s) of the fringe state equals V̂t(s)
Summary
Summary
It is non-trivial to generalize A∗ to probabilistic planning.
For a better understanding of AO∗, we change A∗ towards AO∗.
We derived A∗ with backward induction, which is similar to AO∗ and expands identical states as A∗.