Planning and Optimization
G1. Heuristic Search: AO∗ & LAO∗ Part I
Gabriele Röger and Thomas Keller
Universität Basel
December 3, 2018
G. Röger, T. Keller (Universität Basel), Planning and Optimization, December 3, 2018
G1.1 Heuristic Search
G1.2 Motivation
G1.3 A∗ with Backward Induction
G1.4 Summary
Content of this Course
Planning
▶ Classical: Tasks, Progression/Regression, Complexity, Heuristics
▶ Probabilistic: MDPs, Blind Methods, Heuristic Search, Monte-Carlo Methods
G1.1 Heuristic Search
Heuristic Search: Recap
Heuristic Search Algorithms
Heuristic search algorithms use heuristic functions
to (partially or fully) determine the order of node expansion.
(From Lecture 15 of the AI course last semester)
Best-first Search: Recap
Best-first Search
A best-first search is a heuristic search algorithm
that evaluates search nodes with an evaluation function f and always expands a node n with minimal f(n) value.
(From Lecture 15 of the AI course last semester)
A∗ Search: Recap
A∗ Search
A∗ is the best-first search algorithm with evaluation function f(n) = g(n) + h(n.state).
(From Lecture 15 of the AI course last semester)
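The evaluation function above translates directly into a compact A∗ implementation. The following sketch is not from the lecture; it reuses the state names and heuristic values of the running example (s0 through s6), but the edges and costs are assumptions chosen so that a reopening occurs:

```python
import heapq

def astar(graph, h, start, goals):
    """Minimal A* with reopening. graph: state -> [(cost, succ)];
    h: admissible heuristic estimates; goals: set of goal states."""
    g = {start: 0}                       # cheapest known cost from start
    queue = [(h[start], start)]          # priority queue ordered by f = g + h
    while queue:
        f, s = heapq.heappop(queue)
        if f > g[s] + h[s]:              # stale entry for a reopened state
            continue
        if s in goals:
            return g[s]
        for cost, succ in graph.get(s, []):
            if succ not in g or g[s] + cost < g[succ]:
                g[succ] = g[s] + cost    # (re)open succ with an improved g-value
                heapq.heappush(queue, (g[succ] + h[succ], succ))
    return None                          # no goal reachable

# Illustrative instance: slide-style state names and h-values,
# edges and costs assumed for this sketch.
graph = {"s0": [(8, "s1"), (5, "s2")], "s1": [(10, "s3"), (4, "s5")],
         "s2": [(10, "s5")], "s5": [(8, "s6")]}
h = {"s0": 18, "s1": 12, "s2": 14, "s3": 12, "s5": 4, "s6": 0}
print(astar(graph, h, "s0", {"s6"}))  # 20: s5 is reopened via s1 before s6 is closed
```

Here s5 is first reached via s2 with g = 15 and later reopened via s1 with g = 12; the stale queue entry is detected by comparing the popped f-value against the current g(s) + h(s).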
A∗ Search (With Reopening): Example

[Figure: worked A∗ run on a search graph over states s0–s6 with heuristic values h(s0) = 18, h(s1) = 12, h(s2) = 14, h(s3) = 12, h(s4) = 6, h(s5) = 4, h(s6) = 0. Nodes are annotated with f-values g + h (e.g., s0: 0+18, s1: 8+12, s2: 5+14); s5 is first reached with f = 15+4 and later reopened with the cheaper f = 12+4, improving s6 from 23+0 to 20+0.]
G1.2 Motivation
From A∗ to AO∗

▶ The equivalent of A∗ in (acyclic) probabilistic planning is AO∗
▶ Even though we know A∗ and the foundations of probabilistic planning, the generalization is far from straightforward:
  ▶ e.g., in A∗, g(n) is the cost from the root n0 to n
  ▶ the equivalent in AO∗ would be the expected cost from n0 to n
  ▶ an alternative could be the expected cost from n0 to n given that n is reached
Expected Cost to Reach State

Consider the following expansion of state s0:

[Figure: expansion of s0 with actions a0 and a1: a0 leads to s1 (probability .99) or s2 (probability .01), a1 leads to s3 or s4 (probability .5 each); heuristic values h(s1) = 100, h(s2) = 1, h(s3) = h(s4) = 2.]

The expected cost to reach any of the leaves is infinite or undefined (no leaf is reached with probability 1).

(Assuming state-value estimate V̂(s) := h(s), a1 is the greedy action.)
Expected Cost to Reach State Given It Is Reached

Consider the following expansion of state s0:

[Figure: the same expansion of s0 as on the previous slide.]

Conditional probability is misleading: s2 would be expanded, even though it is not part of the best-looking option:
with state-value estimate V̂(s) := h(s), the greedy action a_{V̂}(s0) = a1.
Expansion in Best Solution Graph

AO∗ uses a different idea:
▶ AO∗ keeps track of the best solution graph
▶ AO∗ expands a state that can be reached from s0 by only applying greedy actions
▶ ⇒ no g-value equivalent required
▶ An equivalent version of A∗ built on this idea can be derived
  ⇒ A∗ with backward induction
▶ Since the change is non-trivial, we focus on the A∗ variant now
  ▶ and generalize later to acyclic probabilistic tasks (AO∗)
  ▶ and to probabilistic tasks in general (LAO∗)
G1.3 A∗ with Backward Induction
Transition Systems

A∗ with backward induction distinguishes three transition systems:
▶ The transition system T = ⟨S, L, c, T, s0, S⋆⟩
  ⇒ given implicitly
▶ The explicated graph T̂_t = ⟨Ŝ_t, L, c, T̂_t, s0, S⋆⟩
  ⇒ the part of T explicitly considered during search
▶ The partial solution graph T̂_t⋆ = ⟨Ŝ_t⋆, L, c, T̂_t⋆, s0, S⋆⟩
  ⇒ the part of T̂_t that contains the best solution

[Figure: nested regions around s0: T̂_t⋆ inside T̂_t inside T.]
Explicated Graph

▶ Expanding a state s at time step t explicates all successors s′ ∈ succ(s) by adding them to the explicated graph:
  T̂_t = ⟨Ŝ_{t−1} ∪ succ(s), L, c, T̂_{t−1} ∪ {⟨s, l, s′⟩ ∈ T}, s0, S⋆⟩
▶ Each explicated state is annotated with a state-value estimate V̂_t(s) that describes the estimated cost to a goal at time step t
▶ When a state s′ is explicated and s′ ∉ Ŝ_{t−1}, its state-value estimate is initialized to V̂_t(s′) := h(s′)
▶ We call the leaf states of T̂_t fringe states
Partial Solution Graph

▶ The partial solution graph T̂_t⋆ is the subgraph of T̂_t that is spanned by the smallest set of states Ŝ_t⋆ that satisfies:
  ▶ s0 ∈ Ŝ_t⋆
  ▶ if s ∈ Ŝ_t⋆, s′ ∈ Ŝ_t and ⟨s, a_{V̂_t}(s), s′⟩ ∈ T̂_t, then s′ ∈ Ŝ_t⋆
▶ The partial solution graph forms a sequence of states ⟨s0, . . . , sn⟩, starting with the initial state s0 and ending in the greedy fringe state sn
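For a classical planning task, the partial solution graph can be extracted by repeatedly following a greedy action from s0 until a fringe state is reached. A minimal sketch, assuming a dictionary encoding of the explicated graph (all state names and values below are illustrative):

```python
def greedy_path(explicated, V, s0):
    """Return the state sequence <s0, ..., sn> obtained by following
    greedy actions (minimizing c(l) + V(s')) to the greedy fringe state.
    explicated: state -> [(cost, succ)]; V: state-value estimates."""
    path = [s0]
    while explicated.get(path[-1]):    # fringe states have no explicated successors
        s = path[-1]
        _, succ = min(explicated[s], key=lambda e: e[0] + V[e[1]])
        path.append(succ)
    return path

# Illustrative explicated graph after two expansions.
explicated = {"s0": [(8, "s1"), (5, "s2")], "s2": [(10, "s5")]}
V = {"s0": 19, "s1": 12, "s2": 14, "s5": 4}
print(greedy_path(explicated, V, "s0"))  # ['s0', 's2', 's5']
```

From s0 the greedy action minimizes c(l) + V̂(s′): 5 + 14 = 19 beats 8 + 12 = 20, so the path goes through s2 and ends in the fringe state s5.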
Backward Induction

▶ A∗ with backward induction does not maintain a static open list
▶ State-value estimates determine the partial solution graph
▶ The partial solution graph determines which state is expanded
▶ (Some) state-value estimates are updated in time step t by backward induction:

  V̂_t(s) = min_{⟨s,l,s′⟩ ∈ T̂_t(s)} ( c(l) + V̂_t(s′) )
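The update above can be sketched as follows, again assuming the illustrative dictionary encoding of the explicated graph; fringe states have no explicated successors and simply keep their estimate h(s):

```python
def backward_induction(explicated, V, states):
    """Apply V(s) = min over edges <s,l,s'> in the explicated graph
    of c(l) + V(s') to the given states, in the given order."""
    for s in states:
        if explicated.get(s):          # fringe states keep their estimate h(s)
            V[s] = min(cost + V[succ] for cost, succ in explicated[s])

# Illustrative: after expanding s2, update s2 and then s0 (reverse order).
explicated = {"s0": [(8, "s1"), (5, "s2")], "s2": [(10, "s5")]}
V = {"s0": 18, "s1": 12, "s2": 14, "s5": 4}
backward_induction(explicated, V, ["s2", "s0"])
print(V["s0"])  # 18 becomes min(8 + 12, 5 + 14) = 19
```

Updating in reverse order along the greedy path ensures each V(s′) on the right-hand side is already up to date when V(s) is recomputed.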
A∗ with Backward Induction

A∗ with backward induction for classical planning task T:

  explicate s0
  while greedy fringe state s ∉ S⋆:
      expand s
      perform backward induction of states in T̂_{t−1}⋆ in reverse order
  return T̂_t⋆
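Under the same illustrative encoding (a dictionary-based graph and heuristic, not the lecture's formalism), the full loop might be sketched like this:

```python
import math

def astar_backward_induction(graph, h, s0, goals):
    """Sketch of A* with backward induction for a classical planning task.
    graph: state -> [(cost, succ)] (the implicit transition system);
    h: state -> heuristic estimate; goals: set of goal states."""
    explicated = {}          # successor lists of expanded states
    V = {s0: h[s0]}          # state-value estimates
    while True:
        # follow greedy actions from s0 to the greedy fringe state
        path = [s0]
        while explicated.get(path[-1]):
            s = path[-1]
            _, succ = min(explicated[s], key=lambda e: e[0] + V[e[1]])
            path.append(succ)
        s = path[-1]
        if s in goals:
            return V[s0]     # cost of the plan found
        # expand s: explicate all successors
        succs = graph.get(s, [])
        if not succs:
            V[s] = math.inf  # dead end: no plan through s
        else:
            explicated[s] = succs
            for cost, succ in succs:
                V.setdefault(succ, h[succ])
        # backward induction along the greedy path, in reverse order
        for q in reversed(path):
            if explicated.get(q):
                V[q] = min(cost + V[succ] for cost, succ in explicated[q])
        if math.isinf(V[s0]):
            return None      # no goal reachable from s0

# Illustrative task reusing slide-style state names and h-values;
# the edges and costs are assumptions.
graph = {"s0": [(8, "s1"), (5, "s2")], "s1": [(10, "s3"), (4, "s5")],
         "s2": [(10, "s5")], "s5": [(8, "s6")]}
h = {"s0": 18, "s1": 12, "s2": 14, "s3": 12, "s5": 4, "s6": 0}
print(astar_backward_induction(graph, h, "s0", {"s6"}))
```

On this assumed task the run returns 20: V̂(s0) rises from 18 to 19 and then 20 as expansions and backward-induction updates propagate, and the search stops once the greedy fringe state is the goal s6.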
A∗ with Backward Induction: Example

[Figure: step-by-step run on the same search graph over states s0–s6 (h-values s0: 18, s1: 12, s2: 14, s3: 12, s4: 6, s5: 4, s6: 0) as in the A∗ example. Repeated expansion of the greedy fringe state and backward induction raise V̂(s0) from 18 to 19 and finally to 20, at which point the greedy fringe state s6 is a goal state.]
Equivalence of A∗ and A∗ with Backward Induction

Theorem
A∗ and A∗ with backward induction expand the same set of states if run with an identical admissible heuristic h and an identical tie-breaking criterion.

Proof Sketch.
The proof shows that
▶ there is always a unique state s in the greedy fringe of A∗ with backward induction,
▶ f(s) = g(s) + h(s) is minimal among all fringe states,
▶ g(s) of fringe node s is encoded in the greedy action choices, and
▶ h(s) of the fringe node is equal to V̂_t(s).
G1.4 Summary