Planning and Optimization
G1. Heuristic Search: AO∗ & LAO∗ Part I
Gabriele Röger and Thomas Keller
Universität Basel
December 3, 2018
G. Röger, T. Keller (Universität Basel), Planning and Optimization, December 3, 2018
G1.1 Heuristic Search
G1.2 Motivation
G1.3 A∗ with Backward Induction
G1.4 Summary
Content of this Course
Planning
▶ Classical: Tasks, Progression/Regression, Complexity, Heuristics
▶ Probabilistic: MDPs, Blind Methods, Heuristic Search, Monte-Carlo Methods
G1.1 Heuristic Search
Heuristic Search: Recap
Heuristic Search Algorithms
Heuristic search algorithms use heuristic functions
to (partially or fully) determine the order of node expansion.
(From Lecture 15 of the AI course last semester)
Best-first Search: Recap
Best-first Search
A best-first search is a heuristic search algorithm
that evaluates search nodes with an evaluation function f and always expands a node n with minimal f(n) value.
(From Lecture 15 of the AI course last semester)
A∗ Search: Recap
A∗ Search
A∗ is the best-first search algorithm with evaluation function f(n) = g(n) + h(n.state).
(From Lecture 15 of the AI course last semester)
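The evaluation function above translates directly into a compact A∗ implementation. The following sketch is not from the lecture; it reuses the state names and heuristic values of the running example (s0 through s6), but the edges and costs are assumptions chosen so that a reopening occurs:

```python
import heapq

def astar(graph, h, start, goals):
    """Minimal A* with reopening. graph: state -> [(cost, succ)];
    h: admissible heuristic estimates; goals: set of goal states."""
    g = {start: 0}                       # cheapest known cost from start
    queue = [(h[start], start)]          # priority queue ordered by f = g + h
    while queue:
        f, s = heapq.heappop(queue)
        if f > g[s] + h[s]:              # stale entry for a reopened state
            continue
        if s in goals:
            return g[s]
        for cost, succ in graph.get(s, []):
            if succ not in g or g[s] + cost < g[succ]:
                g[succ] = g[s] + cost    # (re)open succ with an improved g-value
                heapq.heappush(queue, (g[succ] + h[succ], succ))
    return None                          # no goal reachable

# Illustrative instance: slide-style state names and h-values,
# edges and costs assumed for this sketch.
graph = {"s0": [(8, "s1"), (5, "s2")], "s1": [(10, "s3"), (4, "s5")],
         "s2": [(10, "s5")], "s5": [(8, "s6")]}
h = {"s0": 18, "s1": 12, "s2": 14, "s3": 12, "s5": 4, "s6": 0}
print(astar(graph, h, "s0", {"s6"}))  # 20: s5 is reopened via s1 before s6 is closed
```

Here s5 is first reached via s2 with g = 15 and later reopened via s1 with g = 12; the stale queue entry is detected by comparing the popped f-value against the current g(s) + h(s).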
A∗ Search (With Reopening): Example

[Figure: worked A∗ run on a search graph over states s0–s6 with heuristic values h(s0) = 18, h(s1) = 12, h(s2) = 14, h(s3) = 12, h(s4) = 6, h(s5) = 4, h(s6) = 0. Nodes are annotated with f-values g + h (e.g., s0: 0+18, s1: 8+12, s2: 5+14); s5 is first reached with f = 15+4 and later reopened with the cheaper f = 12+4, improving s6 from 23+0 to 20+0.]
G1.2 Motivation
From A∗ to AO∗

▶ The equivalent of A∗ in (acyclic) probabilistic planning is AO∗
▶ Even though we know A∗ and the foundations of probabilistic planning, the generalization is far from straightforward:
  ▶ e.g., in A∗, g(n) is the cost from the root n0 to n
  ▶ the equivalent in AO∗ would be the expected cost from n0 to n
  ▶ an alternative could be the expected cost from n0 to n given that n is reached
Expected Cost to Reach State

Consider the following expansion of state s0:

[Figure: expansion of s0 with actions a0 and a1: a0 leads to s1 (probability .99) or s2 (probability .01), a1 leads to s3 or s4 (probability .5 each); heuristic values h(s1) = 100, h(s2) = 1, h(s3) = h(s4) = 2.]

The expected cost to reach any of the leaves is infinite or undefined (no leaf is reached with probability 1).

(Assuming state-value estimate V̂(s) := h(s), a1 is the greedy action.)
Expected Cost to Reach State Given It Is Reached

Consider the following expansion of state s0:

[Figure: the same expansion of s0 as on the previous slide.]

Conditional probability is misleading: s2 would be expanded, even though it is not part of the best-looking option:
with state-value estimate V̂(s) := h(s), the greedy action a_{V̂}(s0) = a1.
Expansion in Best Solution Graph

AO∗ uses a different idea:
▶ AO∗ keeps track of the best solution graph
▶ AO∗ expands a state that can be reached from s0 by only applying greedy actions
▶ ⇒ no g-value equivalent required
▶ An equivalent version of A∗ built on this idea can be derived
  ⇒ A∗ with backward induction
▶ Since the change is non-trivial, we focus on the A∗ variant now
  ▶ and generalize later to acyclic probabilistic tasks (AO∗)
  ▶ and to probabilistic tasks in general (LAO∗)
G1.3 A∗ with Backward Induction
Transition Systems

A∗ with backward induction distinguishes three transition systems:
▶ The transition system T = ⟨S, L, c, T, s0, S⋆⟩
  ⇒ given implicitly
▶ The explicated graph T̂_t = ⟨Ŝ_t, L, c, T̂_t, s0, S⋆⟩
  ⇒ the part of T explicitly considered during search
▶ The partial solution graph T̂_t⋆ = ⟨Ŝ_t⋆, L, c, T̂_t⋆, s0, S⋆⟩
  ⇒ the part of T̂_t that contains the best solution

[Figure: nested regions around s0: T̂_t⋆ inside T̂_t inside T.]
Explicated Graph

▶ Expanding a state s at time step t explicates all successors s′ ∈ succ(s) by adding them to the explicated graph:
  T̂_t = ⟨Ŝ_{t−1} ∪ succ(s), L, c, T̂_{t−1} ∪ {⟨s, l, s′⟩ ∈ T}, s0, S⋆⟩
▶ Each explicated state is annotated with a state-value estimate V̂_t(s) that describes the estimated cost to a goal at time step t
▶ When a state s′ is explicated and s′ ∉ Ŝ_{t−1}, its state-value estimate is initialized to V̂_t(s′) := h(s′)
▶ We call the leaf states of T̂_t fringe states
Partial Solution Graph

▶ The partial solution graph T̂_t⋆ is the subgraph of T̂_t that is spanned by the smallest set of states Ŝ_t⋆ that satisfies:
  ▶ s0 ∈ Ŝ_t⋆
  ▶ if s ∈ Ŝ_t⋆, s′ ∈ Ŝ_t and ⟨s, a_{V̂_t}(s), s′⟩ ∈ T̂_t, then s′ ∈ Ŝ_t⋆
▶ The partial solution graph forms a sequence of states ⟨s0, . . . , sn⟩, starting with the initial state s0 and ending in the greedy fringe state sn
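For a classical planning task, the partial solution graph can be extracted by repeatedly following a greedy action from s0 until a fringe state is reached. A minimal sketch, assuming a dictionary encoding of the explicated graph (all state names and values below are illustrative):

```python
def greedy_path(explicated, V, s0):
    """Return the state sequence <s0, ..., sn> obtained by following
    greedy actions (minimizing c(l) + V(s')) to the greedy fringe state.
    explicated: state -> [(cost, succ)]; V: state-value estimates."""
    path = [s0]
    while explicated.get(path[-1]):    # fringe states have no explicated successors
        s = path[-1]
        _, succ = min(explicated[s], key=lambda e: e[0] + V[e[1]])
        path.append(succ)
    return path

# Illustrative explicated graph after two expansions.
explicated = {"s0": [(8, "s1"), (5, "s2")], "s2": [(10, "s5")]}
V = {"s0": 19, "s1": 12, "s2": 14, "s5": 4}
print(greedy_path(explicated, V, "s0"))  # ['s0', 's2', 's5']
```

From s0 the greedy action minimizes c(l) + V̂(s′): 5 + 14 = 19 beats 8 + 12 = 20, so the path goes through s2 and ends in the fringe state s5.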
Backward Induction

▶ A∗ with backward induction does not maintain a static open list
▶ State-value estimates determine the partial solution graph
▶ The partial solution graph determines which state is expanded
▶ (Some) state-value estimates are updated in time step t by backward induction:

  V̂_t(s) = min_{⟨s,l,s′⟩ ∈ T̂_t(s)} ( c(l) + V̂_t(s′) )
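The update above can be sketched as follows, again assuming the illustrative dictionary encoding of the explicated graph; fringe states have no explicated successors and simply keep their estimate h(s):

```python
def backward_induction(explicated, V, states):
    """Apply V(s) = min over edges <s,l,s'> in the explicated graph
    of c(l) + V(s') to the given states, in the given order."""
    for s in states:
        if explicated.get(s):          # fringe states keep their estimate h(s)
            V[s] = min(cost + V[succ] for cost, succ in explicated[s])

# Illustrative: after expanding s2, update s2 and then s0 (reverse order).
explicated = {"s0": [(8, "s1"), (5, "s2")], "s2": [(10, "s5")]}
V = {"s0": 18, "s1": 12, "s2": 14, "s5": 4}
backward_induction(explicated, V, ["s2", "s0"])
print(V["s0"])  # 18 becomes min(8 + 12, 5 + 14) = 19
```

Updating in reverse order along the greedy path ensures each V(s′) on the right-hand side is already up to date when V(s) is recomputed.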
A∗ with Backward Induction

A∗ with backward induction for classical planning task T:

  explicate s0
  while greedy fringe state s ∉ S⋆:
      expand s
      perform backward induction of states in T̂_{t−1}⋆ in reverse order
  return T̂_t⋆
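Under the same illustrative encoding (a dictionary-based graph and heuristic, not the lecture's formalism), the full loop might be sketched like this:

```python
import math

def astar_backward_induction(graph, h, s0, goals):
    """Sketch of A* with backward induction for a classical planning task.
    graph: state -> [(cost, succ)] (the implicit transition system);
    h: state -> heuristic estimate; goals: set of goal states."""
    explicated = {}          # successor lists of expanded states
    V = {s0: h[s0]}          # state-value estimates
    while True:
        # follow greedy actions from s0 to the greedy fringe state
        path = [s0]
        while explicated.get(path[-1]):
            s = path[-1]
            _, succ = min(explicated[s], key=lambda e: e[0] + V[e[1]])
            path.append(succ)
        s = path[-1]
        if s in goals:
            return V[s0]     # cost of the plan found
        # expand s: explicate all successors
        succs = graph.get(s, [])
        if not succs:
            V[s] = math.inf  # dead end: no plan through s
        else:
            explicated[s] = succs
            for cost, succ in succs:
                V.setdefault(succ, h[succ])
        # backward induction along the greedy path, in reverse order
        for q in reversed(path):
            if explicated.get(q):
                V[q] = min(cost + V[succ] for cost, succ in explicated[q])
        if math.isinf(V[s0]):
            return None      # no goal reachable from s0

# Illustrative task reusing slide-style state names and h-values;
# the edges and costs are assumptions.
graph = {"s0": [(8, "s1"), (5, "s2")], "s1": [(10, "s3"), (4, "s5")],
         "s2": [(10, "s5")], "s5": [(8, "s6")]}
h = {"s0": 18, "s1": 12, "s2": 14, "s3": 12, "s5": 4, "s6": 0}
print(astar_backward_induction(graph, h, "s0", {"s6"}))
```

On this assumed task the run returns 20: V̂(s0) rises from 18 to 19 and then 20 as expansions and backward-induction updates propagate, and the search stops once the greedy fringe state is the goal s6.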
A∗ with Backward Induction: Example

[Figure: step-by-step run on the same search graph over states s0–s6 (h-values s0: 18, s1: 12, s2: 14, s3: 12, s4: 6, s5: 4, s6: 0) as in the A∗ example. Repeated expansion of the greedy fringe state and backward induction raise V̂(s0) from 18 to 19 and finally to 20, at which point the greedy fringe state s6 is a goal state.]
Equivalence of A∗ and A∗ with Backward Induction

Theorem
A∗ and A∗ with backward induction expand the same set of states if run with an identical admissible heuristic h and an identical tie-breaking criterion.

Proof Sketch.
The proof shows that
▶ there is always a unique state s in the greedy fringe of A∗ with backward induction,
▶ f(s) = g(s) + h(s) is minimal among all fringe states,
▶ g(s) of fringe node s is encoded in the greedy action choices, and
▶ h(s) of the fringe node is equal to V̂_t(s).
G1.4 Summary