
Planning and Optimization

G1. Heuristic Search: AO* & LAO* Part I

Gabriele Röger and Thomas Keller

Universität Basel

December 3, 2018

G1. Heuristic Search: AO* & LAO* Part I

G1.1 Heuristic Search
G1.2 Motivation
G1.3 A* with Backward Induction
G1.4 Summary

Content of this Course

Planning
- Classical: Tasks, Progression/Regression, Complexity, Heuristics
- Probabilistic: MDPs, Blind Methods, Heuristic Search, Monte-Carlo Methods

G1.1 Heuristic Search


Heuristic Search: Recap

Heuristic Search Algorithms
Heuristic search algorithms use heuristic functions to (partially or fully) determine the order of node expansion.

(From Lecture 15 of the AI course last semester)


Best-first Search: Recap

Best-first Search
A best-first search is a heuristic search algorithm that evaluates search nodes with an evaluation function f and always expands a node n with minimal f(n) value.

(From Lecture 15 of the AI course last semester)


A* Search: Recap

A* Search
A* is the best-first search algorithm with evaluation function f(n) = g(n) + h(n.state).

(From Lecture 15 of the AI course last semester)
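The recap above can be made concrete with a small sketch (not part of the original slides): a generic A* over an explicit successor function, with reopening handled lazily by skipping stale open-list entries. The tiny graph at the end is an assumed toy example, not the example from the slides.

```python
import heapq

def astar(start, goal, succ, h):
    """A*: best-first search with f(n) = g(n) + h(n).

    succ(s) yields (label, cost, successor) triples; h maps states to
    heuristic estimates. Reopening is handled lazily: a popped entry is
    skipped if the state was already expanded with a cheaper g-value.
    """
    open_list = [(h(start), 0, start, [start])]  # (f, g, state, path)
    best_g = {}
    while open_list:
        f, g, s, path = heapq.heappop(open_list)
        if s in best_g and best_g[s] <= g:
            continue  # stale entry: s already expanded with cheaper g
        best_g[s] = g
        if s == goal:
            return g, path
        for _, cost, t in succ(s):
            heapq.heappush(open_list, (g + cost + h(t), g + cost, t, path + [t]))
    return None

# Assumed toy graph for illustration
edges = {"A": [("l1", 1, "B"), ("l2", 4, "C")], "B": [("l3", 2, "C")], "C": []}
h = {"A": 2, "B": 1, "C": 0}
print(astar("A", "C", lambda s: edges[s], lambda s: h[s]))  # -> (3, ['A', 'B', 'C'])
```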


A* Search (With Reopening): Example

[Figure: example state space with states s0, ..., s6 and heuristic values h(s0)=18, h(s1)=12, h(s2)=14, h(s3)=12, h(s4)=6, h(s5)=4, h(s6)=0. The right-hand copy annotates states with their f = g + h values during the search, e.g. s0: 0+18, s1: 8+12, s2: 5+14, s3: 18+12, s4: 16+6; s5 appears with 15+4 and 12+4, s6 with 23+0 and 20+0, i.e. both are reopened with cheaper g-values.]




G1.2 Motivation


From A* to AO*

- The equivalent of A* in (acyclic) probabilistic planning is AO*
- Even though we know A* and the foundations of probabilistic planning, the generalization is far from straightforward:
  - e.g., in A*, g(n) is the cost from the root n0 to n
  - the equivalent in AO* is the expected cost from n0 to n
  - an alternative could be the expected cost from n0 to n given that n is reached


Expected Cost to Reach State

Consider the following expansion of state s0:

[Figure: s0 with two applicable actions, a0 and a1, each with cost 1. Action a0 leads to s1 (h=100) with probability .99 and to s2 (h=1) with probability .01; action a1 leads to s3 (h=2) with probability .5 and to s4 (h=2) with probability .5.]

The expected cost to reach any of the leaves is infinite or undefined (neither is reached with probability 1).

Assuming the state-value estimate V̂(s) := h(s), a1 is the greedy action.




Expected Cost to Reach State Given It Is Reached

Consider the following expansion of state s0:

[Figure: same expansion as on the previous slide: a0 leads to s1 (h=100) and s2 (h=1) with probabilities .99 and .01; a1 leads to s3 (h=2) and s4 (h=2) with probabilities .5 each; both actions cost 1.]

The conditional probability is misleading: s2 would be expanded, which isn't part of the best-looking option:

with state-value estimate $\hat V(s) := h(s)$, the greedy action $a_{\hat V}(s_0) = a_1$.


Expansion in Best Solution Graph

AO* uses a different idea:

- AO* keeps track of the best solution graph
- AO* expands a state that can be reached from s0 by only applying greedy actions
- ⇒ no g-value equivalent required
- An equivalent version of A* built on this idea can be derived ⇒ A* with backward induction
- Since the change is non-trivial, we focus on the A* variant now
  - and generalize later to acyclic probabilistic tasks (AO*)
  - and to probabilistic tasks in general (LAO*)



G1.3 A* with Backward Induction


Transition Systems

A* with backward induction distinguishes three transition systems:

- The transition system $\mathcal{T} = \langle S, L, c, T, s_0, S_\star \rangle$
  ⇒ given implicitly
- The explicated graph $\hat{\mathcal{T}}_t = \langle \hat S_t, L, c, \hat T_t, s_0, S_\star \rangle$
  ⇒ the part of $\mathcal{T}$ explicitly considered during search
- The partial solution graph $\hat{\mathcal{T}}^\star_t = \langle \hat S^\star_t, L, c, \hat T^\star_t, s_0, S_\star \rangle$
  ⇒ the part of $\hat{\mathcal{T}}_t$ that contains the best solution

[Figure: nested regions around s0 illustrating $\hat{\mathcal{T}}^\star_t \subseteq \hat{\mathcal{T}}_t \subseteq \mathcal{T}$.]


Explicated Graph

- Expanding a state s at time step t explicates all successors s' ∈ succ(s) by adding them to the explicated graph:
  $\hat{\mathcal{T}}_t = \langle \hat S_{t-1} \cup \mathrm{succ}(s),\ L,\ c,\ \hat T_{t-1} \cup \{\langle s, l, s' \rangle \in T\},\ s_0,\ S_\star \rangle$
- Each explicated state is annotated with a state-value estimate $\hat V_t(s)$ that describes the estimated cost to a goal at time step t
- When a state s' is explicated and $s' \notin \hat S_{t-1}$, its state-value estimate is initialized to $\hat V_t(s') := h(s')$
- We call the leaf states of $\hat{\mathcal{T}}_t$ fringe states


Partial Solution Graph

- The partial solution graph $\hat{\mathcal{T}}^\star_t$ is the subgraph of $\hat{\mathcal{T}}_t$ that is spanned by the smallest set of states $\hat S^\star_t$ that satisfies:
  - $s_0 \in \hat S^\star_t$
  - if $s \in \hat S^\star_t$, $s' \in \hat S_t$ and $\langle s, a_{\hat V_t}(s), s' \rangle \in \hat T_t$, then $s' \in \hat S^\star_t$
- The partial solution graph forms a sequence of states $\langle s_0, \ldots, s_n \rangle$, starting with the initial state s0 and ending in the greedy fringe state sn
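As an illustration (the encoding is an assumption, not from the slides): if the explicated graph is stored as a dict mapping each explicated state to its (label, cost, successor) triples and the state-value estimates as a dict V, the partial solution graph can be extracted by following greedy actions until a fringe state is reached:

```python
def greedy_path(s0, transitions, V):
    """Follow greedy actions from s0 to the greedy fringe state.

    transitions maps each explicated state to a list of
    (label, cost, successor) triples; fringe states are the states
    with no explicated successors. V maps states to value estimates.
    """
    path, s = [s0], s0
    while transitions.get(s):  # stop at a fringe state
        # greedy action: minimizes c(l) + V(s')
        _, cost, s = min(transitions[s], key=lambda e: e[1] + V[e[2]])
        path.append(s)
    return path

# Assumed toy explicated graph for illustration
T = {"s0": [("a", 8, "s1"), ("b", 5, "s2")], "s1": [], "s2": []}
V = {"s0": 18, "s1": 12, "s2": 14}
print(greedy_path("s0", T, V))  # -> ['s0', 's2']
```

Here the greedy action in s0 is b, since 5 + V(s2) = 19 beats 8 + V(s1) = 20.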


Backward Induction

- A* with backward induction does not maintain a static open list
- State-value estimates determine the partial solution graph
- The partial solution graph determines which state is expanded
- (Some) state-value estimates are updated in time step t by backward induction:

$\hat V_t(s) = \min_{\langle s, l, s' \rangle \in \hat T_t(s)} \big( c(l) + \hat V_t(s') \big)$
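The update rule can be sketched directly; the graph encoding and the toy values below are assumptions for illustration:

```python
def backward_induction(states, transitions, V):
    """Apply V(s) = min over <s,l,s'> of c(l) + V(s') in the given order.

    states is assumed to be ordered children-first (reverse order of the
    greedy path); transitions maps states to (label, cost, successor)
    triples; fringe states (no transitions) keep their heuristic estimate.
    """
    for s in states:
        if transitions.get(s):
            V[s] = min(cost + V[t] for _, cost, t in transitions[s])
    return V

# Assumed toy explicated graph for illustration
T = {"s0": [("a", 8, "s1"), ("b", 5, "s2")],
     "s1": [("c", 4, "s3")], "s2": [], "s3": []}
V = {"s0": 18, "s1": 12, "s2": 14, "s3": 12}
backward_induction(["s3", "s2", "s1", "s0"], T, V)
print(V["s1"], V["s0"])  # -> 16 19
```

Updating in reverse order matters: V(s1) must reflect its successor before V(s0) takes the minimum over its own transitions.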


A* with Backward Induction

A* with backward induction for classical planning task T:

    explicate s0
    while the greedy fringe state s is not in S⋆:
        expand s
        perform backward induction on the states of the partial solution graph in reverse order
    return the partial solution graph
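Putting the pieces together, here is a minimal runnable sketch of this loop. The task below is an assumed toy example (state names, costs, and heuristic values are illustrative, not the slides' example graph):

```python
def astar_bi(s0, goals, succ, h):
    """Sketch of A* with backward induction.

    succ(s) yields (label, cost, successor) triples; h is an admissible
    heuristic. Returns the greedy path to a goal and the final estimate
    for s0 (the plan cost).
    """
    T = {}           # explicated graph: state -> list of (label, cost, succ)
    V = {s0: h(s0)}  # state-value estimates, initialized from h
    while True:
        # follow greedy actions from s0 to the greedy fringe state
        path, s = [s0], s0
        while s in T:
            _, _, s = min(T[s], key=lambda e: e[1] + V[e[2]])
            path.append(s)
        if s in goals:
            return path, V[s0]
        # expand s: explicate all successors
        T[s] = list(succ(s))
        for _, _, t in T[s]:
            V.setdefault(t, h(t))
        # backward induction along the greedy path, in reverse order
        for u in reversed(path):
            V[u] = min(c + V[t] for _, c, t in T[u])

# Assumed toy task: optimal plan is s0 -a-> s1 -c-> s3 with cost 12
E = {"s0": [("a", 8, "s1"), ("b", 5, "s2")],
     "s1": [("c", 4, "s3")],
     "s2": [("d", 20, "s3")],
     "s3": []}
H = {"s0": 18, "s1": 12, "s2": 14, "s3": 0}
print(astar_bi("s0", {"s3"}, lambda s: E[s], lambda s: H[s]))
# -> (['s0', 's1', 's3'], 12)
```

Note how the search first tries the cheaper-looking s2 branch, and the backward-induction updates (V(s0): 18, then 19, then 20, then 12) redirect the greedy path back through s1.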

A* with Backward Induction: Example

[Figure: a sequence of snapshots of A* with backward induction on the earlier example graph (states s0, ..., s6 with heuristic values 18, 12, 14, 12, 6, 4, 0). Each step expands the greedy fringe state, explicates its successors, and updates state-value estimates by backward induction; the estimate V̂(s0) rises from 18 to 19 and then to 20 over the course of the search.]


Equivalence of A* and A* with Backward Induction

Theorem
A* and A* with backward induction expand the same set of states if run with an identical admissible heuristic h and an identical tie-breaking criterion.

Proof Sketch.
The proof shows that
- there is always a unique state s in the greedy fringe of A* with backward induction
- f(s) = g(s) + h(s) is minimal among all fringe states
- g(s) of the fringe node s is encoded in the greedy action choices
- h(s) of the fringe node is equal to $\hat V_t(s)$

G1.4 Summary

Summary

- It is non-trivial to generalize A* to probabilistic planning
- For a better understanding of AO*, we change A* towards AO*
- We derived A* with backward induction, which is similar to AO*
  - and expands the identical set of states as A*
