
Planning and Optimization

E2. Landmarks: RTG Landmarks & MHS Heuristic

Malte Helmert and Gabriele Röger

Universität Basel

November 16, 2020

M. Helmert, G. Röger (Universität Basel) Planning and Optimization November 16, 2020

Planning and Optimization

November 16, 2020 — E2. Landmarks: RTG Landmarks & MHS Heuristic

E2.1 Landmarks

E2.2 Landmarks from RTGs

E2.3 Minimum Hitting Set Heuristic

E2.4 Summary

Content of this Course

[Figure: course overview tree. Planning splits into Classical (Foundations, Logic, Heuristics, Constraints) and Probabilistic (Explicit MDPs, Factored MDPs).]

Content of this Course: Constraints

[Figure: overview of the Constraints part with the topics Landmarks, RTG Landmarks, MHS Heuristic, LM-Cut Heuristic, Cost Partitioning, Network Flows, Operator Counting, and Potential Heuristics.]

E2.1 Landmarks


Landmarks

Basic Idea: Something that must happen in every solution

For example

- some operator must be applied (action landmark)
- some atomic proposition must hold (fact landmark)
- some formula must be true (formula landmark)

→ Derive heuristic estimate from this kind of information.

We only consider fact and disjunctive action landmarks.


Definition

Definition (Disjunctive Action Landmark)

Let $s$ be a state of planning task $\Pi = \langle V, I, O, \gamma \rangle$.

A disjunctive action landmark for $s$ is a set of operators $L \subseteq O$ such that every label path from $s$ to a goal state contains an operator from $L$.

The cost of landmark $L$ is $cost(L) = \min_{o \in L} cost(o)$.

Definition (Fact Landmark)

Let $s$ be a state of planning task $\Pi = \langle V, I, O, \gamma \rangle$.

An atomic proposition $v = d$ for $v \in V$ and $d \in dom(v)$ is a fact landmark for $s$ if every state path from $s$ to a goal state contains a state $s'$ with $s'(v) = d$.

If we talk about landmarks for the initial state, we omit "for $I$".


Landmarks: Example

Example

Consider an FDR planning task $\langle V, I, O, \gamma \rangle$ with

- $V = \{\text{robot-at}, \text{dishes-at}\}$ with
  - $dom(\text{robot-at}) = \{A1, \dots, C3, B4, A5, \dots, B6\}$
  - $dom(\text{dishes-at}) = \{\text{Table}, \text{Robot}, \text{Dishwasher}\}$
- $I = \{\text{robot-at} \mapsto C1, \text{dishes-at} \mapsto \text{Table}\}$
- operators
  - move-x-y to move from cell x to adjacent cell y,
  - pickup dishes, and
  - load dishes into the dishwasher.
- $\gamma = (\text{robot-at} = B6) \land (\text{dishes-at} = \text{Dishwasher})$


Fact Landmarks: Example

[Figure: grid map of the example task with columns 1-6 and rows A-C. Images from Wikimedia.]

Each fact in gray is a fact landmark:

- $\text{robot-at} = x$ for $x \in \{A1, A6, B3, B4, B5, B6, C1\}$
- $\text{dishes-at} = x$ for $x \in \{\text{Dishwasher}, \text{Robot}, \text{Table}\}$

Disjunctive Action Landmarks: Example

[Figure: grid map of the example task with columns 1-6 and rows A-C, with actions marked in color.]

Actions of the same color form a disjunctive action landmark:

- {pickup}
- {load}
- {move-B3-B4}
- {move-B4-B5}
- {move-A6-B6, move-B5-B6}
- {move-A3-B3, move-B2-B3, move-C3-B3}
- {move-B1-A1, move-A2-A1}
- ...

Remarks

- Not every landmark is informative. Some examples:
  - The set of all operators is a disjunctive action landmark unless the initial state is already a goal state.
  - Every variable that is initially true is a fact landmark.
- Deciding whether a given variable is a fact landmark is as hard as the plan existence problem.
- Deciding whether a given operator set is a disjunctive action landmark is as hard as the plan existence problem.
- Every fact landmark $v$ that is initially false induces a disjunctive action landmark consisting of all operators that possibly make $v$ true.

E2.2 Landmarks from RTGs


Computing Landmarks

How can we come up with landmarks?

Most landmarks are derived from the relaxed task graph:

- RHW landmarks: Richter, Helmert & Westphal. Landmarks Revisited. (AAAI 2008)
- LM-Cut: Helmert & Domshlak. Landmarks, Critical Paths and Abstractions: What's the Difference Anyway? (ICAPS 2009)
- $h^m$ landmarks: Keyder, Richter & Helmert. Sound and Complete Landmarks for And/Or Graphs. (ECAI 2010)

We discuss $h^m$ landmarks restricted to $m = 1$ and to STRIPS planning tasks.

Incidental Landmarks: Example

Example (Incidental Landmarks)

Consider a STRIPS planning task $\langle V, I, \{o_1, o_2\}, \gamma \rangle$ with

$V = \{a, b, c, d, e, f\}$,

$I = \{a \mapsto T, b \mapsto T, c \mapsto F, d \mapsto F, e \mapsto T, f \mapsto F\}$,

$o_1 = \langle \{a\}, \{c, d, e\}, \{a, b\} \rangle$,

$o_2 = \langle \{d, e\}, \{f\}, \{a, d\} \rangle$, and

$\gamma = \{e, f\}$.

Single solution: $\langle o_1, o_2 \rangle$

- All variables are fact landmarks.
- Variable $b$ is initially true but irrelevant for the plan.
- Variable $c$ becomes true as a "side effect" of $o_1$, but it is not necessary for the goal or to make an operator applicable.
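The claim that all six variables are fact landmarks can be checked by brute force on this tiny task. The following is a minimal sketch, not the method used in the lecture: it assumes each operator triple reads as (preconditions, add effects, delete effects), and the helper names `goal_reachable` and `is_fact_landmark` are made up for illustration. A variable is a fact landmark exactly when no goal path exists that visits only states in which the variable is false.

```python
from collections import deque

# Assumed reading of the operator triples: (pre, add, delete).
OPS = {
    "o1": ({"a"}, {"c", "d", "e"}, {"a", "b"}),
    "o2": ({"d", "e"}, {"f"}, {"a", "d"}),
}
INIT = frozenset({"a", "b", "e"})  # variables that are initially true
GOAL = {"e", "f"}
VARS = "abcdef"


def successors(state):
    """Yield the successor state of every applicable operator."""
    for pre, add, delete in OPS.values():
        if pre <= state:
            yield frozenset((state - delete) | add)


def goal_reachable(allowed):
    """Breadth-first search from INIT that only visits states accepted by `allowed`."""
    if not allowed(INIT):
        return False
    seen, queue = {INIT}, deque([INIT])
    while queue:
        state = queue.popleft()
        if GOAL <= state:
            return True
        for nxt in successors(state):
            if nxt not in seen and allowed(nxt):
                seen.add(nxt)
                queue.append(nxt)
    return False


def is_fact_landmark(v):
    # v is a fact landmark iff no goal path avoids all states where v is true.
    return not goal_reachable(lambda state: v not in state)


fact_landmarks = [v for v in VARS if is_fact_landmark(v)]
print(fact_landmarks)
```

This explicit search is exponential in general, which matches the remark that deciding landmark status is as hard as plan existence.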


Causal Landmarks

Definition (Causal Fact Landmark)

Let $\Pi = \langle V, I, O, \gamma \rangle$ be a STRIPS planning task.

An atomic proposition $v = T$ for $v \in V$ is a causal fact landmark

- if $v \in \gamma$
- or if for all goal paths $\pi = \langle o_1, \dots, o_n \rangle$ there is an $o_i$ with $v \in pre(o_i)$.


Causal Landmarks: Example

Example (Causal Landmarks)

Consider a STRIPS planning task $\langle V, I, \{o_1, o_2\}, \gamma \rangle$ with

$V = \{a, b, c, d, e, f\}$,

$I = \{a \mapsto T, b \mapsto T, c \mapsto F, d \mapsto F, e \mapsto T, f \mapsto F\}$,

$o_1 = \langle \{a\}, \{c, d, e\}, \{a, b\} \rangle$,

$o_2 = \langle \{d, e\}, \{f\}, \{a, d\} \rangle$, and

$\gamma = \{e, f\}$.

Single solution: $\langle o_1, o_2 \rangle$

- All variables are fact landmarks for the initial state.
- Only $a$, $d$, $e$ and $f$ are causal landmarks.
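The causal landmarks of this example can also be checked directly from the definition. The sketch below is an illustration under assumptions: operator triples are read as (preconditions, add effects, delete effects), and `plan_exists` and `is_causal_fact_landmark` are hypothetical helper names. It uses the observation that "every goal path contains an operator with $v$ in its precondition" is equivalent to "no plan exists once all operators with $v$ in their precondition are removed".

```python
from collections import deque

# Assumed reading of the operator triples: (pre, add, delete).
OPS = {
    "o1": ({"a"}, {"c", "d", "e"}, {"a", "b"}),
    "o2": ({"d", "e"}, {"f"}, {"a", "d"}),
}
INIT = frozenset({"a", "b", "e"})  # variables that are initially true
GOAL = {"e", "f"}
VARS = "abcdef"


def plan_exists(ops):
    """Breadth-first search for a goal state using only the operators in `ops`."""
    seen, queue = {INIT}, deque([INIT])
    while queue:
        state = queue.popleft()
        if GOAL <= state:
            return True
        for pre, add, delete in ops.values():
            if pre <= state:
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


def is_causal_fact_landmark(v):
    if v in GOAL:
        return True
    # Drop every operator that has v as a precondition and look for a plan.
    ops_without_v = {name: op for name, op in OPS.items() if v not in op[0]}
    return not plan_exists(ops_without_v)


causal = [v for v in VARS if is_causal_fact_landmark(v)]
print(causal)
```

Variables $b$ and $c$ drop out because neither is a goal variable nor appears in any precondition, so removing "their" operators removes nothing.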


What We Are Doing Next

- Causal landmarks are the desirable landmarks.
- We can use a simplified version of RTGs to compute causal landmarks for STRIPS planning tasks.
- We will define landmarks of AND/OR graphs, ...
- and show how they can be computed.
- Afterwards we establish that these are landmarks of the planning task.


Simplified Relaxed Task Graph

Definition

For a STRIPS planning task $\Pi = \langle V, I, O, \gamma \rangle$, the simplified relaxed task graph $sRTG(\Pi^+)$ is the AND/OR graph $\langle N_{and} \cup N_{or}, A, type \rangle$ with

- $N_{and} = \{n_o \mid o \in O\} \cup \{n_I, n_G\}$ with $type(n) = \land$ for all $n \in N_{and}$,
- $N_{or} = \{n_v \mid v \in V\}$ with $type(n) = \lor$ for all $n \in N_{or}$, and
- $A = \{\langle n_a, n_o \rangle \mid o \in O, a \in add(o)\} \cup \{\langle n_o, n_p \rangle \mid o \in O, p \in pre(o)\} \cup \{\langle n_v, n_I \rangle \mid v \in I\} \cup \{\langle n_G, n_v \rangle \mid v \in \gamma\}$
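As an illustration of the definition, the graph can be built with a few dictionary operations. This is a sketch under assumptions: nodes are represented by plain strings (variable names, operator names, and the reserved names "I" and "G", which must not clash), and `build_srtg` is a hypothetical helper, not part of any planner's API. An arc $\langle n, n' \rangle$ is stored as `n'` in the list `arcs[n]`.

```python
def build_srtg(variables, init_true, operators, goal):
    """Return (arcs, node_type) for the simplified relaxed task graph.

    operators maps a name to (preconditions, add effects); delete effects
    play no role in the relaxed graph and are therefore not passed in.
    """
    node_type = {"I": "and", "G": "and"}
    # <n_G, n_v> for every goal variable v; n_I has no outgoing arcs.
    arcs = {"I": [], "G": sorted(goal)}
    for v in variables:
        node_type[v] = "or"
        # <n_v, n_I> for every initially true variable v.
        arcs[v] = ["I"] if v in init_true else []
    for name, (pre, add) in operators.items():
        node_type[name] = "and"
        # <n_o, n_p> for every precondition p of o.
        arcs[name] = sorted(pre)
        # <n_a, n_o> for every add effect a of o.
        for a in add:
            arcs[a].append(name)
    return arcs, node_type


# The running example task (operator triples without their delete lists).
arcs, node_type = build_srtg(
    variables="abcdef",
    init_true={"a", "b", "e"},
    operators={"o1": ({"a"}, {"c", "d", "e"}), "o2": ({"d", "e"}, {"f"})},
    goal={"e", "f"},
)
print(arcs["e"])
```

Variable $e$ is the only node with two outgoing arcs among the OR nodes, since it is both initially true and added by $o_1$.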


Simplified RTG: Example

The simplified RTG for our example task is:

[Figure: simplified RTG of the example task with variable nodes a, b, c, d, e, f, operator nodes o1 and o2, and the nodes I and G.]

Characterizing Equation System

Theorem

Let $G = \langle N, A, type \rangle$ be an AND/OR graph. Consider the following system of equations:

$$LM(n) = \{n\} \cup \bigcap_{\langle n, n' \rangle \in A} LM(n') \quad \text{if } type(n) = \lor$$

$$LM(n) = \{n\} \cup \bigcup_{\langle n, n' \rangle \in A} LM(n') \quad \text{if } type(n) = \land$$

The equation system has a unique maximal solution (maximal with regard to set inclusion), and for this solution it holds that

$n' \in LM(n)$ iff $n'$ is a landmark for reaching $n$ in $G$.


Computation of Maximal Solution

Theorem

Let $G = \langle N, A, type \rangle$ be an AND/OR graph. Consider the following system of equations:

$$LM(n) = \{n\} \cup \bigcap_{\langle n, n' \rangle \in A} LM(n') \quad \text{if } type(n) = \lor$$

$$LM(n) = \{n\} \cup \bigcup_{\langle n, n' \rangle \in A} LM(n') \quad \text{if } type(n) = \land$$

The equation system has a unique maximal solution (maximal with regard to set inclusion).

Computation: Initialize landmark sets as $LM(n) = N_{and} \cup N_{or}$ and apply equations as update rules until fixpoint.
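The update procedure can be sketched in a few lines. This is one possible reading of "apply equations as update rules until fixpoint", not the lecture's reference implementation: the function name `landmark_sets`, the string node names, and the round-robin update order are assumptions. The graph is the simplified RTG of the running example, with an arc $\langle n, n' \rangle$ stored as `n'` in `arcs[n]`.

```python
def landmark_sets(arcs, node_type):
    """Compute the maximal solution of the LM equation system by fixpoint iteration."""
    nodes = set(arcs)
    # Start from the largest candidate: every node is a landmark of every node.
    lm = {n: set(nodes) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in sorted(nodes):
            succ_sets = [lm[m] for m in arcs[n]]
            if node_type[n] == "and":
                combined = set().union(*succ_sets)
            elif succ_sets:
                combined = set.intersection(*succ_sets)
            else:
                # OR node without outgoing arcs: the empty intersection is
                # taken to be the set of all nodes (assumed convention).
                combined = set(nodes)
            new = {n} | combined
            if new != lm[n]:
                lm[n] = new
                changed = True
    return lm


# Simplified RTG of the running example.
arcs = {
    "I": [], "G": ["e", "f"],
    "a": ["I"], "b": ["I"], "c": ["o1"], "d": ["o1"], "e": ["I", "o1"], "f": ["o2"],
    "o1": ["a"], "o2": ["d", "e"],
}
node_type = {n: "or" for n in "abcdef"}
node_type.update({"I": "and", "G": "and", "o1": "and", "o2": "and"})

lm = landmark_sets(arcs, node_type)
print(sorted(lm["G"]))
```

Because every update can only shrink a set and the sets start at their maximum, the loop terminates, and it ends in the maximal solution rather than in some smaller one.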


Computation: Example

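[Figure: simplified RTG of the example task, with each node annotated with its landmark set.]

Initially, every node is labeled with the set of all nodes $\{a, \dots, f, I, G, o_1, o_2\}$. At the fixpoint, the labels are:

- $LM(n_I) = \{I\}$
- $LM(n_a) = \{a, I\}$
- $LM(n_b) = \{b, I\}$
- $LM(n_e) = \{e, I\}$
- $LM(n_{o_1}) = \{a, I, o_1\}$
- $LM(n_c) = \{a, c, I, o_1\}$
- $LM(n_d) = \{a, d, I, o_1\}$
- $LM(n_{o_2}) = \{a, d, e, I, o_1, o_2\}$
- $LM(n_f) = \{a, d, e, f, I, o_1, o_2\}$
- $LM(n_G) = \{a, d, e, f, I, G, o_1, o_2\}$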

(cf. screen version of slides for step-wise computation)


Relation to Planning Task Landmarks

Theorem

Let $\Pi = \langle V, I, O, \gamma \rangle$ be a STRIPS planning task and let $L$ be the set of landmarks for reaching $n_G$ in $sRTG(\Pi^+)$.

The set $\{v = T \mid v \in V \text{ and } n_v \in L\}$ is exactly the set of causal fact landmarks in $\Pi^+$.

For operators $o \in O$, if $n_o \in L$ then $\{o\}$ is a disjunctive action landmark in $\Pi^+$.

There are no other disjunctive action landmarks of size 1.

(Proofs omitted.)


Computed RTG Landmarks: Example

Example (Computed RTG Landmarks)

Consider a STRIPS planning task $\langle V, I, \{o_1, o_2\}, \gamma \rangle$ with

$V = \{a, b, c, d, e, f\}$,

$I = \{a \mapsto T, b \mapsto T, c \mapsto F, d \mapsto F, e \mapsto T, f \mapsto F\}$,

$o_1 = \langle \{a\}, \{c, d, e\}, \{a, b\} \rangle$,

$o_2 = \langle \{d, e\}, \{f\}, \{a, d\} \rangle$, and

$\gamma = \{e, f\}$.

- $LM(n_G) = \{a, d, e, f, I, G, o_1, o_2\}$
- $a$, $d$, $e$, and $f$ are causal fact landmarks of $\Pi^+$.
- $\{o_1\}$ and $\{o_2\}$ are disjunctive action landmarks of $\Pi^+$.


Landmarks of $\Pi^+$ Are Landmarks of $\Pi$

Theorem

Let $\Pi$ be a STRIPS planning task.

All fact landmarks of $\Pi^+$ are fact landmarks of $\Pi$ and all disjunctive action landmarks of $\Pi^+$ are disjunctive action landmarks of $\Pi$.

Proof.

Let $L$ be a disjunctive action landmark of $\Pi^+$ and $\pi$ be a plan for $\Pi$. Then $\pi$ is also a plan for $\Pi^+$ and, thus, $\pi$ contains an operator from $L$.

Let $f$ be a fact landmark of $\Pi^+$. If $f$ is already true in the initial state, then it is also a landmark of $\Pi$. Otherwise, every plan for $\Pi^+$ contains an operator that adds $f$, and the set of all these operators is a disjunctive action landmark of $\Pi^+$. Therefore, each plan of $\Pi$ also contains such an operator, making $f$ a fact landmark of $\Pi$.


E2.3 Minimum Hitting Set Heuristic


Exploiting Disjunctive Action Landmarks

- The cost $cost(L)$ of a disjunctive action landmark $L$ is an admissible heuristic, but it is usually not very informative.
- Landmark heuristics typically aim to combine multiple disjunctive action landmarks.

How can we exploit a given set $\mathcal{L}$ of disjunctive action landmarks?

- Sum of costs $\sum_{L \in \mathcal{L}} cost(L)$? Not admissible!
- Maximize costs $\max_{L \in \mathcal{L}} cost(L)$? Usually a very weak heuristic.
- Better: hitting sets
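A small worked case shows why the sum overestimates while the maximum does not. The task here is hypothetical: the operators $o$ and $o'$, their unit costs, and the assumption that $\langle o \rangle$ is the only plan are chosen purely for illustration and are not part of the lecture's examples.

```latex
\begin{align*}
&\text{Only plan: } \langle o \rangle, \quad cost(o) = cost(o') = 1, \quad h^*(I) = 1 \\
&\mathcal{L} = \{L_1, L_2\}, \quad L_1 = \{o\}, \quad L_2 = \{o, o'\} \\
&\sum_{L \in \mathcal{L}} cost(L) = 1 + 1 = 2 > h^*(I) \\
&\max_{L \in \mathcal{L}} cost(L) = 1 \le h^*(I)
\end{align*}
```

The sum charges for $o$ twice, because a single application of $o$ satisfies both landmarks.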


Hitting Sets

Definition (Hitting Set)

Let $X$ be a set, $\mathcal{F} = \{F_1, \dots, F_n\} \subseteq 2^X$ be a family of subsets of $X$ and $c : X \to \mathbb{R}^+_0$ be a cost function for $X$.

A hitting set is a subset $H \subseteq X$ that "hits" all subsets in $\mathcal{F}$, i.e., $H \cap F \neq \emptyset$ for all $F \in \mathcal{F}$. The cost of $H$ is $\sum_{x \in H} c(x)$.

A minimum hitting set (MHS) is a hitting set with minimal cost.

MHS is a “classical” NP-complete problem (Karp, 1972)


Example: Hitting Sets

Example

$X = \{o_1, o_2, o_3, o_4\}$

$\mathcal{F} = \{\{o_4\}, \{o_1, o_2\}, \{o_1, o_3\}, \{o_2, o_3\}\}$

$c(o_1) = 3$, $c(o_2) = 4$, $c(o_3) = 5$, $c(o_4) = 0$

What is a minimum hitting set?

Solution: $\{o_1, o_2, o_4\}$ with cost $3 + 4 + 0 = 7$
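The solution can be confirmed by exhaustive search over all subsets of $X$. This is a brute-force sketch for tiny instances only (it enumerates $2^{|X|}$ candidates); the function name `minimum_hitting_set` is an assumption made for this illustration.

```python
from itertools import combinations


def minimum_hitting_set(universe, family, cost):
    """Return (hitting set, cost) with minimal total cost, by trying every subset."""
    best, best_cost = None, float("inf")
    items = sorted(universe)
    for size in range(len(items) + 1):
        for candidate in combinations(items, size):
            chosen = set(candidate)
            # A hitting set shares at least one element with every subset in the family.
            if all(chosen & subset for subset in family):
                total = sum(cost[x] for x in chosen)
                if total < best_cost:
                    best, best_cost = chosen, total
    return best, best_cost


X = {"o1", "o2", "o3", "o4"}
F = [{"o4"}, {"o1", "o2"}, {"o1", "o3"}, {"o2", "o3"}]
c = {"o1": 3, "o2": 4, "o3": 5, "o4": 0}

best, best_cost = minimum_hitting_set(X, F, c)
print(sorted(best), best_cost)
```

Operator $o_4$ is forced by the singleton subset, and the three pairs over $o_1, o_2, o_3$ require any two of those operators, so the cheapest pair $o_1, o_2$ completes the set.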


Hitting Sets for Disjunctive Action Landmarks

Idea: disjunctive action landmarks are interpreted as an instance of minimum hitting set

Definition (Hitting Set Heuristic)

Let $\mathcal{L}$ be a set of disjunctive action landmarks. The hitting set heuristic $h^{\mathrm{MHS}}(\mathcal{L})$ is defined as the cost of a minimum hitting set for $\mathcal{L}$ with $c(o) = cost(o)$.

Proposition (Hitting Set Heuristic is Admissible)

Let $\mathcal{L}$ be a set of disjunctive action landmarks for state $s$.

Then $h^{\mathrm{MHS}}(\mathcal{L})$ is an admissible estimate for $s$.
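The proposition is stated without proof here. One standard argument, sketched below under the assumption that operator costs are non-negative (as required by the cost function in the hitting set definition), shows that the operators of any optimal plan already form a hitting set. The symbol $H_\pi$ is introduced only for this sketch.

```latex
\begin{align*}
&\text{Let } \pi = \langle o_1, \dots, o_n \rangle \text{ be an optimal plan for } s
  \text{ and } H_\pi = \{o_1, \dots, o_n\}. \\
&\text{Every } L \in \mathcal{L} \text{ is a disjunctive action landmark for } s,
  \text{ so } H_\pi \cap L \neq \emptyset \text{ and } H_\pi \text{ is a hitting set.} \\
&h^{\mathrm{MHS}}(\mathcal{L}) \le \sum_{o \in H_\pi} cost(o)
  \le \sum_{i=1}^{n} cost(o_i) = h^*(s)
\end{align*}
```

If no plan for $s$ exists, then $h^*(s) = \infty$ and the bound holds trivially.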


Hitting Set Heuristic: Discussion

- The hitting set heuristic is the best possible heuristic that only uses the given information...
- ...but is NP-hard to compute.
- Use approximations that can be efficiently computed.
  ⇒ LP-relaxation, cost partitioning (both discussed later)


E2.4 Summary


Summary

- Fact landmark: atomic proposition that is true in some state of every state path to a goal
- Disjunctive action landmark: set $L$ of operators such that every plan uses some operator from $L$
- Relaxed task graphs allow efficient computation of landmarks
- Hitting sets yield the most accurate heuristic for a given set of disjunctive action landmarks
- Computation of a minimum hitting set is NP-hard
