Planning and Optimization

E2. Landmarks: RTG Landmarks & MHS Heuristic

Malte Helmert and Gabriele Röger

Academic year: 2022

(1)

E2. Landmarks: RTG Landmarks & MHS Heuristic

Malte Helmert and Gabriele Röger

Universität Basel

(2)

Content of this Course

Planning

Classical: Foundations, Logic, Heuristics, Constraints

Probabilistic: Explicit MDPs, Factored MDPs

(3)

Content of this Course: Constraints

Constraints: Landmarks, RTG Landmarks, MHS Heuristic, LM-Cut Heuristic, Cost Partitioning, Network Flows, Operator Counting

(4)

Landmarks

(5)

Landmarks

Basic Idea: Something that must happen in every solution.

For example:

some operator must be applied (action landmark),

some atomic proposition must hold (fact landmark),

some formula must be true (formula landmark).

→ Derive a heuristic estimate from this kind of information.

We only consider fact and disjunctive action landmarks.

(7)

Definition

Definition (Disjunctive Action Landmark)

Let $s$ be a state of planning task $\Pi = \langle V, I, O, \gamma \rangle$.

A disjunctive action landmark for $s$ is a set of operators $L \subseteq O$ such that every label path from $s$ to a goal state contains an operator from $L$.

The cost of landmark $L$ is $\mathit{cost}(L) = \min_{o \in L} \mathit{cost}(o)$.

Definition (Fact Landmark)

Let $s$ be a state of planning task $\Pi = \langle V, I, O, \gamma \rangle$.

An atomic proposition $v = d$ for $v \in V$ and $d \in \mathrm{dom}(v)$ is a fact landmark for $s$ if every state path from $s$ to a goal state contains a state $s'$ with $s'(v) = d$.

If we talk about landmarks for the initial state, we omit "for $I$".

(8)

Landmarks: Example

Example

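Consider an FDR planning task $\langle V, I, O, \gamma \rangle$ with

$V = \{\text{robot-at}, \text{dishes-at}\}$ with

$\mathrm{dom}(\text{robot-at}) = \{A1, \dots, C3, B4, A5, \dots, B6\}$

$\mathrm{dom}(\text{dishes-at}) = \{\text{Table}, \text{Robot}, \text{Dishwasher}\}$

$I = \{\text{robot-at} \mapsto C1, \text{dishes-at} \mapsto \text{Table}\}$

operators

move-x-y to move from cell x to adjacent cell y,

pickup to pick up the dishes, and

load to load the dishes into the dishwasher.

$\gamma = (\text{robot-at} = B6) \land (\text{dishes-at} = \text{Dishwasher})$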

(9)

Landmarks Landmarks from RTGs Minimum Hitting Set Heuristic Summary

Fact Landmarks: Example

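[Figure: grid map of the example with columns 1 to 6 and rows A to C; fact landmarks are shown in gray. Images from Wikimedia.]

Each fact in gray is a fact landmark:

$\text{robot-at} = x$ for $x \in \{A1, A6, B3, B4, B5, B6, C1\}$

$\text{dishes-at} = x$ for $x \in \{\text{Dishwasher}, \text{Robot}, \text{Table}\}$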

(10)

Disjunctive Action Landmarks: Example

[Figure: grid map of the example with columns 1 to 6 and rows A to C; actions are drawn in colors.]

Actions of the same color form a disjunctive action landmark:

{pickup}

{load}

{move-B3-B4}

{move-B4-B5}

{move-A6-B6,move-B5-B6}

{move-A3-B3,move-B2-B3,move-C3-B3}

{move-B1-A1,move-A2-A1}

. . .

(11)

Remarks

Not every landmark is informative. Some examples:

The set of all operators is a disjunctive action landmark unless the initial state is already a goal state.

Every variable that is initially true is a fact landmark.

Deciding whether a given variable is a fact landmark is as hard as the plan existence problem.

Deciding whether a given operator set is a disjunctive action landmark is as hard as the plan existence problem.

Every fact landmark $v$ that is initially false induces a disjunctive action landmark consisting of all operators that can possibly make $v$ true.
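The two hardness remarks above can be made concrete with a brute-force check. The following is only a sketch under assumptions that are not part of the slides: it uses a tiny hypothetical STRIPS-style task (operators `x1`, `x2`, `x3`, facts `p` and `g`), represents a state as the set of true facts, and answers each landmark question with one exhaustive plan-existence search. The helper names `goal_reachable`, `is_disjunctive_action_landmark` and `is_fact_landmark` are made up for this illustration.

```python
from collections import deque

def goal_reachable(init, goal, operators, forbidden_fact=None):
    """Breadth-first plan-existence check on an explicit STRIPS state space.

    operators: list of (name, pre, add, delete) with set-valued components.
    If forbidden_fact is given, only states where it is false are visited.
    """
    start = frozenset(init)
    if forbidden_fact is not None and forbidden_fact in start:
        return False
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if goal <= state:
            return True
        for _name, pre, add, delete in operators:
            if pre <= state:
                succ = (state - delete) | add
                if forbidden_fact is not None and forbidden_fact in succ:
                    continue
                if succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
    return False

def is_disjunctive_action_landmark(init, goal, operators, names):
    # L is a landmark iff no plan exists once all operators in L are removed.
    remaining = [op for op in operators if op[0] not in names]
    return not goal_reachable(init, goal, remaining)

def is_fact_landmark(init, goal, operators, fact):
    # A fact is a landmark iff the goal cannot be reached while avoiding it.
    return not goal_reachable(init, goal, operators, forbidden_fact=fact)

# Hypothetical task: p can be achieved by x1 or x2, and g needs p.
OPS = [
    ("x1", set(), {"p"}, set()),
    ("x2", set(), {"p"}, set()),
    ("x3", {"p"}, {"g"}, set()),
]
INIT, GOAL = set(), {"g"}

print(is_disjunctive_action_landmark(INIT, GOAL, OPS, {"x1", "x2"}))
print(is_disjunctive_action_landmark(INIT, GOAL, OPS, {"x1"}))
print(is_fact_landmark(INIT, GOAL, OPS, "p"))
```

Each check is a full reachability search, which is exactly why such tests are as hard as plan existence and only feasible on very small tasks.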

(12)

Landmarks from RTGs

(14)

Computing Landmarks

How can we come up with landmarks?

Most landmarks are derived from the relaxed task graph:

RHW landmarks: Richter, Helmert & Westphal. Landmarks Revisited. (AAAI 2008)

LM-Cut: Helmert & Domshlak. Landmarks, Critical Paths and Abstractions: What's the Difference Anyway? (ICAPS 2009)

$h^m$ landmarks: Keyder, Richter & Helmert. Sound and Complete Landmarks for And/Or Graphs. (ECAI 2010)

We discuss $h^m$ landmarks restricted to $m = 1$ and to STRIPS planning tasks.

(15)

Incidental Landmarks: Example

Example (Incidental Landmarks)

Consider a STRIPS planning task $\langle V, I, \{o_1, o_2\}, \gamma \rangle$ with

$V = \{a, b, c, d, e, f\}$,

$I = \{a \mapsto T, b \mapsto T, c \mapsto F, d \mapsto F, e \mapsto T, f \mapsto F\}$,

$o_1 = \langle \{a\}, \{c, d, e\}, \{a, b\} \rangle$,

$o_2 = \langle \{d, e\}, \{f\}, \{a, d\} \rangle$, and

$\gamma = \{e, f\}$.

Single solution: $\langle o_1, o_2 \rangle$

All variables are fact landmarks.

Variable $b$ is initially true but irrelevant for the plan.

Variable $c$ becomes true as a "side effect" of $o_1$, but it is not necessary for the goal or to make an operator applicable.

(16)

Causal Landmarks

Definition (Causal Fact Landmark)

Let $\Pi = \langle V, I, O, \gamma \rangle$ be a STRIPS planning task.

An atomic proposition $v = T$ for $v \in V$ is a causal fact landmark if $v \in \gamma$ or if for all goal paths $\pi = \langle o_1, \dots, o_n \rangle$ there is an $o_i$ with $v \in \mathit{pre}(o_i)$.

(17)

Causal Landmarks: Example

Example (Causal Landmarks)

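Consider a STRIPS planning task $\langle V, I, \{o_1, o_2\}, \gamma \rangle$ with

$V = \{a, b, c, d, e, f\}$,

$I = \{a \mapsto T, b \mapsto T, c \mapsto F, d \mapsto F, e \mapsto T, f \mapsto F\}$,

$o_1 = \langle \{a\}, \{c, d, e\}, \{a, b\} \rangle$,

$o_2 = \langle \{d, e\}, \{f\}, \{a, d\} \rangle$, and

$\gamma = \{e, f\}$.

Single solution: $\langle o_1, o_2 \rangle$

All variables are fact landmarks for the initial state.

Only $a$, $d$, $e$ and $f$ are causal landmarks.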
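The difference between fact landmarks and causal fact landmarks in this example can be checked by brute force. The sketch below assumes that the operator triples are read as (preconditions, add effects, delete effects) and explores the explicit state space, which is tiny here. The function names are hypothetical. A variable is treated as causal if it is a goal variable or if no plan remains after removing every operator that has it as a precondition.

```python
from collections import deque

# Example task from the slides; triples are assumed to be (pre, add, del).
VARIABLES = "abcdef"
INIT = frozenset("abe")
GOAL = frozenset("ef")
OPERATORS = {
    "o1": (frozenset("a"), frozenset("cde"), frozenset("ab")),
    "o2": (frozenset("de"), frozenset("f"), frozenset("ad")),
}

def goal_reachable(operators, avoid=None):
    """Plan existence by breadth-first search, optionally avoiding a fact."""
    if avoid in INIT:
        return False
    seen, queue = {INIT}, deque([INIT])
    while queue:
        state = queue.popleft()
        if GOAL <= state:
            return True
        for pre, add, delete in operators.values():
            if pre <= state:
                succ = (state - delete) | add
                if avoid in succ or succ in seen:
                    continue
                seen.add(succ)
                queue.append(succ)
    return False

def is_fact_landmark(v):
    # v must hold somewhere on every state path to a goal state.
    return not goal_reachable(OPERATORS, avoid=v)

def is_causal_fact_landmark(v):
    # v is a goal variable, or every plan uses an operator with v in pre.
    if v in GOAL:
        return True
    without_v = {n: op for n, op in OPERATORS.items() if v not in op[0]}
    return not goal_reachable(without_v)

fact_landmarks = [v for v in VARIABLES if is_fact_landmark(v)]
causal_landmarks = [v for v in VARIABLES if is_causal_fact_landmark(v)]
print(fact_landmarks)    # all six variables
print(causal_landmarks)  # a, d, e, f
```

On this task the check reproduces the statement of the slide: $b$ and $c$ are fact landmarks but not causal ones.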

(18)

What We Are Doing Next

Causal landmarks are the desirable landmarks.

We can use a simplified version of RTGs to compute causal landmarks for STRIPS planning tasks.

We will define landmarks of AND/OR graphs and show how they can be computed.

Afterwards we establish that these are landmarks of the planning task.

(19)

Simplified Relaxed Task Graph

Definition

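For a STRIPS planning task $\Pi = \langle V, I, O, \gamma \rangle$, the simplified relaxed task graph $\mathit{sRTG}(\Pi^+)$ is the AND/OR graph $\langle N_{\text{and}} \cup N_{\text{or}}, A, \mathit{type} \rangle$ with

$N_{\text{and}} = \{n_o \mid o \in O\} \cup \{n_I, n_G\}$ with $\mathit{type}(n) = \land$ for all $n \in N_{\text{and}}$,

$N_{\text{or}} = \{n_v \mid v \in V\}$ with $\mathit{type}(n) = \lor$ for all $n \in N_{\text{or}}$, and

$A = \{\langle n_a, n_o \rangle \mid o \in O, a \in \mathit{add}(o)\} \cup \{\langle n_o, n_p \rangle \mid o \in O, p \in \mathit{pre}(o)\} \cup \{\langle n_v, n_I \rangle \mid v \in I\} \cup \{\langle n_G, n_v \rangle \mid v \in \gamma\}$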
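As an illustration of this definition, the following sketch constructs the node sets and arcs for the running example task with $o_1$ and $o_2$. It is a minimal sketch with assumptions: nodes are plain strings (variable names, operator names, and `"I"` and `"G"` for $n_I$ and $n_G$), names are assumed not to clash, operators are (pre, add, del) triples, and `build_srtg` is a hypothetical helper name.

```python
def build_srtg(variables, init, operators, goal):
    """Return (and_nodes, or_nodes, arcs) of the simplified relaxed task graph.

    operators maps a name to a (pre, add, del) triple. Delete effects are
    ignored because the graph describes the delete relaxation.
    """
    and_nodes = set(operators) | {"I", "G"}
    or_nodes = set(variables)
    arcs = set()
    for name, (pre, add, _delete) in operators.items():
        arcs |= {(a, name) for a in add}    # <n_a, n_o> for a in add(o)
        arcs |= {(name, p) for p in pre}    # <n_o, n_p> for p in pre(o)
    arcs |= {(v, "I") for v in init}        # <n_v, n_I> for v true in I
    arcs |= {("G", v) for v in goal}        # <n_G, n_v> for v in the goal
    return and_nodes, or_nodes, arcs

# Running example: variables a..f, initially true a, b, e; goal e, f.
EXAMPLE_OPERATORS = {
    "o1": ({"a"}, {"c", "d", "e"}, {"a", "b"}),
    "o2": ({"d", "e"}, {"f"}, {"a", "d"}),
}
and_nodes, or_nodes, arcs = build_srtg("abcdef", "abe", EXAMPLE_OPERATORS, "ef")
for arc in sorted(arcs):
    print(arc)
```

Arcs point from a node to the nodes it depends on, so an operator node points to its preconditions and a variable node points to its achievers.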

(20)

Simplified RTG: Example

The simplified RTG for our example task is:

[Figure: simplified RTG of the example task with nodes I, a, b, c, d, e, f, o1, o2 and G.]

(21)

Characterizing Equation System

Theorem

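Let $G = \langle N, A, \mathit{type} \rangle$ be an AND/OR graph. Consider the following system of equations:

$$LM(n) = \{n\} \cup \bigcap_{\langle n, n' \rangle \in A} LM(n') \qquad \text{if } \mathit{type}(n) = \lor$$

$$LM(n) = \{n\} \cup \bigcup_{\langle n, n' \rangle \in A} LM(n') \qquad \text{if } \mathit{type}(n) = \land$$

The equation system has a unique maximal solution (maximal with regard to set inclusion), and for this solution it holds that

$n' \in LM(n)$ iff $n'$ is a landmark for reaching $n$ in $G$.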

(22)

Computation of Maximal Solution

Computation: Initialize the landmark sets as $LM(n) = N_{\text{and}} \cup N_{\text{or}}$ for all nodes $n$ and apply the equations as update rules until a fixpoint is reached.
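The fixpoint computation can be sketched in a few lines. This is an illustrative sketch, not the course's implementation: the graph of the running example is written down by hand as a successor map (an assumed encoding with string node names), and an OR node without successors keeps the full node set, following the convention that an empty intersection is the set of all nodes.

```python
def landmark_sets(successors, and_nodes):
    """Greatest solution of the LM equation system by fixpoint iteration.

    successors maps each node n to the nodes n' with an arc <n, n'>.
    Nodes not listed in and_nodes are treated as OR nodes.
    """
    nodes = set(successors)
    lm = {n: set(nodes) for n in nodes}          # initialize with all nodes
    changed = True
    while changed:
        changed = False
        for n in sorted(nodes):
            child_sets = [lm[c] for c in successors[n]]
            if n in and_nodes:
                combined = set().union(*child_sets)
            elif child_sets:
                combined = set.intersection(*child_sets)
            else:
                combined = set(nodes)            # empty intersection
            updated = {n} | combined
            if updated != lm[n]:
                lm[n] = updated
                changed = True
    return lm

# Simplified RTG of the running example (hand-written, assumed encoding).
SUCCESSORS = {
    "I": [], "G": ["e", "f"], "o1": ["a"], "o2": ["d", "e"],
    "a": ["I"], "b": ["I"], "c": ["o1"], "d": ["o1"],
    "e": ["I", "o1"], "f": ["o2"],
}
AND_NODES = {"I", "G", "o1", "o2"}

lm = landmark_sets(SUCCESSORS, AND_NODES)
print(sorted(lm["G"]))
```

Because every set starts at the full node set and updates can only remove elements, the iteration terminates, and the order in which nodes are updated does not change the result.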

(23)

Computation: Example

[Figure: simplified RTG of the example task with nodes I, a, b, c, d, e, f, o1, o2 and G, each annotated with its current landmark set.]

Initialize with all nodes: $LM(n) = \{a, b, c, d, e, f, I, G, o_1, o_2\}$ for every node $n$.

Then apply the equations as update rules:

$LM(I) = \{I\}$

$LM(a) = \{a\} \cup LM(I) = \{a, I\}$

$LM(b) = \{b\} \cup LM(I) = \{b, I\}$

$LM(e) = \{e\} \cup (LM(I) \cap LM(o_1)) = \{e, I\}$

$LM(o_1) = \{o_1\} \cup LM(a) = \{a, I, o_1\}$

$LM(c) = \{c\} \cup LM(o_1) = \{a, c, I, o_1\}$

$LM(d) = \{d\} \cup LM(o_1) = \{a, d, I, o_1\}$

$LM(o_2) = \{o_2\} \cup LM(d) \cup LM(e) = \{a, d, e, I, o_1, o_2\}$

$LM(f) = \{f\} \cup LM(o_2) = \{a, d, e, f, I, o_1, o_2\}$

$LM(G) = \{G\} \cup LM(e) \cup LM(f) = \{a, d, e, f, I, G, o_1, o_2\}$

(35)

Relation to Planning Task Landmarks

Theorem

Let $\Pi = \langle V, I, O, \gamma \rangle$ be a STRIPS planning task and let $\mathcal{L}$ be the set of landmarks for reaching $n_G$ in $\mathit{sRTG}(\Pi^+)$.

The set $\{v = T \mid v \in V \text{ and } n_v \in \mathcal{L}\}$ is exactly the set of causal fact landmarks in $\Pi^+$.

For operators $o \in O$, if $n_o \in \mathcal{L}$ then $\{o\}$ is a disjunctive action landmark in $\Pi^+$.

There are no other disjunctive action landmarks of size 1.

(Proofs omitted.)

(36)

Computed RTG Landmarks: Example

Example (Computed RTG Landmarks)

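Consider a STRIPS planning task $\langle V, I, \{o_1, o_2\}, \gamma \rangle$ with

$V = \{a, b, c, d, e, f\}$,

$I = \{a \mapsto T, b \mapsto T, c \mapsto F, d \mapsto F, e \mapsto T, f \mapsto F\}$,

$o_1 = \langle \{a\}, \{c, d, e\}, \{a, b\} \rangle$,

$o_2 = \langle \{d, e\}, \{f\}, \{a, d\} \rangle$, and

$\gamma = \{e, f\}$.

$LM(n_G) = \{a, d, e, f, I, G, o_1, o_2\}$

$a$, $d$, $e$, and $f$ are causal fact landmarks of $\Pi^+$.

$\{o_1\}$ and $\{o_2\}$ are disjunctive action landmarks of $\Pi^+$.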

(37)

Landmarks of $\Pi^+$ Are Landmarks of $\Pi$

Theorem

Let $\Pi$ be a STRIPS planning task.

All fact landmarks of $\Pi^+$ are fact landmarks of $\Pi$ and all disjunctive action landmarks of $\Pi^+$ are disjunctive action landmarks of $\Pi$.

Proof.

Let $L$ be a disjunctive action landmark of $\Pi^+$ and $\pi$ be a plan for $\Pi$. Then $\pi$ is also a plan for $\Pi^+$ and, thus, $\pi$ contains an operator from $L$.

Let $f$ be a fact landmark of $\Pi^+$. If $f$ is already true in the initial state, then it is also a landmark of $\Pi$. Otherwise, every plan for $\Pi^+$ contains an operator that adds $f$, and the set of all these operators is a disjunctive action landmark of $\Pi^+$. Therefore, each plan of $\Pi$ also contains such an operator, making $f$ a fact landmark of $\Pi$.

(38)

Minimum Hitting Set Heuristic

(39)

Content of this Course: Constraints

Constraints: Landmarks, RTG Landmarks, MHS Heuristic, LM-Cut Heuristic, Cost Partitioning, Network Flows, Operator Counting, Potential Heuristics

(40)

Exploiting Disjunctive Action Landmarks

The cost $\mathit{cost}(L)$ of a disjunctive action landmark $L$ is an admissible heuristic, but it is usually not very informative.

Landmark heuristics typically aim to combine multiple disjunctive action landmarks.

How can we exploit a given set $\mathcal{L}$ of disjunctive action landmarks?

Sum of costs $\sum_{L \in \mathcal{L}} \mathit{cost}(L)$? Not admissible!

Maximize costs $\max_{L \in \mathcal{L}} \mathit{cost}(L)$? Usually a very weak heuristic.

Better: hitting sets.

(41)

Hitting Sets

Definition (Hitting Set)

Let $X$ be a set, $\mathcal{F} = \{F_1, \dots, F_n\} \subseteq 2^X$ be a family of subsets of $X$ and $c : X \to \mathbb{R}^+_0$ be a cost function for $X$.

A hitting set is a subset $H \subseteq X$ that "hits" all subsets in $\mathcal{F}$, i.e., $H \cap F \neq \emptyset$ for all $F \in \mathcal{F}$. The cost of $H$ is $\sum_{x \in H} c(x)$.

A minimum hitting set (MHS) is a hitting set with minimal cost.

MHS is a “classical” NP-complete problem (Karp, 1972)

(42)

Example: Hitting Sets

Example

$X = \{o_1, o_2, o_3, o_4\}$

$\mathcal{F} = \{\{o_4\}, \{o_1, o_2\}, \{o_1, o_3\}, \{o_2, o_3\}\}$

$c(o_1) = 3$, $c(o_2) = 4$, $c(o_3) = 5$, $c(o_4) = 0$

What is a minimum hitting set?

Solution: $\{o_1, o_2, o_4\}$ with cost $3 + 4 + 0 = 7$
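The solution of this example can be confirmed by enumerating all subsets of $X$. The sketch below is a brute-force illustration only (exponential in $|X|$), and `minimum_hitting_set` is a hypothetical helper name. If the sets in $\mathcal{F}$ are read as disjunctive action landmarks with $\mathit{cost}(L) = \min_{o \in L} c(o)$, the sketch also computes the sum and the maximum of the landmark costs for comparison.

```python
from itertools import combinations

def minimum_hitting_set(universe, family, cost):
    """Return (hitting set, cost) with minimal cost, found by enumeration."""
    best, best_cost = None, float("inf")
    elements = sorted(universe)
    for size in range(len(elements) + 1):
        for candidate in combinations(elements, size):
            chosen = set(candidate)
            # A hitting set shares at least one element with every subset.
            if all(chosen & subset for subset in family):
                total = sum(cost[x] for x in chosen)
                if total < best_cost:
                    best, best_cost = chosen, total
    return best, best_cost

X = {"o1", "o2", "o3", "o4"}
F = [{"o4"}, {"o1", "o2"}, {"o1", "o3"}, {"o2", "o3"}]
c = {"o1": 3, "o2": 4, "o3": 5, "o4": 0}

mhs, mhs_cost = minimum_hitting_set(X, F, c)
sum_of_costs = sum(min(c[o] for o in subset) for subset in F)
max_of_costs = max(min(c[o] for o in subset) for subset in F)
print(sorted(mhs), mhs_cost)        # ['o1', 'o2', 'o4'] 7
print(sum_of_costs, max_of_costs)   # 10 4
```

Here the sum of the landmark costs (10) exceeds the cost of the minimum hitting set (7) because $o_1$ and $o_2$ are each counted for two sets, while the maximum (4) stays well below it.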

(44)

Hitting Sets for Disjunctive Action Landmarks

Idea: disjunctive action landmarks are interpreted as an instance of minimum hitting set.

Definition (Hitting Set Heuristic)

Let $\mathcal{L}$ be a set of disjunctive action landmarks. The hitting set heuristic $h^{\mathrm{MHS}}(\mathcal{L})$ is defined as the cost of a minimum hitting set for $\mathcal{L}$ with $c(o) = \mathit{cost}(o)$.

Proposition (Hitting Set Heuristic is Admissible)

Let $\mathcal{L}$ be a set of disjunctive action landmarks for state $s$.

Then $h^{\mathrm{MHS}}(\mathcal{L})$ is an admissible estimate for $s$.

(45)

Hitting Set Heuristic: Discussion

The hitting set heuristic is the best possible heuristic that only uses the given information, but it is NP-hard to compute.

Use approximations that can be efficiently computed.

⇒ LP relaxation, cost partitioning (both discussed later)
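To give a first impression of such an approximation, the sketch below uses uniform cost partitioning: the cost of each operator is split evenly among the landmarks that contain it, and the cheapest share in each landmark is summed up. This is only one simple instance chosen for illustration and is not necessarily the variant developed later in the course. The function name is hypothetical and integer operator costs are assumed so that exact fractions can be used.

```python
from fractions import Fraction

def uniform_cost_partitioning(landmarks, cost):
    """Lower bound on the minimum hitting set cost in polynomial time.

    Each operator's cost is divided evenly among the landmarks containing
    it, so no operator contributes more than its original cost in total.
    """
    occurrences = {}
    for landmark in landmarks:
        for op in landmark:
            occurrences[op] = occurrences.get(op, 0) + 1
    return sum(
        min(Fraction(cost[op], occurrences[op]) for op in landmark)
        for landmark in landmarks
    )

# Same sets and costs as in the hitting set example, read as landmarks.
landmarks = [{"o4"}, {"o1", "o2"}, {"o1", "o3"}, {"o2", "o3"}]
cost = {"o1": 3, "o2": 4, "o3": 5, "o4": 0}

estimate = uniform_cost_partitioning(landmarks, cost)
print(estimate)  # 5, compared with the minimum hitting set cost of 7
```

The estimate never exceeds the minimum hitting set cost, so it trades accuracy for a computation that is linear in the total size of the landmarks.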

(46)

Summary

(47)

Summary

Fact landmark: atomic proposition that holds at some point on every state path to a goal

Disjunctive action landmark: set $L$ of operators such that every plan uses some operator from $L$

Relaxed task graphs allow efficient computation of landmarks

Hitting sets yield the most accurate heuristic for a given set of disjunctive action landmarks

Computation of a minimum hitting set is NP-hard
