(1)

Foundations of Artificial Intelligence

38. Automated Planning: Landmarks

Malte Helmert

University of Basel

May 10, 2021

(2)

Planning Heuristics

We discuss three basic ideas for general heuristics:

Delete Relaxation
Abstraction
Landmarks (this and next chapter)

Basic Idea: Landmarks

landmark = something (e.g., an action) that must be part of every solution

Estimate solution costs based on unachieved landmarks.


(4)

Automated Planning: Overview

Chapter overview: automated planning

33. Introduction
34. Planning Formalisms
35.–36. Planning Heuristics: Delete Relaxation
37. Planning Heuristics: Abstraction
38.–39. Planning Heuristics: Landmarks
     38. Landmarks
     39. Landmark Heuristics

(5)

Delete Relaxation

(6)

Landmarks and Delete Relaxation

In this chapter, we discuss a further technique to compute planning heuristics: landmarks.

We restrict ourselves to delete-free planning tasks:

For a STRIPS task Π, we compute its delete relaxed task Π+, and then apply landmark heuristics on Π+.

Hence the objective of our landmark heuristics is to approximate the optimal delete relaxed heuristic h+ as accurately as possible.

More advanced landmark techniques work directly on general planning tasks.

German: Landmarke

(7)

Delete-Free STRIPS Planning Tasks

reminder:

Definition (delete-free STRIPS planning task)

A delete-free STRIPS planning task is a 4-tuple Π+ = ⟨V, I, G, A⟩ with the following components:

V: finite set of state variables
I ⊆ V: the initial state
G ⊆ V: the set of goals
A: finite set of actions, where for every a ∈ A, we define
    pre(a) ⊆ V: its preconditions
    add(a) ⊆ V: its add effects
    cost(a) ∈ ℕ₀: its cost
    denoted as pre(a) --cost(a)--> add(a) (omitting set braces)
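For concreteness, a task in this form can be written down directly in plain Python. The encoding below is my own convention, not the course's, and the two sample actions anticipate the example used later in the chapter.

```python
# One possible plain-Python encoding (my own convention): an action is a
# tuple (name, preconditions, add_effects, cost), a task is (V, I, G, A).
V = {"i", "x", "y", "z", "g"}
I = {"i"}
G = {"g"}
A = [
    ("a1", {"i"}, {"x", "y"}, 3),        # i --3--> x, y
    ("a4", {"x", "y", "z"}, {"g"}, 0),   # x, y, z --0--> g
]
task = (V, I, G, A)
```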

(8)

Delete-Free STRIPS Planning Task in Normal Form

A delete-free STRIPS planning task ⟨V, I, G, A⟩ is in normal form if

I consists of exactly one element i: I = {i}
G consists of exactly one element g: G = {g}
Every action has at least one precondition.

German: Normalform

Every task can easily be transformed into an equivalent task in normal form. (How?)
In the following, we assume tasks in normal form.

Describing A suffices to describe the overall task:

V are the variables mentioned in A's actions.
always I = {i} and G = {g}

In the following, we only describe A.
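The following sketch answers the "(How?)" with one standard construction; the function name, the tuple encoding and the fresh atom names are mine, not from the course. It adds an artificial initial atom and an artificial goal atom, and gives precondition-free actions the artificial initial atom as precondition.

```python
def to_normal_form(V, I, G, A):
    """Sketch: turn a delete-free task into an equivalent one in normal form.

    Actions are (name, preconditions, add_effects, cost) tuples; the two
    auxiliary actions have cost 0, so optimal plan costs are preserved.
    """
    i, g = "i*", "g*"                           # fresh atoms, assumed not in V
    new_A = [("init*", {i}, set(I), 0),         # i* --0--> old initial state
             ("goal*", set(G) or {i}, {g}, 0)]  # old goals --0--> g*
    for name, pre, add, cost in A:
        new_A.append((name, set(pre) or {i}, set(add), cost))
    return set(V) | {i, g}, {i}, {g}, new_A
```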

(9)

Example: Delete-Free Planning Task in Normal Form

Example actions:

a1 = i --3--> x, y
a2 = i --4--> x, z
a3 = i --5--> y, z
a4 = x, y, z --0--> g

optimal solution?

(10)

Example: Delete-Free Planning Task in Normal Form

Example actions:

a1 = i --3--> x, y
a2 = i --4--> x, z
a3 = i --5--> y, z
a4 = x, y, z --0--> g

optimal solution to reach {g} from {i}:

plan: a1, a2, a4

cost: 3 + 4 + 0 = 7 (= h+({i}) because plan is optimal)
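For a task this small, h+ can be verified by brute force over action subsets (exponential, so purely illustrative). The encoding and function names below are my own.

```python
from itertools import combinations

# The example actions as (name, preconditions, add_effects, cost) tuples.
ACTIONS = [
    ("a1", {"i"}, {"x", "y"}, 3),
    ("a2", {"i"}, {"x", "z"}, 4),
    ("a3", {"i"}, {"y", "z"}, 5),
    ("a4", {"x", "y", "z"}, {"g"}, 0),
]

def reachable(init, actions):
    """Atoms reachable from init under delete-free semantics (fixpoint)."""
    atoms, changed = set(init), True
    while changed:
        changed = False
        for _, pre, add, _ in actions:
            if pre <= atoms and not add <= atoms:
                atoms |= add
                changed = True
    return atoms

def h_plus(init, goal, actions):
    """Minimum cost of an action subset that makes the goal reachable."""
    best = float("inf")
    for k in range(len(actions) + 1):
        for subset in combinations(actions, k):
            if goal <= reachable(init, subset):
                best = min(best, sum(cost for *_, cost in subset))
    return best

print(h_plus({"i"}, {"g"}, ACTIONS))   # 7, matching the plan a1, a2, a4
```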

(11)

Landmarks

(12)

Landmarks

Definition (landmark)

A landmark of a planning task Π is a set of actions L such that every plan must contain an action from L.

The cost of a landmark L, cost(L), is defined as min_{a ∈ L} cost(a).

landmark cost corresponds to a (very simple) admissible heuristic

Strictly speaking, landmarks as considered in this course are called disjunctive action landmarks.

Other kinds of landmarks exist (fact landmarks, formula landmarks, ...).

German: disjunktive Aktionslandmarke, Faktlandmarke, Formellandmarke

(13)

Example: Landmarks

Example actions:

a1 = i --3--> x, y
a2 = i --4--> x, z
a3 = i --5--> y, z
a4 = x, y, z --0--> g

landmark examples?

(14)

Example: Landmarks

Example actions:

a1 = i --3--> x, y
a2 = i --4--> x, z
a3 = i --5--> y, z
a4 = x, y, z --0--> g

some landmarks:

A = {a4} (cost 0)
B = {a1, a2} (cost 3)
C = {a1, a3} (cost 3)
D = {a2, a3} (cost 4)

also: {a1, a2, a3} (cost 3), {a1, a2, a4} (cost 0), ...
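One way to convince oneself that these are landmarks: for delete-free tasks, an action set L is a disjunctive action landmark exactly if the goal becomes unreachable once all actions in L are removed. The sketch below (my own encoding and names) checks this for the running example.

```python
def reachable(init, actions):
    """Atoms reachable from init under delete-free semantics (fixpoint)."""
    atoms, changed = set(init), True
    while changed:
        changed = False
        for _, pre, add, _ in actions:
            if pre <= atoms and not add <= atoms:
                atoms |= add
                changed = True
    return atoms

def is_landmark(candidate, init, goal, actions):
    """candidate is a set of action names; landmark iff goal unreachable without them."""
    remaining = [a for a in actions if a[0] not in candidate]
    return not goal <= reachable(init, remaining)

def landmark_cost(candidate, actions):
    return min(cost for name, _, _, cost in actions if name in candidate)

ACTIONS = [
    ("a1", {"i"}, {"x", "y"}, 3),
    ("a2", {"i"}, {"x", "z"}, 4),
    ("a3", {"i"}, {"y", "z"}, 5),
    ("a4", {"x", "y", "z"}, {"g"}, 0),
]
print(is_landmark({"a1", "a2"}, {"i"}, {"g"}, ACTIONS))  # True (this is B)
print(landmark_cost({"a1", "a2"}, ACTIONS))              # 3
print(is_landmark({"a2"}, {"i"}, {"g"}, ACTIONS))        # False: a1, a3, a4 still reach g
```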

(15)

Overview: Landmarks

in the following:

exploiting landmarks: How can we compute an accurate heuristic for a given set of landmarks? (this chapter)

finding landmarks: How can we find landmarks? (next chapter)

LM-cut heuristic: an algorithm to find landmarks and exploit them as a heuristic (next chapter)

(16)

Exploiting Landmarks

(17)

Exploiting Landmarks

Assume the set of landmarks L = {A, B, C, D}.

How to use L for computing heuristics?

sum the costs: 0 + 3 + 3 + 4 = 10 (not admissible!)

maximize the costs: max{0, 3, 3, 4} = 4 (usually yields a weak heuristic)

better: hitting sets or cost partitioning

German: Hitting-Set, Kostenpartitionierung
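A quick numeric check of the two naive combinations, using the landmark costs from above (plain Python). The sum overcounts because a1 is the cheapest action of both B and C, yet a single occurrence of a1 hits both landmarks.

```python
costs = {"A": 0, "B": 3, "C": 3, "D": 4}   # landmark costs from the slide

h_sum = sum(costs.values())   # 10 > h+ = 7, so summing is not admissible
h_max = max(costs.values())   # 4, admissible but much weaker than h+ = 7
print(h_sum, h_max)           # 10 4
```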

(18)

Hitting Sets

Definition (hitting set)

given: finite support set X, family of subsets F ⊆ 2^X, cost c : X → ℝ⁺₀

hitting set: subset H ⊆ X that "hits" all subsets in F: H ∩ S ≠ ∅ for all S ∈ F

cost of H: ∑_{x ∈ H} c(x)

minimum hitting set (MHS): hitting set with minimal cost

"classical" NP-complete problem (Karp, 1972)

(19)

Example: Hitting Sets

Example

X = {a1, a2, a3, a4}
F = {A, B, C, D} with A = {a4}, B = {a1, a2}, C = {a1, a3}, D = {a2, a3}
c(a1) = 3, c(a2) = 4, c(a3) = 5, c(a4) = 0

minimum hitting set: {a1, a2, a4} with cost 3 + 4 + 0 = 7
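Because the instance is tiny, the minimum hitting set can be found by brute force over all subsets of X (exponential in |X|, so only a sketch for illustration; the names are mine).

```python
from itertools import combinations

def minimum_hitting_set(X, F, cost):
    """Brute-force MHS: try every subset of X and keep the cheapest hitting set."""
    best_H, best_cost = None, float("inf")
    for k in range(len(X) + 1):
        for H in combinations(sorted(X), k):
            H = set(H)
            if all(H & S for S in F):                 # H hits every subset in F
                c = sum(cost[x] for x in H)
                if c < best_cost:
                    best_H, best_cost = H, c
    return best_H, best_cost

X = {"a1", "a2", "a3", "a4"}
F = [{"a4"}, {"a1", "a2"}, {"a1", "a3"}, {"a2", "a3"}]   # A, B, C, D
cost = {"a1": 3, "a2": 4, "a3": 5, "a4": 0}
print(minimum_hitting_set(X, F, cost))   # {a1, a2, a4} with cost 7
```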


(21)

Hitting Sets for Landmarks

idea: landmarks are interpreted as an instance of minimum hitting set

Definition (hitting set heuristic)

Let L be a set of landmarks for a delete-free planning task in normal form with actions A, action costs cost and initial state I. The hitting set heuristic hMHS(I) is defined as the minimal solution cost for the minimum hitting set instance with support set A, family of subsets L and costs cost.

Proposition (Hitting Set Heuristic is Admissible)

The minimum hitting set heuristic hMHS is admissible.

Why?

(22)

Approximation of hMHS

As computing minimal hitting sets is NP-hard, we want to approximate hMHS in polynomial time.

Optimal Cost Partitioning (Karpas & Domshlak, 2009)

idea: Construct a linear program (LP) for L.

rows (constraints) correspond to actions
columns (variables) correspond to landmarks
entries: 1 if row action is contained in column landmark; 0 otherwise
objective: maximize sum of variables

heuristic value hOCP (optimal cost partitioning): objective value of LP


(24)

Example: Optimal Cost Partitioning

Example

cost(a1) = 3, cost(a2) = 4, cost(a3) = 5, cost(a4) = 0
L = {A, B, C, D} with A = {a4}, B = {a1, a2}, C = {a1, a3}, D = {a2, a3}

LP: maximize a + b + c + d subject to a, b, c, d ≥ 0 and

b + c ≤ 3   (row for a1)
b + d ≤ 4   (row for a2)
c + d ≤ 5   (row for a3)
a     ≤ 0   (row for a4)

(one LP variable per landmark: a for A, b for B, c for C, d for D)

solution: a = 0, b = 1, c = 2, d = 3, hOCP(I) = 6
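The LP above can be handed to an off-the-shelf solver. The sketch below assumes scipy is available and reproduces the example; the variable names and layout are mine. Since linprog minimizes, the objective is negated.

```python
import numpy as np
from scipy.optimize import linprog   # assumes scipy is installed

landmarks = {"A": {"a4"}, "B": {"a1", "a2"}, "C": {"a1", "a3"}, "D": {"a2", "a3"}}
action_cost = {"a1": 3, "a2": 4, "a3": 5, "a4": 0}

cols = sorted(landmarks)     # one LP variable per landmark (columns)
rows = sorted(action_cost)   # one constraint per action (rows)
# entry is 1 iff the row's action is contained in the column's landmark
A_ub = np.array([[1 if a in landmarks[L] else 0 for L in cols] for a in rows])
b_ub = np.array([action_cost[a] for a in rows])

res = linprog(c=-np.ones(len(cols)), A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * len(cols), method="highs")
print(dict(zip(cols, res.x)), "h_OCP =", -res.fun)   # A=0, B=1, C=2, D=3, h_OCP = 6.0
```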


(28)

Relationship of Heuristics

Proposition (hOCP vs. hMHS)

Let L be a set of landmarks for a planning task with initial state I. Then

hOCP(I) ≤ hMHS(I) ≤ h+(I)

The heuristic hOCP can be computed in polynomial time because linear programs can be solved in polynomial time.

(29)

Summary

(30)

Summary

Landmarks are action sets such that every plan must contain at least one of the actions.

Hitting sets yield the most accurate heuristic for a given set of landmarks, but the computation is NP-hard.

Optimal cost partitioning is a polynomial-time approach for computing informative landmark heuristics.
