Automatic Creation of Search Heuristics


Stefan Edelkamp


1 Overview

- Automatic Creation of Heuristics

- Macro Problem Solving

- Hierarchical A* and Valtorta's Theorem

- Pattern Databases

- Disjoint Pattern Databases

- Multiple, Bounded, Symmetrical, Dual, and Secondary Databases


2 History

History of the automated creation of admissible heuristics: Gaschnig (1979), Pearl (1984), Prieditis (1993), Guida and Somalvico (1979)

- Korf: Macro Problem Solver, Learning Real-Time A*

- Valtorta: A result on complexity of heuristic estimates for A*

- Holte et al.'s Hierarchical A*: Searching Abstraction Hierarchies Efficiently

- Recent work: Pattern/Abstraction Databases

The main difference is probably that Hierarchical A* and its precursors compute heuristic values on-line, whereas pattern databases are constructed by off-line calculation


3 Macro Problem Solving

Macro Problem Solver: constructs a table that contains macros solving subproblems

Search: the solver looks up table entries to improve the current state toward the goal, one step at a time

Eight-Puzzle example: operators are labeled by the direction in which the blank moves

Table Structure: entry in row r and column c denotes operator sequence to move tile in position r into c

Invariance: after executing macro, tiles in position 1 to r −1 remain correctly placed


Macro Table for the Eight-Puzzle

0 1 2 3 4 5 6

0

1 DR

2 D LURD

3 DL URDL URDL LURD

4 L RULD RULD LURRD

LURD LULDR

5 UL DRUL RDLU RULD RDLU DLUR RULD RDLU

ULDR URDL

6 U DLUR DRU DLUU DRUL LURRD

ULDR ULD RDRU DLURU

LLDR LLDR

7 UR LDRU ULDDR LDRUL DLUR DRULDL DLUR ULDR ULURD URDRU DRUL URRDLU

LLDR

8 R ULDR LDRR LURDR LDRRUL DRUL LDRU

UULD ULLDR LDRU

RDLU


Running the Table

For each tile c, 0 ≤ c ≤ 6, in turn: determine its current position r and apply the macro in row r and column c (tile c belongs at position c; if r = c, nothing needs to be applied)

[Figure: sequence of Eight-Puzzle boards from the start state to the goal, one board per macro application. Parameters per step: (c=0, r=5), (c=1, r=2), (c=2, r=7), (c=3, r=3), (c=4, r=4), (c=5, r=7), (c=6, r=7). Macros shown along the arrows: UL; LURD; ULDDR ULURD; DRUL DLUR RDLU; DLUR.]

Worst-case solution length (sum of column maxima): 2 + 12 + 10 + 14 + 8 + 14 + 4 = 64

Average solution length: 12/9 + 52/8 + 40/7 + 58/6 + 22/5 + 38/4 + 8/3 ≈ 39.78
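The two totals can be rechecked with a few lines of Python. This is only an arithmetic check of the numbers quoted on the slide; the variable names are illustrative, and the (total, count) pairs are read directly off the fractions in the average.

```python
from fractions import Fraction

# Column maxima (in moves) for columns 0..6, as quoted on the slide.
column_maxima = [2, 12, 10, 14, 8, 14, 4]

# (total moves in column, number of positions averaged over), as quoted on the slide.
column_totals = [(12, 9), (52, 8), (40, 7), (58, 6), (22, 5), (38, 4), (8, 3)]

worst_case = sum(column_maxima)
average = sum(Fraction(total, count) for total, count in column_totals)

print(worst_case)                 # 64
print(round(float(average), 2))   # 39.78
```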


Construction

Table constructed with backward DFS or backward BFS starting from the set of goals

- backward operators m⁻¹ do not necessarily need to be valid

- let p = (p₀, . . . , p_k) be the vector representation of the position reached by applying m⁻¹ to the goal position p′ = (p′₀, . . . , p′_k)

- Column c(m) of macro m, which transforms p into p′: length of the longest common prefix of p and p′, i.e.,

c(m) = min{i ∈ {0, . . . , k − 1} | p_i ≠ p′_i}

- Row r(m) of macro m: position in p at which the element p′_c(m) is located, which has to be moved to c(m) in the next macro application


Example

- m⁻¹ = LDRU alters goal position p′ = (0,1,2,3,4,5,6,7,8) into p = (0,1,2,3,4,5,8,6,7)

- its inverse m is DLUR

⇒ c(m) = 6 and r(m) = 7, matching the last macro application when running the table

- larger problems: BFS exhausts memory resources before all table entries are fixed

- larger tables require heuristic search with pattern databases
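The column and row of a macro can be read off the two position vectors. The sketch below assumes that entry i of a vector is the tile sitting at position i, which is the reading under which the example yields r(m) = 7; the function name macro_cell is hypothetical.

```python
def macro_cell(p, goal):
    """Return (column, row) of the macro that transforms p into goal.

    Assumption: p[i] is the tile at position i.
    """
    # Column: first index at which p and the goal vector differ
    # (the length of their longest common prefix).
    column = next(i for i, (a, b) in enumerate(zip(p, goal)) if a != b)
    # Row: position in p that currently holds the tile belonging at `column`.
    row = p.index(goal[column])
    return column, row


goal = (0, 1, 2, 3, 4, 5, 6, 7, 8)
p = (0, 1, 2, 3, 4, 5, 8, 6, 7)
print(macro_cell(p, goal))  # (6, 7)
```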


4 Patterns and Domain Abstraction

- pattern: refers to vector representation v(u), each position i contains an assignment to variable v_i, i ∈ {1, . . . , k}

- specialized pattern: state with one or more constants replaced by don't cares

- generalized pattern: each variable v_i with domain D_i is mapped to abstract domain A_i, i ∈ {1, . . . , k}

- start and goal pattern: making same substitutions in start and goal states

- domain abstraction: mapping stating which assignment to replace by which other


Two Examples in Eight-Puzzle

1. φ₁: tiles 1, 2, 7 are replaced by the don't care x

⇒ φ₁(v) = v′ with v′_i = v_i if v_i ∈ {0,3,4,5,6,8}, and v′_i = x otherwise

2. φ₂: additionally map tiles 3 and 4 to y, and tiles 6 and 8 to z

Granularity: vector indicating how many constants in the original domain are mapped to each constant in the abstract domain

⇒ gran(φ₂) = (3,2,2,1,1)

- 3 constants are mapped to x

- 2 are mapped to each of y and z

- constants 5 and 0 (the blank) remain unique
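A domain abstraction of this kind is just a mapping on constants, and the granularity follows by counting preimages. The sketch below encodes φ₂ as a Python dictionary; the names PHI_2, apply_abstraction and granularity are illustrative.

```python
from collections import Counter

# phi_2 from the example: 1, 2, 7 -> x; 3, 4 -> y; 6, 8 -> z; 0 (blank) and 5 stay unique.
PHI_2 = {1: "x", 2: "x", 7: "x", 3: "y", 4: "y", 6: "z", 8: "z", 0: 0, 5: 5}


def apply_abstraction(phi, state):
    """Map every constant of a state vector to its abstract constant."""
    return tuple(phi[v] for v in state)


def granularity(phi):
    """Number of original constants per abstract constant, largest first."""
    counts = Counter(phi.values())
    return tuple(sorted(counts.values(), reverse=True))


print(apply_abstraction(PHI_2, (0, 1, 2, 3, 4, 5, 6, 7, 8)))
print(granularity(PHI_2))  # (3, 2, 2, 1, 1)
```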


5 Embeddings and Homomorphisms

• embedding: earliest and most commonly studied type of abstraction transformation

- informally, φ is an embedding transformation if it adds edges to S

- E.g., macro-operators, dropped preconditions

• homomorphism: other main type of abstraction transformation

- informally, homomorphism φ groups together several states in S to create single abstract state

- E.g., drop predicate entirely from state space description (Knoblock 1994)


Hierarchical A*

Abstraction works by replacing one state space by another that is easier to search

Hierarchical A* is a version of A* that computes the distance from an abstract state to the abstract goal on the fly, by means of a search in the abstract space for each node that is expanded

- in contrast to earlier approaches (which explore the abstract space from scratch each time), Hierarchical A* uses caching to avoid repeated expansion of states in the abstract space

- restricted to state space abstractions that are homomorphisms


Valtorta's Theorem

Theorem: If state space S is embedded in S′ and h is computed by blind BFS in S′, then A* using h will expand every state that is expanded by BFS in S (counting the expansions made in S′ while computing h)

⇒ if heuristic estimates are re-computed from scratch for each state, this approach cannot possibly speed up search

- Absolver II: first system to break this barrier

- Hierarchical A*: subsequent one


Example

N × N grid, abstracted by ignoring the 2nd coordinate

- goal state is (N, 1)

- initial state is (1, 1)

Valtorta's Theorem ⇒ A* expands Ω(N²) nodes

Main observation: the search for h(s) also generates the value of h(u) for all u ∈ S′

- abstraction yields a perfect heuristic on the solution path

- Hierarchical A* will expand the optimum of O(N) nodes
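The effect of caching in this example can be sketched as follows. The code assumes a 4-connected grid with unit move costs, which the slide does not state, and it computes all abstract distances once with a backward BFS; this stands in for the caching idea and is not Holte et al.'s Hierarchical A* itself. All function names are illustrative.

```python
import heapq
from collections import deque


def abstract_distances(n):
    """Backward BFS in the abstract space {1..n} (second coordinate dropped)."""
    dist = {n: 0}                      # abstract goal is x = n
    queue = deque([n])
    while queue:
        x = queue.popleft()
        for nx in (x - 1, x + 1):
            if 1 <= nx <= n and nx not in dist:
                dist[nx] = dist[x] + 1
                queue.append(nx)
    return dist


def astar_on_grid(n):
    """A* from (1, 1) to (n, 1); returns (solution cost, expanded nodes)."""
    h = abstract_distances(n)          # cached once, reused for every lookup
    start, goal = (1, 1), (n, 1)
    g = {start: 0}
    open_heap = [(h[1], start)]
    closed = set()
    while open_heap:
        _, u = heapq.heappop(open_heap)
        if u == goal:
            return g[u], len(closed)
        if u in closed:
            continue
        closed.add(u)
        x, y = u
        for v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 1 <= v[0] <= n and 1 <= v[1] <= n and g[u] + 1 < g.get(v, float("inf")):
                g[v] = g[u] + 1
                heapq.heappush(open_heap, (g[v] + h[v[0]], v))
    return None


print(astar_on_grid(20))  # (19, 19)
```

With the cached values, only the states in the first row are expanded in the original space, and the abstract space of N states is searched a single time.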


Proof of Valtorta's Theorem

Let u be a state expanded by BFS. When A* terminates, u is closed, open, or unvisited

u closed ⇒ it will have been expanded

u open ⇒ h_φ(u) must have been computed

- h_φ(u) is computed by a search starting at φ(u)

- φ(u) ∉ φ(T) ⇒ the 1st step is to expand φ(u)

- φ(u) ∈ φ(T) ⇒ h_φ(u) = 0, and u itself is necessarily expanded

u unvisited ⇒ on every path from s to u there is a state that was added to Open but never expanded

Let w be any such state on a shortest path from s to u

- w opened ⇒ h_φ(w) must have been computed


Proof (ctd)

To show: in computing h_φ(w), φ(u) is expanded

- u necessarily expanded by BFS ⇒ δ(s, u) < δ(s, T)

- w on shortest path ⇒ δ(s, w) + δ(w, u) < δ(s, T)

- w never expanded by A* ⇒ δ(s, w) + h_φ(w) ≥ δ(s, T)

- combining the two inequalities: δ(w, u) < h_φ(w) = δ_φ(w, T)

- homomorphism: δ_φ(w, u) ≤ δ(w, u) ⇒ δ_φ(w, u) < δ_φ(w, T)

⇒ φ(u) necessarily expanded


Consistency

Theorem h_φ is consistent

h_φ consistent ⇔ ∀u, v ∈ S: h_φ(u) ≤ δ(u, v) + h_φ(v)

- δ_φ(u, T) is a shortest path distance ⇒ δ_φ(u, T) ≤ δ_φ(u, v) + δ_φ(v, T) for all u and v

- substituting h_φ: h_φ(u) ≤ δ_φ(u, v) + h_φ(v) for all u and v

- homomorphism: δ_φ(u, v) ≤ δ(u, v) ⇒ h_φ(u) ≤ δ(u, v) + h_φ(v) for all u and v


6 Pattern Databases

Name inspired by (n² − 1)-Puzzle, where pattern is selection of tiles

Pattern database: stores all patterns together with their shortest path distance, on the simplified board, to the goal pattern

Construction of the PDB: prior to the overall search, by a backward BFS that starts with the goal pattern and uses inverse abstract state transitions

Search in original space: the pattern of the current state is looked up, and the stored distance value serves as the estimator function
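The construction can be sketched for the Eight-Puzzle as follows. The sketch makes several assumptions that are not taken from the slides: a row-major 3 × 3 board, a goal with the blank in the top-left corner, and the pattern {blank, 1, 2, 3}, with all other tiles replaced by the don't care −1. Since sliding moves are their own inverses, the backward BFS uses the same move generator as a forward search.

```python
from collections import deque

GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)   # assumed goal layout, blank = 0
PATTERN = {0, 1, 2, 3}               # hypothetical pattern: blank plus tiles 1-3


def abstract(state):
    """Replace every non-pattern tile by the don't care -1."""
    return tuple(t if t in PATTERN else -1 for t in state)


def neighbors(state):
    """States reachable by sliding one tile into the blank."""
    b = state.index(0)
    row, col = divmod(b, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            s = list(state)
            j = 3 * r + c
            s[b], s[j] = s[j], s[b]
            yield tuple(s)


def build_pdb():
    """Backward BFS from the goal pattern over the abstract space."""
    start = abstract(GOAL)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def heuristic(pdb, state):
    """Look up the stored distance of the state's pattern."""
    return pdb[abstract(state)]


pdb = build_pdb()
print(len(pdb))                                      # 3024 abstract states
print(heuristic(pdb, (1, 0, 2, 3, 4, 5, 6, 7, 8)))   # 1
```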


Example (n² − 1)-Puzzle

fringe and the corner pattern (databases):

[Figure: two Fifteen-Puzzle boards. Fringe pattern: tiles 3, 7, 11, 12, 13, 14, 15. Corner pattern: tiles 8, 9, 10, 12, 13, 14, 15.]

Multiple pattern databases:


Maximizing Pattern Databases

Shortest path distance in pattern space ≤ shortest path distance in the original one

⇒ pattern database heuristics are admissible

Combined pattern databases:

- take maximum of heuristic values provided by different databases

- use result as admissible heuristic

⇒ optimal solutions for random instances to Rubik’s Cube


Disjoint Pattern Databases

For sliding-tile puzzles only one tile can move at a time ⇒ disjoint pattern databases count moves of pattern tiles only

General Problem: different pattern databases may count operators twice, since an operator can have a non-trivial image under more than one relaxation

Definition: Two pattern databases D_φ₁ and D_φ₂ are disjoint if for all non-trivial O′ ∈ O_φ₁ and O′′ ∈ O_φ₂ we have φ₁⁻¹(O′) ∩ φ₂⁻¹(O′′) = ∅

Finding partitions for pairwise disjoint pattern databases automatically is not trivial

⇒ assign cost 1 to each operator in only one relaxation

⇒ the sum of the retrieved pattern database values preserves admissibility, while being more accurate
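For the Eight-Puzzle, additive databases can be sketched by charging a move only when a tile of the pattern moves. The sketch assumes a row-major board with the blank at index 0 in the goal and the two hypothetical patterns {1, 2, 3, 4} and {5, 6, 7, 8}. The blank position is tracked during construction and projected out afterwards by taking the minimum; a 0-1 BFS handles the mixed move costs.

```python
from collections import deque

GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)   # assumed goal layout, blank = 0
ADJ = {i: [j for j in range(9)
           if abs(i // 3 - j // 3) + abs(i % 3 - j % 3) == 1] for i in range(9)}


def build_additive_pdb(tiles):
    """0-1 BFS from the goal; only moves of pattern tiles cost 1."""
    start = (GOAL.index(0), tuple(GOAL.index(t) for t in tiles))
    dist = {start: 0}
    dq = deque([start])
    while dq:
        blank, pos = dq.popleft()
        d = dist[(blank, pos)]
        for nb in ADJ[blank]:
            if nb in pos:                       # a pattern tile slides: cost 1
                i = pos.index(nb)
                nxt, cost = (nb, pos[:i] + (blank,) + pos[i + 1:]), 1
            else:                               # a don't-care tile slides: cost 0
                nxt, cost = (nb, pos), 0
            if d + cost < dist.get(nxt, float("inf")):
                dist[nxt] = d + cost
                if cost == 0:
                    dq.appendleft(nxt)
                else:
                    dq.append(nxt)
    pdb = {}
    for (_, pos), d in dist.items():            # project out the blank position
        pdb[pos] = min(d, pdb.get(pos, d))
    return pdb


PATTERNS = [(1, 2, 3, 4), (5, 6, 7, 8)]         # hypothetical disjoint partition
PDBS = [build_additive_pdb(tiles) for tiles in PATTERNS]


def additive_heuristic(state):
    """Sum of the disjoint database values for a concrete state."""
    return sum(pdb[tuple(state.index(t) for t in tiles)]
               for tiles, pdb in zip(PATTERNS, PDBS))


print(additive_heuristic(GOAL))                         # 0
print(additive_heuristic((1, 0, 2, 3, 4, 5, 6, 7, 8)))  # 1
```

Because every move is charged in at most one of the two databases, the sum never exceeds the true solution length.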


Automated Pattern Selection

Simplify the problem of finding a suitable partition to bin-packing

Task: distribute pattern position (tile) to bins in such a way that a minimal number of bins is used.

Size of the bins: determined by maximum size of abstract state space, which is to be approximated

Adding a position to the pattern: multiplication of domain size to the expected abstract state size

⇒ bin-packing based on multiplying the individual object sizes (for addition use logarithms)

Bin-packing is NP-complete but has several efficient approximations
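One such approximation is first-fit decreasing, applied to the logarithms of the domain sizes so that multiplication becomes addition. The sketch below is an illustration under assumed inputs: the variable names and sizes are made up, and a variable whose domain alone exceeds the limit simply receives its own bin.

```python
import math


def pack_patterns(domain_sizes, max_abstract_states):
    """First-fit decreasing on log-sizes; returns a list of bins (variable names)."""
    capacity = math.log(max_abstract_states)
    bins = []                                   # each bin: [used log-size, [names]]
    for name, size in sorted(domain_sizes.items(), key=lambda kv: -kv[1]):
        weight = math.log(size)
        for b in bins:
            if b[0] + weight <= capacity + 1e-9:   # tolerance for rounding
                b[0] += weight
                b[1].append(name)
                break
        else:
            bins.append([weight, [name]])       # no bin fits: open a new one
    return [names for _, names in bins]


sizes = {"a": 4, "b": 4, "c": 2, "d": 8}        # hypothetical domain sizes
print(pack_patterns(sizes, 16))                 # [['d', 'c'], ['a', 'b']]
```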


Korf’s Conjecture

- n: # states in entire problem space

- b: brute-force branching factor

- d: average optimal solution length for a random problem instance

- e: expected value of heuristic

- m: amount of memory used, in terms of abstract states stored

- t: running time, in # nodes generated by A* (without duplicate detection)

Estimated average optimal solution length d of a random instance (depth to which A* must search): d ≈ log_b n

Furthermore: e ≈ log_b m (abstract space) and t ≈ b^(d−e)

Substituting the values for d and e into this formula gives:

t ≈ b^(d−e) ≈ b^(log_b n − log_b m) = n/m
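The substitution can be checked numerically. The values of n, m and b below are arbitrary and only serve to confirm that the branching factor cancels out of the estimate; the function name is illustrative.

```python
import math


def estimated_generated_nodes(n, m, b):
    """t ~ b^(d - e) with d ~ log_b n and e ~ log_b m."""
    d = math.log(n, b)
    e = math.log(m, b)
    return b ** (d - e)


# Arbitrary illustration values: the estimate equals n / m for any b > 1.
for b in (2.0, 3.0, 10.0):
    print(b, estimated_generated_nodes(10**6, 10**3, b))
```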


Multiple Pattern Databases

Observation: maximizing over several smaller databases reduces the # of generated nodes more than using one large database

Example Eight-Puzzle: 20 pattern databases of size 252 perform fewer state expansions (318) than 1 pattern database of size 5,040 (2,160 state expansions)

1. Smaller pattern databases reduce the # of patterns with high h-values, but maximization over smaller pattern databases can make the number of patterns with low h-values significantly smaller than in the larger pattern database

2. Eliminating low h-values is more important for improving search performance than retaining high h-values


On-Demand Pattern Databases

Secondary A* for PDB construction; need for on-demand extension

[Figure: A* from s to t in the original space; backward A*, written (A*)⁻¹, between the abstract goal t′ and the abstract start s′ in the abstract space.]


Symmetrical and Dual Lookups in Pattern Databases

Symmetry PDB: exploit physical symmetries

(n² − 1)-Puzzle: symmetry about the main diagonal ⇒ the PDB for tiles 2, 3, 6, and 7 can be reused for the pattern 8, 9, 12, and 13

Dual PDB: symmetry between objects and locations

[Figure: dual lookup in the Fifteen-Puzzle, showing the original state, its inverse (dual) state, and the abstract pattern of each.]


7 Bounded Computation of Pattern Database

Theorem: Let U be an upper bound on δ(s, T), φ the abstraction function, and f the cost function in the backward traversal of the abstract space

⇒ the pattern database heuristic only needs to be computed if f(φ(u)) < U

Proof: f(φ(u)) ≤ δ_φ(s, T) ≤ δ(s, T) ≤ U

⇒ all φ(u) with f(φ(u)) > U cannot lead to any better solution with cost ≤ U

⇒ ignore u
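A bounded construction can be sketched as a backward BFS that stops generating entries once the bound is reached. The sketch assumes unit costs and keeps entries up to and including the bound, which is a conservative reading of the theorem; the predecessor function and the toy abstract space (integers on a line) are made up for illustration.

```python
from collections import deque


def bounded_pdb(abstract_goal, predecessors, bound):
    """Backward BFS that never generates entries deeper than `bound`."""
    dist = {abstract_goal: 0}
    queue = deque([abstract_goal])
    while queue:
        u = queue.popleft()
        if dist[u] == bound:
            continue                    # deeper entries are never needed
        for v in predecessors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def lookup(pdb, abstract_state, bound):
    """Missing entries lie beyond the bound, so bound + 1 is a safe lower estimate."""
    return pdb.get(abstract_state, bound + 1)


def line_predecessors(x):
    """Toy abstract space: integers 0..99 with unit steps (hypothetical)."""
    return [y for y in (x - 1, x + 1) if 0 <= y <= 99]


pdb = bounded_pdb(0, line_predecessors, bound=5)
print(len(pdb))             # 6 entries instead of 100
print(lookup(pdb, 50, 5))   # 6
```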


8 Planning Pattern Databases

An abstract planning problem P|_R = < S|_R, O|_R, I|_R, G|_R > of a propositional planning problem < S, O, I, G > wrt. a set of atoms R is defined by

1. S|_R = {S|_R = S ∩ R | S ∈ S},

2. G|_R = {G|_R | G ∈ G},

3. O|_R = {O|_R | O ∈ O}, with O|_R = (P|_R, A|_R, D|_R)

π_R: solution for the abstract planning problem P|_R

δ_R: optimal abstract plan length


Pattern Databases in AI Planning

Planning pattern database D_R (wrt. a set of propositions R and a propositional planning problem < S,O,I,G >):

collection of pairs (h, u) with u ∈ S|_R such that h = δ_R(u), i.e.,

D_R = {(δ_R(u), u) | u ∈ S|_R}

An optimal abstract plan π_R^opt for P|_R is never longer than an optimal concrete plan π^opt for P, i.e., δ_R(u|_R) ≤ δ(u) for all u ∈ S

Remark: Strict inequality δ_R(u|_R) < δ(u): some abstract operators are void, or we have alternative even shorter paths in abstract space
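The restriction to R and the resulting inequality can be illustrated on a tiny propositional problem. Everything in the sketch is made up for illustration: the atoms a to d, the three operators given as (precondition, add, delete) sets, and the choice R = {c, d}. Plan lengths are found by plain breadth-first search, which is only feasible for toy problems.

```python
from collections import deque


def plan_length(init, goal, ops):
    """Length of a shortest plan, found by forward BFS over sets of atoms."""
    start = frozenset(init)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if goal <= s:
            return dist[s]
        for pre, add, dele in ops:
            if pre <= s:                          # operator applicable
                t = (s - dele) | add
                if t not in dist:
                    dist[t] = dist[s] + 1
                    queue.append(t)
    return None


def restrict_ops(ops, r):
    """Intersect precondition, add and delete lists with the atom set R."""
    return [(pre & r, add & r, dele & r) for pre, add, dele in ops]


F = frozenset
OPS = [                                           # hypothetical operators
    (F("a"), F("b"), F("a")),                     # a -> b
    (F("b"), F("c"), F()),                        # b -> b, c
    (F("c"), F("d"), F()),                        # c -> c, d
]
INIT, GOAL_ATOMS, R = F("a"), F("d"), F("cd")

concrete = plan_length(INIT, GOAL_ATOMS, OPS)
abstract = plan_length(INIT & R, GOAL_ATOMS & R, restrict_ops(OPS, R))
print(concrete, abstract)  # 3 2
```

Here the first operator becomes void under the restriction, which is why the abstract plan is strictly shorter.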
