Automatic Creation of Search Heuristics
Stefan Edelkamp
1 Overview
- Automatic Creation of Heuristics
- Macro Problem Solving
- Hierarchical A* and Valtorta’s Theorem
- Pattern Databases
- Disjoint Pattern Databases
- Multiple, Bounded, Symmetrical, Dual, and Secondary Databases
2 History
History of the automated creation of admissible heuristics: Gaschnig (1979), Pearl (1984), Prieditis (1993), Guida and Somalvico (1979)
- Korf: Macro Problem Solver, Learning Real-Time A*
- Valtorta: A result on complexity of heuristic estimates for A*
- Holte et al.’s Hierarchical A*: Searching Abstraction Hierarchies Efficiently
- Recent work: Pattern/Abstraction Databases
The main difference is probably that Hierarchical A* and its precursors compute heuristic values on-line, whereas the construction of pattern databases computes them off-line
3 Macro Problem Solving
Macro Problem Solver: constructs a table that contains macros solving subproblems
Search: the solver looks up table entries to sequentially transform the current state into the goal
Eight-Puzzle example: operators are labeled by the direction in which the blank moves
Table Structure: the entry in row r and column c denotes the operator sequence that moves the tile in position r into position c
Invariance: after executing a macro, the tiles in positions 1 to r − 1 remain correctly placed
Macro Table for the Eight-Puzzle
0 1 2 3 4 5 6
0
1 DR
2 D LURD
3 DL URDL URDL LURD
4 L RULD RULD LURRD
LURD LULDR
5 UL DRUL RDLU RULD RDLU DLUR RULD RDLU
ULDR URDL
6 U DLUR DRU DLUU DRUL LURRD
ULDR ULD RDRU DLURU
LLDR LLDR
7 UR LDRU ULDDR LDRUL DLUR DRULDL DLUR ULDR ULURD URDRU DRUL URRDLU
LLDR
8 R ULDR LDRR LURDR LDRRUL DRUL LDRU
UULD ULLDR LDRU
RDLU
Running the Table
For tile 1 ≤ i ≤ 6, determine current position c and goal r and apply macro (c, r)
[Figure: step-by-step application of the macro table to an example Eight-Puzzle instance. Table entries used: c=0, r=5; c=1, r=2; c=2, r=7; c=3, r=3; c=4, r=4; c=5, r=7; c=6, r=7. Macro strings shown in the figure: DRUL DLUR DLUR RDLU, ULDDR ULURD, LURD, UL]
Worst-case solution length (sum of column maxima): 2 + 12 + 10 + 14 + 8 + 14 + 4 = 64
Average solution length: 12/9 + 52/8 + 40/7 + 58/6 + 22/5 + 38/4 + 8/3 = 39.78
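The two figures can be re-derived with a few lines of Python. This is only an arithmetic check of the numbers on the slide; reading the denominators 9, 8, ..., 3 as the number of rows available in each column is an assumption, and the variable names are illustrative.

```python
from fractions import Fraction

# Numbers copied from the slide. Interpreting the denominators as the
# number of rows per column is an assumption, not stated in the source.
column_maxima = [2, 12, 10, 14, 8, 14, 4]
column_totals = [12, 52, 40, 58, 22, 38, 8]
rows_per_column = [9, 8, 7, 6, 5, 4, 3]

# Worst case: every column contributes its longest macro.
worst_case = sum(column_maxima)

# Average: every column contributes its mean macro length.
average = sum(Fraction(t, r) for t, r in zip(column_totals, rows_per_column))

print(worst_case)
print(round(float(average), 2))
```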
Construction
. . . with Backward-DFS or Backward-BFS starting from the set of goals
- backward operators m^{-1} do not necessarily need to be valid
- given the vector representation of the current position p = (p_0, . . . , p_k), p is reached by applying m^{-1} to the goal position p' = (p'_0, . . . , p'_k)
- Column c(m) of m, which transforms p into p': length of the longest common prefix of p and p', i.e.,
c(m) = min{i ∈ {0, . . . , k − 1} | p_i ≠ p'_i}
- Row r(m) of macro m: position on which p_{c(m)} is located, which has to be moved to c(m) in the next macro application
Example
- m^{-1} = LDRU alters the goal position p' = (0,1,2,3,4,5,6,7,8) into p = (0,1,2,3,4,5,8,6,7)
- its inverse m is DLUR
⇒ c(m) = 6 and r(m) = 7, matching the last macro application in the table
- larger problems: BFS exhausts memory resources before the table entries are fixed
- larger tables require pattern database heuristic search
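The example values can be checked with a minimal Python sketch. It assumes that p[i] is the tile at position i and reads r(m) as the current position of the tile that belongs at c(m); this reading is chosen because it reproduces the values stated in the example. The function names are hypothetical.

```python
# Inverse of each blank move; a macro is inverted by reversing it and
# inverting every single move.
INVERSE = {"U": "D", "D": "U", "L": "R", "R": "L"}

def invert_macro(macro):
    return "".join(INVERSE[op] for op in reversed(macro))

def column(p, goal):
    # Length of the longest common prefix of p and the goal vector.
    return min(i for i in range(len(p)) if p[i] != goal[i])

def row(p, goal):
    # Assumed reading: position in p of the tile that belongs at c(m).
    c = column(p, goal)
    return p.index(goal[c])

goal = (0, 1, 2, 3, 4, 5, 6, 7, 8)
p = (0, 1, 2, 3, 4, 5, 8, 6, 7)

print(invert_macro("LDRU"), column(p, goal), row(p, goal))
```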
4 Patterns and Domain Abstraction
- pattern: refers to the vector representation v(u); each position i contains an assignment to variable v_i, i ∈ {1, . . . , k}
- specialized pattern: state with one or more constants replaced by don’t cares
- generalized pattern: each variable v_i with domain D_i is mapped to abstract domain A_i, i ∈ {1, . . . , k}
- start and goal pattern: making same substitutions in start and goal states
- domain abstraction: mapping stating which assignment to replace by which other
Two Examples in Eight-Puzzle
1. Tiles 1, 2, 7 replaced by don’t care x
⇒ φ_1(v) = v' with v'_i = v_i if v_i ∈ {0,3,4,5,6,8}, and v'_i = x otherwise
2. φ_2: also map tiles 3 and 4 to y, and tiles 6 and 8 to z
Granularity: vector indicating how many constants in the original domain are mapped to each constant in the abstract domain
⇒ gran(φ_2) = (3,2,2,1,1)
- 3 constants are mapped to x
- 2 are mapped to each of y and z
- constants 5 and 0 (the blank) remain unique
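Both abstractions and the granularity vector can be written down directly. The following Python sketch represents a domain abstraction as a dictionary over the constants 0 to 8; the helper names `apply_abstraction` and `granularity` and the sample state are illustrative assumptions.

```python
from collections import Counter

# phi_1: tiles 1, 2, 7 become the don't care "x"; everything else is kept.
phi1 = {c: c for c in range(9)}
phi1.update({1: "x", 2: "x", 7: "x"})

# phi_2: additionally maps tiles 3, 4 to "y" and tiles 6, 8 to "z".
phi2 = dict(phi1)
phi2.update({3: "y", 4: "y", 6: "z", 8: "z"})

def apply_abstraction(phi, state):
    # Replace every constant of the state vector by its abstract image.
    return tuple(phi[v] for v in state)

def granularity(phi):
    # How many original constants map to each abstract constant,
    # listed from the largest group to the smallest.
    return tuple(sorted(Counter(phi.values()).values(), reverse=True))

state = (1, 2, 3, 8, 0, 4, 7, 6, 5)  # hypothetical state vector
print(apply_abstraction(phi1, state))
print(granularity(phi2))
```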
5 Embeddings and Homomorphisms
• embedding: earliest and most commonly studied type of abstraction transformation
- informally, φ is an embedding transformation if it adds edges to S
- E.g., macro-operators, dropped preconditions
• homomorphism: other main type of abstraction transformation
- informally, homomorphism φ groups together several states in S to create single abstract state
- E.g., drop predicate entirely from state space description (Knoblock 1994)
Hierarchical A*
Abstraction works by replacing one state space by another that is easier to search
Hierarchical A* is a version of A* that computes the distance from an abstract state to the abstract goal on the fly, for each node that is expanded
- different from earlier approaches (which explore the abstract space from scratch), Hierarchical A* uses caching to avoid repeated expansion of states in the abstract space
- restricted to state space abstractions that are homomorphisms
Valtorta’s Theorem
Theorem If state space S is embedded in S' and h is computed by blind BFS in S', then A* using h will expand every state that is expanded by BFS
⇒ if heuristic estimates are re-computed from scratch for each state, abstraction cannot possibly speed up search
- Absolver II: first system to break this barrier
- Hierarchical A*: a subsequent one
Example
N × N grid, abstracted by ignoring the 2nd coordinate
- goal state is (N, 1)
- initial state is (1,1)
Valtorta’s Theorem ⇒ A* expands Ω(N²) nodes
Main Observation: the search for h(s) also generates the value of h(u), ∀u ∈ S'
- abstraction yields a perfect heuristic on the solution path
- Hierarchical A* will expand optimum of O(N) nodes
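The grid example can be reproduced with a small experiment. The sketch below is not Hierarchical A* itself: it assumes a 4-connected grid, computes all abstract distances once by a backward BFS in the abstract space (standing in for cached abstract searches), and compares the number of A* expansions with that of blind search. All function names are illustrative.

```python
import heapq
from collections import deque

def abstract_distances(n):
    # Backward BFS in the abstract space {1..n} (first coordinate only).
    dist = {n: 0}
    queue = deque([n])
    while queue:
        x = queue.popleft()
        for nx in (x - 1, x + 1):
            if 1 <= nx <= n and nx not in dist:
                dist[nx] = dist[x] + 1
                queue.append(nx)
    return dist

def astar_expansions(n, h):
    # A* on the n x n grid from (1, 1) to (n, 1); returns # expanded states.
    start, goal = (1, 1), (n, 1)
    g = {start: 0}
    open_heap = [(h(start), 0, start)]
    closed = set()
    while open_heap:
        _, gu, u = heapq.heappop(open_heap)
        if u in closed:
            continue
        if u == goal:
            return len(closed)
        closed.add(u)
        x, y = u
        for v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 1 <= v[0] <= n and 1 <= v[1] <= n
            if inside and gu + 1 < g.get(v, float("inf")):
                g[v] = gu + 1
                heapq.heappush(open_heap, (g[v] + h(v), g[v], v))
    return len(closed)

N = 10
cache = abstract_distances(N)
with_abstraction = astar_expansions(N, lambda s: cache[s[0]])
blind = astar_expansions(N, lambda s: 0)
print(with_abstraction, blind)
```

With the cached abstract distances only the states on the row y = 1 are expanded, while blind search expands a number of states that grows quadratically in N.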
Proof of Valtorta’s Theorem
When A* terminates, every state u is closed, open, or unvisited
u closed ⇒ it has been expanded
u open ⇒ h_φ(u) must have been computed
- h_φ(u) is computed by a search starting at φ(u)
- φ(u) ∉ φ(T) ⇒ the first step is to expand φ(u)
- φ(u) ∈ φ(T) ⇒ h_φ(u) = 0, and u itself is necessarily expanded
u unvisited ⇒ on every path from s to u there is a state that was added to Open but never expanded
Let w be any such state on a shortest path from s to u
- w opened ⇒ h_φ(w) must have been computed
Proof (ctd)
To show: in computing h_φ(w), φ(u) is expanded
- u necessarily expanded by BFS ⇒ δ(s, u) < δ(s, T)
- w on a shortest path from s to u ⇒ δ(s, w) + δ(w, u) < δ(s, T)
- w never expanded by A* ⇒ δ(s, w) + h_φ(w) ≥ δ(s, T)
- combining the two inequalities: δ(w, u) < h_φ(w) = δ_φ(w, T)
- homomorphism: δ_φ(w, u) ≤ δ(w, u) ⇒ δ_φ(w, u) < δ_φ(w, T)
⇒ φ(u) necessarily expanded
Consistency
Theorem h_φ is consistent
h_φ consistent ⇔ ∀u, v ∈ S: h_φ(u) ≤ δ(u, v) + h_φ(v)
- δ_φ(u, T) is a shortest path distance ⇒ δ_φ(u, T) ≤ δ_φ(u, v) + δ_φ(v, T) for all u and v
- substituting h_φ: h_φ(u) ≤ δ_φ(u, v) + h_φ(v) for all u and v
- homomorphism: δ_φ(u, v) ≤ δ(u, v) ⇒ h_φ(u) ≤ δ(u, v) + h_φ(v) for all u and v
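The inequality can be checked exhaustively on a small instance. The sketch below assumes a 4 × 4 grid with 4-connected moves and the abstraction φ(x, y) = x; the abstract graph is built from the images of the original edges. It is a brute-force sanity check on one example, not a proof, and all names are illustrative.

```python
from collections import deque
from itertools import product

def bfs(start, succ):
    # Shortest path distances from start under unit edge costs.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in succ(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

N = 4
states = list(product(range(1, N + 1), repeat=2))

def succ(u):
    x, y = u
    cand = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
    return [(a, b) for a, b in cand if 1 <= a <= N and 1 <= b <= N]

def phi(u):
    return u[0]  # homomorphism: ignore the second coordinate

def abstract_succ(a):
    # Image of every original edge that leaves a state mapped to a.
    return {phi(v) for u in states if phi(u) == a for v in succ(u)}

goal = (N, 1)
# The grid is undirected, so a forward BFS from the abstract goal
# yields the same distances as a backward one.
abstract_dist = bfs(phi(goal), abstract_succ)
h = {u: abstract_dist[phi(u)] for u in states}
delta = {u: bfs(u, succ) for u in states}

consistent = all(h[u] <= delta[u][v] + h[v] for u in states for v in states)
print(consistent)
```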
6 Pattern Databases
Name inspired by the (n² − 1)-Puzzle, where a pattern is a selection of tiles
Pattern database: stores all patterns together with their shortest path distance on the simplified board to the goal pattern
Construction of the PDB: prior to the overall search, by a backward BFS that starts with the goal pattern and uses inverse abstract state transitions
Search in the original space: the pattern of the current state is looked up, and the stored distance value serves as the estimator function
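The construction and lookup steps can be sketched for the Eight-Puzzle as follows. The sketch assumes a row-major board with the blank (tile 0) at position 0 in the goal, keeps the blank in every pattern, and counts all moves; `build_pdb` and `lookup` are hypothetical names. Since sliding-tile moves are invertible, a BFS from the goal pattern coincides with the backward BFS.

```python
from collections import deque

N = 3  # the Eight-Puzzle board is N x N; positions 0..8 in row-major order

def neighbors(pos):
    r, c = divmod(pos, N)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < N and 0 <= nc < N:
            yield nr * N + nc

def build_pdb(pattern_tiles, goal):
    # Abstract state: positions of the blank and of the pattern tiles;
    # all other tiles are don't cares.
    tiles = (0,) + tuple(pattern_tiles)
    start = tuple(goal.index(t) for t in tiles)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        blank = state[0]
        for nb in neighbors(blank):
            new = list(state)
            new[0] = nb
            if nb in state[1:]:
                # The blank swaps places with a pattern tile.
                new[state.index(nb, 1)] = blank
            new = tuple(new)
            if new not in dist:
                dist[new] = dist[state] + 1
                queue.append(new)
    return tiles, dist

def lookup(pdb, state):
    # state[pos] is the tile at position pos in the original space.
    tiles, dist = pdb
    return dist[tuple(state.index(t) for t in tiles)]

goal = (0, 1, 2, 3, 4, 5, 6, 7, 8)
pdb = build_pdb((1, 2, 3), goal)
print(len(pdb[1]), lookup(pdb, (1, 0, 2, 3, 4, 5, 6, 7, 8)))
```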
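Example ((n² − 1)-Puzzle)
Fringe and corner patterns (databases):
[Figure: Fifteen-Puzzle boards showing the fringe pattern (tiles 3, 7, 11, 12, 13, 14, 15) and the corner pattern (tiles 8, 9, 10, 12, 13, 14, 15), an example of multiple pattern databases]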
Maximizing Pattern Databases
Shortest path distance in pattern space ≤ shortest path distance in the original one
⇒ pattern database heuristics are admissible
Combined pattern databases:
- take the maximum of the heuristic values provided by different databases
- use the result as an admissible heuristic
⇒ optimal solutions for random instances of Rubik’s Cube
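Maximization can be illustrated on a toy problem that is not from the slides: states are pairs (x, y) with 0 ≤ x, y < 5, each move changes one coordinate by 1, and the goal is (0, 0). Two pattern databases are assumed, one per coordinate, each built by backward BFS in its abstract space; all names are illustrative.

```python
from collections import deque

def backward_bfs(goal, neighbors):
    # Distances to the abstract goal under unit-cost, invertible moves.
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        a = queue.popleft()
        for b in neighbors(a):
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist

SIZE = 5

def line_neighbors(a):
    return [b for b in (a - 1, a + 1) if 0 <= b < SIZE]

# One database per abstraction: projection on x and projection on y.
pdb_x = backward_bfs(0, line_neighbors)
pdb_y = backward_bfs(0, line_neighbors)

def h_max(state):
    x, y = state
    return max(pdb_x[x], pdb_y[y])

def true_distance(state):
    # In this toy problem the exact goal distance is x + y.
    return state[0] + state[1]

print(h_max((3, 1)), true_distance((3, 1)))
```

The maximum never exceeds the true distance here, so it stays admissible while dominating each individual database.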
Disjoint Pattern Databases
For sliding-tile puzzles, only one tile can move at a time
⇒ disjoint pattern databases count moves of pattern tiles only
General Problem: different pattern databases may count operators twice, since an operator can have a non-trivial image under more than one relaxation
Assumption: Two pattern databases D_φ1 and D_φ2 are disjoint if for all non-trivial O' ∈ O_φ1, O'' ∈ O_φ2 we have φ_1^{-1}(O') ∩ φ_2^{-1}(O'') = ∅
Finding partitions for pairwise disjoint pattern databases automatically is not trivial
⇒ assign cost 1 to each operator in only one relaxation
⇒ the sum of the retrieved pattern database values preserves admissibility, while being more accurate
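For the Eight-Puzzle, counting only pattern-tile moves can be sketched with a 0-1 BFS: moving a pattern tile costs 1, moving a don't-care tile costs 0. The sketch assumes a row-major board with the blank at position 0 in the goal and keeps the blank position in the abstract state; `build_disjoint_pdb` and `h_additive` are hypothetical names, and the partition into tiles 1 to 4 and 5 to 8 is just one possible choice.

```python
from collections import deque

N = 3  # Eight-Puzzle; positions 0..8 in row-major order

def neighbors(pos):
    r, c = divmod(pos, N)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < N and 0 <= nc < N:
            yield nr * N + nc

def build_disjoint_pdb(tiles, goal):
    # 0-1 BFS from the goal pattern; moves are invertible with equal cost,
    # so this equals the backward search.
    start = (goal.index(0),) + tuple(goal.index(t) for t in tiles)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        d, blank = dist[state], state[0]
        for nb in neighbors(blank):
            new = list(state)
            new[0] = nb
            cost = 0
            if nb in state[1:]:
                new[state.index(nb, 1)] = blank
                cost = 1  # a pattern tile moved
            new = tuple(new)
            if d + cost < dist.get(new, float("inf")):
                dist[new] = d + cost
                # Zero-cost successors go to the front of the deque.
                (queue.appendleft if cost == 0 else queue.append)(new)
    return dist

goal = (0, 1, 2, 3, 4, 5, 6, 7, 8)
PATTERNS = ((1, 2, 3, 4), (5, 6, 7, 8))
PDBS = [build_disjoint_pdb(tiles, goal) for tiles in PATTERNS]

def h_additive(state):
    blank = state.index(0)
    return sum(pdb[(blank,) + tuple(state.index(t) for t in tiles)]
               for tiles, pdb in zip(PATTERNS, PDBS))

print(h_additive((1, 2, 0, 3, 4, 5, 6, 7, 8)))
```

Each move is charged in at most one database, so the two lookups can be added without overestimating.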
Automated Pattern Selection
Simplify the problem of finding a suitable partition to bin-packing
Task: distribute the pattern positions (tiles) to bins in such a way that a minimal number of bins is used
Size of the bins: determined by the maximum size of the abstract state space, which is to be approximated
Adding a position to the pattern: multiplies the expected abstract state space size by the domain size
⇒ bin-packing based on multiplying the individual object sizes (take logarithms to obtain the usual additive form)
Bin-packing is NP-complete but has several efficient approximations
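One of the standard approximations is first-fit decreasing. The sketch below applies it with multiplicative sizes, using exact integer products instead of logarithms to avoid rounding issues. The slides do not say which approximation is used, so the choice of first-fit decreasing, the function name, and the variable names and domain sizes in the example are all assumptions.

```python
def first_fit_decreasing(domain_sizes, max_states):
    # Each bin holds [product of domain sizes, list of variables].
    bins = []
    # Consider the variables with the largest domains first.
    for var, size in sorted(domain_sizes.items(), key=lambda kv: -kv[1]):
        if size > max_states:
            raise ValueError(f"{var} alone exceeds the size limit")
        for b in bins:
            # A variable fits if the abstract state space stays in bounds.
            if b[0] * size <= max_states:
                b[0] *= size
                b[1].append(var)
                break
        else:
            bins.append([size, [var]])
    return [variables for _, variables in bins]

# Hypothetical variables with their domain sizes.
sizes = {"a": 16, "b": 16, "c": 8, "d": 4, "e": 4, "f": 2}
patterns = first_fit_decreasing(sizes, max_states=64)
print(patterns)
```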
Korf’s Conjecture
- n: # states in entire problem space
- b: brute-force branching factor
- d: average optimal solution length for a random problem instance
- e: expected value of heuristic
- m: amount of memory used, in terms of abstract states stored
- t: running time, in # nodes generated by A* (without duplicate detection)
Estimated average optimal solution length d of a random instance (depth to which A* must search): d ≈ log_b n
Furthermore: e ≈ log_b m (abstract space) and t ≈ b^{d−e}
Substituting the values for d and e into this formula gives:
t ≈ b^{d−e} ≈ b^{log_b n − log_b m} = n/m
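The substitution can be checked numerically. The values of n, m, and b below are hypothetical and chosen only to make the arithmetic visible; the function name is illustrative.

```python
import math

def estimated_nodes(n, m, b):
    d = math.log(n, b)  # estimated search depth, d ≈ log_b n
    e = math.log(m, b)  # expected heuristic value, e ≈ log_b m
    return b ** (d - e)

# Hypothetical problem: 10^9 states, 10^6 stored abstract states.
n, m, b = 10**9, 10**6, 3.0
print(estimated_nodes(n, m, b), n / m)
```

The branching factor cancels out, so the estimate depends only on the ratio n/m.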
Multiple Pattern Databases
Observation: maximizing over several smaller databases reduces # nodes generated more than one large database
Example Eight-Puzzle: 20 pattern databases of size 252 perform fewer state expansions (318) than 1 pattern database of size 5,040 (2,160 state expansions)
1. Smaller pattern databases reduce # patterns with high h-values, but maximization over smaller pattern databases can make the number of patterns with low h-values significantly smaller than in the larger pattern database
2. Eliminating low h-values is more important for improving search performance than retaining high h-values
On-Demand Pattern Databases
Secondary A* PDB construction: need for on-demand extension
[Figure: A* search from s to t in the original space; in the abstract space, a backward search (A*)^{-1} from t' toward s' constructs the pattern database]
Symmetrical and Dual Lookups in Pattern Databases
Symmetry PDB: exploit physical symmetries
(n² − 1)-Puzzle: symmetry about the main diagonal ⇒ PDB for tiles 2, 3, 6, and 7 is reused for the pattern 8, 9, 12, and 13
Dual PDB: symmetry between objects and locations
[Figure: dual lookup in the Fifteen-Puzzle, with panels labeled original, inverse, abstract, and dual abstract]
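The symmetry lookup about the main diagonal can be sketched as a reflection of positions and tiles. The sketch assumes a row-major Fifteen-Puzzle board in which tile t belongs at position t and the blank (0) at position 0; `reflect` and `mirror_state` are hypothetical names. A database built for tiles 2, 3, 6, 7 would then be queried with the mirrored state to obtain a value for tiles 8, 9, 12, 13.

```python
N = 4  # Fifteen-Puzzle; positions 0..15 in row-major order

def reflect(pos):
    # Reflection about the main diagonal swaps row and column.
    r, c = divmod(pos, N)
    return c * N + r

def mirror_state(state):
    # state[pos] is the tile at position pos. Because tile t belongs at
    # position t (an assumption), tiles are reflected like positions.
    mirrored = [None] * (N * N)
    for pos, tile in enumerate(state):
        mirrored[reflect(pos)] = reflect(tile)
    return tuple(mirrored)

pattern = (2, 3, 6, 7)
mirrored_pattern = sorted(reflect(t) for t in pattern)
print(mirrored_pattern)

goal = tuple(range(N * N))
print(mirror_state(goal) == goal)
```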
7 Bounded Computation of Pattern Database
Theorem
U: upper bound on δ(s, T)
φ: abstraction function
f: cost function in the backward traversal of the abstract space
⇒ pattern database heuristic only needs to be computed if f(φ(u)) < U
Proof: f(φ(u)) ≤ δ_φ(s, T) ≤ δ(s, T) ≤ U
⇒ all φ(u) with f(φ(u)) > U cannot lead to any better solution with cost ≤ U
⇒ ignore u
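A bounded construction can be sketched as a backward BFS that stops generating abstract states once their distance would reach U. The sketch assumes unit costs, so that f is the BFS depth, and U ≥ 1; returning U for states missing from the database is an assumption that keeps the estimate a lower bound. The names and the line-shaped abstract space are illustrative.

```python
from collections import deque

def bounded_pdb(abstract_goals, predecessors, upper_bound):
    # Backward BFS that stores only abstract states with distance < U.
    dist = {g: 0 for g in abstract_goals}
    queue = deque(abstract_goals)
    while queue:
        a = queue.popleft()
        if dist[a] + 1 >= upper_bound:
            continue  # predecessors would have distance >= U: ignore them
        for p in predecessors(a):
            if p not in dist:
                dist[p] = dist[a] + 1
                queue.append(p)
    return dist

def lookup(dist, a, upper_bound):
    # States outside the database have abstract distance >= U.
    return dist.get(a, upper_bound)

# Hypothetical abstract space: a line 0..9 with goal 0.
def line_predecessors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y <= 9]

U = 4
pdb = bounded_pdb([0], line_predecessors, U)
print(sorted(pdb), lookup(pdb, 7, U))
```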
8 Planning Pattern Databases
An abstract planning problem 𝒫|_R = < 𝒮|_R, 𝒪|_R, ℐ|_R, 𝒢|_R > of a propositional planning problem < 𝒮, 𝒪, ℐ, 𝒢 > wrt. a set of atoms R is defined by
1. 𝒮|_R = {S|_R = S ∩ R | S ∈ 𝒮},
2. 𝒢|_R = {G|_R | G ∈ 𝒢},
3. 𝒪|_R = {O|_R | O ∈ 𝒪}, with O|_R = (P|_R, A|_R, D|_R)
π_R: solutions for the abstract planning problem 𝒫|_R
δ_R: optimal abstract plan length
Pattern Databases in AI Planning
Planning pattern database D_R (wrt. a set of propositions R and a propositional planning problem < 𝒮, 𝒪, ℐ, 𝒢 >):
collection of pairs (h, u) with u ∈ 𝒮|_R such that h = δ_R(u), i.e., the set
D_R = {(δ_R(u), u) | u ∈ 𝒮|_R}
An optimal abstract plan π_R^opt for 𝒫|_R is never longer than an optimal concrete plan π^opt for 𝒫, i.e., δ_R(u|_R) ≤ δ(u) for all u ∈ 𝒮
Remark: strict inequality δ_R(u|_R) < δ(u) arises if some abstract operators are void, or if there are alternative, even shorter paths in abstract space
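The effect of restricting a planning problem to a set of atoms R can be seen on a toy problem that is not from the slides. Operators are triples (precondition, add, delete) of atom sets; the atoms, the operators, and the helper names `restrict` and `plan_length` are hypothetical. Dropping the atom has_key makes one abstract operator void and shortens the abstract plan.

```python
from collections import deque

def restrict(op, atoms):
    # O|_R = (P ∩ R, A ∩ R, D ∩ R)
    pre, add, delete = op
    return (pre & atoms, add & atoms, delete & atoms)

def plan_length(init, goal, ops):
    # Forward BFS over sets of atoms; returns the optimal plan length.
    start = frozenset(init)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if goal <= s:
            return dist[s]
        for pre, add, delete in ops:
            if pre <= s:
                t = (s - delete) | add
                if t not in dist:
                    dist[t] = dist[s] + 1
                    queue.append(t)
    return None

f = frozenset
ops = [
    (f({"at_a"}), f({"has_key"}), f()),                # pick up the key
    (f({"at_a"}), f({"at_b"}), f({"at_a"})),           # move from a to b
    (f({"at_b", "has_key"}), f({"door_open"}), f()),   # open the door
]
init, goal = f({"at_a"}), f({"door_open"})

R = f({"at_a", "at_b", "door_open"})  # abstraction ignores has_key
abstract_ops = [restrict(op, R) for op in ops]

concrete = plan_length(init, goal, ops)
abstract = plan_length(init & R, goal & R, abstract_ops)
print(concrete, abstract)
```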