
4.2 Hierarchical Decoding

4.2.1 The CYK+ parsing algorithm

[Chiang, 2007; Hoang, 2011] formalize the CYK+ parsing algorithm as a deductive proof system and define a decoding procedure based on this system. We begin by introducing deductive proof systems. We then present the inference rules of the chart parser and, finally, the search procedure that applies them to an input sentence. Throughout our presentation of the CYK+ parser for hierarchical systems, all non-terminal symbols are taken from the alphabet N_s = N_t = {X, S}, as defined in Section 4.1.1.

Deductive Proof Systems

A deductive proof system consists of (i) a set of weighted items, which we write X_i : w_i, and (ii) a set of inference rules of the form:

$$ \frac{A_1 : w_1 \quad \cdots \quad A_k : w_k}{B : w} \; \varphi $$


where A_i : w_i and B : w are weighted items and φ is a side condition. Inference rules are used to prove new items from items that have already been proved. The item below the line, B : w, is called the consequent of the rule. The items above the line, A_i : w_i, are called its antecedents. The inference process starts with a set of items, called the axioms, which are assumed to be proved (true) before inference starts. From these axioms, the inference rules are applied repeatedly to prove new items. The process stops once a special item, called the goal, has been proved, or once no further item can be proved.
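The inference process described above can be illustrated with a small agenda-driven loop. The Python sketch below makes several assumptions of its own and is not the procedure of [Chiang, 2007] or [Hoang, 2011]. Items are arbitrary hashable values, and weights are ignored. Each inference rule is modelled as a function that receives a newly proved item together with the set of items proved so far and yields the consequents it licenses. The `chain` rule is a hypothetical example used only for illustration.

```python
from typing import Callable, Hashable, Iterable, List, Set

# Assumed representation of an inference rule: given a newly proved item and all
# items proved so far, yield every consequent whose antecedents are now proved.
Rule = Callable[[Hashable, Set[Hashable]], Iterable[Hashable]]


def deduce(axioms: Iterable[Hashable], rules: List[Rule], goal: Hashable) -> bool:
    """Return True once `goal` is proved, False if no new item can be proved."""
    proved: Set[Hashable] = set()
    agenda = list(axioms)  # axioms are assumed to be proved before inference starts
    while agenda:
        item = agenda.pop()
        if item in proved:
            continue
        proved.add(item)
        if item == goal:  # stop as soon as the goal item has been proved
            return True
        for rule in rules:
            for consequent in rule(item, proved):
                if consequent not in proved:
                    agenda.append(consequent)
    return False


# Hypothetical example rule: from reach(a, b) and reach(b, c) infer reach(a, c).
def chain(item, proved):
    _, a, b = item
    for _, c, d in list(proved):
        if c == b:  # the new item is the left antecedent
            yield ("reach", a, d)
        if d == a:  # the new item is the right antecedent
            yield ("reach", c, b)


edges = [("reach", 1, 2), ("reach", 2, 3), ("reach", 3, 4)]
print(deduce(edges, [chain], ("reach", 1, 4)))  # True
print(deduce(edges, [chain], ("reach", 4, 1)))  # False
```

The loop ends either when the goal has been proved or when the agenda is exhausted. Because the set of provable items is finite here, the loop always terminates.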

The CYK+ Chart Parser

Instead of using grammar rules in Chomsky normal form, the CYK+ chart parser uses dotted rules of the form:

$$ X \rightarrow \alpha_1 \bullet \alpha_2 $$

where α₁ and α₂ are strings of terminal and non-terminal symbols. As in the CYK algorithm, dotted rules are applied to the input string bottom-up. In addition, a dotted rule records which symbols of its right-hand side (rhs) have already been processed: the dot moves from left to right, one symbol at a time. Once the dot has consumed all symbols of the rhs, the rule is called passive. As long as symbols remain to the right of the dot, it is called active. Passive rules can be used to parse non-terminal symbols in the same way as grammar rules are used in the CYK parsing algorithm.
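A dotted rule can be represented by its right-hand side together with the position of the dot. The Python sketch below illustrates this under our own assumptions. The class name `DottedRule` and its methods are hypothetical. Only the source-side rhs is stored, with α₁ corresponding to `rhs[:dot]` and α₂ to `rhs[dot:]`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DottedRule:
    """Dotted rule X -> alpha1 . alpha2, with alpha1 = rhs[:dot], alpha2 = rhs[dot:]."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int = 0

    def is_passive(self) -> bool:
        # Passive once the dot has consumed every symbol of the rhs.
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        # First symbol of alpha2, or None for a passive rule.
        return None if self.is_passive() else self.rhs[self.dot]

    def advance(self) -> "DottedRule":
        # Move the dot one symbol to the right.
        if self.is_passive():
            raise ValueError("cannot advance a passive rule")
        return DottedRule(self.lhs, self.rhs, self.dot + 1)

    def __str__(self) -> str:
        symbols = list(self.rhs[: self.dot]) + ["•"] + list(self.rhs[self.dot:])
        return f"{self.lhs} -> " + " ".join(symbols)


rule = DottedRule("X", ("X", "de", "X"))
print(rule)              # X -> • X de X
print(rule.advance())    # X -> X • de X
print(rule.advance().advance().advance().is_passive())  # True
```

Making the class immutable (`frozen=True`) means that advancing the dot creates a new rule. The original rule therefore remains available for other parses.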

Written as a deductive proof system, the CYK+ chart parsing algorithm is composed of items of the form [X → α₁ • α₂, i, j, w], where X → α₁ • α₂ is a dotted rule in which the string α₁ has already been recognized. The item as a whole states that the string α₁, spanning positions i to j of the input, has been processed inside a subtree rooted by X. It also states that the string α₂ remains to be recognized before this subtree is complete. The last element of the item is the weight w of the subtree. We call such items parse items and distinguish between active and passive parse items: the former contain active dotted rules, the latter passive ones. The axioms of the parser are obtained from the weighted grammar rules X →^w α ∈ P, with one axiom created for each rule as in Equation 4.5. Because axioms model rules that have not yet been applied, we omit their span information in the remainder of the thesis.

$$ [X \rightarrow \bullet\,\alpha,\ i,\ i,\ w] \qquad \left(X \xrightarrow{w} \alpha \in P\right) \tag{4.5} $$

The axiom states that the string α, i.e. the rhs of the grammar rule X →^w α, remains to be recognized before the subtree rooted by X is complete. The inference rules of the parser fall into two categories:

1. Lexical rules process terminal symbols and are of the form:

$$ \frac{[X \rightarrow \alpha_1 \bullet t_{j+1}\,\alpha_2,\ i,\ j,\ w]}{[X \rightarrow \alpha_1\, t_{j+1} \bullet \alpha_2,\ i,\ j+1,\ w]} \tag{4.6} $$

where t_{j+1} is the terminal spanning from j to j + 1. The antecedent of the rule is a parse item in which the terminal symbol t_{j+1} has not yet been recognized. The consequent of the rule is an item in which t_{j+1} has been processed and whose span has been extended to j + 1.

2. Non-lexical rules process non-terminal symbols and have the form:

$$ \frac{[X \rightarrow \alpha_1 \bullet X \alpha_2,\ i,\ j,\ w_1] \quad [X \rightarrow \beta \bullet,\ j,\ k,\ w_2]}{[X \rightarrow \alpha_1 X \bullet \alpha_2,\ i,\ k,\ w_1 \ast w_2]} \tag{4.7} $$

where X is a non-terminal spanning from j to k. The first antecedent of the rule is a parse item in which the non-terminal X remains to be processed. The second antecedent is a parse item whose rule X → β • is passive, meaning that its right-hand side has already been consumed. The consequent is a parse item in which the non-terminal X has been processed via the passive rule X → β •, so that its span is extended from j to k.
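The axioms and the two inference rules can be sketched as functions on parse items. The following Python code is a minimal illustration under our own assumptions and is not the thesis's implementation. An item [X → α₁ • α₂, i, j, w] is stored as a hypothetical `Item` record with α₁ = `rhs[:dot]`. Positions are 0-based, so the terminal t_{j+1} is `f[j]`. Only X and S are treated as non-terminals, and weights are combined by multiplication as in Equation 4.7. Each rule function returns the consequent, or `None` when the rule does not apply.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence, Tuple

NONTERMINALS = {"X", "S"}


@dataclass(frozen=True)
class Item:
    """Parse item [lhs -> rhs[:dot] . rhs[dot:], i, j, w]."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    i: int
    j: int
    w: float

    def next_symbol(self) -> Optional[str]:
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_passive(self) -> bool:
        return self.dot == len(self.rhs)


def axiom(lhs: str, rhs: Tuple[str, ...], w: float, i: int) -> Item:
    """Axiom (4.5) for the grammar rule lhs -w-> rhs, anchored at position i."""
    return Item(lhs, rhs, 0, i, i, w)


def lexical(item: Item, f: Sequence[str]) -> Optional[Item]:
    """Rule (4.6): move the dot over the terminal t_{j+1} = f[j]."""
    t = item.next_symbol()
    if t is None or t in NONTERMINALS or item.j >= len(f) or f[item.j] != t:
        return None
    return replace(item, dot=item.dot + 1, j=item.j + 1)


def non_lexical(active: Item, passive: Item) -> Optional[Item]:
    """Rule (4.7): consume the non-terminal after the dot with a passive item over [j, k]."""
    if (not passive.is_passive()
            or active.next_symbol() != passive.lhs
            or active.j != passive.i):
        return None
    return replace(active, dot=active.dot + 1, j=passive.j, w=active.w * passive.w)


f = ["le", "chat", "noir"]
step1 = lexical(axiom("X", ("le", "X"), 0.5, 0), f)  # [X -> le . X, 0, 1, 0.5]
sub = Item("X", ("chat", "noir"), 2, 1, 3, 0.4)       # [X -> chat noir ., 1, 3, 0.4]
print(non_lexical(step1, sub))                        # passive item over [0, 3], w = 0.2
```

The lexical rule leaves the weight unchanged, whereas the non-lexical rule multiplies the weights of its two antecedents. This mirrors Equations 4.6 and 4.7.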


CYK+ Search Procedure

The CYK+ parsing process starts with the axioms defined above and proves new items using the inference rules. The search procedure is given in Algorithm 1. It uses six data structures. The first two, denoted by a[·], are lists of the axioms constructed from the grammar rules, one list per non-terminal of the grammar⁷. Two further structures, denoted by d[·], hold the active parse items. The remaining two, denoted by h[·], hold the passive parse items. In both cases there is one structure per non-terminal, indexed by span. The item lists are processed from the smallest to the largest span of an input sentence f of length |f|.

The parsing algorithm traverses the item lists cell by cell. In each cell, the parser tries to prove all items belonging to that cell. Each time a new item is proved, it is added to the cell together with a tuple of back pointers to the items from which it has been inferred. Proved items that are still active are added to the list of active items (line 7). Proved items that are passive are added to the list of passive items (line 10). Once all items rooted by X have been handled, the parser processes the items rooted by S. When several equivalent items populate a cell, only the one with the best weight is kept, together with its back pointers. The parsing procedure ends when the complete input sentence has been processed. If the goal [S → α •, 0, |f|, w] has been proved, the sentence f has been parsed successfully. The best derivation is then obtained by starting from the goal item and following the back pointers stored in each item.
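Keeping only the best of several equivalent items, and recovering the best derivation from back pointers, can be sketched as follows. The Python code illustrates the idea under our own assumptions. The chart is a hypothetical dictionary that maps an item key (dotted rule and span, without the weight) to a pair of best weight and back pointers. Higher weights are assumed to be better. The string keys in the usage lines, such as `"S[0,2]"`, are made-up labels.

```python
from typing import Dict, Hashable, Tuple

# Hypothetical chart layout: item key -> (best weight, back pointers to antecedent keys).
Chart = Dict[Hashable, Tuple[float, Tuple[Hashable, ...]]]


def add_item(chart: Chart, key: Hashable, w: float,
             backptrs: Tuple[Hashable, ...]) -> bool:
    """Store an item unless an equivalent item with a better weight is already there."""
    if key not in chart or w > chart[key][0]:
        chart[key] = (w, backptrs)
        return True
    return False


def best_derivation(chart: Chart, key: Hashable):
    """Follow the back pointers from `key` and return a nested (key, children) tree."""
    _, backptrs = chart[key]
    return (key, [best_derivation(chart, b) for b in backptrs])


chart: Chart = {}
add_item(chart, "X[0,1]", 0.9, ())
add_item(chart, "X[1,2]", 0.6, ())
add_item(chart, "S[0,1]", 0.9, ("X[0,1]",))
add_item(chart, "X[0,2]", 0.48, ("X[1,2]",))
add_item(chart, "S[0,2]", 0.48, ("X[0,2]",))            # via S -> X
add_item(chart, "S[0,2]", 0.54, ("S[0,1]", "X[1,2]"))   # via S -> S X: better, replaces it
print(best_derivation(chart, "S[0,2]"))
# ('S[0,2]', [('S[0,1]', [('X[0,1]', [])]), ('X[1,2]', [])])
```

Because an item is replaced only when its weight strictly improves, the back pointers stored with each key always describe the best derivation found so far for that item.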

⁷ [Chiang, 2007] puts all rules in the same list. Because X rules have to be applied before S rules, we found it convenient to keep each rule type in a separate list.


Algorithm 1: CYK+ search algorithm

Data structures:

- a[X]: list of axioms created from rules X →^w α ∈ P
- a[S]: list of axioms created from rules S →^w α ∈ P
- d[X, i, j]: list of active items rooted by X with span [i … j]
- h[X, i, j]: list of passive items rooted by X with span [i … j]
- d[S, i, j]: list of active items rooted by S with span [i … j]
- h[S, i, j]: list of passive items rooted by S with span [i … j]

1: for all axioms [Y → • α, w] do
2:     insert [Y → • α, w] into a[Y]
3: end for
4: for l ← 1, …, |f| do
5:     for all i, j with j − i = l do
6:         for all items [Y → α₁ • α₂, i, j, w] (with α₂ not empty) provable from items in a[Y], d[Y, i, j] and h[Y, i, j] do
7:             add [Y → α₁ • α₂, i, j, w] to d[Y, i, j]
8:         end for
9:         for all items [Y → α •, i, j, w] provable from items in a[Y], d[Y, i, j] and h[Y, i, j] do
10:            add [Y → α •, i, j, w] to h[Y, i, j]
11:        end for
12:    end for
13: end for
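To make Algorithm 1 concrete, the following Python sketch computes the best weight of a goal item. It is our own minimal reconstruction under stated assumptions, not the thesis's or Moses' implementation:

- The grammar is a hypothetical list of (lhs, source rhs, weight) triples.
- Weights are probabilities in (0, 1], combined by multiplication and maximized.
- Back pointers are omitted.
- Terminals are never spelled X or S.

A rule such as the glue rule S → X can consume a passive item lying in the very cell being filled. The sketch therefore repeats the rule applications in each cell until no item improves, and it handles X items before S items.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

NONTERMINALS = ("X", "S")  # within a cell, X items are handled before S items
Rule = Tuple[str, Tuple[str, ...], float]  # hypothetical (lhs, rhs, weight) triple


def cyk_plus(grammar: List[Rule], f: Sequence[str]) -> Optional[float]:
    """Return the best weight w of a goal item [S -> alpha ., 0, |f|, w], or None."""
    n = len(f)
    axioms: Dict[str, list] = defaultdict(list)     # a[Y]: ((rhs, dot=0), w)
    for lhs, rhs, w in grammar:
        axioms[lhs].append(((rhs, 0), w))
    active: Dict[tuple, dict] = defaultdict(dict)   # d[Y, i, j]: (rhs, dot) -> w
    passive: Dict[tuple, dict] = defaultdict(dict)  # h[Y, i, j]: rhs -> w

    def starts(y: str, i: int, k: int) -> list:
        """Active items rooted by y over [i, k]; axioms play this role when k == i."""
        return axioms[y] if k == i else list(active[(y, i, k)].items())

    def add(y: str, i: int, j: int, rhs: tuple, dot: int, w: float) -> bool:
        """Insert an item, keeping only the best weight; True if the chart changed."""
        cell, key = ((passive[(y, i, j)], rhs) if dot == len(rhs)
                     else (active[(y, i, j)], (rhs, dot)))
        if w > cell.get(key, 0.0):
            cell[key] = w
            return True
        return False

    for length in range(1, n + 1):          # smallest spans first
        for i in range(n - length + 1):
            j = i + length
            for y in NONTERMINALS:
                changed = True
                while changed:              # repeat until no item in the cell improves
                    changed = False
                    # Lexical rule (4.6): the terminal t_j is f[j - 1] (0-based list).
                    for (rhs, dot), w in starts(y, i, j - 1):
                        if dot < len(rhs) and rhs[dot] == f[j - 1]:
                            changed |= add(y, i, j, rhs, dot + 1, w)
                    # Non-lexical rule (4.7): passive item for rhs[dot] over [k, j].
                    for k in range(i, j):
                        for (rhs, dot), w1 in starts(y, i, k):
                            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                                for w2 in list(passive[(rhs[dot], k, j)].values()):
                                    changed |= add(y, i, j, rhs, dot + 1, w1 * w2)
    goal = passive.get(("S", 0, n))
    return max(goal.values()) if goal else None


grammar = [
    ("X", ("le", "X"), 0.8), ("X", ("chat", "noir"), 0.5),
    ("X", ("le",), 0.9), ("X", ("chat",), 0.6), ("X", ("noir",), 0.7),
    ("S", ("X",), 1.0), ("S", ("S", "X"), 1.0),  # glue rules
]
print(cyk_plus(grammar, ["le", "chat", "noir"]))  # 0.45
print(cyk_plus(grammar, ["chien"]))               # None
```

With this toy grammar, the best goal item is obtained via S → S X, where S covers "le" (0.9) and X covers "chat noir" (0.5). This derivation beats S → X over the whole sentence (0.8 · 0.5 = 0.4).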


In the document Decoding strategies for syntax-based statistical machine translation (pages 72-77)