
r1: VBD predicted → VAFIN sind ; VVPP ausgegangen
r2: VBD predicted → VAFIN haben ; VVPP vorausgesagt
r3: VBD predicted → VAFIN prognostizierten

English Word    German Word         Weight
predicted       sind                0.01
predicted       ausgegangen         0.15
predicted       prognostizierten    0.20
predicted       haben               0.05
predicted       vorausgesagt        0.30

Figure 5.16: Lexical weight of aligned words in rules $r_1$ to $r_3$.

…lexical model. The count $c$ of target fragments in the sequence is scored as $100^{1-c}$ to discourage rules with many fragments. For example, a contiguous target side ($c = 1$) is scored $100^{0} = 1$, whereas a target side consisting of three fragments is scored $100^{-2} = 0.0001$.

Weight training

The weights of the features in the hierarchical translation model are trained using minimum error rate training [Och, 2003].

5.3 Decoding without syntactic annotation

A rule extraction procedure yielding Sh-l-MBOT rules without syntactic annotation has been presented in Section 5.1.1. In order to build an SMT system using these rules, the hierarchical decoding procedure presented in Section 4.2 has to be extended to deal with rules having (possibly) discontiguous target sides.

5.3.1 CYK+ Parsing and Translation Generation

As opposed to the hierarchical case, decoding with Sh-l-MBOT rules requires integrating the target side of the rules into the deductive inference process presented in Section 4.2.1. This is necessary because Sh-l-MBOT rules with multiple target side components can only be plugged into rules with the same number of target non-terminals $X_1, \ldots, X_n$ aligned to the same source non-terminal $X$. To illustrate this point, consider the Sh-l-MBOT grammar in Figure 5.17. Rule $r_1$, which has 3 target side components, can only be composed with rule $r_7$, which contains the same number of aligned target non-terminals ($X_1$). In the same way, rule $r_2$ can only be assembled with rule $r_5$.

\[
\begin{aligned}
r_1\colon\ & X \xrightarrow{0.6} \langle \text{predicted},\ \text{sind} ; \text{von} ; \text{ausgegangen} \rangle \\
r_2\colon\ & X \xrightarrow{0.3} \langle \text{predicted},\ \text{haben} ; \text{vorausgesagt} \rangle \\
r_3\colon\ & X \xrightarrow{0.1} \langle \text{predicted},\ \text{prognostizierten} \rangle \\
r_4\colon\ & X \xrightarrow{1.0} \langle \text{increase},\ \text{Erhöhung} \rangle \\
r_5\colon\ & X \xrightarrow{0.5} \langle \text{They } X_1 \text{ an } X_2,\ \text{Sie } X_1 \text{ eine } X_2\ X_1 \rangle \\
r_6\colon\ & X \xrightarrow{0.2} \langle \text{They } X_1 \text{ an } X_2,\ \text{Sie } X_1 \text{ eine } X_2 \rangle \\
r_7\colon\ & X \xrightarrow{0.3} \langle \text{They } X_1 \text{ an } X_2,\ \text{Sie } X_1\ X_1 \text{ einer } X_2\ X_1 \rangle
\end{aligned}
\]

Figure 5.17: Example Sh-l-MBOT grammar without syntactic annotations

In order to capture this specificity, the deductive proof system in Section 4.2.1 can be extended in two distinct ways:

1. By explicitly representing the target side of the rules in the axioms and parse items.

2. By using the same items as for hierarchical decoding and only taking the target sides into account to restrict non-lexical inference.

Although option (2) is simpler and closer to the proof system already presented in Section 4.2.1, we find that option (1) leads to a more generic description because it already integrates all components required to define translation generation.
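To make the composition constraint concrete, the following Python sketch encodes Sh-l-MBOT rules as source token sequences paired with lists of target side components and checks the condition on the number of aligned non-terminals. All names are illustrative; the thesis does not prescribe an implementation.

```python
# A minimal sketch of the composition constraint, assuming a simple
# encoding of Sh-l-MBOT rules; all names here are illustrative.
from dataclasses import dataclass

@dataclass
class Rule:
    lhs: str               # left-hand side non-terminal, e.g. "X"
    src: list[str]         # source side tokens, e.g. ["They", "X1", "an", "X2"]
    tgt: list[list[str]]   # target side: one token list per component
    weight: float

def occurrences(rule: Rule, nt: str) -> int:
    """Count how often the aligned target non-terminal `nt` occurs
    across all target side components of `rule`."""
    return sum(tok == nt for comp in rule.tgt for tok in comp)

def can_compose(active: Rule, nt: str, passive: Rule) -> bool:
    """A passive rule with n target components may only fill a slot whose
    aligned target non-terminal occurs exactly n times in the active rule."""
    return occurrences(active, nt) == len(passive.tgt)

# Rules r1, r5 and r7 from Figure 5.17:
r1 = Rule("X", ["predicted"], [["sind"], ["von"], ["ausgegangen"]], 0.6)
r5 = Rule("X", ["They", "X1", "an", "X2"],
          [["Sie", "X1", "eine", "X2", "X1"]], 0.5)
r7 = Rule("X", ["They", "X1", "an", "X2"],
          [["Sie", "X1", "X1", "einer", "X2", "X1"]], 0.3)

assert can_compose(r7, "X1", r1)      # three X1 match r1's three components
assert not can_compose(r5, "X1", r1)  # two X1 cannot host three components
```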

CYK+ parsing for Sh-l-MBOT grammars

The axioms of the Sh-l-MBOT parser can be written as:

\[
[X \to \bullet\,\alpha,\ w,\ \gamma_1 \ldots \gamma_n] \qquad (X \xrightarrow{w} \langle \alpha,\ \gamma_1 \ldots \gamma_n \rangle \in P) \tag{5.3}
\]

The lexical inference rules are given in Equation 5.4:

\[
\frac{[X \to \alpha_1 \bullet t_{j+1}\, \alpha_2,\ i,\ j,\ w,\ \gamma_1 \ldots \gamma_n]}
     {[X \to \alpha_1\, t_{j+1} \bullet \alpha_2,\ i,\ j+1,\ w,\ \gamma_1 \ldots \gamma_n]} \tag{5.4}
\]

In these rules, the consequent simply carries the same sequence of partial translations as the antecedent. Non-lexical inference rules, given in Equation 5.5, are more constrained. In these rules, a non-terminal $X$ in an active parse item is processed using a passive item provided that the number of target non-terminals $X$ in the active item is the same as the number $m$ of target side components in the passive item.

\[
\frac{[X \to \alpha_1 \bullet X\, \alpha_2,\ i,\ j,\ w_1,\ \gamma_{11} \ldots \gamma_{1n}] \qquad [X \to \beta\,\bullet,\ j,\ k,\ w_2,\ \gamma_{21} \ldots \gamma_{2m}]}
     {[X \to \alpha_1\, X \bullet \alpha_2,\ i,\ k,\ w_1 \cdot w_2,\ \gamma_{11} \ldots \gamma_{1n} \otimes \gamma_{21} \ldots \gamma_{2m}]}\; * \tag{5.5}
\]

where $*$ requires that the number of non-terminals in $\gamma_{11} \ldots \gamma_{1n}$ aligned to the processed $X$ equals $m$. On the target language side of these items, all non-terminals in the sequence $\gamma_{11} \ldots \gamma_{1n}$ that are aligned to $X$ are replaced by the sequence $\gamma_{21} \ldots \gamma_{2m}$ in left-to-right order. This operation is denoted by $\otimes$. Translation generation is included in the inference rules above. Following the terminology adopted in Section 4.2.2, we call the target language sides of passive parse items translation options. In Sh-l-MBOT parsing, translation options can consist of several discontiguous segments. The goal of the system is the item $[S \to \alpha\,\bullet,\ 0,\ |s|,\ w,\ \beta]$, in which the rhs of a rule starting with the start non-terminal $S$ and spanning the entire input string $s$ has been processed. When the goal is reached, i.e., when the entire input sentence has been processed, no target side discontiguities remain.
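The substitution performed by $\otimes$ can be sketched as follows, reusing the target side encoding from the sketch above (again with illustrative names): the aligned non-terminals in the active item's target sides are replaced by the passive item's components in left-to-right order.

```python
# A minimal sketch of the combination operator ⊗ from Equation 5.5.
# Target sides are lists of token lists, one list per component.
def combine(active_tgt: list[list[str]], nt: str,
            passive_tgt: list[list[str]]) -> list[list[str]]:
    """Replace the occurrences of the aligned non-terminal `nt` in the
    active item's target sides by the passive item's components, in
    left-to-right order. The side condition * of Equation 5.5 guarantees
    that the number of occurrences matches the number of components."""
    filler = iter(passive_tgt)
    result = []
    for comp in active_tgt:
        new_comp = []
        for tok in comp:
            if tok == nt:
                new_comp.extend(next(filler))  # substitute next component
            else:
                new_comp.append(tok)
        result.append(new_comp)
    return result

# Plugging r1 (sind ; von ; ausgegangen) into the three X1 of r7:
tgt = combine([["Sie", "X1", "X1", "einer", "X2", "X1"]], "X1",
              [["sind"], ["von"], ["ausgegangen"]])
print(tgt)  # [['Sie', 'sind', 'von', 'einer', 'X2', 'ausgegangen']]
```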

Because the source side of Sh-l-MBOT rules is the same as for hierarchical rules, the CYK+ search procedure and the n-best list generation for Sh-l-MBOT grammars are the same as those given in Algorithms 1 and 2.

5.3.2 Example

We illustrate CYK+ parsing with the Sh-l-MBOT grammar in Figure 5.17 as well as the input sentence $E$:

$E$: They predicted an increase

From these Sh-l-MBOT rules, the axioms in Figure 5.18 are created following Equation 5.3. As seen in Section 5.3.1, the axioms can:

(i) carry multiple target language sides $\gamma_1, \ldots, \gamma_n$ (such as the sequence sind ; von ; ausgegangen in axiom $a_1$).

(ii) contain multiple target non-terminals aligned to a single source side non-terminal (such as the three $X_1$ in axiom $a_7$).

\[
\begin{aligned}
a_1\colon\ & [X \to \bullet\,\text{predicted},\ 0.6,\ \text{sind} ; \text{von} ; \text{ausgegangen}] \\
a_2\colon\ & [X \to \bullet\,\text{predicted},\ 0.3,\ \text{haben} ; \text{vorausgesagt}] \\
a_3\colon\ & [X \to \bullet\,\text{predicted},\ 0.1,\ \text{prognostizierten}] \\
a_4\colon\ & [X \to \bullet\,\text{increase},\ 1.0,\ \text{Erhöhung}] \\
a_5\colon\ & [X \to \bullet\,\text{They } X_1 \text{ an } X_2,\ 0.5,\ \text{Sie } X_1 \text{ eine } X_2\ X_1] \\
a_6\colon\ & [X \to \bullet\,\text{They } X_1 \text{ an } X_2,\ 0.2,\ \text{Sie } X_1 \text{ eine } X_2] \\
a_7\colon\ & [X \to \bullet\,\text{They } X_1 \text{ an } X_2,\ 0.3,\ \text{Sie } X_1\ X_1 \text{ einer } X_2\ X_1]
\end{aligned}
\]

Figure 5.18: Axioms created with rules having lhs $X$

During CYK+ chart parsing, rules are assembled according to the combination operator $\otimes$ defined in Section 5.3.1. Figure 5.19 shows the items inferred from the axioms in Figure 5.18 on increasing spans of sentence $E$. For better readability, we only display the target sides $\gamma_1 \ldots \gamma_n$ of passive rules (the translation options). This illustration shows, for instance, that the passive item $[X \to \text{predicted}\,\bullet,\ 1,\ 2,\ 0.6,\ \text{sind} ; \text{von} ; \text{ausgegangen}]$ cannot be combined with $[X \to \text{They} \bullet X_1 \text{ an } X_2,\ 0,\ 1,\ 0.2,\ \text{Sie } X_1 \text{ eine } X_2]$ because the number of its target sides (3) is different from the number (1) of aligned target non-terminals in the active item. The translation options sind ; von ; ausgegangen and haben ; vorausgesagt covering span 1 are discontiguous.
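The failed combination from this example can be replayed with the sketches above (illustrative code building on the earlier definitions, not the actual decoder):

```python
# The passive item derived from r1 offers three target components, but the
# active item derived from r6 contains only one aligned X1, so Equation 5.5
# does not apply; r7, with three aligned X1, accepts the combination.
r6 = Rule("X", ["They", "X1", "an", "X2"], [["Sie", "X1", "eine", "X2"]], 0.2)
assert not can_compose(r6, "X1", r1)
assert can_compose(r7, "X1", r1)
print(combine(r7.tgt, "X1", r1.tgt))
# [['Sie', 'sind', 'von', 'einer', 'X2', 'ausgegangen']]
```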

5.3.3 Language Model Integration

During decoding, the (possibly) discontiguous segments $\gamma_1 \ldots \gamma_n$ are passed to the language model scoring functions. To LM-score such segments, we define the function $PS_{LM}$, which applies to discontiguous segments of terminals $\gamma_1 \ldots \gamma_n$ and multiplies the language model scores of the individual segments. $PS_{LM}$ can be written as:

\[
PS_{LM}(\gamma_1 \ldots \gamma_n) = \prod_{i=1}^{n} P_{LM}(\gamma_i) \tag{5.6}
\]

where $P_{LM}$ is defined as in Equation 4.12. For each segment $\gamma_i$ longer than the size $m$ of the $m$-gram language model, we apply the function $\mathit{Mark}_m$ defined in Equation 4.13. The function $PS_{LM}(\gamma_1 \ldots \gamma_n)$ yields imprecise language model scores for translation options consisting of many small discontiguous units because it treats these units as independent although they will eventually be combined into a single string.

However, as soon as segments in $\gamma_1 \ldots \gamma_n$ are assembled to form larger units, these are scored again. This means that the LM scores become more accurate as the span size increases. Since the final rules allow only one target side component, the final LM score is computed for the complete output sentence.
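As a sketch, $PS_{LM}$ is a plain product over per-segment LM scores. Here `p_lm` is a hypothetical stand-in for the $m$-gram model $P_{LM}$ of Equation 4.12, which is not shown in this section:

```python
# A minimal sketch of PS_LM (Equation 5.6); the per-segment scorer p_lm
# stands in for the m-gram model P_LM of Equation 4.12.
def p_lm(segment: list[str]) -> float:
    """Dummy per-segment language model score; a real system would
    query an m-gram model (and apply Mark_m to long segments)."""
    return 0.1 ** len(segment)

def ps_lm(segments: list[list[str]]) -> float:
    """Multiply the LM scores of the discontiguous segments. The product
    treats the segments as independent, so the score only becomes accurate
    once segments are assembled into larger units during parsing."""
    score = 1.0
    for seg in segments:
        score *= p_lm(seg)
    return score

# Scoring the discontiguous translation option of rule r1:
print(ps_lm([["sind"], ["von"], ["ausgegangen"]]))  # 0.1 * 0.1 * 0.1
```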

The functions $PS_{LM}(\gamma_1 \ldots \gamma_n)$ and $\mathit{Mark}_m(\gamma)$ are added to the Sh-l-MBOT inference rules. The axioms become:

\[
[X \to \bullet\,\alpha,\ w,\ \gamma_1 \ldots \gamma_n,\ \mathit{Mark}_m(\gamma_i),\ PS_{LM}(\gamma_1 \ldots \gamma_n)] \qquad (X \xrightarrow{w} \langle \alpha,\ \gamma_1 \ldots \gamma_n \rangle \in P) \tag{5.7}
\]

The lexical inference rule becomes:

\[
\frac{[X \to \alpha_1 \bullet t_{j+1}\, \alpha_2,\ i,\ j,\ w,\ \gamma_1 \ldots \gamma_n,\ \mathit{Mark}_m(\gamma_i),\ PS_{LM}(\gamma_1 \ldots \gamma_n)]}
     {[X \to \alpha_1\, t_{j+1} \bullet \alpha_2,\ i,\ j+1,\ w,\ \gamma_1 \ldots \gamma_n,\ \mathit{Mark}_m(\gamma_i),\ PS_{LM}(\gamma_1 \ldots \gamma_n)]} \tag{5.8}
\]

The non-lexical inference rule is given by the following equations, where Equations 5.9 and 5.10 are the antecedents and Equation 5.11 is the consequent.