
r1: VBD predicted → VAFIN sind ; VVPP ausgegangen
r2: VBD predicted → VAFIN haben ; VVPP vorausgesagt
r3: VBD predicted → VAFIN prognostizierten

English Word    German Word         Weight
predicted       sind                0.01
predicted       ausgegangen         0.15
predicted       prognostizierten    0.20
predicted       haben               0.05
predicted       vorausgesagt        0.30

Figure 5.16: Lexical weight of aligned words in rules $r_1$ to $r_3$.

…lexical model. The count $c$ of target fragments in the sequence is scored as $100^{1-c}$ to discourage rules with many fragments. For example, a contiguous target side ($c = 1$) is scored $100^{0} = 1$, whereas a target side consisting of three fragments is scored $100^{-2} = 0.0001$.

Weight training

The weights of the features in the hierarchical translation model are trained using minimum error rate training [Och, 2003].

5.3 Decoding without syntactic annotation

A rule extraction procedure yielding Sh-l-MBOT rules without syntactic annotation has been presented in Section 5.1.1. In order to build an SMT system using these rules, the hierarchical decoding procedure presented in Section 4.2 has to be extended to deal with rules having (possibly) discontiguous target sides.

5.3.1 CYK+ Parsing and Translation Generation

As opposed to the hierarchical case, decoding with Sh-l-MBOT rules requires integrating the target side of the rules into the deductive inference process presented in Section 4.2.1. This is necessary because Sh-l-MBOT rules with multiple target side components can only be plugged into rules with the same number of target non-terminals $X_1, \ldots, X_n$ aligned to the same source non-terminal $X$. To illustrate this point, consider the Sh-l-MBOT grammar in Figure 5.17. Rule $r_1$, which has 3 target side components, can only be composed with rule $r_7$, which contains the same number of aligned target non-terminals ($X_1$). In the same way, rule $r_2$ can only be assembled with rule $r_5$.

\[
\begin{aligned}
r_1\colon\ & X \xrightarrow{0.6} \langle \text{predicted},\ \text{sind} ; \text{von} ; \text{ausgegangen} \rangle \\
r_2\colon\ & X \xrightarrow{0.3} \langle \text{predicted},\ \text{haben} ; \text{vorausgesagt} \rangle \\
r_3\colon\ & X \xrightarrow{0.1} \langle \text{predicted},\ \text{prognostizierten} \rangle \\
r_4\colon\ & X \xrightarrow{1.0} \langle \text{increase},\ \text{Erhöhung} \rangle \\
r_5\colon\ & X \xrightarrow{0.5} \langle \text{They } X_1 \text{ an } X_2,\ \text{Sie } X_1 \text{ eine } X_2\ X_1 \rangle \\
r_6\colon\ & X \xrightarrow{0.2} \langle \text{They } X_1 \text{ an } X_2,\ \text{Sie } X_1 \text{ eine } X_2 \rangle \\
r_7\colon\ & X \xrightarrow{0.3} \langle \text{They } X_1 \text{ an } X_2,\ \text{Sie } X_1\ X_1 \text{ einer } X_2\ X_1 \rangle
\end{aligned}
\]

Figure 5.17: Example Sh-l-MBOT grammar without syntactic annotations

In order to capture this specificity, the deductive proof system in Section 4.2.1 can be extended in two distinct ways:

1. By explicitly representing the target side of the rules in the axioms and parse items.

2. By using the same items as for hierarchical decoding and only taking the target sides into account to restrict non-lexical inference.

Although option (2) is simpler and closer to the proof system already presented in Section 4.2.1, we find that option (1) leads to a more generic description because it already integrates all components required to define translation generation.
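To make the composition constraint concrete, the following Python sketch encodes Sh-l-MBOT rules as source token sequences paired with lists of target side components and checks the condition on the number of aligned non-terminals. All names are illustrative; the thesis does not prescribe an implementation.

```python
# A minimal sketch of the composition constraint, assuming a simple
# encoding of Sh-l-MBOT rules; all names here are illustrative.
from dataclasses import dataclass

@dataclass
class Rule:
    lhs: str               # left-hand side non-terminal, e.g. "X"
    src: list[str]         # source side tokens, e.g. ["They", "X1", "an", "X2"]
    tgt: list[list[str]]   # target side: one token list per component
    weight: float

def occurrences(rule: Rule, nt: str) -> int:
    """Count how often the aligned target non-terminal `nt` occurs
    across all target side components of `rule`."""
    return sum(tok == nt for comp in rule.tgt for tok in comp)

def can_compose(active: Rule, nt: str, passive: Rule) -> bool:
    """A passive rule with n target components may only fill a slot whose
    aligned target non-terminal occurs exactly n times in the active rule."""
    return occurrences(active, nt) == len(passive.tgt)

# Rules r1, r5 and r7 from Figure 5.17:
r1 = Rule("X", ["predicted"], [["sind"], ["von"], ["ausgegangen"]], 0.6)
r5 = Rule("X", ["They", "X1", "an", "X2"],
          [["Sie", "X1", "eine", "X2", "X1"]], 0.5)
r7 = Rule("X", ["They", "X1", "an", "X2"],
          [["Sie", "X1", "X1", "einer", "X2", "X1"]], 0.3)

assert can_compose(r7, "X1", r1)      # three X1 match r1's three components
assert not can_compose(r5, "X1", r1)  # two X1 cannot host three components
```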

CYK+ parsing for Sh-l-MBOT grammars

The axioms of the Sh-l-MBOT parser can be written as:

\[
[X \to \bullet\,\alpha,\ w,\ \gamma_1 \ldots \gamma_n] \qquad (X \xrightarrow{w} \langle \alpha,\ \gamma_1 \ldots \gamma_n \rangle \in P) \tag{5.3}
\]

The lexical inference rules are given in Equation 5.4:

\[
\frac{[X \to \alpha_1 \bullet t_{j+1}\, \alpha_2,\ i,\ j,\ w,\ \gamma_1 \ldots \gamma_n]}
     {[X \to \alpha_1\, t_{j+1} \bullet \alpha_2,\ i,\ j+1,\ w,\ \gamma_1 \ldots \gamma_n]} \tag{5.4}
\]

In these rules, the consequent simply carries the same sequence of partial translations as the antecedent. Non-lexical inference rules, given in Equation 5.5, are more constrained. In these rules, a non-terminal $X$ in an active parse item is processed using a passive item provided that the number of target non-terminals $X$ in the active item is the same as the number $m$ of target side components in the passive item.

\[
\frac{[X \to \alpha_1 \bullet X\, \alpha_2,\ i,\ j,\ w_1,\ \gamma_{11} \ldots \gamma_{1n}] \qquad [X \to \beta\,\bullet,\ j,\ k,\ w_2,\ \gamma_{21} \ldots \gamma_{2m}]}
     {[X \to \alpha_1\, X \bullet \alpha_2,\ i,\ k,\ w_1 \cdot w_2,\ \gamma_{11} \ldots \gamma_{1n} \otimes \gamma_{21} \ldots \gamma_{2m}]}\; * \tag{5.5}
\]

where $*$ requires that the number of non-terminals in $\gamma_{11} \ldots \gamma_{1n}$ aligned to the processed $X$ equals $m$. On the target language side of these items, all non-terminals in the sequence $\gamma_{11} \ldots \gamma_{1n}$ that are aligned to $X$ are replaced by the sequence $\gamma_{21} \ldots \gamma_{2m}$ in left-to-right order. This operation is denoted by $\otimes$. Translation generation is included in the inference rules above. Following the terminology adopted in Section 4.2.2, we call the target language sides of passive parse items translation options. In Sh-l-MBOT parsing, translation options can consist of several discontiguous segments. The goal of the system is the item $[S \to \alpha\,\bullet,\ 0,\ |s|,\ w,\ \beta]$, in which the rhs of a rule starting with the start non-terminal $S$ and spanning the entire input string $s$ has been processed. When the goal is reached, i.e., when the entire input sentence has been processed, no target side discontiguities remain.
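The substitution performed by $\otimes$ can be sketched as follows, reusing the target side encoding from the sketch above (again with illustrative names): the aligned non-terminals in the active item's target sides are replaced by the passive item's components in left-to-right order.

```python
# A minimal sketch of the combination operator ⊗ from Equation 5.5.
# Target sides are lists of token lists, one list per component.
def combine(active_tgt: list[list[str]], nt: str,
            passive_tgt: list[list[str]]) -> list[list[str]]:
    """Replace the occurrences of the aligned non-terminal `nt` in the
    active item's target sides by the passive item's components, in
    left-to-right order. The side condition * of Equation 5.5 guarantees
    that the number of occurrences matches the number of components."""
    filler = iter(passive_tgt)
    result = []
    for comp in active_tgt:
        new_comp = []
        for tok in comp:
            if tok == nt:
                new_comp.extend(next(filler))  # substitute next component
            else:
                new_comp.append(tok)
        result.append(new_comp)
    return result

# Plugging r1 (sind ; von ; ausgegangen) into the three X1 of r7:
tgt = combine([["Sie", "X1", "X1", "einer", "X2", "X1"]], "X1",
              [["sind"], ["von"], ["ausgegangen"]])
print(tgt)  # [['Sie', 'sind', 'von', 'einer', 'X2', 'ausgegangen']]
```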

Because the source side of Sh-l-MBOT rules is the same as for hierarchical rules, the CYK+ search procedure and the n-best list generation for Sh-l-MBOT grammars are the same as those given in Algorithms 1 and 2.

5.3.2 Example

We illustrate CYK+ parsing with the Sh-l-MBOT grammar in Figure 5.17 as well as the input sentence $E$:

$E$: They predicted an increase

From these Sh-l-MBOT rules, the axioms in Figure 5.18 are created following Equation 5.3. As seen in Section 5.3.1, the axioms can:

(i) carry multiple target language sides $\gamma_1, \ldots, \gamma_n$ (such as the sequence sind ; von ; ausgegangen in axiom $a_1$).

(ii) contain multiple target non-terminals aligned to a single source side non-terminal (such as the three $X_1$ in axiom $a_7$).

\[
\begin{aligned}
a_1\colon\ & [X \to \bullet\,\text{predicted},\ 0.6,\ \text{sind} ; \text{von} ; \text{ausgegangen}] \\
a_2\colon\ & [X \to \bullet\,\text{predicted},\ 0.3,\ \text{haben} ; \text{vorausgesagt}] \\
a_3\colon\ & [X \to \bullet\,\text{predicted},\ 0.1,\ \text{prognostizierten}] \\
a_4\colon\ & [X \to \bullet\,\text{increase},\ 1.0,\ \text{Erhöhung}] \\
a_5\colon\ & [X \to \bullet\,\text{They } X_1 \text{ an } X_2,\ 0.5,\ \text{Sie } X_1 \text{ eine } X_2\ X_1] \\
a_6\colon\ & [X \to \bullet\,\text{They } X_1 \text{ an } X_2,\ 0.2,\ \text{Sie } X_1 \text{ eine } X_2] \\
a_7\colon\ & [X \to \bullet\,\text{They } X_1 \text{ an } X_2,\ 0.3,\ \text{Sie } X_1\ X_1 \text{ einer } X_2\ X_1]
\end{aligned}
\]

Figure 5.18: Axioms created with rules having lhs $X$

During CYK+ chart parsing, rules are assembled according to the combination operator $\otimes$ defined in Section 5.3.1. Figure 5.19 shows the items inferred from the axioms in Figure 5.18 on increasing spans of sentence $E$. For better readability, we only display the target sides $\gamma_1 \ldots \gamma_n$ of passive rules (the translation options). This illustration shows, for instance, that the passive item $[X \to \text{predicted}\,\bullet,\ 1,\ 2,\ 0.6,\ \text{sind} ; \text{von} ; \text{ausgegangen}]$ cannot be combined with $[X \to \text{They} \bullet X_1 \text{ an } X_2,\ 0,\ 1,\ 0.2,\ \text{Sie } X_1 \text{ eine } X_2]$ because the number of its target sides (3) is different from the number (1) of aligned target non-terminals in the active item. The translation options sind ; von ; ausgegangen and haben ; vorausgesagt covering span 1 are discontiguous.
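The failed combination from this example can be replayed with the sketches above (illustrative code building on the earlier definitions, not the actual decoder):

```python
# The passive item derived from r1 offers three target components, but the
# active item derived from r6 contains only one aligned X1, so Equation 5.5
# does not apply; r7, with three aligned X1, accepts the combination.
r6 = Rule("X", ["They", "X1", "an", "X2"], [["Sie", "X1", "eine", "X2"]], 0.2)
assert not can_compose(r6, "X1", r1)
assert can_compose(r7, "X1", r1)
print(combine(r7.tgt, "X1", r1.tgt))
# [['Sie', 'sind', 'von', 'einer', 'X2', 'ausgegangen']]
```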

5.3.3 Language Model Integration

During decoding, the (possibly) discontiguous segments $\gamma_1 \ldots \gamma_n$ are passed to the language model scoring functions. To LM-score such segments, we define the function $PS_{LM}$, which applies to discontiguous segments of terminals $\gamma_1 \ldots \gamma_n$ and multiplies the language model scores of the individual segments. $PS_{LM}$ can be written as:

\[
PS_{LM}(\gamma_1 \ldots \gamma_n) = \prod_{i=1}^{n} P_{LM}(\gamma_i) \tag{5.6}
\]

where $P_{LM}$ is defined as in Equation 4.12. For each segment $\gamma_i$ longer than the size $m$ of the $m$-gram language model, we apply the function $\mathit{Mark}_m$ defined in Equation 4.13. The function $PS_{LM}(\gamma_1 \ldots \gamma_n)$ yields imprecise language model scores for translation options consisting of many small discontiguous units because it treats these units as independent although they will eventually be combined into a single string.

However, as soon as segments in $\gamma_1 \ldots \gamma_n$ are assembled to form larger units, these are scored again. This means that the LM scores become more accurate as the span size increases. Since the final rules allow only one target side component, the final LM score is computed for the complete output sentence.
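As a sketch, $PS_{LM}$ is a plain product over per-segment LM scores. Here `p_lm` is a hypothetical stand-in for the $m$-gram model $P_{LM}$ of Equation 4.12, which is not shown in this section:

```python
# A minimal sketch of PS_LM (Equation 5.6); the per-segment scorer p_lm
# stands in for the m-gram model P_LM of Equation 4.12.
def p_lm(segment: list[str]) -> float:
    """Dummy per-segment language model score; a real system would
    query an m-gram model (and apply Mark_m to long segments)."""
    return 0.1 ** len(segment)

def ps_lm(segments: list[list[str]]) -> float:
    """Multiply the LM scores of the discontiguous segments. The product
    treats the segments as independent, so the score only becomes accurate
    once segments are assembled into larger units during parsing."""
    score = 1.0
    for seg in segments:
        score *= p_lm(seg)
    return score

# Scoring the discontiguous translation option of rule r1:
print(ps_lm([["sind"], ["von"], ["ausgegangen"]]))  # 0.1 * 0.1 * 0.1
```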

The functions $PS_{LM}(\gamma_1 \ldots \gamma_n)$ and $\mathit{Mark}_m(\gamma)$ are added to the Sh-l-MBOT inference rules. The axioms become:

\[
[X \to \bullet\,\alpha,\ w,\ \gamma_1 \ldots \gamma_n,\ \mathit{Mark}_m(\gamma_i),\ PS_{LM}(\gamma_1 \ldots \gamma_n)] \qquad (X \xrightarrow{w} \langle \alpha,\ \gamma_1 \ldots \gamma_n \rangle \in P) \tag{5.7}
\]

The lexical inference rule becomes:

\[
\frac{[X \to \alpha_1 \bullet t_{j+1}\, \alpha_2,\ i,\ j,\ w,\ \gamma_1 \ldots \gamma_n,\ \mathit{Mark}_m(\gamma_i),\ PS_{LM}(\gamma_1 \ldots \gamma_n)]}
     {[X \to \alpha_1\, t_{j+1} \bullet \alpha_2,\ i,\ j+1,\ w,\ \gamma_1 \ldots \gamma_n,\ \mathit{Mark}_m(\gamma_i),\ PS_{LM}(\gamma_1 \ldots \gamma_n)]} \tag{5.8}
\]

The non-lexical inference rule is given by the following equations, where Equations 5.9 and 5.10 are the antecedents and Equation 5.11 is the consequent.