4.3 SMT with syntactically annotated SCFG

4.3.4 SCFG Rules as Shallow STSG Rules

Shallow Synchronous Tree Substitution Grammars (sh-STSG) are a restricted variant of the STSG grammars presented in Chapter 2 (Section 2.1.2). The restriction

Automatic extraction of Shallow Minimal STSG Rules

Shallow STSG rules are obtained by first extracting minimal STSG rules following the procedure in [Liu et al., 2009] and then making them shallow by removing their internal nodes.

Extraction of Minimal STSG Rules. Given a word-aligned biparsed parallel corpus such as the one in Figure 4.17, the extraction procedure for minimal STSG rules can be summarized as shown in Figure 4.20. The detailed steps are presented in Figures 4.21 to 4.24. Figure 4.25 displays all rules extracted after one iteration of the STSG rule extraction procedure, as well as the remaining word-aligned tree pair. All words could be extracted to form rules except "predicted" and "sind ausgegangen", because the maximal source node consistent with the alignment to the leaves "sind" and "ausgegangen" is the root node of the word-aligned tree pair. Extracting a rule that contains this node requires first extracting all rules arising from tree pairs that are lower in the source tree. Figures 4.26 and 4.27 show the rules extracted after the second and third passes of the rule extraction algorithm. Finally, Figure 4.28 shows the last step of the procedure, where the remaining tree pair is the rule containing "predicted" and "sind ausgegangen".

Step 1 Find a maximal source node consistent with a (minimal) set S of word alignment links

Step 2 Find a maximal target node consistent with the minimal extension of S. Reiterate 1 and 2 if necessary

Step 3 Excise the rule with maximal source and target node from the biparsed sentence pair and replace with parent labels

Step 4 Add the rule to the rule set

Step 5 Repeat

Figure 4.20: Main steps of the rule extraction procedure for minimal STSG rules.

Step 1 Find maximal source node consistent with word alignment links

Algorithm Given a lowest node $l_s$ in the source tree that is not a non-terminal leaf and that is aligned to a set of leaves $\{l_{t_1}, \ldots, l_{t_n}\}$ in the target tree, find the highest node in the source tree such that no leaf in its subtree aligns to a leaf $l_{t_k} \notin \{l_{t_1}, \ldots, l_{t_n}\}$.

Example We assume that the English tree is the source tree.

The source leaf node $l_s$ = Official is aligned to the target leaf node $l_{t_1}$ = Offizielle.

The maximal node in the source tree that still aligns to only $l_{t_1}$ is labeled JJ.

Source: (S (NP (JJ Official) (NNS forecasts)) (VP (VBD predicted) (NP (QP (RB just) (CD 3)) (NN %))))

Target: (S (NP (ADJA Offizielle) (NN Prognosen)) (VAFIN sind) (VP (PP (APPR von) (AP (ADV nur) (CARD 3)) (NN %)) (VVPP ausgegangen)))

Figure 4.21: First step of STSG rule extraction procedure.
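Step 1 can be illustrated with a minimal Python sketch. It is not the implementation of [Liu et al., 2009]: the tuple-based tree encoding, the function names and the word-level alignment dictionary are assumptions made for illustration, and the alignment links are read off the rules shown in the figures rather than taken from Figure 4.17.

```python
# Trees are (label, children) tuples; a word is a plain string.
SOURCE = ("S", [
    ("NP", [("JJ", ["Official"]), ("NNS", ["forecasts"])]),
    ("VP", [("VBD", ["predicted"]),
            ("NP", [("QP", [("RB", ["just"]), ("CD", ["3"])]),
                    ("NN", ["%"])])]),
])

# Assumed alignment (source word -> set of target words).
ALIGNMENT = {
    "Official": {"Offizielle"}, "forecasts": {"Prognosen"},
    "predicted": {"sind", "ausgegangen"},
    "just": {"nur"}, "3": {"3"}, "%": {"%"},
}


def leaves(node):
    """Return the words below a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]


def path_to(node, word):
    """Return the nodes from `node` down to the parent of `word`, or None."""
    if isinstance(node, str):
        return [] if node == word else None
    for child in node[1]:
        sub = path_to(child, word)
        if sub is not None:
            return [node] + sub
    return None


def maximal_source_node(tree, word, alignment):
    """Highest node above `word` whose leaves align only to alignment[word]."""
    allowed = alignment[word]
    best = None
    for node in reversed(path_to(tree, word)):  # climb from the leaf upwards
        aligned = set().union(*(alignment.get(w, set()) for w in leaves(node)))
        if not aligned <= allowed:  # some leaf aligns outside the allowed set
            break
        best = node
    return best


print(maximal_source_node(SOURCE, "Official", ALIGNMENT)[0])
```

For the leaf "Official" the climb stops below NP, because NP also dominates "forecasts", which aligns to "Prognosen"; the sketch therefore returns the node labeled JJ, as in the example.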

Step 2 Find maximal target node consistent with word alignment links

Algorithm Given a set of target leaves $\{l_{t_1}, \ldots, l_{t_n}\}$ in the target tree that is aligned to the source leaves $\{l_{s_1}, \ldots, l_{s_m}\}$, find the highest node in the target tree such that it contains all leaves $l_{t_i} \in \{l_{t_1}, \ldots, l_{t_n}\}$.

Example We assume that the English tree is the source tree.

The target leaf node is $l_{t_1}$ = Offizielle.

The source leaf node is $l_s$ = Official.

The maximal node in the target tree that still aligns to $l_s$ is labeled ADJA.

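Source: (S (NP (JJ Official) (NNS forecasts)) (VP (VBD predicted) (NP (QP (RB just) (CD 3)) (NN %))))

Target: (S (NP (ADJA Offizielle) (NN Prognosen)) (VAFIN sind) (VP (PP (APPR von) (AP (ADV nur) (CARD 3)) (NN %)) (VVPP ausgegangen)))

Figure 4.22: Second step of STSG rule extraction procedure.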
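A minimal Python sketch of Step 2 follows. It reads the step, in line with the example, as looking for the highest target node that covers the given target leaves while none of its leaves aligns to a source word outside the given source set; this consistency condition, the tree encoding, the function names and the reverse alignment dictionary are assumptions made for illustration.

```python
# Trees are (label, children) tuples; a word is a plain string.
TARGET = ("S", [
    ("NP", [("ADJA", ["Offizielle"]), ("NN", ["Prognosen"])]),
    ("VAFIN", ["sind"]),
    ("VP", [("PP", [("APPR", ["von"]),
                    ("AP", [("ADV", ["nur"]), ("CARD", ["3"])]),
                    ("NN", ["%"])]),
            ("VVPP", ["ausgegangen"])]),
])

# Assumed reverse alignment (target word -> set of source words);
# "von" is treated as unaligned.
BACK = {
    "Offizielle": {"Official"}, "Prognosen": {"forecasts"},
    "sind": {"predicted"}, "ausgegangen": {"predicted"},
    "nur": {"just"}, "3": {"3"}, "%": {"%"},
}


def leaves(node):
    """Return the words below a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]


def maximal_target_node(tree, target_words, source_words, back):
    """Highest node covering target_words whose leaves align only to source_words."""
    best = None

    def visit(node):
        nonlocal best
        if isinstance(node, str):
            return
        for child in node[1]:
            visit(child)  # children first, so an ancestor overwrites a descendant
        words = leaves(node)
        aligned = set().union(*(back.get(w, set()) for w in words))
        if target_words <= set(words) and aligned <= source_words:
            best = node

    visit(tree)
    return best


node = maximal_target_node(TARGET, {"Offizielle"}, {"Official"}, BACK)
print(node[0])
```

Under these assumptions the call returns the node labeled ADJA. For the target leaves "sind" and "ausgegangen" the only covering node is the root S, which also dominates leaves aligned to other source words, so the sketch returns `None` for that pair in the first pass.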

Step 3 Excise the rule with maximal source and target node

Remove the maximal node in the source tree as well as its subtree.

Replace the removed subtree by its root.

Remove the maximal node in the target tree as well as its subtree.

Replace the removed subtree by its root.

Align the remaining non-terminals.

Example Remove the source subtree rooted by JJ and replace with JJ.

Remove the target subtree rooted by ADJA and replace with ADJA.

Align JJ and ADJA.

Source: (S (NP JJ (NNS forecasts)) (VP (VBD predicted) (NP (QP (RB just) (CD 3)) (NN %))))

Target: (S (NP ADJA (NN Prognosen)) (VAFIN sind) (VP (PP (APPR von) (AP (ADV nur) (CARD 3)) (NN %)) (VVPP ausgegangen)))

Figure 4.23: Third step of STSG rule extraction procedure.
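The excision in Step 3 can be sketched in Python as follows. The tuple-based tree encoding, the representation of a non-terminal leaf as a label with an empty child list, and the names `excise` and `excise_pair` are assumptions made for illustration; only the NP fragments of the example trees are used to keep the sketch short.

```python
# Trees are (label, children) tuples; a word is a plain string.
# A non-terminal leaf is a label with an empty child list.
SRC_NP = ("NP", [("JJ", ["Official"]), ("NNS", ["forecasts"])])
TGT_NP = ("NP", [("ADJA", ["Offizielle"]), ("NN", ["Prognosen"])])


def excise(tree, node):
    """Return a copy of `tree` in which the subtree `node` is reduced to its root."""
    if tree is node:
        return (tree[0], [])  # keep only the label as a non-terminal leaf
    if isinstance(tree, str):
        return tree
    return (tree[0], [excise(child, node) for child in tree[1]])


def excise_pair(src_tree, src_node, tgt_tree, tgt_node):
    """Excise both subtrees; return the remaining pair, the new link and the rule."""
    rule = (src_node, tgt_node)        # excised trees: input and output side
    link = (src_node[0], tgt_node[0])  # alignment between the new non-terminal leaves
    return excise(src_tree, src_node), excise(tgt_tree, tgt_node), link, rule


jj = SRC_NP[1][0]    # subtree rooted by JJ
adja = TGT_NP[1][0]  # subtree rooted by ADJA
rest_src, rest_tgt, link, rule = excise_pair(SRC_NP, jj, TGT_NP, adja)
print(rest_src, rest_tgt, link, rule)
```

The remaining fragments keep JJ and ADJA as aligned non-terminal leaves, and the returned rule pairs the excised source subtree with the excised target subtree, which is the pair that Step 4 adds to the rule set.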

Step 4 Add rule to rule set

Create an STSG rule with the excised trees on the input and output sides.

Example

(JJ Official) ↔ (ADJA Offizielle)

Figure 4.24: Fourth step of STSG rule extraction procedure.

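Remaining tree pair:

Source: (S (NP JJ NNS) (VP (VBD predicted) (NP (QP RB CD) NN)))

Target: (S (NP ADJA NN) (VAFIN sind) (VP (PP APPR (AP ADV CARD) NN) (VVPP ausgegangen)))

Extracted rules:

(JJ Official) ↔ (ADJA Offizielle)

(NNS forecasts) ↔ (NN Prognosen)

(RB just) ↔ (ADV nur)

(CD 3) ↔ (CARD 3)

(NN %) ↔ (NN %)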

Figure 4.25: Word aligned biparsed sentence pair and extracted rules after the first pass of the algorithm.

Remaining tree pair:

Source: (S NP (VP (VBD predicted) (NP QP NN)))

Target: (S NP (VAFIN sind) (VP (PP APPR AP NN) (VVPP ausgegangen)))

Extracted rules:

(NP JJ NNS) ↔ (NP ADJA NN)

(QP RB CD) ↔ (AP ADV CARD)

Figure 4.26: Word aligned biparsed sentence pair and extracted rules after the second pass of the algorithm.

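Remaining tree pair:

Source: (S NP (VP (VBD predicted) NP))

Target: (S NP (VAFIN sind) (VP PP (VVPP ausgegangen)))

Extracted rules:

(NP QP NN) ↔ (PP (APPR von) AP NN)

Figure 4.27: Word aligned biparsed sentence pair and extracted rules after the third pass of the algorithm.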

(S NP (VP (VBD predicted) NP)) ↔ (S NP (VAFIN sind) (VP PP (VVPP ausgegangen)))

Figure 4.28: Rules extracted after last pass of STSG rule extraction algorithm.

Removal of Internal Nodes All STSG rules extracted above (Paragraph 4.3.4) are shallow except the last one, shown in Figure 4.28. The shallow version of this rule is given in Figure 4.29.

Source: (S NP predicted NP)

Target: (S NP (VAFIN sind) (VP PP (VVPP ausgegangen)))

Figure 4.29: STSG rule after removal of internal nodes.
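The removal of internal nodes can be sketched in Python under one reading of the operation: the root of a rule side is kept and its children are replaced by the frontier, that is, the words and non-terminal leaves in left-to-right order. The tree encoding and the names `frontier` and `make_shallow` are assumptions made for illustration, and the sketch is applied only to the source side of the rule in Figure 4.28.

```python
# Trees are (label, children) tuples; a word is a plain string.
# A non-terminal leaf is a label with an empty child list.
DEEP_SOURCE = ("S", [
    ("NP", []),
    ("VP", [("VBD", ["predicted"]), ("NP", [])]),
])


def frontier(node):
    """Return the words and non-terminal leaves below a node, left to right."""
    if isinstance(node, str) or not node[1]:
        return [node]
    return [leaf for child in node[1] for leaf in frontier(child)]


def make_shallow(tree):
    """Keep the root label and attach the frontier directly below it."""
    label, children = tree
    return (label, [leaf for child in children for leaf in frontier(child)])


print(make_shallow(DEEP_SOURCE))
```

On this input the internal nodes VP and VBD are dropped and the result has the root S with the children NP, "predicted" and NP, which corresponds to the source side shown in Figure 4.29. A rule side such as (JJ Official), which has no internal nodes, is returned unchanged.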