4.3 SMT with syntactically annotated SCFG
4.3.4 SCFG Rules as Shallow STSG Rules
Shallow Synchronous Tree Substitution Grammars (sh-STSG) are a restricted variant of the STSG grammars presented in Chapter 2 (Section 2.1.2). The restriction
Automatic extraction of Shallow Minimal STSG Rules
Shallow STSG rules are obtained by first extracting minimal STSG rules following the procedure in [Liu et al., 2009] and then making those shallow by removing their internal nodes.
Extraction of Minimal STSG Rules Given a word aligned biparsed parallel corpus such as in Figure 4.17, the extraction procedure for minimal STSG rules can be summarized as shown in Figure 4.20. The detailed steps are presented in Figures 4.21 to 4.24. Figure 4.25 displays all rules extracted after one iteration of the STSG rule extraction procedure, together with the remaining word aligned tree pair. All words could be extracted to form rules except predicted and sind ausgegangen, because the maximal source node consistent with the alignment to the leaves sind and ausgegangen is the root node of the word aligned tree pair. Extracting a rule that contains this node requires first extracting all rules arising from tree pairs that are lower in the source tree. Figures 4.26 and 4.27 show the rules extracted after the second and third passes of the rule extraction algorithm. Finally, Figure 4.28 shows the last step of the procedure, where the remaining tree pair is the rule containing predicted and sind ausgegangen.
Step 1 Find a maximal source node consistent with a (minimal) set S of word alignment links
Step 2 Find a maximal target node consistent with the minimal extension of S
Reiterate Steps 1 and 2 if necessary
Step 3 Excise the rule with maximal source and target node from the biparsed sentence pair and replace with parent labels
Step 4 Add the rule to the rule set
Step 5 Repeat
Figure 4.20: Main steps of the rule extraction procedure for minimal STSG rules.
Step 1 Find maximal source node consistent with word alignment links
Algorithm Given a lowest node $l_s$ in the source tree that is not a non-terminal leaf and that is aligned to a set of leaves $\{l_{t_1}, \ldots, l_{t_n}\}$ in the target tree, find the highest node in the source tree such that no leaf in its subtree aligns to a leaf $l_{t_k} \notin \{l_{t_1}, \ldots, l_{t_n}\}$.
Example We assume that the English tree is the source tree.
The source leaf node $l_s$ = Official is aligned to the target leaf node $l_{t_1}$ = Offizielle.
The maximal node in the source tree that still aligns only to $l_{t_1}$ is labeled JJ.
Source: (S (NP (JJ Official) (NNS forecasts)) (VP (VBD predicted) (NP (QP (RB just) (CD 3)) (NN %))))
Target: (S (NP (ADJA Offizielle) (NN Prognosen)) (VAFIN sind) (VP (PP (APPR von) (AP (ADV nur) (CARD 3)) (NN %)) (VVPP ausgegangen)))
Figure 4.21: First step of STSG rule extraction procedure.
Step 2 Find maximal target node consistent with word alignment links
Algorithm Given a set of target leaves $\{l_{t_1}, \ldots, l_{t_n}\}$ in the target tree that is aligned to the source leaves $\{l_{s_1}, \ldots, l_{s_m}\}$, find the highest node in the target tree such that it contains all leaves $l_{t_i} \in \{l_{t_1}, \ldots, l_{t_n}\}$.
Example We assume that the English tree is the source tree.
The target leaf node is $l_{t_1}$ = Offizielle.
The source leaf node is $l_s$ = Official.
The maximal node in the target tree that still aligns to $l_s$ is labeled ADJA.
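Source: (S (NP (JJ Official) (NNS forecasts)) (VP (VBD predicted) (NP (QP (RB just) (CD 3)) (NN %))))
Target: (S (NP (ADJA Offizielle) (NN Prognosen)) (VAFIN sind) (VP (PP (APPR von) (AP (ADV nur) (CARD 3)) (NN %)) (VVPP ausgegangen)))
Figure 4.22: Second step of STSG rule extraction procedure.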
Step 3 Excise the rule with maximal source and target node
Remove the maximal node in the source tree as well as its subtree.
Replace the removed subtree by its root.
Remove the maximal node in the target tree as well as its subtree.
Replace the removed subtree by its root.
Align the remaining non-terminals.
Example Remove the source subtree rooted at JJ and replace it with JJ.
Remove the target subtree rooted at ADJA and replace it with ADJA.
Align JJ and ADJA.
Source: (S (NP JJ (NNS forecasts)) (VP (VBD predicted) (NP (QP (RB just) (CD 3)) (NN %))))
Target: (S (NP ADJA (NN Prognosen)) (VAFIN sind) (VP (PP (APPR von) (AP (ADV nur) (CARD 3)) (NN %)) (VVPP ausgegangen)))
Figure 4.23: Third step of STSG rule extraction procedure.
Step 4 Add rule to rule set
Create an STSG rule with the excised trees on the input and output sides.
Example
Source: (JJ Official)
Target: (ADJA Offizielle)
Figure 4.24: Fourth step of STSG rule extraction procedure.
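Remaining tree pair:
Source: (S (NP JJ NNS) (VP (VBD predicted) (NP (QP RB CD) NN)))
Target: (S (NP ADJA NN) (VAFIN sind) (VP (PP APPR (AP ADV CARD) NN) (VVPP ausgegangen)))
Extracted rules:
Source: (JJ Official)   Target: (ADJA Offizielle)
Source: (NNS forecasts)   Target: (NN Prognosen)
Source: (RB just)   Target: (ADV nur)
Source: (CD 3)   Target: (CARD 3)
Source: (NN %)   Target: (NN %)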
Figure 4.25: Word aligned biparsed sentence pair and extracted rules after the first pass of the algorithm.
Remaining tree pair:
Source: (S NP (VP (VBD predicted) (NP QP NN)))
Target: (S NP (VAFIN sind) (VP (PP APPR AP NN) (VVPP ausgegangen)))
Extracted rules:
Source: (NP JJ NNS)   Target: (NP ADJA NN)
Source: (QP RB CD)   Target: (AP ADV CARD)
Figure 4.26: Word aligned biparsed sentence pair and extracted rules after the second pass of the algorithm.
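Remaining tree pair:
Source: (S NP (VP (VBD predicted) NP))
Target: (S NP (VAFIN sind) (VP PP (VVPP ausgegangen)))
Extracted rules:
Source: (NP QP NN)   Target: (PP (APPR von) AP NN)
Figure 4.27: Word aligned biparsed sentence pair and extracted rules after the third pass of the algorithm.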
Source: (S NP (VP (VBD predicted) NP))
Target: (S NP (VAFIN sind) (VP PP (VVPP ausgegangen)))
Figure 4.28: Rules extracted after last pass of STSG rule extraction algorithm.
Removal of Internal Nodes All STSG rules extracted above (Paragraph 4.3.4) are shallow except the last one, shown in Figure 4.28. The shallow version of this rule is given in Figure 4.29.
Source: (S NP predicted NP)
Target: (S NP (VAFIN sind) (VP PP (VVPP ausgegangen)))
Figure 4.29: STSG rule after removal of internal nodes.