
11.6. Chapter Summary

When merging a preposition and a determiner into a portemanteau, the POS tag of the words is taken into account in order to prevent erroneous mergings.

The example in Table 11.5 shows that merging “in” + “dem” into the portemanteau “im” should only be performed when “dem” is used as a determiner (POS = “ART”). Otherwise, for example when it is used as a relative pronoun (POS = “PRELS”), the two words should be left separate.
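As a minimal illustration of this rule (a hypothetical sketch, not the actual implementation; the function and list names are invented), the POS check can be expressed as a simple guard on the merge:

```python
# Sketch of POS-gated portemanteau merging (hypothetical helper names,
# not the thesis implementation). "in" + "dem" is merged into "im" only
# when "dem" is tagged as a determiner (STTS tag "ART").

PORTEMANTEAUS = {("in", "dem"): "im"}  # excerpt of a larger list

def merge_portemanteau(word1, word2, pos2):
    """Return the merged portemanteau, or None if merging is not licensed."""
    if pos2 == "ART" and (word1, word2) in PORTEMANTEAUS:
        return PORTEMANTEAUS[(word1, word2)]
    return None  # e.g. pos2 == "PRELS": leave the two words separate

assert merge_portemanteau("in", "dem", "ART") == "im"
assert merge_portemanteau("in", "dem", "PRELS") is None
```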

Occasionally, merging a preposition and a determiner into one of the portemanteaus on our list does not adequately capture the meaning of a sentence, even though the resulting sentence is grammatically correct. However, such exceptions are very rare, and capturing them would require semantic interpretation.

In the previous chapter, we presented our compound merging procedure in detail. It makes use of machine learning (conditional random fields, Crfs) to predict whether simple words should be merged into compounds. In the present chapter, we evaluate the accuracy of the Crf compound prediction models on clean data, before we integrate the whole compound processing pipeline into end-to-end SMT experiments in Chapter 13.

This allows us to get a first impression of the reliability of the different feature combinations we used to train the models. Accuracies are measured with respect to automatically obtained gold annotations, using the precision, recall and F-score metrics. Note that we already published parts of these results in Cap et al. (2014a).

Inflection Accuracy In our compound processing approach, we use Crfs not only to predict suitable merging points of simple words, but also to predict grammatical features such as case or number. As mentioned earlier, we reuse an inflection handling component developed by Marion Weller, Alexander Fraser and Aoife Cahill. In Weller (2009) and Fraser et al. (2012), the accuracies of different inflection prediction models have already been examined on clean data. The cascade of four Crf models we use was found to score highest, namely 94.29% accuracy without compound processing. We adopted their models without any major modifications and thus do not re-calculate clean data accuracies. The only modification we made was an improved compound selection in the final morphological generation process, which is independent of the Crf models themselves. Details are given in Section 11.5.2.

Structure The remainder of this chapter is structured as follows: in Section 12.1, we give details on the experimental settings we used; in Section 12.2, we report on compound prediction accuracies; Section 12.3 briefly summarises this chapter.

12.1. Setup

In this section, we describe the experimental settings we used to evaluate compound prediction accuracies, including the data, the creation of gold annotations, the different experiments (in terms of different feature combinations) and the way we evaluated the outcome of the predictions against the gold annotations.

Data In order to be able to use source language features for the Crfs, it is necessary to use a parallel text. We use data from the EACL 2009 workshop on statistical machine translation.80 The compound prediction Crfs are trained on the parallel training data (∼40 million words), but the frequencies of the target language features are derived from the monolingual training data, consisting of roughly 227 million words.

Gold Annotations Starting from the parallel training data, compounds are split using the compound splitting approach described in Chapter 7, and the whole text is transformed into the underspecified representation introduced in Section 11.1. The task of compound merging can be defined as the reversal of compound splitting. The compound splitting decisions are thus stored in the course of splitting and will be learned by the compound merging Crf as merging decisions.
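Schematically, and under assumed data structures (the actual pipeline stores these decisions internally during splitting; label names are ours), turning recorded split points into gold labels for the merging Crf amounts to the following:

```python
# Sketch: deriving gold merging labels from recorded split points
# (hypothetical data structures and label names, for illustration only).

def gold_labels(split_tokens, split_points):
    """
    split_tokens: tokens after compound splitting,
                  e.g. ["Apfel", "Baum", "blüht"]
    split_points: indices i where token i was split off from token i+1,
                  e.g. {0} if the original text contained "Apfelbaum"
    Returns one label per token: "MERGE" if the token should be re-merged
    with its right neighbour, "KEEP" otherwise.
    """
    return ["MERGE" if i in split_points else "KEEP"
            for i in range(len(split_tokens))]

print(gold_labels(["Apfel", "Baum", "blüht"], {0}))
# ['MERGE', 'KEEP', 'KEEP']
```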

Experiments An overview of the different feature combinations we used is given in Table 12.1. More detailed feature descriptions can be found in Section 11.3 above. In order to motivate our new feature combinations and to make them comparable to previous work, we trained one model (Sc) using only the target-language features described in Stymne and Cancedda (2011). Note, however, that this comparison only concerns the accuracy of the prediction model: we use the same morphology-aware compound splitting approach for all merging experiments, including Sc. Due to the underspecified representation we use, we excluded the character n-gram features described in Stymne and Cancedda (2011) from our Sc experiment.81 Besides Sc, we trained four more models based on two different target-language feature sets: one full feature set, with (St) and without (T) source language features, and one reduced target-language feature set, with (Str) and without (Tr) source language features.

80 http://www.statmt.org/wmt09

81 A closer re-implementation of Stymne and Cancedda (2011)’s approach is beyond the scope of this work, as it would require using a different splitting approach, factored SMT, no modifier normalisation, no inflection prediction, and a noisification of the Crf training data.

        Feature                                                 Experiment
No      Short Description                              Type    Sc  T   Tr  St  Str
1SC     underspecified representation of the word      string  X   X       X
2SC     mainPos of the word                            string  X   X       X
3SC     word occurs in a bigram with the next word     freq.   X   X       X
4SC     word combined to a compound with the next word freq.   X   X   X   X   X
5SC     word occurs in modifier position of a compound freq.   X   X       X
6SC     word occurs in a head position of a compound   freq.   X   X       X
7SC     word occurs in modifier position vs. simplex   string  X
8SC     word occurs in head position vs. simplex       string  X
7SC+    word occurs in modifier position vs. simplex   ratio       X   X   X   X
8SC+    word occurs in head position vs. simplex       ratio       X   X   X   X
9N      different head types the word can combine with freq.       X   X   X   X
10E     Pos of the corresponding English word          string              X   X
11E     English noun phrase                            bool.               X   X
12E     English gerund construction                    bool.               X   X
13E     English genitive construction                  bool.               X   X
14E     English adjective noun construction            bool.               X   X
15E     aligned uniquely from the same English word    bool.               X   X
16E     like 15E, but the English word contains a dash bool.               X   X
17E     like 15E, but not only unique links            bool.               X   X
18E     like 16E, but not only unique links            bool.               X   X

Table 12.1.: Overview of compound merging experiments.

Crf features: SC = features taken from Stymne and Cancedda (2011), SC+ = improved versions, N = new feature, E = features projected from the English input.

Experiments: Sc = re-implementation of Stymne and Cancedda (2011), T = use full Target feature set, Tr = use Target features, but only a Reduced set, St = use Source language features plus T, Str = use Source language features plus Tr.
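To make the feature sets more concrete, the following sketch assembles a per-token feature dictionary in the spirit of rows 1SC, 3SC, 4SC and 7SC+ of Table 12.1 (all count tables and names are hypothetical; the actual features are computed as described in Section 11.3):

```python
# Sketch of target-language Crf features for one token (illustration only;
# `bigram`, `compound`, `modifier` and `simplex` are hypothetical frequency
# tables derived from the monolingual training data).

def target_features(word, next_word, bigram, compound, modifier, simplex):
    return {
        "word": word,                                     # 1SC: the word itself
        "bigram_freq": bigram.get((word, next_word), 0),  # 3SC
        # 4SC: naive concatenation as a stand-in for the real merged form,
        # which the pipeline produces with SMOR
        "compound_freq": compound.get(word + next_word, 0),
        # 7SC+: modifier vs. simplex occurrence as a ratio (add-one smoothed)
        "modifier_ratio": modifier.get(word, 0) / (simplex.get(word, 0) + 1),
    }
```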

Evaluation We use the tuning set from the 2009 WMT shared task to evaluate the Crf models we trained on the respective training set of WMT 2009. It consists of 1,025 sentences. The gold annotations were obtained in the same way as for the training data, by remembering split points. The evaluation procedure consists of the following steps:

1. split compounds of the German wmt2009 tuning data set (= 1,025 sentences)

2. remember compound split points and store them as gold annotations

3. predict merging points with the Crf models
   → calculate F-scores to indicate Crf prediction accuracies

4. merge predicted words into compounds using SMOR
   → calculate F-scores on how properly the compounds were merged

F-scores are calculated using the following formula:

F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
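As a small self-contained check, precision, recall and F-score over per-token merging labels can be computed as follows (a sketch assuming parallel lists of gold and predicted labels; the label names are ours):

```python
# Precision, recall and F-score for predicted merging labels (sketch).

def f_score(gold, pred, positive="MERGE"):
    tp = sum(g == p == positive for g, p in zip(gold, pred))
    fp = sum(p == positive and g != positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```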

              all  |           compounds             |    particle verbs
                   |   all   2 parts 3 parts 4 parts |   all   2 parts 3 parts
labels      1,427  | 1,151     967     172      12   |   276     154     122
words       1,272  | 1,057     967      86       4   |   215     154      61

Table 12.2.: Distribution of merging labels and of words to be merged in the German wmt2009 tuning set.
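The relation between the two rows can be verified directly: an n-part compound corresponds to one merged word but carries n−1 merging labels. A quick check with the compound columns of Table 12.2:

```python
# Each n-part compound yields one merged word but n-1 merging labels.
compounds_by_arity = {2: 967, 3: 86, 4: 4}   # "words" row, compound columns
labels = sum(count * (n - 1) for n, count in compounds_by_arity.items())
assert labels == 967 + 172 + 12 == 1151      # matches the "labels" row
```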