

In the document The Induction of Phonological Structure (pages 128-135)

Automatic Syllabification

5.4 The problem of evaluating syllabification methods

Chapter 2 already mentioned that this study is not mainly concerned with a quantitative evaluation of its results. The main purpose of the present method is to get an idea of how much information about the syllable structure of words is already contained in the distribution of word-peripheral clusters. It is not meant to compete with existing syllabification methods in terms of overall syllabification results. Instead, the focus is on a qualitative analysis of the results and on the ramifications of the present approach for other (less well-documented) languages that seem to behave differently with respect to their syllabification principles. Another reason is that such an evaluation is not without difficulties. In the previous chapter, the results of the vowel/consonant discrimination methods could be given in full detail, as they only involved a limited number of instances to be classified. In this chapter, however, the number of cases is very large (or rather infinite) owing to the productive nature of the syllabification process, which makes them more difficult to evaluate. I argue that an evaluation of the syllabification results is not as straightforward as it might seem, mainly because these results differ depending on how the evaluation procedure is set up. The most important issue in this respect is what a gold standard against which the syllabification methods are compared should look like.

12The onset in the proper name Gwen is in some accounts analyzed as a single complex segment, viz. a labiovelar consonant.

13I never encountered such a case in my own experiments.

14Note, however, that this occurs very rarely and is not to be confused with the OMP mentioned above, where intermediate consonants in VCV sequences are always attributed to the following syllable. In the method presented here, such an analysis follows from the frequency of occurrence of the consonant in word-initial and word-final position.

Table 5.3: Different analyses of the syllabification of the English word happy (cited from Duanmu 2009). Underlined consonants are ambisyllabic.

Analysis     References
[hæ.pi]      Hayes (1995), Halle (1998), Gussmann (2002)
[hæp.i]      Selkirk (1982), Hammond (1999)
[hæp̲i]       Kahn (1976), Giegerich (1992), Kreidler (2004)
[hæp.pi]     Burzio (1994)

There are several reasons why a gold standard for syllabification is difficult to establish. Duanmu (2009) states that even for well-described languages like English, linguists do not agree on the correct syllabification of comparatively straightforward cases. For the English word happy, for instance, four different analyses have been proposed by different authors, as summarized in Table 5.3. The existing gold standards used to evaluate the syllabification methods proposed in the literature also differ regarding the correct placement of syllable boundaries.

Among others, the following divergent analyses can be found when comparing the gold standards in the NETtalk and CELEX databases (Table 5.4). Sometimes, the gold standard for syllabification is incorrect because the transcription of the word itself is wrong.

In the German CELEX database, for instance, the word Ingwer ‘ginger’ is transcribed as [ɪŋ-gvər] (instead of [ɪŋ-vɐ]), which gives rise to a strange word-internal consonant cluster [gv] that would not show up with the correct transcription.

Table 5.4: Examples of divergent analyses in CELEX and NETtalk syllabifications.

Reproduced from Bartlett et al. (2009).

NETtalk        CELEX          Word
ʧæs-taɪz       ʧæ-staɪz       chastise
rɛz-ɪd-əns     rɛ-zɪ-dəns     residence
ɑr-pɛʤ-io      ɑr-pɛ-ʤi-o     arpeggio
ðɛr-ə-baʊ-t    ðɛ-rə-baʊt     thereabout

Obviously, the evaluation gives different results depending on which gold standard is used. The question, then, is how to decide which syllabification should be considered correct. For the English examples given above, this question is difficult to answer.

The correct syllabification of a word can most easily be justified when the language has some phonological or morphological operation that is best described in terms of syllable structure. Such an operation provides indirect evidence for how the word should be syllabified (that is, evidence from language games, edge effects, domain effects, co-occurrence restrictions, etc.). In the case of the Australian language Arrernte [are], which has been argued to violate the tendency for CV to be the unmarked syllable type (see below), Breen and Pensalfini (1999:6f) provide evidence from reduplication processes to support their analysis of the syllable structure in that language. Such evidence is particularly necessary for cases that do not conform to the common patterns found in the languages of the world but instead display properties that are rarely attested cross-linguistically.

One of these cases is the observation that some languages seem to violate OMP by maximizing the codas of syllables rather than their onsets. Arguments in favor of such an analysis rest on a simpler description of morphological processes.15 Breen and Pensalfini (1999:7) claim that if the Arrernte syllable shape is VC(C), rather than (C)CV, reduplication is most straightforwardly described in terms of syllables. If VC(C) syllabification is assumed, the attenuative prefix is formed by /-elp/ preceded by the first syllable of the base. The attenuative form of the base empwar. ‘to make’ is therefore empwelpempwar..16 A similar argument can be made for languages with phonological operations that refer to syllable structure, such as syllable-final devoicing. If a voiced obstruent is realized as voiceless, this suggests that it is syllabified in the coda.17
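The reduplication pattern can be sketched as a small string operation. This is a minimal illustration that assumes plain orthographic input and a hypothetical vowel set {a, e, i, u}. It takes the first syllable under a VC(C) parse to be everything before the second vowel letter. That is a simplification for illustration, not Breen and Pensalfini's own formalization.

```python
# Hypothetical orthographic vowel letters, assumed for this sketch only.
VOWELS = set("aeiu")


def first_vc_syllable(word: str) -> str:
    """First syllable under a VC(C) parse: all letters before the second vowel."""
    seen_vowel = False
    for i, ch in enumerate(word):
        if ch in VOWELS:
            if seen_vowel:
                return word[:i]
            seen_vowel = True
    return word  # fewer than two vowels: the whole word is one syllable


def attenuative(base: str, suffix: str = "elp") -> str:
    """Prefix the first syllable plus /elp/ to the unchanged base."""
    return first_vc_syllable(base) + suffix + base


print(attenuative("empwar"))  # empwelpempwar
```

Under a (C)CV parse, the first syllable would instead be something like "e" or "em", and the attested form would not fall out as directly. This is the intuition behind Breen and Pensalfini's argument.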

Besides disagreement on the correct syllabification of words, another crucial aspect of evaluating syllabification methods is whether the test set should consist of a random sample of words of the language or whether its composition should be constrained in some way. If the evaluation data contain a large number of monosyllabic words, the results are much better than with polysyllabic words, because no consonant clusters have to be broken up. For the evaluation of their syllabification methods, Goldwater and Johnson (2005) distinguish words with any number of syllables from those with at least two syllables. Depending on the method tested, the differences in the percentage of correctly syllabified words range from a few points to almost 30% for German and up to over 50% for English. It is therefore easier to get good results when applying syllabification methods to languages with many monosyllabic words and few consonant clusters, such as Mandarin Chinese.
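The following sketch shows the effect of test-set composition. It computes word accuracy over all words and then over words with at least two syllables. The data and function names are invented for illustration. Monosyllables contain no boundary to place, so a method that leaves them unsplit gets them right for free.

```python
def syllable_count(word: str) -> int:
    """Syllables are separated by '.' in this toy notation."""
    return word.count(".") + 1


def word_accuracy(gold: list[str], predicted: list[str], min_syllables: int = 1) -> float:
    """Share of words (with at least `min_syllables` gold syllables) syllabified exactly right."""
    pairs = [(g, p) for g, p in zip(gold, predicted) if syllable_count(g) >= min_syllables]
    if not pairs:
        return 0.0
    return sum(g == p for g, p in pairs) / len(pairs)


# Invented toy data: three trivial monosyllables and two polysyllabic words.
gold = ["dog", "cat", "sun", "win.ter", "pen.cil"]
predicted = ["dog", "cat", "sun", "wint.er", "pen.cil"]
print(word_accuracy(gold, predicted))                   # 0.8
print(word_accuracy(gold, predicted, min_syllables=2))  # 0.5
```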

For these reasons, I will only present a quantitative evaluation for a few languages, in terms of the precision and recall of syllable boundaries at every position in the word (rather than the percentage of correctly syllabified words). The next section gives results for a number of languages. They show that the syllabification method presented here is fairly successful in determining syllable boundaries, despite its simple technique. After the quantitative evaluation, a few advantages of the present method are discussed with respect to data from less well-documented languages.

15See also Section 5.6 below for a discussion of the unusual cases of VC syllables and their relevance for the present approach.

16Reduplication patterns are usually described in terms of a CV-template rather than syllable structures. However, in the case of Arrernte, a description in terms of syllables rather than VC(C) shapes would be more elegant and at the same time account for other operations as well.

17See below for such an argument concerning the correct placement of syllable boundaries in the Northern pronunciation of Standard German.

5.5 Evaluation

The method presented above was evaluated on 1,000 randomly selected words (types) in orthographic representation for a sample of five languages. To this end, I manually created a gold standard of the syllabifications. The results are given in Table 5.5. The values in the table refer to the transitions between adjacent symbols in the words of the sample. Precision indicates the percentage of syllable boundaries placed by the method that are correct. Recall represents the percentage of boundaries in the gold standard that are detected by the method.

The F-score gives the harmonic mean of the recall and precision values.18
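Boundary-level precision, recall, and F-score can be computed by comparing boundary positions word by word. The sketch below is a hypothetical implementation. It assumes syllabified strings with '.' as the boundary marker and pools counts over all words (micro-averaging). The two-word example is made up.

```python
def boundary_positions(syllabified: str) -> set[int]:
    """Letter-transition indices (before the i-th letter) that carry a '.' boundary."""
    positions, letters = set(), 0
    for ch in syllabified:
        if ch == ".":
            positions.add(letters)
        else:
            letters += 1
    return positions


def evaluate(gold_words: list[str], predicted_words: list[str]) -> tuple[float, float, float]:
    """Micro-averaged boundary precision, recall, and F-score over all words."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_words, predicted_words):
        g, p = boundary_positions(gold), boundary_positions(pred)
        tp += len(g & p)  # boundaries in both
        fp += len(p - g)  # predicted but not in the gold standard
        fn += len(g - p)  # gold boundaries that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score


gold = ["te.ne.brae", "ro.sa"]
predicted = ["te.neb.rae", "ro.sa"]
print(tuple(round(x, 3) for x in evaluate(gold, predicted)))  # (0.667, 0.667, 0.667)
```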

Table 5.5: Percentage of correctly classified letter transitions tested on 1,000 randomly selected words (types) in orthographic notation.

Language   Precision (%)   Recall (%)   F-score (%)
Basque     92.68           82.44        87.26
Finnish    98.07           94.33        96.16
Latin      91.68           82.15        86.65
Turkish    95.64           95.50        95.57
Wolof      97.46           97.46        97.46

As can be seen in Table 5.5, for all languages in the sample, over 91% of the syllable boundaries placed by the method agree with the gold standard.

With the exception of Basque and Latin, the method also detected over 94% of the syllable boundaries in the gold standard. These results were achieved without any additional knowledge about the individual languages. In particular, all symbols were automatically classified as vowels or consonants on the basis of Sukhotin’s algorithm (cf. Section 4.1). The results on orthographic texts are very promising. This makes the method a potentially useful component for unsupervised morphology learning (see Section 2.5.3), which calls for a simple and fast technique that, unlike supervised learning, does not depend on specific information about the language under investigation.
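For readers unfamiliar with Sukhotin's algorithm, the following is a sketch of one commonly described formulation. It is not necessarily the exact variant used in Section 4.1. Adjacency counts between distinct letters are collected, and all letters start out as consonants. The letter with the highest positive score is then repeatedly reclassified as a vowel, and the scores of its frequent neighbours are reduced.

```python
from collections import defaultdict


def sukhotin(words: list[str]) -> set[str]:
    """Return the set of letters classified as vowels."""
    # Symmetric adjacency counts between distinct letters; the diagonal stays zero.
    counts = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                counts[a][b] += 1
                counts[b][a] += 1
    # Every letter starts as a consonant; its row sum is its initial score.
    sums = {x: sum(counts[x].values()) for x in letters}
    consonants, vowels = set(letters), set()
    while consonants:
        best = max(consonants, key=lambda x: sums[x])
        if sums[best] <= 0:
            break  # no remaining consonant has a positive score
        consonants.remove(best)
        vowels.add(best)
        # Letters that often neighbour the new vowel look less vowel-like.
        for c in consonants:
            sums[c] -= 2 * counts[c][best]
    return vowels


print(sorted(sukhotin(["sota", "tosa"])))  # ['a', 'o']
```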

5.6 Discussion

The previous section has shown that the syllabification method fares quite well in a quantitative evaluation for a number of languages, although it does not reach the level of performance of other unsupervised methods discussed in the literature (e.g., Goldwater and Johnson 2005; Bartlett et al. 2009).19 Yet the most interesting aspect of this approach is not its performance. Rather, it is able to account for languages in which intervocalic consonants are better analyzed as belonging to the preceding syllable, thereby violating OMP (as mentioned above). Languages with this behavior are very rare. In fact, VC as the preferred syllable type has mainly been claimed for a few closely related Australian languages.20

18The F-score is calculated as follows (cf. Jurafsky and Martin 2009:489):

$$F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (5.1)$$

Assuming that the VC analysis of these languages is correct,21 the present method would outperform other methods on these data, as it does not presuppose OMP as a general principle but infers it from the data. Approaches relying on the onset maximization principle would get all of the VC.V syllable boundaries wrong by definition. Interestingly, Breen and Pensalfini (1999) note that Arrernte also has only VC in word-initial position.22 Consequently, an approach based on word-peripheral clusters can correctly predict the lack of word-medial onsets. In the Arrernte case, consonants are not found word-initially but can be found in word-final position. This leads to a higher count for the division after the consonant in VCV sequences.23 Lacking sufficient data for Arrernte, I could not test the usefulness of the present approach. However, with a phonemic transcription of the language as described in Breen and Pensalfini (1999), the present method would obtain a correct syllabification of the VCV sequences, in contrast to approaches using OMP.
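The reasoning about VCV sequences can be sketched as follows. This is a minimal illustration, not the scoring used in this chapter. It assumes that vowels are already identified. As a hypothetical scoring function, it multiplies two add-one-smoothed counts: the candidate coda among word-final clusters and the candidate onset among word-initial clusters. Both toy lexicons are invented. In the first, no word begins with a consonant, mirroring the situation reported for Arrernte.

```python
from collections import Counter

# Assumption: vowels are already known (e.g. from a vowel/consonant classification step).
VOWELS = set("aeiou")


def peripheral_clusters(words):
    """Count word-initial and word-final consonant clusters ('' = vowel at the edge)."""
    initial, final = Counter(), Counter()
    for w in words:
        i = 0
        while i < len(w) and w[i] not in VOWELS:
            i += 1
        initial[w[:i]] += 1
        j = len(w)
        while j > 0 and w[j - 1] not in VOWELS:
            j -= 1
        final[w[j:]] += 1
    return initial, final


def best_split(cluster, initial, final):
    """Index splitting a medial cluster into coda | onset (hypothetical scoring)."""
    def score(i):
        # Add-one smoothing so unattested codas/onsets are not ruled out entirely.
        return (final[cluster[:i]] + 1) * (initial[cluster[i:]] + 1)
    return max(range(len(cluster) + 1), key=score)  # ties go to the earlier split


def syllabify(word, initial, final):
    vowel_positions = [k for k, ch in enumerate(word) if ch in VOWELS]
    cuts = []
    for v1, v2 in zip(vowel_positions, vowel_positions[1:]):
        cluster = word[v1 + 1:v2]
        if cluster:  # only consonant clusters between two vowels are split here
            cuts.append(v1 + 1 + best_split(cluster, initial, final))
    parts, prev = [], 0
    for cut in cuts:
        parts.append(word[prev:cut])
        prev = cut
    parts.append(word[prev:])
    return ".".join(parts)


vc_lexicon = ["arak", "iram", "ukar", "amir"]  # invented: no consonant-initial words
cv_lexicon = ["kara", "rika", "mako"]          # invented: no consonant-final words
print(syllabify("arak", *peripheral_clusters(vc_lexicon)))  # ar.ak
print(syllabify("kara", *peripheral_clusters(cv_lexicon)))  # ka.ra
```

The same code yields VC.V in the first lexicon and V.CV in the second. The direction of the split is read off the data rather than fixed in advance as OMP.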

Another piece of evidence for the relevance of word-peripheral clusters with respect to the syllabification of word-medial clusters is given by Vennemann (1972:12-13) for the Northern pronunciation of Standard German. In this variety, a final devoicing rule operates on syllable-final obstruents and thus allows us to detect the syllable boundary in cases where an obstruent is followed by another consonant word-medially.

Interestingly, the Northern variety shows a contrast between medial clusters that can occur in word-initial position (e.g., gl as in Glas ‘glass’ or bl as in blond ‘blonde’) and those that are not allowed initially (e.g., dl). In the former case, both consonants are affiliated with the following syllable, in line with the Law of Initials mentioned above, and therefore do not trigger the final devoicing rule: e.g., liebeln [liːbəln] ‘to flirt’ vs. (ich) lieble [liːblə] ‘(I) flirt’; segeln [zeːgəln] ‘to sail’ vs. (ich) segle [zeːglə] ‘(I) sail’.

19In the literature on automatic syllabification, results are typically given in terms of word accuracy, i.e., the percentage of correctly syllabified words. Bartlett et al. (2009:312), for instance, report a word accuracy of 93.16% for their experiments on the CELEX dataset. For the purpose of ULM, however, an evaluation in terms of correctly placed syllable boundaries is more appropriate.

20See Das grammatische Raritätenkabinett, where their preference for VC has been included in the collection of rare phenomena in the languages of the world: http://typo.uni-konstanz.de/rara/nav/browse.php?number=101 (accessed on September 1, 2011).

21See the discussion in Blevins (1996:230-232) about the controversial analyses of VC.V syllabification in Kunjen (e.g., Sommer 1970) and Borgstrøm (1937) for the Barra dialect of Gaelic.

22See Sommer (1970) for a similar argument concerning the syllable structure in the Kunjen family, and Dixon (1970) for a different analysis of the lack of word-initial consonants. Although all stems begin with a vowel, this seems to be an inherently unstable structure, as the same forms also occur with an initial consonant in free variation.

23Goldwater and Johnson (2005) also remark that in one of their experiments, initial parameter weights might in some cases lead to coda maximization rather than onset maximization in English. However, this was an artifact of the method, in which the model learns to recreate errors produced in previous steps.

However, where the medial cluster cannot occur in initial position, the first consonant is attributed to the preceding syllable and is thus devoiced by the final devoicing rule: radeln [raːdəln] ‘to go by bicycle’ vs. (ich) radle [raːtlə] ‘(I) go by bicycle’. The correct pronunciation of these words in the Northern variety can thus be predicted by taking permissible word-initial clusters into account.24
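The Northern German pattern can be expressed as a two-step procedure. First, the word is syllabified by the Law of Initials. Second, a syllable-final obstruent is devoiced. The sketch below is hypothetical. It handles only two-consonant clusters and uses a hand-made toy inventory of initial clusters instead of one induced from a corpus. Its devoicing table is also partial.

```python
# Partial, illustrative voiced-to-voiceless obstruent mapping.
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}


def northern_realization(before: str, cluster: str, after: str, initials: set[str]) -> str:
    """Syllabify V-CC-V by the Law of Initials, then devoice a syllable-final obstruent."""
    if cluster in initials:
        # Attested word-initially: the whole cluster is the onset of the next syllable.
        return before + "." + cluster + after
    # Not a possible word onset: the first consonant closes the preceding syllable
    # and, being syllable-final, undergoes final devoicing.
    coda, onset = cluster[0], cluster[1:]
    return before + DEVOICE.get(coda, coda) + "." + onset + after


# Toy inventory of word-initial clusters (gl as in Glas, bl as in blond; no dl).
INITIALS = {"gl", "bl"}
print(northern_realization("liː", "bl", "ə", INITIALS))  # liː.blə
print(northern_realization("raː", "dl", "ə", INITIALS))  # raːt.lə
```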

The importance of word-peripheral clusters for the correct syllabification of words has also been acknowledged in earlier approaches (see the description of the Legality approach in Bartlett et al. 2009). It is also supported by findings in Goldwater and Johnson (2005), where a bigram model for syllabification improves after training with expectation maximization (EM) whereas a positional model does not. The reason is that the bigram model (unlike the positional model) can generalize what it learns about clusters regardless of where they occur within words. That is, information on possible word-peripheral clusters is helpful in determining syllable breaks for word-medial clusters.

Moreover, the influence of peripheral clusters on the syllabification of word-medial consonant sequences is not restricted to syllable types but sometimes also holds for individual consonants. In Chamorro, for instance, Topping (1973:39) describes the syllabification of intervocalic consonants as observing OMP. However, this does not apply if the consonant is the glottal stop <’>. In that case the syllable division occurs after the consonant, yielding the syllabification <na’.i> ‘to give’.

The interesting observation in this respect is that, phonologically, the glottal stop never occurs at the beginning of a word in Chamorro, whereas all other consonants (with the exception of /w/) do (cf. Topping 1973:36).25 Individual consonants (or consonant clusters) may thus behave differently with respect to syllabification. This is an argument for the approach taken here, which takes into account not only syllable types (as in O’Connor and Trim’s 1953 approach) but the actual consonant clusters.

This is also helpful for clusters with sibilant consonants that do not conform to the sonority principle (e.g., onsets with the sound s followed by a plosive). The present method does not have to treat them differently, because their co-occurrence simply follows from the fact that these clusters are particularly frequent in word-peripheral position. In the sonority account, special constraints have to be assumed for such cases (cf. Bartlett et al. 2009:311), as sonority does not strictly increase within these clusters, which therefore violate the principle. Nevertheless, such clusters are found in languages like English.

One disadvantage of the present method is that it is sensitive to the frequencies of individual clusters. As a result, it sometimes breaks up clusters that are better analyzed as tautosyllabic (one of the few examples in the Latin corpus was teneb.rae).26 This could be due to the method's dependence on the corpus and its coverage of word-peripheral clusters.

24Lacking a sufficiently large corpus of transcribed word forms, I could not test the method on actual language data of this variety.

25Topping notes that phonetically there is a glottal stop preceding every word-initial vowel, yet this is totally predictable in this position and therefore not phonemic. With respect to the semiconsonant /w/, Topping (1973:34) states that it could be eliminated from the phonemic inventory. In fact, the writing system uses the vowel <u> to mark its presence in words such as Guam.

26The correct syllabification in the Latin example can be tested by looking at the stress pattern of the language, which is determined by the weight of the penultimate syllable of the word. If the penultimate syllable is heavy (with syllable-final b), as in the suggested syllabification above, it would be stressed, which is incorrect. If it is light, as in the correct syllabification te.ne.brae, the antepenultimate syllable is stressed.
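The stress test described in footnote 26 can be sketched as a function that applies the Latin penultimate rule to a syllabified form. The sketch assumes that long vowels would be marked with macrons and treats ae, au, and oe as diphthongs. It ignores complications such as qu and enclitics.

```python
LONG_VOWELS = set("āēīōū")  # vowel length assumed to be marked with macrons
DIPHTHONGS = ("ae", "au", "oe")
VOWELS = set("aeiouy") | LONG_VOWELS


def is_heavy(syllable: str) -> bool:
    """Heavy if the nucleus is long (macron or diphthong) or the syllable is closed."""
    if any(ch in LONG_VOWELS for ch in syllable):
        return True
    if any(d in syllable for d in DIPHTHONGS):
        return True
    return syllable[-1] not in VOWELS  # closed syllable: ends in a consonant


def stressed_syllable(syllabified: str) -> str:
    """Latin penultimate rule: stress a heavy penult, otherwise the antepenult."""
    syllables = syllabified.split(".")
    if len(syllables) <= 2:
        return syllables[0]
    penult = len(syllables) - 2
    return syllables[penult] if is_heavy(syllables[penult]) else syllables[penult - 1]


print(stressed_syllable("te.neb.rae"))  # neb (closed penult, hence heavy)
print(stressed_syllable("te.ne.brae"))  # te (light penult, stress on the antepenult)
```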

Other methods (especially supervised methods as described in Bartlett et al. 2009) perform better when compared to established gold standards for syllabification. Even so, it is interesting to see how much information about syllable structure can be inferred from the data without providing any additional language-particular knowledge. In addition, the present method can be used in unsupervised approaches to text data, as no assumption about the phonological category of individual symbols is necessary.

5.7 Conclusions

A complete model of syllabification involves more than what has been presented in this chapter. The method proposed here is restricted to single words. It takes into account neither resyllabification across word boundaries nor some other criteria that might influence the actual syllable structure of words, such as stress and morphological boundaries. Nevertheless, the discussion of our approach shows that expanding the range of languages to other families and areas of the world can challenge some of the well-established findings that are used for inferring linguistic knowledge. Many approaches developed in the field of computational linguistics are only tested and optimized for one language (mostly English) or a small set of closely related languages, yet they are often claimed to be applicable to any natural language (cf. Bender 2009, 2011). One of the aims of this chapter was to show how insights gained from typological research can inform the design of language-independent computational systems. Typology provides information on both the variation in language structures and its limits.

This chapter has shown that the constraints on the combination of sounds within words are also helpful in detecting syllable boundaries within words. Unlike previous approaches to unsupervised syllabification, the method presented here can easily be integrated into a language-independent system that aims at the induction of morphological structures, as described in Section 2.5.3.
