
The principle of similar place avoidance

From the document The Induction of Phonological Structure (pages 164-188)

Place of Articulation

7.2 The principle of similar place avoidance

It has been known for quite some time that there are systematic gaps in the phonotactics of triliteral verbal roots of Semitic and that only a small proportion of roots which in principle could be possible according to the consonant inventories of the respective languages are actually attested in their lexicons. A number of authors have investigated these co-occurrence restrictions for Semitic languages (see Bachra 2001 for an overview). Arab and Hebrew grammarians already noted in the Middle Ages that certain pairs of consonants rarely or never occur in Arabic and Hebrew roots.

They observed that such restrictions are mainly to do with the place of articulation of the respective consonants. In his 1880 grammar of the Arabic dialect of Egypt, W. Spitta-Bey, referring to an older source, remarks that “the Arabic language tends to combine those letters in a word whose points of formation are remotely distant, such as gutturals and dentals.”1 The ubiquitous example of the Arabic root √ktb is a perfect illustration of this principle as all three consonants are produced at different points in the vocal tract. Although this phenomenon is for the most part not absolute but allows for counterexamples, it is nevertheless a salient component of the language.

Arab lexicographers made use of it as a criterion for determining loan words in the Arabic lexicon (Greenberg 1950). More recent studies by Frisch (1996) and Frisch et al. (2004) also demonstrate that native speakers of Arabic are aware of such statistical tendencies.

The basic observation of similar place avoidance (SPA) is that successive homorganic consonants are avoided in non-derived forms. For instance, the English stem bit observes SPA because its two consonants are pronounced in different places of the oral cavity (b is labial, t is coronal), whereas the word map does not (both m and p are labial and therefore articulated in the same position). The principle is usually not stated in absolute terms, prohibiting the co-occurrence of consonants sharing the same place feature. Rather, it is seen as a statistical tendency where the occurrence of such pairs is assumed to be underrepresented in the data. In other words, languages are free to exploit all possibilities of consonant combinations, even those where both consonants are identical (most famous in child language with frequent words such as mum or dad) or occasionally with non-identical consonants such as in the above example of map.2 Yet across a large number of consonant sequences such cases are considered to be only marginal.

In later work, it was shown that such constraints are not confined to segmental material only. Looking at suprasegmental features in Bantu languages, Leben (1973) argues that a similar restriction as to the avoidance of likes holds for the co-occurrence of tones in underlying representations. This was formulated in the framework of autosegmental phonology (Goldsmith 1976) as the obligatory contour principle (OCP): “At the melodic level, adjacent identical elements are prohibited” (McCarthy 1986:208). The principle was subsequently interpreted not only in terms of its role in tonal phonology but as a constraint on the organization of segmental phonology in the representation of phonemic melodies and tiers in nonlinear morphology for the analysis of Arabic morpheme structure constraints (cf. McCarthy 1986; Clements and Hume 1994).

1 My translation of Spitta-Bey’s (1880) original version: “Nun hat, wie schon längst bemerkt ist . . . , die arabische Sprache die Neigung, solche Buchstaben in einem Worte zu vereinigen, deren Organe weit von einandern entfernt liegen, wie Kehllaute und Dentale.” (cf. Greenberg 1950)

2 It is an interesting observation that the early speech of children is characterized by the opposite effect of SPA (cf. Hansson 2010:128 and references therein). Fikkert and Levelt (2010) note that in the early stages of language acquisition Dutch children are known to produce words where both consonants and vowels tend to share the same place of articulation feature, with greater place differentiation being achieved in the course of learning their mother tongue. In this respect, it is not surprising that the most frequent words that do violate the principle are present and salient from the very beginning (e.g., dad, mum, papa, etc.) in the acquisition process.

In the last sixty years, a number of scholars have investigated the phenomenon of SPA systematically for a variety of Semitic languages. In his own contribution to the topic, Bachra (2001) gives an overview of earlier work on co-occurrence restrictions in Semitic verbal roots, where he summarizes the results of studies on root structure constraints in Classical Arabic, Modern Standard Arabic, Biblical Hebrew, Akkadian and Modern Israeli Hebrew. These studies differ along several aspects in their investigation, among other things in the way in which consonants are subdivided into groups and the phonological frameworks in which their analyses are couched. It has to be noted that some of these works are not confined to the investigation of place avoidance but also consider the avoidance of manner features, or more generally deal with co-occurrence restrictions of any combinations of sounds, irrespective of their feature specification.

What at some point may have been thought to be a genealogical trait of the Semitic family had been postulated very early for German by W. F. Twaddell (1939, 1940) on the basis of a large collection of phonologically transcribed word forms. To my knowledge, his study is the first to investigate a non-Semitic language with respect to the avoidance of place features in successive consonants. He examined stressed syllables in a list of 37,500 German mono- and disyllabic word forms and came to the conclusion that the consonants surrounding the stressed vowel in these forms do not only show a tendency for SPA but also for the avoidance of similar manner and voice features in what he called the repulsion of likes (Twaddell 1940:46). In fact, he found an under-exploitation of the combination of similar consonants before and after the stressed vowel for a variety of different distinctions. Twaddell’s work is important not only because it is a systematic analysis of co-occurrence restrictions in a non-Semitic language but also because it extends the earlier focus on place features to other phonological categories such as manner of articulation and voice distinctions. The restriction to forms of one and two syllables has been made based on purely practical considerations in order not to unnecessarily increase the bulk of material that has to be considered without any corresponding increase in the variety of phonological combinations. Twaddell notes that pilot experiments indicated that all the phenomena which appear in multi-syllabic word forms can also be found in mono- and disyllabic words.

A repulsion of likes in the sense of Twaddell has also been confirmed later for German and its ancestor language. Plank (1981:221f), for instance, observes that German tends to avoid identical consonants in initial and final positions of monosyllabic stems (allowing for differences in voicing). Those verbs with identical initial and final consonants which do exist are mostly onomatopoetic and all morphologically regular, indicating that they are not basic verbs, but represent a technique of word formation, perhaps derivative of reduplication as it is especially common in child or child-directed speech. Davis (1991) observes a constraint on English consonants in sCVC sequences. In a lexicon of 20,000 words there is only one word (skunk)3 where the two consonants flanking the vowel are either both labial or both velar. In fact, only coronal /t/ is not subject to the constraint that prevents homorganic consonants from co-occurring in sCVC sequences (e.g., state, stitch, stood).4 With respect to the avoidance of manner features, Iverson and Salmons (1992) note among other things that Stop-V-Stop roots were very rare in Proto-Indo-European, representing only 3.5% of a lexicon of more than 2,000 items. This also suggests that the repulsion of likes is not restricted to place features but that SPA represents a special case of a more general principle.

More recently, a number of statistical studies have confirmed SPA for other non-Semitic languages; e.g., Kawahara et al. (2005) for Yamato Japanese [jpn] and Coetzee and Pater (2006) for Muna [mnb] (see also the references in Frisch et al. 2004:212-213 and Hansson 2010:127). Pozdniakov and Segerer (2007) found impressive support for it in their sample of Atlantic and Bantu languages of Niger-Congo and further tested its cross-linguistic validity on a genealogically and typologically diverse set of languages or language groups from all parts of the world (Mande, Kwa, Ubangi, Sara-Bongo-Bagirmi, Chadic, Malagasy, Indo-European, Nostratic, Mongolian, Basque, Quechua, Kamilaroi, Port Moresby Pidgin English) with a similar outcome.

In what follows, I will test the validity of SPA on a cross-linguistic sample of word forms in order to show that the principle is indeed more widespread than previously assumed. The focus will be on place features as they seem to be avoided most strongly in such contexts. Furthermore I will investigate the principle in more detail for the languages and the material in the CELEX lexical database (Baayen et al. 1995) in order to test some of the assumptions that have been made in the literature as to the context in which SPA is strongest in surface forms and to what extent other parameters that cannot be found in non-derived forms influence the result.

7.2.1 Testing SPA

The fact that SPA is a statistical tendency rather than an absolute constraint makes it necessary to employ an evaluation measure that shows the degree to which the constraint is active in the language. Before presenting the results for the individual languages and data sources I want to explain in more detail how the statistical evaluation has to be interpreted.5

One of the most important decisions that have to be made when testing the principle is which places of articulation are taken to be similar, i.e., which consonants fall into the respective subdivisions and how many of these subcategories are postulated. Excluding combinations with non-place distinctions (such as place-manner combinations in a group like coronal stops), the number of subdivisions that have been made in the literature ranges from only three (labial, coronal, dorsal) in Mayer et al. (2010b) to nine (labials, interdentals, sibilants, alveolars, prepalatals, postpalatals, velars, pharyngeals and laryngeals) in Cantineau (1946). In their cross-linguistic investigation of SPA, Pozdniakov and Segerer (2007) opted for a four-way distinction into labial (P), dental (and alveolar) (T), (alveo-)palatal (C) and velar (K) consonants. A closer look at their results, however, reveals that the medial categories of T and C behave similarly with respect to the other categories in their tendency for SPA. For this and other reasons to be explained below, both categories were merged in earlier work (Mayer et al. 2010b), yielding a three-way distinction into labial (L), coronal (C) and dorsal (D).6 This is also confirmed by the results on the clustering of individual consonants across languages in Section 7.3 where three major clusters emerge from the data (Figure 7.6). Therefore I decided to use a three-way distinction with labial (L), coronal (C) and dorsal (D) places of articulation (LCD), as is common in the phonological literature (cf. Paradis and Prunet 1991).7 The classification of IPA and ASJP symbols for the data in the ASJP database explained below is given in Table 7.1.

3 On the assumption that English has underlying velar nasals /ŋ/ (Davis 1991:57).

4 Davis (1991) takes this as further evidence for the special status of coronals (see Section 7.4).

5 This section only describes those statistical techniques that are used in evaluating the strength of SPA. More information on the calculation and interpretation of χ² and the φ value is given in Section 3.2.

Table 7.1: Assignment of consonants to symbols (cf. Brown et al. 2008:26-28 for the ASJP orthography). All varieties of “click”-sounds have been ignored because they are lumped together in one symbol in the ASJP orthography and thus merge consonants with different place features.

| LCD | PTCK | ASJP transcription | IPA transcription |
|---|---|---|---|
| L (labial) | P | p; b; m; f; v; w | p, ɸ; b, β; m; f; v; w |
| C (coronal) | T | 8; 4; t; d; s; z; c; n; S; Z | θ, ð; n̪; t; d; s; z; ts, dz; n; ʃ; ʒ |
| C (coronal) | C | C; j; T; l; L; r; y; 5 | tʃ; dʒ; c, ɟ; l; ʟ, ɭ, ʎ; r, ɾ; j; ɲ |
| D (dorsal) | K | k; g; x; N; q; G; X; 7; h | k; g; x, ɣ; ŋ; q; ɢ; χ, ʁ, ħ, ʕ; ʔ; h, ɦ |
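The assignment in Table 7.1 can be read as a simple lookup from ASJP symbols to the three LCD classes. The following is a minimal sketch of such a lookup, not the procedure actually used for the analyses. The names `LCD_CLASSES`, `PLACE_OF`, `place_sequence` and `place_pairs` are hypothetical, and it is an assumption of the sketch that symbols missing from the table (vowels, clicks) are simply skipped, so that successive consonants are paired across intervening vowels.

```python
# ASJP symbols per LCD class, copied from the ASJP column of Table 7.1.
LCD_CLASSES = {
    "L": "pbmfvw",
    "C": "84tdszcnSZ" + "CjTlLry5",  # T and C of the four-way split, merged
    "D": "kgxNqGX7h",
}

# Invert the table into a symbol -> class lookup.
PLACE_OF = {sym: place for place, syms in LCD_CLASSES.items() for sym in syms}


def place_sequence(form):
    """Return the LCD class of every consonant in an ASJP-transcribed form.

    Assumption: symbols that are not listed in the table (vowels, clicks)
    are ignored rather than raising an error.
    """
    return [PLACE_OF[sym] for sym in form if sym in PLACE_OF]


def place_pairs(form):
    """Return the pairs of place classes of successive consonants."""
    seq = place_sequence(form)
    return list(zip(seq, seq[1:]))


print(place_pairs("ktb"))  # the Arabic root cited above
print(place_pairs("map"))  # two labials in succession
```

Under these assumptions a pair such as `("L", "L")` counts as a same-place combination, while pairs with two different classes conform to SPA.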

Additionally, analyses may also differ concerning the possibility of ignoring certain consonants with regard to their place specification. Twaddell, for instance, in his study of German word forms considered each of the consonants /j, h/ (which he called initials because of them being restricted to the initial syllable position) and /r, l/ as a separate group with respect to his analysis of place combinations. In the following analyses, they are treated under their respective place categories.

In each of the tables below, the occurrences of consonant pairs in different positions have been analyzed in terms of the place features which the individual consonants have. In line with previous research in this area (e.g., Pierrehumbert 1993), I distinguish between different values that are of relevance for the statistical evaluation of the results. These values can be calculated both for individual consonant combinations and for feature categories, in which case all consonants subsumed under a category contribute to the calculations. First of all, the number of times the respective feature combination has been counted in the data is given as the observed frequency. This observed frequency is compared to the expected frequency, which would be assumed if there were no co-occurrence restrictions that would constrain the possibilities of combinations for consonants.

6 C in the three-way distinction comprises both T and C in the four-way distinction.

7 The radical (or pharyngeal) consonants are grouped with the dorsals in this classification.

Let us go through an example to see how the expected frequency is calculated.8 In the example in Table 7.3 below, we find that, of the 2,010 combinations that have been extracted from the 1,005 triconsonantal Maltese roots (position one to two, and two to three), the category dorsal occurs in second position in 80 + 276 + 66 = 422 cases (all cells where dorsal is the second member of the combination). This is about 21% of the total number of combinations considered. If there were an equal exploitation of combinations of categories in first and second position of the consonant pair, we would expect to find that about 21% of the combinations with any feature category in first position would have the dorsal category in second position. For example, we find that of the 386 combinations with labial in first position, 80 have the category dorsal in second position, which is roughly 21%. Here, the number of combinations found is what we would expect under the assumption of independence. If we look at the category of dorsal consonants rather than labial in first position, however, we find 573 combinations with this category (127 + 380 + 66 = 573 cases in all cells where dorsal is the first member of the combination). Thus, we would expect about 21% of those cases (roughly 120 combinations) to also have dorsal in second position. Yet we only observe 66 combinations of the dorsal category in first and second position. The observed frequency of dorsal-dorsal combinations of 66 is therefore much lower than the expected frequency of 120, which is exactly what SPA would predict for combinations of identical categories.
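The arithmetic of this example can be retraced in a few lines. The sketch below uses only the marginal totals quoted in the text (573, 422, 386 and 2,010). The function name `expected_frequency` is hypothetical and is not taken from the original analysis.

```python
def expected_frequency(first_total, second_total, n_pairs):
    """Expected count of a category pair under independence.

    first_total  -- how often the category occurs in first position
    second_total -- how often the category occurs in second position
    n_pairs      -- total number of consonant pairs considered
    """
    rel_first = first_total / n_pairs
    rel_second = second_total / n_pairs
    return rel_first * rel_second * n_pairs


N_PAIRS = 2010       # two pairs from each of the 1,005 triliteral roots
DORSAL_FIRST = 573   # 127 + 380 + 66
DORSAL_SECOND = 422  # 80 + 276 + 66
LABIAL_FIRST = 386

# Dorsal-dorsal: clearly more is expected than the 66 cases observed.
print(round(expected_frequency(DORSAL_FIRST, DORSAL_SECOND, N_PAIRS)))
# Labial-dorsal: close to the 80 cases observed.
print(round(expected_frequency(LABIAL_FIRST, DORSAL_SECOND, N_PAIRS)))
```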

In general, the expected relative frequency of a combination is calculated by multiplying the relative frequency of the category in first position by the relative frequency of the category in second position. The expected absolute frequency can then be obtained by multiplying the relative frequency by the number of combinations in total (2,010 in the example above). In our example, the absolute expected frequency for the dorsal-dorsal combination is 573/2,010 · 422/2,010 · 2,010 ≈ 120. This value, however, is dependent on the number of combinations that are considered in total. In order to get an idea of the strength of the divergence between expected (E) and observed (O) frequencies, which is independent of the total number of combinations that are taken into account, we can also consider the proportionate discrepancy ∆ of both frequencies, which is computed with the following formula:

$$\Delta = 100 \cdot \frac{O - E}{E} \tag{7.1}$$

This discrepancy would be −45% for the dorsal-dorsal combination in the example above. In the case of negative discrepancies, i.e., when the observed frequency is lower than the expected frequency, this value has a lower bound of −100%, which is reached when the combination is not observed at all. Yet in the positive case, there is no upper bound on the discrepancy, as the observed frequency can be more than twice the expected frequency and ∆ thus higher than +100%.

8 See Section 3.2 for a similar example.

In some studies (cf. Pierrehumbert 1993), an alternative value, the ratio of observed and expected frequency O/E, is used, which suffers from the same disadvantage that it is not bounded to an upper limit. In the case of negative dependencies, this value is between 0 and 1. If the observed frequency is higher than the expected frequency, the value is more than 1, with no upper limit. In what follows, the ∆ value will be used as its sign already indicates the tendency and can therefore more quickly be assessed.
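The two measures can be compared directly on the dorsal-dorsal example. This is a minimal sketch with hypothetical function names (`discrepancy`, `oe_ratio`). The expected value is recomputed from the marginal totals given in the text.

```python
def discrepancy(observed, expected):
    """Proportionate discrepancy in percent, following formula (7.1)."""
    return 100.0 * (observed - expected) / expected


def oe_ratio(observed, expected):
    """Ratio of observed to expected frequency (O/E)."""
    return observed / expected


# Dorsal-dorsal combination from the Maltese example above.
OBSERVED_DD = 66
EXPECTED_DD = 573 * 422 / 2010

print(round(discrepancy(OBSERVED_DD, EXPECTED_DD)))  # negative: underrepresented
print(oe_ratio(OBSERVED_DD, EXPECTED_DD))            # between 0 and 1

# The lower bound is reached when a combination never occurs.
print(discrepancy(0, EXPECTED_DD))
```

Both values carry the same information, since ∆ = 100 · (O/E − 1). The sign of ∆ simply makes under- and overrepresentation visible at a glance.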

In contrast to the discrepancy values described above, the φ coefficient is normalized as it always falls in the interval [−1, 1] (see Section 3.2) and has various other properties that make it more appropriate for a statistical evaluation. In the tables below, I also include the φ value as it will be used later as the distance between consonants in the dissimilarity matrix on which the clustering and MDS methods are based.
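For a single category combination, φ can be obtained from a 2×2 table that contrasts the category with all other categories in first and in second position. The sketch below assumes the standard textbook definition of φ for two binary variables and the relation χ² = N · φ² (without continuity correction). Whether this matches the exact procedure of Section 3.2 is an assumption. The function name `phi_and_chi2` is hypothetical, and the four cell counts are derived from the totals quoted in the Maltese example.

```python
import math


def phi_and_chi2(a, b, c, d):
    """Phi coefficient and chi-square statistic for a 2x2 table.

    a -- category in first AND second position
    b -- category in first position only
    c -- category in second position only
    d -- category in neither position
    Assumes the standard definition without continuity correction.
    """
    n = a + b + c + d
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    phi = (a * d - b * c) / denom
    chi2 = n * phi ** 2
    return phi, chi2


# Dorsal-dorsal cells derived from the totals in the text.
N_PAIRS = 2010
a = 66                    # dorsal in both positions
b = 573 - a               # dorsal first, non-dorsal second
c = 422 - a               # non-dorsal first, dorsal second
d = N_PAIRS - a - b - c   # dorsal in neither position

phi, chi2 = phi_and_chi2(a, b, c, d)
print(phi, chi2)
```

A negative φ indicates that the combination occurs less often than expected under independence, in line with the negative ∆ for the same cell.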

In addition, each category combination is tested for its significance with a χ² test for two binary variables with one degree of freedom. If the χ² value is above 10.83, the combination is highly significant (p < 0.001; marked with three asterisks ***); if it is above 6.64, it is significant with p < 0.01 (**); finally, if it is above 3.84, it is significant with p < 0.05 (*). The interpretation of this value is that the null hypothesis of independence of these combinations is rejected with the probability that this result could have been obtained by chance being less than one in a thousand (for p < 0.001).9

7.2.2 Maltese results

In this section, I present the results for SPA on the list of Maltese verbal roots. Maltese is a Semitic language, which has been in intensive language contact with (Sicilian) Italian and English in the course of its history. To the best of my knowledge, Maltese has never been investigated with respect to co-occurrence restrictions as has been the case for other Semitic varieties.10 The following results are based on a recently compiled comprehensive list of 1,958 verbal roots, including recent borrowings that have been fully assimilated into the root-and-pattern system of the Semitic type (see Section 3.5 for more information on how the data have been collected). The items in the list have been subdivided into three different classes: 3-consonantal (or triliteral), 4-consonantal and weak (involving a glide as one of the three consonants) roots. The bulk of the present investigations will be restricted to the class of triliteral roots, which contains 1,005 items, excluding weak triliteral roots with a glide (w or j) as one of their consonants.11

I opted for a three-way distinction of Maltese consonants into the categories labial, coronal and dorsal as given in Table 7.2. The list of verbal roots is transcribed in the standard orthography of Maltese, which shows remnants of earlier stages of the language, but in general is reasonably close to a phonemic transcription with respect to place of articulation distinctions. Note that the “silent” consonants <għ> and <h> are classified according to the place of articulation of the respective sounds which they

9 Likewise, there is only one chance in a hundred (p < 0.01) or one chance in twenty (p < 0.05) that this could have been obtained by coincidence.

10 The only study that I am aware of is Frisch et al. (2004:211-212), who investigated the structure of a list of Italian loanwords in Maltese which have been fully integrated into the root-and-pattern system. They found that those words that have been borrowed from Italian are more in accordance with SPA than a comparable sample of Italian words, suggesting that the language tends to only
