

1.2.4 The Featurally Underspecified Lexicon Model

Like the original Cohort model (Marslen-Wilson, 1987; Lahiri & Marslen-Wilson, 1992), the Featurally Underspecified Lexicon (FUL) model (Lahiri, 1999; Lahiri, 2000; Lahiri & Reetz, 2002; Lahiri & Reetz, in press) is a localist rather than a distributed model: each morpheme has a single, unique lexical representation. This representation consists of hierarchically structured features that make up the segments of the word. There are no clear-cut boundaries between segments, as these are not present in uttered speech either. During speech processing, the perception system extracts rough acoustic features from the signal and transforms them into phonological features. These are mapped directly onto the featural word representations at the lexical level. There is no conversion into segments and therefore no intermediate representation. The strength of the FUL model is that it represents lexical items in a way that copes with a great deal of variance in the acoustic signal, and it is supported by linguistic theory in that it can explain diachronic and synchronic language phenomena. This is accomplished by not fully specifying all possible features of the phonemes. That is, for each phoneme the mental lexicon stores sufficient features to clearly identify it and distinguish it from all other phonemes. However, features that are redundant and can be derived by rule, or features that can vary, for instance due to segmental or prosodic context or to dialectal and speaker characteristics, are not stored in the mental lexicon: they are underspecified. Exactly which features are represented is determined by universal properties and language-specific requirements. Therefore, the same segment can have different lexical representations in different languages (Lahiri & Marslen-Wilson, 1992; Winkler, Lehtokoski, Alku, Vainio, Czigler, et al., 1999), and the representation of a segment can undergo change as the language itself changes over time (Ghini, 2001).

In the process of word recognition, all features are extracted from the speech signal, regardless of whether they are represented in the mental lexicon or not. These features are then compared to the lexical entries of all morphemes in the mental lexicon, and all items that are compatible with the extracted feature information are activated. Models of language processing usually distinguish between a match and a mismatch in lexical access. A match means that the feature in the signal is the same as the feature in the lexical representation. A mismatch occurs if a certain feature is extracted but the lexicon contains a different feature that is incompatible with the extracted one. In case of a match, the lexical candidate receives activation, while in case of a mismatch, the lexical item is not activated or, if it has been activated before, is removed from the cohort of possible word candidates. The FUL model extends this binary logic into a ternary matching logic by adding the case of a nomismatch. Such a nomismatch condition arises either if no feature is extracted from the signal although features are stored in the lexicon, or if a feature is extracted from the signal but not represented in the lexicon, i.e. there is an empty slot. In case of a nomismatch, the item stays in the cohort of possible candidates but receives less activation than in case of a full match between signal and representation. A scoring formula allows ranking the candidates according to their goodness of fit:

Score = (Nr. of Matching Features)² / [(Nr. of Features from Signal) × (Nr. of Features in Lexicon)]
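The arithmetic of this ranking can be sketched in a few lines of Python (an illustration only; the function name and the example feature counts are our own, not taken from the FUL publications):

```python
def ful_score(n_match, n_signal, n_lexicon):
    """Goodness-of-fit score: (matching features)^2 divided by the
    product of the number of features extracted from the signal and
    the number of features stored in the lexical entry."""
    return n_match ** 2 / (n_signal * n_lexicon)

# Hypothetical counts: 5 features are extracted from the signal;
# candidate A stores 5 features and matches all 5,
# candidate B stores 5 features but matches only 4.
print(ful_score(5, 5, 5))  # 1.0  -> full match ranks highest
print(ful_score(4, 5, 5))  # 0.64 -> partial match ranks lower
```

Squaring the number of matching features ensures that a candidate matching many features outranks one that merely avoids mismatches.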

For example, the place of articulation of a phoneme can be labial, coronal, or dorsal. As phonemes are perceived, the respective features [LABIAL], [CORONAL] and [DORSAL] are extracted from the acoustic signal. However, only the features [LABIAL] and [DORSAL] are assumed to be represented in the mental lexicon, while the feature [CORONAL] is underspecified² and the slot for place of articulation thus stays empty in the mental representation. For example, when a labial phoneme (e.g. /b/) is perceived, the feature [LABIAL] is extracted from the signal and mapped onto the mental lexicon. It will match the lexical representation of a /b/ with a specified labial place of articulation. It will mismatch with the representation of a /g/, which is specified for dorsal place of articulation. When a labial /b/ is mapped onto the representation of a coronal /d/, this leads to a nomismatch, because the feature [CORONAL] is not stored in the mental representation and consequently the feature [LABIAL] from the signal is mapped onto an empty place-of-articulation slot in the mental lexicon. A labial /b/ therefore activates both the lexical representation of a /b/ and that of a /d/, the latter to a lesser extent than the former due to fewer matching features. In the opposite case, when we perceive a /d/, the feature [CORONAL] is extracted from the signal and mapped onto the lexicon. It will mismatch both the representation of a /b/ and that of a /g/, because the feature [CORONAL] is incompatible with the respective features [LABIAL] and [DORSAL] in the mental lexicon. The feature [CORONAL] itself is not represented in the mental lexicon, and hence there is a nomismatch in terms of place of articulation between a /d/ in the signal and a /d/ in the lexical representation. Consequently, there is an asymmetry in lexical activation: a coronal phoneme can only activate lexical entries of other coronal phonemes, while a non-coronal phoneme can activate its own representation as well as the representations of coronal phonemes. With this ternary matching logic the system over-generates possible word candidates, but it still removes impossible ones from the cohort. Since it is not the segments per se that are stored but their abstract phonological features, some feature information in the incoming speech signal can be missing or influenced by phonological context and the listener is still able to correctly identify the input word. This is very convenient in the case of place assimilation leading to surface variation, something that frequently happens in speech production. In regressive place assimilation, the place feature of a coronal phoneme is assimilated to a following non-coronal place of articulation. For instance, the coronal /n/ in ‘Where could Mr. Bean be?’ is often pronounced as a labial /m/ because it is followed by the labial /b/: ‘Where could Mr. Beam be?’. The reverse does not usually hold; that is, a non-coronal like the /m/ in the utterance ‘lame duck’ would not assimilate to the following coronal (*‘lane duck’).

² Several phenomena lead to the assumption that the feature [CORONAL] is not specified at the lexical level. [CORONAL] seems to be the default place of articulation in many languages; a coronal sound is far more likely to assimilate to non-coronal places of articulation than vice versa; coronal consonants are phonotactically less restrictive (they allow for more combinations than consonants with other places of articulation); and within one language, coronal sounds can split up into several contrastive phonemes with different places of articulation (palatoalveolar, palatal, retroflex). See also Lahiri (2000) and Steriade (1995).
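The ternary matching logic behind this asymmetry can be illustrated with a small sketch (a toy construction of our own; `None` stands for an underspecified, empty place slot, and the feature labels follow the text above):

```python
# Lexically stored place-of-articulation features; the coronal /d/
# has an empty slot because [CORONAL] is underspecified.
LEXICAL_PLACE = {"b": "LABIAL", "d": None, "g": "DORSAL"}

def place_match(signal, lexical):
    """Ternary FUL-style comparison of one place-of-articulation slot:
    returns 'match', 'mismatch', or 'nomismatch'."""
    if signal is None or lexical is None:
        return "nomismatch"   # empty slot on either side
    if signal == lexical:
        return "match"
    return "mismatch"

# A labial [LABIAL] in the signal activates /b/ fully and /d/ weakly:
for seg in "bdg":
    print(seg, place_match("LABIAL", LEXICAL_PLACE[seg]))
# b match, d nomismatch, g mismatch

# A coronal [CORONAL] in the signal leaves only /d/ in the cohort:
for seg in "bdg":
    print(seg, place_match("CORONAL", LEXICAL_PLACE[seg]))
# b mismatch, d nomismatch, g mismatch
```

The two loops reproduce the asymmetry described above: the non-coronal input reaches both its own entry and the underspecified coronal one, while the coronal input mismatches every specified non-coronal entry.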

Since the coronal /n/ in ‘Mr. Bean’ is unspecified for place, its lexical representation can be activated by both ‘Mr. Bean’ and ‘Mr. Beam’. A fully specified system such as TRACE, Shortlist, or the DCM would lead to a mismatch in the latter case and would need two separate lexical representations for the two surface forms. A model like Shortlist B is able to cope with assimilation, as it works on the basis of probabilities and prior experience. Note that in the FUL model all coronal phonemes are underspecified for place of articulation, not only those that are subject to surface variation, such as word-final coronal consonants. This means that the underspecification of the feature [CORONAL] is not entirely experience-based. Other theories assume a more graded form of lexical underspecification, in which only coronal phonemes with non-coronal surface variants are underspecified in a given morpheme (Inkelas, 1994).

What exactly is stored in the mental lexicon according to the FUL model?

The lexicon contains phonological, morphological, semantic and syntactic information for each word. Only the information about features is used to find word candidates in the lexicon. All additional information helps to exclude unlikely candidates at a higher level of processing. These higher-level processes do not wait until they are fed with a few remaining word candidates; they operate in parallel with the basic feature-mapping procedure right from the beginning of word perception (Lahiri, 1999; Lahiri & Reetz, 2002). However, the FUL model is most explicit about the phonological aspects of the mental lexicon, and we will restrict our description to these.

In the lexicon, a segment is represented by a root node and its hierarchically structured relevant features. “This hierarchical representation reflects the fact that phonological processes consistently affect certain subsets of features and not others. Individual features or subsets of features are functionally independent units and are capable of acting independently. (…) Features are organised into functionally related groups dominated by abstract class nodes (such as place). The phonological features are the terminal nodes, and the entire feature structure is dominated by the root node (made up of the major class features like [CONSONANTAL] and [SONORANT]) which corresponds to the traditional notion of a single segment” (Lahiri, 1999, p. 251). The feature tree of the FUL model is depicted in Figure 1.1.

Figure 1.1: Feature geometry following the FUL model. Taken from Lahiri & Reetz (in press).
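The hierarchical organisation described in the quotation above can be pictured as a nested structure (a schematic sketch of our own; apart from [CONSONANTAL] and [SONORANT], the node labels and feature choices are illustrative placeholders, not the exact geometry of Figure 1.1):

```python
# The root node carries the major class features; class nodes such as
# "place" dominate the terminal phonological features.
segment_b = {
    "root": {"CONSONANTAL": True, "SONORANT": False},
    "place": {"articulator": "LABIAL"},   # specified place feature
}

segment_d = {
    "root": {"CONSONANTAL": True, "SONORANT": False},
    "place": {},                          # [CORONAL] underspecified: slot empty
}
```

The empty `place` node for /d/ encodes lexical underspecification directly in the data structure, so a matching procedure can treat it as the nomismatch case.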

The place features are split into three independent nodes: an articulator node containing Place of Articulation, a Tongue Height node and a Tongue Root node.

In the study on the processing of vowels in Chapter 3 we are particularly interested in Place of Articulation, but Tongue Height is also shown to play a role. It is still a matter of debate in linguistics whether vowels and consonants can be defined and processed in the same way (Chomsky & Halle, 1968; Sagey, 1986; Clements & Hume, 1995; Halle, Vaux & Wolfe, 2000; Lahiri & Evers, 1991; Lahiri & Reetz, 2002, in press). In the FUL model the root node distinguishes consonants from vowels, but both share the same place features. All features are monovalent, meaning that they are either present or absent; no negative feature values are assigned (Lahiri & Reetz, 2002).

The FUL model has been successfully tested on many phonological and morphological phenomena (Ghini, 2001; Lahiri & Reetz, 2002; Obleser, Lahiri & Eulitz, 2003, 2004; Eulitz & Lahiri, 2003, 2004; Wheeldon & Waksler, 2004; Lahiri, Wetterlin & Jönsson-Steiner, 2005, 2006; Felder, 2006; Scharinger, 2006; Kabak, 2007; Friedrich, Eulitz & Lahiri, 2008; Hannemann, 2008; Scharinger & Zimmerer, 2009; Wetterlin, 2009; Zimmerer, 2009; Cornell, Lahiri & Eulitz, subm.; Felder, Jönsson-Steiner, Eulitz & Lahiri, subm.). Some of the empirical evidence for and against the FUL model will be reviewed in Chapter 3. Before that, we will consider methodological aspects in Chapter 2, particularly those methods that were used in the experiments reported later. Thereafter, Chapter 3 reports two experiments on the mechanisms of lexical access, particularly on the question of whether there are inhibitory links between word candidates, as some models predict, and one experiment on the FUL model’s hypothesis of lexical underspecification in the case of vowels, thereby testing a case that is not typically experienced in everyday language use. Chapter 4 then extends this work to a suprasegmental level of lexical processing, investigating word accents in Swedish.

CHAPTER 2