

4.2 Information-theoretic constraints on language

4.2.2 Channel coding

4.2.2.2 UID effects on omissions

In order to illustrate how omissions in fragments may contribute to the optimization of utterances with respect to UID, consider again the taxi example that I discussed above. In this situation, a pedestrian hails a taxi because he needs a ride to the university. In this simplified example, he can in principle choose between a full sentence (5a) and a fragment (5b) to communicate this message.¹⁰

(5) a. Take me to the university, please.

b. To the university, please.

¹⁰ Of course, this is highly simplified, because he could make use of a wide variety of syntactic constructions and lexicalizations of fragments and sentences.

[Figure 4.2: plot of information (y-axis) over the words take me to the university please (x-axis), with a horizontal line marking channel capacity]

Figure 4.2: Hypothetical ID profile for the predictable sentence take me to the university and the meaning-equivalent fragment to the university in the taxi scenario. The blue area illustrates the distribution of information in the fragment and the red area that in the full sentence.

In the taxi scenario, it will generally be very likely that the passenger wants to go somewhere, so the material that is omitted in the fragment (take me) is very predictable. In contrast, it is unlikely that the driver knows the passenger's destination, so the destination is unpredictable and relatively informative. Figure 4.2, which shows the distribution of information over time,¹¹ illustrates this idea with hypothetical information density (ID) profiles for the fragment and the corresponding full sentence.
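The following Python sketch makes the notion of an ID profile concrete by computing word-by-word surprisal, -log2 P(word | context), for (5a). The conditional probabilities in FULL are hypothetical values chosen for illustration, not corpus estimates. To derive the fragment profile, the helper omit_prefix (a hypothetical name) drops take me and, as a simplifying assumption, adds their information to the next word, so that both variants carry the same total information.

```python
import math


def surprisal(p: float) -> float:
    """Information (surprisal) in bits of a word with conditional probability p."""
    return -math.log2(p)


# Hypothetical conditional probabilities P(word | preceding context) in the
# taxi scenario; invented for illustration, not estimated from a corpus.
FULL = [("take", 0.9), ("me", 0.95), ("to", 0.9), ("the", 0.9),
        ("university", 0.01), ("please", 0.5)]


def id_profile(words):
    """Return (word, bits) pairs, i.e. a word-by-word information density profile."""
    return [(w, surprisal(p)) for w, p in words]


def omit_prefix(profile, n):
    """Drop the first n words and (simplifying assumption) add their
    information to the next word, so the total information is unchanged."""
    moved = sum(bits for _, bits in profile[:n])
    first_word, first_bits = profile[n]
    return [(first_word, first_bits + moved)] + profile[n + 1:]


sentence = id_profile(FULL)
fragment = omit_prefix(sentence, 2)  # "take me" omitted

for name, prof in [("sentence", sentence), ("fragment", fragment)]:
    total = sum(b for _, b in prof)
    print(f"{name}: " + ", ".join(f"{w}={b:.2f}" for w, b in prof)
          + f" | mean = {total / len(prof):.2f} bits/word")
```

Because the fragment packs the same total information into four instead of six words, its mean information per word is higher, while the very predictable take me no longer contributes a low-information stretch to the profile.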

If the pedestrian instead wants the driver to tell him the way to the university, he has to choose between the fragment in (5b) and the sentence in (6). In that case, tell me the way is probably less predictable than take me, as Figure 4.3 suggests.

Of course, whether a word is predictable depends on properties of the utterance context. For instance, if an utterance like (6) is produced not by a passenger approaching the taxi but by the driver of another car with a foreign license plate, it might be more likely that he is asking the local taxi driver for directions than that he wants a ride. Similarly, if the passenger is taken to the university by the same driver every Wednesday, or if he wears a Denver Nuggets hat and shirt an hour before the match starts, the destination might be more predictable and the utterance possibly reduced even further.

(6) Tell me the way to the university, please.
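The context dependence described above can be sketched by conditioning the probability of the requested action on the situation. The two contexts and the probabilities in INTENT_GIVEN_CONTEXT are hypothetical and only meant to show that the same expression can carry little information in one context and much in another.

```python
import math

# Hypothetical distributions over what a speaker wants from the taxi driver,
# conditioned on the situation. The numbers are invented for illustration.
INTENT_GIVEN_CONTEXT = {
    "passenger approaching taxi": {"take me": 0.95, "tell me the way": 0.05},
    "driver of another car": {"take me": 0.10, "tell me the way": 0.90},
}


def surprisal(p: float) -> float:
    """Information in bits of an outcome with probability p."""
    return -math.log2(p)


for context, dist in INTENT_GIVEN_CONTEXT.items():
    for intent, p in dist.items():
        print(f"{context:28} {intent:16} {surprisal(p):5.2f} bits")
```

Under these assumed numbers, take me is nearly free for the approaching passenger but informative for the foreign driver, and the reverse holds for tell me the way.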

¹¹ The variable on the abscissa is in principle time, because channel capacity and transmission rate are defined as an amount of information transmitted per unit of time, e.g. in bits per second. In practice, however, corpus-based work in particular (see e.g. Levy & Jaeger 2007, Frank & Jaeger 2008, Jaeger 2010) simplifies this to the amount of information transmitted per word, because duration measures for words or appropriate phonemic transcriptions are not available in the corpora or would complicate the statistical analysis.

[Figure 4.3: plot of information (y-axis) over the words tell me the way to the university please (x-axis), with a horizontal line marking channel capacity]

Figure 4.3: Hypothetical ID profile for the unpredictable sentence tell me the way to the university and the meaning-equivalent fragment to the university in the taxi scenario. The blue area illustrates the distribution of information in the fragment and the red area that in the full sentence.

Figure 4.2 shows how the local information minimum, or trough, in the ID profile that is caused by the redundant take me is smoothed out by omitting this expression.

From a UID perspective, omitting such predictable words optimizes the signal.

If these omissions target words that are obligatory in full sentences, the result is a preference for the fragment over the full sentence. In contrast, given the density profiles in Figure 4.3, tell me the way does not yield a trough in the profile, so there is no pressure to omit these words. Furthermore, omitting them would result in a peak in the density profile that exceeds channel capacity.¹² Therefore, from the UID perspective, it is not beneficial to omit these words. In fact, even if tell me were redundant, inserting it would be preferred as long as it helps to reduce the peak on to the university.
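The peak argument can be sketched in the same way. The per-word information values for (6) and the channel capacity of 4 bits per word are hypothetical, and the assumption that omitted information shifts wholesale onto the next word is a simplification for illustration.

```python
# Hypothetical per-word information (bits) for sentence (6); invented numbers.
SENTENCE_6 = [("tell", 2.5), ("me", 0.5), ("the", 0.4), ("way", 1.5),
              ("to", 0.3), ("the", 0.2), ("university", 3.0), ("please", 1.0)]
CAPACITY = 4.0  # hypothetical channel capacity in bits per word


def omit_prefix(profile, n):
    """Drop the first n words; assume their information shifts onto the next word."""
    moved = sum(bits for _, bits in profile[:n])
    word, bits = profile[n]
    return [(word, bits + moved)] + profile[n + 1:]


def peaks(profile, capacity):
    """Return the words whose information exceeds the channel capacity."""
    return [(w, b) for w, b in profile if b > capacity]


fragment = omit_prefix(SENTENCE_6, 4)  # "tell me the way" omitted
print("sentence peaks:", peaks(SENTENCE_6, CAPACITY))
print("fragment peaks:", peaks(fragment, CAPACITY))
```

With these assumed values, the full sentence stays below capacity everywhere, whereas the fragment concentrates the information of tell me the way on to and thereby produces a peak above capacity.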

The tendencies (i) to omit predictable expressions and (ii) to realize expressions that reduce peaks on upcoming material are the central predictions of UID regarding the well-formedness of linguistic expressions. Both are empirically supported by previous research, as shown by Frank & Jaeger (2008) for contraction in English, Kurumada & Jaeger (2015) for Japanese case markers, Levy & Jaeger (2007) for relative pronouns in English, Jaeger (2010) for complementizers, Norcliffe & Jaeger (2016) for relative pronouns in Yucatec Maya, Asr & Demberg (2015) for discourse markers and Lemke et al. (2017) for articles. With the exception of Kravtchenko (2014), who investigates the omission of subjects in Russian, these studies investigate function words that are relatively vacuous semantically. It is therefore reasonable to assume that UID constrains omissions in fragments too, but this does not necessarily follow from previous research. The finding by Tily & Piantadosi (2009) that more predictable nouns are more likely to be pronominalized, i.e. reduced, points in a similar direction if ellipsis is considered a more radical form of reduction of given material. Furthermore, the relationship between predictability and reduction is also evidenced by studies which find that predictable expressions are more likely to be reduced in terms of duration and/or articulatory effort, both at the level of words and at that of individual syllables (see e.g. Bell et al. 2003, Aylett & Turk 2004, Bell et al. 2009, Tily et al. 2009, Demberg et al. 2012, Kuperman & Bresnan 2012, Seyfarth 2014, Pate & Goldwater 2015, Brandt et al. 2017, 2018, Malisz et al. 2018).

¹² Note that Figure 4.3 is not fully accurate, because for the purpose of illustration I assigned different density profiles to the identical fragment to the university in the left and right panels. See below in this section for a discussion of this issue.

Even though the concept is labeled Uniform Information Density, at least in the version adopted in current psycholinguistics, the property of uniformity is an artifact of the assumptions made and not a goal of the encoding strategy in its own right. Uniformity only follows from the approximation of the transmission rate to channel capacity, but a uniform distribution far below channel capacity will still be dispreferred compared to less uniform signals that make more efficient use of channel capacity. This leads to an important distinction between the effects of peaks and troughs on the choice between alternative ways of encoding a message. Troughs are inefficient and are therefore always to be avoided, if possible. In contrast, peaks are only dispreferred if they exceed channel capacity. In what follows, I will assume this interpretation of uniformity when stating that a signal is more or less compliant with UID, unless stated otherwise.
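The asymmetry between troughs and peaks can be made explicit with a toy cost function. This is a sketch under hypothetical assumptions, not a published formalization of UID: capacity is set to 4 bits per word, the trough cost is the summed unused capacity, and information above capacity is penalized with an arbitrary weight (peak_weight, a hypothetical parameter). All three profiles encode the same 8 bits.

```python
from statistics import pvariance


def uid_cost(profile, capacity, peak_weight=100.0):
    """Hypothetical cost of an ID profile (bits per word) under the asymmetric
    reading of UID: unused capacity (troughs) always counts, information
    counts only where it exceeds capacity (peaks), weighted by peak_weight."""
    unused = sum(max(0.0, capacity - bits) for bits in profile)
    overflow = sum(max(0.0, bits - capacity) for bits in profile)
    return unused + peak_weight * overflow


CAPACITY = 4.0                # hypothetical capacity in bits per word
uniform_low = [1.0] * 8       # perfectly uniform, far below capacity
efficient = [3.5, 2.0, 2.5]   # less uniform, same 8 bits in 3 words
overflowing = [5.0, 3.0]      # same 8 bits, but the first word exceeds capacity

for name, prof in [("uniform_low", uniform_low), ("efficient", efficient),
                   ("overflowing", overflowing)]:
    print(f"{name:12} variance={pvariance(prof):.2f} "
          f"cost={uid_cost(prof, CAPACITY):.1f}")
```

The perfectly uniform profile has zero variance but the highest trough cost, so it loses to the less uniform but more efficient one, while the overflowing profile is penalized only because of its peak. As long as no word exceeds capacity, the trough term equals the number of words times capacity minus the total information, so for a fixed message it favors shorter encodings.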

So far, there have been no attempts to quantify channel capacity. This would not be a promising endeavor, because, as Shannon (1948) showed, channel capacity is not a constant but varies as a function of the noise in the channel. This indeterminacy of channel capacity is expected if UID is interpreted as a psycholinguistic constraint on communication. UID implies that speakers engage in audience design and adapt their utterances to expected properties of the hearer (e.g. preferences and cognitive abilities), so this will necessarily involve inferences under uncertainty about channel capacity. Furthermore, from a methodological perspective, absolute information estimates depend on the corpus used to obtain them. Since information is based on probabilities, a larger lexicon will result in lower average probabilities of individual items, and hence in higher average information. What matters for empirical research on UID is that, even if channel capacity is unknown, on average, more informative words are more likely to yield a peak and less informative words are more likely to cause a trough in the ID profile.
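The dependence of absolute estimates on the lexicon can be illustrated with a toy unigram model. Add-one (Laplace) smoothing is used here only as a convenient stand-in for an estimation method, and the counts and lexicon sizes are invented; the point is that assuming a larger lexicon raises every word's surprisal while leaving their rank order unchanged.

```python
import math
from collections import Counter


def unigram_surprisal(counts: Counter, vocab_size: int) -> dict:
    """Add-one smoothed unigram surprisal in bits for each observed word,
    assuming a lexicon of vocab_size types (hypothetical toy model)."""
    total = sum(counts.values())
    return {w: -math.log2((c + 1) / (total + vocab_size))
            for w, c in counts.items()}


counts = Counter({"the": 50, "to": 30, "university": 2})  # invented toy counts
small = unigram_surprisal(counts, vocab_size=1_000)
large = unigram_surprisal(counts, vocab_size=100_000)

for w in counts:
    print(f"{w:10} {small[w]:6.2f} bits  {large[w]:6.2f} bits")
```

Because only relative differences between words are compared, such shifts in absolute values do not affect the qualitative predictions about peaks and troughs.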

Taken together, UID predicts that, given a set of possible signals, i.e. sentential and nonsentential utterances that can be used to encode a message and that comply with grammar,¹³ the preferred utterance is the one that distributes information most uniformly across the utterance. This leads to the two specific predictions in (7a,b), which in turn imply (7c): if omissions occur more often in predictive contexts, because words there are on average more likely, the signal will on average be shorter in such situations.

(7) Predictions of UID on fragments

a. Avoid troughs: The more likely a word is in context, the more likely it is to be omitted (within the limits of grammar).

b. Avoid peaks: Uninformative words can be inserted before very informative words in order to lower the surprisal of the latter (within the limits of grammar).

c. Densification: Shorter signals, like fragments, are preferred in predictive contexts.
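The predictions in (7) can be sketched as a toy speaker who picks, among grammatical candidates, the one with the lowest cost. The cost function (unused capacity plus heavily weighted information above a hypothetical capacity of 4 bits per word), the per-word profiles and the helper choose are all illustrative assumptions rather than an established model.

```python
def uid_cost(profile, capacity=4.0, peak_weight=100.0):
    """Hypothetical asymmetric UID cost: unused capacity plus heavily
    weighted information above capacity (bits per word)."""
    unused = sum(max(0.0, capacity - b) for b in profile)
    overflow = sum(max(0.0, b - capacity) for b in profile)
    return unused + peak_weight * overflow


def choose(candidates):
    """Return the candidate utterance with the lowest cost."""
    return min(candidates, key=lambda name: uid_cost(candidates[name]))


# Invented per-word profiles (bits) for the two taxi scenarios.
predictive = {  # destination request: "take me" is highly predictable
    "Take me to the university, please.": [0.3, 0.2, 0.5, 0.2, 3.5, 0.3],
    "To the university, please.": [1.0, 0.2, 3.5, 0.3],
}
unpredictive = {  # asking for directions: "tell me the way" is informative
    "Tell me the way to the university, please.":
        [2.5, 0.5, 0.4, 1.5, 0.3, 0.2, 3.0, 1.0],
    "To the university, please.": [5.2, 0.2, 3.0, 1.0],
}

print(choose(predictive))    # expected: To the university, please.
print(choose(unpredictive))  # expected: Tell me the way to the university, please.
```

In the predictive context, the fragment removes a trough without creating a peak and is chosen (7a, 7c); in the unpredictive context, omission would push to above capacity, so the longer sentence, whose extra words lower the information on the following material, is preferred (7b).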