
2.2 Fundamental Entities of Rhythm

2.2.1 The Perception of Fundamental Beats

The task of identifying fundamental beats in the acoustic stream is to detect the beginnings and ends of rhythmically relevant units6. The often continuously changing nature of auditory (speech) events shows that this mechanism is not self-evident: in the acoustic representation of a sequence “lalalala”, it is difficult to detect clear boundaries between the various possible individual beats (cf. Figure 2.4). However, the frequent use of “lalala” in singing provides evidence for its usefulness in transmitting a rhythmic impression to a listener. The listener is thus obviously able to estimate the fundamental beats contained in the sequence and group them into a rhythmic pattern. In order to perceive auditory events as separate units, it is essential to hear when the separate events begin and when they end, even though they may partly overlap physically and acoustically. For example, a vowel that is followed by a nasal consonant may carry traces of nasality without being perceived as part of the nasal, a normal phonetic phenomenon known as coarticulation.

Besides, in reality we often have to segment auditory events of much higher complexity, e.g. when different speakers talk simultaneously or an orchestra plays.

Bregman (1990) calls this cognitive process stream segregation. According to him, at an early stage of neural processing, different auditory streams are segregated on the basis of the following principles:

• Sounds that stand in no relation to each other do not start or end simultaneously.

• Sounds originating in a single source tend to change their frequency characteristics continuously rather than abruptly.

• Regular vibrations of a single source result in an acoustic pattern where the frequency components are integer multiples of the fundamental frequency.

6In Section 2.1.6 it has been discussed that plosives provide useful acoustic cues for segmenting a continuous speech signal into syllables. However, this is an insufficient strategy because syllable boundaries are not always marked by plosives.

[Figure 2.4 appears here: top panel, annotated spectrogramme (0–5000 Hz) of the sequence “l a: l a: l a:”; bottom panel, the corresponding oscillogramme; time axis 0–2.10141 s.]

Figure 2.4: This figure illustrates the difficulty of segmenting the continuous acoustic speech signal into meaningful chunks that make up the fundamental beats necessary for a rhythmic pattern. The top panel shows the annotated spectrogramme of the sequence “lalala”; the bottom panel shows the corresponding oscillogramme.

• Many changes occurring in an acoustic event affect all subcomponents of the resulting sound in the same manner and at the same time.
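The third principle above, grouping by harmonicity, can be illustrated with a few lines of code. This is only a sketch: the function name, the frequencies, and the 3 % tolerance are assumptions made for illustration, not part of Bregman's formulation.

```python
# Sketch of the harmonicity principle: partials whose frequencies lie near
# integer multiples of a common fundamental are grouped into one stream.
# The tolerance value (3 %) is an illustrative assumption.

def harmonic_group(partials_hz, f0_hz, tolerance=0.03):
    """Return the partials within `tolerance` (relative) of an integer
    multiple of the candidate fundamental f0_hz."""
    grouped = []
    for f in partials_hz:
        n = round(f / f0_hz)  # nearest harmonic number
        if n >= 1 and abs(f - n * f0_hz) <= tolerance * n * f0_hz:
            grouped.append(f)
    return grouped

# A 200 Hz voice (harmonics at 200, 400, 600, ... Hz, with small
# measurement error) mixed with unrelated components at 330 and 770 Hz:
mixture = [200.0, 330.0, 400.0, 603.0, 770.0]
print(harmonic_group(mixture, 200.0))  # → [200.0, 400.0, 603.0]
```

The two inharmonic components are rejected and would, on this principle, be assigned to a different auditory stream.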

These effects combined lead to the phenomenon that in orchestral music or other types of polyphonic music, certain sounds coming from different instruments blend into a novel auditory image while others are perceived as different melodies, originating from different voices or instruments. When a composer obeys the laws of contrapuntal composition, the different melodies will not blend; if she prefers a homophonous style of composition, the auditory effect will be that of one leading melody accompanied by the other instruments or voices. Similar cues, indicating that something originates in a different source or makes up a different auditory/melodic stream, also govern the way that listeners segment the continuous time stream into chunks of quasi-similar events, e.g. fundamental beats. Although stream segregation is mostly concerned with the task of identifying different sound sources, e.g. speakers talking simultaneously, similar cues may be used by listeners to segregate auditory events in time. For the segmentation of fundamental beats, the second principle seems to be central, namely that “sounds originating in a single source tend to change their frequency characteristics continuously rather than abruptly” (Darwin (1997)). In many sequences of speech, the places showing abrupt spectral change tend to be syllable boundaries, where a voiced vocalic event with clear harmonics is often suddenly followed by a voiceless obstruent containing noise rather than harmonics. However, our example above shows that such an abrupt change is not necessary for the detection of rhythmic events: here the spectral changes are smooth rather than abrupt, especially in the lower frequency regions.
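The notion that abrupt spectral change marks boundaries can be made concrete with a frame-to-frame spectral-flux measure. The function name, frame parameters, and the synthetic vowel/noise signal below are assumptions made for illustration; this is a sketch, not a model taken from the literature discussed here.

```python
import numpy as np

# Hypothetical sketch: half-wave rectified spectral flux as a cue for
# abrupt spectral change. Frame length, hop size, and the synthetic
# signal are illustrative assumptions.

def spectral_flux(signal, frame_len=512, hop=256):
    """Per-frame rectified spectral flux: large values mark abrupt spectral
    change, e.g. a vowel suddenly followed by a voiceless obstruent."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, hop)]
    window = np.hanning(frame_len)
    mags = [np.abs(np.fft.rfft(f * window)) for f in frames]
    flux = [np.sum(np.maximum(mags[i] - mags[i - 1], 0.0))
            for i in range(1, len(mags))]
    return np.array(flux)

# Synthetic example: a harmonic "vowel" followed by noise (an "obstruent"):
sr = 16000
t = np.arange(sr // 2) / sr
vowel = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
noise = 0.5 * np.random.default_rng(0).standard_normal(sr // 2)
flux = spectral_flux(np.concatenate([vowel, noise]))
print(int(np.argmax(flux)))  # frame index near the vowel/noise boundary
```

The flux peaks at the vowel/noise transition, while within the steady vowel it stays near zero, mirroring the observation that smooth spectral change, as in “lalala”, yields no such landmark.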

Zwicker and Fastl (1999) propose a psychoacoustic model aimed at identifying fundamental rhythmic beats based on changes in loudness. Looking at the oscillogramme of example 2.4, we see a clear change in the intensity parameter. Thus, chunking based on both intensity and spectral change may be quite promising.

Zwicker and Fastl (1999), however, suggest identifying rhythm events based on the following simple rules:

• Only events whose loudness exceeds 0.43 of the maximum loudness level are relevant.

• The threshold 0.43 is applied relative to the maximum loudness level within a meaningful analysis window, e.g. a musical phrase.

• The maxima detected with this analysis method must be separated by more than 120 ms to be distinct.
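Under the simplifying assumption that a loudness contour is already available (the full Zwicker and Fastl loudness model is not implemented here), the rules above can be sketched as follows; the function name and the toy contour are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the Zwicker & Fastl (1999) rules quoted above: keep only
# local loudness maxima exceeding 0.43 of the window's maximum loudness,
# and discard maxima closer than 120 ms to the previous accepted one.

def rhythm_beats(loudness, frame_rate_hz, threshold=0.43, min_gap_s=0.120):
    """Return frame indices of rhythm events in a loudness contour."""
    loudness = np.asarray(loudness, dtype=float)
    floor = threshold * loudness.max()
    min_gap = int(round(min_gap_s * frame_rate_hz))
    # local maxima above the relative threshold
    peaks = [i for i in range(1, len(loudness) - 1)
             if loudness[i] >= loudness[i - 1]
             and loudness[i] > loudness[i + 1]
             and loudness[i] >= floor]
    # enforce the 120 ms separation, keeping the earlier peak
    beats = []
    for p in peaks:
        if not beats or p - beats[-1] > min_gap:
            beats.append(p)
    return beats

# Toy contour at 100 frames/s: three syllable-like bumps, the middle one
# too quiet to pass the 0.43 threshold:
contour = np.zeros(100)
for centre, height in [(10, 1.0), (20, 0.3), (60, 0.8)]:
    contour[centre - 3:centre + 4] = height * np.hanning(7)
print(rhythm_beats(contour, frame_rate_hz=100))  # → [10, 60]
```

Note that the output is a list of time points only, which leads directly to the objection discussed next: the model says nothing about where each beat begins and ends.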

Their model indicates the number and location of beats, but does not calculate their beginnings and ends. This implies that beats are points in time rather than events with an inherent duration. Such a view is implausible, however, given that in musical notation beats are described as having duration. Alternatively, a point-based view of beats implies that their durational extension is limited by the point of occurrence of a subsequent beat. Consider a thought experiment in which an event identified as a loudness maximum is followed by a pause of an hour before the next beat occurs: it is highly unlikely that in this case we would perceive the beat as lasting one hour.

In Figure 2.4 we can see that a loudness-based approach to beat detection is promising, as it identifies syllables in the acoustic stream, but it is clear that the dynamic, intensity-related aspects, which can be regarded as more or less abrupt changes in the sense of Bregman (1990), also play a role in the perception of beats.

Zwicker and Fastl (1999) point out that the perceived beginning of an event may differ from its acoustic beginning as a function of its initial tuning curve. The flatter and more gradual the tuning curve’s slope, the earlier the beginning of the related event is perceived. A preceding click may also influence the point in time at which the beginning of an event is perceived. We will see in the next section that this phenomenon is also relevant for the detection of rhythmical event onsets in speech.