

in stimulus identification or due to an expectation-modulated interaction in stimulus discrimination, or a combination thereof, has not been answered conclusively.1 By analogy with the combination of bottom-up and top-down processes believed to operate during visual object recognition, we hypothesise that the McGurk effect also results from a convergence of bottom-up and top-down processes acting in parallel.

In the context of this thesis we classify the effect as a primarily sensory phenomenon which can experience top-down modulation under special conditions. The robustness of the effect in the absence of expectation- or knowledge-driven top-down effects further supports an interpretation in terms of bottom-up integration. As such, we choose to exclude it from further consideration in our model of the influence of visual context understanding upon linguistic processing.

Summarising the cross-modal interactions between vision and language at word and sub-word levels, we can say that the Stroop effect and Cooper’s visual world experiments provide convincing evidence for the involvement of a semantic representation in the cross-modal interaction between visual and linguistic processing.

Cooper’s experiments suggest that the interaction between modalities is such that visual processing aims to identify entities in the visual context which are conceptually related to the concepts activated linguistically. Huettig and Altmann refined this view by showing that only semantic relatedness gives rise to the effect. The observations of the Stroop effect suggest that the degree of conceptual overlap between the concepts processed in each modality affects the ease with which certain tasks can be performed. For tasks exhibiting a Stroop effect, conceptual congruence results in task facilitation, whereas conceptual incompatibility results in interference.

The following section investigates the effect of non-linguistic information obtained from visual understanding upon the processing of more complex linguistic structures such as phrases and entire sentences.

other object with a similar name was present, subjects must have integrated the information from the auditory stimulus and the visual scene to accomplish object identification prior to hearing the end of the word, i.e., prior to completing the perception and processing of the respective auditory stimulus. Visual distractor objects which exhibited no referential connection to the linguistic stimulus (such as a pencil for the sentence “Put the apple on the towel in the box”) had no observable effect upon fixation patterns.

In their central experiment, Tanenhaus et al. presented subjects with locally ambiguous instructions of the structure V NP PP1 {PP2}. The prepositional phrase (PP1) could be interpreted either as a modifier of the sentence-initial verb (V) or of the noun phrase object (NP). The local ambiguity was resolved either by the unfolding of PP2, in which case PP1 was interpreted as a modifier of the NP, or by the end of the sentence, in which case PP1 was interpreted as a modifier of V.

If initial syntactic processing were modular in the Fodorian sense (see Footnote 1 on page 15), and as such informationally encapsulated against visual scene information, no effect of visual scene context on early syntactic processing, and hence on eye fixations, should be observable. Tanenhaus et al. found, however, that subjects’ fixation patterns differed significantly depending on whether the visual scene contained one or two possible referents for the NP. In the case of a single potential referent for the NP in the visual scene, PP1 was initially interpreted as a modifier of V. This initial interpretation had to be revised when a PP2 subsequently followed. In the case of two potential referential candidates for the NP in the visual scene, PP1 was always initially interpreted as a modifier of the NP.
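Purely as an illustration of this ambiguity, and not as a reconstruction of Tanenhaus et al.’s materials or analysis, the two attachment hypotheses and a simple referent-count heuristic might be sketched as follows; all identifiers and the heuristic itself are our own illustrative assumptions.

```python
# Illustrative sketch of the two attachment hypotheses for the locally
# ambiguous instruction "Put the apple on the towel in the box".
#
# Hypothesis A (V-attachment, destination reading):
#   [V Put] [NP the apple] [PP1 on the towel] ...   -> a following PP2 forces revision
# Hypothesis B (NP-attachment, restrictive reading):
#   [V Put] [NP the apple [PP1 on the towel]] [PP2 in the box]

def initial_attachment(num_np_referents: int) -> str:
    """Choose the initial attachment of PP1 from the visual scene.

    With a single candidate referent for the NP, the restrictive reading is
    unnecessary and PP1 is provisionally attached to the verb; with two
    candidates, PP1 is needed to single one out and is attached to the NP.
    """
    return "V-attachment" if num_np_referents == 1 else "NP-attachment"

print(initial_attachment(1))  # V-attachment: a later PP2 triggers reanalysis
print(initial_attachment(2))  # NP-attachment: no reanalysis required
```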

The authors interpret the observed eye movements as direct reflections of the progress of syntactic processing. The different fixation behaviours induced by the difference in visual contexts show that the same transient ambiguity of PP1 can give rise to different syntactic starting hypotheses. The authors interpret this as evidence for access to visual context information during the earliest moments of linguistic processing. Their observations provide substantial support for the hypothesis of a close and continual interaction between visual and linguistic processing, as postulated by the proponents of strongly interactive models of sentence processing, which contrast with the modular processing architecture suggested by Fodor.

From this most influential investigation we extract a number of modelling requirements related to the interplay between visual and linguistic processing. The observation of successive eye movements to linguistically relevant referents in synchrony with the unfolding of the linguistic stimulus shows that linguistic processing progresses over time and is incremental.

Requirement R2

In a model for the interaction between visual context and linguistic understanding, linguistic processing must be incremental.

As revealed by the strong temporal correlation between eye fixations and linguistic processing, the interactions between the two modalities occur in close temporal alignment.

Requirement R3

A model for the interaction between visual scene context and linguistic processing must be based on temporally synchronised interactions between the visual modality and linguistic processing.

The immediate interactions between visual and linguistic processing observed by Tanenhaus et al. support a strongly interactive model of sentence processing based on continual cross-modal interactions at parse time. These interactions enable what Tanenhaus et al. refer to as the “rapid and nearly seamless integration of visual and linguistic information”.

Requirement R4

A model for the interaction between visual scene context and linguistic processing must be based on continual interactions between non-linguistic information and linguistic processing.
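To make Requirements R2 to R4 concrete, the following minimal sketch shows one way an incremental, temporally synchronised and continual coupling between the two modalities could be organised. The data structures and the per-word update are hypothetical assumptions introduced purely for illustration and are not a specification of the model developed later in this thesis.

```python
# Minimal sketch of a processing loop satisfying Requirements R2-R4.
# All class and function names are hypothetical illustrations.

from dataclasses import dataclass, field

@dataclass
class SceneContext:
    objects: set          # entities currently visible in the scene

@dataclass
class ParserState:
    words: list = field(default_factory=list)
    links: list = field(default_factory=list)   # cross-modal reference links

def integrate(state: ParserState, word: str, scene: SceneContext) -> ParserState:
    """One incremental step: extend the analysis by exactly one word (R2)
    and consult the visual scene at that very step (R3, R4)."""
    state.words.append(word)
    if word in scene.objects:
        state.links.append(f"referent({word})")
    return state

def process(utterance: list, scene: SceneContext) -> ParserState:
    state = ParserState()
    for word in utterance:                      # word by word, not sentence-final (R2)
        state = integrate(state, word, scene)   # cross-modal interaction at every step (R3, R4)
    return state

scene = SceneContext(objects={"apple", "towel", "box"})
final = process("put the apple on the towel in the box".split(), scene)
print(final.links)   # ['referent(apple)', 'referent(towel)', 'referent(box)']
```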

Tanenhaus et al.’s experimental findings further support the view that the interaction between visual and linguistic processing is bidirectional. We capture this as two separate requirements, one for each direction of the interaction. Given the same syntactic material in different visual scene contexts, fundamentally different fixation patterns were observed. This is clear evidence for the influence of visual scene context upon the early stages of linguistic processing.

Requirement R5

A model for the interaction between visual scene context and linguistic processing must include the influence of visual understanding upon linguistic processing.

The experiments also demonstrate the influence of linguistic upon visual processing: the mention of linguistic entities in the auditory input immediately directed eye fixations to the corresponding referents in the visual scene.

Requirement R6

A model for the interaction between visual scene context and linguistic processing must include the influence of linguistic processing upon visual understanding.

From the fact that referentially unrelated visual distractor objects had no observable effect upon linguistic processing we conclude that referentially unrelated visual information remains neutral with respect to linguistic processing.

Requirement R7

In a model for the interaction between visual scene context and linguistic processing, referentially unrelated visual context information must leave linguistic processing unaffected.
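The bidirectionality demanded by Requirements R5 and R6 and the neutrality demanded by Requirement R7 can be sketched as two illustrative interface functions together with a simple neutrality check. The function names and the trivial heuristics below are assumptions made for exposition only, not a proposal for the actual architecture.

```python
# Illustrative sketch of Requirements R5-R7; all names and heuristics are assumptions.

def visual_to_linguistic(referent_count: int) -> str:
    """R5: the visual scene biases the initial syntactic hypothesis."""
    return "NP-attachment" if referent_count >= 2 else "V-attachment"

def linguistic_to_visual(word: str, scene: set):
    """R6: a mentioned entity immediately directs attention to its referent."""
    return word if word in scene else None

def is_neutral(distractor: str, scene: set, utterance: list) -> bool:
    """R7: adding a referentially unrelated object must not change the outcome."""
    with_distractor = [linguistic_to_visual(w, scene | {distractor}) for w in utterance]
    without_it = [linguistic_to_visual(w, scene) for w in utterance]
    return with_distractor == without_it

scene = {"apple", "towel", "box"}
utterance = "put the apple on the towel in the box".split()
print(linguistic_to_visual("apple", scene))   # the apple is fixated (R6)
print(visual_to_linguistic(2))                # NP-attachment is adopted first (R5)
print(is_neutral("pencil", scene, utterance)) # True: the pencil remains inert (R7)
```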