In this chapter we have presented a number of significant experimental findings that permit important conclusions about the interaction between the processing of visual stimuli and linguistic processing in the context of natural language understanding. The Stroop effect demonstrated that the interaction between vision and language is automatic and mediated by lexical semantics. Cooper’s visual world experiments provided further evidence for semantic mediation in the interaction between vision and linguistic processing. Through careful control of experimental stimuli, Huettig et al. refined our understanding of which kind of semantic relation drives the cross-modal interaction with language: it is mediated by semantic category similarity rather than by associative relatedness.

Tanenhaus et al. argue convincingly for a bidirectional, incremental and closely time-locked interaction between vision and language. Finally, Altmann provides experimental support for a representational view of the cross-modal interaction with language. All of these findings have been formulated as requirements for our model of cross-modal interaction of vision with language.

This chapter has also provided an overview of extant historical and more recent computational implementations of the interaction between vision and language.

Socher et al. use a Bayesian network to achieve reasoning capabilities in late cross-modal integration. Roy and Mukherjee report a successful implementation of visual priming in speech recognition during incremental parsing. Mayberry et al. describe a successful – but presumably not scalable – connectionist system for anticipating binary thematic role assignment decisions during incremental sentence processing.
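To make the notion of late cross-modal integration concrete, the following toy sketch in Python (our own illustration, not Socher et al.’s actual network; all object names and probability values are hypothetical) combines per-candidate likelihoods that vision and language have computed independently, and does so only at the final stage:

    # Toy illustration of *late* cross-modal integration: each modality
    # scores the candidate referents independently; the evidence is only
    # combined (and normalised) at the very end. Numbers are made up.
    def late_integration(vision_scores, language_scores):
        joint = {c: vision_scores[c] * language_scores[c] for c in vision_scores}
        total = sum(joint.values())
        return {c: s / total for c, s in joint.items()}

    vision = {"cup": 0.7, "bowl": 0.2, "plate": 0.1}    # from the visual channel
    language = {"cup": 0.5, "bowl": 0.4, "plate": 0.1}  # from the spoken utterance

    print(late_integration(vision, language))
    # -> roughly {'cup': 0.80, 'bowl': 0.18, 'plate': 0.02}

In such a late scheme, neither modality can influence the other’s processing before the final combination step; this contrasts with the incremental, closely time-locked interaction for which Tanenhaus et al. argue.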

Finally, Brick and Scheutz provide a remarkable account of coupling robot action with incremental and contextually aware sentence processing. All of the discussed models, however, restrict themselves to the extraction of object information and spatial relations. They fail to utilise visual context to extract information about the thematic roles that the recognised entities take in the context of an observed situation. For visual understanding, this constitutes a further, higher level of complexity that has yet to be integrated into such models. Furthermore, none of the reported models are motivated by or integrated into a general, implementation-independent theory of cross-modal cognition.

In the next chapter, we set out to discuss a general theory of cognition that attempts to account for the mechanisms that enable a cross-modal interaction between non-linguistic and linguistic modalities. We intend to base the specification of our model upon requirements from three sources: 1) the body of experimental findings presented in this chapter, 2) the discussion of extant models in this chapter, and 3) the cognitive theory to be presented in the following chapter.

Conceptual Semantics — An Integrated Theory of Cognition

The interaction of non-linguistic modalities with language is a mental process that occurs quite effortlessly in our brains. We therefore know the effects of cross-modal interaction from our own experience. Yet we are mostly unaware of the mechanisms that underlie this interaction. The preceding chapter has provided important experimental findings about the interaction between vision and language. What we are still missing at this point is a unified cognitive theory capable of providing an integrated account of the observed phenomena.

In this chapter we outline Ray Jackendoff’s theory of Conceptual Semantics insofar as it pertains to the cross-modal interaction of non-linguistic modalities with language. Conceptual Semantics takes a representationalist view of cognition and offers a perspective on the interaction between non-linguistic modalities and language.

We begin this chapter with an argument in favour of representationalism as a prerequisite to the discussion of Conceptual Semantics. In Section 3.2 we introduce important constituents of the cognitive architecture Jackendoff develops in the context of his theory of Conceptual Semantics. Section 3.3 discusses to what extent encoding is representation-specific. Sections 3.4 and 3.5 describe the elements of semantic representations by addressing Conceptual Structure and thematic relations. Sections 3.6 and 3.7 outline the fundamental issues of grounding and cross-modal matching in the interaction between linguistic and non-linguistic modalities.

Throughout this discussion, we continue to identify further modelling requirements, now from the perspective of an overarching theory of cognition.

3.1 Representationalism

The question of whether our senses show us reality or just a filtered projection thereof has intrigued philosophers since antiquity. The systematic study of perceptual illusions has been used extensively to gain insight into how perception is represented and processed in our minds. In cognitive psychology, a large number of visual illusions are known that induce multistable or even apparently dynamic visual percepts as the result of visual ambiguity.


Figure 3.1: Examples of visual illusions in which a constant visual stimulus results in a multistable or even dynamic visual percept: (a) the Necker cube, (b) Jastrow’s duck-rabbit, (c) apparently rotating circles.

Famous representatives of this class of visual illusions are the Necker cube in Figure 3.1 (a), Jastrow’s duck-rabbit illusion¹ in Figure 3.1 (b), and the apparently rotating circles in Figure 3.1 (c). All of these illusions have in common that a temporally invariant, static visual stimulus produces a non-static visual percept.

The occurrence of perceptual illusions as such, and the observed perceptual dynamics resulting from a static stimulus in particular, are important arguments in support of the view that we do not actually perceive the world as it is, but only the way that our senses tell us about this world. This characterisation of cognition is advocated by the school of representationalism. Its central tenet is that human cognition and consciousness are based on internal mental representations of the world in the mind of the perceiver rather than on the real world itself. While causally connected, the real world and its mental representation are clearly distinct from each other.

Mental representations are constructed from input from the sensory modalities in combination with the results of subsequent processing by the higher cognitive faculties. In the representationalist view, only what has been mentally represented can be cognitively experienced.

Other well-known, though not necessarily fully understood, cognitive phenomena providing support for a representationalist view of cognition are visual mental imagery, dreams and hallucinations. For all of these, subjects report mental states that are very similar – if not identical – to the states that result from the regular sensory perception of the corresponding external stimuli. The mental image of one’s office chair, for example, is largely congruent with the actual visual percept attained when looking at that chair in the real world. Representationalism holds that the information about this chair is encoded as a mental representation, and that the very same representation is activated irrespective of whether we are visually perceiving or merely imagining that chair.

¹ Wittgenstein (1953, p. 22) also discusses this illusion, albeit on the basis of a graphically somewhat simplified version. For this reason, some sources in the literature, e.g. Jackendoff (1983, p. 25), refer to the illusion as Wittgenstein’s duck-rabbit.

In the subsequent development of our model for the interaction between non-linguistic modalities and linguistic processing we follow Jackendoff (and many others) in adopting a representationalist view of cognition. We consequently treat the representations of linguistic and visual understanding as representational modalities.