
Both discrimination and identification can be performed convincingly by a suitably trained connectionist system. What a purely connectionist system lacks is the capability to transfer the systematic properties arising from category membership to the non-symbolic level of representation. A connectionist system may well be capable of associating a given sensor input in the form of an iconic representation with a conceptual category, e.g. horse.¹ It cannot, however, assign the systematic properties of the category horse, as inherited in a hierarchy of concepts, to the iconic representation of an instantiation horse01. To achieve such a rule-governed transfer of systematic properties, we require a symbolic representation of the conceptual category that permits rule-based operations of symbol manipulation. A purely connectionist system has no symbolic level of representation and hence offers no such symbolic operations.

This discussion of the fundamental capabilities and limitations of symbolic and connectionist systems shows that both can contribute important aspects to the complex tasks of grounding and systematic symbol manipulation, but neither can perform discrimination, identification and symbolic manipulation on its own.

Harnad proposes to resolve the strict dichotomy between symbolic and connectionist systems by means of a hybrid architecture that combines a symbolic and a connectionist component, each compensating for the other's weaknesses. The suggested framework first processes sensory input in a connectionist component, such as an artificial neural network, to discriminate and identify the sensory input. Systematic symbolic manipulation is then performed as a rule-based combination of grounded elementary categories. The result of these two steps is a systematically manipulable symbolic system grounded in sensory perception. Saffiotti and LeBlanc (2000), Coradeschi and Saffiotti (2001, 2003a,b), and Chella et al. (2004) report successful implementations that ground symbolic representations in the sensory representations of real-world objects based on Harnad-like hybrid architectures.
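The two-stage idea can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not Harnad's proposal in detail and not any of the cited implementations: a nearest-prototype classifier stands in for the trained connectionist component, and the prototype vectors, the concept hierarchy and the property names are all hypothetical.

```python
import math

# Hypothetical prototypes: a stand-in for a trained connectionist component.
PROTOTYPES = {
    "horse": (0.9, 0.1, 0.8),
    "bird": (0.1, 0.9, 0.2),
}

# Hypothetical concept hierarchy (child -> parent) and per-concept properties.
PARENT = {"horse": "mammal", "mammal": "animal", "bird": "animal"}
PROPERTIES = {
    "animal": {"animate"},
    "mammal": {"warm_blooded"},
    "horse": {"has_hooves"},
    "bird": {"has_wings"},
}


def identify(features):
    """Connectionist stand-in: return the category with the nearest prototype."""
    return min(PROTOTYPES, key=lambda c: math.dist(features, PROTOTYPES[c]))


def inherited_properties(concept):
    """Symbolic step: collect properties by walking up the concept hierarchy."""
    props = set()
    while concept is not None:
        props |= PROPERTIES.get(concept, set())
        concept = PARENT.get(concept)  # None once the root has been passed
    return props


def ground(instance_id, features):
    """Identify the category first, then transfer its systematic properties."""
    category = identify(features)
    return {
        "instance": instance_id,
        "category": category,
        "properties": inherited_properties(category),
    }


grounded = ground("horse01", (0.85, 0.15, 0.75))
```

In this sketch the classifier alone could only output the label horse. The properties inherited from mammal and animal reach the instance horse01 only through the rule-based walk over the symbolic hierarchy.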

It is worth mentioning that, in the literature, the term symbol grounding is used primarily to denote the process of associating an abstract symbol with the sensory representation of a corresponding entity in the real world. The term anchoring, as formalised by Coradeschi and Saffiotti (2003a), is used in a similar fashion to denote the association of an abstract symbol with the representation of the corresponding world object over time. In principle, the association process between the real-world object's sensory representation and its symbolic representation can proceed bottom-up, top-down or in a hybrid fashion. In this thesis, we follow Harnad (1990) in his view that "there is really only one viable route from sense to symbols: from the ground up". We adopt Harnad's term bottom-up grounding to denote the process of grounding in which a sensory stimulus is linked to its corresponding symbolic representation by bottom-up processing of the sensory input.

¹ We henceforth adopt the convention of representing concepts in small capitals, such as example, and concept instances by their indexed category label, such as example01.

In summary, grounding representations of sensory perception requires two capabilities, namely discrimination and identification. Discrimination makes it possible to evaluate the degree of similarity between two iconic sensory representations. Identification reduces iconic representations to the corresponding categorical representations of distinctive and invariant features, based on which the representatum can be categorised as a member of a particular conceptual category. Once identification has been accomplished, the sensory stimulus is said to ground the corresponding conceptual category. In terms of Conceptual Semantics, the stimulus can now project into Conceptual Structure as an instance of the identified conceptual category.
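The two capabilities can be made concrete with a small Python sketch. It assumes, purely for illustration, that an iconic representation is a numeric tuple (number of legs, height in metres, mane present as 0 or 1). The feature detectors, the threshold and the category definitions are hypothetical and not taken from the model described in this thesis.

```python
import math


def discriminate(icon_a, icon_b):
    """Similarity in (0, 1]; 1.0 means the two iconic representations coincide."""
    return 1.0 / (1.0 + math.dist(icon_a, icon_b))


def categorical_representation(icon):
    """Reduce an icon to a set of distinctive features (hypothetical detectors)."""
    legs, height_m, mane = icon
    features = set()
    if legs == 4:
        features.add("four_legged")
    features.add("tall" if height_m > 1.2 else "short")
    if mane:
        features.add("maned")
    return frozenset(features)


# Hypothetical categories, each defined by the features it requires.
CATEGORY_FEATURES = {
    "horse": {"four_legged", "tall", "maned"},
    "dog": {"four_legged", "short"},
}


def identify(icon):
    """Return the first category whose required features are all present."""
    present = categorical_representation(icon)
    for category, required in CATEGORY_FEATURES.items():
        if required <= present:
            return category
    return None  # no category matched


horse_a = (4, 1.6, 1.0)
horse_b = (4, 1.5, 1.0)
dog = (4, 0.5, 0.0)
```

Discrimination operates on the icons themselves and yields a graded similarity, whereas identification discards most of the iconic detail and keeps only the features that decide category membership.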

For our model of the cross-modal interaction between vision and language, we require the capability of discrimination. In the visual modality, discrimination allows us to distinguish between the iconic representations of visual perception. In the linguistic modality, discrimination permits us to distinguish between the different tokens in linguistic input. Our model also requires the capability of identification in order to classify entities in linguistic and visual input as belonging to one or more conceptual categories. We capture these aspects as modelling requirements R27 and R28:

Requirement R27

A model for the interaction between non-linguistic modalities and linguistic understanding must have the capability to discriminate individuating features of visual and linguistic input at sensory level.

Requirement R28

A model for the interaction between non-linguistic modalities and linguistic understanding must be capable of categorising sensory input in conceptual categories based on a set of individuating features.

With respect to our focus on the cross-modal interaction of representational modalities, it is important to note the difference between assigning a meaning to tokens of natural language and the bottom-up grounding of a sensory stimulus representation.

For language, the categorisation of the initial sensory stimulus, be it auditory, visual or haptic, results in the identification of a particular word. This word is an arbitrary linguistic symbol and has a range of associated lexical properties, one of them being the representation of word meaning as defined in the mental lexicon. According to Jackendoff, word meaning is represented in terms of semantic structures of concepts and predicates in Conceptual Structure. In contrast to the processing of a sensory stimulus, an identified word does not project into Conceptual Structure directly. Rather, the meaning of the word first needs to be retrieved as a property associated with the identified symbol. As Löbner (2003, Chapter 2) describes it, the retrieved word meaning is a conceptual expression that denotes a set of possible instantiations. It is this conceptual expression that is instantiated in Conceptual Structure. The processing of the representational modality hence includes an additional decoding step in which the meaning of the arbitrary symbol is decoded. In essence, however, both processes result in the assignment of meaning to a sensory stimulus. We will therefore refer to the process of assigning meaning to a sensory stimulus encoding linguistic symbols as linguistic grounding, while we refer to the process of grounding in sensory stimuli as sensory grounding. The difference between processing a categorical representation in sensory grounding and in linguistic grounding is summarised in Figure 3.5.

Categorical Representation of a Sensory Stimulus --[Feature Matching]--> Identification of Category Instance

Categorical Representation of a Linguistic Stimulus --[Feature Matching]--> Identification of Word Instance --[Access to Meaning]--> Identification of Denoted Category

Figure 3.5: The difference between grounding concepts in sensory and linguistic stimuli.
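The contrast between the two routes can be sketched in Python as follows. This is an illustrative toy under explicit assumptions: the lexicon entries, the feature sets and the use of upper-case strings as concept labels are hypothetical, and exact string lookup stands in for the feature matching that identifies a word.

```python
# Hypothetical mental lexicon: word form -> lexical properties.
# Two arbitrary word forms may denote the same concept.
LEXICON = {
    "horse": {"part_of_speech": "noun", "meaning": "HORSE"},
    "Pferd": {"part_of_speech": "noun", "meaning": "HORSE"},
}

# Hypothetical categories defined by the features they require.
CATEGORY_FEATURES = {"HORSE": frozenset({"four_legged", "maned"})}


def sensory_grounding(features):
    """One step: feature matching identifies the category instance directly."""
    for category, required in CATEGORY_FEATURES.items():
        if required <= features:
            return category
    return None  # no category matched


def linguistic_grounding(token):
    """Two steps: identify the word instance, then access its meaning."""
    entry = LEXICON.get(token)  # step 1: identification of the word instance
    if entry is None:
        return None  # unknown word form
    return entry["meaning"]  # step 2: access to meaning gives the denoted category


seen = sensory_grounding(frozenset({"four_legged", "maned", "brown"}))
heard = linguistic_grounding("horse")
```

Both routes end in the same concept label, but only the linguistic route passes through the lexicon. The form of the word carries no information about the category, which is why the extra decoding step is needed.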