
6.2 Representing Situation-Invariant Semantic Knowledge

6.2.3 Modelling Domain and Domain Modelling

The modelling domain of the T-Box is based on the concepts activated by the content words in the set of structurally ambiguous sentences for which the influence of cross-modal context upon linguistic processing has been studied. A detailed description of the sentences and the sources from which they have been extracted is provided in Section 8.2.

¹We adopt this procedure as a general design guideline for restricting the use of relations in the T-Box: rather than impose a global domain or range restriction on a relation, we restrict the use of the relation via class-specific properties and hence localise the effect of the restriction. In this manner, we can formulate class properties that, for the members of this class, have the same effect as a global domain restriction on the corresponding relation. For instance, in asserting the cardinality restriction (has exactly 1, property, concept) as a class property, we ensure that members of this class must engage in exactly one property relation with a member of the class concept. As a result, members of the restricted class cannot enter a property relation with a member of any other class – just as if a global range restriction had been imposed upon the property relation.
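To make the guideline concrete, the following minimal sketch shows how such a class-local cardinality restriction could be asserted with the owlready2 library. The library choice, the ontology IRI, and the names restricted_class and has_property are illustrative stand-ins, not the identifiers of our actual T-Box (Python identifiers also cannot contain the dots used in our concept names):

```python
from owlready2 import get_ontology, Thing, ObjectProperty

# Hypothetical mini T-Box illustrating the localisation strategy.
onto = get_ontology("http://example.org/tbox.owl")

with onto:
    class concept(Thing): pass
    class restricted_class(Thing): pass

    class has_property(ObjectProperty):
        pass  # deliberately no global domain/range restriction

    # Class-specific restriction: every member of restricted_class
    # engages in exactly one has_property relation with a concept.
    restricted_class.is_a.append(has_property.exactly(1, concept))
```

The restriction is attached to restricted_class alone, so other classes remain free to use has_property without constraint.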

VK-011 ‘Er wusste, dass die Magd der Bäuerin den Korb suchte.’ (‘He knew that the maid of the farmer was looking for the basket.’ or ‘He knew that the maid was looking for the basket for the farmer.’)

‘Er’ gives rise to the concept human.m.

‘wusste’ gives rise to the concept etw.wissen_ag_th.

‘Magd’ gives rise to the concept maid.

‘Bäuerin’ gives rise to the concept farmer.f.

‘Korb’ gives rise to the concept basket.

‘suchte’ gives rise to the concepts null.suchen_ag, etw.suchen_ag_th, and jmd.etw.suchen_ag_re_th.

Figure 6.3: The selection of content words for conceptualisation in the T-Box (underlined) from one of the studied globally ambiguous sentences.

We have conceptualised content words such as verbs and nouns as well as function words such as personal pronouns in the input sentences by the corresponding situation and entity concepts in the T-Box. An illustration of this process is given in Figure 6.3. Instantiations of these T-Box concepts are modelled in the situation-specific A-Boxes to represent a disambiguating cross-modal context. A detailed description of situation modelling is provided in Section 6.4. At the time of writing, the T-Box contains 427 classes, 310 individuals, and 14 relations. A representation of the asserted concept hierarchy in the T-Box is given in Appendix II.
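A hedged sketch of this two-step process in owlready2 (again with hypothetical names; dots in our concept names are rendered as underscores, and the situation concepts are omitted for brevity):

```python
from owlready2 import get_ontology, Thing

# T-Box: entity concepts for the content words of VK-011,
# following the labels used in Figure 6.3.
onto = get_ontology("http://example.org/tbox.owl")

with onto:
    class entity_concept(Thing): pass
    class maid(entity_concept): pass
    class farmer_f(entity_concept): pass
    class basket(entity_concept): pass

# A-Box: situation-specific instantiations representing a
# disambiguating visual context (a maid, a farmer, a basket).
maid_01 = onto.maid("maid_01")
farmer_01 = onto.farmer_f("farmer_01")
basket_01 = onto.basket("basket_01")
```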

On the first hierarchy level, the T-Box contains four structure concepts that categorise entity concepts, helper concepts, meta-data and situation concepts. The class entity.concept subsumes all concepts that can act as an argument to the relation is_ROLE_for.¹ The class helper.concept subsumes lexicalised.concept which, in turn, subsumes all concepts that have been assigned a lexicalisation in the T-Box.

The class situation.concept is disjoint from the class entity.concept and subdivides into classes containing the unary, binary and ternary situation concepts of our model implementation. Each of these subclasses further subdivides into classes that contain situation concepts of the same situation valence only.
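The first hierarchy level and the disjointness assertion could be reconstructed along the following lines, again as an illustrative owlready2 sketch rather than our actual implementation:

```python
from owlready2 import get_ontology, Thing, AllDisjoint

onto = get_ontology("http://example.org/tbox.owl")

with onto:
    # The four structure concepts on the first hierarchy level.
    class entity_concept(Thing): pass
    class helper_concept(Thing): pass
    class meta_data(Thing): pass
    class situation_concept(Thing): pass

    # situation.concept is disjoint from entity.concept.
    AllDisjoint([entity_concept, situation_concept])

    # Subdivision of situation.concept by situation valence.
    class unary_situation_concept(situation_concept): pass
    class binary_situation_concept(situation_concept): pass
    class ternary_situation_concept(situation_concept): pass
```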

meta.data subsumes the abstract concepts grammatical.number and gender. The latter have been introduced to facilitate the adequate modelling of syntactically relevant information. The inclusion of these concepts permits a more accurate semantic representation of visual scenes and thereby increases the specificity of reference formation during cross-modal matching. Without the concepts singular and plural as subsumed by grammatical.number, concept instantiations would always be underspecified with respect to grammatical number, as illustrated in Figure 6.4 (a).

¹In A --relation--> B, we refer to A as the relation argument and B as the relation value.

(a) man_01 --is_instance_of--> man
(b) man_02 --is_instance_of--> man ⊓ singular
(c) man_03 --is_instance_of--> man ⊓ plural

Figure 6.4: Concept instantiations in our model representing (a) an unspecified positive number of men, (b) precisely one man and (c) several men.

The interpretation of Figure 6.4 (a) shows that the omission of grammatical number from the representation of visual scene context results in a rather crude approximation of the visual scene contents. Cognitively, it is virtually impossible to conceive of a scenario in which the accuracy of visual perception is so strongly degraded that information about the grammatical – not the actual – number of observed concept instances cannot be extracted from the visual modality.¹ Such unusual conditions may perhaps be encountered in the presence of extremely poor lighting or at an extreme physical distance from the observed scene. At any rate, these are fringe phenomena of marginal importance to a general model of the interaction between visual understanding and linguistic processing. Even if we admit such percepts as possible – which, in our model, we do – without incurring undesirable consequences for the more specific representations of visual scene context, it remains questionable how accurate the classification of individuals could be under such limited visibility conditions. As discrimination temporally precedes classification in perceptual bottom-up grounding (cf. Section 3.6), it is plausible that the grammatical number of participants can be assessed prior to their conceptual categorisation.

¹Due to the very restricted range of values that grammatical number can adopt in most languages, the perception of the grammatical number of concept instances clearly is considerably easier than the perception of the actual number. In German, the perception of grammatical number only requires the cognitive discrimination between none, one, and many.

The reverse, i.e., the classification of participants without precise knowledge of their grammatical number, seems improbable, since grammatical number could, in principle, always be inferred from the number of concept instantiations that have projected into Conceptual Structure during classification. The representations in Figure 6.4 (b) and (c) illustrate how the concepts singular and plural can be employed to specify concept instances of well-defined grammatical number.
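This inference step can be made explicit in a short, purely hypothetical helper: given the number of concept instantiations of a category that have projected into Conceptual Structure, it returns the matching grammatical.number concept (for German, only none, one and many need to be discriminated, cf. the footnote above):

```python
def infer_grammatical_number(instance_count: int) -> str | None:
    """Map the number of concept instantiations projected into
    Conceptual Structure onto a grammatical.number concept.
    Returns None when no instance was perceived."""
    if instance_count == 0:
        return None          # 'none': no concept instance perceived
    if instance_count == 1:
        return "singular"    # exactly one instance
    return "plural"          # 'many': several instances
```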

In our model, the expression of quantification and quantifier scope for concept definition is determined by the expressivity of the OWL language. For our modelling purposes, we express quantification as conjuncts of entity concepts with concepts from the class grammatical.number. Presently, this class only contains the subclasses singular and plural. Other types of quantification cannot be expressed conceptually in the current version of our model. We consider Requirement R17, which demands the capability to express quantification and quantifier scope, partially implemented in our model. A comprehensive coverage of all facets of quantification is likely to require an extensive elaboration of the model.
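Such a conjunction can be asserted in owlready2 with the & constructor on classes; the following sketch (hypothetical names, illustrative IRI) mirrors the instantiation man ⊓ singular from Figure 6.4 (b):

```python
from owlready2 import get_ontology, Thing

onto = get_ontology("http://example.org/tbox.owl")

with onto:
    class man(Thing): pass
    class grammatical_number(Thing): pass
    class singular(grammatical_number): pass
    class plural(grammatical_number): pass

# Quantification as a conjunction: man_02 instantiates
# man ⊓ singular, i.e. precisely one man.
man_02 = onto.man("man_02")
man_02.is_a.append(onto.man & onto.singular)
```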

Whilst the list of concepts subsumed by meta.data is clearly incomplete (cf. Footnote 4 on page 117), the inclusion of these concepts into the T-Box forms an important first step towards a more precise representation of referentially relevant information in models of visual context. grammatical.number is an obvious candidate for inclusion in the T-Box since its manifestation is overtly detectable by sensory perception. As far as cross-modal reference to people is concerned, this argument can also be extended to gender in most cases. The experimental findings for context integration with sentence SO-9681 in Experiment 3.4 (to be discussed in Section 10.5) also support the view that the inclusion of meta-data such as gender can further improve the specificity of bottom-up grounding and cross-modal matching. An analogous argument can be developed for grammatical.number.¹