expected to propagate from linguistic grounding into cross-modal matching (cf. Section 7.2): a preference for one specific conceptualisation of a homonym should also result in a preference for the homonym's cross-modal match. Future extensions of our model should incorporate the ability to propagate semantic saliencies from homonym grounding into the process of cross-modal matching.

As outlined above, cross-modal matching on the basis of conceptual compatibility may result in cross-modal matching ambiguity in cases in which the mapping from homonym to concept instance is not injective. Since this seems to be the norm rather than the exception, natural systems apply additional criteria to reduce ambiguity in cross-modal matching. One factor to establish cross-modal matching preferences is the degree of conceptual fit: a homonym that matches several context entities will preferentially match that entity which exhibits the largest conceptual overlap with the homonym’s preferred conceptualisation. An implementation of this notion in our model would require a gradable measure of conceptual overlap in addition to a weighted representation of word meaning. At the level of implementation described in this work, neither of these has been included.
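A graded notion of conceptual fit could, for example, be approximated by comparing a feature-set representation of the homonym's preferred conceptualisation with feature sets of the competing context entities. The following Python sketch illustrates one such measure; the feature sets, the Jaccard-style score and all names are illustrative assumptions and are not part of the implementation described in this work.

def conceptual_overlap(homonym_features: set, entity_features: set) -> float:
    # Jaccard-style overlap between the homonym's preferred conceptualisation
    # and a candidate context entity (illustrative measure only).
    if not homonym_features or not entity_features:
        return 0.0
    return len(homonym_features & entity_features) / len(homonym_features | entity_features)

def preferred_match(homonym_features: set, candidates: dict) -> str:
    # Pick the context entity with the largest conceptual overlap.
    return max(candidates, key=lambda e: conceptual_overlap(homonym_features, candidates[e]))

# Hypothetical example: two context entities compete as cross-modal matches for one homonym.
candidates = {
    "ball_01": {"physical_object", "round", "toy"},
    "ball_02": {"event", "social", "dance"},
}
print(preferred_match({"physical_object", "round"}, candidates))  # -> ball_01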

We finally need to explicate the minimal conditions under which our model can produce a cross-modal influence of visual context upon linguistic processing. Our model is based on the notion that only those semantic relations in visual context which have been asserted between entities that match homonyms in the input sentence can give rise to dependency score predictions. Our model thus requires at least two homonyms from different slots to have different cross-modal matches in order for the context model to be able to affect linguistic processing. Otherwise, the PPC cannot make a context-based prediction and all context-based predictions for semantic dependencies will default to unity. We now give a complete description of the PPC's scoring algorithm in the following section.

As discussed in Section 4.2.5, a predictor influences the dependency assignment in the parser by penalising certain dependencies. Assigning an acceptance score of 1 as such does not have a constraining effect on the assignment of dependencies in the parser. To be able to derive constraints from visual context information, we need to adopt the closed-world assumption as introduced in Section 6.4.1. Imposing a closure on information that has not been asserted in the representation of visual context enables us to derive constraints from positive and negative evidence in the context model. As an example, consider two homonyms that match contextual entities between which no thematic relations have been asserted. The open-world assumption that normally applies in OWL reasoning does not permit us to draw any constraining conclusions as long as no explicit negative evidence is available. Under the closed-world assumption, we can now infer that all semantic dependencies between those homonyms must be penalised. The closed-world assumption implies that the admission of a semantic dependency in linguistic processing is only possible based on the explicit assertion of the corresponding thematic relation in the context model. Conversely, we can infer from the absence of such an assertion that the thematic relation is not detected in the visual scene. This absence in the visual scene consequently motivates the veto of the corresponding semantic dependency in linguistic processing.
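The constraining effect of the closed-world assumption can be made concrete with a small sketch. Assuming, purely for illustration, that the asserted thematic relations of the context model are available as a set of triples (a drastic simplification of the actual A-Box), the absence of an assertion directly licenses a veto; the triple set, the penalty value and the function name below are illustrative assumptions, not the WCDG2/OWL machinery.

# Minimal closed-world scoring sketch (illustrative names and values).
asserted = {("she01", "AGENT", "null.tanzen01")}   # relations observed in the visual scene

def closed_world_score(dependant, relation, regent, penalty=0.1):
    # Admit a semantic dependency only if the corresponding thematic relation has
    # been asserted; under the closed-world assumption its absence counts as
    # negative evidence and triggers a veto (the penalty score).
    if (dependant, relation, regent) in asserted:
        return 1.0      # positive evidence: admit the dependency
    return penalty      # no assertion: veto the dependency

print(closed_world_score("she01", "AGENT", "null.tanzen01"))  # 1.0 (admitted)
print(closed_world_score("she01", "THEME", "null.tanzen01"))  # 0.1 (vetoed)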

Situations evolve over time, and so does the perception of them. It is therefore a common occurrence that our knowledge of a visual scene increases or changes over time. In this context the question arises how the closed-world assumption is compatible with incremental changes to the representation of visual context. There is no immediately apparent reason why a visual scene containing a number of participants — say a person buying a book, which involves an AGENT and a THEME for the binary buying situation etw.kaufen — should not be expandable by another participant, such as a RECIPIENT, at a later stage of observation. The later inclusion of an additional participant into the situation changes the situation arity from binary to ternary. We fully acknowledge the cognitive reality of such incremental expansions of visual scene interpretations and their corresponding mental representations. The point is that the addition of further participants to the interpretation of the visual scene gives rise to a semantically different scene interpretation and hence a different scene representation in its own right. In our model, a binary situation concept is treated as genuinely distinct from the corresponding ternary situation concept, even if both of them are lexicalised by the same verb.1 Conceptually, they differ because they describe situations of different situation valence, i.e., situations involving different sets of participants with a different number of mandatory arguments in their syntactic and semantic representations. This semantic difference is reflected in the concepts' different syntactic realisations: higher situation arity is typically realised syntactically in the subcategorisation of additional arguments on the situation verb. We expect that the instantiation of these different situation concepts in visual context should also give rise to different cross-modal interactions with linguistic processing.

1The lexicographic discussion of whether the transitive and the ditransitive form of a verb are indeed manifestations of the same verb or of two distinct verbs is outside of the scope of this thesis.

The CIA permits us to model such extensions of situation valence over time, e.g., from ⟨AGENT, THEME⟩ to ⟨AGENT, RECIPIENT, THEME⟩ as in the example above. However, it is not possible to model this extension as a single interaction between visual scene context and linguistic processing. Instead, we need to render the two different visual contexts as discrete and distinct representations. Each context model can then engage in a separate – and potentially different – cross-modal interaction with linguistic processing in the course of a separate parse run. Admittedly, this is a work-around since the change in contextual information does not effect a re-analysis of an existing parse in our model as would be the case in a natural system; rather, the effect of the temporally posterior context model upon linguistic processing is modelled as a completely new cross-modal interaction in an independent ab initio parse run. Chapter 11 discusses our empirical investigations of the effect of situation arity in the visual scene upon the interaction between context and linguistic processing.
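The work-around just described can be pictured as two separate context models, each driving its own ab initio parse run. The following sketch only illustrates this modelling decision; the class and function names, the identifiers and the example sentence are hypothetical and do not correspond to the actual WCDG2 or CIA interfaces.

from dataclasses import dataclass

@dataclass(frozen=True)
class ContextModel:
    # A discrete snapshot of the visual scene (hypothetical representation).
    situation: str
    participants: tuple      # (thematic role, entity) pairs

def parse_with_context(sentence, context):
    # Stand-in for one independent ab initio parse run guided by one context model.
    return (f"parse of {sentence!r} under {context.situation} "
            f"({len(context.participants)}-ary situation)")

# Earlier observation: binary buying situation with AGENT and THEME.
ctx_binary = ContextModel("kaufen_01", (("AGENT", "person01"), ("THEME", "book01")))
# Later observation: ternary buying situation, rendered as a distinct context model.
ctx_ternary = ContextModel("kaufen_02",
                           (("AGENT", "person01"), ("RECIPIENT", "person02"), ("THEME", "book01")))

for ctx in (ctx_binary, ctx_ternary):
    print(parse_with_context("Die Person kauft das Buch.", ctx))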

In conclusion, our model is based on the assumption that context models are informationally complete representations of the output of visual scene comprehension. By 'informationally complete' we mean that all semantic information required for the cross-modal interaction with linguistic processing is encoded in the representation of visual context. Clearly, this is a modelling idealisation since real-world perception may be subject to uncertainty.

Entities and relations that have been represented can participate in a cross-modal interaction with linguistic processing while those that have not been represented cannot. Imposing the closed-world assumption on the representation of visual context permits us to formulate constraints on linguistic analysis based on both positive and negative evidence from visual context. Based on the closed-world assumption we can exclude any information outside of the context model from consideration in the cross-modal interaction with linguistic processing.

Given this general overview of the PPC's use of positive and negative contextual evidence, we now provide a more detailed discussion of the complete set of inferences and underlying assumptions used in the PPC's computation of score predictions between two homonyms. We then integrate the declarative description of the scoring process into a procedural context and describe the PPC's scoring algorithm in its entirety. In our description we use the following symbols:

Si   the i-th slot in the sentence,

Hi.j   the j-th homonym in Si,

t   an arbitrary thematic relation in the modelling scope of WCDG2's context models,

ε   the empty, non-thematic relation which is assumed to hold between two entities in the context model for which no thematic relation t has been asserted,

T   the set of all thematic relations t in WCDG2,

T̄   the set of all thematic relations t in WCDG2's modelling scope extended by the empty relation ε,

δ(t)   the semantic dependency in WCDG2 that corresponds to the thematic relation t in the context model,

p(Hi.j, Hm.n, δ(t))   the PPC's score prediction for the dependency δ(t) between dependant Hi.j and regent Hm.n,

v(δ(t))   the penalty score assigned in WCDG2 for the semantic dependency δ(t),

M(Hi.j)   the set of all concept instances in the context model that match Hi.j cross-modally,

M(Hi.j)   an arbitrary element of M(Hi.j), and

θ(M(Hi.j), M(Hm.n))   a thematic or empty relation that holds between M(Hi.j) and M(Hm.n) in the context model.
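To make this notation concrete, the sketch below shows one possible in-memory rendering of these symbols; the dictionary-based score table and all concrete names and values are illustrative assumptions, not the actual WCDG2 data structures.

# Illustrative data structures for the symbols introduced above (hypothetical names and values).
T = {"AGENT", "THEME", "RECIPIENT"}        # thematic relations t in the modelling scope
EMPTY = "EMPTY"                            # the empty, non-thematic relation ε
T_BAR = T | {EMPTY}                        # T̄ = T extended by the empty relation

def delta(t):
    # The semantic dependency δ(t) corresponding to a thematic relation t.
    return "SEM_" + t

v = {delta(t): 0.1 for t in T}             # penalty scores v(δ(t)) for vetoed dependencies

# M(Hi.j): cross-modal matches per homonym, keyed here by a homonym identifier.
M = {"H1.1": {"she01"}, "H3.1": {"null.tanzen01"}}

# θ(D, R): the thematic or empty relation asserted between two concept instances.
assertions = {("she01", "null.tanzen01"): "AGENT"}
def theta(D, R):
    return assertions.get((D, R), EMPTY)

# p(Hi.j, Hm.n, δ(t)): the PPC's score predictions are collected in a dictionary keyed by
# (dependant, regent, dependency); an absent key corresponds to NULL (unscored).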

As discussed in Section 3.4, a major postulate of Conceptual Semantics – and a direct consequence of Jackendoff's Conceptual Structure Hypothesis (cf. p. 38) – is that representationally δ(t) and t are the same type of relation, namely semantic relations in Conceptual Structure assigned on the single and uniform level of semantic representation for both linguistic and non-linguistic input. In the description of our model we deliberately list them as separate entities since the two relation types are assigned in different technical components of the model: t-relations are asserted in the A-Box while δ(t)-relations are assigned on the semantic levels of analysis in the parser. The assertion in different technical components, however, does not permit the conclusion that the assigned relations represent conceptually different relation types in our model. The fact that they are assigned in different components is the result of purely technical constraints in the implementation. The identity of these relations is encoded in our model; without this identity a cross-modal interaction between visual context and linguistic processing could not be achieved. We exploit the identity of these relations when assigning δ(t)-dependencies in the parser: the assignment is made based on PPC score predictions that reflect the result of queries to a context model containing only t-type relations. Consulting t-relation scores for the assignment of δ-relations is a meaningful procedure if and only if we assume t- and δ-relations to be of the same nature. This is the technical realisation of the central tenet of Conceptual Semantics, namely that cross-modal context and the semantic part of linguistic analysis project into the same semantic representation and consequently also make use of the same type of semantic relations between projected entities.

Given θ ∈ T̄ such that θ = θ(M(Hi.j), M(Hm.n)), the PPC draws the following inferences to compute its score predictions:

Inference 1. Veto all unscored semantic dependencies for this dependant-regent pair.

These dependency vetoes are based on the fact that only those semantic dependencies shall be admitted for which positive evidence has been asserted in visual context. If a given thematic relation from T has not been asserted in the context model, we veto its corresponding semantic dependency in the linguistic analysis, provided that dependency has not been scored yet. This inference applies regardless of whether θ is a thematic or the empty relation. If visual context provides positive evidence for a thematic relation, this pre-assigned veto will be overwritten in Inference 2. If no positive evidence is found to overrule this pre-assigned veto, the veto persists.

∀ t ∈ T: p(Hi.j, Hm.n, δ(t)) = NULL
⟹ p(Hi.j, Hm.n, δ(t)) ← v(δ(t))

If a non-empty thematic relation has been asserted in the context model between M(Hi.j) and M(Hm.n), continue with the following inferences:

Inference 2. Admit the semantic dependency that corresponds to the contextually asserted thematic relation.

The central idea of the PPC's scoring policy is to admit those semantic dependencies for which positive evidence in the form of an asserted thematic relation from T could be found in visual context. Note that the admission of positive evidence in this step is performed regardless of whether any scores have been assigned to δ(θ) before. Previously assigned vetoes will thus be overwritten.

θ ∈ T
⟹ p(Hi.j, Hm.n, δ(θ)) ← 1

Inference 3. Veto the reverse direction of the admitted semantic dependency, provided it has not been scored yet.

This dependency veto is based on the fact that our semantic dependencies are unique in a given situation, i.e., a dependency can only be admitted once per situation. Admitting the semantic dependency in the forward direction in Inference 2 therefore permits us to exclude the admission of the same dependency in the reverse direction.

θ ∈ T ∧ p(Hm.n, Hi.j, δ(θ)) = NULL
⟹ p(Hm.n, Hi.j, δ(θ)) ← v(δ(θ))

Inference 4. Veto all unscored semantic dependencies for dependants from the same dependant slot.

Only those semantic dependencies for the dependant slot can be admitted for which positive evidence is found in the context model. All other semantic dependencies for the dependant slot are vetoed.

θ ∈ T, ∀ t ≠ θ, ∀ k ≠ j, ∀ o, ∀ p: p(Hi.k, Ho.p, δ(t)) = NULL
⟹ p(Hi.k, Ho.p, δ(t)) ← v(δ(t))

Inference 5. Veto all unscored semantic dependencies for regents from the same regent slot.

Only those semantic dependencies towards the regent slot can be admitted for which positive evidence is found in the context model. All other semantic dependencies towards the regent slot are vetoed.

θ ∈ T, ∀ t ≠ θ, ∀ k ≠ n, ∀ o, ∀ p: p(Ho.p, Hm.k, δ(t)) = NULL
⟹ p(Ho.p, Hm.k, δ(t)) ← v(δ(t))

The veto scores from Inference 3 to Inference 5 are inferred whenever a pair of homonyms has cross-modal matches between which a thematic relation has been asserted in the context model. The complete PPC scoring algorithm is given as pseudocode in Algorithm 1. Note that the inferred vetoes on the semantic dependencies can only be imposed because of the closed-world assumption. Essentially, we are using the positive evidence for θ to infer a whole range of other semantic dependency scores. With these inferences in place, our model meets Requirement R8, which demands that the cross-modal interaction between visual context and linguistic processing be based on a representation of the visual context information.

A point worth discussing is the scope of the vetoing we apply. Inference 4 and Inference 5 impose specific vetoes that allow the context model to have a powerful yet selective influence upon linguistic processing. Concretely, these two inferences leave p(Hi.j, Hm.n, δ(t)) untouched and veto all other semantic dependencies that originate from a dependant homonym in Si or that are directed towards a regent homonym in Sm.

An alternative approach would have been to extend vetoing to all homonyms in the entire sentence such that only those semantic dependencies would be admitted for which a corresponding thematic relation has been asserted in visual context.

A crucial effect of this approach is that semantic dependencies are vetoed between homonyms which refer to entities entirely unrelated to the situation encoded in the context model. This constitutes a significant challenge when multiple situations are expressed in a single sentence, as is frequently the case in unrestricted natural language. A simple example is shown in Figure 7.4. If vetoes were applied to all semantic dependencies across the entire sentence, a thematic relation asserted between she01 and null.tanzen01 would have an effect upon the dependency assignment between 'Er' (he) and 'beobachten' (observe).

Algorithm 1  The PPC scoring algorithm for semantic dependency scores.

Require: Sentence
for i = 1 to number of slots do
  for j = 1 to number of homonyms in dependant slot Si do
    for m = 1, m ≠ i to number of slots do
      for n = 1 to number of homonyms in regent slot Sm do
        if M(Hi.j) ≠ {} and M(Hm.n) ≠ {} then
          for all t ∈ T do
            if p(Hi.j, Hm.n, δ(t)) = NULL then
              p(Hi.j, Hm.n, δ(t)) ← v(δ(t))            // Inference 1
            end if
          end for
          for all D ∈ M(Hi.j), R ∈ M(Hm.n) do
            θ ← θ(D, R)
            if θ ∈ T then
              p(Hi.j, Hm.n, δ(θ)) ← 1                  // Inference 2
              if p(Hm.n, Hi.j, δ(θ)) = NULL then
                p(Hm.n, Hi.j, δ(θ)) ← v(δ(θ))          // Inference 3
              end if
              for all Ho.p in the sentence do
                for all t ∈ T do
                  for all Hi.k ≠ Hi.j in Si do
                    if p(Hi.k, Ho.p, δ(t)) = NULL then
                      p(Hi.k, Ho.p, δ(t)) ← v(δ(t))    // Inference 4
                    end if
                  end for
                  for all Hm.k ≠ Hm.n in Sm do
                    if p(Ho.p, Hm.k, δ(t)) = NULL then
                      p(Ho.p, Hm.k, δ(t)) ← v(δ(t))    // Inference 5
                    end if
                  end for
                end for
              end for
            end if
          end for
        end if
      end for
    end for
  end for
end for
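For comparison with the pseudocode above, the following Python sketch re-states Algorithm 1 using the illustrative data structures introduced after the symbol list (score table p, penalty map v, match sets M, relation lookup θ); it is a minimal sketch under those assumptions, not the actual WCDG2 implementation.

def ppc_score(sentence, M, theta, T, v, delta):
    # Minimal sketch of Algorithm 1.
    #   sentence: list of slots, each slot a list of homonym identifiers
    #   M:        dict mapping a homonym to its set of cross-modal matches
    #   theta:    function returning the thematic or empty relation between two instances
    #   T, v, delta: as in the data-structure sketch above
    # Returns the score table p keyed by (dependant, regent, dependency).
    p = {}
    homonyms = [h for slot in sentence for h in slot]

    def veto_if_unscored(dep, reg, t):
        # Assign the penalty score only if the dependency is still unscored (NULL).
        if (dep, reg, delta(t)) not in p:
            p[(dep, reg, delta(t))] = v[delta(t)]

    for i, dep_slot in enumerate(sentence):
        for Hij in dep_slot:
            for m, reg_slot in enumerate(sentence):
                if m == i:
                    continue
                for Hmn in reg_slot:
                    if not M.get(Hij) or not M.get(Hmn):
                        continue
                    for t in T:                                  # Inference 1
                        veto_if_unscored(Hij, Hmn, t)
                    for D in M[Hij]:
                        for R in M[Hmn]:
                            th = theta(D, R)
                            if th not in T:                      # empty relation: nothing to admit
                                continue
                            p[(Hij, Hmn, delta(th))] = 1.0       # Inference 2
                            veto_if_unscored(Hmn, Hij, th)       # Inference 3
                            for Hop in homonyms:
                                for t in T:
                                    for Hik in dep_slot:         # Inference 4
                                        if Hik != Hij:
                                            veto_if_unscored(Hik, Hop, t)
                                    for Hmk in reg_slot:         # Inference 5
                                        if Hmk != Hmn:
                                            veto_if_unscored(Hop, Hmk, t)
    return p

Calling ppc_score([["H1.1"], ["H3.1"]], M, theta, T, v, delta) with the values from that sketch admits the AGENT dependency from H1.1 to H3.1 and vetoes all remaining semantic dependencies between the two homonyms.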

In principle there are two ways to address this challenge: we can limit the scope of the vetoes applied or we can choose to represent all situations expressed linguistically in the context model. The latter is undesirable for two reasons: 1) A sentence may express situations that are inaccessible to visual perception or that do not refer to the co-present visual context. 2) We do not wish to impose constraints on the amount of visual context to be represented. In some cases, the visual information available at the time of sentence processing may be limited to one specific situation; in other cases, visual context may provide a plethora of visually accessible information with a multitude of observed thematic relations. In either case, visual information should only affect semantic dependencies between those homonyms that are directly or indirectly related to the entities between which the thematic relation is being observed. By direct relation we mean a cross-modal reference relation based on concept compatibility; by indirect relation we refer to a connection via the inference mechanisms just outlined.

For our model this means that the effect of cross-modal context must remain neutral with respect to linguistic processing unless the asserted thematic relation θ holds between two concept instances D and R, respectively, such that D ∈ M(Hi.j) and R ∈ M(Hm.n). The example in Figure 7.4 illustrates that there is no reason why the vetoes resulting from the AGENT-relation between she01 and null.tanzen01 as asserted in the context model should give rise to a veto on the AGENT-dependency between 'Er' (he) and 'beobachtet' (observes) in the introductory main clause. The restriction of vetoing scope is hence a modelling decision of particular relevance to the processing of longer sentences. The latter are frequently encountered in unrestricted natural language input and typically contain several verb forms, each of which requires independent semantic dependency assignment.