Discussion - A Computational Model for the Influence of Cross-Modal Context upon Syntactic Parsing

syntactic level of analysis via correspondence rules. In summary, this experiment has shown for the CIA that propositional semantic context information effects syntactic modulations mediated by a shared level of semantic representation.

Apart from these structural modulations affecting verb valence, Table 9.2 lists another structural difference resulting from context integration: the systematic absence of the AGENT dependency between Slot.1 and Slot.2 in the context-integrated structures. This, too, is a direct consequence of context integration via hard constraints, albeit a less obvious one and, in terms of the integrity of the semantic analysis, a less desirable one.

The observation shows that context integration has an adverse effect on the semantic analysis of the introductory main clause, for which no information had been included in the context model. At first glance, the absence of the incoming AGENT dependency on the verb with semantic valence ag th may appear surprising for two reasons. First, it seems easy to fix, namely by simply assigning the missing AGENT dependency between Slot.1 and Slot.2. Second, the absence incurs a comparatively hard constraint violation penalty of 0.1, which persists unremedied throughout all binary and ternary situation parses.

Still, the parser curiously prefers not to assign the missing AGENT dependency. The only plausible explanation is that assigning the dependency would give rise to an even more severe constraint violation. Indeed, the observed preference arises because an AGENT dependency would cause a hard violation of the AGENT integration constraint. This hard violation results from the PPC's veto on this specific relation, which in turn has been assigned on the basis of information in the context model. The veto is imposed as follows: the word ‘er’ (he) in Slot.1 grounds the concept he, which has been modelled as a rather general concept in the ontology:

$$\textit{he} \equiv \textit{personal.pronoun} \sqcap \textit{male} \sqcap \textit{singular}$$

For most concepts in our ontology, no natural gender or number has been assigned. Nor have any superclasses corresponding to gender or number been defined for the classes instantiated in the context models (see Appendix V.1 for the detailed context models used in this experiment). Since personal pronouns can refer to any type of entity, be it concrete or abstract, animate or inanimate, we have not declared any class in the ontology disjoint with the class personal.pronoun. This conceptual underspecification is why, in most sentences, personal pronouns have several cross-modal matches in the context model, some of which are less obvious than others. Once a thematic relation has been asserted for one of those matches, the PPC imposes a veto on all other thematic relations.¹ This is a general property of our model: a dependant in the input sentence can only engage in those semantic dependencies which are equivalent to the thematic relations that have been asserted for its cross-modal matches. All other semantic dependencies for that dependant are vetoed by the PPC. The fact that the AGENT dependency between Slot.1 and Slot.2 is missing throughout the context-integrated parses is therefore a direct consequence of a cross-modal match for ‘er’ having been found in every context model. We have confirmed the successful cross-modal matching of ‘er’ (he) by diagnostic output from the PPC.

¹ See Sections 7.3 and 7.4 for details on the PPC’s algorithm for cross-modal matching and relation scoring, respectively.
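The veto rule described above can be sketched in a few lines of Python. This is a hypothetical simplification, not the PPC's actual algorithm from Sections 7.3 and 7.4. We assume that a word's concept matches a context instance unless the two contain a pair of classes declared disjoint, and that a semantic dependency counts as equivalent to an asserted thematic relation if the labels agree and the regent also matches the situation instance. All function names, class sets and the concept assumed for ‘wusste’ (`jmd.etw.wissen`) are illustrative only.

```python
# Hypothetical sketch of the PPC veto; the matching test and data are assumptions.
HARD = 0.0     # hard penalty: any structure carrying it is invalid
NO_VETO = 1.0  # neutral penalty: the PPC does not object

def compatible(classes_a, classes_b, disjoint):
    """Assumed matching test: two class sets match unless they contain a disjoint pair."""
    return not any(frozenset((a, b)) in disjoint for a in classes_a for b in classes_b)

def cross_modal_matches(concept, instances, disjoint):
    """All context-model instances whose classes are compatible with the word's concept."""
    return {name for name, classes in instances.items()
            if compatible(concept, classes, disjoint)}

def ppc_penalty(dependant, relation, regent, instances, relations, disjoint):
    """Penalty for the semantic dependency dependant --relation--> regent."""
    dep_matches = cross_modal_matches(dependant, instances, disjoint)
    asserted = [(r, s) for (d, r, s) in relations if d in dep_matches]
    if not asserted:
        return NO_VETO  # no thematic relation asserted for any match: no veto
    reg_matches = cross_modal_matches(regent, instances, disjoint)
    if any(r == relation and s in reg_matches for (r, s) in asserted):
        return NO_VETO  # an equivalent thematic relation exists in the context
    return HARD         # veto: all other semantic dependencies are ruled out

# Toy data loosely modelled on the ternary context model for VK-151.
instances = {
    "mountaineer.m01": {"mountaineer"},
    "speaker.f01": {"speaker"},
    "warning01": {"warning"},
    "jmd.etw.schicken01": {"jmd.etw.schicken"},
}
relations = {
    ("mountaineer.m01", "AGENT", "jmd.etw.schicken01"),
    ("speaker.f01", "RECIPIENT", "jmd.etw.schicken01"),
    ("warning01", "THEME", "jmd.etw.schicken01"),
}
disjoint = {frozenset(("jmd.etw.wissen", "jmd.etw.schicken"))}

er = {"personal.pronoun", "male", "singular"}  # nothing declared disjoint with it
wusste = {"jmd.etw.wissen"}                    # hypothetical concept for 'wusste'
bergsteiger = {"mountaineer"}
schickten = {"jmd.etw.schicken"}

print(sorted(cross_modal_matches(er, instances, disjoint)))
print(ppc_penalty(er, "AGENT", wusste, instances, relations, disjoint))
print(ppc_penalty(bergsteiger, "AGENT", schickten, instances, relations, disjoint))
```

Because nothing is declared disjoint with the pronoun's concept, ‘er’ matches all four instances, so thematic relations are asserted for its matches and the veto is active; since no knowing situation occurs in the context model, the AGENT dependency to ‘wusste’ receives the hard penalty.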

As regards the number of structural candidates in WCDG2’s hypothesis space, Figure 9.4 shows that the PPC’s introduction of hard penalties on a number of semantic dependencies effects a drastic reduction of the size of the hypothesis space.

This is in line with expectations for hard integration, since WCDG does not include in the hypothesis space any candidate structures that violate hard integration constraints. The hypothesis space for empty context integration, listed for comparison, reflects its size in the absence of contextual influences.
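As a worked illustration, assume the usual WCDG formulation in which the score of a structure S is the product of the penalties of all its constraint violations V(S), with penalties in [0, 1] and a penalty of 0 marking a hard constraint. A single hard violation then forces the score to 0, which is why such candidates drop out of the hypothesis space. The same arithmetic accounts for the missing AGENT dependency between Slot.1 and Slot.2 discussed above, where R stands for the product of all remaining penalties shared by both structures (assumed R > 0):

$$
s(S) = \prod_{v \in V(S)} p_{c(v)}, \qquad p_{c(v)} \in [0, 1]
$$

$$
s(S_{\neg\mathrm{AGENT}}) = 0.1 \cdot R \;>\; 0 \cdot R = s(S_{\mathrm{AGENT}})
$$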

We further observe that binary and ternary contexts give rise to a similar number of structural candidates. In our view, this is due to the high similarity of the integrated context models. In our model, the number of structural candidates eliminated from the hypothesis space as a result of hard context integration depends on the following factors:

1. The number of cross-modal matches for each word

2. The number of words with cross-modal matches in the sentence

3. The number of thematic relations asserted in the context model
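How these three factors interact can be illustrated with a deliberately crude toy hypothesis space, in which each word independently picks one thematic label and the PPC leaves only the labels asserted for that word's cross-modal matches. This is a hypothetical sketch, not WCDG2's actual search space: it ignores which regent each label attaches to, and the label set, words and match sets are assumptions.

```python
import math

# Hypothetical sketch: labels, words and context data are illustrative assumptions.
LABELS = ("AGENT", "RECIPIENT", "THEME")

def permitted(word, matches, asserted):
    """Labels left unvetoed for `word`: those asserted for any of its
    cross-modal matches, or every label if the word has no such assertion."""
    labels = {rel for inst in matches.get(word, ()) for rel in asserted.get(inst, ())}
    return labels if labels else set(LABELS)

def space_size(words, matches, asserted):
    """Size of a toy hypothesis space where each word picks one label independently."""
    return math.prod(len(permitted(w, matches, asserted)) for w in words)

words = ["er", "Bergsteiger", "Referentin", "Warnung"]
asserted = {"mountaineer.m01": {"AGENT"}, "speaker.f01": {"RECIPIENT"},
            "warning01": {"THEME"}}

no_context = {}
ternary = {
    "er": {"mountaineer.m01", "speaker.f01", "warning01"},  # underspecified pronoun
    "Bergsteiger": {"mountaineer.m01"},
    "Referentin": {"speaker.f01"},
    "Warnung": {"warning01"},
}
print(space_size(words, no_context, asserted))  # 3**4 = 81
print(space_size(words, ternary, asserted))     # 3 * 1 * 1 * 1 = 3
```

Words with a single, specific match shrink the space the most, whereas a pronoun that matches many instances keeps all labels open, which mirrors factors 1 and 2.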

Despite the drastic reduction in the size of the hypothesis space, Figure 9.3 shows that the average processing times under hard context integration were longer than under default conditions, which at first sight may seem counterintuitive: one might reasonably expect a smaller hypothesis space to make it easier, i.e., faster, to locate the optimal solution.

Since both the default and the context-integrated parses are evaluated on the same constraint set, a difference in the constraint base between the conditions can be ruled out as a possible cause of this observation. Figure 9.4 shows that, in line with the longer processing times, the number of constraint evaluations also increased under context integration, i.e., WCDG2 had to evaluate more structural candidates in order to arrive at the global optimum.

With solution candidates removed from the hypothesis space because they violate a hard constraint, transformation pathways to the optimal solution can become obstructed or, in some cases, even blocked completely. As outlined in Section 4.2.4, frobbing gradually modifies the best known solution in its search for the global optimum. In the course of this process, frobbing only attempts those transformations that do not incur excessively severe constraint violations. Frobbing will therefore not be able to progress to the global optimum directly when the best known solution candidate is separated from that optimum by interim transformation structures that are unacceptably bad. While the global optimum may still be reached via other, roundabout pathways through the hypothesis space, computing the additional interim structures along those alternative pathways requires longer processing times. In some cases frobbing may even fail to find the global optimum altogether.
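The effect of removing interim structures on pathway length can be shown on a toy graph, where nodes stand for candidate structures and edges for single transformation steps. The sketch below uses plain breadth-first search, which is *not* frobbing (frobbing is a heuristic, repair-driven local search); BFS is swapped in only because it makes the lengthening or blocking of the shortest route easy to see. The graph and node names are hypothetical.

```python
from collections import deque

def shortest_path_len(edges, start, goal, invalid=frozenset()):
    """Fewest single-step transformations from start to goal, never passing
    through a structure in `invalid` (e.g. one violating a hard constraint).
    Returns None if every pathway is blocked."""
    if start in invalid or goal in invalid:
        return None
    adjacent = {}
    for a, b in edges:  # transformations are treated as reversible
        adjacent.setdefault(a, []).append(b)
        adjacent.setdefault(b, []).append(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adjacent.get(node, []):
            if nxt not in seen and nxt not in invalid:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Toy space: a direct route via B and a roundabout route via C, D and E.
edges = [("A", "B"), ("B", "G"), ("A", "C"), ("C", "D"), ("D", "E"), ("E", "G")]
print(shortest_path_len(edges, "A", "G"))                      # 2
print(shortest_path_len(edges, "A", "G", invalid={"B"}))       # 4
print(shortest_path_len(edges, "A", "G", invalid={"B", "D"}))  # None
```

Removing B from the space doubles the number of steps to the optimum G; additionally removing D leaves no admissible route at all.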

Moreover, hard integration affects the progress of frobbing in another way: frobbing always attempts to remove the most severe constraint violation in a solution candidate first (see Section 4.2.4). The list of constraint violations therefore provides important guiding information for the direction that the frobbing process takes through the hypothesis space. As WCDG rejects structures that violate hard constraints as invalid, none of the interim structures in frobbing will contain hard constraint violations. Consequently, under hard integration, frobbing faces the challenge that none of the interim structures may violate an integration constraint.

Under hard integration, frobbing hence has to proceed without the guiding information of which integration constraints were violated and thus may take longer to locate the global optimum.
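The loss of guiding information can be sketched as follows. Assuming WCDG-style penalties in [0, 1], where a lower penalty means a more severe violation, a repair step would target the violation with the lowest penalty. Under soft integration an integration constraint violation shows up in that list and can steer the search; under hard integration the same structure is rejected outright and offers no target. The function and constraint names and the penalty values are hypothetical.

```python
def repair_target(violations):
    """Pick the violation a frobbing-style repair step would address first.

    `violations` is a list of (constraint_name, penalty) pairs with penalties in
    [0, 1]; a lower penalty means a more severe violation. A structure with a
    hard (0.0) violation is rejected outright, so it offers no repair target.
    """
    if any(penalty == 0.0 for _, penalty in violations):
        return None
    return min(violations, key=lambda v: v[1], default=None)

# Hypothetical constraint names and penalties, for illustration only.
soft = [("valence", 0.5), ("agent_integration", 0.2)]
hard = [("valence", 0.5), ("agent_integration", 0.0)]
print(repair_target(soft))  # ('agent_integration', 0.2)
print(repair_target(hard))  # None
```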

The presented experimental evidence supports the view that hard context integration forces frobbing to perform additional structural transformations, and hence constraint evaluations, in order to find the global optimum. As a result, processing times increase under hard integration despite the reduced size of the hypothesis space. The default analysis reflects preferences arising from the entire constraint base; to arrive at a non-default analysis, some of these preferences need to be overridden by the integration constraints.

Binary context integration takes longer to process than ternary context integration because the default analysis for the studied sentences is structurally almost identical to the parse obtained from ternary context integration.

It is therefore likely that the interim structures evaluated during ternary context integration are very similar to those for the default analysis. Binary context integration, in contrast, produces a parse output that differs significantly in structure from the default parse, so that frobbing needs to evaluate different interim structural candidates.

To wrap up this discussion, let us briefly address the degree of conceptual specificity with which the context models in the experiment have been designed. We acknowledge that it is a significant idealisation to assume that the output of the process of visual understanding will be a representation that contains instances of concepts which precisely correspond to the concepts activated by the linguistic input. In our view, it is indeed highly unlikely in most cases that visual understanding can provide representations that are conceptually so fine-grained as to differentiate between very similar situation instances in the same way that language can. This holds true in particular for cases in which there are no top-down expectations regarding the classification of the observed visual scene. As an example, consider the ternary visual context models for sentences VK-151 and VK-306 in Figure 9.6.

These context models contain instantiations of the concepts jmd.etw.schicken and jmd.etw.senden, respectively. In the ontology, these concepts are modelled as disjoint. Semantically, however, these concepts are so closely related that they are even rendered by the same verb in the English translations. It is highly unlikely in any case that a visual observer would be able to tell a jmd.etw.schicken situation from a jmd.etw.senden situation by visual inspection alone.

VK-151: ‘Er wusste, dass die Bergsteiger der Referentin die Warnung schickten.’

He knew that the mountaineers sent the speaker the warning.

mountaineer.m01 --isAGENTfor--> jmd.etw.schicken01
speaker.f01 --isRECIPIENTfor--> jmd.etw.schicken01
warning01 --isTHEMEfor--> jmd.etw.schicken01

VK-306: ‘Er wusste, dass die Managerin der Unternehmerin den Vertreter sendete.’

He knew that the manager sent the entrepreneur the sales rep.

manager.f01 --isAGENTfor--> jmd.etw.senden01
entrepreneur.f01 --isRECIPIENTfor--> jmd.etw.senden01
sales.rep.m01 --isTHEMEfor--> jmd.etw.senden01

Figure 9.6: Ternary context models representing the scenes described in the sentences VK-151 and VK-306, respectively.

A more realistic approach to modelling the representations from visual understanding must, in our view, accommodate concept generalisation and perceptual uncertainty. Our model allows this modelling challenge to be addressed by instantiating conceptually underspecified concepts, as illustrated in Section 6.4.2. An experimental validation of this approach is given in Chapter 11, which addresses the influence of grounding and conceptual specificity upon our model’s capability to achieve syntactic disambiguation. Suffice it to say for now that our model is indeed capable of exploiting the ontological properties of the concepts involved such that context-modulated syntactic disambiguation can be achieved even when conceptually underspecified context models are integrated. We will see in due course that the type of syntactic ambiguity to be resolved determines the degree of conceptual generalisation permissible in the representation of visual context. For the contextual resolution of ambiguities affecting verb valence, such as the genitive-dative ambiguity, situation arity is vital information to be extracted from visual context. Syntactic disambiguation can be achieved as long as this information is provided.
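Why situation arity alone can suffice for the genitive-dative ambiguity can be sketched with the embedded clause of VK-151: reading ‘der Referentin’ as a dative object yields a ternary sending situation, while reading it as a genitive attribute of ‘die Bergsteiger’ yields a binary one. The sketch below is a hypothetical simplification that filters readings by role count only and never inspects the situation concept, so it behaves the same whether the context names jmd.etw.schicken, jmd.etw.senden or a more general concept. The reading names and role assignments are illustrative assumptions.

```python
# Hypothetical role assignments for the two readings of the embedded clause of VK-151.
READINGS = {
    # 'der Referentin' as dative object: a ternary sending situation
    "dative": {"AGENT": "Bergsteiger", "RECIPIENT": "Referentin", "THEME": "Warnung"},
    # 'der Referentin' as genitive attribute of 'die Bergsteiger': a binary situation
    "genitive": {"AGENT": "Bergsteiger", "THEME": "Warnung"},
}

def readings_for_arity(readings, situation_arity):
    """Keep the readings whose number of thematic roles equals the arity of the
    situation observed in the visual context; the concept itself is not consulted."""
    return sorted(name for name, roles in readings.items() if len(roles) == situation_arity)

print(readings_for_arity(READINGS, 3))  # ['dative']
print(readings_for_arity(READINGS, 2))  # ['genitive']
```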

As a final remark we need to comment on the cognitive plausibility of hard context integration in this experiment. As outlined above, a context compliance of 1.0 in this experiment enforces that any solution acceptable to the parser must have a semantic representation which is compatible with the context model. This is another way of saying that a context compliance of 1.0 enforces an absolute dominance of visual context information upon the semantic analysis in the linguistic modality.

We can, of course, easily conceive of a number of situations in which visual context information should be subordinate to linguistic interpretation. Typical examples would be conditions of limited visibility, the presence of unknown or unidentifiable entities in the visual scene, or cases of visual ambiguity, such as a snapshot of a dynamic, potentially bi-directional event which makes it impossible to tell in which direction the scene is evolving. It would be cognitively highly ineffective if humans always integrated visual information with the same strength. More to the point, the degree to which humans rely on visual information to support their linguistic processing is dynamic and adjusts to the specific situation. With the introduction of the modelling parameter context compliance, we have incorporated precisely this aspect as an important feature of our model.

In the document A Computational Model for the Influence of Cross-Modal Context upon Syntactic Parsing (pages 182-187)