This chapter concludes the description of the individual components making up the CIA. We have provided a comprehensive description of how the PPC, as the central component in the CIA, uses non-linguistic information from the semantic representation of cross-modal context to compute its contextually-informed dependency score predictions.

PPC processing starts with the pre-processing of the linguistic input received from WCDG2. Based on the three lexical features normalisation, semantic valence and number, homonyms from the input sentence are assigned a set of conceptualisations from the T-Box. In the subsequent process of cross-modal matching, the PPC maps each homonym in the input sentence to a set of concept instances asserted in the context model. A cross-modal match is established if and only if at least one of the homonym's conceptualisations is conceptually compatible with the concept instantiated in the context model. The overall flow of cross-modal matching in the PPC is summarised diagrammatically in Figure 7.6.
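The matching criterion just stated can be made concrete with a small sketch. This is an illustrative reconstruction, not the PPC's actual code: the function names, the dictionary-based instance representation and the toy concept hierarchy are all assumptions made for exposition.

```python
# Illustrative sketch of cross-modal matching as described above.
# All names and the toy concept hierarchy are assumptions for exposition,
# not the actual PPC implementation.

# Toy T-Box fragment: each concept maps to its parent concept.
PARENT = {"Terrier": "Dog", "Dog": "Animal"}

def subsumes(concept, instance_concept):
    """True if `concept` equals or subsumes the instantiated concept."""
    while instance_concept is not None:
        if instance_concept == concept:
            return True
        instance_concept = PARENT.get(instance_concept)
    return False

def cross_modal_matches(conceptualisations, context_instances):
    """Return the context-model instances matching a homonym.

    A match is established iff at least one conceptualisation assigned to
    the homonym is compatible with the concept instantiated in the context
    model (here: compatibility is modelled as subsumption in the toy
    hierarchy).
    """
    return [inst for inst in context_instances
            if any(subsumes(c, inst["concept"]) for c in conceptualisations)]

instances = [{"id": "i1", "concept": "Terrier"},
             {"id": "i2", "concept": "Chair"}]
matches = cross_modal_matches(["Dog"], instances)
# only i1 matches: "Dog" subsumes "Terrier" but is unrelated to "Chair"
```

The design mirrors the text's "if and only if at least one conceptualisation is compatible" condition: the `any(...)` expression implements the existential check per instance.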

Figure 7.6: Overall process flow in the PPC.

Prediction scoring is triggered if both dependant and regent homonyms have cross-modal matches. For homonym pairs in which both the dependant and the regent have a cross-modal match, the PPC admits the semantic dependency that corresponds to the thematic relation asserted between the cross-modal matches in the context model. Based on the closed-world assumption, all other semantic dependencies originating from the same dependant slot or directed towards the same regent slot are vetoed. When the dependency score predictions have been computed for all eligible homonym pairs, the PPC returns these predictions to WCDG2 where they can be accessed homonym-specifically by the integration constraints in the role-assigning grammar.
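The admit/veto logic of this scoring step can be illustrated with a short sketch; the function name and the example relation labels are hypothetical and serve only to make the closed-world behaviour concrete.

```python
# Hypothetical sketch of closed-world prediction scoring as described above;
# names and relation labels are illustrative, not the actual PPC code.

def score_predictions(matched_relation, candidate_relations):
    """Score the semantic dependency candidates for one homonym pair.

    The dependency corresponding to the thematic relation asserted between
    the two cross-modal matches in the context model is admitted (1.0);
    under the closed-world assumption all competing dependencies from the
    same dependant slot are vetoed (0.0).
    """
    return {rel: (1.0 if rel == matched_relation else 0.0)
            for rel in candidate_relations}

scores = score_predictions("AGENT", ["AGENT", "PATIENT", "INSTRUMENT"])
# scores == {"AGENT": 1.0, "PATIENT": 0.0, "INSTRUMENT": 0.0}
```

The closed-world assumption is what licenses the hard 0.0 veto: any relation not asserted in the context model is treated as false, rather than merely unknown.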

A significant strength of our model is that the steps in the process of establishing cross-modal referential links between words and concept instances in visual context are essentially language-independent. Language-independence is also a central claim that Jackendoff makes for Conceptual Structure. While the individual features used in linguistic grounding may be language-specific in our model, the overall process, which connects input words to conceptualisations via linguistic bottom-up grounding and checks these concepts for compatibility with the concepts instantiated in visual context, generalises to languages other than German.

Model Validation and Conclusions

The argument for our model of the cross-modal influence of visual scene context upon linguistic processing presented in this thesis is structured into three main parts: Part I was dedicated to the identification and formulation of modelling requirements. Part II served the purpose of providing a detailed specification of the model as well as an in-depth discussion of the extent to which the requirements from Part I have been met by our model implementation. The third part of the thesis now addresses the empirical investigation and validation of the implemented model and discusses the model's behaviour under different experimental conditions.

Each of the subsequent chapters has a specific experimental focus. Chapters 9, 10 and 11 build upon each other in that each subsequent chapter releases one simplifying assumption that was maintained in the preceding chapter or chapters. Chapter 8 describes a pre-experiment to the actual study of our model's behaviour. As context integration is mediated by the semantic level of analysis, the model's integration success crucially depends on the quality of semantic analysis. We therefore evaluate the effect that adding a semantic level of analysis has upon the quality of syntactic analysis in WCDG in the absence of any contextual information. Chapter 9 reports the first actual integration experiment with the CIA: We demonstrate the technical feasibility of context-driven syntactic modulations by enforcing an absolute dominance of visual context over linguistic analysis. This is achieved by integrating visual context information into linguistic analysis via hard integration constraints.

Chapter 10 discusses the effect of relaxing the integration constraints while leaving all other experimental conditions unchanged. In Chapter 11 we remove the simplifying assumption that visual and linguistic representations be of the same level of conceptual specificity. The chapter examines how conceptually underspecified representations of visual percepts can still contribute to syntactic disambiguation in the linguistic analysis. We also discuss an experimental investigation into how concept instantiations of different conceptual specificity vary in their ability to induce syntactic modulations under context integration. Chapter 12, finally, concludes this thesis with a summary of the central claims, a collection of the conclusions we draw and an outlook on future directions of research that arise from the work presented here.

Semantic Grammar Evaluation

This chapter describes two pre-experiments in preparation for the actual study of our model's context integration behaviour. We evaluate the effect that the addition of semantic levels of analysis in WCDG2's extended grammar has upon the quality of syntactic parsing. We also evaluate the extended grammar on two other corpora and select sentences for the subsequent study of context integration phenomena. As regards the overall line of argument for our model, this chapter serves a preparatory function, motivating the selection of the linguistic material studied in the forthcoming context integration experiments.

This chapter is structured into two main sections that correspond to the evaluations of the extended grammar we conducted: Section 8.1 describes the extended grammar evaluation on 1,000 sentences from the NEGRA corpus. Section 8.2 reports the evaluation of the extended grammar on three smaller sets of globally ambiguous sentences that were extracted from a psycholinguistic examination and the SALSA corpus.

8.1 Evaluation on the NEGRA Corpus

8.1.1 Experimental Motivation

In the preceding chapters, we have argued extensively for a context-integration model based on the propagation of non-linguistic context information into syntactic analysis via a shared semantic representation. The interaction between the semantic and the syntactic representations in this model is enabled by correspondence rules in the syntax-semantics interface. Ideally, this interface should propagate referentially relevant context information into syntactic representation while remaining neutral with respect to syntactic analysis in case of referentially unrelated contextual assertions. These modelling aspects have previously been captured as Requirements R5 and R7. Before we set out to study the effect of non-empty context models on syntactic analysis in the following chapters, we need to understand whether the addition of semantic processing has an influence on syntactic analysis in the absence of a contextual bias.

8.1.2 Approach

To see whether the addition of the semantic levels of analysis in WCDG2 has an influence on syntactic analysis, we compare the syntactic parsing accuracy of WCDG2 under integration of an empty context model with the results obtained for WCDG1 on the same corpus. An empty context model contains no assertions of concept instances or thematic relations.1 We refer to the WCDG2 parse runs as Experiment 1.1. For convenience of expression, we use the general term accuracy to denote parsing quality which, more accurately, is quoted in terms of the standard measures precision, recall and their resulting f1-measure. For our evaluations, we parse sentences from the NEGRA corpus (Skut et al., 1997; NEGRA Homepage, 2006). The NEGRA corpus is a standard corpus of German which has been used extensively for parsing evaluations of WCDG1 on previous occasions. We compare our results against the accuracies reported by Foth and Menzel (2006b) and Khmylko et al. (2009) for WCDG1 evaluations on the same set of sentences.

8.1.3 Setup

We parse sentences 18,602 to 19,601 from the NEGRA corpus. This set of sentences was also used in the reference evaluations of WCDG1 by Foth and Menzel and Khmylko et al. The sentences are parsed with WCDG2's extended grammar under integration of an empty context model. Evaluations are performed against a manually corrected version of the gold standard annotations. Manual correction removed some known orthographic mistakes and amended a few obvious annotation inconsistencies. Following the practice adopted in the cited prior work, we report the structural and the labelled measures precision, recall and f1-measure. The structural measures refer to edges that have been attached structurally correctly, irrespective of whether they have been labelled correctly. The labelled measures refer to edges that have been correctly attached and correctly labelled. We evaluate parsing accuracy on all sentences with and without punctuation marks to ensure the comparability of our results with prior work. While Foth and Menzel report parsing accuracy for all edges including those originating from punctuation marks, Khmylko et al. exclude those edges from their evaluation. The latter approach has since become standard evaluation practice.
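The distinction between the structural and the labelled measures can be made precise with a small sketch. Representing a parse as a list of (dependant, regent, label) edges is our assumption for illustration only, not WCDG's internal format.

```python
# Minimal sketch of the structural vs. labelled evaluation measures,
# assuming a parse is a list of (dependant, regent, label) edges.

def prf(correct, n_predicted, n_gold):
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def evaluate(predicted, gold):
    """Return ((P, R, f1) structural, (P, R, f1) labelled)."""
    # Structural: attachment (dependant, regent) must match the gold edge.
    gold_struct = {(d, r) for d, r, _ in gold}
    struct_hits = sum((d, r) in gold_struct for d, r, _ in predicted)
    # Labelled: attachment and label must both match.
    lbl_hits = sum(edge in set(gold) for edge in predicted)
    return (prf(struct_hits, len(predicted), len(gold)),
            prf(lbl_hits, len(predicted), len(gold)))

gold = [(1, 2, "SUBJ"), (3, 2, "OBJA")]
pred = [(1, 2, "SUBJ"), (3, 2, "OBJD")]
structural, labelled = evaluate(pred, gold)
# both edges attach correctly (structural P = R = 1.0), but only one
# label matches (labelled P = R = 0.5)
```

The example shows why labelled scores are never higher than structural ones: every labelled hit is also a structural hit.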

8.1.4 Results

Of the 1,000 sentences in the parsed corpus subset, WCDG2 was found to process only 865 sentences to completion. For the remaining 135 sentences, the parser aborted processing for technical reasons prior to completion. An analysis of the number of tokens per sentence reveals a clear trend: with an average of 34.8 tokens per sentence,1 the sentences that were not processed to completion were considerably longer than the average sentence in the studied corpus subset with 16.4 tokens. The average length of the 865 sentences that did process to completion was 13.9 tokens.

1Note that the effect of integrating an empty context model with respect to the context-driven modulation of syntactic dependencies is equivalent to parsing with the extended grammar without invoking the PPC at all. An empty context model contains no potential referents for the homonyms in the input sentence. Consequently, an empty context model offers no cross-modal match candidates and hence does not give rise to any constraining PPC predictions.

Figure 8.1: A plot of sentences processed against the number of tokens per sentence for the studied 1,000 NEGRA sentences under empty context integration.

The plot in Figure 8.1 clearly illustrates the increasing tendency of WCDG2 to fail for sentences longer than approximately 20 tokens. The graph plots the sentence counts against the number of tokens for the 1,000 sentences processed. Colour-coding distinguishes between sentences that were processed to completion (green) and sentences that were not processed to completion (red).

We suspect that processing the latter sentences requires more working memory than was available on the standard hardware used. Another possibility is that the implementation of WCDG2 contains a memory leak whose adverse effect remains unnoticed for sentences requiring moderate processing effort but becomes noticeable in more complex analyses. The system errors received for the sentences that did not process to completion did not permit us to determine the exact cause; further investigation is warranted here.

We report precision, recall and f1-measure for the 1,000 NEGRA sentences in Table 8.1. The WCDG1 evaluation results including punctuation marks are quoted from Foth and Menzel (2006b); evaluation results excluding punctuation marks are quoted from Khmylko et al. (2009).

1All measures quoted here include punctuation marks as tokens.

                            WCDG1                          WCDG2
               Punctuation +   Punctuation –   Punctuation +   Punctuation –
                str     lbl     str     lbl     str     lbl     str     lbl
Recall [%]      92.5    91.1    91.3    90.0    65.0    63.1    63.8    61.6
Precision [%]   92.5    91.1    91.3    90.0    90.4    87.8    88.8    85.9
f1-Measure      92.5    91.1    91.3    90.0    75.6    73.5    74.2    71.7

Table 8.1: The structural (str) and labelled (lbl) results for 1,000 NEGRA sentences with WCDG1's standard grammar and WCDG2's extended grammar. Evaluation results including and excluding punctuation marks are listed separately (Punctuation + and Punctuation –, respectively).
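As a sanity check on the WCDG2 results, the f1-measure is the harmonic mean of precision and recall, so the tabulated f1 values can be recomputed from the quoted precision/recall pairs; they agree to within rounding (the table's values were presumably computed from unrounded inputs).

```python
# Recomputing WCDG2's f1 values in Table 8.1 from the quoted precision
# and recall; small deviations arise from rounding of the inputs.

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

pairs = [  # (precision, recall, tabulated f1) for WCDG2
    (90.4, 65.0, 75.6),  # punctuation +, structural
    (87.8, 63.1, 73.5),  # punctuation +, labelled
    (88.8, 63.8, 74.2),  # punctuation -, structural
    (85.9, 61.6, 71.7),  # punctuation -, labelled
]
assert all(abs(f1(p, r) - tab) < 0.1 for p, r, tab in pairs)
```

Note that for WCDG1 precision and recall coincide, so its f1 values trivially equal both.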

8.1.5 Discussion

Compared with WCDG1's syntax-only analysis, the extended grammar results in an overall degradation of syntactic parsing quality with regard to both precision and recall. The drop in recall to a value substantially lower than for the standard grammar in WCDG1 is drastic but not surprising in view of the fact that WCDG2 did not complete processing for 135 longer-than-average sentences. A comparison of the parsing precisions for syntactic analysis in Table 8.1 shows that the addition of semantic levels to the syntactic analysis only reduces precision by 2.1% to 4.1%.

Considering that no attempt has been made in this research project to optimise the role-assigning grammar for full coverage of unrestricted input, we consider these precision values on unrestricted text encouraging, even if they fail to meet the overall expectation of matching or superseding the challenging baseline set by WCDG1. It should be kept in mind that the role-assigning grammar has been developed with the objective of ensuring correct syntactic and semantic analysis for a small set of specific sentences, typically considerably less complex than most of the NEGRA sentences studied in this evaluation. WCDG1's standard grammar, in contrast, has been improved continually over a period of years with the express goal of achieving substantial coverage of German. The large difference between the good precision values and the disappointing recall values is an accurate reflection of the extended grammar's history: the grammar achieves good grammatical precision but suffers from limited coverage.

In conclusion, this evaluation has shown that the addition of the semantic levels of analysis in WCDG2 results in an overall degradation of syntactic analysis quality compared with WCDG1. Given precisions that almost reach the level of the standard grammar but a significantly lower recall, the primary issue to address in our model's grammar for full compliance with Requirement R7 is the extended grammar's coverage. To achieve the required improvements, significant further grammar modelling effort is needed. We estimate the additional modelling effort to be on the order of one to three person-years.

To achieve robust and wide coverage of German at a level comparable to that of the syntactic analysis of WCDG1, any effort to improve the grammar also needs to include a systematic validation of WCDG2's implementation integrity. Specifically, it needs to be ensured that the inability to complete the 135 sentences, most of which were longer than the average in the corpus subset, was not caused by a memory leak, as this could nullify the benefits expected from further grammar development.

With respect to the selection of linguistic stimuli for the further study of our model's context-integration behaviour, we conclude that choosing arbitrary stimuli from a corpus of unrestricted natural language is not a viable option with the present version of the role-assigning grammar. To be able to predict and analyse our model's context integration behaviour systematically, we hence need to study context integration on sentences for which correct syntactic and semantic analysis has been ensured prior to context integration. The following section discusses the selection of suitable linguistic input for our context integration investigations in the subsequent chapters.