
In this chapter we have achieved a significant objective in the preparation of the detailed description of our model: we have identified Conceptual Semantics as a general theory of cognition that provides an integrated account of the cross-modal interaction between non-linguistic modalities and language. Based on the description of Conceptual Semantics we have formulated further requirements for our model.

Conceptual Semantics is based on a cognitive architecture in which non-linguistic modalities and language interact at a representational level. The interaction of non-linguistic modalities with syntax is mediated by Conceptual Structure, a single and uniform level of mental representation that encodes both conceptual knowledge and linguistic semantics. Conceptual Structure interfaces with the representations of non-linguistic modalities as well as with syntax. All of these representations are representationally encapsulated and project into Conceptual Structure via modality-specific interfaces. The role of these interfaces is to translate between the encodings of the different representational levels by applying a finite set of correspondence rules.

We have further outlined the view of Conceptual Semantics on the significance of thematic roles. According to Jackendoff, thematic roles mark prominent argument slots in the Conceptual Structure representation of verbal concepts. Our discussion of grounding has led us to the insight that discrimination and identification are important tasks to achieve conceptual grounding in our model. Finally, cross-modal matching was introduced as the process by which cross-modal referential links are established between concept instances from different modalities. As such, it constitutes an indispensable cognitive process for cross-modal interaction. We have argued that the compatibility of the concepts instantiated in different modalities is a key requirement for establishing cross-modal co-reference.

In the following chapter we shift our focus to language processing. We present symbolic constraint-based parsing as a suitable formalism for the integration of non-linguistic contextual constraints upon syntactic parsing and motivate an existing parser implementation as a suitable candidate for the natural language processing component in our model.

Constraint-Based Analysis of Natural Language with WCDG

A model for the influence of cross-modal context upon syntactic parsing requires a parser that is capable of receiving and processing external context information in some form or another. The majority of syntax parsers today, however, are informationally encapsulated in the sense that they only accept linguistic input, which they process based on their intrinsic linguistic resources. Those parsers that do permit additional constraints to be imposed on linguistic analysis typically employ unification, so that the additional constraints act as hard constraints on linguistic analysis rather than as biasing preferences. The weighted-constraint dependency parser WCDG constitutes a notable exception in this respect. It comes with a generic interface that permits the inclusion of parser-external non-linguistic information into linguistic processing. WCDG is an attractive candidate for the parsing component in our model because its interface makes it possible to influence linguistic decision making by introducing external, possibly non-linguistic information into the parsing process. WCDG is also based on weighted or graded constraints which, as we shall see, are highly suited for modelling linguistic and contextual preferences.

This chapter provides an introduction to WCDG and its approach to the analysis of natural language as a symbolic constraint satisfaction problem. While the preceding chapters focused on the development of the cognitive requirements for our model, this chapter sets out to identify further, more implementation-related requirements.

The primary focus in this chapter is on the derivation of the technical requirements for the parser component in the context of our modelling framework.

Section 4.1 begins with an outline of the major differences between generation-rule-based and weighted-constraint parsers to motivate the use of WCDG in our model.

Section 4.2 describes WCDG’s relevant standard capabilities. Section 4.3 offers a discussion of why some of the central limitations of WCDG’s standard implementation necessitate modifications to the implementation in order to meet our specific modelling objectives. Section 4.4 summarises the central points in this chapter and lists the resulting conclusions. This chapter concludes Part I of this thesis, and with it, the requirements collection process for our computational model.


4.1 Generation Rules vs. Constraints

Approaches to natural language analysis can be broadly categorised into two fundamentally different classes, depending on their method for defining the set of acceptable solution structures: generation-rule-based approaches and constraint-based approaches. The majority of the existing parser implementations follow a generation-rule-based approach. Generation-rule-based systems span the space of well-formed sentences based on a set of generation rules.

Constraint-based systems, on the other hand, constrain the set of all possible structures by excluding ungrammatical structures, leaving only the set of desired solutions. Importantly, the class of constraint-based systems comprises not only connectionist, i.e. non-symbolic, approaches but also symbolic constraint parsers.

Symbolic constraint parsers encode syntactic properties in variables and constrain the assignment of values to these variables by means of suitable constraints. In the following, we mean symbolic constraint parsers when we refer to ‘constraint-based systems’.

4.1.1 Generation-Rule-Based Parsers

A generation-rule-based parser tries to assess whether a given input can be generated from a set of generation rules that stipulate the procedures for generating well-formed sentences. The result of the parser’s analysis is a Boolean decision on the grammaticality of the input with respect to the set of generation rules. If the input is classified as grammatical, the input’s syntactic structure resulting from the successful procedural application of the generation rules is known as well.

Effectively, a generation-rule-based parser acts as a theorem prover attempting to prove whether the input theorem can be derived from a set of axioms stated in the formal system constituted by its grammar rules and the additional information in the lexicon. If the input is generable from the rules, it is rated as grammatical, otherwise as ungrammatical.
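To make the theorem-proving character of such a system concrete, consider the following minimal sketch, which is purely illustrative and not tied to any particular parser implementation: a CKY recogniser over an invented toy grammar and lexicon. Like any generation-rule-based system, it returns nothing more than a Boolean verdict on whether the input can be derived from its rules.

```python
# A minimal, purely illustrative generation-rule-based recogniser.
# Grammar, lexicon and example sentences are invented; the CKY algorithm
# returns only a Boolean grammaticality verdict.

TOY_RULES = {            # binary rules: (left child, right child) -> parents
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
TOY_LEXICON = {          # unary rules: word -> possible categories
    "the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chases": {"V"},
}

def is_grammatical(words: list[str]) -> bool:
    """CKY recognition: True iff the toy grammar can generate the input."""
    n = len(words)
    # chart[i][j] holds the categories that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(TOY_LEXICON.get(word, set()))
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        chart[i][j] |= TOY_RULES.get((left, right), set())
    return "S" in chart[0][n]

print(is_grammatical("the dog chases the cat".split()))   # True
print(is_grammatical("dog the chases cat the".split()))   # False
```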

In contrast with this binary decision on grammaticality, the human analysis of natural language also comprises the central ability to discern preferences of acceptability, be they syntactic, semantic or pragmatic. It is this capability that lets humans accept a given construct as perfectly grammatical in one context while rejecting it as ungrammatical in another context (Crain and Steedman, 1985).

A natural language analysis application designed with the intent to model human language processing behaviour should therefore include the capability to discern degrees of acceptability rather than just to categorise solution candidates as correct or incorrect. In a generation-rule-based system, however, the inability to derive a given input from the grammar and the lexicon cannot always be attributed to the violation of a specific grammatical axiom; the input simply cannot be deduced from the formal system constituted by the grammar and the lexicon as a whole. Consequently, a generation-rule-based system cannot provide detailed diagnostic information on specifically which property of the input was responsible for its classification as ungrammatical.

Another limitation of the generation-rule-based approach is its handling of unknown input. Even the largest of today’s grammars and lexicons are inevitably limited in their modelling scope and hence do not cover the totality of natural language expressiveness. To a generation-rule-based parser, ‘outside of modelling scope’ is equivalent to ‘ungrammatical’. However, not every input that cannot be generated by the formal system must necessarily be ungrammatical; unrestricted natural language abounds with multi-word expressions, metaphors, and creative word or expression formations whose underlying formation patterns are not always easy to predict. In rejecting input beyond the boundaries of the known as ungrammatical, generation-rule-based parsers are limited in their capability of handling unknown input robustly. Given the high productivity of natural language, the constructive handling of unknown input is a key feature for the robust processing of unrestricted natural language input.

4.1.2 Symbolic Constraint-Based Parsers

Symbolic constraint-based systems approach the task of parsing as a constraint satisfaction problem over the assignment of values to variables representing syntactic properties. The degree of complexity of the represented features depends on the formalism. In the case of WCDG, the words in the input sentence form the nodes of a constraint net whose edges correspond to the dependencies between words. Every node and every edge corresponds to a variable in the constraint system to which a value needs to be assigned. Well-formedness rules define the permissible relations between words and hence act as constraints upon the values that can populate the edges in the constraint net. Edge values violating constraints are removed from the constraint net until no further restrictions can be imposed. The remaining edge values in the constraint net describe the set of structures classified as grammatical with respect to the constraint set. This approach was first described by Maruyama (1990).
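For illustration only, the following sketch filters candidate dependency edges in the spirit of the approach just described. The words, categories and constraints are invented, and only hard constraints over single edges are used, whereas a full constraint dependency grammar also includes constraints that relate pairs of edges, and WCDG additionally weights them.

```python
# Illustrative sketch of hard-constraint filtering over dependency edges.
# Words, categories and constraints are invented for this toy example.

words = [  # (position, form, category); position 0 is reserved for the root
    (1, "the", "DET"), (2, "dog", "NOUN"), (3, "barks", "VERB"),
]

# Initially, each word may attach to any other position or to the root (0).
candidates = {pos: {h for h in range(0, len(words) + 1) if h != pos}
              for pos, _, _ in words}

def category(pos):
    return "ROOT" if pos == 0 else words[pos - 1][2]

# Toy constraints on a single edge (dependent position, head position).
def det_attaches_to_noun(dep, head):
    return category(dep) != "DET" or category(head) == "NOUN"

def only_verb_as_root(dep, head):
    return head != 0 or category(dep) == "VERB"

constraints = [det_attaches_to_noun, only_verb_as_root]

# Remove every edge value that violates a constraint.
for dep, heads in candidates.items():
    candidates[dep] = {h for h in heads
                       if all(c(dep, h) for c in constraints)}

print(candidates)   # {1: {2}, 2: {1, 3}, 3: {0, 1, 2}}
```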

In analogy to the ancient Roman legal principle Nulla pœna sine lege,1 a constraint-based parser will admit every solution candidate as correct unless it violates a well-formedness rule in the grammar. The set of constraints therefore needs to be specific enough to admit only grammatical sentences and general enough not to exclude acceptable structures from the solution set. A major advantage of this approach is the system’s robustness to unknown input. Every structure, including those which have not been considered by the grammar-writer, can pass as an acceptable solution as long as it does not violate a given structural constraint.

A significant difference compared with generation-rule-based systems for language analysis is that constraint-based parsers can also provide very specific feedback on which constraints in their grammar are being violated by a given input structure. Because of this, constraint-based systems are good candidates for providing diagnostic support in language analysis.

1 ‘No penalty without a [corresponding] law’

Finally, the constraints defined in the grammar all apply to a solution candidate simultaneously rather than sequentially. This aspect makes the evaluation of constraint satisfaction in constraint-based systems amenable to parallel processing.

Harper and Helzermann (1995, p. 199) review a number of implementation efforts aimed at parallelising constraint-dependency parsing.

An important refinement to the constraint-based approach outlined so far is motivated by the insight that the well-formedness rules do not all contribute equally to the acceptability of the overall solution structure. The constraint-based systems described so far cannot yet express degrees of preference amongst solution candidates.

Graded acceptability assessments can be incorporated by expressing the severity of a violated well-formedness rule as a numerical weight. In case of a constraint violation, rather than removing the structural candidate from the set of acceptable solutions altogether, we can retain the candidate structure as a potential solution and assign it a penalty score for each constraint that it violates. As an example, a sentence may well contain a determiner-noun incongruence and still be acceptable overall, while the absence of a full verb may result in a much more severe degradation of its grammatical acceptability. A weighted constraint formalism is capable of expressing such graded acceptability ratings.
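As a minimal illustration, assuming invented constraint names and weights rather than WCDG’s actual grammar or weight scale, each violated rule can simply be mapped to a numerical penalty whose size reflects the severity of the violation:

```python
# Sketch of weighted constraints: each rule carries a severity weight.
# Constraint names and weights are invented for illustration; WCDG's own
# grammar and score scale may differ.

WEIGHTED_CONSTRAINTS = {
    "det_noun_agreement": 0.2,   # mild: a determiner-noun incongruence
    "finite_verb_present": 0.9,  # severe: the sentence lacks a full verb
}

def penalties(violations: set[str]) -> list[float]:
    """Map the names of violated constraints to their penalty weights."""
    return [WEIGHTED_CONSTRAINTS[name] for name in violations]

# A candidate with a determiner-noun mismatch is only mildly penalised ...
print(penalties({"det_noun_agreement"}))        # [0.2]
# ... whereas a candidate without a finite verb is penalised heavily.
print(penalties({"finite_verb_present"}))       # [0.9]
```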

Weighted constraint-based parsers typically define a measure for the overall acceptability of a solution candidate as a function of the constraint-violation penalties it incurs. This overall measure allows the system to rank solutions and compare their acceptability against each other. The most preferred solution is the one with the best overall acceptability rating. To identify the most preferable, i.e. the least penalised, solution candidate in the potentially very large search space, we require a search algorithm that provides complete coverage of the search space. In case a complete search is infeasible due to the sheer size of the search space, we need to employ an efficient search heuristic to identify a local optimum as our preferred solution candidate.
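A minimal sketch of this ranking step follows. It assumes a simple additive penalty scheme and an exhaustive enumeration of invented candidates; WCDG’s actual combination of graded constraint scores and its heuristic search differ in detail, so the sketch is only meant to convey the principle.

```python
# Sketch of ranking solution candidates by accumulated penalty.  A simple
# additive scheme and invented candidates are assumed for illustration.

WEIGHTED_CONSTRAINTS = {
    "det_noun_agreement": 0.2,
    "finite_verb_present": 0.9,
    "projectivity": 0.4,
}

candidates = {
    "analysis_A": {"det_noun_agreement"},
    "analysis_B": {"finite_verb_present"},
    "analysis_C": {"det_noun_agreement", "projectivity"},
}

def total_penalty(violations: set[str]) -> float:
    """Overall dispreference of a candidate: sum of its violation weights."""
    return sum(WEIGHTED_CONSTRAINTS[name] for name in violations)

# Exhaustive search: score every candidate and keep the least penalised one.
# For realistic search spaces this enumeration is infeasible and would be
# replaced by a heuristic search that settles for a local optimum.
best = min(candidates, key=lambda c: total_penalty(candidates[c]))
print(best)   # 'analysis_A' (total penalty 0.2)
```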