Ambiguity and Linguistic Preferences

Gregor Erbach
Universität des Saarlandes
FR 8.7 Allgemeine Linguistik / Computerlinguistik
Im Stadtwald, W-6600 Saarbrücken, Germany
erbach@coli.uni-sb.de

Abstract

Attempting to treat ambiguities in typed feature formalisms presents a dilemma. Exploiting linguistic knowledge to add further constraints (e.g. word order, selectional restrictions) to the grammar may indeed help disambiguation, but it also rules out some perfectly grammatical unambiguous strings. This paper discusses ways out of this dilemma, including

- reliance on processing strategies without additional knowledge
- processing guided by statistical probability
- leaving disambiguation to external knowledge sources
- using additional constraints only if they are needed for disambiguation.

We propose the addition of preference values to typed feature structures. Using preference values has the effect that violation of the additional constraints needed for disambiguation only decreases the preference value, but does not make the sentence unacceptable. Disambiguation is achieved by selecting the reading with the highest preference value. We think that it is possible to define a processing strategy (preference-driven linguistic deduction) which finds the preferred reading first.

Introduction: Approaches to Ambiguity

The treatment of linguistic ambiguity in typed feature formalisms presents a dilemma. On the one hand, it is easily possible to express the knowledge needed for disambiguation (word order regularities, selectional restrictions etc.) in typed feature formalisms, and to use it for disambiguation. On the other hand, if these kinds of knowledge are added as additional constraints to natural language grammars, some unambiguous sentences will also be ruled out. We illustrate this dilemma with a simple example from German word order.

(1) Peter (nom/acc) küsste Maria (nom/acc).

Peter kissed Mary

(2) Peter kissed Mary / Mary kissed Peter

In English, subject and object are distinguished by their position, whereas in German they are distinguished by case marking: the subject is in the nominative case, and the object in the accusative. Sentence (1) is ambiguous because both the subject and the object are proper names, which have the same form in the nominative and the accusative, so that both readings in (2) are available.


Processing strategies that produce preferred readings first

The second approach to disambiguation has been the design of processing strategies that are intended to explore the search space in such a way that preferred readings are found before the less preferred ones. This is achieved by defining decision criteria for handling non-deterministic processing steps. These decision criteria can be statistical probability or structural considerations.

In Prolog-based grammar formalisms like DCG (Pereira and Warren 1980), the search can be controlled by ordering the clauses in such a way that the preferred clauses are tried first. Haugeneder and Gehrke (1986) and Erbach (1991) define arbitrary parsing strategies by assigning priorities to parsing tasks for a chart parser, based on statistics about previous parse results.

An example of the exploitation of structural properties for resolving non-determinism is deterministic parsing (Marcus 1980), where every choice that is made is deterministic in the sense that no alternatives are considered. Another example is Shieber's use of a shift-reduce parser to model attachment preferences by shifting in case of shift-reduce conflicts, and performing the "longer reduction" [2] in case of reduce-reduce conflicts (Shieber 1983).

[2] The "longer reduction" is the reduction that involves a grammar rule with more elements on its right-hand side.

The approaches mentioned have been applied in more traditional parsing frameworks, where the effect was mostly to order the rules of the grammar. With the increasing trend towards principle-based grammars, the ordering of rules does not make very much sense, because only very few rule schemata are used.

Controlled Linguistic Deduction (Uszkoreit 1991) adds control information to declarative grammars in typed feature formalisms, and makes it possible to mix depth-first and breadth-first search by assigning preferences to disjuncts. The effect is to derive a set of preferred readings first, and to cut off unlikely paths in the search space. The preferences are based on the success potential of a disjunct, i.e., "the disjuncts that have the highest probability of success are processed first". For lexical ambiguity, preferences are assigned dynamically to disjuncts by means of a spreading activation net, based on "a combination of factors such as the topic of the text or discourse, previous occurrence of priming words, register, style, and many more."

The use of specialized processing strategies for disambiguation is appealing because the search space is reduced and the efficiency of the system increased.

The drawbacks of ordering readings by means of a processing strategy are that

• the criteria for ordering are hidden in the processing model and not specified in a declarative way; a change of the processing model may change the ordering of the analyses;

• additional knowledge (like word order constraints or selectional restrictions) is not exploited for ordering.

Probabilistic approaches

This class of approaches is characterized by the fact that statistical probability, obtained from a language corpus, is used as the basis for resolving non-determinism. Probabilistic approaches are popular in speech understanding, but they have also been used for syntax in the form of probabilistic context-free grammars (Fujisaki et al. 1991, Garside and Leech 1987). Probabilities are not used extensively in typed feature formalisms, although Controlled Linguistic Deduction (Uszkoreit 1991) makes use of statistical probabilities. A problem with probabilistic approaches is that the probabilities for different structures may differ in different contexts. This problem could in principle be attacked by making use of conditional probabilities, but it is not obvious which structural conditions these probabilities should be sensitive to.

Additional constraints for disambiguation

This class of approaches can be characterized as knowledge-based. An additional set of constraints is introduced, which need not be applied, but can be applied if needed for disambiguation. These additional constraints are linguistic generalizations about word order, selectional restrictions, quantifier scope defaults etc. Conflicts between different constraints may arise, for example between word order and selectional restrictions.

Therefore, a model which incorporates such additional constraints must provide for conflict resolution by weighing the constraints against each other.

Technically, each additional constraint can be formulated as a disjunction: either the constraint is satisfied or it is not. This is a useless logical tautology unless choosing the less constraining disjunct has the consequence that the corresponding reading is ranked lower than a reading in which the more constraining disjunct is satisfied. The notion of preferences that we introduce in the remainder of this paper addresses exactly this problem.

Preferences and Feature Structures

In this section, an extension to typed feature formalisms is proposed, and illustrated with the grammar formalism STUF (Dörre and Seiffert 1991). The extension allows the declarative specification of preferences for feature structures.

The basic idea is that every feature structure is associated with a preference value. The preference value is intended to model the degree of confidence that the feature structure is an appropriate representation of a linguistic utterance. In the case of ambiguity, several feature structures can be found each of which has a preference value. The feature structure with the highest preference is the one which is given most confidence. The ordering imposed on the feature structures is just the numerical order of the associated preferences.

For example, the preference for a reading will decrease if it violates word order constraints or selectional restrictions. In our view, the effect of such constraints is to change the preference of a feature structure, but not to rule out such structures completely. In this way, these constraints can help disambiguation, without changing the language described by the grammar.
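As a minimal illustration of this idea (in Python rather than in a feature formalism; the readings and values are invented), disambiguation amounts to selecting the maximum:

# Each reading of an ambiguous sentence is paired with a preference value.
readings = [
    ("kiss(peter, maria)", 1.0),   # unmarked word order
    ("kiss(maria, peter)", 0.8),   # marked word order: lower preference
]

# Disambiguation: select the reading with the highest preference value.
preferred = max(readings, key=lambda reading: reading[1])
print(preferred[0])                # -> kiss(peter, maria)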

In STUF, a type is defined by a set of feature terms through defining clauses of the form A(t1,...,tn) ==> t0 where all ti are feature terms.

The preference value of a type is a function of the preference values of the feature structures defined by the feature terms ti. The specification of this function is added as additional syntax to the formalism. The function must be defined by the grammar writer (or derived by analysis of a corpus), and cannot be built-in, because this function determines the weight that is given to various constraints. For example, violation of selectional restrictions might be considered worse than violation of word order constraints.
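As a sketch of what such a grammar-writer-defined function might look like (the weights and the multiplicative combination below are assumptions for illustration, not values from the paper), the exponents act as weights, so that a selectional violation is punished more heavily than a marked word order:

def type_pref(word_order_pref, sel_rest_pref):
    # Hypothetical combination function: selectional restrictions count double.
    return (word_order_pref ** 1.0) * (sel_rest_pref ** 2.0)

print(type_pref(0.8, 1.0))   # marked word order only      -> 0.8
print(type_pref(1.0, 0.5))   # selectional violation only  -> 0.25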


In order to demonstrate how the above is put to use, we illustrate our preference calculation scheme with some examples from (German) word order and selectional restrictions.

Example 1: Word Order

The adequate treatment of word order is still an open problem in linguistics. For the purposes of this paper, we consider two approaches: first, a model which assumes an unmarked (default) argument order and gives deviations from this order a lower preference, and second, a recent proposal that encodes LP rules in such a way that violation of an LP rule results in unification failure (Engelkamp et al. 1992). HPSG is used as the linguistic background for the examples (Pollard and Sag 1987, Pollard and Sag in press).

For the first model, the unmarked argument order is the order of the elements of the subcategorization (SUBCAT) list [3], as in the simplified lexical entry for the ditransitive verb gibt (give).

lexicon(gibt) ==>
    synsem: local: ( head: cat: v &
                     subcat: [np(acc), np(dat), np(nom)] ).

Elements of the subcat list are discharged by the SUBCAT-principle. We assume that only one element of the SUBCAT list is taken at a time, so that binary branching trees result.

subcat_principle ==>
    synsem: local: subcat: Subcat &
    dtrs: ( head_dtr: synsem: local: subcat: insert(Comp_Dtr,Subcat) &
            comp_dtrs: [Comp_Dtr] ).

The value of the relational type insert/2 is a list in which the first argument has been inserted at an arbitrary position into the second argument, which is itself a list.

insert(H,T) ==> [H|T].
insert(H,[X|T]) ==> [X|insert(H,T)].

It is this relational type insert/2 which is responsible for the preference value of the sentence with respect to deviation from the unmarked word order. The idea is that the preference value is highest if the first element is taken from the SUBCAT list of the head daughter; the further away from the head of the list the element is, the worse the preference.

We write pairs of feature terms (FT) and preference values (PV) as FT^PV. Each type definition is followed by a formula which specifies how the preference for that type is calculated [4]. Note that anything not specified for a preference is assumed to have the preference value 1. Given below are definitions of insert/2, subcat_principle and phrasal_sign that involve functions for preference calculation.

[3] We use Prolog notation for lists. The elements of the SUBCAT list are given in reverse surface order to facilitate processing in head-final structures, where the head takes arguments from right to left.

[4] Pref, Pref0, Pref1, Pref2 denote variables for preference values. Pref is the preference value of the defined type. The notation is used for the purposes of this paper, and may be changed in a subsequent implementation of the ideas presented here.


insert(H,T) ==> [H|T]^Pref0.
    Pref = 1.0 * Pref0

insert(H,[X|T]) ==> [X|insert(H,T)^Pref0].
    Pref = 0.8 * Pref0

subcat_principle ==>
    synsem: local: subcat: Subcat &
    dtrs: ( head_dtr: synsem: local: subcat: insert(Comp_Dtr,Subcat)^Pref0 &
            comp_dtrs: [Comp_Dtr] ).
    Pref = Pref0

phrasal_sign ==>
    head_dtr: X^Pref1 &
    rule_schema &
    subcat_principle^Pref2 &
    head_feature_principle.
    Pref = Pref1 * Pref2

The preference calculation function in the definition of phrasal_sign is responsible for percolating preference values in phrase structures, which results in the following preference assignments for the constituents of permutations of sentence (3), which is given in the unmarked word order.

(3) (weil) der Mann dem Mädchen das Buch gibt
    (because) the man (nom) the girl (dat) the book (acc) gives

gibt                                  Pref: 1
das Buch gibt                         Pref: 1     (first element of SUBCAT list)
dem Mädchen das Buch gibt             Pref: 1     (first element of the remaining SL)
der Mann dem Mädchen das Buch gibt    Pref: 1

der Mann das Buch gibt                Pref: 0.8   (second element of the remaining SL)
dem Mädchen der Mann das Buch gibt    Pref: 0.8

dem Mädchen gibt                      Pref: 0.8   (second element of SL)
das Buch dem Mädchen gibt             Pref: 0.8   (first element of remaining SL)
der Mann das Buch dem Mädchen gibt    Pref: 0.8

der Mann dem Mädchen gibt             Pref: 0.64  (second element of remaining SL)
das Buch der Mann dem Mädchen gibt    Pref: 0.64

der Mann gibt                         Pref: 0.64  (third element of SL)
dem Mädchen der Mann gibt             Pref: 0.64  (first element of remaining SL)
das Buch dem Mädchen der Mann gibt    Pref: 0.64

das Buch der Mann gibt                Pref: 0.512 (second element of remaining SL)
dem Mädchen das Buch der Mann gibt    Pref: 0.512

Under the view presented here, word order constraints are not really constraints that are violated, but rather preferences for choosing one alternative clause in the definition of insert/2 [5].

[5] There may be other uses of insert/2 in which there is no preferred order. For these cases, an alternative definition of insert/2 must be written, in which all solutions have the same preference value.

Equivalently, the above treatment of unmarked word order can be handled in the lexical entry of the verb. In this case we assume that the SUBCAT principle always takes the first element of the SUBCAT list. The lexical entry for the verb differs from the one above in that its SUBCAT value is the permutational closure of the SUBCAT list.

lexicon(gibt) ==>
    synsem: local: ( head: cat: v &
                     subcat: permute([np(acc), np(dat), np(nom)])^Pref0 ).
    Pref = Pref0

The closer the permuted list is to the original list, the better the preference value that the relation permute/1 assigns.

permute([]) ==> [].
    Pref = 1

permute([H|T]) ==> insert(H,permute(T)^Pref1)^Pref2.
    Pref = Pref1 * Pref2

As a result, we get the following pairings of SUBCAT lists and preference values:

[np(acc),np(dat),np(nom)]    1.0
[np(dat),np(acc),np(nom)]    0.8
[np(acc),np(nom),np(dat)]    0.8
[np(nom),np(acc),np(dat)]    0.64
[np(dat),np(nom),np(acc)]    0.64
[np(nom),np(dat),np(acc)]    0.512
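The following Python sketch mirrors the two insert/2 clauses with the decay factor 0.8 assumed above and reproduces exactly this table; it illustrates the preference calculation, not the STUF formalism itself:

DECAY = 0.8   # preference factor for skipping one position in the list

def insert(h, t):
    """Yield (list, preference) pairs for inserting h into the list t;
    each position further from the front costs another factor of DECAY."""
    for i in range(len(t) + 1):
        yield t[:i] + [h] + t[i:], DECAY ** i

def permute(xs):
    """Yield every permutation of xs together with its word-order preference."""
    if not xs:
        yield [], 1.0
        return
    for perm, p_tail in permute(xs[1:]):
        for result, p_ins in insert(xs[0], perm):
            yield result, p_tail * p_ins

for subcat, pref in sorted(permute(["np(acc)", "np(dat)", "np(nom)"]),
                           key=lambda pair: -pair[1]):
    print(subcat, round(pref, 3))   # 1.0, 0.8, 0.8, 0.64, 0.64, 0.512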

The second approach to word order we want to consider encodes LP constraints by a set of binary features and makes use of an LP-store and left and right context restrictions (Engelkamp et al. 1992). The LP-store of a phrase indicates which constituents with properties that LP rules make reference to are contained in that phrase. The left and right context restrictions indicate which constituents may not come to the left or right, respectively. This approach is interesting for our treatment of preferences because it unifies the LP-store of a head with the context restriction of its complement to detect violation of LP rules by unification failure.

Let us consider the sentence weil den Lehrer der Schüler sieht (because the teacher (acc) the student (nom) sees), which violates the LP rule nom < acc.

The LP-store value of the constituent der Schüler sieht encodes that a nominative NP, but no accusative NP is contained in it.

nom: + &
acc: -

The right context restriction of den Lehrer encodes that no nominative NP may occur to its right.

nom: - &

acc: TOP

In the original proposal, unification of the LP-store and the context restriction would fail because the atomic sorts + and - do not unify. To overcome this problem, we assume that all sorts are associated with a preference. If two sorts are unified, the resulting subsort may have a lower preference.

We define the sorts + and - as primitive sorts with a preference value of 1.0, and the sort ± as the intersection of the sorts + and - with a preference value of, for example, 0.5. The unification of the LP-store with the context restriction then contains the sort ±, which leads to a reduced preference value.

nom: ± &

acc: -

In this way, the operation of unification is also used in preference calculation. The preference value of the unification of two feature structures is the sum, over all features, of the weight of that feature multiplied by the preference value of the unification of the feature's values. The weight of each feature must be specified by the grammar writer.
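The following Python sketch shows this weighted calculation for the LP-store example (the feature weights, the ASCII spelling "+-" for the sort ±, and the flat feature structures are assumptions for illustration):

SORT_PREF = {"+": 1.0, "-": 1.0, "+-": 0.5, "TOP": 1.0}

def unify_sort(a, b):
    """Unify two atomic sorts; + and - intersect to the subsort +-."""
    if a == b or b == "TOP":
        return a
    if a == "TOP":
        return b
    return "+-"

def unify_pref(fs1, fs2, weights):
    """Unify two flat feature structures; the preference is the weighted sum,
    over all features, of the preference of the unified value's sort."""
    result = {f: unify_sort(fs1[f], fs2[f]) for f in fs1}
    pref = sum(weights[f] * SORT_PREF[result[f]] for f in result)
    return result, pref

lp_store    = {"nom": "+", "acc": "-"}    # der Schüler sieht
right_restr = {"nom": "-", "acc": "TOP"}  # den Lehrer: no nominative to the right
print(unify_pref(lp_store, right_restr, {"nom": 0.5, "acc": 0.5}))
# -> ({'nom': '+-', 'acc': '-'}, 0.75): the LP violation lowers the preference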

Example 2: Selectional Restrictions

Selectional restrictions model the semantic compatibility of a functor and its arguments.

Functors like verbs or adjectives impose sortal restrictions on their arguments; for example, the verb eat would require that the subject be animate and that the object be a concrete object. However, it is always possible to find counter-examples to selectional restrictions. For the most part, they involve some kind of metaphorical language use, e.g., institutions acting as persons, machines regarded as animate, information treated as flowing or travelling, and so on. Hence, selectional restrictions cannot be treated as absolute constraints.

In practice, it might be a simple, but useful disambiguation strategy to prefer literal readings (in which the selectional restrictions of the functors are satisfied) over metaphorical readings (in which selectional restrictions are violated).

We illustrate this with the verb hate, which typically requires a human subject.

lexicon(hate) ==>
    synsem: local: ( head: cat: v &
                     subcat: [np(acc), np(nom) & sel_rest(human)^Pref0] ).
    Pref = Pref0

The semantics of the subject NP must be compatible with the relational constraint sel_rest(human), which models a selectional restriction that can be violated, with a corresponding decrease in preference. The relational constraint is defined in such a way that it can be satisfied with a high preference if the constituent’s semantic content is identical to the value specified as the argument, and with a lower preference otherwise.

sel_rest(X) ==> synsem: local: content: X.
    Pref = 1.0

sel_rest(Y) ==> synsem: local: content: Z & Y ≠ Z.
    Pref = 0.5

The above approach to selectional restrictions can be refined to take metaphoric language conventions (Martin 1991) into account. For example, the sentence My car hates unleaded fuel can be interpreted by anthropomorphizing the car, i.e., by regarding it as human. A first approach would be to handle this phenomenon in the sort system, by introducing a sort thing_as_human, which is a subsort of thing and human, but has a lower preference. The preference resulting from the selectional restriction would then be the preference of the subsort of the sort of the object and the sort specified in the selectional restriction. A literal interpretation is always preferred in this way.

sel_rest(X) ==> synsem: local: content: X^Pref0.
    Pref = Pref0


This approach is problematic because it does not take into account the unidirectionality of metaphorical relations. The subsort of human and thing not only makes it possible to view things as human, but would also make it possible to view humans as things.

In order to avoid this effect, we define a relational type for selectional restrictions, which assigns different preference values for literal and metaphorical interpretations.

sel_rest(human) ==> synsem: local: content: human.
    Pref = 1.0   (literal interpretation)

sel_rest(human) ==> synsem: local: content: animal.
    Pref = 0.7   (metaphorical relation: animal as human)

sel_rest(human) ==> synsem: local: content: thing.
    Pref = 0.5   (a bit worse: thing as human)

sel_rest(human) ==> synsem: local: content: TOP.
    Pref = 0.1   (anything goes, but with a very bad preference!)
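In procedural terms, this clause ladder tries increasingly permissive sorts in order and returns the preference of the first matching clause. A Python sketch follows (the sort names and preference values are taken from the clauses above; matching on subsorts is simplified to string equality):

SEL_REST_HUMAN = [("human", 1.0), ("animal", 0.7), ("thing", 0.5)]

def sel_rest_human(content_sort):
    """Return the preference with which content_sort satisfies sel_rest(human)."""
    for sort, pref in SEL_REST_HUMAN:
        if content_sort == sort:
            return pref
    return 0.1   # TOP clause: anything goes, but with a very bad preference

print(sel_rest_human("thing"))   # "My car hates ..."  -> 0.5
print(sel_rest_human("human"))   # literal reading     -> 1.0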

While these suggestions do not attempt to capture the whole intricacy of metaphorical interpretation, they show how such phenomena may be exploited for disambiguation.

Conclusion

We have presented an extension to typed feature formalisms in order to allow the declarative specification of preferences. The ordering of ambiguous readings is achieved by giving each reading a numerical preference.

The preference mechanism proposed here makes it possible to model the knowledge needed for disambiguation either in a very simple way or by adding further knowledge, as the examples concerning selectional restrictions have demonstrated: one can either simply penalize the violation of a selectional restriction, or explain why selectional restrictions are violated by making reference to metaphorical language conventions. The choice between the two approaches will depend on the intended application and on the knowledge available about the phenomenon in question.

For the formal semantics of the extended formalism, it would be interesting to examine whether preferences can be interpreted as probabilities. In that sense, it seems quite plausible that word order constraints or other rules that linguists have come up with are just statistical generalizations over observed language use. On the other hand, statistical probability values for a particular disjunctive constraint that have been obtained by a corpus analysis can be used as preferences in the absence of deeper linguistic insights into the phenomenon in question.

Because preferences are specified declaratively, any complete processing strategy for typed feature formalisms will produce the same solutions with the same preference values. However, we want to develop a processing strategy that exploits the information contained in the preferences (preference-driven linguistic deduction). The basic goal of a preference-driven algorithm is to achieve a high preference, and it is this goal that determines the behaviour of the algorithm in case of non-determinism.


The algorithm will always choose the disjunct that promises the highest preference value.

In cases where a preference cannot be calculated because variables are not instantiated, the algorithm will first try to fulfill the prerequisites for preference calculation, thereby achieving some of the benefits of information-driven linguistic deduction, e.g. head-driven parsing (Kay 1989).
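As a sketch of how such a strategy might order its work (an illustration under the assumption that preferences lie in (0, 1] and combine by multiplication, not the algorithm of this paper), a best-first search over disjuncts delivers the highest-preference combination first:

import heapq

def preference_driven(choice_points):
    """choice_points: a list of lists of (label, preference) disjuncts.
    Yields complete selections in order of decreasing overall preference."""
    heap = [(-1.0, [])]            # (negated preference, partial selection)
    while heap:
        neg_pref, picked = heapq.heappop(heap)
        if len(picked) == len(choice_points):
            yield picked, -neg_pref   # best remaining selection is complete
            continue
        for label, p in choice_points[len(picked)]:
            heapq.heappush(heap, (neg_pref * p, picked + [label]))

readings = preference_driven([
    [("unmarked order", 1.0), ("marked order", 0.8)],   # word order
    [("literal", 1.0), ("metaphorical", 0.5)],          # selectional restriction
])
print(next(readings))   # (['unmarked order', 'literal'], 1.0) is found first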

The introduction of preferences into feature structures has applications in NLP beyond the treatment of ambiguity. The most important are the choice of paraphrases in generation, and the processing of ill-formed input, where some of the constraints that define grammaticality are made less restrictive, with a corresponding decrease in preference.


References

Backofen, Rolf and L. Euler (1990). Towards the Integration of Functions, Relations and Types in an AI Programming Language. Proceedings of GWAI 90, Springer.

Dörre, Jochen and A. Eisele (1989). Determining Consistency of Feature Terms with Distributed Disjunctions. Proceedings of GWAI 89, Springer.

Dörre, Jochen and R. Seiffert (1991). Sorted feature terms and relational dependencies. IWBS Report 153, IBM Germany, Stuttgart.

Eisele, Andreas and J. Dörre (1990). Disjunctive Unification. IWBS Report 124, IBM Germany, Stuttgart.

Engelkamp, Judith, G. Erbach and H. Uszkoreit (1992). Handling linear precedence constraints by unification. Proceedings of ACL, Newark, Delaware.

Erbach, Gregor (1991). An environment for experimenting with parsing strategies. Proceedings of IJCAI 91. Morgan Kaufmann, Los Altos, CA.

Fujisaki, T., F. Jelinek, J. Cocke, E. Black and T. Nishino (1991). A probabilistic parsing method for sentence disambiguation. In: M. Tomita (ed.) Current issues in parsing technology. Boston, Kluwer.

Garside, R. and G. Leech (1987). The UCREL probabilistic parsing system. In: R. Garside, G. Leech and G. Sampson (eds.) The Computational Analysis of English: a corpus-based approach. Longman, London. 66-81.

Haugeneder, Hans and M. Gehrke (1986). A user friendly ATN programming environment (APE). Proceedings of COLING-86.

Kay, Martin (1989). Head-driven parsing. Proceedings of the International Parsing Workshop, Carnegie-Mellon University.

Marcus, M. P. (1980). A Theory of Syntactic Recognition for Natural Language. MIT Press, Cambridge, MA.

Martin, James (1991). MetaBank: A Knowledge-Base of Metaphoric Language Conventions. In: D. Fass, E. Hinkelman and J. Martin (eds.) Proceedings of the IJCAI Workshop "Computational Approaches to Non-Literal Language: Metaphor, Metonymy, Idiom, Speech Acts, Implicature", Sydney, Australia.

Maxwell, J. and R. Kaplan (1989). An overview of disjunctive constraint satisfaction. Proceedings of the International Parsing Workshop, Carnegie-Mellon University.

Pareschi, Remo and M. J. Steedman (1987). A lazy way to chart-parse with categorial grammars. Proceedings of the 25th Annual Meeting of the ACL.

Pereira, F. C. N. and D. H. D. Warren (1980). Definite clause grammars for language analysis - a survey of the formalism and a comparison with augmented transition networks. Artificial Intelligence 13(3): 231-278.

Pollard, Carl and I. Sag (1987). Information-Based Syntax and Semantics, Volume 1: Fundamentals. Center for the Study of Language and Information, Stanford, CA.


Pollard, Carl and I. Sag (in press). Information-Based Syntax and Semantics, Volume 2: Binding and Control. Center for the Study of Language and Information, Stanford, CA.

Shieber, Stuart M. (1983). Sentence disambiguation by a shift-reduce parsing technique. Proceedings of IJCAI-83.

Uszkoreit, Hans (1991). Strategies for Adding Control Information to Declarative Grammars. Proceedings of ACL 91, Berkeley, CA.
