

4.1.5 Application to Disambiguation

A typical natural language understanding (NLU) system will have to decide which of several analyses of a given string is the appropriate one, i.e. the one intended by the (human or artificial) system that generated the string. The question is which parts of an NLU system are responsible for this step.10

9 +def, −def and ¬def are the usual abbreviatory conventions. The preference value 0.5 for the case where the principle is violated has been chosen arbitrarily, and has not been determined by a statistical analysis. Since 1 is interpreted as truth and 0 as falsity, 0.5 can be interpreted as the probability of the clause being true, given the input arguments.

10 In the construction of the LILOG system, there was at one time a situation where the group responsible for syntactic analysis delivered all possible readings of an input string and counted on the knowledge processing group to take care of the disambiguation, whereas the knowledge processing group assumed that they would get only one reading because disambiguation takes place during syntactic processing.

ordering-preferences(⟨F|R⟩)#P1∗P2
    ordering-preferences2(F,R)#P1
    ordering-preferences(R)#P2.
ordering-preferences(⟨⟩)#1.

ordering-preferences2(X,⟨F|R⟩)#(W1∗P1+W2∗P2+W3∗P3)∗P4
    check-heaviness(X,F)#P1
    check-definiteness(X,F)#P2
    check-focus-accent(X,F)#P3
    ordering-preferences2(X,R)#P4.
ordering-preferences2(X,⟨⟩)#1.

check-definiteness(+def,−def)#1.
check-definiteness(−def,−def)#1.
check-definiteness(+def,+def)#1.
check-definiteness(X,¬def)#1.
check-definiteness(¬def,X)#1.
check-definiteness(−def,+def)#0.5.

Figure 4.7: Procedures for checking ordering preferences
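The recursive preference computation of Figure 4.7 can be sketched procedurally. The following is a minimal Python sketch, not the system's actual implementation: the weights W1–W3 are assumed to be equal, only the definiteness check is modelled, and the heaviness and focus-accent checks are stubbed out as constants.

```python
# Sketch of the ordering-preference computation from Figure 4.7.
# Constituents are modelled only by their definiteness feature here;
# the heaviness and focus-accent checks are stubs (an assumption).

def check_definiteness(x, f):
    """Return 1.0 unless a definite NP follows an indefinite one (0.5)."""
    if x == "-def" and f == "+def":
        return 0.5          # principle violated (cf. footnote 9)
    return 1.0              # satisfied, or definiteness undefined

def ordering_preferences2(x, rest, w=(1/3, 1/3, 1/3)):
    """Compare constituent x against every following constituent."""
    if not rest:
        return 1.0
    f, *r = rest
    p1 = 1.0                         # check-heaviness stub
    p2 = check_definiteness(x, f)
    p3 = 1.0                         # check-focus-accent stub
    return (w[0]*p1 + w[1]*p2 + w[2]*p3) * ordering_preferences2(x, r, w)

def ordering_preferences(nps):
    """Combine pairwise preferences over a constituent ordering."""
    if not nps:
        return 1.0
    f, *r = nps
    return ordering_preferences2(f, r) * ordering_preferences(r)

# A definite NP preceding an indefinite one scores higher than the reverse:
good = ordering_preferences(["+def", "-def"])
bad = ordering_preferences(["-def", "+def"])
```

With these assumptions, the violating order receives the averaged penalty (2.5/3) while the preferred order receives 1.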

We can roughly characterise the architecture of an NLU system as consisting of three major parts.

1. Syntactic (and morphological) analysis

2. Contextual interpretation (anaphora resolution etc.)

3. Knowledge representation and inference

We claim that the task of disambiguation cannot be solved by any of these components alone, but that all three parts can play a role in disambiguation, as sentences (10) to (17) illustrate.

The German example sentence (10) has four readings, since Schauspielerin can fill the roles of subject and direct object, and Moritz and Lisa can fill the roles of subject, indirect object and direct object. All four readings are equally plausible, but word order preferences provide strong evidence for the reading in which Moritz is the subject, Lisa the indirect (dative) object and Schauspielerin the direct (accusative) object. This is a case where syntactic preferences determine the choice of a reading, unless there is evidence to the contrary.

(10) Moritz zeigt Lisa die Schauspielerin
     'Moritz shows Lisa the actress'

Sentence (11) shows a lexical ambiguity of the word mouse, which is ambiguous between rodent and computer input device. In the first sentence, both readings are equally possible. Coindexation of the pronoun it in the second sentence with mouse, together with the selectional restrictions of the predicative adjective dead, leads to a preference for the rodent reading. The same effect occurs in sentence (12), and the device reading is preferred in sentence (13). In these cases, anaphora resolution and selectional restrictions determine the choice of a reading.

(11) On the table there was a mouse. It was dead.

(12) Bill found a mouse. The animal was half-starved.

(13) Bill found a mouse. The device was in need of repair.

The following sentences can only be disambiguated by inference and world knowledge. In sentence (14), the locative adjunct in the river suggests the auxiliary verb reading of can, since rivers are locations where fishing is possible, and in sentence (15), the main verb reading of can is suggested by the locative adjunct, since factories are locations where goods are canned.

(14) We can fish in this river.


(15) We can fish in this factory.

In the sentences (16) and (17) knowledge processing is required to resolve the lexical ambiguity of window between an opening in a wall and a human-computer interface widget. The process of fitting the definite descriptions the pane and the title bar into the discourse representation will assist in the disambiguation of window if knowledge is available that a pane is part of a framed window in the wall, and that a title bar is part of a computer window.

(16) John stared at the window. The pane was broken.

(17) John stared at the window. The title bar showed weird characters.

By viewing syntactic analysis as a deductive process, it can be integrated with the other two processes. The same preference scheme, and even the same deductive techniques, can be applied to all three stages, so that the final preference ordering among different readings combines the results of all three processes.
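Assuming that each stage assigns a preference value in [0, 1] to every reading, the combination across stages can be sketched as a simple product of the per-stage values. All reading labels and numbers below are invented for illustration; they are not taken from the system described here.

```python
# Combine per-stage preference values for competing readings by
# multiplication; the reading with the highest product wins.
# All values are invented for illustration.
readings = {
    "mouse=rodent": {"syntax": 1.0, "context": 0.9, "knowledge": 0.8},
    "mouse=device": {"syntax": 1.0, "context": 0.9, "knowledge": 0.1},
}

def overall(prefs):
    """Product of the preference values contributed by each stage."""
    p = 1.0
    for v in prefs.values():
        p *= v
    return p

best = max(readings, key=lambda r: overall(readings[r]))
```

Multiplication is one natural choice when the values are read as probabilities of independent judgements; other combination schemes are possible.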

4.1.6 Application to Generation

In an ideal situation, one can assume that the input to a generator is always fully specified, so that only one unique result is generated that is optimally suited to achieve the desired communicative effect in a given communicative situation.

However, in realistic NLP applications, there is often not enough information available and decisions must be made in the face of this incomplete knowledge.

Such situations arise, for example, in systems that must deal with a wide variety of input, such as the generation component of large-coverage machine translation systems, which normally do not make use of a discourse model. In order to arrive at reasonably acceptable generator output when no information is available to make an informed decision for a non-deterministic choice, one can use statistical information and simply take the most probable choice. For example, in the case of lexical selection or the selection of syntactic variants, the most frequent one (either in terms of absolute probabilities or of conditional probabilities such as n-gram models) is chosen. Whenever the information for making a choice is available, it is used; preferences (probabilities) are only used to resolve those choices which are left underspecified in the input.
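This fallback strategy can be sketched in a few lines of Python; the frequency counts below are invented stand-ins for corpus statistics.

```python
# Choose a syntactic variant: respect the input when it decides the
# choice, otherwise fall back on the most frequent variant.
# The counts are invented for illustration.
from collections import Counter

variant_counts = Counter({"active": 870, "passive": 130})

def choose_variant(specified=None):
    """Return the specified variant, or the most probable one."""
    if specified is not None:
        return specified                      # input decides the choice
    return variant_counts.most_common(1)[0][0]  # statistical fallback
```

The same scheme applies to lexical selection; with conditional counts (e.g. n-gram models) the table would be keyed on the context as well.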

4.1.6.1 Preferences and Self-Monitoring

Neumann and van Noord have developed an algorithm for self-monitoring of syntactic generation [Neumann and van Noord, 1992]. The purpose of the algorithm is to avoid the generation of ambiguous utterances. The algorithm starts out by generating a sentence, which is then parsed in order to determine whether it is ambiguous. If a sentence is found to be ambiguous, its analysis trees are traversed to find the place in which the derivations differ. At this choice point, another choice is taken, and the result is then again checked for ambiguity. This process is repeated until a non-ambiguous paraphrase of the utterance is generated. The algorithm is quite efficient because it reduces the search to the parts of the derivation which are responsible for the ambiguity of the utterance.

The classic example is the ambiguous sentence (18), which can be paraphrased by the non-ambiguous sentences (19) and (20).

(18) Remove the folder with the system tools

(19) Remove the folder by means of the system tools

(20) Remove the folder containing the system tools
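The monitoring loop can be sketched schematically in Python. This is not the Neumann and van Noord algorithm itself: a lookup table of parse counts stands in for the parser, and the derivation-tree traversal that makes the real algorithm efficient is abstracted away into simply trying candidate paraphrases in order.

```python
# Schematic generate-and-monitor loop: propose a paraphrase, "parse" it,
# and accept the first one with exactly one analysis.
# The parse counts are assumed values standing in for a real parser.
PARSE_COUNTS = {
    "Remove the folder with the system tools": 2,         # ambiguous (18)
    "Remove the folder by means of the system tools": 1,  # unambiguous (19)
    "Remove the folder containing the system tools": 1,   # unambiguous (20)
}

def generate_unambiguous(paraphrases):
    """Return the first unambiguous paraphrase, or None on deadlock."""
    for sentence in paraphrases:
        if PARSE_COUNTS[sentence] == 1:
            return sentence
    return None  # deadlock: no unambiguous paraphrase exists
```

Applied to the candidates (18) to (20) in order, the loop rejects (18) and returns (19).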

In realistic language models, there are often situations where it is impossible to generate non-ambiguous utterances. This is due to lexical ambiguities (most lexical entries have a number of different readings), structural ambiguities (which are often not even noticed), and scope ambiguities (it is hardly possible to avoid ambiguities in quantifier scope). In such “deadlock” situations, it is desirable to generate a phrase that is unlikely to be misinterpreted. In terms of preference values this means that the preference value of the most preferred reading must be significantly higher than that of the second most preferred reading.11
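One way to make the "significantly higher" condition operational is a margin test on the two highest preference values. The function and its threshold below are an assumption, not part of the system described here; as footnote 11 notes, the appropriate margin depends on the cost of misunderstanding.

```python
# Margin test for "deadlock" situations: accept an ambiguous utterance
# only if the best reading clearly dominates the runner-up.
# The default margin is an arbitrary assumption.
def safe_to_generate(pref_values, margin=0.3):
    """True if the top preference value exceeds the second by `margin`."""
    ranked = sorted(pref_values, reverse=True)
    if len(ranked) < 2:
        return True   # only one reading: no risk of misinterpretation
    return ranked[0] - ranked[1] >= margin
```

A Kohl/Chirac-type sentence with preference values like 0.9 and 0.1 passes the test, while two nearly equal readings fail it and force a paraphrase.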

An example would be the German sentence Kohl kritisierte Chirac (Kohl criticised Chirac), which has one reading in which Kohl is the subject and Chirac the object, and another reading in which the roles are reversed. However, the second reading is so unlikely that there is hardly a chance of a misunderstanding. In this case the word order preferences are so strong that there will be a large gap between the preference values.

An example in which the difference is not large enough is the sentence Florian löst sein Problem mit exzessivem Drogenkonsum (Florian solves his problem with excessive drug consumption), which has one reading in which the drug consumption is the problem that is solved, and another, equally plausible, one in which the problem is solved by means of drug consumption. In this case, unambiguous paraphrases must be generated.

Of course, the preference values (viewed as probabilities) can be different for different sublanguages. In the computer domain, the word mouse is more likely to refer to an input device than to a rodent, and the word window is more likely to refer to a user interface widget than to an opening in a wall, while the probabilities will be reversed in texts from other domains (e.g., pest control, construction).
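Such domain-dependent preferences can be represented as word-sense priors conditioned on the sublanguage. All probabilities below are invented for illustration.

```python
# Domain-conditional sense priors; all numbers are invented.
SENSE_PRIORS = {
    "computer":     {("mouse", "device"): 0.90, ("mouse", "rodent"): 0.10},
    "pest_control": {("mouse", "device"): 0.05, ("mouse", "rodent"): 0.95},
}

def preferred_sense(word, domain):
    """Return the most probable sense of `word` in the given domain."""
    senses = {s: p for (w, s), p in SENSE_PRIORS[domain].items() if w == word}
    return max(senses, key=senses.get)
```

Swapping the domain reverses the preferred sense, mirroring the mouse example above.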

11We won’t attempt to formalise what “significantly higher” means since it depends strongly on the given purpose of the generation system. The higher the cost of misunderstanding (the potential damage), the greater the difference in preference values should be.