
10 Summary and Perspectives

10.2 Extrinsic Approaches

In the second part of the dissertation, we consider the relationship between textual entailment and other semantic relations between texts.

Chapter 6 presents a generalization of the RTE task, which leads to a classification of four relations: Paraphrase, Entailment, Contradiction, and Unknown. Three numerical features are then proposed to characterize them: relatedness, inconsistency, and inequality.
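
To make the roles of the three features concrete, the following is a minimal sketch of how threshold decisions over relatedness, inconsistency, and inequality could separate the four relations; the thresholds and the rule order are illustrative assumptions, not the classifiers actually trained in the dissertation.

```python
# Illustrative sketch only: hand-set thresholds standing in for the trained
# classifiers used in the dissertation.

def classify_tsr(relatedness: float, inconsistency: float,
                 inequality: float, threshold: float = 0.5) -> str:
    """Map the three feature scores to one of the four textual semantic relations."""
    if relatedness < threshold:
        return "Unknown"        # the two texts are not sufficiently related
    if inconsistency >= threshold:
        return "Contradiction"  # related, but their contents clash
    if inequality >= threshold:
        return "Entailment"     # related, consistent, one direction only
    return "Paraphrase"         # related, consistent in both directions

print(classify_tsr(0.8, 0.1, 0.7))  # -> Entailment
```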

Before turning to the classification of textual semantic relations, the construction of the corpora is introduced in Chapter 7. An overview of several existing corpora is given, as well as a discussion of the methodologies used during their construction. The chapter then presents our work on constructing two alternative corpora for textual semantic relations, one built by manual annotation and the other collected from the Web using a crowd-sourcing technique. Based on inter-annotator agreement and an analysis of the sampled data, both corpora show quality comparable to other existing corpora. All of these corpora are used as datasets in our experiments.

Chapter 8 describes the approach of using relatedness recognition as an intermediate step for entailment recognition. The evaluation confirms that the two-stage classification method works better than three-way classification on the RTE data. We further extend the system with two other measurements, inconsistency and inequality, and use them to classify multiple semantic relations (Chapter 9). The results show that not only can a single recognition task (i.e., RTE) benefit from the reduction of the search space, but multiple tasks, such as paraphrase acquisition and contradiction detection, can also be accomplished within one unified framework.
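
The two-stage idea can be pictured as a simple pipeline: a relatedness filter first removes unrelated pairs, and the entailment decision is only made on the remainder. The classifier interfaces below are placeholders for illustration, not the actual system API.

```python
# Sketch of the two-stage pipeline; `relatedness_clf` and `entailment_clf`
# are placeholder objects with assumed is_related/entails methods.

def two_stage_rte(text: str, hypothesis: str, relatedness_clf, entailment_clf) -> str:
    # Stage 1: prune pairs that are not even related.
    if not relatedness_clf.is_related(text, hypothesis):
        return "UNKNOWN"
    # Stage 2: decide entailment only within the reduced search space.
    return "ENTAILMENT" if entailment_clf.entails(text, hypothesis) else "NO ENTAILMENT"
```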

Among the three features we considered for the TSR classification, inconsistency and inequality are difficult to measure. For the former, the performance can be improved by adding modules that deal with negation and modal verbs. For the latter, we can compare several lexical resources, as we did for relatedness recognition, although the directionality between words is not trivial to obtain either (Kotlerman et al., 2009).
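
As an illustration of what such a module might look at, the fragment below sketches a crude negation-mismatch test that could contribute to an inconsistency score; a real module would also need to cover modal verbs, antonyms, and scope, and the cue list here is purely an assumption.

```python
# Toy negation-mismatch check; NEGATION_CUES is a made-up minimal list.
NEGATION_CUES = {"not", "no", "never", "n't", "without"}

def negation_mismatch(text_tokens: list, hyp_tokens: list) -> bool:
    """True if exactly one side of the pair contains a negation cue."""
    t_neg = any(tok.lower() in NEGATION_CUES for tok in text_tokens)
    h_neg = any(tok.lower() in NEGATION_CUES for tok in hyp_tokens)
    return t_neg != h_neg
```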

We can also do detailed feature engineering for acquiring these three measurements. For instance, synonyms and antonyms are important for relatedness, but probably have no impact on inequality, since both are bi-directional relations. Inconsistency can be detected as soon as one contradictory part is discovered, while relatedness has to go through all the information contained in the text pair. This suggests that we should take different approaches to acquire the different measurements.
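
The asymmetry just described can be made explicit in code: inconsistency detection can stop at the first clashing fragment pair, whereas relatedness has to aggregate over all of them. The clashes and similarity functions below are placeholder scorers, not components of the actual system.

```python
# `clashes` and `similarity` stand in for whatever pairwise scorers are used.

def is_inconsistent(t_fragments, h_fragments, clashes) -> bool:
    # Early exit: a single contradictory fragment pair is sufficient.
    return any(clashes(ft, fh) for ft in t_fragments for fh in h_fragments)

def relatedness_score(t_fragments, h_fragments, similarity) -> float:
    # Full pass: relatedness must consider all the information in the pair.
    scores = [similarity(ft, fh) for ft in t_fragments for fh in h_fragments]
    return sum(scores) / len(scores) if scores else 0.0
```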

In addition, for entailment recognition, we currently use intersection to combine all the results of comparing pairs of semantic units. However, this does not always yield the best result. The union operator is also interesting to explore, since we aim to identify all possible semantic relations between the two texts.
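
A minimal sketch of this combination step, assuming each comparison of two semantic units yields a set of plausible relations: intersection keeps only the relations supported by every comparison, while union keeps any relation supported by at least one.

```python
from functools import reduce

def combine(relation_sets, mode="intersection"):
    """Combine per-unit relation sets; mode is 'intersection' or 'union'."""
    if not relation_sets:
        return set()
    op = set.intersection if mode == "intersection" else set.union
    return reduce(op, relation_sets)

# Two unit comparisons that only agree on ENTAILMENT:
units = [{"ENTAILMENT", "PARAPHRASE"}, {"ENTAILMENT"}]
print(combine(units))                  # {'ENTAILMENT'}
print(combine(units, mode="union"))    # both ENTAILMENT and PARAPHRASE
```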

In fact, the semantic unit itself (i.e., the meaning representation) can be extended to incorporate more information. For instance, named entities and their relations can also be represented in the dependency style, and even quantifier scope information from formal semantics could be included.

Furthermore, if we properly combine the results from research on lexical semantics with our current architecture, the monotonicity issue may also be systematically handled.

An even more attractive method is to integrate the intrinsic approaches with the extrinsic ones. For each TSR recognition task, we may have several specialized modules, each of which selects a subset of the data and deals with it. The external knowledge resources can also target different subsets. Accordingly, the voting strategy needs to be “upgraded” to handle conflicts between different semantic relation decisions.
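
One way such an “upgraded” voting step could look, assuming each specialized module either abstains or returns a relation together with a confidence; the interface and the confidence-summing conflict resolution are assumptions for illustration only.

```python
from collections import defaultdict

def vote(decisions):
    """decisions: list of (relation or None for abstain, confidence) pairs."""
    tally = defaultdict(float)
    for relation, confidence in decisions:
        if relation is not None:
            tally[relation] += confidence   # sum confidences per relation
    return max(tally, key=tally.get) if tally else "UNKNOWN"

# Two modules disagree; the relation with the larger confidence mass wins.
print(vote([("ENTAILMENT", 0.6), ("CONTRADICTION", 0.7), ("ENTAILMENT", 0.3)]))
# -> ENTAILMENT (0.9 vs. 0.7)
```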

10.3 Applications

We have discussed the motivation for tackling the textual entailment problem at the beginning of this dissertation, but we have not elaborated on using the system as a component for other downstream applications1. Here, we briefly introduce three tasks, answer validation, relation validation, and parser evaluation, where we use previously developed RTE systems as valuable components to tackle the problems.

1We described several works applying the existing RTE system to other NLP tasks in Section 2.6.

Answer Validation is a task proposed by the Cross Language Evaluation Forum (CLEF) (Peñas et al., 2007; Rodrigo et al., 2008). It aims at developing systems that are able to decide whether the answer of a question answering system is correct or not. The input is a set of pairs <answer, supporting text> grouped by question. Participant systems must return one of the following values for each answer: validated, selected, and rejected. The first and the last are straightforward, and the second marks the best answer when there is more than one correct answer to a question.

Our system uses the RTE module (Section 5.4.2) as a core component. We adapt questions, their corresponding answers, and supporting documents into T-H pairs, assisted by some manually designed patterns. Then, the task can be cast as an entailment recognition task: the answer is correct when the entailment relation holds and vice versa. We achieved the best results for both English and German in the evaluation2 (Wang and Neumann, 2008a).
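
The reduction can be sketched as follows, under the assumption that a pattern-based rewrite turns each (question, answer) pair into a declarative hypothesis and that the RTE module returns an entailment confidence; the threshold and the interfaces are illustrative, not the actual system.

```python
def validate_answers(question, candidates, to_hypothesis, rte_confidence,
                     threshold=0.5):
    """candidates: list of (answer, supporting_text); returns (answer, label)."""
    scored = [(answer, rte_confidence(text=support,
                                      hypothesis=to_hypothesis(question, answer)))
              for answer, support in candidates]
    best = max((score for _, score in scored), default=0.0)
    labels = []
    for answer, score in scored:
        if score < threshold:
            labels.append((answer, "rejected"))
        elif score == best:
            labels.append((answer, "selected"))   # best among the correct answers
        else:
            labels.append((answer, "validated"))
    return labels
```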

Relation Validation can be described as follows: given an instance of a relation between named-entities and a relevant text fragment, the system is asked to decide whether this instance is true or not. We also made use of the RTE module (Section 5.4.2) as the core component and transformed the task into an RTE problem, meaning that the relation is validated when the entailment holds and vice versa. We set up two different experiments to test our system: one is based on an annotated data set; the other is based on real web data via the integration of our system with an existing information extraction system. The results suggest that recognizing textual entailment is a feasible way to address the relation validation task as well (Wang and Neumann, 2008b).
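
A sketch of this reduction, assuming a small set of verbalization templates and a binary entails function; both are illustrative placeholders rather than the actual components.

```python
# Hypothetical templates turning a relation instance into a hypothesis sentence.
TEMPLATES = {
    "works_for": "{arg1} works for {arg2}.",
    "born_in": "{arg1} was born in {arg2}.",
}

def validate_relation(relation, arg1, arg2, fragment, entails):
    """The instance is validated iff the fragment entails its verbalization."""
    hypothesis = TEMPLATES[relation].format(arg1=arg1, arg2=arg2)
    return entails(text=fragment, hypothesis=hypothesis)
```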

Parser Evaluation using Textual Entailment (PETE) (Yuret et al., 2010) is the SemEval-2010 Task3 #12, an interesting task connecting two areas of research, parsing and RTE. The former is usually concerned with syntactic analysis in specific linguistic frameworks, while the latter is believed to involve more semantic aspects of language, although in fact no clear-cut boundary between syntax and semantics can be drawn for either task. The basic idea is to evaluate (different) parser outputs by applying them to the RTE task. The advantage is that this evaluation scheme is formalism-independent (for the parsers).
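
In the spirit of PETE, a parser can be scored by how often an entailment decision built on its output matches the gold labels; parse and decide_entailment below are placeholders for a parser interface and a parser-dependent RTE decision, not the task's official scorer.

```python
def pete_score(parser, pairs, decide_entailment):
    """pairs: list of (text, hypothesis, gold) with gold in {'YES', 'NO'}."""
    if not pairs:
        return 0.0
    correct = sum(
        decide_entailment(parser.parse(text), parser.parse(hyp)) == gold
        for text, hyp, gold in pairs
    )
    return correct / len(pairs)
```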

The RTE module used in our participating system is mainly described in Chapter 9. Instead of using the 3-D model for TSR recognition (Section 9.2.2), we directly recognize the entailment relation based on the

2For the German language, we applied a German dependency parser (Neumann and Piskorski, 2002) for the preprocessing.

3http://semeval2.fbk.eu/semeval2.php
