5.2 Extraction with Predicate-Argument Analysis

5.2.1 Predicate-Argument Structures

Chapter 5. Concept and Relation Extraction

necessarily present in a sentence. For our example sentence, the resulting representation looks like this:

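[Figure: the example sentence "Caffeine, which is a mild CNS stimulant, reduces ADHD symptoms." with dependency arcs above the tokens (labels: root, nsubj, rcmod, det, amod, nn, xcomp, dobj, punct) and SRL annotations below them: predicate be.01 with arguments R-A1, A1 and A2, and predicate reduce.01 with arguments A0 and A1.]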

Arcs above the tokens show syntactic dependencies. Below the tokens, the two sense-disambiguated predicates be.01 and reduce.01 are shown with arcs pointing to the heads of their arguments and role names describing the arguments’ function. PropBank defines only a small set of role names, such as A1 and A2. They can have slightly different meanings for different predicates, which are defined in the lexicon. For reduce.01, the A0-argument describes the agent and the A1-argument the logical subject.

This approach is the most sophisticated type of analysis in our study, as it tries to map a natural language sentence to another layer of abstract semantic representation defined by the predicate and argument role inventory. Because that inventory is independent of specific syntactic realizations, predicates and arguments are always mapped to the same symbols. For instance, in both "Caffeine reduces ADHD" and "ADHD is reduced by caffeine", the predicate reduce.01 is found and has an argument "caffeine" with role A0. The different realizations of the same underlying proposition are thus unified in the SRL representation, allowing us to handle both cases with the same extraction logic. However, the SRL representation also has disadvantages. First, spans of roles are strictly bound to full subtrees in the dependency parse, as shown in the example, where the relative clause becomes part of the A0-argument of the predicate reduce, although the proposition would also be valid without that part. Second, predicates and arguments that are not covered by the lexicon cannot be represented and thus cannot be extracted for a concept map either.
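The unification of active and passive realizations can be illustrated with a minimal sketch. The two frames below are written by hand for illustration and are not the output of an SRL system; the dictionary layout and the function name `agent_patient` are assumptions made for this example only.

```python
def agent_patient(frame):
    """Read a proposition off a frame using role names only.

    The function never looks at word order or voice, so the same
    logic applies to active and passive realizations.
    """
    roles = frame["roles"]
    return (roles["A0"], frame["sense"], roles["A1"])


# Hand-written frames (assumed, not produced by a tool).
active = {
    "sentence": "Caffeine reduces ADHD",
    "sense": "reduce.01",
    "roles": {"A0": "Caffeine", "A1": "ADHD"},
}
passive = {
    "sentence": "ADHD is reduced by caffeine",
    "sense": "reduce.01",
    "roles": {"A0": "caffeine", "A1": "ADHD"},
}

for frame in (active, passive):
    print(frame["sentence"], "->", agent_patient(frame))
```

Apart from capitalization, both sentences yield the same triple, because the role labels rather than the surface syntax determine which mention fills which slot.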

For our experiments, we use the SRL functionality of Mate Tools (Björkelund et al., 2009), a freely available system that was one of the best in the CoNLL 2009 shared task (Hajič et al., 2009). To obtain binary predicates from the representation, we ignore predicates that have just one argument, and if more than two are present, we form propositions with all pairs of them. We further ignore referential arguments (e.g. role R-A1), as only a direct mention of an argument is useful for a concept map. For the example sentence, this strategy yields the following propositions:

(1) (Caffeine - is - a mild CNS stimulant)

(2) (Caffeine, which is a mild CNS stimulant, - reduces - ADHD symptoms)
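The strategy described above can be sketched as follows. This is a minimal illustration and not the code used in our experiments: the input format (a list of dictionaries with a predicate mention and role-labeled argument spans) and the function name `binary_propositions` are hypothetical, and the frames are written by hand to mirror the example rather than taken from Mate Tools. The sketch also assumes that referential arguments are dropped before the arguments are counted.

```python
from itertools import combinations


def binary_propositions(frames):
    """Form (argument, predicate, argument) triples from SRL frames."""
    propositions = []
    for frame in frames:
        # Drop referential arguments such as R-A1 (assumed to happen first).
        args = [text for role, text in frame["args"] if not role.startswith("R-")]
        # Ignore predicates that have just one argument left.
        if len(args) < 2:
            continue
        # Two arguments give one pair; more than two give all pairs.
        for left, right in combinations(args, 2):
            propositions.append((left, frame["mention"], right))
    return propositions


# Hand-written frames for the example sentence (assumed spans).
frames = [
    {"sense": "be.01", "mention": "is",
     "args": [("A1", "Caffeine"), ("R-A1", "which"),
              ("A2", "a mild CNS stimulant")]},
    {"sense": "reduce.01", "mention": "reduces",
     "args": [("A0", "Caffeine, which is a mild CNS stimulant,"),
              ("A1", "ADHD symptoms")]},
]

for proposition in binary_propositions(frames):
    print(proposition)
```

Using `itertools.combinations` keeps the arguments in their listed order within each pair, so a predicate with three arguments contributes three propositions.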

5.2.1.2 PropS

As the second approach, we use PropS, a rule-based converter that turns dependency trees into typed predicate-argument graphs (Stanovsky et al., 2016b). In addition to identifying predicates and arguments, it also canonicalizes the representation of propositions, e.g. by unifying variations such as active and passive or copula and appositive constructions.

However, compared to SRL, PropS works exclusively on the lexical level and does not map tokens to any symbols from an external inventory. That has the advantage that the representation can always cover the full content of a given sentence, as opposed to SRL. PropS also classifies predicate-argument relations into a small set of labels such as subj and dobj.

For the example sentence, PropS yields the following graph representation:

[Figure: PropS graph for the example sentence over the tokens "Caffeine be mild CNS stimulant reduce ADHD symptoms", with edges labeled subj, comp, mod and dobj.]

Here, it identified two predicates (dark nodes) that both have two arguments with different roles (subj, dobj and comp). Similar to SRL, arguments can be trees of several tokens, but due to the transformations applied to the dependency tree, its original structure is typically largely simplified. In the example, punctuation and determiners have been dropped and the noun compound has been collapsed into a single graph node. In the graph, predicates are represented by their lemmas, but the original surface form is stored for later use.

To create binary predicates, we traverse the graph from every predicate node and select as its arguments the subgraphs of the directly connected argument nodes. We remove unary predicates and break down higher-arity predicates by creating all possible pairs except if they have the same edge label (e.g. two objects). As mentions, we use the original surface forms of predicates and arguments. We obtain the following propositions for the example:

(1) (Caffeine - is - a mild CNS stimulant)

(2) (Caffeine - reduces - ADHD symptoms)
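The pairing rule for PropS predicates can be sketched as below. This is a minimal illustration under stated assumptions: it does not call PropS, the input is a hand-written list of (edge label, surface form) pairs for the arguments directly connected to one predicate, and the function name `pair_arguments` is hypothetical. The ditransitive example with two dobj edges is invented to show the exclusion of same-label pairs.

```python
from itertools import combinations


def pair_arguments(predicate, arguments):
    """Create binary propositions for one predicate node.

    `arguments` holds (edge_label, surface_form) pairs. Unary predicates
    yield nothing; pairs that share an edge label are skipped.
    """
    if len(arguments) < 2:
        return []
    return [
        (first, predicate, second)
        for (label_a, first), (label_b, second) in combinations(arguments, 2)
        if label_a != label_b
    ]


# The two predicates of the example sentence, with surface forms.
print(pair_arguments("is", [("subj", "Caffeine"),
                            ("comp", "a mild CNS stimulant")]))
print(pair_arguments("reduces", [("subj", "Caffeine"),
                                 ("dobj", "ADHD symptoms")]))

# Invented higher-arity case: the two objects are not paired with each other.
print(pair_arguments("gave", [("subj", "The doctor"),
                              ("dobj", "the patient"),
                              ("dobj", "caffeine")]))
```

In the invented case, the subject is paired with each object separately, while the pair of the two objects is dropped because both edges carry the label dobj.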

5.2.1.3 Open Information Extraction

As the third approach for predicate-argument analysis, we use OIE. As we already described in Section 2.3.3, OIE systems extract tuples that represent propositions from a given sentence. Every tuple consists of a relation phrase and two arguments, which is already exactly the representation in which we need concept and relation mentions. In contrast to the other two approaches, OIE systems in general do not create an intermediate representation of a specific type. However, they still serve our purpose as the direct creation of propositions also alleviates the need for mention extraction approaches to deal with different syntactic realizations when looking for concepts and relations.

In our experiments, we use OpenIE4 (Mausam, 2016). It is one of the state-of-the-art OIE systems (Stanovsky and Dagan, 2016b). Using it, the extracted propositions for the example sentence are the same as for PropS:

(1) (Caffeine - is - a mild CNS stimulant)

(2) (Caffeine - reduces - ADHD symptoms)

With regard to the motivation of avoiding laborious definitions of large sets of rules, we note in passing that PropS as well as many OIE systems do indeed make their extractions using hand-written rules. Thus, relying on them for concept and relation mention extraction does not completely remove the need for rules. Instead, it shifts the responsibility for rule creation from researchers working on the specific task of concept and relation extraction to the authors of more generally applicable predicate-argument analysis tools.