
Constructing a Dependency Graph

Once we have a group of transformed dependency trees, we aim to find the best node alignment for those trees so as to build a graph expressing all the content of the input.

We use a simple, fast, and transparent method and align any two nodes provided that the words they contain

• are content words;

• have the same part-of-speech;

• have identical lemmas or are synonyms.
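The three criteria above can be expressed as a single predicate. The following Python sketch is illustrative only: the node representation, the content-word POS inventory (here a subset of the STTS tagset), and the synonym lookup (e.g., pairs extracted from a lexical resource such as GermaNet) are our own assumptions, not the implementation used in the thesis.

```python
# Illustrative sketch of the node-alignment test (hypothetical helpers;
# POS inventory and synonym table are assumptions, not the thesis code).

# A rough set of content-word tags from the STTS tagset (assumption).
CONTENT_POS = {"NN", "NE", "VVFIN", "VVINF", "VVPP", "ADJA", "ADJD", "ADV"}

def is_content_word(node):
    """A node counts as a content word if its POS tag is in CONTENT_POS."""
    return node["pos"] in CONTENT_POS

def can_align(node_a, node_b, synonyms):
    """True iff the two nodes satisfy all three alignment criteria:
    (1) both are content words, (2) same part of speech,
    (3) identical lemmas or listed as synonyms."""
    if not (is_content_word(node_a) and is_content_word(node_b)):
        return False
    if node_a["pos"] != node_b["pos"]:
        return False
    if node_a["lemma"] == node_b["lemma"]:
        return True
    pair = (node_a["lemma"], node_b["lemma"])
    return pair in synonyms or pair[::-1] in synonyms
```

Because all three tests are purely local, the predicate can be evaluated for any node pair in constant time, which is what makes the method fast and transparent.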


[Figure 4.3 here: eight panels showing the successive transformations of the dependency tree — (a) Original tree, (b) After PREP, (c) After CONJ, (d) After APP, (e) After FUNC, (f) After ROOT, (g) After SEM, (h) After LEMMA.]

Figure 4.3: The transformations of the dependency tree of the sentence in (4.6)

We prefer this very simple method to bottom-up ones (Meyers et al., 1996; Barzilay & McKeown, 2005; Marsi & Krahmer, 2005) mainly for two reasons. Firstly, by pursuing local subtree alignments, bottom-up methods may leave identical words unaligned and thus prohibit the fusion of complementary information. On the other hand, they may force the alignment of two unrelated words if the subtrees they root are largely aligned. Although in some cases this helps to discover paraphrases, it also considerably increases the chances of generating ungrammatical output, which we want to avoid at any cost. For example, even synonymous verbs such as say and tell have different subcategorization frames, and mapping one onto the other would introduce the possibility of generating *said him or *told to him. In the case of multiple possibilities, i.e., when a word from one sentence appears more than once in a related sentence, the choice is made randomly. It should be noted, however, that such cases are extremely rare in our data.

By merging all aligned nodes we obtain a dependency graph which comprises all the dependencies from the input trees. If the graph contains a cycle, one of the alignments in the cycle is eliminated. Root insertion during the transformation stage guarantees that the graph obtained as a result of alignment is connected. Recall the sentences (4.3) and (4.4) from Section 4.1. Figures 4.4a and 4.4b show their transformed dependency trees, and Figure 4.4c presents the graph obtained as a result of their alignment (nodes shared by the input trees are in blue).
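The merge-and-break-cycles step can be sketched as follows. This is an illustrative greedy variant in Python, not the thesis implementation: the edge representation (head, dependent pairs), the node names, and the policy of dropping whichever alignment would close a cycle are all assumptions on our part.

```python
# Sketch: union the edges of two dependency trees under a node alignment,
# skipping any alignment that would close a cycle (assumed greedy policy).
from collections import defaultdict

def has_cycle(nodes, edges):
    """Iterative DFS cycle check on a directed graph."""
    adj = defaultdict(list)
    for head, dep in edges:
        adj[head].append(dep)
    state = {n: 0 for n in nodes}          # 0 unseen, 1 on stack, 2 done
    for root in nodes:
        if state[root]:
            continue
        state[root] = 1
        stack = [(root, iter(adj[root]))]
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                state[node] = 2
                stack.pop()
            elif state[nxt] == 1:          # back edge -> cycle
                return True
            elif state[nxt] == 0:
                state[nxt] = 1
                stack.append((nxt, iter(adj[nxt])))
    return False

def merge_trees(edges_a, edges_b, alignments):
    """Merge tree B into tree A, applying each alignment (b_node -> a_node)
    only if the resulting merged graph stays acyclic."""
    mapping = {}                            # node in tree B -> node in tree A
    for b_node, a_node in alignments:
        trial = dict(mapping, **{b_node: a_node})
        edges = set(edges_a) | {(trial.get(h, h), trial.get(d, d))
                                for h, d in edges_b}
        nodes = {n for e in edges for n in e}
        if not has_cycle(nodes, edges):     # keep alignment only if acyclic
            mapping = trial
    return set(edges_a) | {(mapping.get(h, h), mapping.get(d, d))
                           for h, d in edges_b}
```

Because every input tree contains the inserted root node, the merged edge set is guaranteed to be connected, exactly as noted above.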

The graph we obtain covers all the dependency relations from the input sentences. Moreover, these are no longer relations between words but between entities and concepts, as some nodes cover several words which may differ. For example, the node bio in Figure 4.4c represents an entity (namely, Niels Bohr) referred to with er and Niels Bohr in (4.3-4.4). Given that some of the tree transformations reveal implicit semantic relations, the graph is not purely syntactic but also semantically motivated. Constructing such graphs is important because it brings sentence fusion one step closer to abstractive summarization which, as the reader might remember from the introduction, proceeds by “understanding” the text – i.e., creating a semantic representation for it – and by generating a summary from this representation.

Apart from these apparent advantages, the dependency graph representation also has a number of serious disadvantages. For example, time and temporal relations are not treated properly. Consider, e.g., a set of two similar sentences such as (4.7-4.8)^3 which concern the same activity (marriage) but two different events. From the graph emerging after the alignment of the respective trees (see Fig. 4.5) it is no longer possible to infer whom Albert Einstein married in which year.

(4.7) [...] am 2. Juni 1919 heiratete er Elsa, die ihre Töchter Ilse und Margot mit in die Ehe brachte.

[...] on-the 2nd June 1919 married he Elsa, who her daughters Ilse and Margot VERB-PREFIX in the marriage brought.

^3 See Files 10199 and 13199 in CoCoBi.


[Figure 4.4 here: (a) the transformed tree of the sentence in (4.3), (b) the transformed tree of the sentence in (4.4), (c) the dependency graph obtained from the trees above.]

Figure 4.4: Transformed dependency trees of the sentences (4.3) and (4.4) and the graph built from them

[Figure 4.5 here: the dependency graph built from the trees of (4.7) and (4.8).]

Figure 4.5: Graph built from the trees of the sentences (4.7-4.8)

’[...] on the 2nd of June 1919 he married Elsa, who brought her daughters Ilse and Margot into the marriage.’

(4.8) 1903 heiratete er Mileva Marić, eine Mitschülerin am Polytechnicum.

1903 married he Mileva Marić, a classmate at-the Polytechnicum.

’In 1903 he married Mileva Marić, a classmate at the Polytechnicum.’

Thus, it is important that only sentences concerning the same event are aligned and grouped together; otherwise, a sentence inconsistent with the input can be generated. Fortunately, even a shallow word-based alignment algorithm turns out to be robust enough in practice, and the alignment of inherently different sentences is unlikely. Still, it is important to keep this limitation of our graph-based representation in mind. Other disadvantages are due to dependency syntax in general. For example, it is impossible to express that a certain subtree modifies the proposition as a whole and not just the verb, because there are no non-terminal nodes in dependency grammar.