7.4 Memory-based Graph Manipulation Models

7.4.3 Experiments

Relation Mention Extraction and Grouping The extraction and grouping of relation mentions is handled similarly to concepts, using the corresponding memory matrix. By updating the source and target node matrices, a relation is associated with its concepts.
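To make this memory layout more concrete, the following minimal NumPy sketch shows one possible reading of the matrices involved and of a single soft write step for a relation token. The matrix shapes follow the capacities used in the experiments below (5 node cells, 4 edge cells, 32-dimensional vectors, plus a NOP slot); the variable names, the example addressing values and the outer-product update are illustrative assumptions, not the model's actual Eq. 7.6 and 7.12.

```python
import numpy as np

# Illustrative memory layout and one soft write step for a relation token.
# Shapes follow the capacities used in the experiments below; names and the
# outer-product update are assumptions, not the model's actual equations.
K_N, K_E, d = 5, 4, 32
node_mem = np.zeros((K_N, d))            # one content row per concept cell
edge_mem = np.zeros((K_E, d))            # one content row per relation cell
source_mat = np.zeros((K_E, K_N + 1))    # per edge cell: weights over node cells (+ NOP)
target_mat = np.zeros((K_E, K_N + 1))

h_t = np.random.randn(d)                                  # encoder state at this token
alpha_E = np.array([0.90, 0.05, 0.02, 0.02, 0.01])        # edge addressing (K_E + NOP)
alpha_S = np.array([0.10, 0.80, 0.05, 0.03, 0.01, 0.01])  # source node addressing (K_N + NOP)
alpha_T = np.array([0.05, 0.05, 0.05, 0.80, 0.03, 0.02])  # target node addressing (K_N + NOP)

edge_mem += np.outer(alpha_E[:K_E], h_t)        # blend content into the addressed edge cell
source_mat += np.outer(alpha_E[:K_E], alpha_S)  # tie that edge cell to its source concept cell
target_mat += np.outer(alpha_E[:K_E], alpha_T)  # ... and to its target concept cell
```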

Concept and Relation Labeling Since the model is generative, labels are created during decoding and can be any sequence of tokens from the vocabulary. Thus, the model is able to generate abstractive concept maps.
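As a rough illustration of what generative labeling means here, the sketch below decodes a label as a token sequence from a single memory cell vector, so a label can be any sequence over the vocabulary. The decoder step function, the toy vocabulary and the greedy search are placeholders, not the model's actual decoder.

```python
import numpy as np

# Sketch of generative label decoding from one memory cell: tokens are emitted
# one at a time until an end symbol is produced.
vocab = ["<eos>", "alternative", "treatments", "ritalin", "can", "treat", "adhd"]

def decode_label(cell_vector, decoder_step, max_len=5):
    """Greedily decode a label string from a single memory cell vector."""
    state, tokens = cell_vector, []
    for _ in range(max_len):
        state, logits = decoder_step(state)   # one step of a (hypothetical) decoder RNN
        token = vocab[int(np.argmax(logits))]
        if token == "<eos>":
            break
        tokens.append(token)
    return " ".join(tokens)
```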

Importance Estimation No explicit importance estimation is done. Instead, due to the limited capacity, the model always keeps just a summary-sized subset of the input in its memory. This can be seen as an alternative approach to summarization in which the main task is to make local decisions about what to keep in memory, what to add to it and what to overwrite once capacity is reached. As we mentioned earlier, this is inspired by Kintsch and van Dijk (1978)'s cognitive model and by summarization systems designed accordingly (Fang and Teufel, 2016; Fang et al., 2016). In addition, part of selecting the summary-worthy subset of the multi-document input is also handled by the pre-summarizer in our setup.
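The following toy sketch makes this reading explicit by interpreting the node addressing vector at a token as a local keep/add/overwrite decision. The argmax interpretation and the cell-usage bookkeeping are simplifying assumptions made purely for illustration.

```python
import numpy as np

# Toy reading of the capacity-limited memory as a summarizer: a (K_N + 1)-dim
# addressing vector (last entry = NOP) acts as a local decision to keep the
# memory unchanged, add to an empty cell, or overwrite a used one.
K_N = 5
used = np.zeros(K_N, dtype=bool)   # which node cells already hold a concept

def interpret_write(alpha_N):
    """Map a node addressing vector to a keep/add/overwrite decision."""
    cell = int(np.argmax(alpha_N))
    if cell == K_N:
        return "keep memory unchanged (NOP)"
    if not used[cell]:
        used[cell] = True
        return f"add new content to empty cell {cell}"
    return f"overwrite content in cell {cell}"

print(interpret_write(np.array([0.05, 0.05, 0.05, 0.05, 0.05, 0.75])))  # NOP
print(interpret_write(np.array([0.80, 0.05, 0.05, 0.04, 0.03, 0.03])))  # add to cell 0
```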

Concept Map Construction The construction of the summary concept map is handled by the model's decoder and the subsequent ILP, which ensures that a connected graph is created. Due to the limited memory size, the selection of concepts and relations already happens during encoding, and the size constraint is always satisfied.
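To illustrate how an ILP can enforce connectedness, the sketch below uses a generic single-commodity flow formulation (written with PuLP) that keeps a maximum-scoring subset of decoded nodes and edges such that the kept edges form one connected graph. It is a simplified stand-in, not necessarily the exact program used here; the toy graph corresponds to the example in Figure 7.3 and the uniform scores are placeholders.

```python
import pulp

# Generic connectivity-enforcing ILP (single-commodity flow); a sketch only.
nodes = ["herbal supplements", "alternative treatments", "ritalin", "adhd"]
edges = [("herbal supplements", "alternative treatments"),
         ("ritalin", "adhd"),
         ("alternative treatments", "adhd")]
node_score = {n: 1.0 for n in nodes}      # placeholder scores
edge_score = {e: 1.0 for e in edges}
N = len(nodes)

prob = pulp.LpProblem("connected_concept_map", pulp.LpMaximize)
x = {n: pulp.LpVariable(f"x_{i}", cat="Binary") for i, n in enumerate(nodes)}  # keep node?
y = {e: pulp.LpVariable(f"y_{i}", cat="Binary") for i, e in enumerate(edges)}  # keep edge?
r = {n: pulp.LpVariable(f"r_{i}", cat="Binary") for i, n in enumerate(nodes)}  # flow root
arcs = edges + [(v, u) for u, v in edges]                                      # both directions
f = {a: pulp.LpVariable(f"f_{i}", lowBound=0) for i, a in enumerate(arcs)}     # flow amount

# Objective: keep as much (weighted) content as possible.
prob += pulp.lpSum(node_score[n] * x[n] for n in nodes) + \
        pulp.lpSum(edge_score[e] * y[e] for e in edges)

for u, v in edges:                   # an edge can only be kept with both endpoints
    prob += y[(u, v)] <= x[u]
    prob += y[(u, v)] <= x[v]
prob += pulp.lpSum(r.values()) == 1  # exactly one root, and it must be kept
for n in nodes:
    prob += r[n] <= x[n]
for u, v in edges:                   # flow may only run along kept edges
    prob += f[(u, v)] + f[(v, u)] <= N * y[(u, v)]
for n in nodes:                      # every kept non-root node absorbs one unit of flow
    inflow = pulp.lpSum(f[a] for a in arcs if a[1] == n)
    outflow = pulp.lpSum(f[a] for a in arcs if a[0] == n)
    prob += inflow - outflow >= x[n] - N * r[n]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
kept_nodes = [n for n in nodes if x[n].value() == 1]
kept_edges = [e for e in edges if y[e].value() == 1]
```

With this connected toy graph everything is kept; if the decoder produced a disconnected graph, the lower-scoring component would be dropped.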

While these considerations are the motivation behind the different parts of the architecture and provide some intuition as to why the model should be able to handle the task in its full scope, there is of course no guarantee that the model, when trained end-to-end, actually learns the task or that the different subtasks are handled as intended.

singular forms and different determiners. The propositions are connected with different conjunctions or punctuation. An example of input and output is shown in Figure 7.3.⁷⁴ The full dataset contains 10k pairs; all input sequences and output graphs are unique.

Input sequences have on average 26.7 tokens. The vocabulary consists of 48 distinct tokens.
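The following sketch indicates how such input/graph pairs could be generated. Only the concept and relation phrases (taken from the example in Figure 7.3) come from the text; the determiner and connector lists, the pluralization rule and the sampling scheme are illustrative placeholders and will not reproduce the reported token statistics.

```python
import random

# Rough sketch of a generator for synthetic proposition sequences and graphs.
concepts = ["herbal supplement", "alternative treatment", "ritalin", "adhd"]
relations = ["are part of", "can treat", "might reduce"]
determiners = ["", "the ", "some "]   # placeholder surface variation
connectors = [". ", ", and ", "; "]   # placeholder proposition joiners

def mention(concept):
    """Render a concept mention in singular or plural form, with a determiner."""
    form = concept + "s" if random.random() < 0.5 else concept
    return random.choice(determiners) + form

def make_example(n_props=3):
    """Sample one proposition sequence and its gold graph (a list of triples)."""
    triples, parts = [], []
    for _ in range(n_props):
        source, target = random.sample(concepts, 2)
        relation = random.choice(relations)
        triples.append((source, relation, target))
        parts.append(f"{mention(source)} {relation} {mention(target)}")
    text = random.choice(connectors).join(parts) + "."
    return text, triples

random.seed(0)
text, graph = make_example()
```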

Model We trained a sequence-to-graph model with 32-dimensional embeddings, encoder states, memory vectors and decoder states on 9k of the generated examples. Embeddings are learned from scratch along with the other parameters. The total number of parameters is 51,432. Memory matrices have a capacity of 5 nodes and 4 edges. We train the model with Adam (Kingma and Ba, 2015), using a learning rate of 0.001 and a batch size of 50.
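For reference, the reported hyperparameters are collected below in a small configuration sketch. The model itself is only stubbed with a dummy parameter, since the full sequence-to-graph architecture is not reproduced here.

```python
import torch

# Hyperparameters as reported above; the model is stubbed with a dummy parameter.
config = dict(
    embedding_dim=32, encoder_dim=32, memory_dim=32, decoder_dim=32,
    node_capacity=5, edge_capacity=4,
    vocab_size=48,                  # distinct tokens in the synthetic dataset
    train_examples=9000, batch_size=50, learning_rate=1e-3,
)

dummy_params = [torch.nn.Parameter(torch.zeros(config["embedding_dim"]))]
optimizer = torch.optim.Adam(dummy_params, lr=config["learning_rate"])
```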

Results When trained on the 9k training instances, the sequence-to-graph model converges to a loss of almost 0 after 6 epochs and is then able to produce correct graphs for all inputs. The performance is only slightly lower on the unseen 1k test instances, which is expected, as both splits are drawn from the same small input and output space. Nevertheless, this demonstrates that the architecture can be trained successfully and that the model is able to produce labeled graphs that correspond to the input sequences.

In Figure 7.3, we show how the trained model processes an input sequence. The four columns on the right illustrate the memory addressing vectors computed by the model for each of the input tokens shown in the first column. The example demonstrates that the model learned to recognize concept mentions, as the addressed node memory cell (second column) always changes when the first token of a concept mention is encountered (e.g. alternative or ritalin). When concepts are mentioned again, as in the last sentence, the corresponding memory cells are also addressed again. Similarly, the model also learned to use a new edge memory cell (third column) for each relation, which, in this simple example, corresponds to recognizing sentence boundaries.

Finally, the last two columns show that the model is also able to correctly determine the source and target nodes of edges. When processing "might reduce", for instance, the source node vector (fourth column), which is a weighted combination of the node addressing vectors computed for the preceding steps, focuses on the second cell as desired. The target node vector (fifth column) addresses the fourth cell, which is where the adhd concept is stored. Note that while the correct source and target nodes are not necessarily identified at all tokens of a relation mention, the resulting final memory state still leads to a correctly decoded graph in this example (and in almost all other instances of this generated dataset).
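The addressing behavior visible in Figure 7.3 can be pictured with the following sketch: a content-based softmax over the memory cells plus an extra NOP slot, and a source node vector accumulated as a weighted combination of the node addressing vectors of preceding steps. The dot-product scoring, the learned NOP key and the gating weights are assumptions made for this sketch; the actual model uses Eq. 7.6 and 7.12.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Content-based addressing over the node memory plus a NOP slot, and a source
# node vector built up over the tokens of one relation mention.
d, K_N = 32, 5
node_mem = np.random.randn(K_N, d)   # current node memory content
nop_key = np.random.randn(d)         # stand-in for a learned "do nothing" key

def node_addressing(h_t):
    """Return a (K_N + 1)-dimensional distribution over node cells and the NOP slot."""
    scores = np.concatenate([node_mem @ h_t, [nop_key @ h_t]])
    return softmax(scores)

# Accumulate the source node vector as a weighted combination of past addressings:
alpha_S = np.zeros(K_N + 1)
for h_t, gate in [(np.random.randn(d), 0.7), (np.random.randn(d), 0.3)]:
    alpha_S = (1 - gate) * alpha_S + gate * node_addressing(h_t)
```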

While we think that training models and illustrating predictions on this simple dataset is very useful to better understand the capabilities and inner workings of the model, it is of course not the ultimate goal.

⁷⁴ For brevity, we show an example with only 4 concepts and 3 relations, which is therefore not part of the actual dataset used in the experiment, but it has been generated in the same way.

[Figure 7.3 here: the example input "Herbal supplements are part of alternative treatments. Ritalin can treat ADHD. Alternative treatments might reduce ADHD.", its concept map with the concepts herbal supplements, alternative treatments, ritalin and adhd and the relations are part of, can treat and might reduce, and a table listing, for every input token, the addressing vectors α^N_t (Eq. 7.6), α^E_t (Eq. 7.6), α^S_t (Eq. 7.12) and α^T_t (Eq. 7.12) over the node memory, edge memory, source node and target node cells.]

Figure 7.3: Memory addressing vectors computed by our sequence-to-graph model. The model capacity is 5 nodes and 4 edges, resulting in 6- and 5-dimensional addressing vectors (the last dimension addresses the NOP cell). The model learned to use a new node memory cell whenever a new concept is mentioned and successfully re-addresses these cells at coreferent mentions. It further learned to address a new edge cell at every proposition boundary and to determine source and target nodes.

In contrast to the data used here, the actual task will have longer input sequences and output graphs, a much larger vocabulary on both sides, a larger variety of mentions for the same concept, more tokens in the input that are neither a concept nor a relation mention, and a noisier relationship between input and output. The next section carries out experiments on data that exhibits more of these properties.