
Chapter 9. Conclusions

Figure 9.1: Summary concept map automatically created for the content of this thesis using the pipeline presented in Section 6.5, trained on Educ with a size limit of ℒ_C = 10.

9.2. Future Research Directions

For instance, the labeling of concepts and relations could be handled with techniques more sophisticated than simply choosing the most frequent mention. With the neural models explored in this thesis that can freely generate labels, we already made a step in this direction, and corresponding techniques could also be integrated into the pipeline approach. We have so far not focused on these additional subtasks because we consider the subtasks that we did address to be more important for the overall performance, but fully solving the CM-MDS task will eventually require solutions for all subtasks.
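To make the contrast concrete, the following minimal sketch (in Python; all names are illustrative and not the pipeline's actual implementation) shows frequency-based labeling next to the slot where a freely generating model could be plugged in:

    from collections import Counter

    def label_by_frequency(mentions):
        # Pipeline-style baseline: pick the most frequent surface form
        # among the grouped mentions as the concept label.
        return Counter(mentions).most_common(1)[0][0]

    def label_by_generation(mentions, generator):
        # Alternative: a trained sequence-to-sequence model generates a
        # label that need not match any single mention verbatim.
        # "generator" is a hypothetical trained model, not part of the thesis.
        return generator.generate(" ; ".join(mentions))

    print(label_by_frequency(["student loans", "loans", "student loans"]))
    # -> "student loans"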

In light of the variety of open challenges and potential for future improvements, we would like to point out a few directions that we consider to be particularly promising:

• Our experiments in Chapter 7 showed that applying neural networks to CM-MDS is currently difficult, mostly due to the lack of sufficient high-quality training data. To leverage the potential of neural models for the task, it seems worthwhile to explore options for training such models in low-resource settings. Alternative training paradigms, such as incidental supervision (Roth, 2017) or unsupervised learning (Dohare et al., 2018), could be solutions. In particular, recent ideas of transfer learning based on contextualized embeddings (Peters et al., 2018) or fine-tuning pre-trained language models (Howard and Ruder, 2018; Devlin et al., 2018) seem to be promising directions, as they have proven very powerful for other NLP tasks (see the first sketch after this list). Once better ways to train a neural model for CM-MDS have been found, subsequent work can further study the use of graph-based neural networks for the task, as proposed with our sequence-to-graph architecture. Architectures of this kind are currently being explored for many different tasks (Battaglia et al., 2018), and future progress in this area can potentially be transferred to CM-MDS as well.

• Instead of modeling the whole CM-MDS task end-to-end with a single neural model, one could also approach the different subtasks individually with neural models. This would yield a pipeline as described in Section 6.5, but with potentially more powerful models for each of the subtasks (see the second sketch after this list). In addition, many of the problems faced in end-to-end modeling, such as the complex structure of the output and the large size of the input, are already handled by how the pipeline decomposes the task, leaving “simpler” subproblems for which models have to be learned. For those subproblems, collecting suitable training data is potentially also easier and can draw on existing datasets for related tasks such as entity recognition, keyword extraction or textual summarization. Moreover, the alternative training paradigms discussed above would be equally applicable in this setup.

• Since we showed in Section 6.3.3 that a large gap exists between the upper bound and current performance in importance estimation, this subtask could be a particular focus of future work. However, most of the standard techniques from traditional summarization have already been applied. To make further improvements, additional ideas such as incorporating external world knowledge seem promising (see the third sketch after this list). Several authors noted that the inherent importance that humans assign to certain propositions or concepts, independent of a specific text, plays an important role in summarization (Louis, 2014; Zopf et al., 2016a; Peyrard, 2018). In particular for our benchmark corpus, for which importance annotations have been collected through crowdsourcing with minimal context, this notion of importance presumably plays a major role. Large background corpora or structured knowledge bases could be sources for corresponding features that enable better importance estimation.

• An alternative way to approach the problem of limited importance estimation performance would be to create personalized summary concept maps. A limitation of most summarization methods, no matter whether they have been designed for SDS, MDS or CM-MDS, is that, given the input text, they will always produce the same summary. In practice, however, different users approach a collection of documents with different information needs, background knowledge and preferences. They would therefore ideally need summaries focusing on different aspects of the content. Methods to create personalized textual summaries have been studied by Zhang et al. (2003), Berkovsky et al. (2008) and Park and An (2010). While these approaches need additional inputs describing a specific user’s interests, there is also recent work by P.V.S. and Meyer (2017) which tries to derive that information through interaction with a user. That is particularly interesting if data about a user is not available a priori, or in our scenario of exploratory search, where it is often difficult for users to express their information need precisely (see Section 2.1.1). As our experiments in Section 6.3.3 showed, identifying the most important concepts in a document collection is challenging, which can be partly attributed to the varying interests of different users and a corresponding lack of agreement in the reference annotations. The interactive adaptation of a summary concept map to a specific user’s interests is therefore another interesting direction for future work (see the fourth sketch after this list).
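To illustrate the first direction, here is a minimal sketch of fine-tuning a pretrained language model for a single CM-MDS subtask, concept mention extraction cast as BIO token tagging. It assumes the Hugging Face transformers library; data loading is omitted and the training loop is reduced to one step, so the gold labels are only a dummy tensor:

    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-uncased", num_labels=3)  # BIO tags: B, I, O
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    sentence = "Students can obtain loans without a credit history."
    batch = tokenizer(sentence, return_tensors="pt")
    # Real gold labels would be BIO tags aligned to the word pieces,
    # derived from annotated concept mentions; here just zeros ("O").
    labels = torch.zeros_like(batch["input_ids"])

    loss = model(**batch, labels=labels).loss  # one training step
    loss.backward()
    optimizer.step()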
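For the second direction, the pipeline decomposition can be pictured as a thin interface behind which each subtask model, neural or not, is independently replaceable. The following skeleton is illustrative; all names are hypothetical and only mirror the structure of the pipeline from Section 6.5:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class ConceptMapPipeline:
        extract_mentions: Callable     # documents -> concept/relation mentions
        group_mentions: Callable       # mentions -> coreferent clusters
        label_clusters: Callable       # clusters -> labeled concept graph
        estimate_importance: Callable  # labeled graph -> importance scores
        select_subgraph: Callable      # scored graph, size limit -> summary map

        def run(self, documents, size_limit):
            mentions = self.extract_mentions(documents)
            clusters = self.group_mentions(mentions)
            graph = self.label_clusters(clusters)
            scored = self.estimate_importance(graph)
            return self.select_subgraph(scored, size_limit)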
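For the third direction, one simple form of a world-knowledge feature would be the prominence of a concept in a large background corpus, independent of the input documents. The following sketch is an assumption about what such a feature could look like, not a method from this thesis:

    import math

    def background_log_prob(concept_label, background_counts, corpus_size):
        # Add-one smoothed log-probability of the concept label in a large
        # background corpus; frequent concepts score higher, reflecting the
        # text-independent importance humans tend to assign to them.
        count = background_counts.get(concept_label.lower(), 0)
        return math.log((count + 1) / (corpus_size + 1))

    # background_counts could come from web-scale n-gram statistics or from
    # entity frequencies in a knowledge base; the resulting feature would be
    # fed into the importance estimation model alongside existing features.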
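Finally, for the fourth direction, interactive personalization could be as simple as reweighting concept importance scores based on user feedback and re-selecting the summary under the same size limit. The update rule below is an illustrative assumption and not the approach of P.V.S. and Meyer (2017):

    def personalize(scores, select_subgraph, get_feedback, size_limit,
                    rounds=3, step=0.5):
        # scores: dict mapping each concept to its estimated importance.
        scores = dict(scores)
        for _ in range(rounds):
            summary = select_subgraph(scores, size_limit)
            # get_feedback yields (concept, liked) pairs for shown concepts.
            for concept, liked in get_feedback(summary):
                scores[concept] *= (1 + step) if liked else (1 - step)
        return select_subgraph(scores, size_limit)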

In addition to work aiming to improve computational methods for CM-MDS and the personalized variant outlined above, research that focuses on the use of automatically generated concept maps in practical applications will be crucial to ensure that this technology ultimately supports real-world users. Towards that end, user studies that investigate how concept maps can best be integrated into existing document exploration tools, how they have to be visualized to be beneficial for a user, and how their usage can be encouraged would be interesting projects. As we summarized in Section 2.2.2, several studies in this direction have been performed in the past. Once computational models for CM-MDS become more mature, it would be important to repeat such experiments with the then state-of-the-art models and the summary concept maps automatically created with them.


As we pointed out in Chapter 1, information is nowadays abundantly available, and information overload is a serious problem that many people face. In this thesis, we approached this problem by developing automatic structured summarization techniques that can support people during the exploration of document collections. The structured summarization task offers both interesting practical applications and substantial potential for future research, which is not yet matched by a corresponding amount of attention from the community.

We hope that the work in this thesis has laid the groundwork for more research on CM-MDS and similarly structured summarization problems, and that the suggestions in this section provide useful inspiration for future work in this area.


Index

abstractive concept map, 41
abstractive summarization, 30
concept, 16
concept labeling, 23, 26
concept map, 2, 16, 41
concept map construction, 23, 26
concept map mining, 22
concept map–based multi-document summarization, 42
concept mapping, 18
concept mention extraction, 23, 24
concept mention grouping, 23, 25
crowdsourcing, 65
encoder-decoder architecture, 145
exploratory search, 2, 9, 12
exploratory search system, 11, 12, 35
extractive concept map, 41
extractive summarization, 28
importance estimation, 23, 26, 28
information extraction, 31
information overload, 1
knapsack problem, 29
low-context importance annotation, 65
METEOR, 55
multi-document summarization, 27
neural network, 141
neural supervised summarization, 28
open information extraction, 32
permutation test, 56
proposition, 17, 45
propositional coherence, 18
query-focused summarization, 27
randomization test, 56
relation, 17
relation labeling, 23, 26
relation mention extraction, 23, 25
relation mention grouping, 23, 25
ROUGE, 56
sentence selection, 28
sequence transduction models, 145
sequence-to-sequence models, 145
significance test, 56
single-document summarization, 27
structured text representation, 11, 12
summary concept map, 42
supervised summarization, 28
text summarization, 27
topic shift, 148
unsupervised summarization, 28
update summarization, 27


List of Figures

1.1 An example for a concept map. . . 3

2.1 A concept map that describes the idea of concept maps. . . 17

2.2 Subtasks of concept map mining and their dependencies. . . 23

3.1 Subtasks of CM-MDS illustrated by examples. . . 44

4.1 Summary concept map from Biology on the topic “atom”. . . 62

4.2 Summary concept map from Wiki on the “British contribution to the Manhattan Project”. . . 62

4.3 Likert-scale crowdsourcing task with topic description and two example propositions. . . 66

4.4 The five-step process of our scalable manual corpus creation approach. . . 69

4.5 Excerpt from a summary concept map from Educ for the topic “students loans without credit history”. . . 73

5.1 Concept extraction recall for inclusive matches at increasing thresholds of 𝑘. . . 88

5.2 PropS representation for a German sentence from TIGER. . . 95

5.3 Extraction precision of PropsDE at increasing yield by genre. . . 101

6.1 Partitioning example with six mentions and coreference predictions. . . . 108

6.2 Excerpt from the summary concept map created with the improved pipeline for the topic “students loans without credit history”. . . 137

7.1 Conceptual illustration of our memory-based graph representation. . . 152

7.2 Sequence-to-graph network unrolled for a small example. . . 154

7.3 Memory addressing vectors computed by our sequence-to-graph model. . 161

7.4 Summary concept maps predicted for the test topic “students loans without credit history”. . . 164


7.5 Summary concept maps predicted for the test topic “parents dealing with their kids being cyber-bullied”. . . 165

8.1 Prototype of the concept map–based document exploration system. . . 170

8.2 Example for a user interaction log of the exploration system. . . 172

9.1 Summary concept map automatically created for the content of this thesis. . . 178

List of Tables

2.1 Common text representations compared by user requirements. . . 15

4.1 Datasets with reference annotations for concept map mining. . . 60

4.2 Corpus statistics for automatically created benchmark corpora. . . 64

4.3 Correlation of manual responsiveness scores with peer summary scores. . 68

4.4 Source documents of Educ in comparison to classic MDS datasets. . . 72

4.5 Part-of-speech distribution in concept and relation labels of Educ. . . 74

4.6 Performance of the baseline on the Educ test set. . . 75

4.7 Corpus statistics for all benchmark corpora used in the thesis. . . 77

5.1 Concept extraction performance by dataset. . . 87

5.2 Relation extraction performance by dataset. . . 90

5.3 Concept selection performance by dataset. . . 91

5.4 Analysis of the portability of PropS rules from English to German. . . 98

5.5 Tuple extraction performance of PropsDE by text genre. . . 100

6.1 Complexity comparison of mention partitioning algorithms. . . 113

6.2 Classification performance and time for pairwise classifications. . . 114

6.3 Size of the partitioning problem on the smallest and biggest document sets of the training part of Educ. . . 115

6.4 Runtime and optimization results for mention partitioning on the smallest and biggest document sets of the training part of Educ. . . 116

6.5 Pearson correlation between features and true importance scores. . . 124

6.6 Concept selection performance with different models. . . 125

6.7 Evaluation of summary concept maps obtained with the proposed ILP. . . 129

6.8 Comparison of ILP sizes and runtimes for subgraph selection on Educ. . . 130

6.9 End-to-end results on Educ for our pipeline and several baselines. . . 134

6.10 End-to-end results on Wiki for our pipeline and several baselines. . . 135