In this chapter, we introduced and defined exploratory search in document collections, a task that journalists, researchers, intelligence analysts, lawyers and many other people face regularly. Analyzing a set of user studies, we found that users are particularly interested in key elements such as persons, organizations or facts and the relationships between them when trying to answer complex questions from textual data. A variety of computational models have been developed to automatically extract such structured representations from natural language text. When they are combined with visualizations as well as navigation and search functionalities to build exploratory search systems, they can support users during exploratory information seeking activities. In particular, the automatic extraction of relevant elements, concise overviews and easy navigation allow users to process collections far larger than what they could handle without such aids.

Concept maps, labeled graphs depicting concepts and their relations, are a form of structured text representation that is particularly useful. They can provide a concise overview of a collection that reveals key concepts and relationships while also allowing easy access to details. We argued that these properties make them specifically useful to support exploratory search and differentiate them from representations such as summaries, keyphrases, tables of contents or mind maps. They have been successfully used in many application scenarios in education as well as knowledge and information structuring.
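
As a rough illustration of the definition above, a concept map can be held in memory as a set of concept labels plus labeled, directed relations between them. The sketch below is only one possible encoding; the class name `ConceptMap`, its methods and the example propositions are hypothetical and are not taken from any particular concept mapping tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConceptMap:
    """A labeled graph: nodes are concept labels, edges carry relation labels."""

    concepts: set[str] = field(default_factory=set)
    # Each relation is a (source concept, relation label, target concept) triple.
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_proposition(self, source: str, label: str, target: str) -> None:
        """Add a relation and make sure both endpoints exist as concepts."""
        self.concepts.update((source, target))
        self.relations.add((source, label, target))

    def neighbors(self, concept: str) -> list[tuple[str, str]]:
        """Return sorted (relation label, target) pairs leaving the given concept."""
        return sorted(
            (lbl, tgt) for src, lbl, tgt in self.relations if src == concept
        )


# Hypothetical example propositions, chosen only for illustration.
cmap = ConceptMap()
cmap.add_proposition("concept maps", "provide", "concise overview")
cmap.add_proposition("concept maps", "support", "exploratory search")
cmap.add_proposition("exploratory search", "is performed by", "journalists")
```

Calling `cmap.neighbors("concept maps")` then lists the relations that start at that concept, which mirrors how a reader would move from the overview to details.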

Furthermore, we reviewed existing concept map mining techniques that aim to automate the generation of concept maps from text. The task is usually approached step by step: concept and relation mentions are first extracted from the text, coreferent mentions are then grouped together, and finally a subset of them is selected to construct a concept map. Pattern-based concept and relation extraction from syntactic structures, substring- and WordNet-based concept grouping and frequency-based selection of important concepts have been explored. However, as existing work is spread across communities and uses varying evaluation protocols, it remains unclear which of these techniques perform best. Additionally, we reviewed existing techniques for automatic text summarization and information extraction, two well-studied areas in NLP. Despite their overlap with subtasks of concept map mining, only the most basic methods developed in these areas have been applied to concept maps so far, and the concept map mining task itself has received little attention in the NLP community.
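
The step-by-step approach summarized above can be sketched in a few lines. This is a minimal illustration under strong assumptions, not the method of any reviewed system: mention extraction is assumed to have happened upstream, grouping is reduced to a crude string normalization (a stand-in for substring or WordNet-based grouping), and importance is plain mention frequency. The function names `normalize` and `build_concept_map` and the input triples are hypothetical.

```python
from collections import Counter


def normalize(mention: str) -> str:
    """Crude stand-in for mention grouping: lowercase and drop a leading article."""
    tokens = mention.lower().split()
    if tokens and tokens[0] in {"a", "an", "the"}:
        tokens = tokens[1:]
    return " ".join(tokens)


def build_concept_map(triples, max_concepts):
    """Group mentions, pick the most frequent concepts, keep relations among them."""
    grouped = [(normalize(s), r, normalize(o)) for s, r, o in triples]
    # Frequency-based importance: count how often each concept group is mentioned.
    counts = Counter()
    for s, _, o in grouped:
        counts[s] += 1
        counts[o] += 1
    # Sort by descending frequency, breaking ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    selected = set(ranked[:max_concepts])
    # Keep only relations whose two endpoints were both selected.
    relations = sorted(
        {(s, r, o) for s, r, o in grouped if s in selected and o in selected}
    )
    return selected, relations


# Hypothetical output of an upstream mention-extraction step.
mentions = [
    ("Concept maps", "support", "exploratory search"),
    ("The concept maps", "reveal", "key concepts"),
    ("concept maps", "are mined from", "text"),
    ("Exploratory search", "involves", "the text"),
    ("Journalists", "perform", "exploratory search"),
]
concepts, relations = build_concept_map(mentions, max_concepts=3)
```

With this toy input, the three most frequently mentioned concept groups are kept, and only relations between them survive. A real system would replace each step with a stronger component.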

Chapter 2. Background

Chapter 3

Structured Summarization with Concept Maps

In this chapter, we introduce the central problem studied in this thesis, the automatic creation of multi-document summaries in the form of concept maps. First, we motivate the task based on the review of existing work and user requirements laid out in the previous chapter. A formal definition of the task, a discussion of its challenges for computational models and a comparison to existing tasks follow. We close the chapter by suggesting several ways to evaluate and compare automatic methods for the task.

3.1 Motivation

As we discussed in detail in the previous chapter, supporting users during exploratory search in documents is an important problem. Providing an overview, offering navigation capabilities and revealing important elements and relationships are key functionalities to make exploratory search more efficient. Summaries in the form of concept maps can offer all of these features and are therefore a promising text representation in this scenario.

While there is existing work on concept map mining, reviewed in Section 2.3.1, with papers going back to 2001, we argue that research in this area is still at its beginning:

• The amount of existing work on concept map mining is very limited. In our literature review, we cite 23 distinct papers. In the NLP community in particular, almost no work on the task exists. A search in the ACL Anthology (https://aclanthology.info/) yields 8 results for “concept map” and 14 for “concept map mining”, of which 3 are papers written by the author of this thesis and only one other is relevant. In contrast, querying for “summarization” returns 668 papers, “named entity recognition” 735 and “event extraction” 172.

• In addition to the small amount of work in general, there is also no agreed-upon evaluation protocol for proposed methods and no common evaluation data. As a result, a range of concept map mining techniques have been proposed, but no information exists about how they compare to each other in terms of performance. This makes it impossible for researchers to make measurable progress and difficult for practitioners to decide which algorithms to implement in downstream applications.

• The existing work has been published in a range of different communities and venues, including information science (Qasim et al., 2013), expert systems (Zubrinic et al., 2012), learning technologies (Villalon and Calvo, 2009; Zouaq and Nkambou, 2008), knowledge management (Rajaraman and Tan, 2002; Zouaq and Nkambou, 2009) and concept mapping (Aguiar et al., 2016; Kowata et al., 2010; Valerio and Leake, 2006). Only one related paper appeared in an NLP venue (Olney et al., 2011). The fact that there is no single community that “owns” the task seems to contribute to the lack of comparability and common evaluation protocols.

• Although the core challenges of concept map mining are clearly natural language processing problems, little work from the NLP community has been applied to the task. As we showed in Section 2.3.2, for automatic summarization, powerful supervised feature-based and neural network-based models have been developed to determine which parts of a document should be included in a summary. For the related selection problem in concept map mining, only simple unsupervised frequency measures have been explored. Similarly, as Section 2.3.3 has shown, much effort has been invested in designing (open) information extraction systems that can process large and heterogeneous collections of text. Existing work on concept map mining has largely ignored these efforts and hand-designed its own extraction methods from scratch, often tailored to very specific domains and text types.

Overall, we think that research on extracting concept maps from text is still in its beginning and that with increased attention, in particular within the NLP community, computational methods for the task can be greatly improved. Given this observation, together with the fact that concept maps are a promising representation for exploratory search from a user-requirements point of view, research on such methods is the goal of this thesis.

In particular, we reformulate the task as a variant of multi-document summarization (MDS) where the summary has the form of a concept map, thus combining traditional MDS with information extraction and coreference resolution challenges. In the context of exploratory search in large document collections, the summarization aspect is important, as the amount of information to process can be huge. Starting with the research described in this thesis, we hope that this new task gains increased attention in the summarization and NLP communities. It is an interesting, application-oriented task at the intersection of several existing NLP tasks that allows researchers to test existing models and to start developing new approaches that target the