2.4 Summary and Relevance

This thesis contributes to two parts of the DR pipeline depicted in Figure 1.1: the feature space and the visualization, using novel interactive visualizations. Its main contributions belong to the family of pattern identification and interpretation in multivariate MDS projections, and to the navigation of large information spaces.

Interaction is the common theme of this thesis. It enables users to identify and navigate multivariate patterns that may not be visible at first sight. Interaction also enables users to interpret patterns by steering the underlying data model, so that the results adapt dynamically to the user's intent. Interactive model steering serves as a means either to verify hypotheses or to form new ones about the data.

I briefly outlined well-known interactive approaches for visualizing multivariate data based on 2D projections. However, how to make sense of such multivariate projections is still an open research question, which I follow up on in this thesis. In particular, I investigate the interpretation of patterns in single and multiple views of the data in Chapter 3, and the use of projections to make sense of temporal multivariate data in Chapter 4.

Furthermore, I discussed the two most frequently used state-of-the-art techniques for preserving an overview in large information spaces: Overview-and-Detail and Focus-plus-Context. In particular, the approach by Büring et al. [30] is highly relevant for developing topology-preserving off-screen visualizations. In Chapter 5, I introduce a dedicated border region aimed at preserving the data topology. If only the data points are visualized, as is the case in standard scatterplots, the result is similar to the image-based approach by Büring et al. What characterizes off-screen visualization, however, is that the rendering of off-screen objects is changed. Visually, I build upon the approach of Büring et al. and extend it with respect to the visualization of off-screen information; that is, I adapt the rendering of off-screen content to provide a data-driven overview. Because focus-plus-context techniques are image-based, they are not suitable for data-driven tasks, in which the visualization needs to adapt to data characteristics such as density or multivariate structure. Therefore, I investigate the data-driven design space of off-screen visualizations, which are characterized by their ability to adapt the rendering to the data and the task at hand.
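The following Python sketch illustrates one simple way to keep off-screen points visible inside a border region: each off-screen point is moved along the ray from the viewport center until it reaches a band of width `border` inside the viewport, which preserves the angular order of off-screen points around the view. The radial placement, the function name `border_proxy`, and the `border` parameter are assumptions for illustration only; this is not the border-region technique introduced in Chapter 5, and it covers only placement, not any data-driven adaptation of the rendering.

```python
import math

def border_proxy(point, viewport, border=10.0):
    """Place an off-screen point on a border band inside the viewport.

    Hypothetical simplification: the proxy lies on the ray from the viewport
    center towards the point, so the angular order of off-screen points
    around the viewport is preserved.
    """
    x0, y0, x1, y1 = viewport
    px, py = point
    if x0 <= px <= x1 and y0 <= py <= y1:
        return (px, py)  # on-screen points are rendered unchanged
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    dx, dy = px - cx, py - cy
    # Half extents of the rectangle running through the middle of the border band
    half_w = (x1 - x0) / 2 - border / 2
    half_h = (y1 - y0) / 2 - border / 2
    # Scale t at which the ray from the center leaves that rectangle
    tx = half_w / abs(dx) if dx else math.inf
    ty = half_h / abs(dy) if dy else math.inf
    t = min(tx, ty)
    return (cx + t * dx, cy + t * dy)
```

For a 100 x 100 viewport and a 10-unit border, a point far to the right at (150, 50) is drawn at (95, 50). A data-driven variant would still have to decide how these proxies are rendered, for example depending on the local density of off-screen points.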

The methods and findings developed in this thesis are based on classical MDS or on general spatial data, but they are generally valid and applicable to other projection techniques as well.
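Since the methods build on classical MDS, a minimal sketch of classical (Torgerson) MDS may help: the squared distance matrix is double-centered, and the top-k eigenvectors, scaled by the square roots of their eigenvalues, give the low-dimensional coordinates. This pure-Python version uses power iteration with deflation instead of a full eigensolver and assumes Euclidean input distances; the function name `classical_mds` and its parameters are hypothetical and not taken from the thesis.

```python
import math
import random

def _matvec(m, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def classical_mds(dist, k=2, iters=500, seed=0):
    """Classical (Torgerson) MDS of an n x n Euclidean distance matrix into k dimensions."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J with J = I - (1/n) 11^T
    b = [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Power iteration for the dominant eigenpair of the (deflated) matrix B
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = _matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining spectrum is numerically zero
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, _matvec(b, v)))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))  # clamp: B is PSD only for Euclidean input
        for i in range(n):
            coords[i][c] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenpair
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords
```

For Euclidean distances between 2D points, the 2D embedding reproduces the input distances up to rotation and reflection. In practice, a linear-algebra routine such as NumPy's `numpy.linalg.eigh` would replace the hand-written power iteration.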

Part I

Identification & Interpretation of Multivariate Patterns in Projections

3 Visual Pattern Analysis and Interpretation in Multivariate Subspaces

Contents

3.1 Introduction
3.2 Related Work
3.2.1 Visual Analysis of Mixed Datasets
3.2.2 Interpretation of Multivariate Projections
3.2.3 Subspace Search and Visualization
3.3 Interpretation of DR Data: Phenomenological Study
3.3.1 Visualization Prototype: Integration of Mixed Data Types
3.3.2 Interpretation Study
3.3.3 Findings
3.4 Pattern Trails: Pattern Transitions in Subspaces
3.4.1 Basic Idea
3.4.2 Subspace Pattern Transitions and Its Interpretation
3.4.3 Similarity-based Ordering of Subspace Views
3.4.4 Visual Identification of Patterns
3.4.5 Use Case: University Rankings
3.4.6 Use Case: Forest Fires
3.5 Discussion & Future Directions

Multivariate projections are hard to interpret, in particular for domain experts not familiar with machine learning or advanced statistics. This statement reflects the common understanding among many visualization researchers. At first glance, the claim holds, because the interpretation of a multivariate projection depends on whether a pattern can be identified and whether it reflects the mindset of the analyst. Patterns, however, typically occur in different subspaces of the data (i.e., attribute subsets), which makes it challenging to find them in the first place and to interpret them afterwards, since a pattern can mean something different in another subspace. In recent years, interactive visual methods have been extensively researched for their ability to improve transparency and ease interpretation. These methods have primarily been evaluated through case studies and interviews with experts trained in DR. This chapter therefore first explores whether domain experts do, in fact, have difficulties interpreting multivariate projections. I describe a phenomenological analysis investigating whether researchers at a Law Enforcement Agency (LEA) with no or only limited training in machine learning or advanced statistics can interpret depictions of dimensionally-reduced crime data, and what motivates them during interaction.

The study reveals, among other things, that the domain experts manually explored different subspaces to search for patterns. This behavior was expected, because some attributes are, simply put, not relevant for building meaningful patterns. Also, given the experts' domain knowledge, some attribute subsets may be more meaningful than others. Manually exploring subspaces for meaningful patterns without prior knowledge about the data is tedious. On these grounds, various subspace analysis methods have been proposed to ease this task. However, many of these methods produce a large number of patterns, which are often redundant and difficult to relate to each other. Creating effective layouts for comparing subspace patterns remains challenging. Therefore, I also introduce Pattern Trails in this chapter, a novel approach for visually ordering and comparing subspace patterns. Central to this approach is the notion of pattern transition as an interpretable structure imposed to order and compare patterns between subspaces. The basic idea is to visualize projections of subspace patterns side by side and to indicate changes between adjacent patterns by a linked representation (transitions). I demonstrate the usefulness of this approach by applying it to several use cases, indicating that data can be meaningfully ordered and interpreted in terms of pattern transitions.
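To make the notion of pattern transition more concrete, the sketch below assumes that the pattern in each subspace is given as one cluster label per data point (produced by any clustering method) and counts how many points move from each cluster in one subspace to each cluster in the adjacent subspace. Splits and merges then follow directly from these counts. This is a simplified, hypothetical reading of transitions for illustration, not the Pattern Trails implementation; all function names are made up.

```python
from collections import Counter, defaultdict

def pattern_transitions(labels_a, labels_b):
    """Count points moving from pattern (cluster) i in subspace A to pattern j in subspace B."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same data points")
    return Counter(zip(labels_a, labels_b))

def splits_and_merges(transitions):
    """Patterns of A that split into several patterns of B, and patterns of B merged from several of A."""
    targets, sources = defaultdict(set), defaultdict(set)
    for src, dst in transitions:
        targets[src].add(dst)
        sources[dst].add(src)
    splits = sorted(s for s, dsts in targets.items() if len(dsts) > 1)
    merges = sorted(d for d, srcs in sources.items() if len(srcs) > 1)
    return splits, merges

def trail(labelings):
    """Transitions between each pair of adjacent subspace views, in the given order."""
    return [pattern_transitions(a, b) for a, b in zip(labelings, labelings[1:])]
```

For example, if six points form clusters `[0, 0, 0, 1, 1, 1]` in one subspace and `[0, 0, 1, 1, 1, 1]` in the next, one point of cluster 0 moves to cluster 1, so cluster 0 splits and the target cluster 1 is a merge.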

This chapter is based on [109] and [105]:

Interpretation of Dimensionally-Reduced Crime Data: A Study with Untrained Domain Experts. D. Jäckle, F. Stoffel, S. Mittelstädt, D. A. Keim, H. Reiterer. Proc. Int. Conference on Information Visualization Theory and Applications, 2017.

Pattern Trails: Visual Analysis of Pattern Transitions in Subspaces. D. Jäckle, M. Hund, M. Behrisch, D. A. Keim, T. Schreck. IEEE Conference on Visual Analytics Science and Technology (VAST), 2017.

3.1 Introduction

Information is collected at large scale in all areas of daily life: reported crimes, statistical surveys of natural disasters or inhabitants, rankings of public institutions, or any tabular data consisting of multiple observations and attributes. The main task in understanding such multivariate data is to identify and interpret relevant patterns such as dense groups (clusters), outliers, or correlations. Recent advances in machine learning provide DR techniques that transform the data into a lower-dimensional space while preserving its main structure. The results can be depicted in a 2D scatterplot, in which the proximity between points indicates their similarity. This abstract representation of DR results enables exploration of the structure, but it raises challenges regarding the interpretability of the visualization and regarding how the individual attributes are reflected in the lower-dimensional representation. The common understanding among visualization researchers is that these challenges apply in particular to domain experts not trained in advanced statistics or machine learning.

To address this, several interactive systems have been presented in support of domain experts. They typically build on a 2D depiction of the results and enhance interpretation via additional interactive visualizations [214]. While the focus lies on improving the interpretability of DR results for domain-specific tasks, little evidence has been given that domain experts are indeed able to interpret the depiction of a data projection. State-of-the-art systems have been evaluated in one of two ways: by means of use cases and application examples, or by a user study. The user studies, however, were carried out either with domain experts specifically trained in DR [181] or with users unrelated to the field [189]. I argue that domain experts familiar with the data and tasks are motivated differently in pursuing their goals than participants unrelated to the presented data and tasks. This effect is further amplified because untrained experts first need to learn how to read the depiction of DR results before they can interpret them. To the best of my knowledge, DR results have not yet been studied for domain-specific tasks with domain experts.

In the following, I report on a qualitative user study driven by the question: Can untrained domain experts use their domain knowledge to interpret and steer the visual depiction of a data projection? I reached out to data analysts of a German LEA who are not trained in advanced statistics or machine learning. Through previous research projects, the University of Konstanz has gained deep insight into their everyday work, typical tasks, and the challenges posed by the huge amounts of data. The data analysts are eager to identify patterns across the various data sources they have access to, in order to leverage resources, identify suspects, exonerate wrongly accused individuals, and more. So far, manual data analysis dominates their everyday work, for example, creating tabular views of the data, which enables them to compare different cases or data sources in light of a specific information need. This also holds for applications such as comparative case analysis, where similarities and correlations among crimes are examined in a one-to-many comparison [1]. For example, a correlation between crime category and districts can be detected in a subset of the data. However, detecting a correlation among crime category, district, time, description, and day of week is demanding without automated data analysis and visual support, even in small subsets of the data.
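A pairwise correlation such as the one between crime category and district can be quantified with a standard association measure for categorical attributes. The sketch below uses Cramér's V, computed from the chi-squared statistic of the contingency table, on hypothetical records; neither the measure nor the data is taken from the LEA's actual workflow.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Cramér's V association between two categorical attributes (0 = none, 1 = perfect)."""
    n = len(xs)
    observed = Counter(zip(xs, ys))  # contingency table as (row, col) -> count
    row_totals, col_totals = Counter(xs), Counter(ys)
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical records: each burglary occurs in district A, each theft in district B
category = ["burglary", "burglary", "theft", "theft"]
district = ["A", "A", "B", "B"]
```

Here `cramers_v(category, district)` yields 1.0, a perfect association. A pairwise measure like this only covers two attributes at a time, which is why a pattern spanning category, district, time, description, and day of week cannot be found this way without further automated and visual support.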

The study shows that the domain experts of an LEA effectively adapt to abstract representations of the data if they are familiar with the tasks and the type of data. The study also reveals that the experts performed a manual subspace analysis; that is, they searched for relevant patterns in specific attribute subsets, namely the subspaces. Particularly in multivariate data with many attributes, patterns may only be found in smaller subspaces and would get lost when all attributes are considered at once [17]. However, one cannot assume that patterns will be similar across different subspaces. Rather, patterns can be expected to differ structurally between subspaces, which poses the challenge of identifying, interpreting, and comparing them visually. The manual subspace analysis performed by the domain experts is tedious. This calls for Visual Analytics methods that support the effective visual and automatic analysis of subspaces.

Automatic subspace analysis and clustering methods provide sets of potentially interesting patterns and subspaces. While such methods drastically reduce the number of possible attribute configurations by ignoring subspaces with high attribute and pattern overlap, they entirely exclude the analyst [131] from the initial search process. They typically provide no hints on an appropriate ordering of, or the relationships among, the reported subspaces. Hence, although the number of subspaces to consider is reduced, exploring them to find interesting patterns remains challenging.
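The size of the search space and the idea of ignoring overlapping subspaces can be illustrated with a short sketch: d attributes yield 2^d - 1 non-empty subspaces, and a greedy filter keeps only the highest-scoring subspaces whose attribute sets do not overlap too much with an already kept one (measured by Jaccard similarity). The interestingness scores, the threshold `max_overlap`, and all function names are hypothetical; the sketch considers attribute overlap only and does not reproduce any particular subspace analysis method from the literature.

```python
from itertools import combinations

def all_subspaces(attributes):
    """All non-empty attribute subsets: 2^d - 1 candidate subspaces for d attributes."""
    return [frozenset(combo)
            for size in range(1, len(attributes) + 1)
            for combo in combinations(attributes, size)]

def jaccard(s, t):
    """Attribute overlap between two subspaces."""
    return len(s & t) / len(s | t)

def prune_redundant(subspaces, scores, max_overlap=0.5):
    """Greedily keep high-scoring subspaces, skipping any that overlap too much with a kept one."""
    kept = []
    # Highest score first; the sorted attribute list breaks ties deterministically
    for s in sorted(subspaces, key=lambda s: (-scores[s], sorted(s))):
        if all(jaccard(s, k) <= max_overlap for k in kept):
            kept.append(s)
    return kept
```

With only four attributes there are already 15 candidate subspaces, and the count doubles with every additional attribute, which is why such pruning is applied before any visual exploration.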

In recent years, several approaches have been presented to explore multivariate data visually, and in particular patterns in subspaces. Parallel Coordinate Plots (PCPs) [99] are a key technique for multivariate data analysis. Besides well-researched challenges such as axis reordering, each axis corresponds to only one attribute, which makes it difficult to compare different attribute combinations with each other. A prominent means of making subspaces visually accessible is the use of scatterplots. Scatterplot Matrices (SPLOMs), for example, enable pairwise comparisons between attributes and are effective for small to moderately sized datasets [181].

Techniques based on DR typically present either a single view of the data (as in the study with the domain experts) or all subspaces as small multiples [203], which makes it challenging to identify interesting patterns and trace their meaning across subspaces.
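One way to ease tracing patterns across small multiples is to order the subspace views so that adjacent views show similar patterns. The sketch assumes that each view is summarized by cluster labels and uses the Rand index (the fraction of point pairs on which two clusterings agree about being in the same or in different clusters) as similarity, chaining views greedily from the first one. Both the similarity measure and the greedy strategy are assumptions for illustration, not the ordering used in Pattern Trails; the view names are hypothetical.

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of point pairs on which two clusterings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def greedy_order(views):
    """Order views (name -> cluster labels) so each view is followed by its most similar remaining view."""
    names = list(views)
    order, remaining = [names[0]], set(names[1:])
    while remaining:
        last = views[order[-1]]
        # sorted() makes tie-breaking deterministic; max() keeps the first best candidate
        nxt = max(sorted(remaining), key=lambda name: rand_index(last, views[name]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical subspace views over the same four data points
views = {"A": [0, 0, 1, 1], "B": [0, 1, 0, 1], "C": [0, 0, 1, 2]}
```

Here view C agrees with A on five of six point pairs, whereas B agrees with A on only two, so the greedy chain places C directly after A. A greedy chain is cheap but not globally optimal; a seriation or traveling-salesman formulation would be the natural alternative.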

Even if subspace analysis is applied before the visual exploration phase, the resulting subspaces can still be redundant and too numerous for relevant patterns to be identified. Automatic approaches require the user to adapt the analysis model to the task at hand in order to retrieve relevant views of the data.

Therefore, I introduce a technique called Pattern Trails, which aims to address the unresolved question: How can interesting patterns among multivariate subspaces be identified and related using interactive visual exploration?

In this chapter, I first report on the user study and its results as seen through the eyes of the domain experts. Then, I present Pattern Trails, a Visual Analytics approach to find and interpret patterns across subspaces of multivariate data. Within the context of this thesis, this chapter contributes to the identification and interpretation of patterns in multivariate projections.

In the document Projections for Visual Analysis of Multivariate Data: Methods for Identification, Interpretation, and Navigation of Patterns (pages 42-50)