Research Questions and Approach - Visual Analytic Methods for Exploring Large Amounts of Relational Data with Matrix-based Representations

From this motivation, we derive several research questions, which are described in detail below.

1. How can we support the exploration process for relational data with the help of matrix-based representations?

(a) How can we enhance the expressiveness and effectiveness of matrix visualizations?

(b) Which interaction concepts help the user in exploring relational data in matrix visualizations?

2. How can we describe and quantify the interestingness of matrices with respect to the patterns they contain?

(a) How can we measure the occurrence of specific visual features (i.e., patterns) contained in matrices?

(b) How can we derive interestingness scores depending on pattern descriptions for matrix-based representations?

3. How can we help the user in navigating and exploring large matrix spaces?

(a) How can we compare matrices, e.g., to allow for “more-like-this” queries?

(b) How can we support the user in defining queries for matrix patterns?

The first set of research questions focuses on the effectiveness and usefulness of matrix visualizations. The standard row/column matrix layout already allows encoding a distinct data value with every screen pixel, an outstanding characteristic shared with only a few other visualization techniques. More sophisticated interaction and exploration mechanisms allow a visual encoding of even more information.

We therefore experimentally explored different glyph designs for matrices that “appear” based on a semantic exploration zoom level. This semantic zoom metaphor allows the user to gain progressively more insight and information during the exploration process.

The second set of research questions addresses the problem of retrieving potentially interesting matrix views to support the exploration of networks. For this purpose, we developed Matrix Diagnostics (or MAGNOSTICS), a conceptual framework to empirically evaluate the usefulness of image feature descriptors for the retrieval of matrix patterns. In the spirit of related approaches for rating and ranking other visualization techniques, such as Scagnostics for scatter plots, the MAGNOSTICS feature descriptor ranks matrix views according to the appearance of specific visual patterns, such as blocks and lines, which indicate the existence of topological motifs in the data, such as clusters, bi-graphs, or central nodes.

\[
\begin{bmatrix}
c_{1,1} & c_{1,2} & \dots & \dots & \dots & c_{1,n} \\
c_{2,1} & c_{2,2} & \dots & \dots & \dots & c_{2,n} \\
\dots & \dots & \ddots & \dots & \dots & \dots \\
\dots & \dots & \dots & c_{i,j} & \dots & \dots \\
\dots & \dots & \dots & \dots & \ddots & \dots \\
c_{m,1} & c_{m,2} & \dots & \dots & \dots & c_{m,n}
\end{bmatrix}
\]

(a)

Figure 1.1: Visual matrix of numerical data (a), ordered randomly (b) and with three algorithms (c-e), revealing different patterns.

As an extension of the MAGNOSTICS work, and in contrast to its engineered (image) features, this thesis presents a learned feature approach based on a convolutional neural network (cf. Section 4.6) and compares both pattern retrieval approaches with respect to their efficiency and effectiveness (cf. Section 4.7).

While the first two sets of research questions relate to patterns and the visual appearance of a single matrix, the third set focuses on the analysis of large sets of matrices, and especially on pattern-driven navigation within these large view spaces. As one example, we developed FDIVE (Feedback-Driven Interactive View Exploration), a conceptual and theoretical framework for the relevance feedback-driven exploration of large view spaces, which helps users to intuitively define and refine their current notion of interest.

To demonstrate practical use, we present several application scenarios throughout this thesis in which our approaches help analysts gain better insight into their (matrix) data sets. As an example, we show in Section 4.8 how MAGNOSTICS helps to explore the temporal evolutionary changes in brain connectivity scans from the biological domain.

Another example is presented in Section 5.8 where we show how an interactive similarity steering helps to understand the specificities of denial-of-service attacks on computer networks.

Generally, our work can be subdivided into Single Matrix Analysis and Multi Matrix Analysis. However, a comparative analysis of multiple matrices, for example, would not be possible if we neglected single matrix aspects such as matrix ordering. Therefore, we present in Chapter 2 theoretical considerations on patterns in matrices and, more generally, on the visual appearance of matrices. Specifically, we report in Section 2.3 on the state of the art of matrix reordering algorithms, guided by the analytic question “Which matrix reordering algorithm tends to produce which specific matrix patterns?”.

1.1.1 | Single Matrix Analysis

The visual quality of a matrix view is closely related to how the matrix is ordered. If a matrix is ordered “appropriately”, interpretable visual structures stand out, as Figure 1.1 depicts. We conducted a survey [Beh+16] to describe the impact and characteristics of matrix reordering algorithms depending on the dataset’s characteristics. This helps to answer part of the question of which matrix reordering algorithm to choose for the analysis task at hand.

Figure 1.2: Interactive Matrix Reordering: In an interactive user-guided approach, the user can steer the reordering process by invoking a localized reordering algorithm. Ordering thumbnails on the left side allow the user to anticipate localized reordering results without applying the transformation to the data. Here, the user selection leads to an improvement of the linear arrangement quality measure (5.64%).

However, most matrix reordering algorithms solve an optimization problem based on predefined local or global target criteria. They are more or less black-box algorithms; the user has no control over the results beyond the choice and parameterization of quality criteria. Due to the large search space, the algorithms use heuristics and may return a local optimum in certain circumstances. Additionally, their runtime and/or memory complexity is such that multiple runs with different parameterizations can be very time-consuming.

Therefore, we investigated means to interactively steer and guide the matrix reordering process while it runs, and introduced in [Beh+14a] interactive visualizations (see Figure 1.2) that help to improve both quantitatively measurable matrix ordering criteria and qualitative user satisfaction.
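To make the idea of a quantitatively measurable ordering criterion concrete, the following minimal sketch computes a linear arrangement score and applies a localized, greedy reordering to a user-selected range of positions. It is an illustration under stated assumptions, not the method of [Beh+14a]: the function names, the adjacent-swap heuristic, and the two-cluster test matrix are hypothetical, and the matrix is assumed to be symmetric.

```python
def linear_arrangement(matrix, order):
    """Sum of cell weight times positional distance over all node pairs.

    Assumes a symmetric matrix, so only the upper triangle is visited.
    Lower scores mean that connected nodes sit closer together.
    """
    pos = {node: p for p, node in enumerate(order)}
    n = len(matrix)
    return sum(
        matrix[i][j] * abs(pos[i] - pos[j])
        for i in range(n)
        for j in range(i + 1, n)
    )


def local_reorder(matrix, order, start, stop):
    """Greedy adjacent swaps restricted to positions start..stop-1.

    A swap is kept only if it lowers the score, so the result may be a
    local optimum rather than the best possible ordering.
    """
    order = list(order)
    best = linear_arrangement(matrix, order)
    improved = True
    while improved:
        improved = False
        for p in range(start, stop - 1):
            order[p], order[p + 1] = order[p + 1], order[p]
            score = linear_arrangement(matrix, order)
            if score < best:
                best, improved = score, True
            else:
                # Undo the swap because it did not help.
                order[p], order[p + 1] = order[p + 1], order[p]
    return order, best


def two_cluster_matrix(n=6):
    """Hypothetical test data: nodes with equal parity are connected."""
    return [[int(i != j and i % 2 == j % 2) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    matrix = two_cluster_matrix()
    initial = list(range(6))
    before = linear_arrangement(matrix, initial)
    new_order, after = local_reorder(matrix, initial, 0, 6)
    print("order:", new_order)
    print(f"improvement: {100 * (before - after) / before:.2f}%")
```

Restricting `start` and `stop` to a sub-range mimics a user selection: only the selected rows and columns move, while the score is still evaluated on the whole matrix.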

Figure 1.3: Final selection of MAGNOSTICS feature descriptors for a quantification of the primary visual patterns in matrix plots. (Plot labels: Distance-to-Noise, higher is better; Distance-to-Base-Pattern, lower is better; legend from Low Distance to High Distance.)

In line with the question of visual quality, we also investigated the consensus of multiple matrix sorting algorithms, following the hypothesis that if multiple sorting algorithms “agree” on local substructures, these submatrices might contain interesting patterns. Hence, we presented in [Beh+13] a visual approach for the comparison of sequentially ordered (or ranked) data, such as a matrix’s permutation of rows and columns. The approach relies on a small-multiple view of glyphs, each of which visually contrasts a pair of rankings.

The glyph, in turn, is defined by a radial node-link representation which allows effective perception of agreements and differences in pairs of rankings. With this visualization, we can spot patterns of similarity and differences in sets of orderings.
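The agreement between a pair of rankings can also be expressed numerically. The sketch below uses the Kendall tau distance, a standard measure that counts item pairs appearing in opposite relative order. It is swapped in here purely for illustration and is not the glyph-based approach of [Beh+13]; the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Count item pairs whose relative order differs between two rankings.

    Both rankings must contain the same items. 0 means identical order.
    """
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    discordant = 0
    for x, y in combinations(rank_a, 2):
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant


def pairwise_disagreement(orderings):
    """Distance for every pair of named orderings, one entry per pair."""
    names = sorted(orderings)
    return {
        (p, q): kendall_tau_distance(orderings[p], orderings[q])
        for p, q in combinations(names, 2)
    }


if __name__ == "__main__":
    # Hypothetical row orderings produced by three reordering algorithms.
    orderings = {
        "algo_1": [0, 1, 2, 3],
        "algo_2": [0, 1, 3, 2],
        "algo_3": [3, 2, 1, 0],
    }
    for pair, distance in pairwise_disagreement(orderings).items():
        print(pair, distance)
```

Each entry of the resulting table corresponds to one pair of rankings, in the same way that each glyph in a small-multiple view contrasts one pair.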

1.1.2 | Multi Matrix Analysis

The exploration of and navigation in large matrix spaces is another central research focus, which we subsume under the term “Multi Matrix Analysis”. We investigated methods to interactively and (semi-)automatically support users during exploration, e.g., in dynamic application scenarios. Since matrices are mostly perceived as a static visualization technique, little research has been conducted in the field of dynamic and multivariate matrix spaces. We developed clustering and classification approaches on the one hand and information retrieval approaches on the other, both of which support the user in navigation and exploration tasks.

However, a pattern-driven exploration is not possible without measures that allow assessing the presence or salience of matrix patterns quantitatively. Quantifying patterns in visualizations typically requires heuristic feature-based approaches that respond to the (potentially) interesting structural characteristics of a visualization. These methods try to mimic human perception in that they distinguish one or more visual patterns from noise. While many feature descriptors (FDs) for image analysis exist, there is no evidence on how well they detect patterns in matrices. To make an informed choice for the primary visual patterns in matrices, we evaluated 30 FDs in [Beh+17], including three new descriptors that we specifically designed for detecting matrix patterns. Using a controlled benchmark data set of 5,570 artificially generated matrix images, we evaluated each FD on four criteria: pattern response, pattern variability, pattern sensitivity, and pattern discrimination.

As the final result of MAGNOSTICS, we derived a set of six FDs that help us quantify the presence of matrix patterns, as depicted in Figure 1.3.
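One way to read the distance labels in Figure 1.3 is that a useful descriptor places a patterned matrix close to an idealized base pattern and far from noise. The following toy sketch illustrates that reading. The tile-density descriptor is a hypothetical stand-in and not one of the FDs evaluated in [Beh+17]; the base pattern, the noise generator, and all names are assumptions made for this example.

```python
import math
import random


def tile_density_descriptor(matrix, tiles=2):
    """Toy descriptor: mean cell value in each of tiles x tiles regions."""
    n = len(matrix)
    step = n // tiles
    features = []
    for bi in range(tiles):
        for bj in range(tiles):
            cells = [
                matrix[i][j]
                for i in range(bi * step, (bi + 1) * step)
                for j in range(bj * step, (bj + 1) * step)
            ]
            features.append(sum(cells) / len(cells))
    return features


def block_diagonal(n=8):
    """Idealized base pattern: two filled blocks along the diagonal."""
    half = n // 2
    return [[int((i < half) == (j < half)) for j in range(n)] for i in range(n)]


def random_noise(n=8, density=0.5, seed=0):
    """Binary noise matrix; the seed only makes the example repeatable."""
    rng = random.Random(seed)
    return [[int(rng.random() < density) for _ in range(n)] for _ in range(n)]


def score(view, base, noise):
    """Descriptor-space distances of a view to the base pattern and to noise."""
    describe = tile_density_descriptor
    return {
        "distance_to_base_pattern": math.dist(describe(view), describe(base)),
        "distance_to_noise": math.dist(describe(view), describe(noise)),
    }


if __name__ == "__main__":
    view = block_diagonal()
    view[0][7] = 1  # a stray cell outside the blocks
    view[3][3] = 0  # a missing cell inside a block
    print(score(view, block_diagonal(), random_noise()))
```

Ranking a collection of views by `distance_to_base_pattern` would then put the views that most resemble the block pattern first.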

In [Beh+14a] we also investigated the question: Can we develop visual analytic methods that support the user in a comparative analysis of large sets of matrices? In contrast to the image space approach of MAGNOSTICS, our approach here considers the row and/or column vectors of a matrix as the basic elements of the analysis. We project these data vectors for pairs of matrices into a low-dimensional space, which is used as the reference to compare matrices and identify relationships among them. Bipartite graph matching is applied to the projected elements to compute a measure of distance. A key advantage of this measure is that it can be interpreted and manipulated as a visual distance function, and that it serves as a comprehensible basis for ranking, clustering, and comparison in sets of matrices. We present an interactive system (see Figure 1.4) in which users may explore the matrix distances and understand potential differences in a set of matrices. A semantic zoom mechanism enables users to navigate through sets of matrices and identify patterns at different levels of detail.
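The two steps named above, projection and bipartite matching, can be sketched in a few lines. This is a simplified illustration under several assumptions that the source does not state: the rows of both matrices are pooled and projected onto a single principal axis (found by power iteration), both matrices have the same number of rows, and the minimum-cost matching is found by brute force, which is only feasible for very small matrices. All function names are hypothetical.

```python
import math
from itertools import permutations


def principal_axis(vectors, iterations=200):
    """Mean and dominant principal axis of the vectors via power iteration."""
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    centered = [[v[d] - mean[d] for d in range(dim)] for v in vectors]
    axis = [1.0 / (d + 1) for d in range(dim)]  # arbitrary start vector
    for _ in range(iterations):
        # One multiplication with the (unnormalized) covariance matrix.
        proj = [sum(a * c for a, c in zip(axis, row)) for row in centered]
        new = [sum(p * row[d] for p, row in zip(proj, centered)) for d in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0:
            break
        axis = [x / norm for x in new]
    return mean, axis


def project(vectors, mean, axis):
    """Coordinates of the vectors along the axis (a 1-D projection)."""
    return [sum((v[d] - mean[d]) * axis[d] for d in range(len(axis))) for v in vectors]


def matching_distance(points_a, points_b):
    """Mean cost of the cheapest one-to-one matching (brute force)."""
    best = min(
        sum(abs(a - b) for a, b in zip(points_a, perm))
        for perm in permutations(points_b)
    )
    return best / len(points_a)


def matrix_distance(matrix_a, matrix_b):
    """Project the pooled rows of both matrices, then match the projections."""
    mean, axis = principal_axis(matrix_a + matrix_b)
    return matching_distance(
        project(matrix_a, mean, axis), project(matrix_b, mean, axis)
    )


if __name__ == "__main__":
    a = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    shuffled = [a[2], a[0], a[1]]
    b = [[5, 5, 5], [0, 1, 0], [0, 0, 1]]
    print("a vs. shuffled a:", matrix_distance(a, shuffled))
    print("a vs. b:", matrix_distance(a, b))
```

Because the distance depends only on the matched projections, it does not change when the rows of a matrix are permuted. For realistic sizes, the brute-force step would need to be replaced by a polynomial-time assignment algorithm.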

Figure 1.4: Projection-based Matrix Comparison: In a semantic zoom interface, users can explore distances between matrices (a) (here: 100 matrices, ordered by time stamp). Starting from an overview distance meta-matrix (b) showing the pairwise distances between matrices, users can identify patterns (e.g., strong groups or outliers). Having found such patterns, users can investigate the impact of matrix size variations on the distance calculation (c) and steer it using a simple set of interactions (d) and (e).

Another line of research tackles the question of how computers can effectively support users in exploration tasks. This question originates from the fact that users are often confronted with the problem of identifying interesting views in situations where a manual exploration of the entire view space is ineffective or even infeasible. While certain quality metrics have been proposed to identify potentially interesting views, these are often defined heuristically and do not take the application or user context into account. To tackle some of these challenges, we introduced in [Beh+14b] a framework for feedback-driven view exploration, inspired by relevance feedback approaches used in Information Retrieval. The basic idea is that users iteratively express their notion of interestingness when presented with candidate views. From that expression, a model representing the user’s preferences is trained and used to recommend further interesting view candidates. A decision support system monitors the exploration process and assesses the search process for convergence and stability. We presented our approach with an instantiation of our framework for the exploration of large scatter plot spaces based on visual features and demonstrated its effectiveness in a case study on two real-world datasets.
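The feedback loop described here can be illustrated with the classic Rocchio update from Information Retrieval. This is a swapped-in textbook technique chosen for brevity, not the model used in [Beh+14b]. The feature vectors, view names, weights, and the simple convergence check are assumptions made for this sketch.

```python
import math


def centroid(vectors, dim):
    """Component-wise mean; a zero vector if there are no vectors."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant views and away from irrelevant ones."""
    dim = len(query)
    rel = centroid(relevant, dim)
    irr = centroid(irrelevant, dim)
    return [alpha * q + beta * r - gamma * i for q, r, i in zip(query, rel, irr)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(query, views, seen, k=2):
    """Names of the k unseen views most similar to the current query."""
    candidates = [name for name in views if name not in seen]
    candidates.sort(key=lambda name: cosine(query, views[name]), reverse=True)
    return candidates[:k]


def has_converged(old_query, new_query, tolerance=1e-3):
    """Simple stability check: the query barely moved in this round."""
    return math.dist(old_query, new_query) < tolerance


if __name__ == "__main__":
    # Hypothetical two-dimensional feature vectors for four candidate views.
    views = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
    query = [0.5, 0.5]
    # The user marks view "a" as interesting and view "c" as uninteresting.
    updated = rocchio_update(query, [views["a"]], [views["c"]])
    print("next suggestion:", recommend(updated, views, seen={"a", "c"}, k=1))
    print("converged:", has_converged(query, updated))
```

Repeating the update with each round of feedback, and stopping once `has_converged` returns true, corresponds to monitoring the search process for convergence.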
