
Universität Konstanz

Chair of Computer Graphics and Media Design
Department of Computer Science

Master Thesis

Interactive Visualization of Huge Graphs

Using Kernel Functions for Aggregation

Interaktive Visualisierung großer Graphen mittels aggregierender Kernel Funktionen

for obtaining the academic degree Master of Science (M.Sc.)

Field of study: Information Engineering by

Michael Zinsmaier

First advisor: Prof. Dr. Oliver Deussen
Second advisor: Prof. Dr. Ulrik Brandes

Konstanz, April 4, 2012


Declaration of Authorship

The author of this work hereby declares that

• the present work is the result of his own work, without help from others and without using anything other than the named sources and aids.

• the texts, illustrations and/or ideas taken directly or indirectly from other sources (including electronic resources) have without exception been acknowledged and have been referenced in accordance with academic standards.

The author wants to gratefully acknowledge supervision and guidance he has received from Hendrik Strobelt.

Konstanz, April 4, 2012 Michael Zinsmaier


Abstract

I propose a new graph visualization technique that allows the interactive exploration of large datasets with adjustable level of detail. The approach is designed to exploit the architecture of common graphics hardware and achieves interactive speed through parallelization. To address information hiding through node and edge clutter, I replace the standard node link diagram with a novel combination of a continuous node-density visualization with discrete meta-edges. Additionally, I present a zoom & pan based user interface that allows the intuitive exploration of the data at distinct aggregation levels and supports filter and details-on-demand mechanisms to address the characteristics of different datasets and research questions.


Contents

1 Introduction
  1.1 Related Work on Node Aggregation
  1.2 Related Work on Edge Aggregation
  1.3 Rendering Edge Aggregates
  1.4 Preliminaries
2 Node Aggregation
  2.1 The Density Field
  2.2 Point Region Quadtree
  2.3 Seed Point Method
3 Edge Aggregation
  3.1 Edge Aggregation by Hill Climbing
  3.2 The Evaluation Field
  3.3 The Line Field
  3.4 Creating Edge Widths
  3.5 Adjusting Edge Crossings
  3.6 Discussion
4 Aggregate Visualization
  4.1 Normalization and Scaling
  4.2 Color Mapping
  4.3 Antialiasing
  4.4 Direct Response and Animation
5 Interaction
  5.1 Zoom and filter
  5.2 Details-on-demand
6 Architecture & Implementation
  6.1 Data Loading
  6.2 Program States
7 Results
  7.1 Datasets
  7.2 Comparison to Edge Bundling - US air-traffic
  7.3 Application on Node Data - OpenStreetMap
  7.4 Performance Factors
  7.5 Conclusion
8 Appendix
  8.1 Max Value Detection with Parallel Reduction
  8.2 Seed Point Conversion Shader


1 Introduction

Graph visualizations and scatterplots are common tools to present data in various research domains like finance, sociology, and the natural sciences.

Graph visualizations express relational data. For instance, traffic connections and social networks, but also XML files or dependencies in software systems, can be modeled with graphs. Formally, a graph can be defined as G(V, E), with a set of vertices V representing the data members and a set of edges E ⊆ V × V defining the relationships between them.

The most common visualization of this model is the “node link diagram” which represents vertices as small graphical elements (e.g. circles) and edges as lines connecting the vertices.

The required vertex positions may be part of the dataset, for example geo-coordinates for traffic maps. Alternatively, there are numerous layout algorithms that can be applied to graphs in order to create meaningful vertex positions.

Scatterplots visualize the distributional characteristics of two attributes from a dataset.

The values are mapped to a two-dimensional coordinate system, and a small graphical element (often a single pixel) represents one element of the dataset. The elements are placed in the visualization according to the two coordinate axes, and the positions of the data points express the visualized information. Scatterplots can be seen as graphs without edges (|E| = 0) and with a fixed position for each vertex v ∈ V.

The presented application domains share a rapid growth of available data volumes. For instance, the advent of social networks has made relational data sources with millions of individuals available to the social sciences. However, node link diagrams and scatterplots do not scale well with increasing data sizes. Both techniques are based on distinct graphical elements and suffer from overdraw and clutter of vertices and edges. Switching to a continuous visualization metaphor can circumvent these problems. Liere and Leeuw [38] propose a density based continuous visualization of graph vertices, and Lampe and Hauser [14] connect this technique to related scatterplot approaches and kernel density estimation (KDE). Moreover, they extend KDE to edges and describe an efficient GPU implementation of the approach to allow the interactive visualization of streaming data.

As a contribution to this field, I suggest a new node-density meta-edge visualization that is applicable for the interactive visualization of large graphs at different levels of detail. I hereby define a graph as large if it fits into video memory but cannot be rendered as a node link diagram without significant overplotting. The suggested visualization extends the node rendering technique of Lampe and Hauser [14] with spatial data structures and a new rendering technique to increase performance. Furthermore, the obtained density field is used to merge edges into space efficient aggregates. The resulting visualization can be explored with a zoom based user interface and intuitive KDE bandwidth manipulations. Additionally, the visualized elements can be modified with flexible coloring and normalization functions, and the system provides an intuitive mechanism for node labeling. The implemented image


space algorithms thereby allow the interactive exploration of point datasets with up to ∼10^7 nodes and graphs with up to ∼10^6 edges on today's consumer hardware.

The remainder of this thesis is organized as follows: Sections 1.1–1.3 give an overview of related clutter reduction techniques, followed by an introduction to KDE in section 1.4. The subsequent sections 2–4 include a detailed description and analysis of the proposed approach with respect to node aggregation, edge aggregation, and the visualization of the obtained results. The zoom based interaction paradigm is presented in section 5, and implementation details can be found in section 6. Finally, section 7 includes a discussion of two use cases and an analysis of the most important performance factors.

1.1 Related Work on Node Aggregation

Scatterplots represent each data point as a position in space and, due to small screen resolutions, points that are located close to each other often fall into one pixel. Visualizing the amount of overplotting is a common approach to avoid misinterpretations caused by hidden data values. The method is related to one-dimensional histograms, and appropriate bins have to be defined in order to accumulate the data. Often, the bin size equals one pixel, and color encodings or symbols can be used to represent the amount of overplotting (i.e. the local density) [11]. Furthermore, alpha blending is an implicit form of pixel based density binning that is often used in combination with other approaches.

However, data binning cannot truthfully reflect the local density of a given dataset because the borders of the bins introduce artefacts into the visualization; consider for example the case in which all accumulated points fall near the four borders of a pixel. Even a small shift in the point’s position alters the local density of the pixel and its neighbors. As a solution to this problem, basis functions can be used to spread the influence of each point in a local environment. Leeuw and Liere [38] for instance apply Gaussian kernels to the vertices of a graph in order to create a continuous density field.
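This border artefact can be made concrete with a few lines of NumPy (an illustrative sketch of mine, not code from the thesis; the bin layout and bandwidth are arbitrary choices): shifting a point by 0.02 across a bin border changes the histogram counts abruptly, while a Gaussian kernel estimate sampled at the same bin centers barely changes.

```python
import numpy as np

def bin_density(points, edges):
    """Histogram binning: each point contributes 1 to exactly one bin."""
    counts, _ = np.histogram(points, bins=edges)
    return counts

def kde_density(points, centers, h=0.5):
    """Gaussian kernel estimate sampled at the bin centers: each point's
    influence is spread smoothly over a local environment."""
    u = (centers[:, None] - points[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(points) * h * np.sqrt(2 * np.pi))

edges = np.arange(0.0, 5.0)                 # bin borders 0, 1, 2, 3, 4
centers = (edges[:-1] + edges[1:]) / 2

a = np.array([0.99, 2.5])                   # one point just left of the 0/1 border
b = np.array([1.01, 2.5])                   # ...the same point shifted across it

print(bin_density(a, edges), bin_density(b, edges))   # bin counts jump between bins
print(np.abs(kde_density(a, centers) - kde_density(b, centers)).max())  # tiny change
```

The binned counts change discontinuously under the shift; the kernel estimate varies only by the (small) gradient of the Gaussian.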

Alternatively, node displacement approaches [25] can be used to reduce overdraw. Node displacement is related to optical distortion techniques but does not depend on a focus point and can therefore resolve overdraw for the whole dataset. However, the technique alters the shape of visual clusters, especially in dense areas. One solution to this problem is interactive blending between the original scatterplot and the overdraw free visualization to simultaneously address overdraw and shape distortion [25].

However, a global solution is not always necessary; explorative and search driven applications, for instance, can use zoomable user interfaces and optical distortion techniques [10] to increase the amount of manageable data.

Finally, instead of visualizing all data points, clustering techniques and filters can be applied to reduce the amount of displayed information.


1.2 Related Work on Edge Aggregation

Figure 1: Different edge aggregation methods applied to K3,3: (a) the node link diagram, (b) a hierarchical method, (c) density based edge rendering, (d) force directed edge bundling, (e) confluent drawing, (f) confluent drawing with traffic circle heuristic and (g) the proposed approach that combines node density and density guided edge agglomeration.

Apart from node overdraw, the standard node link diagram representation of graphs has to deal with edge clutter, which becomes a major problem for much smaller datasets. Node layout algorithms can increase the readability of graphs and reduce edge clutter. For instance, node distribution can be improved to untangle the edges and avoid unnecessary edge crossings. But except for the special case of planar graphs, node layout cannot solve the problem alone. Moreover, node positions themselves may encode information, as in geographical visualizations, and a fixed layout is often desirable in these cases.

Alternatively, distortion methods like the fisheye lens [33] can be used to provide more screen space to interesting areas, but the local improvement is traded against increasing problems in the surrounding area. As a third option, the drawing paradigm for node link diagrams can be altered to be more space efficient. Becker [8] suggests drawing edges only half of the way to their target. These “half lines” reduce the amount of crossings and edge clutter but make tracking edges more difficult.

Apart from such techniques that try to improve the perception of single edges there are many approaches that focus on edge aggregates. Such aggregates can be loose edge bundles, edge representatives (like meta edges) or edge density patterns. In either case the complexity of the picture is reduced and higher level structures become visible, provided that the aggregates are chosen wisely. In the following, I will present several edge aggregation methods that can be roughly classified as hierarchic, confluent drawing, bundle oriented and density based (see fig. 1).


Hierarchic methods define a “pyramid” of simplified versions of a graph. The pyramid is often constructed by recursively joining nodes, such that each level approximates the previous. Appropriate candidates for merging can, for instance, be determined with a geometry based proximity graph [20] or geometric clustering [30]. Furthermore, it is possible to improve clustering with graph theoretic measures [20] or to integrate clustering and layout [30]. The construction of the pyramid normally requires offline computations, except for datasets that contain a natural hierarchy. Hierarchical approaches provide a high level of abstraction, and the user can often explore the data by expanding/collapsing meta nodes. Reducing the amount of visible elements thereby allows the interactive visualization of graphs with several million edges [7]. (see fig. 1 (b))

Confluent drawing represents graphs without edge crossings. Dickerson et al. [15] propose, for instance, a heuristic algorithm that is based on the detection of large cliques and bicliques that can be resolved with “traffic circles”. However, not all graphs can be confluently drawn, and “the complexity of deciding whether a general graph is confluent or not still remains an open problem” [15]. (see fig. 1 (e),(f))

Bundle oriented techniques join the common parts of edges into bundles, like an electrician combines wires on the way to their individual destinations. In most edge bundling visualizations, the initial nodes and edges remain as discrete elements in the visualization. Edge bundling algorithms generally scale to medium graphs (∼1–10k edges); however, they are not designed for realtime application but for the generation of good bundles. While all edge bundling approaches share a common analogy, they differ significantly in how they compute the bundles and also in the properties of the achieved visualizations. For graphs with only a few sources, flow map layouts [29] yield good results, and for hierarchic graphs the method proposed by Holten [21] can be suggested.

For general graphs there are several recent publications, starting with Geometry Based Edge Bundling (GBEB) [13], which uses a control mesh to guide the bundling process.

Similar results can also be achieved with Force Directed Edge Bundling (FDEB) [22] (see fig. 1 (d)) or image based using 2D skeletons and clustering [18]. Apart from the different implementations, the main advantages of the latter two are better curvature control and more flexibility with respect to potential extensions. Winding Roads [26] proposes another geometry-based rendering technique which allows “road like” edge bundling to prevent node edge overlaps. However, the “road” visualizations are jaggier and following the edges is more difficult. Therefore, the authors also describe how to convert the aggregates into the standard spline driven bundle visualization. Finally, MINGLE [19] uses a proximity graph to select bundling candidates and introduces a bundling measure that is based on minimizing the ink needed to draw the edges. The algorithm allows fast bundling of edges and the authors apply MINGLE to a variety of different graphs and graph sizes.

Density based approaches alter the discrete node link diagram to allow continuous edge representations. The overplotting problem is resolved by visualizing the amount of overdraw instead of the discrete elements. Lampe and Hauser [14] define “line kernels”, which basically


extend kernel density estimation to edges. The resulting smoothed representation of edges replaces the discrete objects and therefore facilitates the overall visualization. Moreover, the aggregates can be created with fast image processing operations, and the approach is applicable to streaming data [14]. Additionally, the common technique of alpha blended edge rendering can be seen as a density based approach. (see fig. 1 (c))

1.3 Rendering Edge Aggregates

Each of the above methods combines similar edges e into aggregates ea. These aggregates can accumulate edges explicitly, with a clearly defined set of members (e.g. hierarchical approaches), or implicitly, as a visual pattern in the result (e.g. density based approaches). Furthermore, there are mixed methods that define, for example, an explicit relation between edges and aggregates but visualize it implicitly with loose edge bundles. Table 1 categorizes the presented approaches according to their aggregate definition and representation. The aggregate definition hereby determines the applicability of per aggregate measures and the options for coloring, while the aggregate representation influences the choices for the rendering order and geometry creation.

Class               Paper     Definition   Representation
------------------  --------  -----------  --------------
Hierarchic          [20]      explicit     explicit
                    [30]      explicit     explicit
                    [7]       explicit     explicit
Confluent Drawing   [15]      explicit     explicit
Density Based       [14]      implicit     implicit
Edge Bundling       [29]      explicit     explicit
                    [36] (*)  explicit     explicit
                    [21]      implicit     implicit
                    [13]      explicit     implicit
                    [26]      implicit     implicit
                    [19]      explicit     implicit
                    [18]      explicit     implicit
                    [22]      implicit     implicit
me (this thesis)              explicit     explicit

(An implicit definition has no explicit member sets; an explicit definition assigns each aggregate its members ea = {ei, ej, ..}. An implicit representation is a visual pattern; an explicit representation is a representative.)

Table 1: Categorization of related edge aggregation techniques with respect to the aggregate definition and aggregate representation. (The marked (*) approach does not define its own aggregates, but clusters and samples the results of other edge bundling methods in order to transform their output into explicit bundle shapes.)


Rendering implicit representations: Edge bundling approaches bend individual edges into visual bundles. Often, the implicit representation makes it difficult to highlight important aggregates. A common way to deal with this problem is to interpolate between the original edges and their aggregated, tightly bundled version [18, 22, 21] in order to achieve some kind of bundle diameter. This interpolation is often user controlled and can, in the limit, show the full transition between unaltered edges and bundles. An ordering scheme applicable to these methods is suggested by Holten [21] and is based on the observation that short edges tend to get buried under long edges and become visually unimportant. Therefore, Holten suggests rendering short edges on top.

Other implicit representations [14, 26] accumulate the influence of edges in a local environment and form a density based representation of edges, with high densities referring to important areas and vice versa. Note that a density approach has the advantage that the result is independent of the rendering order.

Rendering explicit representations: Instead of treating individual edges, the aggregates themselves can be converted to geometry (e.g. [30, 15, 36]). Rendering such representatives allows the direct visualization of aggregate based measures like the number of underlying edges or the main direction by influencing the geometry and rendering order. Important aggregates can for example be translated into bigger objects (line thickness, shape) and can be rendered on top of the others.

With geometry at hand, appropriate coloring and shading techniques have to be chosen.

Coloring can be used to express the direction of the original edges [13] or encode the transition between source and target [21]. Moreover, explicit aggregate definitions allow the calculation of statistical measures and the assignment of individual colors per aggregate.

GBEB [13], for instance, displays variations in the directions of the underlying edges, and aggregate colors can be assigned to bundles [18] or representatives [36]. A common choice for coloring, however, is to support the perception of important structures, either using calculated values (per defined aggregate) [13] or by measuring the density/overdraw for each pixel [14, 26, 22].

Shading techniques can furthermore support the coloring, for example by visually adding height information to improve the contrast [26, 19] or using cushion shading to indicate aggregate/cluster memberships [36, 18].

1.4 Preliminaries

In the following, I give a brief introduction to Kernel Density Estimation (KDE) to provide the mathematical background for the continuous node representation of the proposed visualization.

KDE is a well proven way of estimating the probability density function (PDF) of a random


variable. It was independently created by Rosenblatt and Parzen about 50 years ago [31, 28]

and is also known as Parzen-Rosenblatt window.

Given a finite set of observed data points from a population, the PDF can be estimated with “parametric” and “non parametric” methods. Parametric methods assume a certain distribution family that is described by a set of parameters and fit these parameters to the observed data (e.g. the normal distribution with mean µ and standard deviation σ). KDE makes no such assumptions and therefore belongs to the “non parametric” estimators. As Silverman says [35], “...the data will be allowed to speak for themselves...”.

The kernel density estimator f_h (eqn. 1) with bandwidth parameter h > 0 and kernel function K can be interpreted as an approximation of the PDF of the samples x_1, . . . , x_n under the following conditions: \int f_h(x)\,dx = 1 and f_h(x) \geq 0.

f_h(x) = \frac{1}{nh} \sum_{j=1}^{n} K\left(\frac{x - x_j}{h}\right) \qquad (1)

There exist many kernels that fulfill these conditions, for example the triangle kernel, the cosine kernel and the Epanechnikov kernel. However, as fh(x) inherits the continuity and differentiability properties of K, the normal kernel (eqn. 2) is often chosen because of its mathematical properties. Moreover, it is well known that “...the choice of h is much more important for the behavior of fh(x) than the choice of K.” [37].

K(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2} \qquad (2)

Inappropriate bandwidth values result in over-smoothing or jagginess (see fig. 2). To avoid these artefacts, several data driven methods for bandwidth selection have been proposed. For example, Silverman's rule of thumb [35] leads to optimal results if the data is distributed normally, and Turlach [37] gives an overview of several global optimization methods for other univariate distributions. For multivariate distributions, the problem becomes computationally more complex, but it can, for example, be tackled with Markov chain Monte Carlo algorithms [24].
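The estimator (eqn. 1) with the normal kernel (eqn. 2) and Silverman's rule of thumb can be sketched in a few lines (an illustrative example; the function names and sample setup are my own, not from the thesis):

```python
import numpy as np

def silverman_bandwidth(samples):
    """Silverman's rule of thumb: h = 1.06 * sigma * n^(-1/5),
    optimal for normally distributed data."""
    return 1.06 * np.std(samples) * len(samples) ** (-1 / 5)

def kde(x, samples, h):
    """Gaussian KDE, eqn. 1: f_h(x) = 1/(n h) * sum_j K((x - x_j)/h)."""
    u = (np.asarray(x)[..., None] - samples) / h
    return np.exp(-0.5 * u**2).sum(axis=-1) / (len(samples) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
h = silverman_bandwidth(samples)

grid = np.linspace(-4, 4, 81)
density = kde(grid, samples, h)
print(h, density.max())   # peak lies roughly near the true normal peak
```

For 1000 standard normal samples the rule yields h ≈ 0.27, and the reconstructed density integrates to approximately one over the sampled interval.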

So far, I have introduced KDE in the one-dimensional case. The extension to multiple dimensions is, however, straightforward; equation 1 has to be altered to equation 3.


Figure 2: Reconstruction of a triangle distribution from 100 samples (black dashes). Too small (orange) or too big (green) bandwidths hinder the reconstruction of the original distribution (grey).

f_H(x) = \frac{1}{n} \sum_{i=1}^{n} K_H(x - x_i) \qquad (3)

with K_H defined as:

K_H(x) = |H|^{-\frac{1}{2}}\, K\left(H^{-\frac{1}{2}} x\right) \qquad (4)

where H is a symmetric and positive definite bandwidth matrix.

For the remainder of the thesis, H is bound to the viewport and becomes an interactive parameter that can be used to explore the data, similar to [14, 38]. Therefore, H is restricted to multiples of the identity matrix. Note, however, that the restrictions on H can be bypassed by appropriate scaling and rotation of the input data. Also, similar to [14, 38], the kernel function is set to the normal kernel. This can be justified by its useful mathematical properties and its prominence in the scientific community, which may be helpful to the reader when interpreting the results.
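With H restricted to multiples of the identity, H = h²·I, equations 3 and 4 reduce in two dimensions to K_H(x) = K(x/h)/h². A small sketch of this restricted estimator (parameter names and grid setup are my own choices):

```python
import numpy as np

def kde_2d(points, grid_x, grid_y, h):
    """Bivariate Gaussian KDE with bandwidth matrix H = h^2 * I, so that
    |H|^(-1/2) = 1/h^2 and H^(-1/2) x = x/h (eqns. 3 and 4)."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    dx = (gx[..., None] - points[:, 0]) / h
    dy = (gy[..., None] - points[:, 1]) / h
    k = np.exp(-0.5 * (dx**2 + dy**2)) / (2 * np.pi)   # bivariate normal kernel
    return k.sum(axis=-1) / (len(points) * h**2)        # f_H = 1/n * sum K_H(x - x_i)

rng = np.random.default_rng(1)
points = rng.normal(size=(500, 2))                      # synthetic 2D point cloud
gx = np.linspace(-3, 3, 61)
field = kde_2d(points, gx, gx, h=0.4)                   # sampled density field
```

Summing the field times the cell area approximates the probability mass inside the grid, which is close to one for this sample.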


2 Node Aggregation

Visualizations of large point datasets with distinct graphical elements suffer from overdraw and information hiding. In order to avoid these problems, continuous visualizations of scatterplots and laid-out graphs [11, 38, 14] have been suggested. Such representations of points as a density field can provide an overview of large amounts of data without neglecting the influence of a single node. Moreover, the KDE-based creation of the density field can be implemented on the GPU and allows the fast processing of many points [14]. In the following, I present such an implementation. Furthermore, I propose two methods that extend the GPU rendering approach presented by Lampe and Hauser [14] with spatial data structures and a new pre-rendering step to allow the processing of millions of points at interactive rates.

2.1 The Density Field

A density field can be defined by the generation of a continuous 2D function from the nodes of a graph G(V, E). This field can be visualized directly and standard user interactions, like zooming and panning, can be used to explore the data.

Let G be a graph with n vertices v_i, i ∈ {1, . . . , n}. Each vertex v_i has a position p_i = (x_i, y_i) ∈ R × R.

Definition:

D_{K_H}(x, y) = a\,n\,f_{K_H}(x, y) = a\,n\,\frac{1}{n} \sum_{i=1}^{n} K_H(x - x_i, y - y_i) = a \sum_{i=1}^{n} K_H(x - x_i, y - y_i) \qquad (5)

where f_{K_H} is a 2D kernel density estimator and the factor a is selected such that each vertex contributes a weight of one at its position. I choose K_H as the bivariate normal kernel with

K(x, y) = \frac{1}{2\pi}\, e^{-\frac{x^2 + y^2}{2}} \qquad (6)

The result is a continuous 2D field, constructed by summing up n basis functions.

In order to obtain a scalar value at each pixel, D has to be sampled in the visible interval. To create the required grid of values, the contribution of the kernels has to be aggregated. However, the unlimited extent of the Gaussian function results in an equation with n subterms that has to be solved for each cell. In order to reduce the calculation complexity, the fast decreasing bell shape of the kernel can be exploited by limiting it to a certain interval r, as shown in equation 7 and figure 3(a). Based on preliminary experiments, I


Figure 3: (a) r of 1, 2 and 3 on gauss1D(x, σ = 1); (b) setting texture corners to zero to enforce a constant r.

suggest a value of r ≥ 4 to control the visual effects of this approximation.

K_r(x, y) = \begin{cases} K(x, y) & \text{if } \sqrt{x^2 + y^2} \le r \\ 0 & \text{else} \end{cases} \qquad (7)

The calculation of the DensityField can be implemented in standard OpenGL with additive blending of textured rectangles [38, 14]. The grid is mapped to a frame buffer object (FBO), such that one cell equals one pixel, and the shader pipeline is used to solve the above equations. Instead of adding up the vertex influences for each cell, Lampe and Hauser [14] reorganize the equations and spread the contribution of each vertex to the neighboring pixels (algo. 1).

The geometry shader can be used to map this technique efficiently to rasterization by replacing the vertices with rectangles that cover the extent of the approximated kernels (eqn. 7). These rectangles are defined with a side length l(H, r) and textured with precomputed kernel solutions. Standard additive blending can therefore be used to spread the contribution of the nodes to the covered pixels. Note, however, that mapping a round kernel to a quadratic texture requires special treatment of the corners (fig. 3(b)), and blending between texture levels introduces additional discretization errors.

Algorithm 1: createDensityField
  input : Vertices
  output: the DensityField

  Cells = emptyGrid(width, height);
  for v in Vertices do
      for c in Cells do
          c += K_H(c - v);
  return Cells;
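On the CPU, the scattering variant of Algorithm 1 with the truncated kernel of equation 7 can be sketched as follows (a simplified model of the GPU splatting pipeline, not the thesis implementation; the interface and the pixel-space bandwidth h are my own choices, and the kernel is scaled with a = 2πh² so that each vertex contributes a weight of one at its own pixel):

```python
import numpy as np

def create_density_field(vertices, width, height, h=2.0, r=4):
    """Splat a truncated Gaussian around each vertex (scattering form of
    Algorithm 1). Only pixels within the cut-off radius r*h are touched,
    mimicking the textured rectangles of side length l(H, r)."""
    field = np.zeros((height, width))
    rad = int(np.ceil(r * h))                          # kernel support in pixels
    ys, xs = np.mgrid[-rad:rad + 1, -rad:rad + 1]
    kernel = np.exp(-(xs**2 + ys**2) / (2.0 * h**2))   # a * K_H, peak value 1
    kernel[xs**2 + ys**2 > (r * h)**2] = 0.0           # enforce K_r (eqn. 7)
    for x, y in vertices:
        px, py = int(round(x)), int(round(y))
        x0, x1 = max(px - rad, 0), min(px + rad + 1, width)
        y0, y1 = max(py - rad, 0), min(py + rad + 1, height)
        field[y0:y1, x0:x1] += kernel[y0 - py + rad:y1 - py + rad,
                                      x0 - px + rad:x1 - px + rad]   # additive blend
    return field

field = create_density_field([(10, 10), (12, 10)], width=32, height=32)
print(field[10, 10])   # 1 plus the neighbor's contribution at distance 2
```

Zeroing the kernel outside the radius corresponds to the corner treatment of the quadratic kernel texture in figure 3(b).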


Figure 4: (a) the direct conversion of vertices to geometry creates three rectangles; (b) a quadtree can merge the two close-by vertices and eliminate one rectangle; (c) alternatively, the vertices can be accumulated in a texture, and a second render pass can be used to expand them to two weighted rectangles (Seed Point Method).

Lampe and Hauser state that an “..up to approx 300 times large speed-up..” [14] compared to a Matlab implementation can be achieved on a 1024x1024 grid with these techniques.

As further improvement, I propose two approaches that reduce the rasterization overhead, caused by entirely overlapping rectangles (see fig. 4).

There exists a distance d such that two vertices v_i, v_j ∈ V with d_euclid(v_i, v_j) ≤ d cannot be distinguished. This distance can be defined through the limits of human perception, the physical capabilities of the display, or the resolution of the grid. In the context of this thesis, the latter two are identical, and the distance is chosen as d ≤ 1 pixel. Therefore, one weighted rectangle can be used to replace all rectangles that have their center at the same pixel. Note, however, that d is not constant, because the world coordinate size of a pixel changes as the user interacts with the data. An optimal solution would be able to efficiently determine the correct set of meta nodes for every value of d, but approaches that aggregate many nodes with d_euclid(v_i, v_j) ≤ d and guarantee to separate nodes with d_euclid(v_i, v_j) > d are sufficient.

2.2 Point Region Quadtree

The PR quadtree [32] fulfills the aforementioned requirements (see fig. 4 (b)). It is constructed by recursively decomposing the image into four disjoint regions of equal size s, until no region contains more than one point. For a certain distance d, all meta nodes from the deepest level l_i with s ≤ d and all leaf nodes from the levels l_j, j ∈ {1, . . . , i}, have


to be chosen. While the PR quadtree can only approximate the optimal d, it provides a significant speedup and is easy to implement and well understood. A disadvantage of the implementation is that the maximum depth does not depend on the number of nodes n but only on the minimal distance between them. This can be compensated by introducing a minimal region size that guarantees an appropriate worst case. Nevertheless, on near-uniformly distributed datasets, the minimum depth of ⌈log4(|V| − 1)⌉ is more accurate, and in practice the quadtree is often just a small factor larger than the original data (see tab. 2, section 7). Advantages of the PR quadtree are the order-independent (greedy) construction and the regular cells that allow small storage consumption per node.
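A minimal CPU-side sketch of such a PR quadtree (class and method names are mine; it omits the minimal region size, and nodes_at simply returns the cell center as the merged meta node):

```python
class PRQuadtree:
    """PR quadtree sketch: the square region (x, y)..(x+size, y+size) is split
    into four equal quadrants until no leaf holds more than one point."""
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.point = None        # stored point (occupied leaf)
        self.children = None     # four sub-regions (inner node)

    def insert(self, px, py):
        if self.children is not None:            # inner node: descend
            self._child(px, py).insert(px, py)
        elif self.point is None:                 # empty leaf: store the point
            self.point = (px, py)
        else:                                    # occupied leaf: split, reinsert
            old, self.point = self.point, None
            half = self.size / 2
            self.children = [PRQuadtree(self.x + dx * half, self.y + dy * half, half)
                             for dy in (0, 1) for dx in (0, 1)]
            self._child(*old).insert(*old)
            self._child(px, py).insert(px, py)

    def _child(self, px, py):
        half = self.size / 2
        return self.children[2 * (py >= self.y + half) + (px >= self.x + half)]

    def nodes_at(self, d):
        """Node set for merging distance d: one meta node (the cell center) per
        inner region of size <= d, plus all occupied leaves on coarser levels."""
        if self.children is not None and self.size > d:
            return [p for c in self.children for p in c.nodes_at(d)]
        if self.children is not None:            # region small enough: merge
            return [(self.x + self.size / 2, self.y + self.size / 2)]
        return [self.point] if self.point is not None else []
```

Querying nodes_at with a small d returns the individual points, while a larger d collapses close-by points into a single meta node, matching the behavior sketched in figure 4 (b).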

Figure 5: Comparison of render and load times of a hierarchical graph with 1000000 vertices.

To load a certain quadtree level, either all corresponding nodes (level based) or only those in the viewport (tree based) can be sent to the GPU. As a third option, all nodes of the quadtree can be stored in graphics memory and the GPU filters them during rendering.

Interacting with the data will not only change d (zooming) but also the visible parts of the dataset (panning). To address both issues, the rendering system could execute efficient area and depth queries on the quadtree to select the appropriate nodes (tree based approach). But graphics hardware is designed to efficiently clip geometry that is not part of the viewport; therefore, less sophisticated approaches should be considered too (see fig. 5). I propose a second technique, where for each tree level the leaves and inner nodes are stored in separate continuous memory areas. This simplifies the construction of the node set to d + 1 memory-copy operations on the CPU but increases the workload on the GPU (level based approach). As a third option, the level based approach can be implemented on the GPU by adding a level attribute to each node and loading the entire quadtree to the GPU (data on GPU). Experiments showed that the rendering time depends mainly on the number of actually rasterized rectangles and the size of these rectangles, while the clipped geometry can be neglected. Figure 5 shows a strong dependency between the number of actually rendered nodes (part of the viewport) and the rendering time. The rendering method on


the other hand has nearly no influence, although the amount of geometry information on the GPU differs significantly between the three approaches.

2.3 Seed Point Method

As stated above, the rendering time for the presented visualization pipeline depends mainly on the amount of rasterized rectangles and their size. Instead of approximating the node merging distance d with the quadtree, it would therefore be preferable to match it exactly.

However, to my knowledge there exists no data structure with speed and space requirements comparable to a quadtree that supports node merging queries for arbitrary pixel sizes.

Moreover, the quadtree experiments (see fig. 5) show that the merging and uploading of the nodes is an expensive process on its own and that the third rendering method (data on GPU), which avoids these costs, can only be implemented for a finite set of merging levels.

As a contribution, I therefore propose a rendering method that achieves pixel based node merging without the usage of a spatial data structure (see fig. 4 (c)). Instead of converting the nodes to textured rectangles, they are rendered as points with activated additive blending. This results in a grid of values where one cell matches one pixel and the value of the cell equals the number of nodes that have been merged together. In a second render pass, this grid is bound as a texture and, for all cells with a value greater than zero, a weighted rectangle is created in the geometry shader (part of the appendix). In other words, the merged points in the grid are used as seed points to create the textured rectangles. Note that the grid has to be chosen larger than the current viewport in order to include the influences of nearby nodes.
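The two render passes of the Seed Point Method can be illustrated with a small CPU sketch. The grid size, the pixel-coordinate node layout, and the Seed struct are illustrative assumptions, not the thesis implementation:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One cell per pixel; additive blending of point-rendered nodes,
// followed by a scan that turns every non-empty cell into a seed
// that would spawn one weighted kernel rectangle on the GPU.
struct Seed { int x, y, weight; };

std::vector<Seed> seedPoints(const std::vector<std::pair<int,int>>& nodePixels,
                             int width, int height) {
    std::vector<int> grid(width * height, 0);
    for (const auto& n : nodePixels)        // first pass: additive blending
        if (n.first >= 0 && n.first < width && n.second >= 0 && n.second < height)
            ++grid[n.second * width + n.first];
    std::vector<Seed> seeds;                // second pass: cells > 0 become seeds
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (int w = grid[y * width + x])
                seeds.push_back(Seed{x, y, w});
    return seeds;
}
```

On the GPU, the second loop corresponds to the geometry shader that emits one textured rectangle per seed cell; the sketch only shows the data flow.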

This method minimizes the number of rectangles needed to create the DensityField and therefore saves rasterization time in the second render pass. But it introduces additional costs for the texture examination in the geometry shader and for blending operations during the first render pass. Experiments show (see fig. 6) that the Seed Point Method achieves similar results to the quadtree on most datasets and outperforms it if the kernel size is large. For data sets with several million nodes the first render pass can still become expensive. Instead of blending all nodes by brute force, the Seed Point Method can be used to refine the quadtree approach. With this combination, the amount of blending operations during the first render pass is reduced by the quadtree while the Seed Point Method is used to minimize the number of created rectangles in the second render pass.

The approach optimizes the visualization pipeline with respect to the rasterization steps.

However, for extreme cases like the Europe dataset, other parts of the graphics pipeline can also become the bottleneck. The measurements on the GTX590 (see fig. 6, Europe data) show, for instance, an outlier for the combined approach that cannot be explained by raster operations but is most likely caused by the memory management of the graphics card.


While dependencies on specific hardware features, like the texture unit to shader ratio or the memory clock rate, make it difficult to determine the best solution with respect to render speed, the Seed Point Method implementations scale better with increasing kernel sizes. Moreover, the result of the first render pass can be reused to speed up the edge aggregation (see section 3.2). I therefore recommend the Seed Point approach or the Combined approach and, among these, I choose the unaltered Seed Point Method, which benefits from lower storage requirements and does not require any pre-processing of the data.

Figure 6: Comparison of node rendering times using the quadtree, the Seed Point Method, and a combination of both on different graphics cards (HD6950, GTX590, HD5770, Quadro Plex; time in seconds, 0.0 to 1.5). Datasets and node counts: net50 (1.6·10^4), 4.5M (9.9·10^4), germany (4.3·10^6), europe (3.7·10^7). The Seed Point Methods scale better with increasing kernel sizes because they minimize the number of textured rectangles that have to be combined to create the DensityField. (The Europe dataset can be loaded with neither the quadtree nor the combined approach on the HD5770 (1 GB RAM) because the quadtree requires too much video memory.)


3 Edge Aggregation

Visualizing large graphs requires the aggregation of edges to reduce clutter and reveal higher level structures. Hierarchical methods seem to be a good choice for the presented approach as they reduce the number of displayed elements and can scale to large datasets.

However, experiments with a geometry based hierarchical approach that made use of the node quadtree highlighted some problems. The simple quadtree hierarchy does not contain enough aggregation levels to allow a traceable transition between different levels. Moreover, the start and end points of the edge aggregates often do not fit to the visualized node patterns and it is difficult to recognize the graph structure in the visualization.

More sophisticated approaches from related publications provide adequate edge hierarchies.

However, they often rely on costly pre-processing and it is not clear how such edge aggregates can be coupled with the density based node representation. Therefore, I propose a new density based edge aggregation approach that allows the interactive visualization of ∼10⁶ edges on today’s consumer hardware. Unlike hierarchical methods, no pre-processing is required and the node and edge representations have a natural connection. The aggregation of edges is guided by the DensityField, such that visually important node regions attract edges. The results can be seen as a user controlled, density driven clustering of the nodes.

3.1 Edge Aggregation by Hill Climbing

Figure 7: DensityField guided edge aggregation and filtering. The black lines and squares represent the original graph that is aggregated by climbing the DensityField (i.e. following the green arrows). The blue squares are the highest cluster points that attract the edges and the orange lines represent the edge aggregates.

I suggest using the already calculated DensityField to guide the edge aggregation. The DensityField is interpreted as a height field and a hill climbing algorithm is used to cluster nodes according to the local maxima they belong to. Subsequently, the edges are transformed



Figure 8: Effects of increasing bandwidth on a dataset from OpenStreetMap consisting of all tagged buildings in Germany (a total of 4.3 million nodes). The blue areas visualize the DensityField and the black lines represent cluster borders that can be obtained from the hill climbing algorithm.

by moving their start and end nodes to the unique highest point of their clusters (see fig. 7). Similar to hierarchical approaches, edges between nodes from the same cluster vanish while edges that connect different clusters are aggregated at representative points. Note that the cluster centers correspond directly to prominent areas in the node visualization and that the user controls the edge aggregation by manipulating the bandwidth matrix. A large bandwidth smooths the DensityField and leads to fewer and bigger clusters, while a small bandwidth produces more local maxima, which correspond to more and smaller clusters (see fig. 8 and fig. 19). In the following, I split this algorithm into two steps: the creation of an image space solution for the hill climbing algorithm, called the EvaluationField, and the aggregation of edges with an overdraw based representation into the LineField.

3.2 The Evaluation Field

The EvaluationField associates pixel positions of nodes with their cluster centers and will be useful for edge aggregation and interactive node labeling. The field is defined in image space and allows the hill climbing algorithm to be decoupled from the actual geometry. Using solely the quadtree, all pixels have to be addressed and a viewport-wide solution is created. The Seed Point Methods, however, enable the set of pixels to be reduced to those that actually contain nodes.


Algorithm 2 transforms a DensityField into an EvaluationField. For each cell (or, with a Seed Point Method, for each seed cell), the algorithm iteratively inspects the surrounding 3x3 neighborhood until it reaches a local maximum. In other words, a greedy hill climbing algorithm is applied to the DensityField.

The for-loop of algorithm 2 can be executed independently for each cell and is therefore well suited for a GPU implementation in the fragment shader. Note that the “isSeedCell” statement is only available if one of the Seed Point Methods has been used to create the DensityField; with the pure quadtree approach, all pixels have to be processed.

Algorithm 2: evaluateDensityField
  input : DensityField
  output: the EvalField

  Cells = emptyGrid(width, height);
  for c in Cells do
      if isSeedCell(c) then
          current = c;
          repeat
              for n in Neighbours(current) do
                  if DensityField[n] > DensityField[position(current)] then
                      current = n;
          until stable(current);
          c = position(current);
  return Cells;
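A CPU sketch of the hill climbing in algorithm 2 follows. The row-major grid layout and the index-based EvalField are assumptions for illustration; the GPU version runs one fragment shader instance per cell:

```cpp
#include <cassert>
#include <vector>

// Greedy hill climbing on a density grid: each cell is mapped to the
// (row-major) index of the local maximum reached from it.
std::vector<int> evaluateDensityField(const std::vector<float>& density,
                                      int w, int h) {
    std::vector<int> eval(w * h);
    for (int start = 0; start < w * h; ++start) {
        int cur = start;
        for (;;) {                              // repeat ... until stable
            int best = cur;
            int cx = cur % w, cy = cur / w;
            for (int dy = -1; dy <= 1; ++dy)    // inspect 3x3 neighborhood
                for (int dx = -1; dx <= 1; ++dx) {
                    int nx = cx + dx, ny = cy + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    if (density[ny * w + nx] > density[best])
                        best = ny * w + nx;
                }
            if (best == cur) break;
            cur = best;
        }
        eval[start] = cur;                      // cluster center of this cell
    }
    return eval;
}
```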

This GPU implementation allows the computation of the EvaluationField in a few microseconds in the average case. Even in the worst case, when the shading units have to inspect large parts of the DensityField, these calculations will not become a limiting factor.

While the EvaluationField can be efficiently constructed in image space, it inherits the limited extent of the DensityField and is not defined outside the current viewport. Edges that leave the visible area fan out and panning operations can influence the aggregation of edges. To solve these problems, the EvaluationField has to be extended beyond the borders of the viewport. But even a small increase comes with high costs, as the expensive computation of the DensityField has to be done for a bigger area. However, for the purpose of edge aggregation, it is sufficient to approximate the DensityField by slightly lowering the sampling rate.

I suggest creating a second DensityField that is extended by a factor of two but sampled with half the resolution. The two DensityFields can then be used to create an extended EvaluationField that is sufficient to make the edges robust against viewport borders and panning.
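One way to approximate such a coarser field is plain box downsampling; the sketch below halves the resolution with 2x2 averaging. The thesis instead renders a second DensityField at a lower sampling rate, so this is only an illustration of the idea:

```cpp
#include <cassert>
#include <vector>

// Reduce a field to half resolution by averaging 2x2 blocks
// (w and h are assumed to be even for brevity).
std::vector<float> halfResolution(const std::vector<float>& field, int w, int h) {
    int ow = w / 2, oh = h / 2;
    std::vector<float> out(ow * oh);
    for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x)
            out[y * ow + x] = 0.25f * (field[2*y*w + 2*x]     + field[2*y*w + 2*x + 1] +
                                       field[(2*y+1)*w + 2*x] + field[(2*y+1)*w + 2*x + 1]);
    return out;
}
```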


3.3 The Line Field

The EvaluationField associates each vertex v ∈ V with the highest point of its cluster c ∈ C and thus connects an edge e ∈ E, E ⊆ V × V with its aggregate e_a ∈ E_a, E_a ⊆ C × C. To render these aggregates, the sum of the weights of their edges has to be calculated. Instead of explicitly computing a set of edge aggregates, I propose to render the unaggregated edges at the positions of their aggregates. The EvaluationField can be bound as a texture and the geometry shader moves the start and end points of the edges to the cluster centers (see algo. 3). Subsequently, the edges are rendered to a texture with activated additive blending to sum up their weights.

Algorithm 3: aggregateEdges
  input : Edges, EvalField
  output: representation of aggregated Edges (the LineField)

  Cells = emptyGrid(width, height);
  for e in Edges do
      e.vertex1 = EvalField[e.vertex1];
      e.vertex2 = EvalField[e.vertex2];
  for e in Edges do
      for c in Cells do
          if e passes c then
              c += weight(e);
  return Cells;
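On the CPU, algorithm 3 can be sketched as follows. The row-major EvalField, the Edge struct, and the Bresenham walk that stands in for the GPU rasterizer are all assumptions for illustration:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

struct Edge { int x1, y1, x2, y2; float weight; };

// Snap edge endpoints to their cluster centers via the EvalField,
// then accumulate the edge weight into every cell the edge passes.
std::vector<float> buildLineField(const std::vector<Edge>& edges,
                                  const std::vector<int>& evalField,
                                  int w, int h) {
    std::vector<float> line(w * h, 0.0f);
    for (const Edge& e : edges) {
        int c1 = evalField[e.y1 * w + e.x1];
        int c2 = evalField[e.y2 * w + e.x2];
        if (c1 == c2) continue;               // intra-cluster edges vanish
        int x = c1 % w, y = c1 / w, x2 = c2 % w, y2 = c2 / w;
        int dx = std::abs(x2 - x), dy = -std::abs(y2 - y);
        int sx = x < x2 ? 1 : -1, sy = y < y2 ? 1 : -1, err = dx + dy;
        for (;;) {                            // Bresenham walk = rasterization
            line[y * w + x] += e.weight;      // additive blending
            if (x == x2 && y == y2) break;
            int e2 = 2 * err;
            if (e2 >= dy) { err += dy; x += sx; }
            if (e2 <= dx) { err += dx; y += sy; }
        }
    }
    return line;
}
```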

This implicit overdraw based approach is well suited to the graphics pipeline and can be executed very quickly; however, it entails a major drawback. The result of the process is not a set of edges but a grid (henceforth named the LineField) where each cell sums up the weights of all edges that pass through it. A direct visualization of the LineField shows the aggregated edges, but the following two aspects have to be considered:

• all aggregated edges have the same width

• the weights at edge crossings are too high (sum of multiple edges)

Therefore, I propose a post-processing step that allows edges of different widths to be created based on the LineField and, as an additional contribution, I present a rendering method that can significantly reduce the hotspots caused by crossing edges.


3.4 Creating Edge Widths

A direct visualization of the LineField would map one cell in the grid to one pixel on the screen. However, the weight of an aggregate is defined as the amount of overdraw in the render result and therefore cannot influence the render process. In particular, an edge that belongs to an important aggregate cannot be rendered thicker, because it is not possible to determine the weight of the aggregate during the creation of the LineField. I therefore suggest a post-processing step that adds the edge widths in image space. For each cell c_i of the LineField, a small neighborhood is inspected to determine nearby aggregates that could be extended to color the pixel that matches c_i. This technique generates the illusion of explicitly defined aggregates and enables different edge widths to be adjusted based on the aggregate weight (see fig. 10(c)).

The width of an aggregate is defined by its weight, width(weight(aggregate)), and the aggregate weights are stored in the cells c_i of the LineField. A pixel on the screen should therefore be colored if the distance dist between the cell correlated to the pixel, c_p, and at least one cell that belongs to an aggregate, c_a ∈ {c_i} with weight(c_a) > 0, fulfills the following equation:

    dist(c_p, c_a) ≤ width(weight(c_a)) / 2        (8)

A pixel that inspects multiple cells in a local environment can thus identify all cells that could potentially color it and, for instance, choose the one with the highest weight to render important aggregates on top. The distance function could be defined symmetrically as the Manhattan or Euclidean distance, with the Euclidean distance producing better results for slanted edges. However, I suggest using an asymmetric pseudo Euclidean distance function (fig. 9).

Figure 9: Euclidean and Pseudo Euclidean distance functions. The red line is an edge aggregate in the LineField with width(weight(aggregate)) = 2. For the symmetric Euclidean distance both neighboring pixels are colored; the Pseudo Euclidean distance function colors only the left pixels (26 ≤ 22).


While asymmetry has the disadvantage of slightly shifting edge centers from their intended positions, it has the advantage of supporting even edge widths. In a 5x5 environment, the symmetric Euclidean distance function allows edge widths of one, three, and five. The corresponding asymmetric distance function additionally supports edge widths of two and four. For a reasonable environment size, this method can be implemented in the fragment shader without negative performance impacts and the shader code is included in the appendix of this thesis.
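The widening test of equation (8) can be sketched on the CPU with the symmetric Euclidean variant in a 5x5 window; the mapping width(weight) = min(weight, 5) is an assumption for illustration, not the thesis function:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Returns true if the pixel (px, py) lies close enough to a LineField
// cell that carries an aggregate, i.e. if eq. (8) holds for at least
// one cell in the 5x5 neighborhood.
bool pixelColored(const std::vector<float>& lineField, int w, int h,
                  int px, int py) {
    for (int dy = -2; dy <= 2; ++dy)
        for (int dx = -2; dx <= 2; ++dx) {
            int x = px + dx, y = py + dy;
            if (x < 0 || y < 0 || x >= w || y >= h) continue;
            float weight = lineField[y * w + x];
            if (weight <= 0.0f) continue;             // no aggregate here
            float width = std::min(weight, 5.0f);     // assumed width mapping
            float dist = std::sqrt(float(dx * dx + dy * dy));
            if (dist <= width / 2.0f) return true;    // eq. (8)
        }
    return false;
}
```

Choosing the qualifying cell with the highest weight instead of the first hit would additionally render important aggregates on top, as described above.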

3.5 Adjusting Edge Crossings

Figure 10: (a) edge crossings become hotspots in the visualization; (b) angle separated rendering removes the artefacts; (c) maximal color values and edge widths are assigned to edges, not to crossings.

Measuring density or overdraw to adjust edge colors and widths based on their relative importance leads to hotspots at edge crossings. A simple way to reduce these artefacts is to choose a discrete color scheme for edge rendering. For weak hotspots that differ only slightly from their edge, the same color will often be picked and the color artefact vanishes. However, this technique does not remove strong hotspots, with the result that the visualization continues to be disturbed. Moreover, edge crossings often increase the maximum density and distort the color and width adjustment metrics.

Therefore, I suggest separating edges by their angle and rendering them sequentially. Nearly parallel edges have a much lower likelihood of intersecting each other, and instead of calculating the total overdraw/density for a cell, the maximum over the separate renderings can be used. To map the idea to the rendering pipeline, all edges of a bin are rendered into one frame buffer and the resulting textures are combined with a max value operation. As a further improvement, the stencil buffer can be used to exclude areas without edges from the process. Adjusting the bin width trades quality against performance; however, 180 bins (undirected graph) with a width of one degree each yield good results (see fig. 10(c)) and can be calculated in acceptable time.
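The binning step can be sketched as follows; the fold into [0, π) for undirected edges and the clamp at exactly π are implementation details I assume here:

```cpp
#include <cassert>
#include <cmath>

// Map an edge direction (dx, dy) to one of 180 one-degree angle bins.
// Opposite directions fall into the same bin (undirected graph).
int angleBin(float dx, float dy, int bins = 180) {
    const float PI = 3.14159265358979f;
    float a = std::atan2(dy, dx);       // in (-pi, pi]
    if (a < 0.0f) a += PI;              // fold into [0, pi)
    int b = int(a / PI * bins);
    return b >= bins ? bins - 1 : b;    // guard against a == pi
}
```

Each bin is then rendered into its own frame buffer, and the buffers are combined per cell with a max value operation instead of a sum.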


3.6 Discussion

Several stages of the presented edge aggregation pipeline contain implementation decisions that will be discussed in the following:

The EvaluationField is created on the GPU with identical shader instances for each cell. Alternatively, the results of already processed cells could be used to speed up the calculations. From an algorithmic point of view, the latter method is superior; however, it is difficult to implement on the GPU because the fragment shader cannot access its own output, and multiple render passes would therefore be required to integrate already calculated results. Switching to the CPU would also be an option, but the implemented method achieves sufficient render speed and maintaining a clean GPU pipeline is preferable. I therefore suggest integrating the results from the Seed Point Methods, which improve the algorithm without the discussed disadvantages.

The LineField provides an implicit overdraw based edge representation; however, the EvaluationField could also be used as a lookup table to merge edges into explicitly defined aggregates. Such aggregates could avoid the edge width and edge crossing problems, but the required computation time would depend heavily on the number of aggregates and edges.

Especially combinations of a small bandwidth (many cluster centers) with a crowded viewport (many edges) would be expensive. Moreover, the assignment of edges to previously unknown aggregates requires the generation of new geometric instances (the aggregates) and parallel write access to them (weight counter). This combination is not well suited for GPU processing.

In contrast, the overdraw based approach fits the GPU pipeline and does not depend on the number of clusters or aggregates. Furthermore, the implementation maintains a linear dependency on the size of the dataset, and parallel processing on the GPU provides pseudo constant rendering times for large graphs (see section 7).

Uniform edge widths and edge crossings are addressed by the edge width post-processing shader and angle separated rendering. However, depending on the desired visualization, it may not be necessary to resolve these issues. For example, using line kernels to create the LineField would result in a density based representation similar to the result presented in [14], and although the edge crossing problem is relevant for all coloring methods that rely on density or overdraw measurements, it is often ignored. Nevertheless, solving the problems improves the quality of the results when rendering discrete edges with a wide range of different edge weights (see fig. 10) and, therefore, I suggest using the presented improvements.


4 Aggregate Visualization

As a final step in the visualization pipeline, the DensityField and the LineField have to be rendered to the screen. The DensityField contains the node density at each point of the screen, while the LineField stores the shapes of the aggregated edges. However, both of them are defined as discrete 2D scalar fields and each cell of the fields represents one pixel on the screen. Therefore, the same rendering techniques can be applied to both fields despite their different purposes.

4.1 Normalization and Scaling

It is convenient to convert the field values to the interval [0,1]. Normally, a fixed upper bound could be used as the divisor, but because of the limited screen color space, a dynamic normalization that depends on the content of the current viewport is the better choice.

Therefore, the maximum value of the field has to be determined. Examining the cells (several million, depending on the screen resolution) on the CPU would slow down the whole pipeline, but using “Parallel Reduction” [27] on the GPU makes the processing time negligible. The method reduces a given grid by examining many small rectangles in parallel and writing the results into a smaller grid (shader code is part of the appendix).

Experiments showed that the chosen reduction rectangle should be significantly larger than the minimum of 2x2 on modern GPUs. A single GPU reduction pass with a rectangle size around 20x20 followed by CPU examination of the remaining values yields good results.
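A CPU model of one reduction pass is given below; the tile size is a parameter, and the thesis suggests around 20x20 on the GPU:

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Reduce a field by tiling it into rect x rect rectangles and keeping
// the maximum of each tile; one pass plus a CPU scan of the remaining
// values yields the global maximum used for normalization.
std::vector<float> reduceMaxPass(const std::vector<float>& field,
                                 int w, int h, int rect) {
    int ow = (w + rect - 1) / rect, oh = (h + rect - 1) / rect;
    std::vector<float> out(ow * oh, std::numeric_limits<float>::lowest());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float& m = out[(y / rect) * ow + x / rect];
            m = std::max(m, field[y * w + x]);
        }
    return out;
}
```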

In order to prevent information hiding through large node or edge clusters, a balanced use of the color space has to be found. Scaling functions scale: [0,1] → [0,1], x ↦ x^s can be used to control the distribution of field values in the interval. However, the mapping should not be too complicated and should be chosen in a way that allows the user to identify and estimate different values. The remote GUI therefore supports linear and simple power functions like x^(1/2) or x^(1/3) (see fig. 11).

4.2 Color Mapping

The obtained field values can be mapped to colors using discrete intervals or with continuous color gradients, which can, for example, be defined in the HSV color space. The most flexible way to assign colors to the values, however, is to sample a color scheme texture.

Such textures can be loaded in the GUI (see fig. 11) and arbitrary color chains are supported in either a discrete or continuous manner.

However, color schemes can be misleading and can easily distract the viewer from the real data [9]. Instead of using many different colors or abrupt changes, I therefore suggest a smooth blend of only a small number of similar colors. Examples of good color schemes can be found at ColorBrewer.org [2] and in the giCentre Utilities [3]. Finally, node and edge color schemes should provide a good contrast to each other and edges should be rendered on top.

To further improve the overall rendering, alpha values can be used to differentiate between important and unimportant edge aggregates. I suggest using a discrete color scheme for the DensityField because the resulting levels are similar to the contour lines used in cartography and can support the height field metaphor. Edges, on the other hand, can be rendered with continuous or discrete color schemes, where discrete schemes aid hotspot hiding, while continuous schemes can express a wider range of values.

Figure 11: The remote GUI allows the user to adjust the scaling functions and coloring schemes for nodes, edges and labels.

4.3 Antialiasing

Antialiasing techniques can be applied during the render process to improve the picture quality and to avoid aliasing artefacts. The field representation of the data rules out object based techniques, but super sampling, which uses multiple sample values to determine the color of a pixel, can be applied. The picture could, for example, be rendered at a higher resolution, but this would affect the most costly parts of the visualization pipeline and decrease performance significantly. Alternatively, the fields can be sampled at multiple, slightly shifted points and blending operations can be used to enable inter cell access. All operations can be executed in the fragment shader and for a reasonable number of sample points the overall performance decreases only marginally.

Applying the method to the DensityField is straightforward: instead of one texture lookup to color a pixel, n texture lookups at slightly shifted positions are executed and the resulting value is averaged. This leads to softer contour lines and improves the visualization of the field.

Working with the LineField is not that easy because the supersampling technique has to be integrated with the edge widening algorithm (see section 3.4). This can be achieved by shifting the center of the above defined distance function to obtain the different samples. Note, however, that the costs of edge widening increase by a factor equal to the number of sample points. A small number of samples is thus preferable; for the presented results, four points have been used.

4.4 Direct Response and Animation

Although the image-based visualization pipeline allows the fast processing of large graphs, full interactivity with around 30 frames per second cannot be achieved. Instead, the pipeline differentiates between rendering and displaying of data. Rendering includes the calculation of the DensityField and the LineField and has to be done in an appropriate response time, which should be around one second. Displaying the data, on the other hand, includes the steps described in this section, and fast execution of these tasks is crucial for a good user experience. Moreover, the calculation of the fields should not influence the reactivity of the system. This behavior can be compared to a web browser that may need several seconds to load a new web page but always stays responsive and allows fast interaction with the loaded pages.

Apart from this, the limited rendering speed also influences the standard user interactions.

Pan and zoom operations can only be implemented stepwise and not as full, continuous transitions. However, stepwise zooming is a common interaction pattern and problems with stepwise panning can be resolved. Instead of recalculating the fields for every change, the old solutions are shifted and displayed at a new position. While some areas of the screen cannot be filled with data, the solution is sufficient to support the user during the panning operation. As soon as the position is fixed, the more costly field rendering can be executed to fill the gaps.

Further improvements can be achieved by animating changes of the color mapping and the geometry. Moving the viewport can have a large impact on the normalization factor and thus alter the color to value mapping. Instead of abrupt changes, a smooth transition between old and new max values can be applied to preserve the user’s mental mapping of colors and values. Moreover, bandwidth and semantic zoom operations that alter the shape of the density clusters can benefit from geometry animations. Edge movements between new and old cluster centers require the repeated computation of the LineField and thus cannot be animated in real-time. But changes of the DensityField can be animated by interpolating between the old and the new field and result in dynamic shrinking and expanding of the density clusters.
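The field animation reduces to a per-cell linear interpolation, sketched here; the parameter t is assumed to run from 0 to 1 over the animation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Blend between the old and the new DensityField; rendering the blended
// field makes the density clusters shrink or expand smoothly.
std::vector<float> interpolateFields(const std::vector<float>& oldField,
                                     const std::vector<float>& newField,
                                     float t) {
    std::vector<float> out(oldField.size());
    for (std::size_t i = 0; i < oldField.size(); ++i)
        out[i] = (1.0f - t) * oldField[i] + t * newField[i];
    return out;
}
```

The same interpolation can be applied to the normalization divisor to avoid abrupt changes of the color mapping.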


5 Interaction

The Visual Information-Seeking Mantra “Overview first, zoom and filter, then details-on- demand” [34] can be used as a basic guideline for the design and evaluation of interactive zoom based information visualization systems. The presented visualization pipeline starts with an overview of the whole graph which can be altered with standard zoom and pan operations by the user. Several filter mechanisms can then be used to analyze the current viewport and, finally, node labels can be requested on demand to annotate the visualization.

5.1 Zoom and filter

The user can make use of geometric and semantic zooming operations. Geometric zooming simply enlarges the objects similar to the effect of a loupe. Semantic zooming on the other hand, “...lets the user see different amounts of detail in a view by zooming in and out” [39].

Coupling the KDE bandwidth with the zoom level allows density clusters and edge aggregates to be refined as the user zooms in. This approach provides an intuitive user interface for bandwidth selection and is therefore commonly used in KDE based visualizations [14, 38].

The initial bandwidth is set to approximately 16 pixels, which is in the range of 2 to 20 pixels suggested by Lampe and Hauser [14]. Additionally, the zoom level to bandwidth ratio can be defined as an interactive parameter to allow a mixture of semantic and geometric zooming.

Besides the semantic zoom operations, additional filters are provided to support the field visualization. The scaling functions and color schemes can be adapted via an intuitive remote GUI (fig. 11), and the combination of both filters allows the user to focus on interesting value ranges. These two user controlled parameters and the dynamic normalization process determine the value-to-color matching in the final rendering. While dynamic normalization allows the system to adjust the color mapping to the density of the viewport, it is sometimes desirable to switch it off, for example, to ensure a consistent mapping during panning operations. Therefore, as a third filtering parameter, the normalization divisor can be locked to its current value. This can be done via the remote GUI or the visualization window by clicking on the lock symbol (see fig. 11). Alternatively, a user defined divisor can be applied via the remote GUI.

Both actions fix the color-to-value mapping until the lock is released by a subsequent click on the symbol. However, the fixed divisor may be smaller than the visualized maximum. In these cases, the lock symbol turns red to indicate a potentially misleading visualization and the values are clipped to the interval [0,1].


5.2 Details-on-demand

For graph visualizations, details on demand often refer to the ability to query additional vertex or edge information. However, in contrast to node link diagrams, the presented node visualization is not item based and selection operations have to be defined carefully.

The main problem is the continuous representation of the DensityField, which is not suitable for selecting single vertices. However, the cluster based selection of multiple vertices is possible and, similar to an item-based approach, the visually outstanding elements (the clusters) can be selected to provide additional information.

The cluster selection can be implemented on the GPU as a point rendering of the node data. The geometry shader compares the EvaluationField values of the nodes with the hill climbing result at the selected position and reduces the dataset to the matching nodes (i.e. those with the same cluster center). These nodes can then be transferred to the C++ application by writing their ids into a transform feedback buffer that can be accessed from the CPU.
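On the CPU, this selection reduces to a comparison against the clicked cluster center. The row-major EvalField indexing and the id-per-node layout are assumptions for illustration; the GPU version writes the ids into a transform feedback buffer:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Return the ids of all nodes whose EvaluationField entry equals the
// cluster center found at the clicked pixel.
std::vector<int> selectCluster(const std::vector<int>& evalField, int w,
                               const std::vector<std::pair<int,int>>& nodePixels,
                               int clickX, int clickY) {
    int center = evalField[clickY * w + clickX];
    std::vector<int> ids;
    for (int i = 0; i < (int)nodePixels.size(); ++i) {
        const auto& p = nodePixels[i];
        if (evalField[p.second * w + p.first] == center)
            ids.push_back(i);                 // node belongs to the cluster
    }
    return ids;
}
```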

In particular, if a user clicks onto a visual cluster, the most prominent node labels are displayed as an ordered list and their positions are marked by small squares (see fig. 12(a)).

The prominence of a label is hereby defined via an assigned weight or, if the data does not contain label weights, by the degree of the node. The user can select multiple labels per cluster, which are globally scaled according to their weights and initially centered on their vertices (see fig. 12(b)). Note that the clusters do not influence the label positions, so the visualization remains stable with respect to zoom and pan interactions. However, nearby labels are likely to overlap and readability may suffer as a result. For the images presented in this thesis, the Graphviz [4] implementation of VPSC [16] has been used to remove such overlaps by slightly shifting the label positions.

The presented technique allows the interactive labeling of important areas in the visualization and, in the limit, even the selection of single vertices is possible with appropriate bandwidth and zoom interactions. Moreover, the approach follows the same density based paradigm as the edge aggregation and is thus also useful for understanding the connections between the node clusters.



Figure 12: US air-traffic visualization (1925 edges). (a) The Denver cluster is marked and shows important nodes (squares) and their labels. (b) Labels can be added to the visualization and are scaled according to their weights.


6 Architecture & Implementation

The program was implemented as a C++ x64 application and uses graphics hardware to achieve fast processing of large amounts of data. Handling large datasets requires optimizations not only of the shader code but also of the C++ application; in the following, I describe how the proxy and state patterns can be applied to reduce loading times and to control the render pipeline. Examples of simplified shader code for the parallel reduction, the edge-widening shader, and the SeedPointField conversion shader can be found in the appendix.

6.1 Data Loading

With increasing graph sizes, loading and pre-processing the text-based input format becomes expensive. Loading the Europe dataset, for instance, takes several minutes even with a solid-state disk and a fast processor. Using the Seed Point Method instead of the quadtree reduces these costs, but the expensive parsing of the input files and the upload to the GPU remain.

Therefore, I propose a cache proxy that stores each dataset in its GPU format. The cache allows already loaded datasets to be reloaded within seconds and thus enables fast switching between datasets. For the application, no difference exists between the initial loading and a reloading of the data, because the data management is completely hidden in the cache proxy.
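A minimal sketch of such a cache proxy follows. Class and member names are assumptions for illustration, and the loader callback stands in for the actual parsing and GPU upload; only the caching behavior is shown.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Stand-in for the GPU-ready representation of a dataset.
struct GpuData { std::vector<float> buffer; };

// Cache proxy: the application always requests datasets through the
// proxy; only the first request triggers the expensive parsing and
// upload, later requests are served from the cache.
class DatasetCacheProxy {
public:
    explicit DatasetCacheProxy(std::function<GpuData(const std::string&)> loader)
        : loader_(std::move(loader)) {}

    const GpuData& get(const std::string& name) {
        auto it = cache_.find(name);
        if (it == cache_.end())                       // first access: parse
            it = cache_.emplace(name, loader_(name)).first;
        return it->second;                            // later accesses: cached
    }

private:
    std::function<GpuData(const std::string&)> loader_;
    std::map<std::string, GpuData> cache_;
};
```

Because the caller cannot tell a cache hit from a fresh load, the proxy transparently hides the data management, exactly as described above.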

6.2 Program States

As described above (see section 4.4), the implementation uses background rendering to stay reactive while the DensityField and the LineField are calculated. Background rendering is necessary to allow the interactive exploration of datasets that cannot be rendered in real time. The achieved reactivity during the render process, however, also allows the user to interact with incomplete results. For example, a panning operation can be followed by a zoom operation without waiting for the panning background job to finish. In such a case, the background job is interrupted and restarted with new parameters. The visualization should nevertheless always be based on a valid result and not on intermediate data.

To address these requirements, I propose using two result sets, one for complete results and one for intermediate results, together with software states that control the program flow. The above example uses the Working State to compute the result of the panning operation. The subsequent zoom command triggers a transition from the Working State back to the Working State with a new parametrization that expresses the combined zoom and pan operations (see fig. 13). The complete result set remains unchanged, and the visualization is still based on the latest valid data from before the pan operation.

I chose the state pattern because it "... is a clean way for an object to partially change its type at runtime" [17]. It can therefore be used to separate the different tasks of the renderer (loading a dataset, changing parameters, doing nothing, etc.), but also to implement distinct display routines for the individual states, which can provide previews and animations as described in section 4.4.
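The described state handling can be sketched as follows. This is a simplified model under assumed names, not the actual implementation; the real states additionally drive the background worker. An Idle state starts a job on interaction, while Working reacts to further input by restarting the job with the new parametrization, i.e. the Working-to-Working transition from the pan-then-zoom example.

```cpp
#include <memory>

struct Params { float zoom = 1.f, panX = 0.f, panY = 0.f; };

class Renderer;

// Base class of the renderer states; each state implements its own
// reaction to user input, which is what the state pattern separates.
class State {
public:
    virtual ~State() = default;
    virtual const char* name() const = 0;
    virtual void onInteraction(Renderer& r, const Params& p) = 0;
};

// Context: delegates interaction handling to the current state object.
class Renderer {
public:
    explicit Renderer(std::unique_ptr<State> s) : state_(std::move(s)) {}
    void interact(const Params& p) { state_->onInteraction(*this, p); }
    void setState(std::unique_ptr<State> s) { state_ = std::move(s); }
    const char* stateName() const { return state_->name(); }
    Params params;  // parametrization of the (simulated) background job
private:
    std::unique_ptr<State> state_;
};

// Working: a background job is running; further input interrupts it
// and restarts it with the merged parametrization (Working -> Working).
class Working : public State {
public:
    const char* name() const override { return "Working"; }
    void onInteraction(Renderer& r, const Params& p) override {
        r.params = p;  // restart the job with the new parameters
    }
};

// Idle: no job running; any input starts one by entering Working.
class Idle : public State {
public:
    const char* name() const override { return "Idle"; }
    void onInteraction(Renderer& r, const Params& p) override {
        r.params = p;
        // Destroys this state object; no member access may follow.
        r.setState(std::make_unique<Working>());
    }
};
```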

Figure 13: The states of the proposed architecture. The background jobs are executed in a worker class that can be accessed from the states Initial Work and Working.
