Background and Definitions - Visual Analytic Methods for Exploring Large Amounts of Relational Data with Matrix-based Representations

We introduce the concept of a matrix based on the more formal definition of graphs.

Graphs are an accepted and well-studied subject in computer science. Interestingly, many problems from graph visualization directly relate to the problems and challenges described in this thesis (see also: Section 1.1).

A graph $G$ is a pair $(V, E)$, where $V$ is a set of vertices and $E$ is a set of edges:

$$V = \{v_1, \dots, v_n\},$$

$$E = \{e_1, \dots, e_m\}, \quad e \in V^2 \tag{2.2}$$

A directed graph is a graph where the two vertices associated with an edge are considered ordered. An undirected graph is a graph where the vertices associated with an edge are not ordered.

We use the term network to describe the graph topology as well as attributes associated with vertices (e.g., labels) and attributes associated with edges (e.g., weights). Most networks used in this survey have names associated with vertices and positive weights associated with edges. A weighted graph $G_W$ adds a weight function $w(e)$ to $G$ so that:

$$w(e_i) = w_i, \quad w_i \in \mathbb{R}^{+} \tag{2.3}$$

An ordering, or order, is a bijection $\varphi(v) \to i$ from $v \in V$ to $i \in N = \{1, \dots, n\}$ that associates a unique index with each vertex. A network usually comes with an arbitrary ordering reminiscent of its construction or storage. We call that order the initial order, denoted $\varphi_0(v)$, to distinguish it from a computed order. A transformation from one ordering to another is called a permutation $\pi$. Formally, a permutation is a bijection $\pi(x) \to y$ such that:

$$\pi(x_i) = y_i, \quad (x_i, y_i) \in N^2, \quad y_i = y_j \Rightarrow i = j \tag{2.4}$$

It is usually implemented as a vector containing $n$ distinct indices in $N$. We call $S$ the set of the $n!$ possible permutations for $n$ vertices. A permutation can also be represented as an $n \times n$ matrix $P$ whose entries are all 0, except that in row $i$ the entry in column $\pi(i)$ equals 1.
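To make the two representations of a permutation concrete, the following Python sketch (our own illustration, not code from this thesis) converts a permutation vector into the matrix $P$ and uses it to reorder the rows and columns of an adjacency matrix $M$ together as $P M P^T$. It assumes 0-based indices, as is idiomatic in Python, whereas the text uses $N = \{1, \dots, n\}$; the helper names `permutation_matrix`, `matmul`, `transpose`, and `reorder` are hypothetical.

```python
from math import factorial
from typing import List

Matrix = List[List[int]]


def permutation_matrix(perm: List[int]) -> Matrix:
    """Return the n x n matrix P with P[i][perm[i]] = 1 and 0 elsewhere.

    perm is a 0-based permutation vector (assumed convention).
    """
    n = len(perm)
    if sorted(perm) != list(range(n)):
        raise ValueError("perm must contain each index 0..n-1 exactly once")
    return [[1 if j == perm[i] else 0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def reorder(m: Matrix, perm: List[int]) -> Matrix:
    """Reorder rows and columns together: P M P^T.

    Entry (i, j) of the result is m[perm[i]][perm[j]], i.e. position i
    of the new order shows vertex perm[i] of the old order.
    """
    p = permutation_matrix(perm)
    return matmul(matmul(p, m), transpose(p))


# A path graph v0 - v1 - v2 in its initial order.
M = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
print(reorder(M, [1, 0, 2]))  # [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
print(factorial(len(M)))      # size of S for n = 3: 6
```

In practice the vector form is preferred, since indexing `m[perm[i]][perm[j]]` gives the same result without building the $n \times n$ matrix $P$.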



Figure 2.2: A simple labeled graph, its adjacency matrix, its weighted adjacency matrix, and its degree matrix.

As an alternative to the representation as a pair of sets $(V, E)$, a graph can be represented by several kinds of matrices.

An adjacency matrix of a graph $G$ is a square matrix $M$ where the cell $m_{i,j}$ represents the edge (or lack thereof) between the vertices $v_i$ and $v_j$. It equals 1 if there is an edge $e = (v_i, v_j)$ and 0 otherwise. When the graph is weighted, $m_{i,j}$ holds the weight of the edge (for clarity, we restricted weights to be strictly positive in Equation 2.3, so that 0 unambiguously denotes a missing edge).
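A minimal sketch of building $M$ from an edge list, written in Python for illustration only: the function name `adjacency_matrix` and its parameters are hypothetical, vertices are assumed to be 0-based integers, and the `directed` flag decides whether each edge also fills the mirrored cell $m_{j,i}$.

```python
from typing import List, Optional, Sequence, Tuple


def adjacency_matrix(
    n: int,
    edges: Sequence[Tuple[int, int]],
    weights: Optional[Sequence[float]] = None,
    directed: bool = False,
) -> List[List[float]]:
    """Return the n x n (weighted) adjacency matrix of a graph.

    Cells hold 1 for an edge (or its weight) and 0 for a missing edge.
    """
    if weights is not None and len(weights) != len(edges):
        raise ValueError("one weight per edge is required")
    m = [[0.0] * n for _ in range(n)]
    for k, (i, j) in enumerate(edges):
        w = 1.0 if weights is None else float(weights[k])
        # Strictly positive weights (Eq. 2.3) keep 0 free to mean "no edge".
        if w <= 0:
            raise ValueError("weights must be strictly positive")
        m[i][j] = w
        if not directed:
            m[j][i] = w  # undirected graphs yield a symmetric matrix
    return m


edges = [(0, 1), (1, 2)]
print(adjacency_matrix(3, edges))
print(adjacency_matrix(3, edges, weights=[2.5, 0.5], directed=True))
```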

Another, less common, representation of a graph records only the vertex degrees; it is less common because it is lossy (the edges cannot be recovered from it). The degree matrix $D$ is defined as:

$$D = \mathrm{diag}(\mathrm{degree}(v_1), \dots, \mathrm{degree}(v_n)).$$

The Laplacian matrix representation, also known as the Kirchhoff matrix or admittance matrix, can be formulated with the help of the aforementioned matrices:

$$L = D - M.$$

This matrix form has applications in spectral clustering approaches for graphs and spectral reordering methods for matrices, where this representation is used to find a partitioning of the underlying data (see also: Section 2.3.3).
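The degree matrix and the Laplacian $L = D - M$ can be sketched in a few lines of Python. This is an illustration under two assumptions of ours: the graph is undirected without self-loops, and for weighted graphs the degree is taken as the sum of incident edge weights (the row sum of $M$), which reduces to the ordinary degree when all weights are 1. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def degree_matrix(m: Matrix) -> Matrix:
    """D = diag(degree(v_1), ..., degree(v_n)), using row sums of M."""
    n = len(m)
    return [[sum(m[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]


def laplacian(m: Matrix) -> Matrix:
    """L = D - M, computed entry by entry."""
    d = degree_matrix(m)
    n = len(m)
    return [[d[i][j] - m[i][j] for j in range(n)] for i in range(n)]


# Path graph v1 - v2 - v3 as an unweighted adjacency matrix.
M = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
L = laplacian(M)
print(L)  # [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
# Every row of L sums to zero, since D holds the row sums of M.
print([sum(row) for row in L])  # [0.0, 0.0, 0.0]
```

The zero row sums mean that the constant vector is always an eigenvector of $L$ with eigenvalue 0, which is the starting point of the spectral methods mentioned above.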

A bipartite graph, or bigraph, is a graph $G = (V_1, V_2, E)$ where the vertices are divided into two disjoint sets $V_1, V_2$, and each edge $e$ connects a vertex in $V_1$ to a vertex in $V_2$:

$$V = V_1 \cup V_2, \quad V_1 \cap V_2 = \emptyset, \quad E \subseteq V_1 \times V_2 \tag{2.6}$$

The adjacency matrix of a bigraph is generally rectangular, with $V_1$ in rows and $V_2$ in columns, to limit empty cells. We consider a general data table, such as data presented in spreadsheet form, to be a valued bigraph. A classic example of a bigraph is a document-author network with a single relation is-author connecting authors to documents. The adjacency matrix of such a bigraph has authors in rows (respectively in columns) and documents in columns (respectively in rows), with a value of 1 marking the authoring relationship and a value of 0 otherwise.
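As a hedged illustration of the rectangular adjacency matrix of a bigraph, the Python sketch below builds the author-by-document matrix of the is-author relation. The author and document names are made up for the example, and `bigraph_matrix` is a hypothetical helper; transposing the result gives the alternative layout with documents in rows.

```python
from typing import Iterable, List, Sequence, Tuple


def bigraph_matrix(
    rows: Sequence[str],
    cols: Sequence[str],
    edges: Iterable[Tuple[str, str]],
) -> List[List[int]]:
    """Rectangular |rows| x |cols| matrix with 1 where a (row, col) edge exists."""
    row_index = {r: i for i, r in enumerate(rows)}
    col_index = {c: j for j, c in enumerate(cols)}
    m = [[0] * len(cols) for _ in rows]
    for r, c in edges:
        # Edges only connect V1 (rows) to V2 (columns), never within a set.
        m[row_index[r]][col_index[c]] = 1
    return m


authors = ["Ann", "Bob"]              # V1 (hypothetical)
documents = ["doc1", "doc2", "doc3"]  # V2 (hypothetical)
is_author = [("Ann", "doc1"), ("Bob", "doc1"), ("Bob", "doc3")]

m = bigraph_matrix(authors, documents, is_author)
print(m)                           # [[1, 0, 0], [1, 0, 1]]
print([list(c) for c in zip(*m)])  # documents in rows: [[1, 1], [0, 0], [0, 1]]
```

Only $|V_1| \times |V_2|$ cells are stored here, instead of the $(|V_1| + |V_2|)^2$ cells of a square adjacency matrix over $V$, most of which would be structurally empty.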

2.2.1 | Related Concepts

In addition to the previous definitions and notations, this work frequently bridges graph concepts with linear algebra concepts. Since readers might not be familiar with these relationships, we summarize them here, introducing concepts often used in this thesis.

Adjacency matrices typically bridge graph theory and linear algebra, allowing a graph to be interpreted as a multidimensional ($n$-dimensional for $n$ vertices) linear system, and vice versa. We list below several properties of networks when considered as adjacency matrices.

• When encoded as an adjacency matrix, a vertex becomes an $n$-dimensional vector of edges (or edge weights). When the network is undirected, the matrix is symmetric and the vectors can be read horizontally or vertically. Otherwise, two vectors can be considered: the vector of incoming edges and the vector of outgoing edges.

• Since vertices are vectors, a distance measure $d(x, y)$ (or a similarity or dissimilarity measure $s(x, y)$) can be computed between two vertices $(x, y) \in V^2$. For example, the Euclidean distance $L_2$ between vertices $x$ and $y$ is:

$$L_2(x, y) = \sqrt{\sum_{k \in [1, n]} (x_k - y_k)^2} \tag{2.7}$$

• Several reordering algorithms take a distance matrix (or similarity matrix) as input: a symmetric matrix $D$ containing the pairwise distances between multiple vectors. From the $n \times n$ adjacency matrix of an undirected graph, one symmetric distance matrix of size $n \times n$ can be computed. From a general matrix with $n$ rows and $m$ columns, two distance matrices can be computed: one of size $n \times n$ for the rows ($m$-dimensional vectors we will call $A$), and one of size $m \times m$ for the columns ($n$-dimensional vectors we will call $B$). A distance matrix is always symmetric, with non-negative entries and a zero diagonal.

• A particularly important distance matrix is the graph distance matrix, which contains the length of the shortest path between every pair of vertices of an undirected graph. Note that a distance matrix, or more generally any symmetric matrix with non-negative entries, can also be interpreted as the adjacency matrix of a weighted undirected graph. Note also that any symmetric matrix can be interpreted as the adjacency matrix of a valued undirected graph (a graph where each edge has an associated value).

• From any undirected graph, or any symmetric matrix with non-negative entries, many graph measures can be computed. These can serve as objective functions to minimize or as quality measures of reordering algorithms. We describe three key measures below: bandwidth, profile, and linear arrangement.

Let us call $\lambda(u, v)$ the length between two vertices in $G$, given a one-dimensional alignment of the vertices $\varphi$: $\lambda((u, v), \varphi, G) = |\varphi(u) - \varphi(v)|$.

Bandwidth BW is the maximum length of an edge given an order $\varphi$:

$$\mathrm{BW}(\varphi, G) = \max_{(u, v) \in E} \lambda((u, v), \varphi, G) \tag{2.8}$$

Intuitively and visually, when looking at the adjacency matrix of an undirected graph (a symmetric matrix), the bandwidth is the minimum width of a diagonal band that can enclose all the non-zero cells of the matrix. A small bandwidth means that all the non-zero cells are close to the diagonal.

Therefore, a quality measure is MINBW, the minimum bandwidth of a graph: $\mathrm{MINBW}(G) = \min_{\varphi^*} \mathrm{BW}(\varphi^*, G)$. Note that multiple different orders can achieve the same minimum bandwidth.

Profile PR is the sum, over each column $i$ of the matrix, of its “raggedness”: the distance from the diagonal cell $(i, i)$ to the farthest non-zero cell $(i, j)$ in that column. It is a more subtle measure than the bandwidth because it takes into account all the vertices, not only the vertex with the largest length. The minimum profile is $\mathrm{MINPR}(G) = \min_{\varphi^*} \mathrm{PR}(\varphi^*, G)$.

Linear arrangement LA is the sum of the lengths of all the edges of a graph:

$$\mathrm{LA}(\varphi, G) = \sum_{(u, v) \in E} \lambda((u, v), \varphi, G) \tag{2.10}$$

It is an even more subtle measure than the profile since it takes into account all the edges. The minimum linear arrangement is therefore formally defined as $\mathrm{MINLA}(G) = \min_{\varphi^*} \mathrm{LA}(\varphi^*, G)$.

• In the context of matrix reordering, several data “modes” must be distinguished: (i) a two-way one-mode data set describes a matrix that has columns and rows (two-way) but represents only one set of objects (one-mode); for example, symmetric dissimilarity matrices are two-way one-mode. (ii) Two-way two-mode data, such as general non-negative matrices, represent two sets of objects. For two-way two-mode data, an optimal order of the columns can depend on the order of the rows and vice versa, or it can be independent, which allows breaking the optimization down into two separate problems, one for the columns and one for the rows [HHB08].

Figure 2.3: Distribution of papers using matrix visualizations as primary or meta-analysis visualizations. Blue bars indicate the presence of matrix visualizations within the paper; orange bars indicate the use of matrices for analytic purposes.
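The two distance matrices of a general $n \times m$ matrix described above can be sketched in Python using the Euclidean distance of Equation 2.7. This is an illustration only: the function names are hypothetical, and any other distance or (dis)similarity measure could be substituted for `euclidean`.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """L2(x, y) = sqrt(sum_k (x_k - y_k)^2), Equation 2.7."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def row_distances(m: Sequence[Sequence[float]]) -> Matrix:
    """n x n distance matrix between the rows (the m-dimensional vectors A)."""
    return [[euclidean(a, b) for b in m] for a in m]


def column_distances(m: Sequence[Sequence[float]]) -> Matrix:
    """m x m distance matrix between the columns (the n-dimensional vectors B)."""
    columns = [list(c) for c in zip(*m)]
    return row_distances(columns)


table = [[1, 0, 0],
         [1, 0, 1]]  # 2 rows x 3 columns
rows = row_distances(table)     # 2 x 2
cols = column_distances(table)  # 3 x 3
print(rows)
print(cols)
```

Both results are symmetric with a zero diagonal; the pairwise loop costs $O(n^2 m)$ for the rows and $O(m^2 n)$ for the columns.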
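For an unweighted undirected graph, the graph distance matrix can be computed with one breadth-first search per vertex. The sketch below is our own Python illustration, not a method from the text; it assumes a 0/1 adjacency matrix and marks pairs of vertices in different connected components with `math.inf`. For weighted graphs, a shortest-path algorithm such as Dijkstra's or Floyd-Warshall would be needed instead.

```python
import math
from collections import deque
from typing import List


def graph_distance_matrix(adj: List[List[int]]) -> List[List[float]]:
    """Shortest-path lengths (counted in edges) between all pairs of vertices."""
    n = len(adj)
    dist = [[math.inf] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0.0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                # Record each neighbor of u the first time it is reached.
                if adj[u][v] and dist[source][v] == math.inf:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


# Path v0 - v1 - v2 - v3 plus an isolated vertex v4.
adj = [[0, 1, 0, 0, 0],
       [1, 0, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0]]
for row in graph_distance_matrix(adj):
    print(row)
```

Scanning a full adjacency row per dequeued vertex makes this $O(n^3)$ overall; with adjacency lists instead, each search would cost $O(n + |E|)$.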
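The three measures (bandwidth, profile, and linear arrangement) can be sketched in Python for a given order. This is an illustration under stated assumptions: vertices are 0-based integers, an order is a sequence `pos` with `pos[v]` playing the role of $\varphi(v)$, and the profile follows the common convention of measuring, for each column, the distance to the farthest non-zero cell above the diagonal (that is, to the earliest-placed neighbor). The exhaustive minimum enumerates all $n!$ orders of $S$, so it is only usable for very small graphs; all helper names are hypothetical.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[int, int]


def edge_length(edge: Edge, pos: Sequence[int]) -> int:
    """lambda((u, v), phi, G) = |phi(u) - phi(v)|."""
    u, v = edge
    return abs(pos[u] - pos[v])


def bandwidth(edges: Sequence[Edge], pos: Sequence[int]) -> int:
    """BW: the largest edge length under the order (Eq. 2.8)."""
    return max((edge_length(e, pos) for e in edges), default=0)


def profile(edges: Sequence[Edge], pos: Sequence[int]) -> int:
    """PR: per column, distance from the diagonal to the farthest earlier neighbor."""
    reach = [0] * len(pos)
    for u, v in edges:
        # The later-placed vertex's column holds this cell above the diagonal.
        later = u if pos[u] > pos[v] else v
        reach[later] = max(reach[later], edge_length((u, v), pos))
    return sum(reach)


def linear_arrangement(edges: Sequence[Edge], pos: Sequence[int]) -> int:
    """LA: the sum of all edge lengths (Eq. 2.10)."""
    return sum(edge_length(e, pos) for e in edges)


Measure = Callable[[Sequence[Edge], Sequence[int]], int]


def exhaustive_minimum(n: int, edges: Sequence[Edge], measure: Measure) -> Tuple[int, List[int]]:
    """Minimum of a measure over all n! orders, with one order reaching it."""
    best_pos = list(range(n))
    best_value = measure(edges, best_pos)
    for perm in permutations(range(n)):
        value = measure(edges, perm)
        if value < best_value:
            best_value, best_pos = value, list(perm)
    return best_value, best_pos


path = [(0, 1), (1, 2), (2, 3)]  # path v0 - v1 - v2 - v3
scrambled = [0, 2, 1, 3]         # phi(v1) = 2, phi(v2) = 1
print(bandwidth(path, scrambled), profile(path, scrambled), linear_arrangement(path, scrambled))  # 2 4 5
print(exhaustive_minimum(4, path, bandwidth)[0])  # 1
```

The scrambled order shows how the measures refine one another: swapping two vertices of the path raises the bandwidth from 1 to 2, while the profile (3 to 4) and the linear arrangement (3 to 5) also register how many cells moved away from the diagonal.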

In the document Visual Analytic Methods for Exploring Large Amounts of Relational Data with Matrix-based Representations (pages 43-47)