Constrained Graph Drawing


Dissertation

for the attainment of the academic degree of Doctor of Natural Sciences

submitted by

Barbara Pampel, née Schlieper

at the University of Konstanz

Faculty of Sciences, Department of Computer and Information Science

Date of the oral examination: 14 July 2011

1st referee: Prof. Dr. Ulrik Brandes

2nd referee: Prof. Dr. Michael Kaufmann

Konstanzer Online-Publikations-System (KOPS)
URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-193966


Parts of this thesis are based on publications that resulted from collaboration with other scientists.

Substantial contributions were made to all of this content.

Chapter 3 (Bachmaier, Brandes, and Schlieper, 2005; Brandes and Schlieper, 2009)

Chapter 4 (Brandes and Pampel, 2009)

Chapter 6 (Brandes, Cornelsen, Pampel, and Sallaberry, 2010b)


Summary

Networks are used in a wide variety of research areas to represent relational data. With suitable mathematical methods, these networks can be represented as graphs. A graph is a structure consisting of vertices and of edges that connect the vertices. Both the edges and the vertices can carry further information. This information can be assigned to the individual elements, but it can also result from the arrangement and distribution of the elements.

Algorithms (structured sequences of instructions) from the field of graph drawing make it possible to visualize a wide range of information from different research areas. Graphical representations can decisively support the understanding of data sets and form an excellent basis for further investigations and new insights.

The expressiveness, the information content, and the comprehensibility of such graph drawings depend on several aspects, among them the vertex distribution, planarity (crossing-free embedding), and further aesthetic properties. The particular information constellations and structural features of different data sets call for drawings whose style is adapted to the problem at hand. It is therefore important to find and study efficient algorithms for drawing graphs that have the special geometric properties required by concrete applications. For some of these problems, algorithms for drawing such graphs automatically were designed in this thesis. A prerequisite for the developed methods was the correct representation of the data; the optimization goal is readability. For other problems, membership in the complexity class of NP-hard problems could be proven, which means that there is little hope for an efficient and exact solution. Knowing that a problem is NP-hard or NP-complete justifies relaxing the requirements and motivates the search for heuristics and approximation algorithms, as well as for other requirements on the drawing that may offer similar benefits for interpretation.


Results

Angles and Distances

Deciding whether a general graph with prescribed vertex distances or prescribed angles between adjacent edges can be drawn without crossings is known to be NP-hard. We restrict these problems to a frequently used subclass of graphs: trees are connected graphs in which there is exactly one path between any two vertices. Route sketches, for example, are represented as trees because of their structure of junctions and roads. Constraints on the course of the edges help the user to interpret them. Similar problems also arise in other applications. Phylogenetic trees, for example, represent relationships among different animals or plants, where each vertex represents a species and the length of the path between two such vertices is meant to reflect the evolutionary distance between these species.

Phylogenetic trees thus have prescribed, but possibly different, edge lengths.

Chapter 3 presents linear-time algorithms for drawing trees without crossings when the geometric distances between any two vertices connected by an edge, the angles between adjacent edges, or both are prescribed.

Ordering

A schematized representation of a graph often makes it easier to interpret. Properties of the original layout that are important to the viewer, such as the relative positions of the vertices when geographic networks are modified, should be preserved during schematization. Misue, Eades, Lai, and Sugiyama (1995) discussed the importance of preserving the so-called "mental map" when graphs change. One of the models is the preservation of the orthogonal ordering of the vertices, i.e., of their vertical and horizontal arrangement.

Chapter 4 investigates, for three typical requirements on a schematized drawing, whether they can be met efficiently under the constraint of preserving the orthogonal ordering of the vertices. One way to simplify the drawing of a graph is to allow only certain directions for the edges, to begin with only horizontal and vertical ones. Since viewers perceive the style in which metro maps are typically drawn as particularly clear and easy to understand, schematizing paths such that every edge is drawn horizontally, vertically, or diagonally at an angle of 45° is particularly interesting. Another possible restriction is to prescribe the lengths of the edges; here we considered the problem of schematizing a path such that all edges have the same length. We proved that, for each of these requirements, deciding whether a graph can be schematized accordingly while preserving the orthogonal vertex ordering, without introducing edge crossings or edges of length zero, is NP-hard.

Directions

Let a set of vertices and a set of paths be given, where each path is a sequence of vertices. Drawing the paths assigns coordinates to every vertex. One problem is to choose the coordinates of the vertices such that every path is strictly monotone increasing in some direction. One can further distinguish whether such a direction of monotonicity must be parallel to one of the coordinate axes or may be arbitrary. Such paths on a common vertex set could be used, among other things, to visualize careers, for example of coaches who move between the teams of a league.

A related problem is the betweenness problem. The input consists of a set of points and a set of betweenness relations, each of which specifies, for two points, a third point that must lie between the two in at least one of the dimensions. The betweenness relations can thus be described by paths of length 2 that must be strictly monotone in the direction of one of the dimensions. The question is whether the points can be assigned coordinates in d dimensions such that all betweenness relations are satisfied. In one dimension, the problem is known to be NP-hard.

In Chapter 5, the NP-hardness results for both the betweenness problem and the decision problem for strictly monotone trajectories were extended to two- and three-dimensional space, and the result for strictly monotone trajectories also to d-dimensional space.

Shape

While every edge of a graph has exactly two end vertices, hypergraphs are defined more generally: a hyperedge is a subset of the vertex set. Hypergraphs can therefore be used wherever a set of elements is further subdivided, for example in database systems or social networks. One way to draw such hypergraphs is by means of supports. A support of a hypergraph is a graph in which the subgraph induced by each hyperedge (i.e., the subgraph containing all vertices of the hyperedge) is connected. Motivated by the clarity of metro maps, Chapter 6 investigates the requirement that the subgraph induced by each hyperedge is a path. Besides several results on monotone, minimal, and planar path-based supports, our central result is a characterization of the hypergraphs that have a path-based tree support, together with an algorithm that computes such a support efficiently if it exists.


Acknowledgements

I am so glad that my dream of earning a PhD in computer science has come true, even though I have not been a typical PhD student. While following my husband's job assignments around the world, and especially after the birth of my daughter Sophia, I had to count on the flexibility and support of so many people, and I am deeply grateful for all the help I have received!

First and foremost, I would like to thank Ulrik Brandes for being a great advisor. His enthusiasm and knowledge are impressive and inspiring. I am grateful for his encouragement and his patience, and for making it possible for me to continue my work while following my family.

I thank all my colleagues from the algorithmics research group at the University of Konstanz, and especially our secretary, Christine Agorastos, for being my link to Konstanz by helping me with everything that required local access, and for the warm welcome and the great time whenever I returned for a few months.

I would like to acknowledge the financial, academic, and technical support of the University of Konstanz, and I am thankful for the support I received from the State of Baden-Wuerttemberg through an LGFG scholarship.

Even though my traveling made it more complicated, research was most exciting when I could work on a problem together with other scientists. I thank Christian Bachmaier, Ulrik Brandes, Sabine Cornelsen, Michael Kaufmann, and Arnaud Sallaberry for their contributions to our joint work and for everything I have learned from them.

Further thanks to Lars Volkhardt and Dina Tantawy for implementing some of the tree drawing algorithms.

During my stays abroad, I had the opportunity to visit other research groups.

Thanks to Antonios Symvonis at the National Technical University of Athens and especially to Andrea Pietracaprina, Geppino Pucci and their group members for having me as their guest at the DEI during my eight months in Padova.

I also want to thank Michael Kaufmann for being my second referee and examiner and Michael Berthold for joining my examination committee.

I am so grateful to my family, my parents Meinolf and Rita Schlieper and my sister Claudia, for their never-ending support and for always being proud of me. I thank my kids, Sophia and David, for being the cutest distraction from my work and for showing me what really matters in life.

Last but not least, I would like to thank my husband Christian, whose calm and even temper gives me so much strength and perseverance. I am deeply grateful for his understanding and his patience, and for his love.


Chapter 1 Introduction

Our world is full of networks. The linking relationships might be quite abstract, such as friendships or metabolic processes, or more concrete, like roads or railways, but the resulting networks are often hard to survey. One way to deal with such a network is to model it mathematically as a graph, with vertices representing the entities and edges representing the relationships. Graphs are widely used to visualize relational data. The area that deals with the theory and algorithms of graph visualization is conventionally called graph drawing; it is covered by the annual International Symposium on Graph Drawing and by several books (see, e.g., Di Battista, Eades, Tamassia, and Tollis (1994, 1999); Kaufmann and Wagner (2001); Sugiyama (2002)). The usefulness of a drawing depends on aesthetic criteria as well as on how much of the information contained in the data the drawing reveals. Minimizing edge crossings is especially relevant for readability (see Purchase (1997)), which is why many methods yield crossing-free drawings if the input graph is planar. Yet the relative importance of the different criteria depends on the application. An aesthetically pleasing drawing might not display relevant information, or might even be misleading. A good vertex distribution, for example, can help in gaining an overview of a data set, but the relative closeness of objects might be interpreted as the strength of their relationship.

Likewise, further information can not only be assigned to the elements themselves, i.e., to vertices and edges, but can also be conveyed through their absolute and relative positions. Various criteria can be described by formal constraints. With the help of such geometric constraints, drawings can meet the different requirements of concrete applications. In this thesis, various types of fundamental constraints on relevant classes of graphs are studied.


1.1 Angles and Distances

We consider planar drawings that must satisfy constraints on the angles between edges incident to a common vertex, on the distances between adjacent vertices, or both. These requirements arise naturally in many applications. For general graphs, the corresponding decision problems are known to be NP-hard for either type of constraint, so we study them for trees.

Figure 1.1: Trees. (a) phylogenetic tree, (b) route sketch, (c) molecule, (d) Fibonacci caterpillar (Duncan, Eppstein, Goodrich, Kobourov, and Nöllenburg, 2011)

Vertex distances can, for instance, represent the evolutionary divergence of taxa or species in phylogenetic trees (Fig. 1.1(a)). Vertex distances and angles between incident edges are also of interest when they are derived from an original geographic setting; route sketches (Fig. 1.1(b)), for example, are mostly trees. When drawing chemical structures (Fig. 1.1(c)), distances and angles are prescribed by the bond lengths and bond angles between the atoms represented by the vertices.


Furthermore, a good angular resolution may be desired to improve the readability of a drawing. The tree drawing in Fig. 1.1(d) has perfect angular resolution.

When investigating each type of constraint separately, we require every edge to be drawn as a straight line. Since a straight-line drawing satisfying both angle and distance constraints is fully determined up to rotation and translation, in this combined case we only test that drawing for planarity, and otherwise allow bent and curved edges.
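To illustrate why both constraints together leave no freedom, here is a minimal Python sketch: once the root position and the absolute angle of one edge are fixed, one traversal places every vertex. It assumes a hypothetical input format: `nbrs[v]` lists the neighbors of v in counterclockwise order, `gaps[v][i]` is the prescribed counterclockwise angle from the edge to `nbrs[v][i]` to the edge to `nbrs[v][(i + 1) % k]`, and `length` maps each edge (as a frozenset) to its prescribed length. The function `realize` is illustrative only; it does not test the result for planarity.

```python
import math
from collections import deque


def realize(nbrs, gaps, length, root):
    """Construct the straight-line drawing of a tree that is determined by its
    angle and distance constraints (hypothetical helper). The root is placed at
    the origin and the edge to nbrs[root][0] gets absolute angle 0."""
    pos = {root: (0.0, 0.0)}
    # Queue entries: (vertex, index of a reference edge in nbrs[v], its absolute angle).
    queue = deque([(root, 0, 0.0)])
    while queue:
        v, j, theta = queue.popleft()
        k = len(nbrs[v])
        # Walk counterclockwise around v, starting at the reference edge.
        for step in range(k):
            i = (j + step) % k
            w = nbrs[v][i]
            if w not in pos:
                d = length[frozenset((v, w))]
                x, y = pos[v]
                pos[w] = (x + d * math.cos(theta), y + d * math.sin(theta))
                # Seen from w, the edge back to v points in the opposite direction.
                queue.append((w, nbrs[w].index(v), theta + math.pi))
            theta += gaps[v][i]  # absolute angle of the next edge counterclockwise
    return pos
```

For a path 0, 1, 2 with a straight angle at vertex 1 and edge lengths 1 and 2, the sketch places vertex 2 at (3, 0); any other drawing satisfying the same constraints is a rotated and translated copy.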

1.2 Ordering

There are several scenarios in which a given drawing of a graph is to be modified while subject to preservation constraints. Examples include shape simplification, sketch-based, and dynamic graph layout. The orthogonal ordering of vertices is a natural constraint and is frequently pursued for preserving the user’s mental map of a changing drawing.

Figure 1.2: The New South Wales rail network (Dwyer, Koren, and Marriott, 2006). (a) actual geographic positions, (b) ordering-preserving layout

Figure 1.2 shows how the readability of a rail network drawing can be improved while preserving the orthogonal ordering of the vertices representing the stations.


We investigate three basic drawing conventions with respect to the orthogonal ordering constraint: the rectilinear drawing problem, where each edge must be either horizontal or vertical; the octilinear drawing problem, where each edge must be horizontal, vertical, or diagonal at 45°, which is especially interesting for metro map drawings; and the problem of drawing graphs with equal edge lengths.

1.3 Directions

Figure 1.3: Internal preference map (Dunn and Goldman, 1998)

Figure 1.3 shows an internal preference map illustrating ten different rankings by corresponding axes of monotonicity. The problem of deciding whether there are geometric positions for the items such that each (complete) ranking is strictly monotone increasing in some direction is a special case of the problem we introduce: the trajectory drawing problem. The input consists of a set of paths, called trajectories, each on a subset of a common vertex set. By satisfying suitable constraints on the shape of the trajectories, additional information can be carried for several practical visualization problems. We concentrate on the basic constraint that each path must be strictly monotone in some direction.


1.4 Shape

While a graph consists of a set of vertices and a set of pairs of vertices, called edges, a hypergraph is more general. It is defined on a set of vertices and a set of non-empty subsets (not necessarily of size two) of the vertex set, called hyperedges. Hypergraphs can be used wherever subsets of element sets occur, such as schemata of relational databases and social networks. One main way of drawing hypergraphs is the Euler diagram style, where each vertex is drawn as a point and each hyperedge as a region containing only the points of the hyperedge’s vertices. To make the Euler diagram style more concrete, the drawing can be represented as a graph. A graph is called a support if for each hyperedge the induced subgraph is connected.

Figure 1.4: Three representations of a hypergraph. (a) Euler diagram, (b) support, (c) metro map like drawing

We investigate an additional constraint on supports: motivated by the aesthetics of metro map layouts, the subgraph of the support induced by each hyperedge should be a simple path.


Chapter 2 Preliminaries

This chapter contains the main definitions and notation used in this thesis.

Graphs A graph G = (V, E) consists of a set of vertices V and a set of edges E, where each e ∈ E is a 2-element subset of V. Let n = |V| denote the number of vertices and m = |E| the number of edges. We say that an edge is incident to its end vertices and call the connected vertices adjacent. Two edges are called incident if they are incident to a common end vertex. With {v, w} we refer to an undirected edge, while an edge (u, v) is directed from u to v; a (directed or undirected) edge is called a loop if u = v. The set of vertices adjacent to a vertex v is called the set of v's neighbors N(v). A vertex u with (u, v) ∈ E is called a parent of v and is referred to as parent(v), while for an edge (v, w) ∈ E the vertex w is a child of v. We refer to the set of children of a vertex v as children(v). A source does not have any parents, a sink has no children, and an inner vertex has at least one parent and one child. The degree deg_G(v) denotes the number of edges in G incident to a vertex v (the subscript may be omitted if it is clear which graph is referred to). The outdegree outdeg_G(v) denotes the number of outgoing edges and the indegree indeg_G(v) the number of incoming edges of v. G is called a multigraph if we explicitly allow loops and multiple edges, i.e., distinct edges e = (u, v) and e′ = (u′, v′) with u = u′ and v = v′; otherwise G is called simple. In this thesis we are interested in drawing simple graphs. A subgraph G′ = (V′, E′) of a graph G = (V, E) is a graph on a subset V′ ⊆ V of the vertices whose edges form a subset E′ ⊆ E.

A walk in a graph G = (V, E) is a subgraph given by a finite sequence of vertices v_1, ..., v_k such that (v_i, v_{i+1}) ∈ E for each 1 ≤ i < k. A walk is a (directed or undirected) path P = (v_1, ..., v_k) if all its edges are distinct; the length of P is k − 1. A path is simple if deg_G(v_i) = 2 for every vertex v_i with 1 < i < k. A path is a (directed or undirected) cycle if v_1 = v_k. A graph is acyclic if none of its subgraphs is a cycle.

A graph G = (V, E) is called bipartite if V can be partitioned into two disjoint sets V_1, V_2 ⊆ V with V_1 ∪ V_2 = V such that no two vertices in V_1 and no two vertices in V_2 are adjacent.
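Bipartiteness can be tested in linear time by a breadth-first search that 2-colors the vertices. The following Python sketch (the function name `bipartition` and the adjacency-dict input are assumptions for illustration) returns the two sets V_1, V_2, or `None` if some edge joins two vertices of the same color, which happens exactly when the graph contains an odd cycle.

```python
from collections import deque


def bipartition(adj):
    """Return (V1, V2) if the graph given by adjacency lists is bipartite,
    otherwise None. Each connected component is 2-colored by BFS."""
    side = {}
    for start in adj:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in side:
                    side[w] = 1 - side[v]  # neighbors get the opposite color
                    queue.append(w)
                elif side[w] == side[v]:
                    return None  # edge inside one color class: odd cycle
    V1 = {v for v, s in side.items() if s == 0}
    V2 = {v for v, s in side.items() if s == 1}
    return V1, V2
```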

Trees A tree T = (V, E) is a connected and acyclic graph. We use root(T) to refer to the unique vertex r of a directed tree T with indeg_T(r) = 0. Tree edges are viewed as directed away from the root. Even if a root is given, it will sometimes be convenient to consider the tree as being rooted at another vertex. If some vertex r ∈ V is made the root of T, the resulting tree is denoted by T_r, so that root(T_r) = r. In a rooted tree, each vertex v ∈ V \ {root(T)} has a unique parent and a (possibly empty) set of children. From v there is a directed path to each of v's descendants, while a vertex a is called an ancestor of v if there is a directed path from a to v. We only consider subtrees induced by a vertex and its descendants with respect to some root (rather than arbitrary connected subgraphs) and refer to the subtree of a tree T_r induced by v ∈ V as T_r(v). If it is clear at which vertex the tree is rooted, we write T(v) for the subtree induced by v. For a subtree T(v) of T, leaves(T(v)) denotes the set of its leaves, which consists of all vertices u in T(v) with deg_T(u) = 1. Since directions depend on the root, we use {u, v} to refer to an edge when the root is not clear from context. A tree T = (V, E) is called binary if deg_T(v) ≤ 3 for every vertex v ∈ V. A rooted tree T is complete if the length of the path from a leaf l to the root is the same for every leaf l ∈ V.
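Re-rooting a tree amounts to recomputing the parent pointers by a traversal from the new root r. The Python sketch below is illustrative (the helper names `reroot` and `subtree_leaves` are hypothetical); it computes the parents in T_r and the set leaves(T_r(v)), using the degree in the whole tree T as in the definition above.

```python
def reroot(adj, r):
    """Parent pointers of T_r, the tree given by adjacency lists rooted at r."""
    parent = {r: None}
    stack = [r]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w != parent[v]:  # every other neighbor is a child in T_r
                parent[w] = v
                stack.append(w)
    return parent


def subtree_leaves(adj, r, v):
    """leaves(T_r(v)): vertices of the subtree of T_r induced by v that have
    degree 1 in the whole tree T."""
    parent = reroot(adj, r)
    leaves, stack = set(), [v]
    while stack:
        u = stack.pop()
        if len(adj[u]) == 1:
            leaves.add(u)
        stack.extend(w for w in adj[u] if w != parent[u])  # descend to children
    return leaves
```

For example, in the tree with edges {0,1}, {1,2}, {2,3}, {1,4}, the subtree induced by vertex 1 has leaves {0, 4} when the tree is rooted at 2, but leaves {3, 4} when it is rooted at 0.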

Hypergraphs A hypergraph is a pair H = (V, A) where V is a finite set and A is a (multi-)set of non-empty subsets of V. The elements of V are called vertices and the elements of A are called hyperedges. We denote by n = |V| the number of vertices, by m = |A| the number of hyperedges, and by $N = \sum_{h \in A} |h|$ the sum of the sizes of all hyperedges of a hypergraph H. The size of the hypergraph H is then N + n + m. A hypergraph is a graph if all hyperedges contain exactly two vertices.

A hypergraph H = (V, A) is closed under intersections if h_1 ∩ h_2 ∈ A ∪ {∅} for all h_1, h_2 ∈ A.

A support (or host graph) of a hypergraph H = (V, A) is a graph G = (V, E) such that each hyperedge h ∈ A induces a connected subgraph of G, i.e., such that the graph G[h] := (h, {e ∈ E : e ⊆ h}) is connected for every h ∈ A. An Euler diagram of a hypergraph H = (V, A) is a drawing of H in the plane in which the vertices are drawn as points and each hyperedge h ∈ A is drawn as a simple closed region containing the points representing the vertices in h and not the points representing the vertices in V \ h. The Hasse diagram of a hypergraph H = (V, A) is the directed acyclic graph with vertex set A ∪ {{v} : v ∈ V} that has an edge (h_1, h_2) if and only if h_2 ⊊ h_1 and there is no set h ∈ A with h_2 ⊊ h ⊊ h_1. Fig. 6.1(a) shows an example of a Hasse diagram.
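Closure under intersections can be checked directly from the definition by testing all pairs of hyperedges. The sketch below is a naive quadratic check with a hypothetical function name; because only membership matters, repeated hyperedges of the multiset A are collapsed first.

```python
from itertools import combinations


def closed_under_intersections(hyperedges):
    """True iff h1 & h2 is empty or again a hyperedge, for all hyperedges h1, h2."""
    family = {frozenset(h) for h in hyperedges}  # duplicates in the multiset do not matter
    return all(
        not (h1 & h2) or (h1 & h2) in family  # pairs h1 == h2 are trivially fine
        for h1, h2 in combinations(family, 2)
    )
```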
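Whether a given graph is a support can be verified by one graph search per hyperedge, restricted to the edges of G[h]. The following Python sketch (hypothetical function `is_support`, edges given as vertex pairs) runs in O(N + m·deg) time in this naive form and is meant only to make the definition concrete.

```python
def is_support(vertices, edges, hyperedges):
    """Check whether G = (vertices, edges) is a support of the hypergraph,
    i.e., whether G[h] is connected for every (non-empty) hyperedge h."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    for h in hyperedges:
        h = set(h)
        start = next(iter(h))
        seen, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v] & h:  # only edges with both end vertices in h
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen != h:
            return False  # some vertex of h is not reachable inside G[h]
    return True
```

With the path 1, 2, 3, 4 as G, the hyperedges {1, 2, 3} and {3, 4} are supported, whereas {1, 3} is not, since G[{1, 3}] has no edge.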


Layout, Drawing A graph is said to be (combinatorially) embedded if, for each v ∈ V, the (say, counterclockwise) cyclic ordering of the edges incident to v is prescribed. For a graph G = (V, E) that is drawn in d dimensions, a geometric position x(v) = (x_1(v), ..., x_d(v)) is assigned to each vertex v ∈ V. A graph G = (V, E) is called planar if there is a planar drawing of G, i.e., if it can be drawn such that any two edges e ≠ e′ ∈ E intersect only in a vertex they are both incident to.

If the task is to redraw a graph that has an initial layout, let x′(v) be the position of a vertex v in the resulting layout. By preserving the orthogonal ordering of the vertices in the plane we mean that, for any two vertices v_i, v_j, x_1(v_i) ≤ x_1(v_j) in the original layout implies x′_1(v_i) ≤ x′_1(v_j) in the resulting layout, and x_2(v_i) ≤ x_2(v_j) in the original layout implies x′_2(v_i) ≤ x′_2(v_j) in the resulting layout. For a 2-dimensional drawing of a (sub)path P, we call the strip between the vertical line through P's rightmost vertex and the one through its leftmost vertex the x_1-range of P, and analogously the strip between the horizontal line through P's highest vertex and the one through its lowest vertex the x_2-range of P.
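For a straight-line drawing, planarity in the above sense can be checked by comparing all pairs of edges: edges without a common vertex must not meet at all, and two edges sharing a vertex v must not overlap along a common ray from v. The Python sketch below is a naive O(m²) illustration with hypothetical function names; it assumes exact (e.g., integer) coordinates, because floating-point orientation tests can misclassify nearly degenerate configurations, and a sweep-line algorithm would be preferable for large graphs.

```python
from itertools import combinations


def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def on_box(p, q, r):
    """For r collinear with p and q: does r lie on the segment pq?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_meet(p1, p2, q1, q2):
    """Do the closed segments p1p2 and q1q2 have a point in common?"""
    o1, o2 = orient(p1, p2, q1), orient(p1, p2, q2)
    o3, o4 = orient(q1, q2, p1), orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and on_box(p1, p2, q1)) or (o2 == 0 and on_box(p1, p2, q2))
            or (o3 == 0 and on_box(q1, q2, p1)) or (o4 == 0 and on_box(q1, q2, p2)))


def is_planar_drawing(pos, edges):
    """Straight-line drawing of a simple graph: edges may meet only in a common end vertex."""
    for (a, b), (c, d) in combinations(edges, 2):
        shared = {a, b} & {c, d}
        if shared:
            v = shared.pop()
            x, y = (b if a == v else a), (d if c == v else c)
            dx1, dy1 = pos[x][0] - pos[v][0], pos[x][1] - pos[v][1]
            dx2, dy2 = pos[y][0] - pos[v][0], pos[y][1] - pos[v][1]
            # Two edges at v overlap iff they are collinear and point the same way.
            if orient(pos[v], pos[x], pos[y]) == 0 and dx1 * dx2 + dy1 * dy2 > 0:
                return False
        elif segments_meet(pos[a], pos[b], pos[c], pos[d]):
            return False
    return True
```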
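The orthogonal ordering condition can be checked literally, pair by pair and coordinate by coordinate. The following Python sketch (hypothetical function name; layouts as dicts from vertices to (x_1, x_2)) takes O(n²) time; sorting the vertices by their original coordinates would reduce this to O(n log n), at the price of handling ties explicitly.

```python
def preserves_orthogonal_order(old, new):
    """old, new: dicts vertex -> (x1, x2). True iff for all vertices v, w and
    both coordinates d, old[v][d] <= old[w][d] implies new[v][d] <= new[w][d]."""
    vs = list(old)
    return all(
        new[v][d] <= new[w][d]
        for v in vs for w in vs for d in (0, 1)
        if old[v][d] <= old[w][d]
    )
```

Note that, as in the definition, two vertices with equal original x_1-coordinates must also receive equal x_1-coordinates in the new layout, since the implication is required in both orders.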

A direction is an oriented line through the origin of the coordinate system.

A path P in a (possibly curvilinear) drawing is strictly monotone increasing in some direction a if every line perpendicular to a intersects the drawing of P in at most one point. A path P = (v_1, ..., v_k) in a straight-line drawing is strictly monotone increasing in direction a if and only if the orthogonal projections of its vertices onto a appear along a in the order induced by P, i.e., ⟨x(v_1), a⟩ < ⟨x(v_2), a⟩ < ... < ⟨x(v_k), a⟩, where a is identified with a vector pointing in that direction.
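For straight-line drawings, the projection criterion translates directly into code. The sketch below (hypothetical function name) takes the vertex positions of P in path order and a vector a representing the direction, and checks that the scalar products with a strictly increase.

```python
def is_strictly_monotone(points, a):
    """points: positions x(v_1), ..., x(v_k) of the path's vertices in path order;
    a: vector pointing in the direction. True iff the projections strictly increase."""
    proj = [x * a[0] + y * a[1] for x, y in points]
    return all(p < q for p, q in zip(proj, proj[1:]))
```

For the path through (0, 0), (1, 2), (3, 1), the check succeeds for a = (1, 0) and a = (1, 1) but fails for a = (0, 1), because the path goes down again in x_2.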

We adopt the convention that the edges incident to v are to be drawn in counterclockwise order, i.e., for a vertex v with deg(v) = k and incident edges e_0, ..., e_{k−1}, the edge e_{(i+1) mod k} is the counterclockwise next edge after e_i for 0 ≤ i < k. In a tree, for an inner vertex other than the root, the counterclockwise first edge after the incoming edge is the edge to the first child.

A triangle (quadrangle) is described by its three (four) vertices in counterclockwise order, and an angle or an area between two (half-)lines g and h is swept out counterclockwise from g to h.
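Given vertex positions, the counterclockwise order of the edges around a vertex and the angles between consecutive edges can be read off with `atan2`. The Python sketch below is illustrative (hypothetical function names); it starts the order at absolute angle 0 and assumes that no two edges at v leave in the same direction.

```python
import math


def edge_angle(pos, v, w):
    """Absolute angle of the edge from v to w, in [0, 2*pi)."""
    return math.atan2(pos[w][1] - pos[v][1], pos[w][0] - pos[v][0]) % (2 * math.pi)


def ccw_order_and_gaps(pos, v, neighbors):
    """Neighbors of v in counterclockwise order (starting at angle 0) and the
    counterclockwise angle from each edge e_i to the next edge e_{(i+1) mod k}."""
    order = sorted(neighbors, key=lambda w: edge_angle(pos, v, w))
    ang = [edge_angle(pos, v, w) for w in order]
    k = len(ang)
    # The difference is 0 only for k == 1 (assuming distinct edge directions);
    # the single angle around v is then 2*pi.
    gaps = [(ang[(i + 1) % k] - ang[i]) % (2 * math.pi) or 2 * math.pi for i in range(k)]
    return order, gaps
```

The returned angles always sum to 2π, which is the vertex condition for locally consistent angle constraints used in Chapter 3.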

Complexity We provide an intuition of the complexity classes P and NP and of the concept of NP-completeness. For a more detailed and formal introduction, see Garey and Johnson (1979).

A problem consists of a formal description of the relevant parameters and of the properties a solution must have. We obtain an instance of a problem by fixing all parameters. For a decision problem, a solution can only be either YES or NO.

The class P (for polynomial) is the set of decision problems for which there is an algorithm that gives a correct solution for each instance in polynomial time.


A certificate (or witness) proves that the solution for an instance of a decision problem is YES. The class NP (for nondeterministic polynomial) is the set of decision problems for which a certificate of every YES-instance can be generated nondeterministically (with a finite number of options per step) in polynomial time and verified by a polynomial-time algorithm.

A reduction from a decision problem Π to another decision problem Π′ is an algorithm A that generates, for each instance I of Π, an instance I′ of Π′ such that I is a YES-instance if and only if I′ is a YES-instance. For a polynomial-time reduction, the running time of A is bounded by a polynomial in the size of I. As long as the question whether P = NP is not answered, it is helpful to identify the problems in NP that are not in P if P ≠ NP. We can find problems that are especially hard: a problem is NP-hard if every problem in NP can be reduced to it in polynomial time, and it is NP-complete if it is NP-hard and in NP. To show that a problem is NP-hard, it suffices to give a polynomial-time reduction from a problem known to be NP-complete (or, more generally, from any NP-hard problem). If any NP-hard problem were in P, every problem in NP would be in P as well, i.e., P = NP.

A Boolean variable is an element that can have one of two values, e.g., either true or false. The satisfiability problem (SAT) is a decision problem whose instances are Boolean expressions written using only variables, parentheses, and the logical operations conjunction (AND), disjunction (OR), and negation (NOT).

The question is whether there is an assignment of values to the variables such that the entire expression is true. A literal is either a variable or the negation of a variable. A clause is a disjunction of literals. We only consider instances which are conjunctions of clauses, i.e., in the conjunctive normal form (CNF).

In MONOTONE 3-SAT, each clause contains exactly three literals, which are either all negated (negative clause) or all non-negated (positive clause). This problem is known to be NP-hard (Garey and Johnson, 1979). An instance I = (B, C) of MONOTONE 3-SAT consists of a set of Boolean variables B = {b_1, ..., b_n} and a set of clauses C = {C_1, C_2, ..., C_k}, where each clause is C_i = (l_{i1} ∨ l_{i2} ∨ l_{i3}).
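The difference between verifying and finding a solution can be made concrete for MONOTONE 3-SAT. The Python sketch below assumes a DIMACS-like encoding (not used in the thesis) in which the literal b_j is the integer j and its negation is −j; all function names are hypothetical. Checking a certificate takes time linear in the instance size, whereas the naive search tries all 2^n assignments.

```python
from itertools import product


def is_monotone_3sat(clauses):
    """Each clause has exactly three literals, all positive or all negated.
    Literals are nonzero ints: j stands for b_j and -j for NOT b_j."""
    return all(len(c) == 3 and (all(lit > 0 for lit in c) or all(lit < 0 for lit in c))
               for c in clauses)


def verify(clauses, assignment):
    """Certificate check, linear in the instance size: does the assignment
    (dict j -> bool) satisfy every clause?"""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in c) for c in clauses)


def brute_force(clauses, n):
    """Exponential search over all 2**n assignments of b_1, ..., b_n;
    returns a satisfying assignment or None."""
    for bits in product((False, True), repeat=n):
        assignment = dict(enumerate(bits, start=1))
        if verify(clauses, assignment):
            return assignment
    return None
```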


Chapter 3 Angles and Distances

When representing a data set, information can be assigned to the vertices and edges of a graph, but also to their relative positions. Standard graph drawing algorithms do not take these criteria into account. In this chapter, the implications of the following two formal constraints on drawings of a graph G are investigated.

Angle constraints: For an embedded graph G = (V, E), let A ⊆ E × E be the angle set, where (e_i, e_{i+1}) ∈ A if and only if both edges share a vertex v and e_{i+1} is the counterclockwise next edge after e_i around v.

A drawing of an embedded graph G = (V, E) satisfies angle constraints α : A → (0, 2π) if, for all pairs (e_i, e_{i+1}) ∈ A, the angle between e_i and e_{i+1} is exactly α(e_i, e_{i+1}). Note that α is frequently called an angle assignment.

A necessary condition for angle constraints to be satisfiable is that they sum to 2π around every vertex and to (d_G(f) − 2)π around every inner face f with d_G(f) vertices. Such a set of angle constraints is called locally consistent, and we assume that all given angle constraints are locally consistent.
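Since a tree has no inner faces, local consistency of angle constraints on a tree reduces to the vertex condition. The following Python sketch assumes a hypothetical format in which `gaps[v]` lists the prescribed counterclockwise angles between consecutive edges at v; it checks that each angle is positive and that they sum to 2π up to a tolerance. A leaf carries a single angle of exactly 2π, so the sketch accepts the closed upper bound 2π.

```python
import math


def locally_consistent_tree(gaps, tol=1e-9):
    """gaps[v]: prescribed counterclockwise angles between consecutive edges at v.
    Trees have no inner faces, so only the vertex condition (sum 2*pi) is checked.
    A leaf has a single angle of exactly 2*pi, hence the closed upper bound."""
    return all(
        all(0 < g <= 2 * math.pi + tol for g in gs)
        and math.isclose(sum(gs), 2 * math.pi, abs_tol=tol)
        for gs in gaps.values()
    )
```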

Distance constraints: A drawing of a graph G = (V, E) satisfies distance constraints δ : E → R⁺ if all pairs of adjacent vertices {v, w} ∈ E are exactly at distance δ(v, w).
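Checking whether a given drawing satisfies distance constraints is a direct comparison of Euclidean edge lengths. A minimal Python sketch with a hypothetical function name (it needs Python 3.8 or later for `math.dist`); exact equality is replaced by a small tolerance because coordinates are floating-point numbers.

```python
import math


def satisfies_distances(pos, delta, tol=1e-9):
    """pos: vertex -> (x1, x2); delta: edge (v, w) -> prescribed length.
    True iff every edge is drawn with its prescribed length (up to tol)."""
    return all(math.isclose(math.dist(pos[v], pos[w]), d, abs_tol=tol)
               for (v, w), d in delta.items())
```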

A graph with angle and/or distance constraints is called realizable in the straight-line model (or straight-line realizable for short) if there exists a planar straight-line drawing in the plane such that all constraints are satisfied. Testing straight-line realizability is known to be NP-complete both for angle-constrained graphs (Garg, 1995) and for distance-constrained graphs, even if all edges are constrained to have unit length (Eades and Wormald, 1990). Hence, we focus on trees.


Trees are widely used as a data structure and to represent hierarchical data.

Standard tree drawing algorithms (Eades, 1992; Reingold and Tilford, 1981; Walker, 1990) place the vertices on concentric circles or on horizontal lines, respectively. While these algorithms achieve a good vertex distribution and an apparent level structure, information about the relative positions of vertices and edges, which may be relevant for the data, is lost. We study the problem of drawing trees with given vertex distances, edge angles, or both.

For trees, arbitrary distance constraints and arbitrary (locally consistent) angle constraints can be satisfied, though not necessarily in the same drawing. We present linear-time algorithms for drawing trees satisfying angle constraints (Sect. 3.1) and distance constraints (Sect. 3.2). In Sect. 3.3 we describe how to test whether a tree has a straight-line drawing that satisfies both angle and distance constraints, and show how using polylines and curves for the edges makes it possible to draw any tree with the desired edge angles and vertex distances.

3.1 Angle Constraints

Not only can a good angular resolution improve the readability of a drawing; edge angles can also carry information that is relevant for a data set, for example in geographic networks. We describe a linear-time algorithm to draw a tree T(V, E, α) in a balloon layout style (Melançon and Herman, 1998) without introducing any intersection other than the common vertex of two incident edges, while keeping every edge length positive.

Theorem 1. For any tree T = (V, E) with locally consistent angle constraints α, a planar straight-line drawing satisfying α can be determined in linear time.

The basic idea of the algorithm is to assign to every vertex w ≠ root(T) a circle c_w with center w in which the subtree T(w) is drawn. By fixing the absolute angle of one arbitrary edge, we can extend α : A → (0, 2π) to α : A ∪ E → (0, 2π], where α(e) is the absolute angle of each edge e ∈ E and α(e_i, e_{i+1}) = (α(e_{i+1}) − α(e_i)) mod 2π. The vertex w must be positioned on the half-line hl(w) that starts at w's parent v with the outgoing angle α(v, w). To ensure that c_w does not intersect the circle of any other neighbor of v, c_w is placed in a wedge with apex v and bisector hl(w). The position of w on hl(w) is determined by placing c_w as close as possible to v without causing intersections with the wedge borders.
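The text states only that c_w is placed as close to v as possible without crossing the wedge borders. Assuming a wedge of size ω(w) ≤ π (as required for the wedges in this algorithm) and a circle of radius r(w), this happens when c_w is tangent to both borders: a point at distance d from the apex on the bisector has distance d · sin(ω(w)/2) from each border, so d = r(w) / sin(ω(w)/2). The Python sketch below (hypothetical function `place_child`) computes this distance and the resulting position of w; it ignores any further conditions, such as the minimal radius r_min, that the full algorithm may impose.

```python
import math


def place_child(xv, angle, radius, wedge):
    """Place w on the half-line from its parent at xv with absolute angle `angle`
    such that the circle of radius `radius` around w touches both borders of the
    wedge of size `wedge` (apex xv, bisector along the half-line).
    Returns the position of w and its distance d from the parent."""
    if not 0 < wedge <= math.pi:
        raise ValueError("wedge size must lie in (0, pi]")
    # A point at distance d on the bisector is d * sin(wedge / 2) away from each border.
    d = radius / math.sin(wedge / 2)
    return (xv[0] + d * math.cos(angle), xv[1] + d * math.sin(angle)), d
```

For a half-plane wedge (ω = π) the circle touches the border line through v, so w lies at distance r(w) from v; for ω = π/3 the distance doubles to 2·r(w).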

To determine the position of a child $w$ of $v$, we compute the size $\omega(w)$ of the wedge belonging to $w$. To this end, the two sectors between $(v, w)$ and the clockwise next edge incident to $v$, and between $(v, w)$ and the counterclockwise next edge, are divided among the wedges of their target vertices. We divide a sector proportionally to the radii of the two corresponding circles.

Figure 3.1: Dividing a sector

Assume a vertex $v$ has $k$ children $w_1, \dots, w_k$ in counterclockwise order. The wedge size $\omega(w_i)$ of a vertex $w_i$ is twice the minimum of the two sector parts assigned to $w_i$. If a wedge cannot occupy the full part of a sector, the counterclockwise or the clockwise next wedge should be allowed to use more. We compute initial wedge sizes counterclockwise from $w_1$ to $w_k$ and update them in a second counterclockwise traversal. To simplify the computation, we introduce temporary vertices $w_0$ and $w_{k+1}$ and edges $e_0$ and $e_{k+1}$ with angles

$$\alpha(e_0) = \alpha(e_{k+1}) = \begin{cases} \alpha(v, u) & \text{if } u = \mathrm{parent}(v),\\ \left(\alpha(e_1) - \alpha(e_k, e_1) \cdot \dfrac{r(w_1)}{r(w_1) + r(w_k)}\right) \bmod 2\pi & \text{if } v = \mathrm{root}(T), \end{cases}$$

wedge sizes $\omega(w_0) = \omega(w_{k+1}) = 0$, and radii $r(w_0) = r(w_{k+1}) = 0$. To avoid complications, no wedge should be wider than $\pi$. Then the initial wedge size of a vertex $w_i$ for $1 \le i \le k$ is

$$\omega(w_i) = 2 \cdot \min\left\{ \alpha(e_{i-1}, e_i) - \frac{\omega(w_{i-1})}{2},\; \alpha(e_i, e_{i+1}) \cdot \frac{r(w_i)}{r(w_i) + r(w_{i+1})},\; \frac{\pi}{2} \right\} \qquad (3.1)$$

Then we update the wedge sizes counterclockwise from $w_1$ to $w_k$:

$$\omega(w_i) = 2 \cdot \min\left\{ \alpha(e_{i-1}, e_i) - \frac{\omega(w_{i-1})}{2},\; \alpha(e_i, e_{i+1}) - \frac{\omega(w_{i+1})}{2},\; \frac{\pi}{2} \right\} \qquad (3.2)$$

After this, except for the sector between $e_k$ and $e_1$, there are no two successive sectors that both have a part not occupied by any wedge. Hence, no wedge can be widened without narrowing another wedge. Placing a circle $c_w$ in the wedge with its center on the bisector $\mathrm{hl}(w)$ and touching the wedge borders determines the length of the edge $(v, w)$:

$$\delta(v, w) = \frac{r(w)}{\sin\left(\omega(w)/2\right)}.$$

Given also the radius $r(w)$ for every child $w$ of $v$, the circle with radius $r(v) = \max_{w \in \mathrm{children}(v)} \{ r(w) + \delta(v, w) \}$ contains the entire subtree. All edge lengths are computed in a postorder traversal. With the angle also given for every edge, the layout is determined up to translation and scaling and can be computed in a preorder traversal. Algorithm 1 shows the complete code. In Fig. 3.2 we show an example of a tree laid out with Algorithm 1; each edge is labeled with its desired absolute angle. We thank Dina Tantawy for implementing our algorithm using yFiles version 2.7 (Wiese, Eiglsperger, and Kaufmann, 2002).

Algorithm 1: BALLOON-LAYOUT

Input: Rooted tree T = (V, E, α), minimal radius r_min
Data: Vertex arrays r (radius of subtree) and ω (wedge size); edge array δ (edge length)
Output: Coordinates x(v) = (x_1(v), x_2(v)) for all v ∈ V

begin
    postorder traversal(root(T))
    x(root(T)) ← (0, 0)
    preorder traversal(root(T))

procedure postorder traversal(vertex v)
    if outdeg(v) = 0 then
        r(v) ← r_min
    else
        r(v) ← 0
        k ← outdeg(v)
        let {e_1, ..., e_k} = v.outedges() in counterclockwise order
        for i = 1 to k do
            let w_i ← e_i.target()
            postorder traversal(w_i)
        Edges e_0, e_{k+1}; Vertices w_0, w_{k+1}
        r(w_0) ← r(w_{k+1}) ← 0; ω(w_0) ← ω(w_{k+1}) ← 0
        if v ≠ root(T) then
            α(e_0) ← α(e_{k+1}) ← (α(inedge(v)) + π) mod 2π
        else
            α(e_0) ← α(e_{k+1}) ← (α(e_1) − α(e_k, e_1) · r(w_1) / (r(w_1) + r(w_k))) mod 2π
        for i = 1 to k do
            ω(w_i) ← 2 · min{α(e_{i−1}, e_i) − ω(w_{i−1})/2, α(e_i, e_{i+1}) · r(w_i) / (r(w_i) + r(w_{i+1})), π/2}
        for i = 1 to k do
            ω(w_i) ← 2 · min{α(e_{i−1}, e_i) − ω(w_{i−1})/2, α(e_i, e_{i+1}) − ω(w_{i+1})/2, π/2}
            δ(e_i) ← r(w_i) / sin(ω(w_i)/2)
            if δ(e_i) + r(w_i) > r(v) then r(v) ← δ(e_i) + r(w_i)
        delete e_0, e_{k+1}, w_0, w_{k+1}

procedure preorder traversal(vertex v)
    if v ≠ root(T) then
        u ← parent(v); e ← inedge(v)
        x(v) ← x(u) + δ(e) · (cos α(e), sin α(e))
    foreach w ∈ children(v) do
        preorder traversal(w)

(a) example tree, (b) balloon layout
Figure 3.2: Layout of a tree with given absolute angle for each edge
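To make the two wedge passes and the edge-length formula concrete, here is a Python sketch that follows Algorithm 1 under assumptions of our own: the tree is a dictionary of child lists, every edge carries its absolute angle in radians (as in the example of Fig. 3.2), the angles are locally consistent, leaves are recognized by having no children, and a root with a single child receives the whole circle as its sector. The names `balloon_layout`, `children`, `angle` and `r_min` are hypothetical and chosen only for illustration.

```python
import math

TWO_PI = 2 * math.pi


def ccw(a, b):
    """Counterclockwise angle from direction a to direction b, in [0, 2*pi)."""
    return (b - a) % TWO_PI


def balloon_layout(children, angle, root, r_min=1.0):
    """children: vertex -> list of children; angle: (parent, child) -> absolute
    edge angle in radians. Returns vertex -> (x, y)."""
    r, delta = {}, {}

    def postorder(v, parent):
        kids = children.get(v, [])
        if not kids:                      # leaf: circle of minimal radius
            r[v] = r_min
            return
        for w in kids:
            postorder(w, v)
        if parent is not None:            # temporary edges e_0 = e_{k+1} point to the parent
            a0 = (angle[(parent, v)] + math.pi) % TWO_PI
            kids = sorted(kids, key=lambda w: ccw(a0, angle[(v, w)]))
        else:                             # root: split the sector between e_k and e_1
            kids = sorted(kids, key=lambda w: angle[(v, w)] % TWO_PI)
            first, last = kids[0], kids[-1]
            gap = TWO_PI if len(kids) == 1 else ccw(angle[(v, last)], angle[(v, first)])
            a0 = angle[(v, first)] - gap * r[first] / (r[first] + r[last])
        k = len(kids)
        a = [a0] + [angle[(v, w)] for w in kids] + [a0]
        rad = [0.0] + [r[w] for w in kids] + [0.0]
        sec = [ccw(a[i], a[i + 1]) for i in range(k + 1)]   # alpha(e_i, e_{i+1})
        om = [0.0] * (k + 2)                                 # omega(w_0) = omega(w_{k+1}) = 0
        for i in range(1, k + 1):         # initial wedge sizes, Eq. (3.1)
            om[i] = 2 * min(sec[i - 1] - om[i - 1] / 2,
                            sec[i] * rad[i] / (rad[i] + rad[i + 1]), math.pi / 2)
        for i in range(1, k + 1):         # update pass, Eq. (3.2)
            om[i] = 2 * min(sec[i - 1] - om[i - 1] / 2,
                            sec[i] - om[i + 1] / 2, math.pi / 2)
        r[v] = 0.0
        for i, w in enumerate(kids, start=1):
            delta[(v, w)] = rad[i] / math.sin(om[i] / 2)     # circle touches both wedge borders
            r[v] = max(r[v], delta[(v, w)] + rad[i])

    pos = {root: (0.0, 0.0)}

    def preorder(v):
        for w in children.get(v, []):
            d, t = delta[(v, w)], angle[(v, w)]
            pos[w] = (pos[v][0] + d * math.cos(t), pos[v][1] + d * math.sin(t))
            preorder(w)

    postorder(root, None)
    preorder(root)
    return pos
```

For instance, with `children = {0: [1, 2], 1: [3, 4]}` and edge angles 0, π, π/2 and −π/2, every child ends up exactly in its prescribed direction; the algorithm only chooses the edge lengths (here 2 for the edge to vertex 1, whose subtree circle has radius 2).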

3.2 Distance Constraints

In a straight-line drawing of a graph satisfying distance constraints, the given vertex distances are exactly the desired edge lengths. A typical application is the drawing of phylogenetic trees. Phylogeny is the study of evolutionary relationships within a group of organisms. A phylogenetic tree represents the evolutionary distances among the organisms represented by its leaves. Due to the increasing size of data sets, drawings are essential for exploration and analysis. In addition to the usual requirements for arbitrary tree structures, drawings of phylogenetic trees should clearly display the given edge lengths (since they represent evolutionary distances) and the leaf names. Standard methods from the graph drawing literature do not take these criteria into account. Popular software tools in computational biology such as TreeView (Page, 2000), PAUP* (Swofford, 2002), or PHYLIP (PHYLIP, 1993) provide limited documentation and usually no analysis of their incorporated layout algorithms.

Two essential classes of graphical representation for phylogenetic trees can be distinguished (see, e. g., Carrizo (2004) for an overview of variants). Both classes are related to dendrograms, likely because many algorithms for the construction of phylogenetic trees are based on clustering (see, e. g., Swofford, Olsen, Waddell, and Hillis (1996); Felsenstein (2004)). They differ in that leaf labels are placed either monotonically along one axis or around the tree structure. While the first class of representations is very similar to standard dendrograms and easy to lay out automatically, the nesting of subtrees is somewhat difficult to understand from the resulting drawings. Here we focus on the algorithmically more challenging and graphically more appealing second class of representations, and further differentiate two subclasses called radial and circular tree drawings.

In radial tree drawings, edges extend monotonically away from the root in the radial direction, and we discuss a linear-time algorithm that preserves all edge lengths exactly.


Figure 3.3: Drawings of an example from Dwyer and Schreiber (2004): (a) dendrogram (showing edge lengths), (b) circular dendrogram (ignoring lengths), (c) weighted barycenter of children (Dwyer and Schreiber, 2004), (d) our radial approach, (e) our circular approach


Figure 3.4: Complete binary tree with $n = 2^{13} - 1$ vertices: (a) radial drawing algorithm, (b) circular drawing algorithm

In circular tree drawings, leaves are placed equidistantly on the perimeter of a circle and the tree is confined to the inside of that circle. Note that it may be impossible to preserve edge lengths in this representation. We give an algorithm that heuristically minimizes length deviations and, even though it is based on solving a system of linear equations, runs in linear time as well. Both algorithms yield drawings that are unique in a well-defined sense up to scaling, rotation, and translation. Since each subtree is confined to its own wedge rather than to a line interval, the nesting structure is more apparent than in vertical or horizontal representations. Even though phylogenetic trees are frequently restricted to extended binary trees, our algorithms apply to general trees as well. The behavior of our two algorithms can be compared nicely on complete binary trees with uniform edge length, as shown in Fig. 3.4.

A comparison with other drawing techniques on a real-world data set is provided in Fig. 3.3 and larger real-world examples are presented in Figs. 3.9 and 3.13. Effects of extensions are illustrated in Figs. 3.8 and 3.12. Note that the root of each tree is indicated by a dot. Typical tree reconstruction methods yield rooted trees in which most inner vertices have two children.

The leaves of a phylogenetic tree typically represent species, molecules, or DNA sequences (invariably referred to as taxa) under study, and its inner vertices represent virtual or hypothetical ancestors. The length of an edge represents the evolutionary distance between its incident vertices, and the entire tree represents a tree metric fitted to a (potentially noisy and incomplete) dissimilarity matrix defined over all taxa. Since we want to indicate δ(e) by the length of the line segment depicting e ∈ E, only positive values can be represented. If a method for tree construction (see, e. g., Felsenstein (1973); Fitch (1971); Michener and Sokal (1957); Saitou and Nei (1987)) assigns negative or zero length to an edge, the range of values is shifted appropriately.

3.2.1 Radial Drawings

In this section, we describe a drawing algorithm that yields a planar radial drawing of a phylogenetic tree T = (V, E, δ). For general graphs, it is NP-complete to decide whether a graph can be drawn in the plane satisfying distance constraints, even if the graph is planar and all edges have unit length (Eades and Wormald, 1990).

Theorem 2. For any tree T = (V, E) with distance constraints δ, a planar straight-line drawing satisfying δ can be determined in linear time.

3.2.1.1 Basic Algorithm

The main idea is to assign to every subtree a wedge of angular width proportional to some weight function $c$. Given a rooted tree $T$, weights $c(T(v))$ will later be defined for all $v \in V \setminus \{\mathrm{root}(T)\}$. Weights of other subtrees are then implied via $c(T) = \sum_{v \in \mathrm{children}(\mathrm{root}(T))} c(T(v))$ and the following condition:

Complementarity Property:

For all edges $\{u, v\}$: $c(T_u(v)) + c(T_v(u)) = c(T)$.

Note that $T_u(v) = T(v)$ if $u = \mathrm{parent}(v)$, so that at most one term is unknown in each of the above equations. A straightforward symmetry consideration shows that the extension to subtrees with respect to other roots is well-defined. We say that subtree weights are complementary if they satisfy the Complementarity Property.

To determine the layout, the wedge of an inner vertex is divided among its children and tree edges are drawn along wedge angle bisectors so that they can have any length without violating planarity. See Fig. 3.5 for an illustration.

Algorithm 2 is therefore based on a preorder traversal of the input tree, where a child w of an inner vertex v is placed at distance δ(v, w) on the angular bisector of the wedge reserved for w.

The following theorem shows that the layouts determined by Algorithm 2 are essentially the only ones that fulfill all natural requirements for radial drawings of trees with given edge lengths, and that the weights can be defined with respect to any root. We give two examples of interesting subtree weighting schemes in the next subsection.


Figure 3.5: Wedges of vertex v's neighbors

Algorithm 2: RADIAL-LAYOUT

Input: Rooted tree T = (V, E, δ), vertex array c (subtree weight)
Data: Vertex arrays ω (wedge size) and τ (angle of right wedge border)
Output: Coordinates x(v) for each vertex v ∈ V

begin
    x(root(T)) ← (0, 0)
    ω(root(T)) ← 2π
    τ(root(T)) ← 0
    preorder traversal(root(T))

procedure preorder traversal(vertex v)
    if v ≠ root(T) then
        u ← parent(v)
        x(v) ← x(u) + δ(u, v) · (cos(τ(v) + ω(v)/2), sin(τ(v) + ω(v)/2))
    η ← τ(v)
    foreach w ∈ children(v) do
        ω(w) ← c(T(w)) / c(T) · 2π
        τ(w) ← η
        η ← η + ω(w)
        preorder traversal(w)
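Algorithm 2 translates almost line by line into Python. The sketch below assumes that child lists are given in counterclockwise order, that edge lengths are stored in a dictionary keyed by (parent, child), and that the weights come from the leaf-count scheme of Sect. 3.2.1.2; any other complementary weighting could be passed as `c` instead. The names `radial_layout` and `leaf_weights` are hypothetical.

```python
import math


def leaf_weights(children, root):
    """One admissible weighting: c(T(v)) = number of leaves in the subtree of v."""
    c = {}

    def count(v):
        kids = children.get(v, [])
        c[v] = 1 if not kids else sum(count(w) for w in kids)
        return c[v]

    count(root)
    return c


def radial_layout(children, length, c, root):
    """Each child w gets a wedge of width c(T(w)) / c(T) * 2*pi; sibling wedges are
    stacked counterclockwise from the parent's right border tau, and w is placed
    at distance length[(v, w)] on the bisector of its wedge."""
    total = sum(c[w] for w in children.get(root, []))   # c(T)
    pos = {root: (0.0, 0.0)}
    omega = {root: 2 * math.pi}    # wedge size
    tau = {root: 0.0}              # angle of the right wedge border

    def preorder(v, u):
        if u is not None:
            mid = tau[v] + omega[v] / 2                   # bisector of v's wedge
            d = length[(u, v)]
            pos[v] = (pos[u][0] + d * math.cos(mid), pos[u][1] + d * math.sin(mid))
        eta = tau[v]
        for w in children.get(v, []):
            omega[w] = c[w] / total * 2 * math.pi
            tau[w] = eta
            eta += omega[w]
            preorder(w, v)

    preorder(root, None)
    return pos
```

Because every vertex is visited once and each child wedge is computed in constant time, the sketch runs in linear time, matching the claim for Algorithm 2.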


Theorem 3. For an ordered phylogenetic tree with complementary subtree weights, there is a unique planar radial drawing up to rotation, translation and scaling that satisfies the following properties independent of which vertex is chosen as the root:

1. Edge lengths are proportional to evolutionary distances.

2. Disjoint subtrees are confined to disjoint wedges.

3. The angular width of a subtree’s wedge is proportional to its weight.

4. Subtrees are centered at the bisectors of their wedges.

Moreover, it can be determined in linear time.

Proof. Let T = (V, E, δ) be a phylogenetic tree with complementary subtree weights c, and fix any vertex as the root. With relative edge lengths and angles fixed, the only degrees of freedom left are translation, rotation, and scaling.

Because of Property 1, edge lengths are fixed up to scaling. It remains to show that angles between incident edges are fixed as well.

By Property 3, the width of a subtree wedge is $\theta \cdot c(T(v))$ for each $v \in V$. By Property 2 and the definition of $c(T)$, we have $\theta = 2\pi / c(T)$. Because of complementarity, $c(T) = \sum_{w \in \mathrm{children}(v)} c(T_v(w))$ holds for all $v \in V$ (with $v$ taken as the root), so that the total width of all subtree wedges for the children of a vertex $v$ is $2\pi$. Together with Property 4, this implies that the angles between incident edges are fixed.

Clearly, Algorithm 2 determines the desired layout in linear time.

A tree drawing is said to have convex faces if the path between any two consecutive leaves is a convex arch, i. e., a polygonal chain in which the maximum angular difference between two edges is π and the edges occur sorted by their absolute angles. In a tree drawing with convex faces, the edge lengths can be chosen arbitrarily without causing intersections.
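The convex-arch condition can be checked mechanically. The following sketch uses a reformulation of our own, which we believe to be equivalent to the definition above: the signed turning angles between consecutive edges never change sign, and their total is at most π in absolute value. The function name `is_convex_arch` and the tolerance `eps` are illustrative.

```python
import math


def is_convex_arch(points, eps=1e-9):
    """Return True if the polygonal chain through `points` is a convex arch.

    Assumed reformulation: all signed turns between consecutive edges have the
    same sign, and the total turn is at most pi in absolute value.
    """
    dirs = [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]
    # signed turning angle between consecutive edges, normalized to [-pi, pi)
    turns = [(b - a + math.pi) % (2 * math.pi) - math.pi
             for a, b in zip(dirs, dirs[1:])]
    if any(t > eps for t in turns) and any(t < -eps for t in turns):
        return False          # the chain turns both left and right
    return abs(sum(turns)) <= math.pi + eps
```

Three sides of a unit square turn by π in total and pass the test, whereas a zigzag or a chain that winds by 3π/2 fails it.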

Lemma 1. Algorithm 2 yields a drawing with convex faces.

Proof. Assume $P = (e_1, \dots, e_p)$ is the path from one leaf $z$ to the counterclockwise next leaf $z'$ in $T_z$. Consider any two edges $e_i$ and $e_j$ with $i < j$. Since $\omega(e_j) \le \omega(e_i) \le 2\pi$, the wedge of $e_j$ is contained in the wedge of $e_i$ with the right wedge borders parallel. As each edge is drawn on the bisector of its wedge, the relative angle between $e_i$ and $e_j$ is at most $\pi$, and the edges in $P$ are sorted by their absolute angles.


Labels of leaves can be placed on the angle bisector of the respective wedge.

Since the angle of a leaf wedge may be small, labels placed close to their leaf may not fit into the wedge. When using a font of height h, non-overlapping labels are guaranteed if the labels of all leaves v ∈ V are placed at a distance of at least

$$\frac{h}{2 \cdot \tan\left(\pi \cdot c(T(v)) / c(T)\right)} \qquad (3.3)$$

from parent(v).
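As a quick numerical illustration of Eq. (3.3): a leaf wedge of angular width $2\pi \cdot c(T(v))/c(T)$ has half-angle $\pi \cdot c(T(v))/c(T)$, and a label of height h centered on the bisector fits once its inner edge is far enough from the apex at parent(v). The sketch below evaluates this bound; the function name is hypothetical, and returning 0 for wedges of width at least π is our own convention, since the tangent formula no longer applies there.

```python
import math


def min_label_distance(h, weight, total):
    """Lower bound of Eq. (3.3) on the distance from parent(v) to v's label.

    weight = c(T(v)), total = c(T); the wedge has angular width 2*pi*weight/total.
    """
    half_angle = math.pi * weight / total
    if half_angle >= math.pi / 2:     # wedge at least a half-plane (assumed: no bound)
        return 0.0
    return h / (2 * math.tan(half_angle))
```

For example, with h = 1 and a leaf carrying a quarter of the total weight, the half-angle is π/4 and the bound is 1/2.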

3.2.1.2 Subtree Weights

While any complementary subtree weighting scheme is admissible, the following two seem to be especially relevant.

Count of Leaves

It is easy to see that the proportion of leaves in a subtree yields complementary subtree weights, since every leaf lies on exactly one side of any edge. We proposed this weighting scheme (Bachmaier, Brandes, and Schlieper, 2005) without being aware of the following algorithm described by Felsenstein (2004) and implemented in PHYLIP (1993).
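The complementarity of leaf counts can also be verified directly. The sketch below assumes an unrooted tree given as an adjacency dictionary whose degree-1 vertices are the taxa; `leaf_count(adj, u, v)` returns the number of leaves in $T_u(v)$, and `is_complementary` checks the Complementarity Property for every edge against a given total $c(T)$. Both names are hypothetical, and the per-edge search makes this check quadratic, unlike the linear-time layout itself.

```python
def leaf_count(adj, u, v):
    """c(T_u(v)): leaves in the component containing v after removing edge {u, v}."""
    count, stack, seen = 0, [v], {u, v}
    while stack:
        x = stack.pop()
        if len(adj[x]) == 1:      # degree-1 vertices are the leaves (taxa)
            count += 1
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return count


def is_complementary(adj, weight, total):
    """Check c(T_u(v)) + c(T_v(u)) = c(T) for every edge {u, v}."""
    return all(weight(adj, u, v) + weight(adj, v, u) == total
               for u in adj for v in adj[u])
```

In a tree with three leaves, for instance, removing the edge between the root and an inner vertex with two leaf children splits the leaves 2 : 1, and the two counts add up to c(T) = 3.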

The algorithm determines the angle of each edge in a preorder traversal. For a tree with $l$ leaves, let $\mathrm{leaves}(T) = \{z_1, \dots, z_l\}$. With a reference leaf $z_1$, the leaves are arranged around the unit circle in counterclockwise order, with $z_i$ positioned at angle

$$\alpha_i = 2\pi \cdot \frac{i}{l} \quad \text{for } 1 \le i \le l. \qquad (3.4)$$

For an edge $e$, let $\{z_p, z_{(p+1) \bmod l}, \dots, z_q\}$ be the set of leaves separated from $z_1$ by $e$. Then the angle $\alpha(e)$ of $e$ is

$$\alpha(e) = \frac{1}{2}\left(\alpha_p + \alpha_q\right). \qquad (3.5)$$

To determine vertex coordinates during the preorder traversal of the tree starting at the reference vertex $z_1$, each neighbor w of the current vertex v is pushed away from v in the direction specified by α(v, w) such that (v, w) has length δ(v, w).
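The scheme of Eqs. (3.4) and (3.5) can be sketched in Python as follows. This is our own reading of the description above, not PHYLIP's code: we assume the tree is rooted at the reference leaf $z_1$ with child lists in counterclockwise order, so that numbering the remaining leaves in depth-first order yields $z_2, \dots, z_l$ and every subtree covers a contiguous index range $p, \dots, q$ without wrap-around. All names are hypothetical.

```python
import math


def equal_angle_layout(children, length, ref):
    """children: tree rooted at the reference leaf ref (child lists in
    counterclockwise order); length[(v, w)]: edge length delta(v, w)."""
    span = {}   # vertex -> (p, q): indices of the first and last leaf below it
    l = 1       # ref itself is z_1

    def number(v):
        nonlocal l
        kids = children.get(v, [])
        if not kids:                       # a leaf other than ref gets the next index
            l += 1
            span[v] = (l, l)
        else:
            ranges = [number(w) for w in kids]
            span[v] = (ranges[0][0], ranges[-1][1])
        return span[v]

    for w in children.get(ref, []):
        number(w)

    def alpha(i):                          # Eq. (3.4)
        return 2 * math.pi * i / l

    pos = {ref: (0.0, 0.0)}

    def place(v):
        for w in children.get(v, []):
            p, q = span[w]                 # leaves separated from z_1 by edge (v, w)
            a = (alpha(p) + alpha(q)) / 2  # Eq. (3.5)
            d = length[(v, w)]
            pos[w] = (pos[v][0] + d * math.cos(a), pos[v][1] + d * math.sin(a))
            place(w)

    place(ref)
    return pos
```

For a star whose center c is adjacent to $z_1$ and three further leaves, all with unit edge lengths, the edge $(z_1, c)$ separates $z_2, z_3, z_4$ from $z_1$ and therefore gets angle $(\pi + 2\pi)/2 = 3\pi/2$.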

We next show that this algorithm yields exactly the same drawings as Algorithm 2 with subtree weights equal to their number of leaves.

Since the straight-line drawing of a tree is fully determined if the angle and length are determined for each edge, and evolutionary distances are preserved in both algorithms, we only have to show that the two algorithms compute the same
