
As an application example, we tested our approach on hierarchically structured German news data from the Europe Media Monitor (EMM)¹. EMM gathers reports from different news portals, extracts information, and produces different visual presentations of the news. Since we could not obtain the news data directly, the group of Prof. Dr. Keim at the University of Konstanz kindly computed the distances and indices for us, so that we could use them for the visualization. Fig. 6.1 shows the hierarchical structure of the news data.

We used several thousand news articles from a single day as an example. See Fig. 6.2 for the resulting visualization.

[Figure 6.1. Legend: nodes and edges of the hierarchy graph, nodes and edges of the document graph. Example labels: DE, CH, Frankfurter, FAZ, blick, NZZ.]

Figure 6.1.: Scheme of the hierarchical data: news from different newspapers which are hierarchically structured by different categories.

¹ Europe Media Monitor, http://emm.newsbrief.eu


Figure 6.2.: (a) Overview of the hierarchical structure, (b) result for the search query "Krise". The relative positions of the different newspapers are preserved in most cases. The space is used efficiently, and the edges of a node are highlighted when the mouse is moved over this node. Comparing the positions of the newspapers "stern" and "faz", one can see that the relative positions are nearly the same, although many newspapers are no longer shown.

Looking at Fig. 6.2, we can see that the preservation of the relative positions and of the mental map works quite well. The relation between two cells is retained in most cases, and the hierarchical structure is also clearly visible. One limitation that becomes visible is that some parts of the hierarchy are too big and do not match their assigned area, such as "BE" and "FR" on the lower left of Fig. 6.2b. This problem occurs when two highly weighted Voronoi cells are in conflict with each other, so that their weights cannot be increased without one cell dominating the other. To solve this problem, we plan to create a heuristic that introduces a border force to move the site of the low-weighted Voronoi cell. This force would change the centroid definition by moving the centroid of a border region in the direction of the border. Nevertheless, the space is used for the parts of the hierarchy that are most important for a given query.

The visualization of the edges also often shows clusters of different newspapers that seem to have very similar articles. One possible reason is that they use the same source for their news articles. Another could be that they report on the same topic. Further analysis of these cases would be possible with the textual content of the articles and not only the distances.

7. Summary

We introduced a new approach that can be used as a proactive extension to the usual ranking-based search interface. For each query, the user gets a visualization of the underlying search space and of the dependencies between the result documents.

To address this challenge, we transformed the problem into that of visualising a hierarchically clustered graph.

Our preprocessing step creates initial Mental Map positions, which are used to take similarity into account in the layout. Using the Mental Map positions also stabilizes the layout. Our stability tests show that about 60% of the initial neighbourhood relations are preserved and that 90% of the orthogonal orderings are maintained. The aspect ratio of the visual elements is also very good: it is about 0.9 after a small number of iterations.
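The two stability figures can be illustrated with a small sketch. The definitions used here are assumptions and may differ from the measures defined earlier in this work: an inversion is a pair of sites whose order along one axis flips between two layouts, and a preserved neighbour is a site that is among the k nearest neighbours of another site both before and after. The function names are hypothetical.

```python
from itertools import combinations
from math import dist


def ordering_preserved(before, after, axis):
    """Fraction of site pairs whose order along `axis` is the same in both layouts."""
    pairs = list(combinations(range(len(before)), 2))
    kept = 0
    for i, j in pairs:
        d_before = before[i][axis] - before[j][axis]
        d_after = after[i][axis] - after[j][axis]
        # A pair counts as inverted only if the sign of the difference flips.
        if d_before * d_after >= 0:
            kept += 1
    return kept / len(pairs)


def k_nearest(points, i, k):
    """Indices of the k points closest to point i (ties broken by index)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return set(others[:k])


def neighbours_preserved(before, after, k):
    """Fraction of k-nearest-neighbour relations that survive the layout change."""
    shared = 0
    for i in range(len(before)):
        shared += len(k_nearest(before, i, k) & k_nearest(after, i, k))
    return shared / (k * len(before))
```

Calling `ordering_preserved(before, after, 0)` and `ordering_preserved(before, after, 1)` gives the preserved orthogonal orderings in x- and y-direction; both inputs are lists of (x, y) tuples indexed by site.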

We also developed an analytical computation of the Voronoi Treemap, which has the advantage of running in O(k · n log n). Fortune's algorithm was used and complemented by several other algorithms in order to compute a Voronoi Treemap analytically. The previous Monte Carlo based method for Voronoi Treemaps took at least Ω(k · n² + n² log n) steps.

By combining MDS and the Voronoi Treemap technique, we developed a flexible method that uses the available space efficiently and could also be used in many other fields. Our approach uses the area for the important parts of the hierarchy, considers similarity, achieves a good aspect ratio for the visual elements, and preserves the hierarchy.
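To get a feeling for the gap between the two bounds, one can evaluate their growth terms for example values. This is only a rough sketch: constant factors are ignored, the logarithm base 2 is an arbitrary choice, and the values of k and n are hypothetical, so the result indicates growth and not measured running time.

```python
from math import log2


def analytical_steps(k, n):
    """Growth term of the analytical bound, k * n * log n (constants ignored)."""
    return k * n * log2(n)


def monte_carlo_steps(k, n):
    """Growth term of the Monte Carlo lower bound, k * n^2 + n^2 * log n."""
    return k * n**2 + n**2 * log2(n)


k, n = 100, 1024  # hypothetical example values
ratio = monte_carlo_steps(k, n) / analytical_steps(k, n)
print(f"ratio of growth terms: {ratio:.2f}")
```

For these values the Monte Carlo term is roughly two orders of magnitude larger, and the ratio grows further with n.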

As a further step, we used Stress Majorization to place similar documents near each other but still keep the overall Mental Map. The hierarchically bundled edges then clarify the dependencies of the document nodes.

We empirically observed that the initial positions of a Voronoi Treemap are very powerful and lead to good local optima. We tested our approach on real data and found it to be very useful as an extension to a normal search interface. We also think that the combination of Stress Majorization and the Voronoi Treemap technique could be used in other fields where the location and size of elements play a role.

7.1. Outlook

In this section we explain possible improvements and research directions related to this work.

Border Heuristic: As already described in the previous chapter, we still have to improve the efficiency of the space usage. The area of a Voronoi cell should be proportional to its weight. By introducing a force for the Voronoi cells that are located on the border, the centroid is moved in the direction of the border. This could worsen the aspect ratio of the cells on the border, but it would probably improve the area-weight ratio.
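One conceivable form of such a border force is sketched below. It is an assumption for illustration, not a worked-out heuristic: the cell centroid is computed with the standard shoelace formula and then moved a fixed fraction towards the nearest side of the bounding box. The function names and the `strength` parameter are hypothetical, and the caller is expected to apply the shift only to cells that touch the border.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def border_shifted_centroid(vertices, box, strength=0.25):
    """Move the centroid a fraction `strength` towards the nearest box side."""
    xmin, ymin, xmax, ymax = box
    cx, cy = polygon_centroid(vertices)
    # Candidate targets: projections of the centroid onto the four box sides.
    targets = [(xmin, cy), (xmax, cy), (cx, ymin), (cx, ymax)]
    tx, ty = min(targets, key=lambda t: abs(t[0] - cx) + abs(t[1] - cy))
    return cx + strength * (tx - cx), cy + strength * (ty - cy)
```

In a Lloyd iteration, the shifted point would replace the ordinary centroid as the new site position of a border cell; `strength = 0` reproduces the unmodified method.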

Extrapolation of Lloyd's method: To make the application faster, one can reduce the number of iterations in Lloyd's method, which would lead to a lower quality of the Centroidal Voronoi Tessellation. An alternative is to extrapolate the next positions. By predicting the direction of movement, one could reach the desired convergence of Lloyd's method with fewer iterations. This would speed up the whole process.
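The idea can be tried out on a one-dimensional toy version of Lloyd's method, where each Voronoi cell is an interval and its centroid is the interval midpoint. The sketch below is an assumption about how an extrapolation could look, not an evaluated method: after each Lloyd update, every site is pushed further along its last move by a factor `alpha` (a hypothetical parameter).

```python
def lloyd_step(sites, lo=0.0, hi=1.0):
    """One Lloyd iteration in 1-D: move each site to the midpoint of its cell."""
    s = sorted(sites)
    bounds = [lo] + [(a + b) / 2 for a, b in zip(s, s[1:])] + [hi]
    return [(left + right) / 2 for left, right in zip(bounds, bounds[1:])]


def iterate(sites, alpha, tol=1e-6, max_iter=10_000):
    """Run Lloyd's method; alpha > 0 extrapolates along the last move."""
    for iteration in range(1, max_iter + 1):
        old = sorted(sites)
        new = lloyd_step(old)
        # Converged when the plain Lloyd update no longer moves any site.
        if max(abs(n - o) for n, o in zip(new, old)) < tol:
            return new, iteration
        # Extrapolation: overshoot the Lloyd update by alpha times the move.
        sites = [n + alpha * (n - o) for n, o in zip(new, old)]
    return sites, max_iter


start = [0.05, 0.1, 0.2, 0.3, 0.9]
plain, plain_iterations = iterate(start, alpha=0.0)
fast, fast_iterations = iterate(start, alpha=0.5)
print(plain_iterations, fast_iterations)
```

With `alpha` at most 1, an extrapolated site stays inside its previous cell, so the order of the sites cannot change. Whether a comparable gain carries over to weighted two-dimensional cells would have to be tested.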

User Study: By integrating the visualization into a frequently used real environment, one could give users tasks to solve and observe how the visualisation influences their behaviour. This would help to determine to what extent the visualisation is beneficial for the user. Even without asking users directly, one could study their behaviour by logging how the visualization is used in combination with the search interface. Such a study could also compare different approaches from the user's point of view.

Incremental Preprocessing Update: Since the preprocessing step is quadratic, it is very slow for large amounts of data. If only part of the data changes, a complete recomputation of the indices is still necessary. An incremental update of the indices would speed this up for dynamic data.
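A minimal sketch of the idea, under the assumption that the expensive part of the preprocessing is a table of pairwise document distances: when a single document changes, only the n - 1 pairs involving it are refreshed instead of all pairs. The representation of documents as numeric vectors, the Euclidean distance, and the function names are placeholders for illustration.

```python
from math import dist


def full_distances(vectors):
    """Quadratic preprocessing: one distance per unordered pair of documents."""
    n = len(vectors)
    return {(i, j): dist(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)}


def update_document(vectors, distances, k, new_vector):
    """Replace document k and refresh only the n - 1 pairs that involve it."""
    vectors[k] = new_vector
    for j in range(len(vectors)):
        if j != k:
            a, b = min(j, k), max(j, k)
            distances[(a, b)] = dist(vectors[a], vectors[b])
    return distances
```

The update touches a linear number of entries per changed document, so it pays off as long as only a small part of the collection changes between two runs.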

Experimental Quality Study: Further studies are necessary to make more general statements about the quality of our approach. We already took some steps in this direction by defining quality measurements and generating different example initialisations to explore the behaviour over the number of iterations.

Experimental Parameter Study: It is not obvious which parameters to use for the anchoring in Stress Majorization. The best parameter values are still an open question, since they depend on the number of neighbours. If the anchoring weight is too high, the user cannot relate the layout to the mental map; if it is too low, the new layout might not take the query sufficiently into account.

A. Quality Measurements

Figure A.1.: Cluster-Distribution: uniform, Weight-Distribution: uniform. a) neighbourhood preservation, b) inversions in x-direction, c) inversions in y-direction


Figure A.2.: Coordinate-Distribution: uniform, Weight-Distribution: few important (FI). a) neighbourhood preservation, b) inversions in x-direction, c) inversions in y-direction


[Figure A.3, panels (a) to (c). Recoverable axis information for the inversion plots: x-axis "Iterations" (0 to 200), y-axis "Amount of Inversions in %" (0 to 100). Legend: methods Random, Squarified, SliceAndDice, Voronoi; boxplots with whiskers, quartiles, and median.]

Figure A.3.: Coordinate-Distribution: uniform, Weight-Distribution: few important (FI). a) neighbourhood preservation, b) inversions in x-direction, c) inversions in y-direction


Figure A.4.: Coordinate-Distribution: cluster, Weight-Distribution: few important (FI). a) neighbourhood preservation, b) inversions in x-direction, c) inversions in y-direction


Figure A.5.: Measurement of the aspect ratio for the different proposed initialisations. Coordinate-Distribution/Weight-Distribution: a) uniform/uniform, b) uniform/few important, c) cluster/uniform, d) cluster/few important


B. Visualization Examples


Result of the visualisation for the search query "Nat"

List of Figures

1.1. Normal search interface
1.2. Normal graph and hierarchically clustered graph
2.1. Websom: example which uses self-organizing maps to mark important result regions
2.2. SPIRE uses self-organizing maps to create maps which consider document similarities. Different attributes can be mapped to the visual properties.
2.3. Cat-A-Cone visualization of the hierarchy by using cone trees
2.4. Hyperbolic Browser which changes the focus on the queried parts of the hierarchy
2.5. ResultMaps which visualizes the document space and the corresponding results for a query by using a Squarified Treemap
2.6. InfoSky: Visualization where the results of a search query are highlighted in a hierarchical search space representation by using yellow markers
2.7. Information Pyramids: Hierarchy grows in the third dimension which results in a pyramidal representation of the hierarchy
2.8. Webmap which creates zoomable regions to visualize the corresponding hierarchy. The regions can get quite complex and thus hard to recognize
3.1. Small hierarchy and its treemap. The nodes are represented by nesting rectangles which preserves the hierarchy relations.
3.2. Rectangular Treemap and Voronoi Treemap
3.3. Centroidal Voronoi Tessellation with and without MDS
3.4. Hierarchical Edge Bundling, (a) control polygon with LCA (Least Common Ancestor), (b) control polygon without LCA
3.5. Representation of the hierarchy with nested polygons using blue color scheme
3.6. Using a darker color of the higher hierarchy for the border makes the contour visible even for objects which are too small and thus not sharp enough.
4.1. Diagram of the different preprocessing steps
4.2. (a) XML-Scheme of the desired input. (b) Example of a data set with two documents.
5.1. Representation of the layout steps
5.2. Anchoring example
5.3. Illustration of the scaled gradient projection for an iteration of the stress majorization
5.4. Voronoi Treemap of a software hierarchy
5.5. A hierarchy (a) which is used to create a Voronoi Treemap (d). Initial positions of the first hierarchy layer (blue nodes) are used to generate a Voronoi diagram (b)-(c). Each resulting region is used for the child nodes in the second layer.
5.6. Voronoi diagram of a set of sites in the plane. Outer cells are not bounded.
5.7. Connecting the half-lines of a Voronoi diagram to a vertex, which lies in infinity, to get a connected planar Graph with linear edges in n
5.8. (a) Voronoi diagram with initial sites. (b) Voronoi diagram after 50 iterations with Lloyd's method. The move of the sites is described by the points.
5.9. Example of iterative Centroidal Voronoi tessellation. Note that the numbers are for identification. Site 3 has much higher weight, that is why its weight (radius) is increased. Last diagram shows the situation after 100 iterations.
5.10. Voronoi diagram (a) changes after the sweep-line has passed (b). It is not possible to guarantee that everything behind the sweep-line has been computed correctly.
5.11. Mapping function ∗p(l) for point p where l is a bisector of sites p and q
5.12. Two degenerate cases which are implicitly handled by the Fortune Algorithm. (a) cocircular sites which cause zero-length edges, (b) collinear sites
5.13. Break point of two sites and the sweep-line
5.14. Intersection of two bisectors is the midpoint of tangent circle to the given circles Cp1, Cp2 and Cp3.
5.15. Voronoi diagram of 6 weighted sites. The red bisectors are given for site 3. They have to be clipped with the bounding box and connected such that they have a closed simple form.
5.16. Representing a hyperbola section as rational quadratic Bézier curve with points b0, b1 and b2. v0 and v2 are tangents to b0 and b2.
5.17. Solving overlapping by spring embedder like technique. Repulsive forces are introduced between overlapping nodes. Overlap is recognized if the euclidean distance between pa and pb is smaller than the sum of their radii.
5.18. Bundling by using the position of the nodes in the path between start node (P0) and end node (P4)
5.19. (a) Cubic Bézier curve with control polygon defined by P0, ..., P3. (b) Construction of point on curve for t = 0.5 using the de Casteljau Algorithm.
5.20. Straightening the control polygon to control the bundling strength with β ∈ [0,1].
5.21. a) Hierarchical Edge Bundling with the LCA as part of the control polygon. b) Hierarchical Edge Bundling without LCA. (β = 0.8)
5.22. Two stability measures for Lloyd's method. Created inversions and preserved neighbours
5.23. Boxplot of aspect ratio for two different initialisation schemes. (a) Coordinate Distribution: Clustered, Weight Distribution: FewImportant, (b) Coordinate Distribution: Clustered, Weight Distribution: Uniform
6.1. Scheme of the hierarchical data: news from different newspapers which are hierarchically structured by different categories.
6.2. (a) Overview of the hierarchical structure, (b) Result for search query "Krise"
A.1. Cluster-Distribution: uniform, Weight-Distribution: uniform. a) neighbourhood preservation, b) inversions in x-direction, c) inversions in y-direction
A.2. Coordinate-Distribution: uniform, Weight-Distribution: few important (FI). a) neighbourhood preservation, b) inversions in x-direction, c) inversions in y-direction
A.3. Coordinate-Distribution: uniform, Weight-Distribution: few important (FI). a) neighbourhood preservation, b) inversions in x-direction, c) inversions in y-direction
A.4. Coordinate-Distribution: cluster, Weight-Distribution: few important (FI). a) neighbourhood preservation, b) inversions in x-direction, c) inversions in y-direction
A.5. Measurement of the aspect ratio for the different proposed initialisations. Coordinate-Distribution/Weight-Distribution: a) uniform/uniform, b) uniform/few important, c) cluster/uniform, d) cluster/few important

List of Algorithms

1. Computation of relative positions
2. Stress Majorization with region constraints
3. Monte Carlo Centroidal weighted Voronoi diagram Computation
4. Computation of V(s)
5. Computation of additive weighted V(s)
6. Connect Segments
7. Analytical Centroidal weighted Voronoi diagram Computation
8. Voronoi Treemap Computation for HCG
