More-Than-Text Visualization

Text visualizations are applied to large corpora of documents but often do not consider the value of the additional information given by images. Both types of content have different properties and fulfill different functions. Text is capable of describing thoughts and feelings as universally as no other representation. It allows the description of abstract concepts such as "freedom" or "god" [Naj98]. Text is read sequentially, which makes it a good transmitter of sequential data, while its processing of parallel data is slow. A text's universality is limited by language dependencies. In contrast, images are processed in a highly parallel fashion and they can be language independent (think of photographs). While they allow a very detailed description of a fact or a scene, abstract concepts require commonly known representations. Figure 1.3 gives examples of images in which abstract concepts require shared context knowledge among observers to be readable. The circles around the heads in the Madonna painting can be interpreted as a sign of holiness by people who know Christian symbolism.

The formula (a pictogram) depicting that for all x there is at least one y that fulfills y = x + 1 requires mathematical symbolic conventions. On the other hand, an image like the photograph in Figure 1.3 is a good example of how richly images describe scenes in very compact form. Conveying the whole information (including, e.g., the blue car) in text would require considerable effort and space. Both the photograph and the Madonna image exemplify that pictures are good transmitters of empathy.
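A LaTeX reconstruction of the formula described above; this is an assumption based on the description, and the exact typesetting in the figure may differ:

```latex
\forall x \;\exists y : \; y = x + 1
```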

(Figure 1.3 panels: concept holiness; concept infinity; photograph)

Figure 1.3: Examples of figures depicting various concepts: Benois Madonna (∼1478, Leonardo da Vinci) on the left visualizes the concept of holiness by the commonly shared symbol of a halo; the formula top right requires common knowledge of mathematical symbols to be interpretable. The photograph exemplifies how expressive images can be in a limited space. (License: Wikimedia Commons Public Domain)

The following selection of statements on images and texts has largely been collected from Colin Ware's book [War04] and from Strothotte and Strothotte [SS97]:

• Images perform better at showing structural relationships. Bartram [Bar80] examined journey planning for bus rides and found that graphical representations worked better than tables.

• Visual information is generally remembered better than verbal information, but not for abstract images [BKD75].

• Text is better than graphics for conveying program logic [War04].

• Verbs are awkward to express in presentational pictures [SS97].

• Presentational pictures are good for communicating structural information [SS97]. Without this agreement, information visualization would not be beneficial.

Especially important in this context are two references. The first, from Cleveland and McGill [CM84], concerns pre-attentive features and how precisely we can perceive visual features before really drawing attention to them. As a second reference, Gestalt theory describes laws that affect human perception when looking at spatial arrangements of visual items [Wer23].

After reflecting on aspects that separate images and texts, we now search for evidence that their use in combination can be of benefit. Bieger and Glock [BG86] observed that providing pictorial context for an assembly task reduced assembly time and slightly increased correctness. Bock [Boc78] conducted experiments on the detection and processing of ambiguous words and sentences. As a result, "the subject's awareness of sentence ambiguity, and hence the depth of semantic analysis, was found to depend on the pictorial context in which the sentences were presented. The pictorial context was also found to affect the depth of processing of unambiguous sentences, which, when presented without a picture, were more time-consuming in comprehension and less well recalled than when preceded by a picture". These findings indicate the potential of using images as contextual information. A good example of the intuitive use of images to disambiguate texts is given by Collins et al. [CP06]. For a machine translation scenario, they use Flickr³ images as word representatives if the measured word ambiguity between two languages is too high (a rough sketch of this rule follows below). A classical example of integrating graphics into text is a drawing by Oliver Byrne (1847) visualizing the proof of the Pythagorean theorem (see Figure 1.4).
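As an illustration of that disambiguation rule, here is a minimal Python sketch; the lexicon, threshold, and image-lookup function are hypothetical stand-ins for illustration, not taken from Collins et al. [CP06]:

```python
# Sketch of the idea: if a source word maps to too many target-language
# translations, attach an image as a visual anchor for the intended sense.

AMBIGUITY_THRESHOLD = 2  # assumed cutoff, not from the paper

# Toy bilingual lexicon: source word -> possible target translations.
LEXICON = {
    "bank": ["Bank", "Ufer"],  # financial institution vs. river bank
    "tree": ["Baum"],
}

def fetch_flickr_thumbnail(word: str) -> str:
    """Hypothetical image lookup; a real system would query the Flickr API."""
    return f"https://example.org/flickr-thumb/{word}.jpg"

def annotate(word: str) -> dict:
    translations = LEXICON.get(word, [])
    ambiguous = len(translations) >= AMBIGUITY_THRESHOLD
    return {
        "word": word,
        "translations": translations,
        # Only ambiguous words get an image as visual context.
        "image": fetch_flickr_thumbnail(word) if ambiguous else None,
    }

if __name__ == "__main__":
    for w in ["bank", "tree"]:
        print(annotate(w))  # "bank" receives an image, "tree" does not
```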

Before discussing related work, we take a look at how images and texts are linked in corpora of mixed documents. During the analysis of such a corpus, we can build a network of images and terms, e.g., via reference decomposition.

When creating such a bipartite concept graph, it is hard to detect what acts as a concept and what are its members. For example, the photograph in Figure 1.3 could be a concept for its members "child" and "netherlands", while "child" could itself be a concept for multiple images showing children. This problem is known in the literature; for further reading, see the PhD thesis of Tobias Kötter [Kö12] or Kötter and Berthold [KB12]. A toy version of such a graph is sketched below. Adopting methods for visualizing this duality of images and texts is interesting future work.
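A minimal Python sketch of the bipartite image-term graph, with hard-coded example links mirroring the Figure 1.3 example; the degree heuristic for guessing concept roles is an assumption made for illustration, not a method from the cited works:

```python
# Bipartite image-term graph; edges would normally come from reference
# decomposition, here they are a hard-coded toy list.
from collections import defaultdict

# (image, term) links, e.g. extracted from figure references in documents.
links = [
    ("photo_1.jpg", "child"),
    ("photo_1.jpg", "netherlands"),
    ("photo_2.jpg", "child"),
    ("photo_3.jpg", "child"),
]

# Count how often each node appears (its degree in the bipartite graph).
term_degree = defaultdict(int)
image_degree = defaultdict(int)
for image, term in links:
    image_degree[image] += 1
    term_degree[term] += 1

# Naive heuristic: for each edge, treat the endpoint with the higher degree
# as the concept and the other endpoint as one of its members. Deciding
# these roles properly is the open problem discussed above.
for image, term in links:
    if term_degree[term] > image_degree[image]:
        print(f'"{term}" acts as concept, member: {image}')
    else:
        print(f'{image} acts as concept, member: "{term}"')
```

On this toy data the heuristic reproduces the duality from the text: photo_1.jpg acts as concept for "netherlands", while "child" acts as concept for all three photographs.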

³ http://www.flickr.com


Figure 1.4: Visual proof of the Pythagorean theorem in "The First Six Books of The Elements of Euclid" (1847, Oliver Byrne). (License: Wikimedia Commons Public Domain)

Related work

Systems that use figurative and textual content in combination are often dominated by one type. As examples of image-dominated systems, we consider image search engines (like Flickr), image annotation systems (like CATMAID [SCHT09]), or the image tagging and commenting functions of social networks (like Facebook or imgur⁴). Text-dominated systems have been discussed as text visualization systems in Section 1.1.3. We want to focus on integrated visualizations.

The BioText search engine from Hearst et al. [HDG+07] integrates abstracts, titles, and figures in a single list of search results. Pafilis et al. [POJ+09] developed the reflect⁵ system, which tags names of genes, proteins, or small molecules on biological websites. Clicking on an annotation presents a concise summary of image and text information on the specific entity. Jhaveri and Räihä [JR05] describe Session Highlights, a method that represents webpage thumbnails as a user-selective browser history. Rendering page previews has since become an essential part of web browsers and search engines. These methods are not selective with respect to content and do not summarize; they just display "first pages". More selective are Marian Dörk's projects Visual Backchannel [DGWC10], which integrates text and images for following communication structure, and his display of linked VisGets [DCCW08]. We discuss advanced approaches to document summarization further in Chapter 2.

We conclude with the WordsEye approach of Coyne and Sproat [CS01], which converts textual descriptions into 3D scenes. After linguistic analysis, 3D models are queried from a large model database and model attributes are depicted. The resulting images are simplistically iconic. For the 3D construction, assumptions have to be made that are not part of the original text and can therefore induce false information (analogous to pandoleia).

⁴ http://www.facebook.com or http://www.imgur.com

⁵ http://reflect.ws