
3. Information Visualization

3.3. State of the Art: Visualization Ideas, Metaphors, Techniques, Components and Systems

3.3.3. Components

3.3.3.1. Visualization of queries or query attributes

In the AI-STARS system, [Anick, Brennan, Flynn et al. 1990] used a component called “Query Reformulation Workspace” to visualize Boolean queries automatically derived from natural language queries. The ascertained citation forms are laid out as tiles in a two-dimensional arrangement that represents the Boolean query with its “AND” and “OR” conditions. The system carries out automatic operations on the query, such as the identification of noise words or meaningful phrases, and the results of these operations are visualized as well. Figure 25 shows the Boolean query “(‘copy’ AND ‘BACKUP saveset’ AND ‘tape’ AND (‘v.5.0’ OR ‘version 5.0’))” automatically derived from the natural language query “copying backup savesets from tape under v5.0”. The example of [Anick, Brennan, Flynn et al. 1990] is based on a database with technical information for customer support specialists. For the WebViz-example, the query could be “((‘visualization’ OR ‘visualisation’) AND ‘search’ AND ‘results’ AND (‘www’ OR ‘internet’))”, automatically derived from the natural language query “Visualization of Search Results from the World Wide Web”. The black tiles represent the query. The white tiles represent citation forms that were detected but not automatically selected by the system. Clicking on a tile toggles its selection. Additionally, there are a number of other functions, such as changing Boolean operators by moving tiles to other columns, or requesting a window with related terms to expand or change the query. The related terms are grouped into phrases containing the term, synonyms, conceptually related terms, and compound terms. The number in the lower left corner of each tile shows the number of postings of that term.

[Figure 25 panels: tile layouts for the queries „copying backup savesets from tape under v5.0“ and „Visualization of Search Results from the World Wide Web“]

Figure 25: Principle of the Query Reformulation Workspace used in the AI-STARS system by [Anick, Brennan, Flynn et al. 1990]
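The tile layout can be read as a small data structure. The following Python sketch is only an illustration under assumptions that [Anick, Brennan, Flynn et al. 1990] do not state in this form: selected tiles stacked in one column are ORed, the non-empty columns are ANDed, and white (unselected) tiles are ignored. The function names and the unselected tile “web” are hypothetical.

```python
from typing import List, Tuple

# Hypothetical tile model: (citation form, selected?). Assumption: tiles in one
# column are ORed, columns are ANDed, unselected (white) tiles are ignored.
Tile = Tuple[str, bool]


def column_to_clause(column: List[Tile]) -> str:
    """OR together the selected tiles of one column; return '' if none is selected."""
    terms = [f"'{term}'" for term, selected in column if selected]
    if len(terms) <= 1:
        return "".join(terms)
    return "(" + " OR ".join(terms) + ")"


def workspace_to_query(columns: List[List[Tile]]) -> str:
    """AND together the clauses of all columns that contain a selected tile."""
    clauses = [column_to_clause(col) for col in columns]
    return " AND ".join(c for c in clauses if c)


def toggle(columns: List[List[Tile]], col: int, row: int) -> None:
    """Clicking a tile flips its selection state (black <-> white)."""
    term, selected = columns[col][row]
    columns[col][row] = (term, not selected)


webviz = [
    [("visualization", True), ("visualisation", True)],
    [("search", True)],
    [("results", True)],
    [("www", True), ("internet", True)],
    [("web", False)],  # detected citation form, not selected (white tile)
]
print(workspace_to_query(webviz))
# ('visualization' OR 'visualisation') AND 'search' AND 'results' AND ('www' OR 'internet')
```

Calling `toggle(webviz, 4, 0)` turns the white tile black, and the query then ends with `AND 'web'`.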

As described on page 58 in the discussion of the water flow metaphor, [Shneiderman 1991] / [Young, Shneiderman 1993] introduced a component called Filter/Flow to overcome known problems with the formulation of Boolean queries. The filters let through only the appropriate documents, and the pipe layout determines whether a relationship is an “AND” or an “OR”. The left part of Figure 26 shows a simplified version of the complex query in Figure 5 of [Young, Shneiderman 1993]. The example uses an employee database. The task is to find the accountants or engineers from Georgia who are managed by Elisabeth, or the clerks from Georgia who make more than thirty thousand dollars per year. The right part of Figure 26 transfers the principle to the visualization of Web search results. Assuming a result set has already been found for the WebViz-example, the task is to filter English or German documents that are mixed linklists from academic servers, or highly relevant English or German documents of all types except framesets.

[Figure 26 panels: left, Filter/Flow for the employee-database query; right, Filter/Flow for the WebViz query]

Figure 26: Principle of Filter/Flow component by [Shneiderman 1991], [Young, Shneiderman 1993]
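Read as Boolean logic, a Filter/Flow diagram is an OR over parallel pipes, each of which is an AND over the filters placed in series. The Python sketch below assumes exactly this reading and uses invented employee records; it illustrates only the semantics of the employee query, not the graphical layout of [Young, Shneiderman 1993].

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Employee:  # hypothetical record layout
    name: str
    job: str
    state: str
    manager: str
    salary: int


Filter = Callable[[Employee], bool]


def run_flow(branches: List[List[Filter]], items: Iterable[Employee]) -> List[Employee]:
    """An item must pass every filter of one branch in series (AND);
    the outputs of the parallel branches are merged (OR)."""
    return [x for x in items if any(all(f(x) for f in branch) for branch in branches)]


flow = [
    [lambda e: e.job in ("accountant", "engineer"),
     lambda e: e.state == "GA",
     lambda e: e.manager == "Elisabeth"],
    [lambda e: e.job == "clerk",
     lambda e: e.state == "GA",
     lambda e: e.salary > 30_000],
]
staff = [  # invented sample data
    Employee("Ann", "accountant", "GA", "Elisabeth", 45_000),
    Employee("Bob", "engineer", "GA", "Tom", 60_000),
    Employee("Cid", "clerk", "GA", "Tom", 32_000),
    Employee("Dee", "clerk", "GA", "Elisabeth", 28_000),
    Employee("Eve", "engineer", "TX", "Elisabeth", 70_000),
]
print([e.name for e in run_flow(flow, staff)])  # ['Ann', 'Cid']
```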

Another visualization of the filter effect of a query has been proposed by [Fishkin, Stone 1995] with the Movable Filters or Magic Lens Filters, already mentioned on page 55. The component allows compound queries to be constructed by overlapping lenses. The left side of Figure 27 shows an example taken from [Fishkin, Stone 1995], where two filters are combined in an “AND” condition on a map with symbols for cities. One filter lets only cities with low taxes pass, the other only cities with high salaries. Gray rectangles passed both filters; white rectangles do not fulfill the conditions of the query. The example thus shows cities with high salaries AND low taxes in gray. The threshold values can be changed with a control, and buttons change the modes of the filters. Besides “AND” and “OR”, there is a SELF option that switches to a mode in which only the effect of this filter is shown, and a NOP option that switches off the effect of this filter. A number of other options are described in [Fishkin, Stone 1995], including real-valued filters, a magic lens filter that shows cities for which no values are available, and a callout lens mechanism. Real-valued filters show not only the presence of a feature but also its value, by filling the symbol more or less with a color according to the real value. The callout lens allows the user to explore “clumps”, where a number of icons are close to each other or overlap. The callout lens displays the items in the form of a list and therefore allows easy and detailed inspection. The list includes the icons from the scatterplot. These icons are active in the callout lens and can, for example, be filtered by overlapping additional lenses. The right side of Figure 27 transfers the general lens principle to the visualization of Web search results. The previously established result set from the WebViz-example serves as the basis. Every document is represented by a rectangle in a scatterplot. The language of the document is shown on the x-axis, and the number of documents in the language category on the y-axis. Two filters are combined in an AND condition: one for relevance, which lets only documents with a high relevance score pass, and one for document type, which lets only documents without framesets pass. Therefore, only non-frameset documents with high relevance are marked gray.

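[Figure 27 panels: left, “Average annual pay, 1991” from [Fishkin, Stone 1995]; right, WebViz scatterplot with the axes LANGUAGE (English, French, German, Italian, Spanish) and NUMBER, filtered by “High relevance AND no frameset”]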

Figure 27: Principle of Movable Filters / Magic Lens Filters by [Fishkin, Stone 1995]
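The mode buttons of a lens stack can be mimicked by folding the lenses from bottom to top. The Python sketch below rests on simplified, assumed semantics rather than the implementation of [Fishkin, Stone 1995]: each lens combines its own predicate with the result of the lenses beneath it, SELF ignores everything beneath, NOP passes the result through unchanged, and an item covered by no active lens is shown. The document fields and the 0.8 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Doc = Dict[str, object]


@dataclass
class Lens:
    predicate: Callable[[Doc], bool]
    mode: str = "AND"  # "AND", "OR", "SELF" or "NOP"


def passes(doc: Doc, stack: List[Lens]) -> bool:
    """Fold the lens stack from bottom to top (simplified, assumed semantics)."""
    result: Optional[bool] = None
    for lens in stack:
        if lens.mode == "NOP":  # lens switched off
            continue
        value = lens.predicate(doc)
        if result is None or lens.mode == "SELF":
            result = value  # first active lens, or SELF ignores the lenses below
        elif lens.mode == "AND":
            result = result and value
        elif lens.mode == "OR":
            result = result or value
        else:
            raise ValueError(f"unknown mode {lens.mode!r}")
    return True if result is None else result


HIGH_RELEVANCE = 0.8  # hypothetical position of the threshold control
docs = [  # invented sample documents
    {"id": 1, "language": "English", "relevance": 0.90, "type": "linklist"},
    {"id": 2, "language": "German", "relevance": 0.95, "type": "frameset"},
    {"id": 3, "language": "French", "relevance": 0.40, "type": "article"},
    {"id": 4, "language": "Spanish", "relevance": 0.85, "type": "article"},
]
stack = [
    Lens(lambda d: d["relevance"] >= HIGH_RELEVANCE),
    Lens(lambda d: d["type"] != "frameset", "AND"),
]
print([d["id"] for d in docs if passes(d, stack)])  # [1, 4] are drawn gray
```

Switching the second lens to SELF or NOP changes the gray set to [1, 3, 4] or [1, 2, 4] respectively.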

Venn diagrams have been used in a number of cases to represent Boolean queries. One recent example is their use in the TeSS prototype by [Hertzum, Frøkjær 1996]. A good overview of Venn diagrams can be found in [Jones 1998]. Simple Venn diagrams can handle two or at most three keywords. Figure 28 shows the principle of Venn diagrams for part of the result set of the WebViz-example. Starting in the upper left corner, the blue circle represents 18 documents retrieved by ‘(visualization OR visualisation) AND (NOT (search OR results))’. The intersection of the two upper circles shows the 8 documents retrieved by ‘(visualization OR visualisation) AND search AND (NOT results)’. The intersection of all three circles contains the 32 documents retrieved by ‘(visualization OR visualisation) AND search AND results’.


Figure 28: Venn diagram for the concepts (visualization OR visualisation), search, results.
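Each region of a Venn diagram corresponds to the conjunction of the concepts inside it with the negation of the concepts outside it. A minimal Python sketch with invented document IDs counts the documents per region and prints the equivalent Boolean expression in the style used above; the concept “visualization” stands for ‘(visualization OR visualisation)’.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def venn_regions(sets: Dict[str, Set[int]]) -> Counter:
    """Map each non-empty region (the concepts a document matches) to its size."""
    names = list(sets)
    universe = set().union(*sets.values())
    return Counter(tuple(n for n in names if doc in sets[n]) for doc in universe)


def region_query(region: Tuple[str, ...], names: List[str]) -> str:
    """Boolean expression that retrieves exactly the documents of one region."""
    inside = " AND ".join(region)
    outside = [n for n in names if n not in region]
    if not outside:
        return inside
    if len(outside) == 1:
        return f"{inside} AND (NOT {outside[0]})"
    return f"{inside} AND (NOT ({' OR '.join(outside)}))"


concepts = {  # invented document IDs
    "visualization": {1, 2, 3, 4, 5},
    "search": {3, 4, 5, 6},
    "results": {5, 6, 7},
}
for region, n in sorted(venn_regions(concepts).items()):
    print(n, region_query(region, list(concepts)))
```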

[Jones 1998] integrated Venn diagrams into a query workspace in the VQuery interface to support users more flexibly when working with this type of visualization. Figure 29 illustrates this with the WebViz-example. Six keywords are spread over the workspace. The currently active query, represented by the gray rectangle, includes three of them: ‘(visualization AND search) OR results’. Part of the workspace is a text field in which the system presents an English-language interpretation of the graphically constructed active query. Besides “AND” and “OR”, the system also supports a NOT operator, but complex queries cannot be constructed.

[Figure 29 panel: active query with the interpretation “Search for any documents containing either visualization and search; or results”]

Figure 29: Principle of the Query workspace with Venn Diagrams in the VQuery system by [Jones 1998], [Jones 1998a]
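The English interpretation in the text field can be produced by a recursive walk over the query tree. The wording rules below are a hypothetical reconstruction from the single example in Figure 29, not the rules used by VQuery; the second function evaluates the same tree against the keyword set of a document.

```python
from typing import Set, Tuple, Union

Query = Union[str, Tuple]  # a keyword, or (operator, operand, ...)


def describe(q: Query) -> str:
    """Hypothetical phrasing rules reconstructed from one example."""
    if isinstance(q, str):
        return q
    op, *args = q
    parts = [describe(a) for a in args]
    if op == "AND":
        return " and ".join(parts)
    if op == "OR":
        return "either " + "; or ".join(parts)
    if op == "NOT":
        return "not " + parts[0]
    raise ValueError(f"unknown operator {op!r}")


def matches(q: Query, terms: Set[str]) -> bool:
    """Evaluate the Boolean query against the keywords of one document."""
    if isinstance(q, str):
        return q in terms
    op, *args = q
    if op == "AND":
        return all(matches(a, terms) for a in args)
    if op == "OR":
        return any(matches(a, terms) for a in args)
    if op == "NOT":
        return not matches(args[0], terms)
    raise ValueError(f"unknown operator {op!r}")


active = ("OR", ("AND", "visualization", "search"), "results")
sentence = "Search for any documents containing " + describe(active)
print(sentence)
# Search for any documents containing either visualization and search; or results
```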

[Spoerri 1993], [Spoerri 1993a] introduced the InfoCrystal, a query-visualization component that is also derived from Venn diagrams. The InfoCrystal can be used both as a visualization tool and as a visual query language. Spoerri describes its use for Boolean and for vector-space queries, and different modes such as simple queries, or complex queries using a block-building mode. The layout inside an InfoCrystal can be a rank layout or a bull’s-eye layout. Information is coded in shape, proximity, rank, orientation, and color or texture; in special cases, size, or brightness and saturation coding is used. Figure 30 shows an InfoCrystal for the WebViz-example. It is a simple query in rank layout with color coding. The number in an icon shows the number of documents satisfying the conditions the icon represents. Starting in the upper left corner, the blue circle represents 64 documents retrieved by ‘visualization OR visualisation’. The next blue circle shows one document retrieved by ‘(visualization OR visualisation) AND (NOT (search OR results OR www OR internet))’. The rectangle with a blue and a green end stands for 18 documents retrieved by ‘(visualization OR visualisation) AND (www OR internet) AND (NOT (search OR results))’. The triangle with blue, red, and light green sides stands for 2 documents retrieved by ‘(visualization OR visualisation) AND search AND results AND (NOT (www OR internet))’. The rhombus in the middle stands for 27 documents retrieved by ‘(visualization OR visualisation) AND (www OR internet) AND search AND results’. Icons that represent two diagonally opposite concepts are shown twice. When specific relevance weights are assigned to concepts, the bull’s-eye layout can be used instead of the rank layout. In this mode, symbols for relationships with a higher relevance score are placed closer to the center of the InfoCrystal.


Figure 30: Principle of the InfoCrystal by [Spoerri 1993], [Spoerri 1993a]
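In rank layout, every non-empty combination of the n concepts gets one icon, and its rank (the number of concepts it combines) determines how close to the center it is placed. The Python sketch below uses invented documents and an assumed ring numbering (ring 0 = center); it computes the distinct count for every icon and, separately, the non-distinct count that the InfoCrystal also shows for single concepts.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def infocrystal(doc_concepts: Dict[int, Set[str]],
                concepts: List[str]) -> Dict[FrozenSet[str], Tuple[int, int]]:
    """Return {icon: (ring, distinct count)} for every non-empty concept subset.
    Distinct: the document matches exactly these concepts and none of the others.
    Ring 0 is the center (all concepts); ring n-1 holds the single concepts."""
    exact = Counter(frozenset(c & set(concepts)) for c in doc_concepts.values())
    n = len(concepts)
    icons: Dict[FrozenSet[str], Tuple[int, int]] = {}
    for rank in range(1, n + 1):
        for subset in combinations(concepts, rank):
            key = frozenset(subset)
            icons[key] = (n - rank, exact[key])
    return icons


def non_distinct(doc_concepts: Dict[int, Set[str]], concept: str) -> int:
    """Number of documents matching a concept, regardless of the other concepts."""
    return sum(concept in c for c in doc_concepts.values())


concepts = ["visualization", "search", "results", "internet"]
docs = {  # invented documents, already mapped to concepts
    1: {"visualization"},
    2: {"visualization", "internet"},
    3: {"visualization", "search", "results"},
    4: {"visualization", "search", "results", "internet"},
    5: {"search"},
}
icons = infocrystal(docs, concepts)
print(len(icons))                  # 15 icons = 2**4 - 1
print(icons[frozenset(concepts)])  # (0, 1): center icon, one document
```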

[Bürdek, Eibl, Krause 1999], [Eibl 1999] re-implemented the InfoCrystal and encountered a number of problems when using it. Suspecting that the coding of the InfoCrystal is too manifold and its presentation too complex, they developed a new visualization. One of their motivations was the need for reorientation when additional keywords are added to the InfoCrystal. The principle of their solution is shown in Figure 31 using the WebViz-example. The visualization has two basic elements: the entry fields on the left side and the resulting document sets on the right side. Keywords or field-based restrictions can be entered in the entry fields; fields can be selected by clicking on the “T” [Eibl 1999]. Keywords under one bracket are “ORed”, and the brackets themselves are “ANDed”. Figure 31 therefore shows the query ‘(visualization OR visualisation) AND search AND results AND (www OR internet)’. The resulting document sets are displayed on the right side. The rightmost column shows the result of the complete query, combining all four brackets. The other columns show all possible combinations of two or three brackets in distinct mode.

[Eibl 1999] reports that six of the eight users interviewed preferred distinct mode, in which the first set stands for ‘(visualization OR visualisation) AND search AND (NOT (results OR (www OR internet)))’. In non-distinct mode, it would have been ‘(visualization OR visualisation) AND search’. A careful examination of Figure 1 in [Bürdek, Eibl, Krause 1999] and Figures 1c/d in [Eibl 1999] reveals that distinct mode is used for the columns containing more than one concept, but not for the entry brackets, where just one concept is shown. Eibl et al. do not mention this point. Figure 31 uses the same mechanism as the authors. To be fully consistent with distinct mode, the numbers of documents in the first row would have to be 1, 3, 1, and 14 instead of 64, 151, 114, and 200. The InfoCrystal shows both numbers, distinct and non-distinct, for single concepts.

Apart from this minor inconsistency, which could be avoided by showing both numbers, the “Bracket”-visualization has a number of features that are important for improving usability. [Eibl 1999a] reports, for example, that instead of interpreting the colors, users very quickly adopted the feature that dims the corresponding query brackets when the mouse crosses a result set⁹¹. Like the InfoCrystal, the visualization by [Bürdek, Eibl, Krause 1999], [Eibl 1999] has a number of additional features beyond simple Boolean retrieval. Among them are support for probabilistic retrieval, using the horizontal position of the bracket groups, and support for vague retrieval.

[Figure 31 panel: entry bracket “visualization (58), visualisation (7)”]

Figure 31: Principle⁹² of the “Bracket”-visualization by [Bürdek, Eibl, Krause 1999], [Eibl 1999]
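The difference between distinct and non-distinct mode can be stated precisely: in non-distinct mode a column counts the documents that satisfy all of its brackets, in distinct mode only those that satisfy exactly these brackets and none of the others. A minimal Python sketch with invented documents follows; the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def bracket_columns(doc_concepts: Dict[int, Set[str]], brackets: List[str],
                    distinct: bool = True) -> Dict[Tuple[str, ...], int]:
    """Count documents for every combination of (ANDed) brackets."""
    all_brackets = set(brackets)
    counts: Dict[Tuple[str, ...], int] = {}
    for r in range(1, len(brackets) + 1):
        for combo in combinations(brackets, r):
            chosen = set(combo)
            if distinct:  # exactly these brackets, none of the others
                hits = [d for d in doc_concepts.values() if (d & all_brackets) == chosen]
            else:         # at least these brackets
                hits = [d for d in doc_concepts.values() if chosen <= d]
            counts[combo] = len(hits)
    return counts


brackets = ["visualization", "search", "results", "internet"]
docs = {  # invented documents, already mapped to brackets
    1: {"visualization"},
    2: {"visualization", "internet"},
    3: {"visualization", "search", "results"},
    4: {"visualization", "search", "results", "internet"},
    5: {"search"},
}
dist = bracket_columns(docs, brackets)
nondist = bracket_columns(docs, brackets, distinct=False)
print(dist[("visualization",)], nondist[("visualization",)])  # 1 4
```

For the complete query (all brackets) both modes necessarily agree, so the rightmost column does not depend on the choice of mode.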

[Berenci, Carpineto, Giannini 1998] created the VIEWER (VIEws of WEb Results) system, in which a component also graphically shows the distribution of subqueries. The length of a bar indicates the size of the result set of each subquery. Clicking on a bar brings up the corresponding subset as a list in a second window. A careful analysis of Figure 1 of [Berenci, Carpineto, Giannini 1998] shows that, in contrast to the InfoCrystal and the “Bracket”-visualization, the subsets are shown in non-distinct mode. Figure 32 shows the principle using the WebViz-example.

[Figure 32 panel: bars for the subqueries, e.g. “internet” with 200 documents]

Figure 32: Principle of the Bargraph in the VIEWER system by [Berenci, Carpineto, Giannini 1998]
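Because VIEWER shows subqueries in non-distinct mode, each bar simply counts the documents that contain all terms of its subquery. The Python sketch below renders such bars as text lines; the scaling to a fixed width, the function names, and the sample documents are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def subquery_sizes(doc_terms: Dict[int, Set[str]],
                   terms: List[str]) -> Dict[Tuple[str, ...], int]:
    """Non-distinct size of every non-empty subquery (its terms ANDed)."""
    sizes: Dict[Tuple[str, ...], int] = {}
    for r in range(1, len(terms) + 1):
        for combo in combinations(terms, r):
            sizes[combo] = sum(set(combo) <= d for d in doc_terms.values())
    return sizes


def render_bars(sizes: Dict[Tuple[str, ...], int], width: int = 20) -> List[str]:
    """One text line per subquery; bar length is proportional to the result size."""
    longest = max(sizes.values(), default=0) or 1  # avoid division by zero
    lines = []
    for combo, n in sizes.items():
        bar = "#" * round(width * n / longest)
        lines.append(f"{' AND '.join(combo):<45} {bar} {n}")
    return lines


terms = ["visualization", "search", "results", "internet"]
docs = {  # invented documents
    1: {"visualization"},
    2: {"visualization", "internet"},
    3: {"visualization", "search", "results"},
    4: {"visualization", "search", "results", "internet"},
    5: {"search"},
}
sizes = subquery_sizes(docs, terms)
for line in render_bars(sizes):
    print(line)
```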

[Cugini, Laskowski, Piatko 1998] used a component called Concept Control in the NIRVE system to allow the user to map keywords to concepts. The component was later renamed Keyword-Concept Matrix [Cugini, Laskowski, Sebrechts 2000]. Besides grouping keywords into comprehensive concepts, the Keyword-Concept Matrix also allows a “weight”, or importance, to be assigned to each of the resulting concepts. Each concept has its own color attribute. Figure 33 shows the principle using the WebViz-example. The keywords “visualization” and “visualisation” are mapped to the concept “visualization”, “www” and “internet” are mapped to “internet”, and “search” and “results” are mapped to themselves. The concepts “visualization”, “search”, and “internet” are important and therefore received a high weight value; “results” is marked as less important. In later versions of the NIRVE system [Cugini, Laskowski, Sebrechts 2000], the authors used an interactive legend instead of the matrix. The concepts are shown in a row with the corresponding keywords beneath them. The mapping can be changed by drag and drop, and an extra column is reserved for unused keywords. The lower part of Figure 34 shows the principle of the interactive legend.

91 It would seem more logical to highlight the brackets. Nevertheless, dimming appeared to have the expected effect.

92 The original uses a black background and is visually optimized using interface and media design principles that are not used for this reproduction. In [Eibl 1999] a “T”-symbol is used for field selection, in [Bürdek, Eibl, Krause 1999] a triangle is used for the same purpose.

Figure 33: Principle of the Keyword-Concept Matrix or Concept Control used in the NIRVE system by [Cugini, Laskowski, Piatko 1998], [Cugini, Laskowski, Sebrechts 2000].
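The matrix amounts to a many-to-one mapping from keywords to concepts plus a weight per concept. The following Python sketch is a minimal illustration under assumptions: the numeric weights 3 (high) and 1 (low), the normalized score, and the function names are hypothetical, and NIRVE's actual relevance computation is not reproduced. Moving a keyword to the column for unused keywords in the interactive legend is modeled by mapping it to None.

```python
from typing import Dict, Iterable, Optional, Set

# Keyword -> concept mapping from the WebViz-example (None = unused column).
mapping: Dict[str, Optional[str]] = {
    "visualization": "visualization", "visualisation": "visualization",
    "www": "internet", "internet": "internet",
    "search": "search", "results": "results",
}
# Hypothetical weights: high importance = 3, low importance = 1.
weights = {"visualization": 3, "search": 3, "internet": 3, "results": 1}


def concepts_of(keywords: Iterable[str]) -> Set[str]:
    """Concepts present in a document, given the keywords it contains."""
    return {mapping[k] for k in keywords if mapping.get(k) is not None}


def weighted_score(keywords: Iterable[str]) -> float:
    """Share of the total concept weight covered by a document (assumed formula)."""
    total = sum(weights.values())
    return sum(weights[c] for c in concepts_of(keywords)) / total


def move_keyword(keyword: str, concept: Optional[str]) -> None:
    """Drag and drop in the interactive legend: remap a keyword (None = unused)."""
    mapping[keyword] = concept


doc = ["visualisation", "www", "search"]
print(sorted(concepts_of(doc)))  # ['internet', 'search', 'visualization']
print(weighted_score(doc))       # 0.9
```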

At a later stage in the development of the NIRVE system, starting with visualizations of interdocument similarities and document clusters, a so-called Concept Globe [Cugini, Laskowski, Sebrechts 2000] was added. By default it shows no single documents, only document clusters, together with the concept distribution and average relevance in each cluster, the number of documents in the cluster, and a number of other features. The primary design was a 3D globe, but the authors also experimented with 2.5D and 2D versions. The definition of a cluster is guided by previous user experience and is quite simple: all documents that have the same subset of concepts form a cluster. The clusters are visualized starting at the North Pole of the globe, or the upper end in the 2D version, with the cluster containing all keywords. The next row holds the clusters in which one of the concepts is missing, the row after that those in which two concepts are missing, and so forth. At the South Pole, or lower end in the 2D version, would be the cluster of documents in which all concepts are missing. Thus the number of concepts defines the „latitude“ of an icon representing a cluster. In the 3D version, the thickness of a cluster’s box represents the number of documents in the cluster.

In the 2D version, the height of a rectangle below the cluster icon indicates the same value. The presence or absence of colored bars indicates the presence or absence of concepts. Colored lines between the icons indicate concept differences between clusters. Neglecting the length of the bars, which indicates the average relevance of a concept for the documents in the cluster, and some other features not described here, the Concept Globe presents almost the same information as the InfoCrystal or the “Bracket”-visualization. Figure 34 shows the 2D principle using the result set from the WebViz-example.


Figure 34: Principle of the 2D Global View used in the NIRVE system by [Cugini, Laskowski, Sebrechts 2000]
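The cluster definition and the „latitude“ rule can be expressed in a few lines. The Python sketch below, with invented documents, groups documents by their exact concept subset and assigns each cluster a row equal to the number of missing concepts (row 0 = North Pole, row n = South Pole). Average relevance bars and concept-difference lines are omitted.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def globe_rows(doc_concepts: Dict[int, Set[str]],
               concepts: List[str]) -> Dict[int, Dict[FrozenSet[str], int]]:
    """Return {row: {cluster: number of documents}}; row = number of missing concepts."""
    clusters: Dict[FrozenSet[str], int] = defaultdict(int)
    for present in doc_concepts.values():
        clusters[frozenset(present & set(concepts))] += 1  # same subset = same cluster
    rows: Dict[int, Dict[FrozenSet[str], int]] = defaultdict(dict)
    for cluster, size in clusters.items():
        rows[len(concepts) - len(cluster)][cluster] = size
    return dict(rows)


concepts = ["visualization", "search", "results", "internet"]
docs = {  # invented documents, already mapped to concepts
    1: {"visualization"},
    2: {"visualization", "internet"},
    3: {"visualization", "search", "results"},
    4: {"visualization", "search", "results", "internet"},
    5: {"search"},
    6: set(),
}
rows = globe_rows(docs, concepts)
for row in sorted(rows):
    print(row, {tuple(sorted(c)): n for c, n in rows[row].items()})
```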

In the Query Reformulation Workspace by [Anick, Brennan, Flynn et al. 1990] we have seen some basic functionality to support query expansion. In addition, it offered a feature to look up related terms, grouped into phrases containing the term, synonyms, conceptually related terms, and compound terms. A number of other components support query expansion or refinement with more or less sophisticated visualizations. In general, these components show relations between keywords or concepts that are either stored in a thesaurus or computed on the fly by analyzing document sets. Examples of support for query formulation or expansion can be found in [Fowler, Fowler, Wilson 1991], [Fowler, Wilson, Fowler 1992], where the visualization of a network structure is used, or in [Veerasamy, Navathe 1995], where additional items from a thesaurus are listed below the entered keywords. Figure 35 shows the principle used by Fowler et al. for visualizing a query as a Request Map in the Information Navigator. The system uses statistically based associative structures, PFNETs, and a spring-based network layout algorithm, not only to visualize the query but also, combined with fisheye techniques, to visualize the documents and the concepts found in the document base.

[Figure 35 panel: network of the query terms visualization, search, results, www, and internet]

Figure 35: Principle of the Request Map by [Fowler, Fowler, Wilson 1991], [Fowler, Wilson, Fowler 1992]
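PFNETs are Pathfinder networks. The Python sketch below computes the common variant PFNET(r = ∞, q = n − 1), which keeps a direct link only if no indirect path between the two terms has a smaller maximum link weight. Whether Fowler et al. used exactly these parameters is not stated here, the term distances are invented, and the spring-based layout is not shown.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical dissimilarities between query terms (smaller = more strongly associated).
TERMS = ["visualization", "search", "results", "www", "internet"]
PAIRS: Dict[Tuple[str, str], float] = {
    ("visualization", "search"): 2, ("visualization", "results"): 4,
    ("visualization", "www"): 5, ("visualization", "internet"): 5,
    ("search", "results"): 1, ("search", "www"): 3, ("search", "internet"): 4,
    ("results", "www"): 6, ("results", "internet"): 6, ("www", "internet"): 1,
}


def distance_matrix(terms: List[str],
                    pairs: Dict[Tuple[str, str], float]) -> List[List[float]]:
    """Symmetric distance matrix; missing pairs are unconnected (infinite distance)."""
    index = {t: i for i, t in enumerate(terms)}
    n = len(terms)
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (a, b), w in pairs.items():
        dist[index[a]][index[b]] = dist[index[b]][index[a]] = w
    return dist


def pfnet(dist: List[List[float]]) -> List[Tuple[int, int]]:
    """PFNET(r = inf, q = n - 1): keep link i-j only if no path between i and j
    has a smaller maximum link weight (minimax variant of Floyd-Warshall)."""
    n = len(dist)
    minimax = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if math.isfinite(dist[i][j]) and dist[i][j] <= minimax[i][j]]


edges = pfnet(distance_matrix(TERMS, PAIRS))
print([(TERMS[i], TERMS[j]) for i, j in edges])
# [('visualization', 'search'), ('search', 'results'), ('search', 'www'), ('www', 'internet')]
```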

Graphically simpler is the approach by [Veerasamy, Navathe 1995] / [Veerasamy, Hudson, Navathe 1995] used in the Tkinq system⁹³. Additional items from a thesaurus are listed below the entered keywords. Dragging them into a positive or negative feedback box causes them to be included in the query: positive items are “ORed” with the entered term, and negative items are included in the query with a NOT operator. The implementation shown in [Veerasamy, Navathe 1995] / [Veerasamy, Hudson, Navathe 1995] has a number of shortcomings, such as the order of the boxes or the not very intuitive feedback about the query actually constructed. Nevertheless, the idea has some interesting aspects.

[Figure 36 panel: query display “Query: visualization AND search AND results AND www”]

Figure 36: Principle of Positive / Negative Feedback by [Veerasamy, Navathe 1995], [Veerasamy, Hudson, Navathe 1995]
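The feedback boxes translate into query syntax in a simple way. The Python sketch below follows the rules stated above (positive items ORed with their entered term, negative items added with NOT); the data layout, the function name, and the negative item “database” are hypothetical.

```python
from typing import Dict, List


def build_query(entered: List[str], positive: Dict[str, List[str]],
                negative: List[str]) -> str:
    """Entered terms are ANDed; positive items are ORed with their entered term;
    negative items are appended with NOT."""
    clauses = []
    for term in entered:
        alternatives = [term] + positive.get(term, [])
        if len(alternatives) == 1:
            clauses.append(term)
        else:
            clauses.append("(" + " OR ".join(alternatives) + ")")
    clauses += [f"NOT {item}" for item in negative]
    return " AND ".join(clauses)


entered = ["visualization", "search", "results", "www"]
print(build_query(entered, {}, []))
# visualization AND search AND results AND www
print(build_query(entered, {"visualization": ["visualisation"], "www": ["internet"]},
                  ["database"]))
# (visualization OR visualisation) AND search AND results AND (www OR internet) AND NOT database
```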

93 The name is not used by [Veerasamy, Navathe 1995], but in their Figure 1 the main window of the system is named “Tkinq”. In [Veerasamy, Hudson, Navathe 1995], [Veerasamy 1996], [Veerasamy, Belkin 1996] Tkinq can also be seen in the window title or as the label of the quit button, but the name is not mentioned in the text either. A number of other authors, such as [McCrickard, Kehoe 1997], use this name to refer to the system by Veerasamy et al.

94 The author used the feature in 1997 and 1998. By 2001 it appears to have been discontinued.

An example of support for query expansion in the refinement step, which has some similarities with the formulation phase, is the Cow9 graphical query refinement by [Bourdoncle 1999] used by AltaVista⁹⁴. Cow9 was initially named LiveTopics [Bourdoncle 1997]. Cow9 shows a map of expandable topics, automatically constructed from the terms contained in the document set and the query. Yellow bars to the right of each word indicate the probable relevance of that word to the query. Words can be marked as included or excluded. In Figure 37 “visualisation” is

In the document Visualization of search results from the World Wide Web (pages 67-75)