Proximity plot - User's Guide

Cluster analysis and multidimensional scaling are both data reduction techniques, and as such they may not accurately represent the true proximity of keywords or cases to each other. In a dendrogram, keywords that co-occur or cases that are similar tend to appear near each other, but one cannot read the sequence of keywords as a linear representation of the distances between them. Remember that a dendrogram only specifies the temporal order of the branching sequence. Consequently, any cluster can be rotated around each internal branch of the tree without in any way affecting the meaning of the dendrogram. The best analogy here is a Calder mobile: different photos of the same mobile will show different distances between the hanging objects. Multidimensional scaling represents the distances between objects more accurately, but because it attempts to fit all points into a two- or three-dimensional space, it may introduce distortion. As a consequence, some items that tend to appear together or that are very similar may still be plotted far from each other.

The proximity plot is the most accurate way to graphically represent the distance between objects: it displays the measured distances from one or several target objects to all other objects. It is not a data reduction technique but a visualization tool that helps one extract information from the large amount of data stored in the distance matrix underlying the dendrogram and the multidimensional scaling plots. In this plot, every measured distance is represented by the distance from the left edge of the plot. The closer an object is to the selected one, the closer it is plotted to the left.
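Reading a proximity plot amounts to taking the target's row of the distance matrix and ordering every other item from closest to farthest. The sketch below illustrates that idea under stated assumptions: the nested-dict matrix, the function name `proximity_ranking`, its `max_items` and `similarity` parameters, and the distance values are all hypothetical, and the program's own implementation may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a full proximity matrix stored as nested dicts,
# matrix[a][b] = score between items a and b.
Matrix = Dict[str, Dict[str, float]]


def proximity_ranking(
    matrix: Matrix,
    target: str,
    max_items: int = 30,
    similarity: bool = False,
) -> List[Tuple[str, float]]:
    """Rank every item other than `target` by its proximity to `target`.

    If `similarity` is False, scores are distances (smaller = closer);
    if True, scores are similarities (larger = closer). Ties are broken
    alphabetically so the order is deterministic.
    """
    others = [(item, s) for item, s in matrix[target].items() if item != target]
    if similarity:
        others.sort(key=lambda pair: (-pair[1], pair[0]))
    else:
        others.sort(key=lambda pair: (pair[1], pair[0]))
    # Keep only the closest items, mirroring a cap on the number of plotted bars
    return others[:max_items]


# Hypothetical distances between four keywords (symmetric, zero diagonal)
dist: Matrix = {
    "IRAQ":     {"IRAQ": 0.0, "WAR": 0.2, "MILITARY": 0.4, "OIL": 0.7},
    "WAR":      {"IRAQ": 0.2, "WAR": 0.0, "MILITARY": 0.3, "OIL": 0.8},
    "MILITARY": {"IRAQ": 0.4, "WAR": 0.3, "MILITARY": 0.0, "OIL": 0.9},
    "OIL":      {"IRAQ": 0.7, "WAR": 0.8, "MILITARY": 0.9, "OIL": 0.0},
}

print(proximity_ranking(dist, "IRAQ"))
# [('WAR', 0.2), ('MILITARY', 0.4), ('OIL', 0.7)]
```

No dimensionality reduction is involved: each plotted value is read directly from the matrix, which is why this view does not suffer from the distortions described for dendrograms and multidimensional scaling.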

To select the keyword or case to be used as the point of reference, choose it from the KEYWORD or CASE drop-down checklist located at the top of the page. One can also freely browse through different keywords or cases by double-clicking the corresponding bar in the proximity plot. The co-occurrence with, or similarity to, more than one target item may be displayed in a single chart, allowing easy comparisons. When several target items are selected, the bars of the proximity plot may be clustered side by side (clustered bars) or stacked, representing either the total amount (stacked bars) or the relative distribution of scores (100 percent stacked bars). When two target items are selected, it is also possible to display the bars on both sides of a central axis, as in the sample chart above (mirrored bars).
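The difference between stacked bars and 100 percent stacked bars can be shown with a little arithmetic: a stacked bar is the sum of its per-target segments, while a 100 percent stacked bar rescales those segments so that they add up to 100. This is a minimal sketch; the data layout, the function names `stacked_totals` and `percent_stacked`, and the counts are hypothetical.

```python
from typing import Dict, List

# Hypothetical input: for each related item, one score per target item,
# listed in the same target order for every item.
Scores = Dict[str, List[float]]


def stacked_totals(scores: Scores) -> Dict[str, float]:
    """Length of each stacked bar: the sum of its per-target segments."""
    return {item: sum(values) for item, values in scores.items()}


def percent_stacked(scores: Scores) -> Dict[str, List[float]]:
    """Each segment as a percentage of its bar, so every bar spans 100."""
    shares: Dict[str, List[float]] = {}
    for item, values in scores.items():
        total = sum(values)
        # Guard against an all-zero bar to avoid dividing by zero
        shares[item] = [100.0 * v / total if total else 0.0 for v in values]
    return shares


# Hypothetical co-occurrence counts with two targets (e.g. IRAQ, then WAR)
scores: Scores = {"MILITARY": [12, 36], "OIL": [5, 5]}
print(stacked_totals(scores))   # {'MILITARY': 48, 'OIL': 10}
print(percent_stacked(scores))  # {'MILITARY': [25.0, 75.0], 'OIL': [50.0, 50.0]}
```

The stacked view emphasizes how strongly an item relates to the targets overall, whereas the 100 percent view hides that magnitude and shows only how the item's association is split between the targets.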

By default, the chart displays the proximity of the target items to a maximum of 30 related items. Clicking the button located in the upper left-hand corner of the chart displays a dialog box that allows one either to manually choose the items to be plotted or to automatically select a specific number of items.

When looking at keyword co-occurrences, selecting a bar enables the retrieval button. Clicking this button retrieves every document or text segment containing both keywords, allowing one to further explore the factors that may explain this co-occurrence. When examining the similarity of documents rather than keywords, clicking this button retrieves both documents and displays them side by side in a dialog box.
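Conceptually, retrieving the segments behind a co-occurrence is a filter that keeps only the segments in which both keywords occur. The sketch below assumes that each keyword is a single word matched case-insensitively on word boundaries; in practice a keyword may stand for a dictionary category, which this hypothetical `segments_with_both` function does not handle.

```python
import re
from typing import List


def segments_with_both(segments: List[str], kw1: str, kw2: str) -> List[int]:
    """Indices of the segments that contain both keywords.

    Assumption: a keyword is a single word matched case-insensitively on
    word boundaries; dictionary categories or phrases are not handled.
    """
    p1 = re.compile(rf"\b{re.escape(kw1)}\b", re.IGNORECASE)
    p2 = re.compile(rf"\b{re.escape(kw2)}\b", re.IGNORECASE)
    return [i for i, text in enumerate(segments) if p1.search(text) and p2.search(text)]


# Hypothetical paragraphs used as text segments
paragraphs = [
    "Military action in Iraq was debated.",
    "The military budget grew.",
    "Oil exports from Iraq resumed.",
    "Iraq and its military were discussed again.",
]
print(segments_with_both(paragraphs, "MILITARY", "IRAQ"))  # [0, 3]
```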

Right-clicking any existing bar displays a pop-up menu that allows one to remove the selected item or to add it to the list of target items, either alongside the existing target items or in place of one of them. One may also retrieve documents or text segments using this pop-up menu.

The Table page allows one to examine in more detail the numerical values behind the computation of these plots. When the distance measure is based on co-occurrences, the table provides detailed information, such as the number of times a given keyword co-occurs with the selected one (CO-OCCURS) and the number of times it appears in the absence of the selected keyword (DO NOT). The table also includes the number of times the selected keyword appears in the absence of the given keyword (IS ABSENT). In the example below (computed using the paragraph as the frequency criterion), the highlighted line shows that the word MILITARY co-occurs 107 times with IRAQ, that it is encountered in 285 paragraphs without the word IRAQ, and that IRAQ is found in 1,182 paragraphs in the absence of MILITARY. The Jaccard coefficient of 0.109 indicates that, of all paragraphs containing either of these words, 10.9 percent contain both. Note, however, that not all proximity measures can be interpreted as easily. To facilitate the interpretation of this table, the status bar provides a textual interpretation of some of the statistics.
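The three count columns and the Jaccard coefficient can be derived from raw segments as sketched below. The sketch assumes each segment (here, a paragraph) has already been reduced to a set of lowercase keywords; the function names and the toy paragraphs are hypothetical, and the program's own computation (for example, how words are mapped to keywords) may differ.

```python
from typing import Iterable, Set, Tuple


def cooccurrence_counts(
    segments: Iterable[Set[str]], given: str, selected: str
) -> Tuple[int, int, int]:
    """Return (CO-OCCURS, DO NOT, IS ABSENT) for two keywords.

    CO-OCCURS: segments containing both keywords
    DO NOT:    segments containing `given` but not `selected`
    IS ABSENT: segments containing `selected` but not `given`
    """
    co = do_not = is_absent = 0
    for words in segments:
        has_given = given in words
        has_selected = selected in words
        if has_given and has_selected:
            co += 1
        elif has_given:
            do_not += 1
        elif has_selected:
            is_absent += 1
    return co, do_not, is_absent


def jaccard(co: int, do_not: int, is_absent: int) -> float:
    """Share of segments containing either keyword that contain both."""
    either = co + do_not + is_absent
    return co / either if either else 0.0


# Hypothetical paragraphs, each reduced to its set of keywords
paragraphs = [
    {"military", "iraq"},
    {"military", "budget"},
    {"iraq", "oil"},
    {"iraq", "election"},
    {"oil", "price"},
]
counts = cooccurrence_counts(paragraphs, "military", "iraq")
print(counts)            # (1, 1, 2)
print(jaccard(*counts))  # 0.25
```

The Jaccard coefficient is CO-OCCURS divided by the sum of all three columns, so paragraphs containing neither keyword have no effect on it.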

The following list provides a brief description of the buttons found on these two pages.

Proximity plot controls

By default, items in the proximity plot are sorted in descending order of proximity scores.

Clicking this button sorts items in alphabetical order.

This button copies the chart to the clipboard. When this button is clicked, a pop-up menu appears, allowing one to select whether the chart should be copied as a bitmap or as a metafile.

This button allows editing of various features of the proximity plot, such as the appearance of value labels and bars, the chart and axis titles, and the location of the legend.

Proximity plot and proximity table controls

Press this button to append a copy of the chart or of the proximity table to the Report Manager. A descriptive title is provided automatically. To edit this title or to enter a new one, hold down the SHIFT key while clicking this button (for more information on the Report Manager, see page 191).

When the proximity plot is displayed, this button saves the chart to disk in one of the supported graphic file formats. When the proximity table is shown, the table can be saved to disk as an Excel, delimited text, XML, HTML, SPSS, or Stata file.

Clicking this button allows you to print a copy of the displayed chart or table.

In the document: User's Guide (pages 80-83)