
4.3 Rule Explorer

4.3.1 Rules: Output and Input Space

The view frames of the Rule Explorer enable us to examine the output and input spaces of the classification problem. We start by explaining the output views, as this is the first step when examining an error. The views related to the input space are described afterwards.

Output Space

The main view of the Rule Explorer is shown in Figure 4.7. The upper tree depicts the hierarchy of labelset 1; the lower one, that of labelset 2. The tree layout is calculated automatically by the library org.abego.treelayout¹⁴. However, nodes can be selected by clicking on them and dragged to other positions. Thus, if a connection between two nodes positioned far apart is of interest, the distance between them can be reduced, and a new view containing only the relevant nodes can be produced. Moreover, the root node of labelset 2 is renamed “root1” to distinguish it from the “root” node of labelset 1. The table below shows the connections between the labels and their values for the selected IM. Nodes can also be searched for in the text field above; found nodes are marked in red so that they are easy to identify across both labelsets.
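The two-hierarchy layout and the cross-labelset search can be illustrated with a minimal Python sketch. It is not the Rule Explorer's implementation: the `Node` class, the path-based input format and all function names are hypothetical; only the root names “root” and “root1” are taken from the text.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """A label node in one labelset hierarchy (hypothetical structure)."""
    name: str
    children: list[Node] = field(default_factory=list)


def build_hierarchy(paths, root_name):
    """Build a tree from label paths such as ("Science", "Physics").

    The root gets an explicit name so that the two labelsets can be told
    apart ("root" for labelset 1, "root1" for labelset 2).
    """
    root = Node(root_name)
    for path in paths:
        current = root
        for part in path:
            # Reuse an existing child with this name, otherwise create it.
            child = next((c for c in current.children if c.name == part), None)
            if child is None:
                child = Node(part)
                current.children.append(child)
            current = child
    return root


def search(root, query):
    """Return the sorted names of all nodes containing `query` (case-insensitive)."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if query.lower() in node.name.lower():
            hits.append(node.name)
        stack.extend(node.children)
    return sorted(hits)


def search_both(tree1, tree2, query):
    """Search both labelsets at once; a GUI would mark every hit in red."""
    return {"labelset 1": search(tree1, query), "labelset 2": search(tree2, query)}


labelset1 = build_hierarchy([("Science", "Physics"), ("Science", "Biology")], "root")
labelset2 = build_hierarchy([("Sport", "Physics of Sport")], "root1")
print(search_both(labelset1, labelset2, "phys"))
# {'labelset 1': ['Physics'], 'labelset 2': ['Physics of Sport']}
```

Returning the hits per labelset keeps the two hierarchies separate, which mirrors the reason for renaming the second root.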

¹⁴ https://github.com/abego/treelayout, accessed May 2015.

Figure 4.8: Rule Explorer Main Window: IM Rules

Figure 4.8 shows how the “Enable Connections” entry of the “Option” menu operates. Clicking on a label lists all the rules associated with this node in the lower panel, ordered by their IM value (here, JacDif). The first two rules have a value above the threshold of 0.05 and are therefore drawn as edges, the stronger one with a thicker line and the weaker one with a thinner line. Clicking on one of these edges opens a new window displaying the IM values (selected IM, support, expectation and absolute support) together with some IM values of the parents, which makes the resulting Dif IM value easier to understand.

These tables can be saved for later comparison with the tables of other relations. This allows a direct comparison of the measure values involved and therefore provides an indispensable way of seeing how labels are interconnected. Furthermore, labels with similar values can be compared, and the influence of the parents and of the expectations can be assessed.
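The edge filtering in the connection view can be sketched as follows: the rules of the clicked label are ranked by their IM value as in the lower panel, only those strictly above the 0.05 threshold become edges, and the line width grows with the IM value. The tuple format, the function name and the linear scaling of widths between 1 and 5 are assumptions; the text only states that stronger rules are drawn with thicker lines.

```python
def edges_to_draw(rules, threshold=0.05, min_width=1.0, max_width=5.0):
    """Rank rules by IM value and turn those above `threshold` into edges.

    rules: iterable of (source_label, target_label, im_value) tuples
    (hypothetical format). Returns the full ranking (lower panel) and a
    list of ((source, target, value), line_width) pairs for drawing.
    """
    # Lower panel: all rules of the clicked label, strongest first.
    ranked = sorted(rules, key=lambda r: r[2], reverse=True)
    # Only rules strictly above the threshold are drawn as edges.
    shown = [r for r in ranked if r[2] > threshold]
    if not shown:
        return ranked, []
    strongest, weakest = shown[0][2], shown[-1][2]
    span = strongest - weakest
    edges = []
    for rule in shown:
        # Linear interpolation: weakest edge -> min_width, strongest -> max_width.
        frac = (rule[2] - weakest) / span if span > 0 else 1.0
        edges.append((rule, min_width + frac * (max_width - min_width)))
    return ranked, edges


ranked, edges = edges_to_draw([("a", "x", 0.02), ("a", "y", 0.12), ("a", "z", 0.07)])
for (src, dst, value), width in edges:
    print(f"{src} -> {dst}: IM={value:.2f}, width={width:.1f}")
```

Scaling relative to the weakest displayed edge keeps the contrast visible even when all displayed IM values lie close together; a fixed scale would be an equally valid design choice.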

Input Space

The prototypes created by the ML-(H)ARAM classifiers are IF-THEN rules in the input (feature) space; each prototype links an input cluster to a specific multi-label.

As shown in Figure 4.9, selecting “Option” from the menu, then the entry “Enable Rules”, and finally clicking on a label in the tree window changes the lower table to display the data from the test ARFF file: each row is a sample, and the columns list the labels first, followed by the attributes. It also opens a new window showing the rules in which the selected label is involved. Rules can also be searched for by their number.
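A prototype can be read as an IF-THEN rule over a hyperbox in feature space. The sketch below assumes that each rule stores a lower corner u, an upper corner v and its multi-label; the `Rule` class and the helper names are hypothetical. The crisp containment test in `matches` is only the textual reading of a rule: fuzzy ART-based classifiers rank prototypes by their activation rather than requiring a sample to lie inside the box.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A prototype read as an IF-THEN rule (hypothetical representation)."""
    number: int
    lower: tuple[float, ...]   # u: lower corner of the hyperbox
    upper: tuple[float, ...]   # v: upper corner of the hyperbox
    labels: frozenset[str]     # the multi-label the prototype points to

    def matches(self, x):
        """Crisp reading: IF u_i <= x_i <= v_i for every feature i THEN self.labels."""
        return all(u <= xi <= v for u, xi, v in zip(self.lower, x, self.upper))

    def as_text(self, feature_names):
        """Render the rule in IF-THEN form."""
        conditions = " AND ".join(
            f"{lo:g} <= {name} <= {hi:g}"
            for name, lo, hi in zip(feature_names, self.lower, self.upper)
        )
        return f"Rule {self.number}: IF {conditions} THEN {sorted(self.labels)}"


def rules_for_label(rules, label):
    """All rules in which `label` is involved."""
    return [r for r in rules if label in r.labels]


def rule_by_number(rules, number):
    """Look a rule up by its number, or return None."""
    return next((r for r in rules if r.number == number), None)


rules = [
    Rule(1, (0.0, 0.2), (0.5, 0.6), frozenset({"sports"})),
    Rule(2, (0.4, 0.0), (1.0, 0.3), frozenset({"sports", "politics"})),
]
for rule in rules_for_label(rules, "politics"):
    print(rule.as_text(["f1", "f2"]))
# Rule 2: IF 0.4 <= f1 <= 1 AND 0 <= f2 <= 0.3 THEN ['politics', 'sports']
```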

Figure 4.9: Rule Explorer Main Window: IF-THEN Rules

Figure 4.10: Rule Explorer: Classification rule activations (a) and instances belonging to selected rule (b)

Clicking on a row in the lower table classifies the sample, and the results are shown in a new window (Figure 4.10a). The table at the top displays the labels and the rankings assigned by the classifier. The table in the middle shows the rule number, whether the rule was selected for ranking (marked with an asterisk), and how many true positives, false positives and false negatives it produced. It also lists the labels belonging to each rule and whether they are true or false for the sample, the number of instances involved in the creation of the rule, and the values of $|A \wedge W^a_k|$ (caption: Act. Total) and $|W^a_k|$ (caption: HP Volume¹⁵), as presented in Section 2.1.3.5. We adapted the Rule Explorer to the text-mining setup, so several features are specific to this task (e.g., the text of the classified sample is shown below). Clicking on a given rule additionally opens the window displayed in Figure 4.10b. Its table on the left shows the features actually computed for the sample. For each feature, it displays the value of the sample, the lower and upper bounds of the rule, and the differences between the lower bound and the sample and between the upper bound and the sample. In this way, deviations between samples and prototypes, and their magnitudes, can easily be detected.
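The per-rule quantities of the classification window can be reproduced for a single sample under the standard fuzzy ART conventions, which are assumed here: complement coding A = (x, 1 − x), a prototype weight W = (u, 1 − v) for a hyperbox with lower corner u and upper corner v, the fuzzy AND as the element-wise minimum, and |·| as the L1 norm. The notation of Section 2.1.3.5 may differ in detail. The dictionary keys mirror the column captions; the function names and the per-rule definition of the TP/FP/FN counts are hypothetical.

```python
def complement_code(x):
    """Fuzzy ART complement coding: A = (x, 1 - x)."""
    return list(x) + [1.0 - xi for xi in x]


def prototype_weight(lower, upper):
    """Weight of a hyperbox prototype with corners u, v: W = (u, 1 - v)."""
    return list(lower) + [1.0 - vi for vi in upper]


def l1_norm(vector):
    """|.|: the Manhattan (L1) norm."""
    return sum(abs(v) for v in vector)


def activation_columns(x, lower, upper):
    """The two columns of the rule table for one sample and one prototype."""
    a = complement_code(x)
    w = prototype_weight(lower, upper)
    fuzzy_and = [min(ai, wi) for ai, wi in zip(a, w)]  # A ∧ W: element-wise minimum
    return {"Act. Total": l1_norm(fuzzy_and), "HP Volume": l1_norm(w)}


def bound_deviations(x, lower, upper):
    """Per feature: (x - u, v - x); a negative entry means the sample is outside that bound."""
    return [(xi - u, v - xi) for xi, u, v in zip(x, lower, upper)]


def label_counts(rule_labels, sample_labels):
    """TP, FP and FN of a rule's labels against the sample's true labels (assumed definition)."""
    tp = len(rule_labels & sample_labels)
    fp = len(rule_labels - sample_labels)
    fn = len(sample_labels - rule_labels)
    return tp, fp, fn


x, u, v = (0.5, 0.25), (0.25, 0.5), (0.75, 1.0)
print(activation_columns(x, u, v))   # {'Act. Total': 0.75, 'HP Volume': 1.0}
print(bound_deviations(x, u, v))     # [(0.25, 0.25), (-0.25, 0.75)]
print(label_counts({"sports", "politics"}, {"politics", "economy"}))  # (1, 1, 1)
```

In this example the second feature lies 0.25 below the rule's lower bound, which shows up as the only negative deviation, exactly the kind of mismatch the feature table is meant to expose.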

The other table in this window lists the instances that were involved in the creation of the rule; clicking on one of them displays the text of the selected training instance in the text field below. If a feature row is clicked, the corresponding word is highlighted in red in the text. This makes it easy to locate the occurrences of the word in the text and gives an impression of how much an individual instance contributed to a given feature.
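Highlighting a feature word in an instance text can be sketched with Python's `re` module. The sketch assumes that features are plain words matched as whole words and case-insensitively; stemmed or n-gram features would require the same preprocessing as the classifier. Brackets stand in for the red highlighting, and the function name is hypothetical. The returned count hints at how strongly the instance contributes to that feature.

```python
import re


def highlight(text, word, start="[", end="]"):
    """Mark whole-word, case-insensitive occurrences of `word` and count them."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    count = len(pattern.findall(text))
    marked = pattern.sub(lambda m: f"{start}{m.group(0)}{end}", text)
    return marked, count


marked, count = highlight("The match ended; matches resume on Match day.", "match")
print(marked)  # The [match] ended; matches resume on [Match] day.
print(count)   # 2
```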

¹⁵ This refers to the hyperbox and the norm: since we used the Manhattan norm, “perimeter” would be more appropriate than “volume”. We nevertheless used the more general term in the caption, since the norm can be changed later and volume is, in our opinion, the more intuitive concept.
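The remark in footnote 15 can be made explicit under the standard fuzzy ART assumption of complement coding, where a prototype with lower corner $u$ and upper corner $v$ in $M$ dimensions has the weight $W^a_k = (u, \mathbf{1} - v)$ and $|\cdot|$ is the L1 norm (the notation of Section 2.1.3.5 may differ):

```latex
|W^a_k| = \sum_{i=1}^{M} u_i + \sum_{i=1}^{M} (1 - v_i)
        = M - \sum_{i=1}^{M} (v_i - u_i)
```

The sum on the right is the total of the hyperbox's side lengths (half its perimeter in two dimensions), so the column reflects an edge-length quantity rather than a volume, and it decreases as the box grows.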

From the document Multi-label Classification with Multiple Class Ontologies (pages 108-112)