3.2 Knowledge Extraction with Association Rules

With a threshold defined in this way, we select only those rules whose IM values lie above TV as good enough for application in association analysis tasks, e.g. for the improvement of predictions.
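As a minimal illustration of such threshold-based selection, a sketch follows; the rule representation as `(antecedent, consequent, im_value)` triples is an assumption for illustration, not the actual data structure used in this thesis.

```python
# Minimal sketch of IM-threshold rule selection. The rule representation
# (antecedent, consequent, im_value) is a hypothetical simplification.
def select_rules(rules, tv):
    """Keep only the rules whose IM value lies above the threshold TV."""
    return [rule for rule in rules if rule[2] > tv]

candidates = [("label_A", "label_B", 0.85),
              ("label_C", "label_D", 0.10)]
selected = select_rules(candidates, tv=0.5)  # only the first rule survives
```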

4.1.3 Strategies

We divide the many possible strategies for improving the labels into two overarching approaches: using the ARs (IUAR hereafter) and using only the hierarchy (IUH hereafter). When using ARs, there are two main issues5: a) how to employ the rankings produced by the MLC predictor (IRP), and b) how to select the ARs to use (IAE).

The former issue will be handled in three different ways: α) the ranking is not considered; β) a score was assigned to the label and will be considered (the ranking is irrelevant, but there is an ordering); and γ) the rankings of the rule's antecedent and consequent must obey a certain relation.

The IAE issue can be reduced to the problem of finding the right threshold for the IM6. We first addressed this problem in the study [BBS10], in which an automatic method selected rules based on confidence to create a hierarchy. In this thesis, we use both a similar method and a new method (both described in Section 4.1.2).

The strategies employed here that use only the hierarchy to improve the multi-labels (HZ) are based on the assumption that the lower nodes of the hierarchy have been assigned a low score and might not pass the threshold. The strategies thus try to find the node most likely to be set from among the children of a parent.

5The selection of the right IM for the given problem is also an issue. In the experiments section, the number of selected IMs is chosen to be relatively large, in order to reduce preconceptions about the problem.

6The selection of ARs could be made more context-dependent by using additional available information, but we preferred a more automatic and general method.

Two of the greatest challenges in the improvement of multi-label predictions are determining which labels are trustworthy and whether a rule applies to the case at hand.

With regard to the first point, a high prediction score from the classifier may be evidence in favor of using the label.

The second issue relates to the application of an appropriate IM for rule extraction.

We are confident that our approach considers both aspects.

We implemented strategies that target the multiple issues discussed above but are primarily orthogonal, i.e. they can be applied consecutively to further improve the result.

An additional advantage of examining these different strategies is that we can identify which strategy performs best in which scenario. The strategies listed below implement solutions for the issues discussed above (the basic strategy issues are given in parentheses); in particular, labels used as antecedents must be predicted for the rule to be applied.

HB (IUAR): For every antecedent label, set a given label (taken as consequent) if the antecedent is set and the rule's IM value is greater than 0.2.
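A hedged, single-pass sketch of HB follows; the label set and rule representations are illustrative assumptions, and the actual implementation may iterate differently.

```python
# Illustrative single-pass sketch of the HB strategy: set the consequent
# whenever the antecedent label is already set and the rule's IM value
# exceeds the fixed threshold of 0.2. Representations are hypothetical.
def apply_hb(predicted_labels, rules, threshold=0.2):
    labels = set(predicted_labels)
    for antecedent, consequent, im_value in rules:
        if antecedent in labels and im_value > threshold:
            labels.add(consequent)
    return labels
```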

HX (IUAR,IRP): For every antecedent label, find all respective rules that surpass the threshold α. If a score7 greater than zero is predicted for the consequent, apply the rule. α was varied as follows: HXa: α=0.2, HXb: α=0.4, and HXc: α=0.7.
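HX can be sketched similarly to HB, with the additional score check; the score dictionary and rule triples are again hypothetical representations.

```python
# Sketch of HX (representations assumed): a rule surpassing the IM
# threshold alpha is applied only if the classifier also predicted a
# score greater than zero for the consequent label.
def apply_hx(predicted_labels, scores, rules, alpha):
    labels = set(predicted_labels)
    for antecedent, consequent, im_value in rules:
        if (antecedent in labels
                and im_value > alpha
                and scores.get(consequent, 0.0) > 0.0):
            labels.add(consequent)
    return labels
```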

HC (IUAR,IRP,IAE): The PlAR strategy for rule threshold selection presented in Section 4.1.2 is used to extract the rule set. For every rule in this set, the following condition is verified: the score of the rule's consequent label is divided by the score of the rule's antecedent label; if the result is greater than 0.3 (HCa) or 0.5 (HCb), the consequent is set.
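The quotient condition of HC might be sketched as follows; the rule set is assumed to come from the PlAR threshold selection, and the data structures are hypothetical.

```python
# Sketch of HC's quotient condition: the consequent is set when
# score(consequent) / score(antecedent) exceeds the ratio threshold
# (0.3 for HCa, 0.5 for HCb). Representations are hypothetical.
def apply_hc(predicted_labels, scores, rules, ratio):
    labels = set(predicted_labels)
    for antecedent, consequent, _im in rules:
        ante_score = scores.get(antecedent, 0.0)
        cons_score = scores.get(consequent, 0.0)
        if (antecedent in labels
                and ante_score > 0.0
                and cons_score / ante_score > ratio):
            labels.add(consequent)
    return labels
```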

HE (IUAR,IAE):

a) Using the threshold method TPS from Section 4.1.2, extract rules and apply all of them.

b) (IRP) Also using the threshold method TPS, extract a rule set. Apply the rule if the score of the consequent divided by the score of the antecedent is greater than 0.2.

HZ (IUH,IRP):

a) For a label whose parent is set and whose siblings (the other children of this parent) are not set, the label will be set if the ranking of this node divided by the maximum over the scores of this sample is less than β=0.5.

b) (IAE) Given a consequent label, set it if: the condition of HZa holds; or the parent is set and the score of this node (label) divided by the score of the parent is less than α=0.5; or, from the rule set with this label as consequent, the rule with the maximum value w.r.t. the chosen IM has a value greater than θ=0.7. Furthermore, a strategy to consolidate labels is applied: search for the most similar unique label that appeared in the training set (minimizing the difference between the predicted and the unique training labels), and then set the labels that appear in this unique label and have a score greater than 0.8 times the score of the lowest-scored predicted label.

c) (IAE) Same as HZb, but the threshold α is set to 0.8.

7The score of a label is the value a classifier assigns to a label.

Table 4.1: Strategies and Issues Addressed

Strategy        IUAR  IUH  IRP  IAE
HB               Y     N    N    N
HXa,HXb,HXc      Y     N    Y    N
HCa,HCb          Y     N    Y    Y
HEa,HEc          Y     N    N    Y
HEb              Y     N    Y    Y
HZa              N     Y    Y    N
HZb,HZc          N     Y    Y    Y
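A literal and heavily hedged reading of the HZa sub-strategy might be sketched as follows; the hierarchy encoding via a child-to-parent map, and the use of classifier scores in place of the ranking, are assumptions for illustration only.

```python
# Hedged sketch of HZa: a label whose parent is set and whose siblings
# are all unset is itself set when its score divided by the sample's
# maximum score is below beta. Data structures are hypothetical.
def apply_hza(predicted_labels, scores, parent_of, beta=0.5):
    labels = set(predicted_labels)
    max_score = max(scores.values())
    if max_score <= 0.0:
        return labels
    # group child nodes under their parent
    children = {}
    for node, parent in parent_of.items():
        children.setdefault(parent, []).append(node)
    for parent, kids in children.items():
        if parent in labels and not any(k in labels for k in kids):
            for kid in kids:
                if scores.get(kid, 0.0) / max_score < beta:
                    labels.add(kid)
    return labels
```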

The first strategies (HB and HX) are basic operations. The idea is to select a rule if its IM value surpasses a predefined threshold. The difference between them is that in HX, the consequent must additionally have a score greater than 0. This matters for classifiers like ML-ARAM, which assign a score only to high-ranked labels and select the labels to be set only among them. For SVM classifiers, which assign a probability to every label when ranking is activated, applying the HX strategy leads to the same results as HB. We also use indexes to indicate the different parameter settings of each strategy; in HX, there are three different parameter settings.

HC and HE are similar to [PF08] and [CHDH13]; however, our approach differs from the Support-Confidence AR framework in two fundamental ways: we use different IMs that are more suitable for rare AR extraction, and we apply our rule extraction methods to build a set of suitable rules. The main difference between these two strategies is that each uses a different method for the extraction of the ARs. The PlAR strategy for determining the best threshold is less restrictive than that of TPS; HC therefore additionally requires the consequent to receive a score, as well as a higher quotient between the scores of consequent and antecedent.

Finally, HZ explores various sub-strategies to set the threshold depending on the hierarchy and the available training labels. As discussed in Section 2.3.2, many approaches in MLC use constraints extracted from the training data to build the classifiers.

The HZ strategy takes the predictions of the classifiers and then applies the constraints implemented in the sub-strategies, taking the training data into consideration.

Table 4.1 summarizes the strategies and the issues they address. The strategies are designed to cover most of the issues, and some strategies are modified so that an additional issue is also addressed in the solution.

Among the aspects to consider for the strategies, using the ranking to weight the decision is particularly promising, since the classifier and the ARs can then jointly confirm that a decision is likely a good one. Additionally, strategies that consider the quotient between the consequent and the antecedent have the advantage that the classifier has already provided a weight for each label: labels with similar weights should be set if there is a corresponding rule, whereas for labels that were given very different scores, even if they appear in an extracted rule, it might be the case that the rule should not be applied to instances with these attributes.

The parameters are fixed for the experiments, but in principle they could be set by exhaustive search, as in [GCH10]. We decided against this approach because the data we want to examine is large, and such a parameter search would likely provide only minor improvements.

As emphasized above, our approach was designed to work on classification problems with two different multi-label sets coming from two ontologies, but it can still be applied to problems with only one class taxonomy for purposes of comparison with other methods, as shown in the first experimental section.