
5.7 Does it Really Matter which Model?

5.7.2 Experimental Design

The goal of this experiment was to investigate how strongly selecting an inadequate model can affect the faithfulness with which the underlying probability distribution is represented. Other factors also contribute to the difference between the underlying distribution and the one specified by the parameters provided by the subject: sampling error (with a small number of samples, the observed distribution is not exactly the same as the parametric distribution from which the samples were drawn) and the error introduced by the subject's misjudgement of the experienced probabilities (the error introduced by the recall task).

In the experiment I exploited the fact that all the models I proposed that are particularly suitable for knowledge elicitation share the same set of questions asked of the expert, and that they are all amechanistic. This means that the questions used to elicit knowledge for the noisy-DeMorgan model can also be used to elicit knowledge for the noisy-OR+/OR, restricted CAST, and noisy-OR models. Since during the original experiment the subjects were not informed that the relationship followed the noisy-DeMorgan model, and did not use such knowledge either explicitly or implicitly, I could use the results obtained from that study directly.

Therefore, I used the parameters obtained from the subjects for the noisy-DeMorgan model to specify the following models: the noisy-OR+/OR, the restricted CAST, and the noisy-OR. In the case of binary variables the noisy-average model is equivalent to the noisy-OR, so I did not include it. For the noisy-OR+/OR I used the probabilities in two ways: directly as mechanism probabilities, and to extract the mechanism parameters by discounting the influence of the leak (the formally correct way). I decided to do so because of the results of a similar study with the noisy-OR, reported in Section 4.1, in which a comparison of the noisy-OR parametrizations (Díez and Henrion) indicated that the formally correct method gives worse results. I wanted to see whether this also holds for this experiment.
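The two ways of using the elicited probabilities can be illustrated on an ordinary leaky noisy-OR gate. The sketch below is only an illustration under assumptions: it uses the plain noisy-OR rather than the noisy-OR+/OR (whose definition is given elsewhere), the function names `discount_leak` and `leaky_noisy_or_cpt` are hypothetical, and the numbers are invented.

```python
from itertools import product


def discount_leak(elicited_probs, leak):
    """Remove the leak's contribution from probabilities elicited with the leak included.

    If p_elicited = 1 - (1 - leak) * (1 - p_mechanism), then
    p_mechanism = 1 - (1 - p_elicited) / (1 - leak).
    """
    return [1.0 - (1.0 - p) / (1.0 - leak) for p in elicited_probs]


def leaky_noisy_or_cpt(mechanism_probs, leak):
    """Return P(Y = present | parent configuration) for a leaky noisy-OR gate."""
    cpt = {}
    for config in product([0, 1], repeat=len(mechanism_probs)):
        prob_absent = 1.0 - leak
        for active, p in zip(config, mechanism_probs):
            if active:
                prob_absent *= 1.0 - p
        cpt[config] = 1.0 - prob_absent
    return cpt


# Invented answers to "how likely is Y when only cause i is present?"
elicited = [0.8, 0.6]
leak = 0.1

# Option 1: treat the answers directly as mechanism probabilities.
cpt_direct = leaky_noisy_or_cpt(elicited, leak)

# Option 2: discount the leak first, then build the CPT.
cpt_discounted = leaky_noisy_or_cpt(discount_leak(elicited, leak), leak)
```

With the second option the gate reproduces the elicited value exactly when a single cause is present; with the first option the leak is counted on top of it, so the resulting probability is slightly higher.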

5.7.3 Results

To measure the accuracy of the elicitation I used a distance measure between two CPTs. For the purpose of this study I decided to use the average of the Euclidean distances between corresponding distributions in two CPTs: (1) the CPT containing the actual distributions the subject experienced, and (2) the CPT specified using the probabilities obtained from the subject after the learning phase. I used these parameters to specify the noisy-OR+/OR, the restricted CAST, the noisy-DeMorgan, and the noisy-OR models. I also report the distance to the full CPT obtained from the subject directly. Table 7 shows the results. The best score was achieved by the noisy-DeMorgan gate, the second best by the complete CPT, followed by the noisy-OR+/OR. The worst fit is the noisy-OR model. This should not be surprising, as it is the only model in the experiment that does not allow for both positive and negative influences, while both are present in the data.
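The average Euclidean distance can be sketched as follows. This is one reading of the measure, not necessarily the exact computation used in the study: each CPT is assumed to be a mapping from a parent configuration to a distribution over the child's states, and the function name `avg_euclidean_distance` and all numbers are hypothetical.

```python
import math


def avg_euclidean_distance(cpt_a, cpt_b):
    """Mean Euclidean distance between corresponding distributions of two CPTs.

    Each CPT is a dict: parent configuration (tuple) -> list of probabilities
    over the child's states.
    """
    if cpt_a.keys() != cpt_b.keys():
        raise ValueError("CPTs must cover the same parent configurations")
    total = 0.0
    for config, dist_a in cpt_a.items():
        dist_b = cpt_b[config]
        total += math.sqrt(sum((p - q) ** 2 for p, q in zip(dist_a, dist_b)))
    return total / len(cpt_a)


# Invented example with two binary parents and a binary child.
experienced = {(0, 0): [0.1, 0.9], (0, 1): [0.6, 0.4],
               (1, 0): [0.7, 0.3], (1, 1): [0.9, 0.1]}
elicited = {(0, 0): [0.2, 0.8], (0, 1): [0.5, 0.5],
            (1, 0): [0.7, 0.3], (1, 1): [0.8, 0.2]}

distance = avg_euclidean_distance(experienced, elicited)
```

A value of 0 means the two CPTs agree on every distribution; for a binary child the largest possible value is the square root of 2.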

These results also indicate that models including both positive and negative influences are indeed useful and needed. As an alternative measure I used the maximal distance between two corresponding parameters in a CPT. This is a very conservative measure that shows the worst-case scenario. Table 8 shows the results. One can see that the results obtained using this alternative measure are qualitatively similar to those for the average measure.

Table 7: Average Euclidean distance between the distributions experienced by subjects and those specified by canonical models with parameters provided by subjects.

Model                     Noisy-DeMorgan Parameters   CPT Parameters
CPT                       0.256                       0.256
noisy-DeMorgan            0.238                       0.230
noisy-OR+/OR (Díez)       0.283                       0.343
noisy-OR+/OR (Henrion)    0.345                       0.376
Restricted CAST           0.368                       0.392
noisy-OR                  0.611                       0.593
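The maximal-distance measure described above can be sketched in the same style. The CPT representation (parent configuration mapped to a distribution), the function name `max_parameter_distance`, and the numbers are assumptions made for illustration only.

```python
def max_parameter_distance(cpt_a, cpt_b):
    """Largest absolute difference between corresponding CPT entries."""
    if cpt_a.keys() != cpt_b.keys():
        raise ValueError("CPTs must cover the same parent configurations")
    return max(
        abs(p - q)
        for config in cpt_a
        for p, q in zip(cpt_a[config], cpt_b[config])
    )


# Invented example: the CPTs agree everywhere except for one configuration.
cpt_experienced = {(0,): [0.5, 0.5], (1,): [0.9, 0.1]}
cpt_elicited = {(0,): [0.5, 0.5], (1,): [0.6, 0.4]}

worst_case = max_parameter_distance(cpt_experienced, cpt_elicited)
```

Because a single badly estimated parameter determines the result, this measure is far less forgiving than the average distance.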

I performed pairwise paired two-sided t-tests to verify whether the differences between the CPT, the noisy-DeMorgan, and the noisy-OR+/OR are statistically significant. At a significance level of 0.05 they turned out not to be statistically significant (the smallest p-value was p = 0.065, for the noisy-OR+/OR and the noisy-DeMorgan).
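For reference, the test statistic of a paired t-test is computed from the per-subject differences between two models' distances. The sketch below uses only the Python standard library; the function name `paired_t_statistic` and the per-subject values are invented and are not the data from this study.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for a paired t-test on samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Invented per-subject distances for two models.
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [0.0, 2.0, 2.0, 3.0]

t_value, dof = paired_t_statistic(model_a, model_b)
```

The two-sided p-value is then obtained from the Student t distribution with `dof` degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel(model_a, model_b)` returns both the statistic and the p-value.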

I decided to repeat the experiments using the parameters from the CPT rather than those obtained for the noisy-DeMorgan. Theoretically, the results should be the same, as the probabilities the subject is asked for in the noisy-DeMorgan case are just a subset of those asked for the CPT.

Apparently, the parameters taken from the probabilities elicited for the CPTs were worse for models that include positive and negative influences. This may indicate that focusing the expert's attention on a small number of parameters results in better estimates. It may have an important practical implication: if a knowledge engineer decides to use parametric models instead of already specified CPTs, it may be worth going back to the expert and asking again for the parameters, this time having him/her focus on a small set of relevant parameters.

Table 8: Average maximal distance between the distributions experienced by subjects and those specified by canonical models with parameters provided by subjects.

Model                     Noisy-DeMorgan Parameters   CPT Parameters
CPT                       0.528                       0.528
noisy-DeMorgan            0.528                       0.529
noisy-OR+/OR (Díez)       0.516                       0.610
noisy-OR+/OR (Henrion)    0.590                       0.649
Restricted CAST           0.726                       0.711
noisy-OR                  0.920                       0.901

5.8 SUMMARY

In this section, I formally introduced a new class of models for local probability distributions, called probabilistic independence of causal influences (PICI). The new class is an extension of the widely accepted concept of independence of causal influences. The basic idea is to relax the assumption that the combination function should be deterministic. I believe that such an assumption is necessary neither for the clarity of the models and their parameters nor for other aspects, such as convenient decompositions of the combination function that can be exploited by inference algorithms.

I presented three conceptually distinct models for local probability distributions that address different limitations of existing models based on the ICI. These models have clear parametrizations that facilitate their use by human experts. The proposed models can be directly exploited by inference algorithms due to the fact that they can be explicitly represented by means of a BN, and their combination function can be decomposed into a chain of binary relationships. This property has been recognized to provide significant inference speed-ups for the ICI models [21]. Finally, because they can be represented in the form of hidden variables, their parameters can be learned using the EM algorithm. To support this claim, I presented a series of empirical experiments.
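The idea of decomposing a combination function into a chain of binary relationships can be illustrated on the ordinary noisy-OR, where the deterministic OR over all mechanism variables is replaced by a sequence of two-input ORs. The sketch below is only an illustration of that idea under assumptions: it does not implement the PICI models themselves, and the function names and parameter values are hypothetical.

```python
from itertools import product


def noisy_or_closed_form(config, mechanism_probs):
    """P(Y = present | config) from the usual product formula (no leak)."""
    prob_absent = 1.0
    for active, p in zip(config, mechanism_probs):
        if active:
            prob_absent *= 1.0 - p
    return 1.0 - prob_absent


def noisy_or_chain(config, mechanism_probs):
    """Same probability, folding in one cause at a time through binary ORs.

    After step k, acc holds P(Y_k = present), where Y_k = Y_{k-1} OR M_k
    and M_k is the k-th (independent) mechanism variable.
    """
    acc = 0.0
    for active, p in zip(config, mechanism_probs):
        m = p if active else 0.0      # P(M_k = present | X_k)
        acc = acc + (1.0 - acc) * m   # P(Y_{k-1} OR M_k)
    return acc


# Invented mechanism probabilities for three causes.
probs = [0.8, 0.6, 0.3]
results = {
    config: (noisy_or_closed_form(config, probs), noisy_or_chain(config, probs))
    for config in product([0, 1], repeat=len(probs))
}
```

Each step of the chain involves only two inputs, so no intermediate table grows with the number of parents, which is where the inference savings come from.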

I believe that the concept of PICI may lead to new models not described here. One remark should be made: it is important that new models be explicitly expressible in terms of a BN. If a model does not allow for a compact representation and needs to be specified as a CPT for inference purposes, it undermines a significant benefit of models for local probability distributions: a way to avoid using large conditional probability tables.