
4 Causal Bayes Nets Theory

4.2 Causal Learning with Bayes Nets

4.2.1 Causal Learning through Observations

Two kinds of learning algorithms have been developed in the context of causal Bayes nets: bottom-up constraint-based methods and top-down Bayesian methods.

Constraint-based methods (Pearl, 2000; Scheines, Spirtes, Glymour, & Meek, 1994; Spirtes et al., 1993) try to induce causal models from the unconditional and conditional dependence and independence relations in the data. These algorithms can infer causal structures from observational data and can also integrate interventional data. Applied to human causal induction, these bottom-up approaches assume that people examine the probabilistic dependency relations in the available data and use them to infer the underlying model. Constraint-based methods start by analyzing which probabilistic dependency relations hold between the observed variables (e.g., whether events A and B are unconditionally or conditionally (in)dependent). Algorithms such as TETRAD (Scheines et al., 1994) apply standard significance tests to determine whether a dependency relation holds. In a step-by-step procedure the algorithms then construct the causal models consistent with the discovered unconditional and conditional dependence and independence relations (see Spirtes et al., 1993, for details). Thus, contingent on which dependency relations are satisfied by the data, the underlying causal structure can be identified.
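The dependency-checking step at the core of such methods can be sketched in a few lines of Python. The sampling scheme, variable names, and fixed tolerance below are illustrative choices only; an actual constraint-based algorithm such as TETRAD applies proper significance tests rather than a tolerance threshold.

```python
import random
from itertools import product

def cond_independent(data, a, b, given=(), tol=0.03):
    """Crude empirical check of whether variables a and b are independent
    conditional on the variables in `given`. `data` is a list of dicts
    mapping variable names to 0/1 values. We test whether
    P(a, b | context) ~= P(a | context) * P(b | context) within `tol`."""
    for ctx in product((0, 1), repeat=len(given)):
        subset = [d for d in data
                  if all(d[g] == v for g, v in zip(given, ctx))]
        n = len(subset)
        if n == 0:
            continue
        pa = sum(d[a] for d in subset) / n
        pb = sum(d[b] for d in subset) / n
        pab = sum(d[a] * d[b] for d in subset) / n
        if abs(pab - pa * pb) > tol:
            return False
    return True

# Sample from a common-cause model Y <- X -> Z: Y and Z each copy X
# with probability 0.9, so they covary unconditionally but are
# independent given X.
random.seed(0)
def noisy_copy(x):
    return x if random.random() < 0.9 else 1 - x

data = []
for _ in range(20000):
    x = random.randint(0, 1)
    data.append({'X': x, 'Y': noisy_copy(x), 'Z': noisy_copy(x)})

print(cond_independent(data, 'Y', 'Z'))                # False: dependent
print(cond_independent(data, 'Y', 'Z', given=('X',)))  # True: independent given X
```

The two checks recover exactly the dependency pattern that a common-cause structure entails: an unconditional dependence between the two effects that disappears once the common cause is held fixed.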

An alternative approach to structure induction is provided by top-down Bayesian methods (e.g., Heckerman, Meek, & Cooper, 1999; Steyvers et al., 2003). These approaches assume that learners start from a set of hypotheses about candidate causal models and update these hypotheses in accordance with the available data. Briefly, Bayesian learning procedures begin by assigning a prior probability to each graph, either to the complete set of possible graphs or to a restricted set. The priors assigned to the causal models can be uniform, but it is also possible to incorporate prior knowledge by giving some models a higher prior probability than others. Together with assumptions about the probability functions relating the variables, the likelihood of a particular data pattern under each of the graphs can be computed. For example, a data pattern such as x.y.z (i.e., all events are present) is more likely if the three variables X, Y, and Z form a common-cause model than if they form a common-effect model (because in the latter the cause events occur independently of each other). Using Bayes' theorem, it is then possible to compute the posterior probability distribution over the considered causal graphs conditional on the available data. The graph with the highest posterior probability is then chosen as the one most likely to have generated the data.
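This prior-likelihood-posterior cycle can be illustrated with a minimal sketch. The two candidate structures and all parameter values below are made up for the example; they are not taken from any of the cited models.

```python
def p_common_cause(x, y, z, link=0.9):
    # Y <- X -> Z: X is a fair coin; Y and Z each copy X with prob `link`.
    px = 0.5
    py = link if y == x else 1 - link
    pz = link if z == x else 1 - link
    return px * py * pz

def p_common_effect(x, y, z, link=0.9):
    # Y -> X <- Z: Y and Z are independent fair coins; X is likely
    # present only when both causes are present.
    p1 = link if (y == 1 and z == 1) else 1 - link
    px = p1 if x == 1 else 1 - p1
    return 0.5 * 0.5 * px

def posterior(data, models, priors=None):
    """Posterior over candidate graphs via Bayes' theorem, assuming
    known parameters and independent observations."""
    priors = priors or {name: 1.0 / len(models) for name in models}
    scores = {}
    for name, joint in models.items():
        likelihood = 1.0
        for (x, y, z) in data:
            likelihood *= joint(x, y, z)
        scores[name] = priors[name] * likelihood
    norm = sum(scores.values())
    return {name: s / norm for name, s in scores.items()}

models = {'common_cause': p_common_cause, 'common_effect': p_common_effect}
post = posterior([(1, 1, 1)], models)  # all three events present
print(post)
```

As in the example from the text, observing all three events present shifts the posterior toward the common-cause model, because under the common-effect model the two causes must co-occur by chance.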

Both constraint-based and Bayesian algorithms provide powerful computational methods for inducing causal structure from statistical data, capitalizing on the fact that different causal structures entail different dependency relations. However, some causal models are not only observationally equivalent but also share the same set of dependency relations. For example, the finding that Y and Z are independent conditional on X is consistent not only with a common-cause model Y←X→Z but also with the causal chains Y→X→Z and Y←X←Z. Thus, from observational data alone these methods can only reduce the space of possible graphs to a subset of models which share the same set of probabilistic dependency relations. Such models are referred to as Markov equivalent (cf. Spirtes et al., 1993). Figure 5 gives an example of the possible models that can be constructed from three variables X, Y, and Z. Shaded areas group models according to their topology; dashed lines indicate Markov equivalent models (cf. Steyvers et al., 2003).

Figure 5. All possible networks with three variables. Shaded areas group models according to their topology, dashed lines indicate Markov equivalent models (cf. Steyvers et al., 2003). Cyclic graphs (bottom left) cannot be analyzed by standard causal Bayes nets analysis. See text for details.
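The equivalence of the common-cause model and the causal chain can be verified by exhaustive enumeration. The following sketch (with arbitrary illustrative parameters) checks whether Y is independent of Z given X in the joint distribution defined by each structure:

```python
def joint_common_cause(x, y, z):
    # Y <- X -> Z
    px = 0.6 if x == 1 else 0.4
    py = 0.8 if y == x else 0.2
    pz = 0.7 if z == x else 0.3
    return px * py * pz

def joint_chain(x, y, z):
    # Y -> X -> Z
    py = 0.5
    px = 0.8 if x == y else 0.2
    pz = 0.7 if z == x else 0.3
    return py * px * pz

def joint_common_effect(x, y, z):
    # Y -> X <- Z
    p1 = 0.9 if (y == 1 and z == 1) else 0.1
    px = p1 if x == 1 else 1 - p1
    return 0.5 * 0.5 * px

def y_indep_z_given_x(joint):
    """Check Y _||_ Z | X exactly by enumerating all binary assignments."""
    for x in (0, 1):
        px = sum(joint(x, y, z) for y in (0, 1) for z in (0, 1))
        for y in (0, 1):
            for z in (0, 1):
                pyz = joint(x, y, z) / px
                py = sum(joint(x, y, zz) for zz in (0, 1)) / px
                pz = sum(joint(x, yy, z) for yy in (0, 1)) / px
                if abs(pyz - py * pz) > 1e-12:
                    return False
    return True

print(y_indep_z_given_x(joint_common_cause))   # True
print(y_indep_z_given_x(joint_chain))          # True
print(y_indep_z_given_x(joint_common_effect))  # False
```

The common-cause model and the chain both satisfy the conditional independence and are therefore indistinguishable by this check, whereas the common-effect model violates it: once the common effect X is known, its two causes become dependent.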

The scope of these learning algorithms exceeds that of psychological models because they make unrealistic assumptions about the necessary information-processing capacities, especially in situations with many variables. For example, constraint-based methods require learners to conduct a large number of comparisons to test which dependency relations exist in the investigated domain. Similarly, Bayesian methods face the problem that the number of possible causal models grows very fast with the number of variables (i.e., at least exponentially). Therefore, even in computer science, heuristics are used to constrain the space of candidate models (cf. Heckerman et al., 1999). However, in less complex situations learners might use the analysis of dependency relations (cf. Gopnik et al., 2004). For example, learners could start from a restricted set of candidate models and analyze the available data specifically with respect to the conditional dependency relations implied by their hypothesized models.
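The combinatorial explosion can be made concrete: the number of directed acyclic graphs on n labeled nodes is given by a classical recurrence due to Robinson, which the following snippet implements. Already for ten variables the model space exceeds 10^18 graphs.

```python
from math import comb

def num_dags(n):
    """Number of directed acyclic graphs on n labeled nodes, via the
    recurrence a(m) = sum_{k=1..m} (-1)^(k+1) * C(m, k) * 2^(k*(m-k))
    * a(m-k), with a(0) = 1."""
    a = [1]
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k)
                     * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

print(num_dags(3))   # 25 acyclic models over three variables
print(num_dags(4))   # 543
print(num_dags(10))  # already more than 10**18 candidate graphs
```

This is why practical structure-learning systems restrict the hypothesis space or search it heuristically rather than scoring every graph.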

Thus far, only a few studies have investigated whether learners have the capacity to induce causal structure from dependency relations alone. Gopnik and colleagues have argued that children as young as 30 months use information about dependencies to infer causal relations in accordance with constraint-based methods (see Gopnik et al., 2004, for an overview). However, research with adult reasoners suggests that this competency strongly depends on the complexity of the learning task (e.g., the number of variables, deterministic vs. probabilistic relations). For example, Lagnado and Sloman (2004) found that neither learners’ probability judgments nor their model choices matched the predictions of constraint-based methods. Similarly, the experiments of Steyvers and colleagues (2003), who proposed a psychologically more plausible model of Bayesian inference, showed that only a few learners were able to identify the correct model from observational data alone. However, Hagmayer and Waldmann (2000) found an interesting dissociation between explicit and implicit sensitivity to the structural implications of different causal models (e.g., a common-cause vs. a common-effect model). Whereas learners’ explicit probability judgments showed only a limited understanding of the structural implications of the different models, participants performed better in an implicit task requiring them to predict the patterns of events they expected to see.

In a recent article, Lagnado and colleagues concluded that there is little evidence that learners can uncover complex causal models from covariational data alone (Lagnado et al., in press). They point out that there are a number of other cues to causality that can be used to infer causal structure from observational data. For example, temporal information and prior knowledge can assist structure learning by providing additional cues or by constraining the set of candidate models. Another effective route to discovering a causal system’s structure is the active manipulation of the causal model’s variables, which is discussed next.