Part II Towards a Flexible Bayesian Logic of Testing Descriptive Rules

5 Towards Knowledge-Based – but Normative – Bayesian Modelling

5.7 General Discussion and Summary of Chapter 5

In this section, it is argued that the results of Chapter 5 favour the advocated Bayesian approach over all other theories of the Wason Selection Task. The implications for the debate on the raven paradox have been considered in Section 5.5.

Support for the Bayesian Approach. In Chapter 5 it has been argued that the Bayesian model with fixed marginals, which was first fully elaborated by von Sydow (2002, cf. Hattori, 2002, Oaksford and Chater, 2003), is a valid model only if the assumed constancy conditions actually hold in the situation. Since the model is not advocated universally here, but only in this knowledge-based sense, the normative and descriptive predictions of the model do not follow unless its preconditions are empirically and subjectively given. It has been shown that previous results were negative for a universal Bayesian account (cf. pp. 87 f.), and even those results which seem to show p versus non-p effects were shown to be incoherent or open to criticism (pp. 89 f.). In order to exclude the problems of previous experiments and to induce a model with fixed marginals, a many-card selection task (MST) has been proposed. If the knowledge-based account is correct, such a task should make it possible to obtain clear-cut frequency effects as predicted by the Sydow model (cf. v. Sydow, 2002).
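The kind of frequency effect predicted by a model with fixed marginals can be illustrated with a minimal sketch. It compares a dependence model (no p & non-q cases) with an independence model, both sharing the same marginals P(p) and P(q), and scores each card by its expected information gain. The following points are assumptions made for illustration only and are not taken from the text: the use of expected Shannon information gain as the measure, equal priors for the two models, the function names, and the marginal values 0.1/0.2 and 0.8/0.9, which are hypothetical and not the parameters of Experiments 1 and 2.

```python
from math import log2


def entropy(prob):
    """Shannon entropy (bits) of a two-hypothesis distribution (prob, 1 - prob)."""
    return -sum(x * log2(x) for x in (prob, 1.0 - prob) if x > 0.0)


def hidden_side_probs(p_p, p_q):
    """For each visible card side, return the probability of the 'positive'
    hidden side under (dependence model, independence model).

    Cards 'p' and 'not_p' hide q or non-q; cards 'q' and 'not_q' hide p or non-p.
    Both models share the marginals P(p) = p_p and P(q) = p_q (fixed marginals).
    Requires 0 < p_p <= p_q < 1, so that the dependence model is well defined.
    """
    return {
        "p": (1.0, p_q),                              # P(q | p)
        "not_p": ((p_q - p_p) / (1.0 - p_p), p_q),    # P(q | non-p)
        "q": (p_p / p_q, p_p),                        # P(p | q)
        "not_q": (0.0, p_p),                          # P(p | non-q)
    }


def expected_information_gain(p_p, p_q, prior_dep=0.5):
    """Expected reduction in uncertainty about the model when turning each card.

    prior_dep is the assumed prior probability of the dependence model.
    """
    gains = {}
    for card, (lik_dep, lik_ind) in hidden_side_probs(p_p, p_q).items():
        expected_posterior_entropy = 0.0
        # Two possible outcomes: positive hidden side, negative hidden side.
        for dep, ind in ((lik_dep, lik_ind), (1.0 - lik_dep, 1.0 - lik_ind)):
            p_outcome = prior_dep * dep + (1.0 - prior_dep) * ind
            if p_outcome > 0.0:
                posterior_dep = prior_dep * dep / p_outcome
                expected_posterior_entropy += p_outcome * entropy(posterior_dep)
        gains[card] = entropy(prior_dep) - expected_posterior_entropy
    return gains


# Hypothetical low- and high-frequency conditions (illustrative values only).
low = expected_information_gain(p_p=0.1, p_q=0.2)
high = expected_information_gain(p_p=0.8, p_q=0.9)

for name, gains in (("low", low), ("high", high)):
    print(name, {card: round(gain, 3) for card, gain in gains.items()})
```

Under these illustrative assumptions, the p and q cards are more informative than the non-p and non-q cards when both marginals are low, and the ordering reverses when both marginals are high. This is the qualitative pattern of p versus non-p and q versus non-q effects discussed in this chapter; the sketch makes no claim about the size of the effects actually observed.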

In this chapter, the results of two new experiments have been reported; both corroborate the predictions of the Sydow model (von Sydow, 2002; Oaksford & Chater, 2003). Experiments 1a and 1b showed, for the first time, the predicted p versus non-p and q versus non-q frequency effects using ravens material known from the raven paradox debate (see pp. 39 f.). This also corroborates the psychological adequacy of the standard Bayesian resolution of the paradox (pp. 104 f.). Moreover, Experiments 2a and 2b showed the predicted frequency effects for the first time for the original letter-number material of Wason (1966), without using the problematic sequential design used by Oaksford and Wakefield (2003; cf. p. 89; see also v. Sydow, 2002; cf. pp. 94 f.).

The non-p effects found are at odds with the older Bayesian model of Oaksford and Chater (1994, 1998b), and they seem to be problematic for accounts which postulate that the truth or falsity of a conditional ‘if p then q’ generally reflects only the conditional probability P(q | p) (Evans, Handley & Over, 2003; Over & Evans, 2003; Over, 2004; Evans, Over & Handley, 2005; but cf. earlier, e.g., Evans & Over, 1996, 360; Green, Over and Pyne, 1997, 219; Green and Over, 2000; see also Evans, 1972; cf. pp. 83 f.).

The results refute Oaksford and Wakefield’s suggestion (2003) that the predominantly negative results in former selection tasks (e.g., Oberauer et al. 1999, Oaksford et al., 1999) are to be explained by the failure to use a ‘natural’ successive test. As argued before, their own successive tasks (Oaksford et al. 1997, Oaksford & Wakefield, 2003) cannot count as a proper test of their theory (cf. the Discussion of Experiment 2). Moreover, Oberauer, Weidenfeld and Hörnig (2004, 522, 527) showed that a successive task is not sufficient to achieve the selection effects predicted by a model with fixed marginals. In contrast, the two non-successive selection tasks used and reported in this chapter clearly support the predictions of the Sydow model. Hence, in order to achieve positive results, successive tests are neither sufficient, as shown by Oberauer et al. (2004), nor necessary, as shown here. In our experiments, the task was constructed on the basis of a knowledge-based account, by explicitly inducing the assumed preconditions of the model with fixed marginals and by ruling out a number of problems within previous experiments6.

Other Theories of the WST. The results of the two experiments reported in this chapter are inconsistent with the mental logic theory and the mental model theory of the WST. Although domain-specific theories have been mainly concerned with what I call ‘prescriptive conditionals’, the results are also inconsistent with some claims of these theories concerning descriptive conditionals. The General Discussion of Part II provides a more detailed discussion, also including relevance theory and matching bias heuristics.

Mental logic theory (ML theory, e.g., Braine, 1978; Rips, 1994; O’Brien, 1995; cf. pp. 10 f., 172 f.) has not predicted frequency effects and cannot explain the pattern found here.

The difference in difficulty between reasoning by modus ponens and reasoning by modus tollens can be explained within ML theory by postulating an incomplete set of rules that does not directly include the modus tollens. The difficulty of achieving correct selections in the standard WST (the first anomaly of the WST, cf. p. 8) and the difficulty of achieving better results in ‘therapy’ experiments can likewise be explained by postulating a complex reasoning process needed to derive the ‘correct’ falsificationist solution (without the direct use of a modus ponens). However, frequency effects should not affect the existence or non-existence of the postulated rules of a mental logic. Therefore, ML theory does not provide any positive explanation of the results of Experiments 1 and 2.

The mental model theory of the WST (MM theory, Johnson-Laird & Byrne, 1991, 2002) has, on the one hand, also generally maintained a falsificationist norm of testing conditionals and, on the other, has explained the deviations found in the WST by a set of incomplete representations (pp. 11 f. for details). “In short, the model theory predicts that people will select the card falsifying the consequent whenever the models are fleshed out with explicit representations of that card” (Johnson-Laird & Byrne, 1991, p. 80). MM theory has not advocated any frequency-based account for the WST and hence all frequency effects obtained here seem to be problematic for MM theory.34

Nonetheless, one may try to defend MM theory, since the probability of finding a counterexample in a situation has been postulated to affect the fleshing out of mental models (cf. Green & Larking, 1995; Love & Kessler, 1995; Green, 1998).

However, this integration of frequency effects into MM theory is problematic because there have been cases in which the obvious availability of counterexamples did not lead to ‘correct’ p and non-q selections. For example, “If I eat haddock then I drink gin” did not lead to facilitation effects (Manktelow & Evans, 1979), although counterexamples should be available. (In contrast, a Bayesian can account for the p and q selections by the plausible assumption that P(eat haddock) < 0.5 and P(drink gin) < 0.5.)

Moreover, even if the probability of thinking of a counterexample, P(counterexample), always led to ‘a fleshing out’ of the mental model, causing p and non-q selections, this would not imply any account of how conditions which vary exclusively with regard to P(p) and P(q) should be linked to P(counterexample) and the resulting fleshing out of a mental model. Hence, mental model theory cannot explain the results.

Additionally, even if we also tried to fill this gap and provided an account to link probabilities and the fleshing out of mental models, this, in my view, would not yield positive results for MM theory. It is not clear whether a high or a low frequency condition should enhance the probabilities of envisaging counterexamples. The most plausible and direct proposal would be to assume that the higher the number of visible and salient non-p and non-q cards in an MST, the higher should be the probability that the incomplete model (p & q) is fleshed out. But this straightforward proposal would imply that a low probability condition, with many non-p and non-q cards, should lead to more fleshed-out models than a high probability condition, which is contrary to our results.

34 This is the case, although Johnson-Laird, Legrenzi, Girotto, Legrenzi & Caverni (1999) proposed a mental model theory of probabilities. I agree with Oaksford, Chater, & Larkin (2000, 898) that most aspects of such a theory do not intrinsically rely on the notion of a mental model. Moreover, Johnson-Laird et al.’s (1999) probabilistic extension of MM theory has not been applied to the WST. Furthermore, this theory, in my view, cannot account for p versus non-p effects anyway.

Furthermore, even if we assumed a more complex relationship between frequency information and the fleshing out of a mental model, this, in my view, would not explain the results. P(counterexample) may rationally be estimated using the independence model, MI, also used in the Bayesian model (in the dependence model there are no counterexamples). Hence, p and non-p cases would have to be combined randomly with q and non-q cases. However, with the parameters used in Experiments 1 and 2, this yields an equal proportion of counterexamples, P(p & non-q | MI), in both the high and the low frequency conditions, which would not allow for any frequency effects from the viewpoint of MM theory. If P(p & non-q | MI) were alternatively relativised to the number of those positive instances which are represented in a mental model of a conditional, P(p & q), this would again predict an increase of non-q selections in the low frequency condition, which is the reverse of our findings (cf. also the General Discussion of Part II).
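The arithmetic behind this argument can be made concrete with a small sketch. The marginal values below (0.1/0.2 for a low and 0.8/0.9 for a high frequency condition) are hypothetical and chosen only because they happen to produce equal counterexample probabilities under independence; they are not claimed to be the parameters of Experiments 1 and 2. The function and variable names are likewise illustrative.

```python
from math import isclose


def independence_cells(p_p, p_q):
    """Joint probabilities of the four cases under the independence model,
    where p and q are combined at random given the marginals P(p) and P(q)."""
    return {
        "p_and_q": p_p * p_q,
        "p_and_not_q": p_p * (1.0 - p_q),      # counterexamples to 'if p then q'
        "not_p_and_q": (1.0 - p_p) * p_q,
        "not_p_and_not_q": (1.0 - p_p) * (1.0 - p_q),
    }


# Hypothetical conditions: (P(p), P(q)); illustrative values only.
conditions = {"low": (0.1, 0.2), "high": (0.8, 0.9)}

summary = {}
for name, (p_p, p_q) in conditions.items():
    cells = independence_cells(p_p, p_q)
    summary[name] = {
        # Absolute probability of a counterexample under independence.
        "counterexamples": cells["p_and_not_q"],
        # Counterexamples relativised to the positive instances p & q.
        "relative_to_positive": cells["p_and_not_q"] / cells["p_and_q"],
    }

for name, values in summary.items():
    print(name, {key: round(value, 3) for key, value in values.items()})

# Equal absolute counterexample probability in both conditions.
print(isclose(summary["low"]["counterexamples"],
              summary["high"]["counterexamples"]))
```

With these illustrative values the absolute probability of a counterexample is the same in both conditions, whereas the relativised measure is larger in the low frequency condition. On the reading discussed above, the first variant would predict no frequency effect and the second an effect in the direction opposite to the one observed.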

In any case, such complex calculations would be problematic from an MM viewpoint, since they presuppose a much more complex representation in order to construct a relatively simple model made up of only four instances. Any such account, if viable at all, would clearly go far beyond MM theory.

Finally, MM theory cannot explain the simultaneous increase of non-p selections which occurs together with the increase of non-q selections (cf. General Discussion of Part II).

Therefore, current MM theory clearly cannot account for the results of Experiment 1 and Experiment 2.

The two main domain-specific theories, social contract theory and pragmatic reasoning schema theory, mainly made proposals concerned with social or deontic WSTs, and only took a minor interest in descriptive WSTs. Nonetheless, the results of this chapter also show that some of their claims and assumptions made about descriptive WSTs need to be given up.

When social contract theory was formulated, it was argued that no thematic rule that was not a social contract had ever produced robust content effects (Cosmides, 1989, p. 200; Cosmides & Tooby, 1992, p. 183; cf. later, Fiddick, Cosmides, & Tooby, 2000; Fiddick, 2004). In the current Experiments 1 and 2, clear and systematic ‘content effects’ have been obtained for rules which are not social contracts.

Moreover, Cosmides and colleagues have used descriptive WSTs as a yardstick against which to measure their findings with social contracts. They assumed and found a standard pattern of diffuse selections, generally dominated by p and q selections, in descriptive tasks. The findings of Experiments 1 and 2 show that this pattern is not an adequate general yardstick, since the selections in descriptive WSTs are shown to depend on P(p) and P(q).

Although pragmatic reasoning schema theory (Cheng & Holyoak, 1985; Holyoak & Cheng, 1995a, b, cf. pp. 13 f., 176 f., 270 f.) was almost completely concerned with the permission and the obligation schema, in principle it also allowed for schemata in the field of descriptive WSTs (see particularly Cheng & Holyoak, 1989, 306). However, Cheng and Holyoak have not developed a detailed positive theory of descriptive WSTs. In their writings on the WST they only briefly mentioned two schemas, of causality and covariance, which are claimed to lead generally to p and q selection patterns. In my view, this was nothing but a redescription of the data which were at hand at that time. But this description is inconsistent with the frequency effects observed in Experiments 1 and 2.

Moreover, Cheng and Holyoak (1985, 396) argued that an “arbitrary rule, being unrelated to typical life experiences, will not reliably evoke any reasoning schema”. Although the predicted frequency effects had a lower effect size for the abstract letter-number hypothesis than for the realistic ravens hypothesis, significant effects were obtained in both experiments (but see pp. 176 f.).

Finally, other non-Bayesian mechanisms or approaches, such as matching bias heuristics or relevance theory, also cannot explain the findings of Experiments 1 and 2 (cf. General Discussion of Part II).