
Part II Towards a Flexible Bayesian Logic of Testing Descriptive Rules

5 Towards Knowledge-Based – but Normative – Bayesian Modelling

5.6 Experiment 2a, b – Letter-Number MST

The Letter-Number WST

The original WST of Wason (1966), with the hypothesis “if a card has an ‘A’ on one side, then it has a ‘2’ on the other side”, was the paradigmatic case of the research on the WST. The first anomaly for a falsificationist account (see pp. 8 f.) became apparent for exactly this kind of letter-number hypothesis. Even the first experiments of Wason and colleagues showed a predominance of confirmatory ‘A’ (p) and ‘2’ (q) selections (Johnson-Laird & Wason, 1970).

It would be particularly convincing for the alternative Bayesian approach to obtain p versus non-p and q versus non-q frequency effects in the Sydow model with this original letter-number material (see also the results of v. Sydow, 2002; pp. 94 f.).

But just as the results with other material were rather negative (pp. 70 f.) or ambivalent (pp. 83 f.), the results obtained specifically with letter-number material have not been much better (see particularly Oberauer, Wilhelm, & Diaz, 1999; cf. Feeney & Handley, 2000; Oaksford, 2002; Oaksford & Wakefield, 2003; Oberauer, Weidenfeld, & Hörnig, 2004):

Oberauer, Wilhelm, and Diaz (1999) used letter-number material in two experiments that systematically tested frequency effects, but their results were clearly negative (Exp. 2, 3).

Oaksford and Wakefield (2003) suggest that one reason why Oberauer et al. (1999) did not find the predicted probability effects was that they did not use a natural sampling process, in which each data point is learned sequentially, one at a time (cf. Gigerenzer & Hoffrage, 1995). This might also explain the rather negative results of Oaksford, Chater, and Grainger (1999; see p. 86). However, Oaksford and Wakefield (2003) conceded that their earlier sequential test using a RAST (Oaksford, Chater, Grainger, & Larkin, 1997) could not count as a proper test of their theory of the WST, since cards were successively turned over (cf. Oaksford & Chater, 1998b; Klauer, 1999). In a new experiment, Oaksford and Wakefield used the same number-letter material as Oberauer et al. (1999) but employed a sequential natural sampling process without allowing participants to turn over any cards. However, participants could select not just one card of a logical category (as in a WST) but many. Oaksford and Wakefield obtained, at least seemingly, p versus non-p and q versus non-q frequency effects. They concluded “that when natural probability manipulations are used people’s data selection behaviour is rational” (p. 143).

However, Oberauer, Weidenfeld, and Hörnig (2004, pp. 522, 527) objected, in my view correctly, that the manipulation used by Oaksford and Wakefield confounded the frequency manipulation with the number of opportunities to turn over cards from a particular logical category. Oaksford and Wakefield divided the number of selections by the number of opportunities to select a card. But on such an analysis, non-p selections may appear to predominate in the sequence “p, p, p, non-p, p, p, p, non-p” even if there were two p selections and only one non-p selection (cf. pp. 89 f.). Indeed, if one reanalyses Oaksford and Wakefield’s data without dividing the obtained selections by the selection opportunities, there is a decrease and not an increase in non-p or non-q selections in the high probability condition. Hence, I think one has to concede that their results do not corroborate their Bayesian model of the WST.
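The confound can be made concrete with the sequence quoted above. The following is a minimal sketch, not a reanalysis of any published data: the positions of the selected cards are hypothetical and chosen only so that two p cards and one non-p card are selected, as in the example.

```python
from fractions import Fraction

# Presentation sequence quoted in the text: six p cards and two non-p cards.
sequence = ["p", "p", "p", "non-p", "p", "p", "p", "non-p"]

# Hypothetical selections (an assumption for illustration only):
# positions 0 and 4 are p cards, position 3 is a non-p card.
selected_positions = {0, 3, 4}


def raw_counts(seq, selected):
    """Count how many cards of each logical category were selected."""
    counts = {"p": 0, "non-p": 0}
    for position in selected:
        counts[seq[position]] += 1
    return counts


def rates_per_opportunity(seq, selected):
    """Divide the selections of each category by its number of opportunities."""
    counts = raw_counts(seq, selected)
    return {category: Fraction(counts[category], seq.count(category))
            for category in counts}


counts = raw_counts(sequence, selected_positions)
rates = rates_per_opportunity(sequence, selected_positions)
print(counts)  # raw selections: p is chosen more often than non-p
print(rates)   # per-opportunity rates: non-p now appears to predominate
```

In raw counts the p selections outnumber the non-p selections two to one, whereas the per-opportunity rates (2 of 6 versus 1 of 2) reverse that ordering, which is the pattern the objection is concerned with.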

Oberauer, Weidenfeld, and Hörnig (2004) used a sequential learning phase, as required by Oaksford and Wakefield (2003), but tested frequency effects without confounding them with selection opportunities in the test phase. Their results were negative, with significant results even in the reverse direction. Hence, Oaksford and Wakefield (2003) seem to have been wrong; sequential sampling is clearly not sufficient to obtain positive Bayesian results.

The knowledge-based Bayesian account advocated here regards sequential sampling as neither necessary nor sufficient for obtaining positive results, although a clear frequency format may well be advantageous. It is argued here that the preconditions of the Sydow model cannot simply be assumed to be general features of testing conditional hypotheses; rather, they need to be introduced actively, for instance by using MSTs, which also rule out some other possible misunderstandings (cf. pp. 83 f.). Hence, from the knowledge-based perspective it is predicted that p versus non-p as well as q versus non-q frequency effects should be obtained if a task is used that actively introduces the preconditions of the tested model in a salient way, even without sequential sampling.

Method Experiment 2a, b

Design and Participants

Experiments 2a and 2b were almost identical; they differed only in their dependent variable. Experiment 2a tested p versus non-p effects, Experiment 2b q versus non-q effects. Both experiments were presented successively, with the task order varied. Experiment 2 had three between-subject factors: one concerning the order of the experiments (Exp. 2a or Exp. 2b first), one concerning the probabilities (low versus high probability condition), and one controlling for the tested rule (‘vowel → even number’ versus ‘consonant → odd number’ rule). The last factor served as a control to check for additional frequency effects caused by probability assumptions about the atomic propositions mentioned in the rule (vowels and consonants might have different subjective probabilities regardless of the salient frequency manipulation).

Ninety-six students from the University of Göttingen participated in the experiment (72 % female, 28 % male; mean age 23 years). Most of the students studied psychology (53 %); the second largest group studied biology (10 %). The participants were randomly assigned to the resulting eight conditions.

In Experiment 2a seven participants and in Experiment 2b eight participants were excluded due to formal errors (they selected more cards or other cards than those they were formally allowed to select). Hence, 89 participants were analysed for Experiment 2a, and 88 participants for Experiment 2b.

Procedure and Materials

The order of Experiments 2a and 2b was varied as a separate factor, but each task was formulated almost identically regardless of its serial position. If a task was presented in the first position, the instruction began “Below you see some cards”. Tasks in the second position began “This is a new task, which is completely independent of the previous task. Below you see a new set of cards.”

The tasks in Experiment 2a and 2b were formulated similarly. Both continued:

“On one side of each card is a letter (consonant or vowel), on the other side is a number (even or odd). In this task you should check whether the following additional assertion is true or false”. Then the rule was stated (the formulations of the rules were all set in bold print):

• In the vowel condition, the following rule was used: “If there is a vowel on the letter side of the card, then there is always an even number on the card side. In brief: If vowel then even number.”

• In the consonant condition, the rule was formulated as follows: “If there is a consonant on the letter side of the card, then there is always an odd³³ number on the card side. In brief: If consonant then odd number.”

The instruction in Experiment 2b (q versus non-q selections) continued: “You should check whether the cards correspond to this rule or not. [Break] The same 20 cards have been displayed twice; in the first display all letter sides were put upwards”.

³³ Translation note: The German word for ‘odd’, ‘ungerade’, literally means ‘uneven’.

Twenty cards were shown with letters facing upwards. “In the second display of the same cards now all number sides are shown.”

Twenty cards were shown with numbers facing upwards. In the two displays, ‘A’ cards were used as vowels, ‘K’ cards as consonants, ‘2’ cards as even numbers, and ‘7’ cards as odd numbers. Two different sets of card probabilities were used in the low and high probability conditions:

• In the low probability condition (‘.10 → .20’) there were 2 p card sides (10 %), 18 non-p card sides (90 %), 4 q card sides (20 %), and 16 non-q card sides (80 %).

• In the high probability condition (‘.80 → .90’) there were 16 p card sides (80 %), 4 non-p card sides (20 %), 18 q card sides (90 %), and 2 non-q card sides (10 %).
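To illustrate why these two sets of marginals should shift selections, the following sketch computes the expected information gain of each card type. It is only an illustration under assumed simplifications, not the thesis's own model specification: a deterministic dependence hypothesis (no p and non-q cases) is compared with an independence hypothesis, both share the same fixed marginals P(p) and P(q), the priors are equal, and expected reduction in Shannon entropy is used as the measure of a card's usefulness. All function names are hypothetical.

```python
import math


def entropy(prob: float) -> float:
    """Shannon entropy (bits) of a two-hypothesis distribution (prob, 1 - prob)."""
    if prob <= 0.0 or prob >= 1.0:
        return 0.0
    return -(prob * math.log2(prob) + (1 - prob) * math.log2(1 - prob))


def expected_information_gain(lik_dep: float, lik_ind: float,
                              prior_dep: float = 0.5) -> float:
    """Expected entropy reduction about the hypothesis from turning one card.

    lik_dep and lik_ind are the probabilities of finding the 'positive'
    hidden side (q behind a letter card, p behind a number card) under the
    dependence and the independence hypothesis, respectively.
    """
    gain = entropy(prior_dep)
    outcomes = ((lik_dep, lik_ind), (1 - lik_dep, 1 - lik_ind))
    for outcome_dep, outcome_ind in outcomes:
        p_outcome = prior_dep * outcome_dep + (1 - prior_dep) * outcome_ind
        if p_outcome > 0:
            posterior_dep = prior_dep * outcome_dep / p_outcome
            gain -= p_outcome * entropy(posterior_dep)
    return gain


def card_gains(a: float, b: float) -> dict:
    """a = P(p), b = P(q); the marginals are identical under both hypotheses."""
    return {
        "p": expected_information_gain(1.0, b),                    # P(q | p)
        "non-p": expected_information_gain((b - a) / (1 - a), b),  # P(q | non-p)
        "q": expected_information_gain(a / b, a),                  # P(p | q)
        "non-q": expected_information_gain(0.0, a),                # P(p | non-q)
    }


low = card_gains(0.10, 0.20)   # '.10 -> .20' condition
high = card_gains(0.80, 0.90)  # '.80 -> .90' condition
for label, gains in (("low", low), ("high", high)):
    print(label, {card: round(value, 3) for card, value in gains.items()})
```

Under these assumptions the p and q cards are the more informative choices in the low probability condition, whereas the non-p and non-q cards are more informative in the high probability condition, which is the direction of the predicted frequency effects.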

The order of the cards was analogous to Experiment 1 (see Table 16, p. 101). The instruction continued: “Between the two displays the cards were completely mixed; the card order does not correspond in the two displays. [Break] Please indicate by ticking a card which single card you would turn over to test the truth or falsity of the proposition in this sample. You are only allowed to turn over one currently displayed gray card (either a ‘2’ or a ‘7’).” In the described instruction of Experiment 2b the q and non-q card sides were gray, in order to make clear which cards could be chosen.

The instruction of the Experiment 2a tasks was almost identical to that of Experiment 2b described above; only the first display, with the letters facing upwards (p and non-p cards), was now described as the present display, and the second display, showing the number sides (q and non-q cards), was described as a previous display. Correspondingly, here the first display was gray. The final instruction again asked participants to select a presently displayed gray card, which here meant one of the ‘A’ or ‘K’ cards.

Results Experiment 2a and Experiment 2b

In none of the four conditions of either experiment did the order of the tasks significantly influence the q versus non-q selections (Fisher’s exact tests: p = .67, p = 1.0, p = 1.0, p = 1.0) or the p versus non-p selections (Fisher’s exact tests: p = .45, p = .66, p = .32, p = .66). Hence, the results were collapsed across this factor.

Table 19 presents the p versus non-p selections found in Experiment 2a. Descriptively, all differences went in the predicted direction. Table 20 presents the q versus non-q selections found in Experiment 2b.

It made no difference whether participants tested the ‘consonant → odd’ or the ‘vowel → even’ hypothesis, in either the low or the high probability condition of Experiment 2a (Pearson χ²(1) = 1.58, p = .20; Fisher’s exact test, p = .19). Likewise, the tested rule had no effect in Experiment 2b (Pearson χ²(1) = .03, p = .86; χ²(1) = .80, p = .37). The explicit and salient frequency information in the MST seems to have overridden any remaining influence of prior knowledge about the frequency of vowels and consonants. Hence, the data can also be collapsed across this factor.

The main test comparing the low and high frequency conditions was significant for the predicted p versus non-p effect (Exp. 2a: Pearson χ²(1) = 9.81, p < .01, rφ = .33) as well as for the predicted q versus non-q effect (Exp. 2b: χ²(1) = 6.76, p < .01, rφ = .28).
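The reported effect sizes can be checked against the test statistics. The sketch below assumes that each chi-square value was computed on the analysed sample reported in the Method section (89 participants in Experiment 2a, 88 in Experiment 2b) and uses the standard relation between the phi coefficient and chi-square for a 2 × 2 table; the function names are hypothetical.

```python
import math


def phi_from_chi2(chi2: float, n: int) -> float:
    """Phi coefficient for a 2 x 2 table: sqrt(chi-square / N)."""
    return math.sqrt(chi2 / n)


def p_value_chi2_df1(chi2: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.

    With df = 1, chi-square is the square of a standard normal variable,
    so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


# (chi-square, assumed N) taken from the reported results and sample sizes.
reported = {
    "Exp. 2a": (9.81, 89),
    "Exp. 2b": (6.76, 88),
}

for label, (chi2, n) in reported.items():
    phi = round(phi_from_chi2(chi2, n), 2)
    significant = p_value_chi2_df1(chi2) < 0.01
    print(label, phi, significant)
```

With these inputs the computed phi coefficients round to the reported values of .33 and .28, and both p values fall below .01.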

Table 19

Experiment 2a: Percentage and Number of Card Selections Concerning P Versus Non-P Selections in the Low and High Probability Conditions

[Table body not preserved. Column headings: Consonant → odd | Vowel → even | Overall]

Note. Selections which are predicted to increase are darkened.

Table 20

Experiment 2b: Percentage and Number of Card Selections Concerning Q Versus Non-Q Selections in the Low and High Probability Conditions

[Table body not preserved. Column headings: Consonant → odd | Vowel → even | Overall]

Note. Selections which are predicted to increase are darkened.

According to the final questionnaire, only 10 % of the participants claimed to have solved the task using formal logic, whereas 71 % reported using their ‘intuition or own reasoning’.

Discussion Experiment 2

The results of Experiments 2a and 2b both corroborated the predictions of the Bayesian model with fixed marginals (von Sydow, 2002; Oaksford & Chater, 2003). Although the effect sizes were lower than in Experiment 1 (see pp. 103 f.), which used ‘realistic’ ravens material, highly significant p versus non-p and q versus non-q effects were found here too.

For the first time, p versus non-p and q versus non-q frequency effects were shown for a non-sequential selection task with the original letter-number material (Wason, 1966). We have seen before that Oaksford and Wakefield’s (2003) results cannot count as proper confirmation of the postulated effects (cf. Oberauer, Weidenfeld, & Hörnig, 2004, pp. 522, 527). Oaksford and Wakefield’s (2003) explanation that earlier negative results (e.g., Oberauer, Wilhelm, & Diaz, 1999; Oaksford, Chater, & Grainger, 1999, Exp. 2-4, see p. 86) are presumably due to not using a natural sequential sampling process appears to be incorrect. Firstly, Oberauer et al. (2004) have shown that the supporting results of Oaksford and Wakefield (2003) may be due to confounding selection opportunities with selections, and that if this confound is removed, the results are also negative for a sequential task. Secondly, the current results show that the predicted effects can be obtained if we use an MST in which all preconditions are clearly fulfilled. Reasons for the negative results in many previous tasks may have been that probabilities or constancy assumptions were not actively introduced, or that these preconditions were obscured by a complicated procedure. Of course, this need not be the only reason for those results. For instance, the interesting reversed effects of Oberauer, Weidenfeld, and Hörnig (2004) may be due to a selection tendency learned in the first phase. In any case, positive results have been achieved here by using a non-sequential task in which the probabilities and the constancy assumption of the model with fixed marginals were clearly given.

It will be shown in the General Discussion of Chapter 5 that other non-Bayesian accounts of the WST cannot explain the results.