Statistical hypothesis testing

Obviously, if our hypothesis is stated in terms of proportions rather than absolutes, we must also look at our data in terms of proportions rather than absolutes.

A single counterexample will not disprove our hypothesis, but what if the majority of cases we come across are counterexamples? For example, if we found more black swans than white swans, would this not falsify our hypothesis that most swans are white? The answer is: not quite. With a hypothesis stated in absolute terms, it is easy to specify how many counterexamples we need to disprove it: one. If we find just one black swan, then it cannot be true that all swans are white, regardless of how many swans we have looked at and how many swans there are.

But with a hypothesis stated in terms of proportions, matters are different: even if the majority or even all of the cases in our data contradict it, this does not preclude the possibility that our hypothesis is true – our data will always just constitute a sample, and there is no telling whether this sample corresponds to the totality of cases from which it was drawn. Even if most or all of the swans we observe are black, this may simply be an unfortunate accident – in the total population of swans, the majority could still be white. (By the same reasoning, of course, a hypothesis is not verified if our sample consists exclusively of cases that corroborate it, since this does not preclude the possibility that in the total population, counterexamples are the majority.)

So if relative statements cannot be falsified, and if (like universal statements) they cannot be verified, what can we do? There are various answers to this question, all based in probability theory (i.e., statistics). The most widely used and broadly accepted of these, and the one we adopt in this book, is an approach sometimes referred to as “Null Hypothesis Significance Testing”.1

In this approach, which I will refer to simply as statistical hypothesis testing, the problem of the non-falsifiability of quantitative hypotheses is solved in an indirect but rather elegant way. Note that with respect to any two variables, there are two broad possibilities concerning their distribution in a population: the distribution could be random (meaning that there is no relationship between the values of the two variables), or it could be non-random (meaning that one value of one variable is more likely to occur with a particular value of the other variable). For example, it could be the case that swans are randomly black or white, or it could be the case that they are more likely to have one of these colors. If the latter is true, there are, again, two broad possibilities: the data could agree with our hypothesis, or they could disagree with it. For example, it could be the case that there are more white swans than black swans (corroborating our hypothesis), or that there are more black swans than white swans (falsifying our hypothesis).

1It should be mentioned that there is a small but vocal group of critics who have pointed out a range of real and apparent problems with Null Hypothesis Significance Testing. In my view, there are three reasons that justify ignoring their criticism in a textbook like this. First, they have not managed to convince a significant (pun intended) number of practitioners in any field using statistics, which may not constitute a theoretical argument against the criticism, but certainly a practical one. Second, most, if not all, of the criticisms pertain to the way in which Null Hypothesis Significance Testing is used and to the way in which the results are (mis-)interpreted in the view of the critics. Along with many other practitioners, and even some of the critics, I believe that the best response to this is to make sure we apply the method appropriately and interpret the results carefully, rather than to give up a near-universally used and fruitful set of procedures. Third, it is not clear to me that the alternatives suggested by the critics are, on the whole, less problematic or less prone to abuse and misinterpretation.

Unless we have a very specific prediction as to exactly what proportion of our data should consist of counterexamples, we cannot draw any conclusions from a sample. For most research hypotheses, we cannot specify such an exact proportion – if our hypothesis is that Most swans are white, then “most” could mean anything from 50.01 percent to 99.99 percent. But as we will see in the next subsection, we can always specify the exact proportion of counterexamples that we would expect to find if there were a random relationship between our variables, and we can then use a sample to determine whether such a random relationship holds (or rather, how probable it is to hold).
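To make the last point concrete, here is a minimal sketch (in Python, not taken from this book) of how such a chance expectation can be checked against a sample. It assumes, purely for illustration, that under the null hypothesis white and non-white swans are equally probable (a proportion of 0.5), and it uses invented counts; the binomial test then tells us how probable a sample at least this extreme would be if that chance proportion held.

    # Hypothetical swan sample; all counts are invented for illustration.
    from scipy.stats import binomtest

    n_white = 64     # white swans observed in the sample
    n_total = 100    # total swans in the sample
    p_h0 = 0.5       # proportion of white swans expected under the null hypothesis

    result = binomtest(n_white, n_total, p_h0, alternative="greater")
    print(f"observed proportion: {n_white / n_total:.2f}")
    print(f"p-value under H0:    {result.pvalue:.4f}")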

Statistical hypothesis testing utilizes this fact by formulating not one, but two hypotheses – first, a research hypothesis postulating a relationship between two variables (like “Most swans are white” or like the hypotheses introduced in Chapter 5), also referred to as H1 or alternative hypothesis; second, the hypothesis that there is a random relationship between the variables mentioned in the research hypothesis, also referred to as H0 or null hypothesis. We then attempt to falsify the null hypothesis and to show that the data conform to the alternative hypothesis.

In a first step, this involves turning the null hypothesis and the alternative hypothesis into quantitative predictions concerning the intersections of the variables, as schematically shown in (1a, b):

(1) a. Null hypothesis (H0): There is no relationship between Variable A and Variable B.

Prediction: The data should be distributed randomly across the intersections of A and B; i.e., the frequencies/medians/means of the intersections should not differ from those expected by chance.

b. Alternative hypothesis (H1): There is a relationship between Variable A and Variable B such that some value(s) of A tend to co-occur with some value(s) of B.

Prediction: The data should be distributed non-randomly across the intersections of A and B; i.e., the frequencies/medians/means of some of the intersections should be higher and/or lower than those expected by chance (a small computational sketch of such chance expectations follows below).
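To illustrate what “expected by chance” means for the intersections of two nominal variables, the following Python sketch derives the chance expectations from the marginal totals of a contingency table; the table, the variables A and B and their values are invented placeholders, not data discussed in this book.

    # Invented 2x2 contingency table for two nominal variables A and B.
    from scipy.stats import chi2_contingency

    observed = [[35, 15],   # A = a1: counts for B = b1 and B = b2
                [20, 30]]   # A = a2: counts for B = b1 and B = b2

    chi2, p, dof, expected = chi2_contingency(observed)
    print("Frequencies expected by chance under H0:")
    print(expected)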

Once we have formulated our research hypothesis and the corresponding null hypothesis in this way (and once we have operationalized the constructs used in formulating them), we collect, annotate and quantify the relevant data, as discussed in the preceding chapter.

The crucial step in terms of statistical significance testing then consists in determining whether the observed distribution differs from the distribution we would expect if the null hypothesis were true – if the values of our variables were distributed randomly in the data. Of course, it is not enough to observe a difference – a certain amount of variation is to be expected even if there is no relationship between our variables. As will be discussed in detail in the next section, we must determine whether the difference is large enough to assume that it does not fall within the range of variation that could occur randomly. If we are satisfied that this is the case, we can (provisionally) reject the null hypothesis. If not, we must (provisionally) reject our research hypothesis.
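Continuing the invented contingency table from the sketch above, the decision described in this step might look as follows; note that treating p < 0.05 as “large enough” is a widely used convention assumed here for illustration, not something dictated by the method itself.

    from scipy.stats import chi2_contingency

    observed = [[35, 15],
                [20, 30]]
    chi2, p, dof, expected = chi2_contingency(observed)

    # A conventional (assumed) significance threshold of 0.05.
    if p < 0.05:
        print(f"p = {p:.4f}: difference unlikely under H0 -> reject H0 (provisionally)")
    else:
        print(f"p = {p:.4f}: difference within random variation -> retain H0, reject H1 (provisionally)")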

In a third step (or in parallel with the second step), we must determine whether the data conform to our research hypothesis, or, more precisely, whether they differ from the prediction of H0 in the direction predicted by H1. If they do (for example, if there are more white swans than black swans), we can (provisionally) accept our research hypothesis, i.e., we can continue to use it as a working hypothesis in the same way that we would continue to use an absolute hypothesis as long as we do not find a counterexample. If the data differ from the prediction of H0 in the opposite direction to that predicted by our research hypothesis – for example, if there are more black than white swans – we must, of course, also reject our research hypothesis, and treat the unexpected result as a new problem to be investigated further.
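The directionality check described here is straightforward once the null hypothesis has been rejected; the following sketch reuses the invented swan counts from the earlier example and is, again, only an illustration of the logic, not a procedure taken from this book.

    n_white, n_total = 64, 100       # invented sample from the earlier sketch
    expected_white = 0.5 * n_total   # chance expectation under H0

    if n_white > expected_white:
        print("Deviation in the direction predicted by H1 -> provisionally accept H1")
    else:
        print("Deviation in the opposite direction -> reject H1 and investigate further")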

Let us now turn to a more detailed discussion of probabilities, random variation and how statistics can be used to (potentially) reject null hypotheses.
