
[Figure: fraction of all (normalized) t-values]

3.4 Publication Bias

The econometric art as it is practiced at the computer terminal involves fitting many, perhaps thousands, of statistical models.

One or several that the researcher finds pleasing are selected for reporting purposes. This searching for a model is often well intentioned, but there can be no doubt that such a specification search invalidates the traditional theories of inference. [. . . ] all the concepts of traditional theory, utterly lose their meaning by the time an applied researcher pulls from the bramble of computer output the one thorn of a model he likes best, the one he chooses to portray as a rose.

Leamer (1983)

When data are analyzed and a study is written and published, the reported results may be biased for various reasons:

• The researcher initiative bias. This bias is introduced if the researcher intentionally or negligently forces his results in one direction. This may happen by inappropriately cleaning the data, using misspecified models or choosing inappropriate methods for evaluation. The “true” distribution of estimates the data yields then differs systematically from the distribution of the published estimates (Glaeser, 2006).

• The publication bias. This bias is introduced by publishing only those results that seem to support some hypothesis and holding back those that do not (or vice versa). The distribution of the published estimates then differs systematically from the distribution of the calculated estimates (see the simulation sketch after this list).

• Any other unintentional or unavoidable bias. Even if the author has done everything possible, his estimates may still differ systematically from the “true” estimates. This may be caused by the data itself, lack of knowledge about better methods, missing important variables and other reasons.
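To make the second category concrete, the following minimal simulation is a sketch under assumptions of our own: a true effect of zero, a common standard error across studies, and a one-sided 5% significance filter (none of these numbers are taken from the literature discussed here). It illustrates how a significance filter alone makes the distribution of published estimates differ systematically from the distribution of calculated estimates.

```python
import numpy as np

rng = np.random.default_rng(42)
n_studies, true_effect, se = 10_000, 0.0, 0.1

# Each hypothetical study reports one estimate drawn around the true effect.
estimates = rng.normal(true_effect, se, n_studies)
t_values = estimates / se

# One-sided publication filter: only estimates that "support" a negative
# (deterrence-style) effect at the 5% level are published.
published = estimates[t_values < -1.645]

print(f"mean of all estimates:        {estimates.mean():+.4f}")
print(f"mean of published estimates:  {published.mean():+.4f}")
print(f"share of estimates published: {published.size / n_studies:.1%}")
```

Although the true effect is zero, the published mean is strongly negative; a symmetric two-sided filter would instead leave the mean near zero but inflate the dispersion of the reported estimates.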

It is not surprising that different results can be drawn from the same data source when enough free parameters are available. As Dijkstra (1995) puts it: “by simply adding one regressor one can obtain essentially every set of desired regression coefficients and predictions as well as t-values and standard errors.”
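The following toy example is not Dijkstra’s general construction; it is only a familiar omitted-variable sketch with simulated data and variable names of our own, illustrating how a single added regressor can change a reported t-value completely.

```python
import numpy as np

def ols_t(y, X):
    """Return OLS coefficients and their t-values (X must include a constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
x1 = 0.8 * z + rng.normal(scale=0.6, size=n)   # x1 is correlated with z
y = z + rng.normal(size=n)                     # y actually depends on z only
const = np.ones(n)

# Short regression: x1 proxies for the omitted z and looks highly significant.
_, t_short = ols_t(y, np.column_stack([const, x1]))
# Long regression: adding the single regressor z changes x1's t-value completely.
_, t_long = ols_t(y, np.column_stack([const, x1, z]))

print(f"t-value of x1 without z: {t_short[1]:+.2f}")
print(f"t-value of x1 with z:    {t_long[1]:+.2f}")
```

In the short regression x1 appears highly significant; once z is added, its t-value collapses towards zero.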

Detecting a bias of the third category is of theoretical relevance only and infeasible to measure in practice. Therefore, we will deal with the first and second category only. These are indistinguishable to us, since we only have access to the published studies, and we subsume them in the following as publication bias. As McManus (1985) remarks, it is only natural and understandable that a researcher picks those specifications which he thinks are the best, “those that make the strongest case for the researcher’s prior hypothesis”. Hopefully, publication bias arises out of the good intention to report the results he thinks best, and not out of an attempt to mislead the public by reporting the results he likes best. Donohue and Wolfers (2005) emphasize this by remarking that such a bias “may occur without any of the authors being aware of it: they might simply want to report useful findings, and evidence falsifying a null hypothesis is typically regarded as more valuable”.

Publication bias as a problem has long been recognized by medical researchers and social scientists (Sterling, 1959; Rosenthal, 1979) and has also become a popular topic in empirical economics (de Long and Lang, 1992). Traditionally, it is assumed that researchers, reviewers, editors and even readers are more willing to accept positive results, i.e., the significant rejection of the null hypothesis of no effect (Stanley, 2005a). The most important reasons to expect this are that reviewers and editors might more readily accept results which are consistent with conventional views; that only models which show the expected characteristics are selected; that researchers might want to obtain results consistent with their theory; that everyone is more confident with significant results than with inconclusive statistics; that inadequate techniques might lead to (in)significant results; or that suspicious data (outliers) may be wrongly excluded from a study.

Furthermore, the published results may tend to be more significant than they ought to be, since insignificant results - or even whole insignificant studies - may remain in the ’file-drawer’34. It has to be remembered that such omitted results are merely insignificant - they do not (as assumed by Rosenthal) have an effect size or significance of zero (Scargle, 2000).
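For reference, Rosenthal’s fail-safe N makes this zero-effect assumption explicit: it asks how many file-drawer studies with an average Z of zero would be needed to overturn a Stouffer combined significance test. A minimal sketch with made-up Z-scores follows (the function name and the input numbers are ours):

```python
import numpy as np
from scipy.stats import norm

def failsafe_n(z_values, alpha=0.05):
    """Rosenthal's fail-safe N: how many unpublished studies with an average
    Z of zero would be needed to render the Stouffer combined test
    insignificant at the given (one-tailed) level."""
    z_sum = np.sum(z_values)
    z_crit = norm.isf(alpha)        # one-tailed critical value, about 1.645
    return (z_sum / z_crit) ** 2 - len(z_values)

# Hypothetical Z-scores of five published studies (illustrative numbers only).
z_published = [2.1, 1.7, 2.5, 1.9, 2.3]
print(f"fail-safe N: {failsafe_n(z_published):.1f}")
```

The criticism cited above applies precisely here: the studies actually missing are the insignificant ones, not studies with Z exactly zero, so the average-zero assumption built into this formula is questionable.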

Usually, the publication process only refers to refereed journals and, as a consequence, attempts are made to minimize the publication bias either by requiring prior registration of studies (as is done by leading medical journals; Krakovsky, 2004) or by including working papers and drafts in a meta-analysis (e.g., Florax et al. (2002); Nijkamp and Poot (2005)). We think that neither method is sufficient for typical economic problems35, since the main bias may be introduced during the conception and calculation of the estimations:

[. . . ] there is much uncertainty as to the “correct” empirical model that should be used to draw inferences, and each researcher typically tries dozens, perhaps hundreds, of specifications before selecting one or a few to report.

McManus (1985)

Advances in technology have decreased the costs of running tests and alternative specifications. The consequence is that “the ability of researchers to influence results must be increasing over time” (Glaeser, 2006), and newer results should be faced with more scepticism. Techniques concerning meta-analysis and publication bias which are used in the medical field are not applicable here, because they are too specialized (medical tests are almost always conceptualized as controlled experiments).

34 In fact, publication bias was initially called the ’file-drawer’ problem by Rosenthal (1978, 1979), was modified by Rosenberg (2005), and still refers to the intentional omission of results (Scargle, 2000).

35 This is especially the case in the field of empirical economics, where the researcher is free to choose his models and estimation techniques. Furthermore, the “published” working paper version is usually already very close to the final version.


In theory, omitting insignificant results should be less of a problem in criminometrics, as pointed out by Eide et al. (1994), since evidence supporting or rejecting the deterrence hypothesis should be of equivalent importance - as is the case in some economic theories like the natural rate hypothesis (Stanley, 2005b). Nonetheless, authors preferring the deterrence hypothesis might omit insignificant or positively signed results, while authors who do not like the deterrence hypothesis might do the contrary. Insignificant results may be disliked by both types of authors: although insignificance already implies the absence of an effect, many people who oppose the deterrence hypothesis seem to “prefer” positive and significant results to discard any deterrent effect. Therefore, no reasonable prior assumption about the properties of the potentially omitted studies can be made, and methods based on the file-drawer approach should not be applied here.

Stanley (2005a) advises against this approach anyway.

There is also a self-cleaning effect in every field of research: competition. A published study may be an incentive for other researchers to refute its results - even if it would not have been of any interest on its own (Glaeser, 2006). In the deterrence literature, there certainly was and is strong scientific competition going on. As a consequence, increased scepticism towards sceptical results is appropriate.

3.4.1 Methods to Detect Publication Bias

People must not attempt to impose their own truth on others.

The right to profess the truth must always be upheld, but not in a way that involves contempt for those who may think differently.

Truth imposes itself solely by the force of its own truth.

Karol Wojtyla (Pope John Paul II), 1991

Some authors try to circumvent publication bias by interpreting “missing” studies as “missing values” and augmenting them (Smith et al., 1997), but this seems infeasible in our case. Most standard graphical methods, as summarized in Stanley (2005a), are also not applicable here, since they are based on the interpretation of graphs which are not compatible with our weighting scheme.

However, we present some other graphs in subsection 3.4.2.

The principle of the main method we employ in subsection 3.4.3 is rather simple. If an effect exists, the significance value should increase as the sample size increases (and the standard deviation decreases), whereas it should be independent of the sample size if no effect exists. Leamer (1983) already pointed out that any null hypothesis can be rejected - whether reasonable or not - if the sample size is large enough. Usually, we do not expect this relationship to hold perfectly, but it should at least be positive and significant when an effect exists (Stanley, 2005a). This test is used in many studies (see, for example, Stanley (2005a) and Waldorf and Byun (2005)).
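A minimal sketch of this sample-size/significance test on simulated meta-data follows; the data-generating numbers and the log-log specification are our own assumptions (Stanley (2005a) discusses several variants of this regression).

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
n_studies = 200

# Simulated meta-data: one t-value per study; under a genuine effect,
# E[t] grows roughly with the square root of the sample size.
n_obs = rng.integers(50, 5000, size=n_studies)   # sample sizes of the studies
true_effect = 0.3
t_vals = true_effect * np.sqrt(n_obs) + rng.normal(size=n_studies)

# Regress log absolute t-values on log sample sizes: a positive, significant
# slope indicates a genuine effect; under no effect the slope should be
# about zero and insignificant.
res = linregress(np.log(n_obs), np.log(np.abs(t_vals)))
print(f"slope = {res.slope:.3f}, p-value = {res.pvalue:.2g}")
```

Under the simulated genuine effect the slope is positive and significant; rerunning the sketch with true_effect = 0 should yield a slope near zero.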