
3.2 The Data

3.2.2 The (Normalized) t-Value

Everything which is merely probable is probably wrong.

René Descartes

We chose to extend the usual procedure, which relies solely on the reported, original t-values, by also using all given significance values (transforming them into t-values) and then normalizing them.

12 Estimates with the property “favored by the author” were always recorded, and the data base notes whether they are also randomly chosen. Favored estimates which are not randomly chosen may be analyzed in the future and are neglected in this study. For example, Rose (2004) analyzes the results favored by the author of each study.

Normalization removes any systematic differences caused by the various t-distributions (which depend on the degrees of freedom and the implemented estimator), so that weighting them by the degrees of freedom (or just the sample size, as in Knell and Stix (2005)) is not necessary. The procedure follows these rules:

1. If a t-value is reported, we take it as it is.

2. If a coefficient and its standard deviation are reported, we calculate the corresponding t-value, regardless of the estimator used.13

3. If only the significance of an F- or χ²-test is given,

a) the value is transformed into a t-value, if the degrees of freedom are given;

b) if the degrees of freedom are not given, they are approximated by the number of observations and covariates;

c) if the number of observations and covariates is not given, they are approximated by the median numbers of 232 and 15, respectively.

4. If only the category of the p-value (not significant, 0.1, 0.05, 0.01 and 0.001) is given, the corresponding t-value is approximated by the following rules:

a) a uniformly distributed number between the upper and lower limit of the category is chosen, representing the “exact” p-value14;

b) the corresponding t-value of this p-value is calculated, assuming two-sided tests15, according to the following rules:

i. using the degrees of freedom, if reported;

ii. if the degrees of freedom are not reported, they are approximated by the number of observations and covariates;

iii. if these are not reported either, they are approximated by the median numbers of 232 and 15, respectively.

5. If an estimate supports the deterrence hypothesis, its t-value is given a negative sign; otherwise, a positive sign.

6. All t-values are normalized:

a) a t-value is transformed into the corresponding p-value using the reported or approximated degrees of freedom and the t-distribution;

13 We acknowledge that some of these values are only asymptotically t-distributed, or not t-distributed at all. However, we prefer this inaccuracy, which should be quite small, to losing such estimates.

14 However, this implies that even the largest t-values (in absolute value) calculated in this fashion are much smaller than the largest t-values reported by several studies; see table 3.5.

15 If the study reports one-sided tests, this is considered in the data base accordingly.

b) this p-value is transformed by the inverse normal distribution to a value of significance which is independent of the number of degrees of freedom.
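Steps 6a and 6b amount to mapping each t-value through the CDF of the t-distribution and back through the normal quantile function. The following sketch (our illustration, not the study's actual code; it assumes SciPy is available) shows the idea:

```python
from scipy.stats import norm, t as t_dist

def normalize_t(t_value, df):
    """Normalize a t-value so it no longer depends on the degrees of freedom.

    Step 6a: convert the t-value into its lower-tail p-value under the
    t-distribution with df degrees of freedom.
    Step 6b: map that p-value through the inverse normal distribution,
    yielding a standard-normal quantile; the sign is preserved.
    """
    p = t_dist.cdf(t_value, df)  # step 6a
    return norm.ppf(p)           # step 6b

# With many degrees of freedom the t-distribution is nearly normal, so the
# normalized value stays close to the original; with few degrees of freedom
# the fatter tails make the normalized value less extreme.
print(normalize_t(-2.0, 100000))
print(normalize_t(-2.0, 10))
```

Note that the normal quantile function underflows for very small p-values, so extremely large |t| cannot be normalized this way — the precision problem described below for t-values beyond −38.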

Due to limited precision, we were not able to normalize t-values16 below −38. This affects 11 t-values, eight of which are favored by the author but not randomly chosen; so, practically, only three t-values are affected. Some authors remove outliers in their meta-analyses (e.g., Murphy et al. (2003) or Knell and Stix (2005)), but we want to keep them, since we have no prior knowledge (except the sample size) of what could cause such outliers. However, since the relative difference between these t-values and their normalized counterparts can be quite large, we did not want to include them unadjusted. Instead, we chose to transform these few outliers by the following formula, which conserves most of their relationship:

t_new := log(|t_old|) / log(|t_min|) · t_min,

where t_min is the smallest normalized t-value. This reduces the influence of these extreme values and retains the relationship between them, at least on a logarithmic scale. For the three values (excluding the eight favored ones), this means the following transformations:

−582 → −64.81134, −86.517 → −45.40674, and −40.58823 → −37.7018. Some of the effects of this normalization procedure can be seen in tables 3.5 and 3.6.
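As an illustration (ours, not the study's code), the rescaling formula can be checked to preserve the ordering of the outliers while compressing their spread; the value of t_min below is only a hypothetical stand-in for the smallest normalized t-value:

```python
import math

T_MIN = -38.0  # hypothetical stand-in for the smallest normalized t-value

def rescale_outlier(t_old, t_min=T_MIN):
    """Pull an extreme negative t-value towards t_min on a log scale:
    t_new = log(|t_old|) / log(|t_min|) * t_min."""
    return math.log(abs(t_old)) / math.log(abs(t_min)) * t_min

# The three affected outliers keep their order but move much closer to t_min:
for t_old in (-582.0, -86.517, -40.58823):
    print(t_old, "->", round(rescale_outlier(t_old), 3))
```

Because the rescaling is monotone in |t_old|, the relative ranking of the outliers survives, which is the "relationship at a logarithmic scale" the text refers to.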

Table 3.5: Comparison of the original and transformed (normalized) t-values

t-values      mean    median   min      max     %      #e     #s

Overall      −1.40   −1.37   −64.81   19.05   41.66   6530   663
Original     −1.66   −1.69   −64.81   15.16   44.93   2662   285
Calculated   −2.18   −1.47   −37.70   19.05   42.90    888    98
Transformed  −0.95   −0.89    −4.97    3.96   38.36   2980   328

Overall: all t-values; Original: all t-values reported in a study; Calculated: all t-values which were calculated from a given coefficient and standard deviation; Transformed: p-values transformed into t-values.

%: the percentage of estimates which are consistent with the deterrence hypothesis and significant at the 5% level in a two-sided test. #e: the weighted number of all valid estimates. #s: the number of studies the estimates are based on.

Weighting

As mentioned before, it was necessary to restrict ourselves to one estimate per crime and source per study. This makes it necessary to weight the estimates in our data base in some way. In principle, there are three different approaches, from which we chose the last one:

16 In fact, this depends on the t-value and the degrees of freedom simultaneously. Although this limit is very subjective, defining these t-values as outliers seems very practical.

1. Leave everything unchanged: i.e., use the unweighted estimates. However, studies which present numerous estimates would squeeze out the effects of studies with only a few (Stanley, 2005a). Moreover, “our” studies (i.e., those recorded in Darmstadt) would be underrepresented (see table 3.6).

2. Treat each estimate equally: weight each estimate in such a way that the sum of the weights of each study's estimates equals the total number of results it contains. This would approximate the case in which we record all results and would bias the analysis in favor of those studies with many results.

3. Treat each study equally: weight every estimate by the inverse of the number of estimates in the data base belonging to the corresponding study. If a study recorded by the team in Heidelberg provides n estimates, each is weighted by 1/n. For a study recorded by us, of which m out of n estimates are in our data base, each is weighted by 1/m. Therefore the sum of all weights of each study amounts to one.

Since “our” studies seem to differ significantly from the others, as table 3.6 readily shows, and since the number of results per study varies substantially (from one to several hundred), we decided to use the last weighting scheme.
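The chosen scheme can be sketched as follows (our illustration, with hypothetical study labels): each estimate is weighted by the inverse of the number of its study's estimates in the data base, so each study's weights sum to one:

```python
from collections import Counter

def study_weights(study_ids):
    """Scheme 3: weight each estimate by 1/n, where n is the number of
    estimates of its study that are in the data base."""
    counts = Counter(study_ids)
    return [1.0 / counts[s] for s in study_ids]

# Hypothetical example: study A contributes three estimates, study B one;
# both studies end up with a total weight of one.
weights = study_weights(["A", "A", "A", "B"])
print(weights)
```

This is what makes the weighted columns of table 3.6 comparable across sources: each study counts once, no matter how many of its estimates were recorded.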

Table 3.6: Weighted (normalized) t-values distinguished by the source of data

Source                              obs.     %      mean   median   sd      min     max

Both, unweighted, not normalized    6530   100.00  −1.30   −0.91   7.77   −582     20.93
Both, unweighted                    6530   100.00  −1.15   −0.91   2.70   −64.81   19.05
Darmstadt, unweighted               2320    35.53  −1.57   −1.20   3.61   −64.81   19.05
Heidelberg, unweighted              4210    64.47  −0.92   −0.75   1.99   −17.91   11.72
Darmstadt, weighted                 2320    48.16  −1.76   −1.61   3.66   −64.81   19.05
Heidelberg, weighted                4210    51.84  −1.07   −1.11   2.36   −17.91   11.72
Both, weighted, not normalized      6530   100.00  −1.51   −1.37   4.51   −582     20.93
Both, weighted                      6530   100.00  −1.40   −1.37   3.08   −64.81   19.05

The rows of the unweighted data refer to the first weighting scheme (leave everything unchanged); the rows of the weighted data refer to the third scheme, which weights each study equally. Naturally, the number of observations and the extreme values are not affected by weighting. The % column indicates the fraction of the whole data set belonging to the row (measured either by the number of observations or by the sum of weights).

We acknowledge that other weighting schemes are possible, such as using some impact factor of the publications (van der Sluis et al., 2003), removing heteroscedasticity (Murphy et al., 2003), using the sample size or the time frame (Knell and Stix, 2005), adjusting for significance (Waldorf and Byun, 2005), or others, like the inverse variance of the results, the number of results, the number of regressors, R², etc. (Weichselbaumer and Winter-Ebmer, 2005). However, we refrain from using any of them because we do not want to mix different weighting schemes.