Case II - Effect Does Exist
3.6 Multivariate Statistics
3.6.6 Bayesian Model Analysis
of soft drugs are almost equally strong but positive. This supports the view that possession (and usage) of marijuana, for example, is less affected by anti-drug laws, while people more readily react when, for example, crack is involved. Obviously, this might be partially explained by the more severe penalty for possessing hard drugs and by the larger public acceptance of soft drugs.
It is somewhat surprising that almost all included variables describing the method of analysis bear a negative sign. The correlation dummy is a rare exception with inconsistent results (positive sign by the sw-algorithm, negative sign by the stepAIC-algorithm). Although Cox regressions are only rarely used, the associated dummy stands out because it is chosen by all four regressions.
The opposite effect is found for 2SLS and GLS models, which are significantly associated with smaller (i.e., more negative) estimates. Furthermore, methods which do not consider simultaneity seem to overestimate deterrent effects.
Another noteworthy observation is that results from studies using Canadian data are less in favor of deterrence. The same applies when the nation under study does not belong to the most frequent nations. Results which were entered into the database by the user tr appear to be significantly more negative. This is probably explained by the fact that he entered all economic studies while all other users entered the sociological and criminological studies; tr also worked at a different location, while all other users worked in the same department. The possibility of any intentional bias can be excluded. Positively signed offenses included in the regressions are drunk driving, environmental offenses, fraud, tax evasion, negligent assault, burglary, and vehicle theft. Negatively signed are severe larceny, assault, and drug-related crimes. Results are more in favor of the deterrence hypothesis if deterrence is the focus of the study. The high significance of the realized sample sizes is a bit odd. Unlike the number of observations, these variables are not differentiated by the sign of the results. Since the latter are included in every regression, the relationship explained in section 3.4 should already be taken care of. Again, the stepwise regression methods also include some variables with only very little variation. These seem to capture some oddities in the data which cannot be explained sufficiently by more general variables.
Bayesian Model Selection (BMS)
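At first, for each model $\Delta$, its probability over all possible models $M_k$, given the data $D$ ($\delta_k$ is the vector of model parameters of model $M_k$), is calculated:
\[
P(\Delta \mid D) = \sum_{k=1}^{K} P(\Delta \mid M_k, D)\, P(M_k \mid D),
\]
with
\[
P(M_k \mid D) = \frac{P(D \mid M_k)\, P(M_k)}{\sum_{l=1}^{K} P(D \mid M_l)\, P(M_l)}
\quad\text{and}\quad
P(D \mid M_k) = \int P(D \mid \delta_k, M_k)\, P(\delta_k \mid M_k)\, d\delta_k .
\]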
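The normalization step in the posterior model probability P(M_k|D) can be illustrated with a minimal sketch. It assumes that the log marginal likelihoods log P(D|M_k) are already available and that the model priors are uniform unless given; the function name and the numbers are hypothetical and not taken from the analysis.

```python
import math

def posterior_model_probs(log_marginal_likelihoods, priors=None):
    """Return P(M_k | D) from log P(D | M_k) and model priors P(M_k)."""
    k = len(log_marginal_likelihoods)
    if priors is None:
        priors = [1.0 / k] * k  # assumption: uniform prior over the models
    log_post = [ll + math.log(p)
                for ll, p in zip(log_marginal_likelihoods, priors)]
    top = max(log_post)  # subtract the maximum to avoid numerical underflow
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)  # denominator: sum over all K models
    return [w / total for w in weights]

# Hypothetical log marginal likelihoods of three candidate models.
probs = posterior_model_probs([-1200.0, -1201.0, -1210.0])

# Model selection picks the model with the highest posterior probability.
best = max(range(len(probs)), key=probs.__getitem__)
print(probs, best)
```

Working on the log scale matters in practice because marginal likelihoods of models fitted to many observations are far too small to be represented directly.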
Then, the models with the highest posterior probabilities are chosen. However, the implementation of the estimator poses several difficulties (see also Koop and Potter (2003) or Chipman et al. (2001)). The quality of the results may hinge on the selection of the hyper-parameters which are necessary for the calculations (Chipman et al., 2001). They can be chosen manually, calculated from the data or simply set to trivial values. Since we do not possess any usable information about the priors, we chose to use uninformative priors. As usual, there are several other problems to cope with: the huge model space involves all $2^N$ models. Therefore, optimization algorithms and Monte Carlo methods should be employed in our case. We resort to the BMA package in R. Since the algorithm can only cope with 200 variables simultaneously and has no Monte Carlo features, we used a weakened version of test 4 from subsection 3.6.4 to preselect a reduced set of variables. The level of significance was chosen in such a way that 199 variables are included (plus the constant). Due to limited computational resources and algorithm restrictions, we limited the number of variables which are simultaneously included in each model to a maximum of 50. Each BMA regression then took about four days on a 4.2GHz Athlon X2 using only one core.
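To give a sense of the size of the model space, the following sketch counts the candidate models for the 199 preselected variables, once without restriction and once with at most 50 variables per model. It assumes that the constant is always included and therefore does not add to the count; the variable names are illustrative.

```python
import math

n_candidates = 199  # preselected variables (constant assumed always included)
max_size = 50       # maximum number of variables per model

# Every variable is either in or out of a model: 2^N subsets.
full_space = 2 ** n_candidates

# Models with at most `max_size` variables: sum of binomial coefficients.
restricted = sum(math.comb(n_candidates, j) for j in range(max_size + 1))

print(f"all subsets:           {full_space:.3e}")
print(f"at most {max_size} variables: {restricted:.3e}")
print(f"share of full space:   {restricted / full_space:.3e}")
```

Even the restricted space is far too large to enumerate, which is why a search heuristic or a sampling scheme is needed rather than an exhaustive comparison.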
Basically, we will use BMS in a comparison of the methods in section 4.2 and simply select those variables with a posterior probability larger than 0.9 (which are essentially all variables used in the final BMA model).
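The selection rule can be sketched as follows, under the assumption that the posterior probability of a variable is the sum of the posterior probabilities of all models containing it. The model set, the variable names and the probabilities are hypothetical.

```python
def inclusion_probabilities(models, probs):
    """Sum P(M_k | D) over all models that contain each variable."""
    pip = {}
    for variables, p in zip(models, probs):
        for v in variables:
            pip[v] = pip.get(v, 0.0) + p
    return pip

# Hypothetical model set: each model is the set of variables it includes.
models = [
    {"arrest_rate", "police_strength"},
    {"arrest_rate"},
    {"arrest_rate", "urbanity"},
]
probs = [0.80, 0.15, 0.05]  # hypothetical posterior model probabilities

pip = inclusion_probabilities(models, probs)

# Keep only variables whose posterior probability exceeds the 0.9 threshold.
selected = sorted(v for v, p in pip.items() if p > 0.9)
print(selected)
```

In this toy example only the variable that appears in every model passes the threshold; a variable contained in the single best model alone does not.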
Bayesian Model Averaging (BMA)
BMA is essentially the same as BMS, except that it uses the information of all considered models. Instead of the coefficient of the most probable model, BMA calculates the average of each coefficient, weighted by the posterior model probabilities. Fernández et al. (2001b) compare BMA with EBA (as implemented by Sala-i-Martin (1997)) and find that BMA achieves better results. BMA has also been applied to crime data by Raftery et al. (1997), Fernández et al. (2001a), Liang et al. (2001) and Nott and Green (2004).
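A minimal sketch of the averaging step for a single coefficient is given below. It assumes the standard mixture formulas for the posterior mean and standard deviation, with the coefficient set to zero in models that omit the variable; the per-model estimates and probabilities are hypothetical.

```python
import math

def bma_coefficient(estimates, probs):
    """Average one coefficient over all considered models.

    estimates: one (coef, se) tuple per model, or None if the model
               omits the variable (the coefficient is then zero).
    probs:     posterior model probabilities P(M_k | D), summing to one.
    """
    mean = 0.0
    second_moment = 0.0
    for est, p in zip(estimates, probs):
        coef, se = est if est is not None else (0.0, 0.0)
        mean += p * coef
        second_moment += p * (se ** 2 + coef ** 2)
    # Mixture variance: within-model variance plus between-model spread.
    sd = math.sqrt(max(second_moment - mean ** 2, 0.0))
    return mean, sd

# Hypothetical estimates of one coefficient in three models.
estimates = [(-0.70, 0.10), (-0.60, 0.12), None]
probs = [0.80, 0.15, 0.05]

mean, sd = bma_coefficient(estimates, probs)
print(round(mean, 4), round(sd, 4))
```

The averaged standard deviation is larger than any single within-model standard error here because it also reflects the disagreement between the models, including the one that excludes the variable.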
We note here that there are multiple other possibilities to choose the model weights for calculating the coefficients. We chose BMA because it is used in the deterrence literature, is acknowledged by many researchers and (certainly an important argument) is implemented in an available statistical software package. Hansen (2007) compares the performance of several averaging methods (based on the AIC, the BIC, the Mallows criterion and MMA, the Mallows Model Average estimator). Performing a simulation, he concludes that the MMA estimator has the lowest risk (expected squared error) and that the weights based on the Bayesian coefficients perform better only when the number of observations and $R^2$ are low. In contrast to the other weights, the risk of the BIC-based estimator is not decreasing in the number of observations. Overall, the MMA estimator is found to perform best, but it could not be implemented in this analysis because the article was published too late. Nevertheless, the BMA results are given in table 3.53.
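How the choice of criterion changes the model weights can be illustrated with a small sketch. It assumes weights proportional to exp(-IC/2), a common way to turn AIC or BIC values into averaging weights; the log-likelihoods and parameter counts are hypothetical. MMA weights are not shown, since they require minimizing the Mallows criterion over the weight simplex.

```python
import math

def ic_weights(log_likelihoods, n_params, n_obs, criterion):
    """Model weights proportional to exp(-IC / 2) for AIC or BIC."""
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")
    penalty = 2.0 if criterion == "aic" else math.log(n_obs)
    ic = [-2.0 * ll + penalty * k
          for ll, k in zip(log_likelihoods, n_params)]
    best = min(ic)  # shift by the minimum for numerical stability
    raw = [math.exp(-0.5 * (v - best)) for v in ic]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical fits: larger models reach a slightly higher log-likelihood.
log_likelihoods = [-500.0, -497.0, -496.0]
n_params = [2, 4, 6]
n_obs = 1000

aic_w = ic_weights(log_likelihoods, n_params, n_obs, "aic")
bic_w = ic_weights(log_likelihoods, n_params, n_obs, "bic")
print([round(w, 3) for w in aic_w])
print([round(w, 3) for w in bic_w])
```

With these numbers the AIC-based weights favor the medium-sized model, while the stronger BIC penalty shifts almost all weight to the smallest one.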
Table 3.53: Multivariate analysis - Bayesian model averaging
Variable var p≠0 coef. sd
Study: size of first realized sample 58.8 100.0 −0.0001 0.000
Study: size of second realized sample 2.4 100.0 −0.0004 0.000
Study: tests of significance 9.4 100.0 −0.8635 0.119
Study: user, mw 1.4 100.0 0.8113 0.166
Study: author, Steven D. Levitt 1.7 100.0 1.2320 0.266
Study: author, Daniel S. Nagin 1.3 100.0 1.3570 0.281
Study: author, Isaac Ehrlich 1.0 100.0 −1.9980 0.327
Study: publication, criminology 20.5 100.0 0.4058 0.088
Study: publication, psychology 2.8 93.5 −0.7055 0.271
Study: institute, economics 41.8 100.0 −0.4757 0.088
Study: institute, miscellaneous 14.9 100.0 0.4654 0.101
Study: first population, Canada 4.3 100.0 0.9006 0.160
Study: first population, other country 6.2 100.0 0.59450 0.134
Study: sample unit, first population, states 21.9 100.0 0.5283 0.083
Study: sample individuals, first population, pupils 3.1 100.0 0.7591 0.187
Study: sample individuals, second population, population 2.6 100.0 0.8753 0.200
Study: complete sample 9.8 100.0 −0.8037 0.110
Study: main location>500000 inhabitants 3.5 100.0 −1.6430 0.176
Study: does not check representativeness 26.9 100.0 0.4378 0.074
Study: mixed questions for pretest 2.1 100.0 1.1620 0.228
Estimate: deterrence is covariate 14.8 100.0 0.5168 0.094
Estimate: exogenous, index multiplicative 1.9 100.0 0.9222 0.230
Estimate: study type, death penalty 8.2 100.0 0.6196 0.122
Estimate: exogenous, crime data, arrest rate 9.5 100.0 −0.6628 0.115
Estimate: exogenous, crime data, conviction rate 3.9 100.0 −0.7776 0.168
Estimate: exogenous, crime data, police expenditures 4.0 100.0 0.7563 0.171
Estimate: exogenous, crime data, police strength 7.9 100.0 0.8597 0.128
Estimate: exogenous, crime data, probability dummy (regime shift) 3.5 100.0 −0.9804 0.191
Estimate: exogenous, crime data, severity dummy (regime shift) 3.6 100.0 −1.0500 0.183
Estimate: exogenous, survey, severity of punishment by others 0.4 100.0 2.1570 0.490
Estimate: exogenous, experiment, experimental variation of probability of detection 2.1 100.0 −1.9570 0.232
Estimate: exogenous, in differences 3.1 100.0 0.9154 0.207
Estimate: endogenous, recidivism 0.7 94.7 1.3530 0.491
Estimate: endogenous, accidents 4.2 100.0 0.8311 0.181
Estimate: offense, drug possession (soft) 0.7 100.0 1.6650 0.408
Estimate: offense, drug possession (hard) 0.5 100.0 −2.5740 0.530
Estimate: offense, environmental crimes, violations of prescriptive limits 2.3 92.4 0.7232 0.292
Estimate: endogenous, binary category 9.6 100.0 −0.4611 0.116
Estimate: endogenous, not in logs 32.0 100.0 0.4240 0.077
Estimate: endogenous, other transformation 7.8 100.0 −0.9145 0.136
Estimate: covariate, fixed effects (spatial) 10.0 100.0 −0.66730 0.116
Estimate: covariate, poverty, welfare 6.4 100.0 0.78340 0.137
Estimate: covariate, urbanity 8.2 100.0 0.53860 0.123
Estimate: covariate, population (-growth) 11.5 100.0 0.43300 0.106
Estimate: no correction for simultaneity 19.3 100.0 −0.49180 0.088
Estimate: bivariate method, t-test for independent samples 0.5 100.0 −2.31400 0.427
Estimate: no test of significance 5.5 100.0 −0.65720 0.166
Estimate: square root of sample size for negative values 79.2 100.0 −0.01422 0.001
Estimate: square root of sample size for positive values 82.5 100.0 0.05388 0.002
constant 100.0 −0.96310 0.165
Bayesian model averaging with a maximum of 50 variables per regression. The algorithm supports only 200 variables; therefore, 199 variables were preselected by EBA (weakened version of test D). The column var refers to the variation of a variable (i.e., the percentage of valid observations); the maximum variation for dummy variables is fifty percent. Properties of the best model: $R^2$: 0.348, BIC: −2364; posterior probability: 0.805. The reference category for dummies is usually the opposite property or, in the case of multiple categories, the missing values.
The results are in line with the previous results. Among the authors, Levitt, Nagin and Ehrlich are considered important enough to be included, but only the last has a negative impact (i.e., finds more deterrent effects). Studies published in a criminological journal have a positive effect, in contrast to those published in psychological journals. When a Canadian population (or “other” country) is studied, the results are also more positive, while they are more negative when large cities are studied. When the death penalty is analyzed or the deterrence variable is just a covariate, the findings are less in favor of the deterrence hypothesis.
Among the deterrence variables, the arrest and conviction rates, as well as regime shifts and experimental variation of the detection probability, are considered to be very important and have a negative effect. Police measures used as deterrence variables keep their positive sign, while not correcting for simultaneity (which has a negative effect) is also included. Drug possession, distinguished by soft and hard drugs, is also signed as expected.
The covariates considered to be most important are poverty, urbanity, population and the usage of spatial fixed effects, of which only the last has a negative influence. Studies which report tests of significance are associated with lower normalized t-values (more in favor of the deterrence hypothesis), while this is somewhat put into perspective on the estimate level (having the opposite effect). As expected, the relationship between the square root of the number of observations and the t-values is considered to be of great importance.