
Case II - Effect Does Exist

3.6 Multivariate Statistics

3.6.6 Bayesian Model Analysis

of soft drugs are almost equally strong but positive. This supports the view that possession (and usage) of marijuana, for example, is less affected by anti-drug laws, while people react more readily when, for example, crack is involved. This might partly be explained by the more severe penalties for possessing hard drugs and by the greater public acceptance of soft drugs.

It is somewhat surprising that almost all included variables describing the method of analysis bear a negative sign. The correlation dummy is a rare exception with inconsistent results (positive sign in the sw algorithm, negative sign in the stepAIC algorithm). Although Cox regressions are only rarely used, the associated dummy stands out because it is chosen by all four regressions.

The opposite effect is found for 2SLS and GLS models, which are significantly associated with smaller (i.e., more negative) estimates. Furthermore, methods that do not account for simultaneity appear to overestimate deterrent effects.

Other noteworthy observations are that results from studies using Canadian data are less in favor of deterrence. The same applies when the nation under study is not among the most frequent nations. Results entered into the database by the user tr appear to be significantly more negative. This is probably explained by the fact that he entered all economic studies, while all other users entered the sociological and criminological studies; tr also worked at a different location, whereas all other users worked in the same department. The possibility of intentional bias can be excluded.

Positively signed offenses included in the regressions are drunk driving, environmental offenses, fraud, tax evasion, negligent assault, burglary and vehicle theft. Negatively signed are severe larceny, assault and drug-related crimes.

Results are more in favor of the deterrence hypothesis if deterrence is the focus of the study. The high significance of the realized sample sizes is somewhat odd. These variables are not split by the sign of the results, as is done for the number of observations. Since the latter are included in every regression, the relationship explained in section 3.4 should already be accounted for. Again, the stepwise regression methods also include some variables with very little variation. These seem to capture some oddities in the data which cannot be sufficiently explained by more general variables.

Bayesian Model Selection (BMS)

First, the posterior distribution of a quantity of interest Δ (e.g., a regression coefficient) is calculated as an average over all K considered models M_k, given the data D (δ_k is the vector of parameters of model M_k):

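$$P(\Delta \mid D) = \sum_{k=1}^{K} P(\Delta \mid M_k, D)\, P(M_k \mid D),$$

with

$$P(M_k \mid D) = \frac{P(D \mid M_k)\, P(M_k)}{\sum_{l=1}^{K} P(D \mid M_l)\, P(M_l)} \quad \text{and} \quad P(D \mid M_k) = \int P(D \mid \delta_k, M_k)\, P(\delta_k \mid M_k)\, d\delta_k.$$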
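To make the formulas concrete, the following Python sketch computes P(M_k|D) with the normalization above and then averages the expected value of Δ across models. It assumes that the marginal likelihood P(D|M_k) is approximated by exp(−BIC_k/2) up to a common constant (the BIC approximation; an assumption of this illustration, not a statement about how any particular package evaluates the integral). The BIC values, the uniform model prior and the per-model estimates are hypothetical.

```python
import numpy as np

def posterior_model_probs(bic, prior=None):
    """P(M_k | D), assuming P(D | M_k) is proportional to exp(-BIC_k / 2)."""
    bic = np.asarray(bic, dtype=float)
    if prior is None:
        prior = np.full(bic.shape, 1.0 / bic.size)  # uniform model prior
    prior = np.asarray(prior, dtype=float)
    # subtract the smallest BIC before exponentiating to avoid underflow;
    # the common factor cancels in the normalization
    unnorm = np.exp(-0.5 * (bic - bic.min())) * prior
    return unnorm / unnorm.sum()

def bma_expectation(delta_given_model, post):
    """E[Delta | D] = sum_k E[Delta | M_k, D] * P(M_k | D)."""
    return float(np.dot(delta_given_model, post))

# hypothetical example with three candidate models
bic = [100.0, 103.0, 109.0]
post = posterior_model_probs(bic)
delta_hat = [0.52, 0.47, 0.0]   # hypothetical E[Delta | M_k, D]
print(post.round(3))
print(round(bma_expectation(delta_hat, post), 3))
```

Only BIC differences matter: shifting all BIC values by a constant leaves the posterior model probabilities unchanged.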

Then, the models with the highest posterior probabilities are chosen. However, implementing the estimator poses several difficulties (see also Koop and Potter (2003) or Chipman et al. (2001)). The quality of the results may hinge on the choice of the hyper-parameters required for the calculations (Chipman et al., 2001). They can be chosen manually, calculated from the data or simply set to trivial values. Since we do not possess any usable prior information, we chose uninformative priors. As usual, there are several other problems to cope with: with N candidate variables, the model space comprises all 2^N models. Therefore, optimization algorithms and Monte Carlo methods should be employed in our case. We resort to the BMA package in R. Since its algorithm can only handle 200 variables simultaneously and has no Monte Carlo features, we used a weakened version of test 4 from subsection 3.6.4 to preselect a reduced set of variables. The level of significance was chosen such that 199 variables are included (plus the constant). Due to limited computational resources and algorithm restrictions, we limited the number of variables simultaneously included in each model to a maximum of 50.
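The size of the model space can be checked with a few lines of Python. With N = 199 preselected variables there are 2^N possible models, and restricting each model to at most 50 variables still leaves the sum of the binomial coefficients C(199, j) for j = 0, ..., 50. This is only arithmetic on the numbers stated above; it does not reproduce the search strategy of the BMA package, which evaluates only a small subset of these models.

```python
from math import comb

N = 199          # preselected candidate variables (from the text)
MAX_VARS = 50    # maximum number of variables per model (from the text)

all_models = 2 ** N
# number of models with between 0 and MAX_VARS regressors
restricted = sum(comb(N, j) for j in range(MAX_VARS + 1))

print(f"all models:        {all_models:.3e}")
print(f"models with <= 50: {restricted:.3e}")
print(f"share:             {restricted / all_models:.3e}")
```

Even the restricted space is far too large to enumerate, which is why a heuristic search is unavoidable.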

Each BMA-regression then took about four days on a 4.2GHz Athlon X2 using only one core.

We use BMS in the comparison of methods in section 4.2 and simply select those variables with a posterior probability larger than 0.9 (which are essentially all variables used in the final BMA model).
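The selection rule can be sketched as follows. The posterior probability that a variable's coefficient is non-zero (reported in percent in the third column of table 3.53) is the sum of the posterior probabilities of all models that contain the variable. The inclusion matrix, model probabilities and variable names below are hypothetical.

```python
import numpy as np

# hypothetical inclusion matrix: rows = models, columns = variables
# (1 = variable is included in the model)
included = np.array([
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
])
post = np.array([0.80, 0.15, 0.05])  # hypothetical P(M_k | D), sums to 1
names = ["x1", "x2", "x3", "x4"]

# P(beta_j != 0 | D) = sum of P(M_k | D) over the models containing variable j
pip = post @ included
selected = [n for n, p in zip(names, pip) if p > 0.9]
print({n: round(float(p), 2) for n, p in zip(names, pip)}, selected)
```

Here x4 appears in the most probable model but is still dropped, because its inclusion probability (0.85) stays below the 0.9 threshold.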

Bayesian Model Averaging (BMA)

BMA is essentially the same as BMS, except that it uses the information of all considered models. Instead of the coefficient of the most probable model, BMA calculates the average of each coefficient, weighted by the posterior model probabilities. Fernández et al. (2001b) compare BMA with EBA (as implemented by Sala-i-Martin (1997)) and find that BMA achieves better results. BMA has also been applied to crime data by Raftery et al. (1997), Fernández et al. (2001a), Liang et al. (2001) and Nott and Green (2004).
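The averaging step can be written out for a single coefficient. Under the standard BMA formulas (see, e.g., Raftery et al., 1997), a model that excludes the variable contributes a coefficient of zero, and the posterior variance adds the between-model spread of the estimates to the within-model variances. The per-model estimates and standard errors below are hypothetical.

```python
import numpy as np

def bma_coefficient(post, beta, se):
    """Model-averaged posterior mean and sd of one coefficient.

    post : P(M_k | D) for each model
    beta : estimate of the coefficient in model k (0 if excluded)
    se   : its standard error in model k (0 if excluded)
    """
    post, beta, se = (np.asarray(a, dtype=float) for a in (post, beta, se))
    mean = np.sum(post * beta)
    # law of total variance: E[Var | M] + Var[E | M]
    second_moment = np.sum(post * (se ** 2 + beta ** 2))
    sd = np.sqrt(second_moment - mean ** 2)
    return mean, sd

post = [0.80, 0.15, 0.05]   # hypothetical posterior model probabilities
beta = [0.45, 0.60, 0.0]    # the coefficient is excluded from the third model
se = [0.08, 0.10, 0.0]
mean, sd = bma_coefficient(post, beta, se)
print(round(mean, 4), round(sd, 4))
```

The averaged standard deviation is larger than the standard error of any single model because it also reflects uncertainty about which model is correct.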

We note that there are several other ways to choose the model weights for calculating the coefficients. We chose BMA because it is used in the deterrence literature, is acknowledged by many researchers and, certainly an important argument, is implemented in an available statistical software package. Hansen (2007) compares the performance of several averaging methods (based on the AIC, the BIC, the Mallows criterion and MMA, the Mallows Model Average estimator). In a simulation study, he finds that the MMA estimator has the lowest risk (expected squared error) and that the BIC-based (Bayesian) weights perform better only when the number of observations and the R² are low. In contrast to the other weights, the risk of the BIC-based estimator does not decrease with the number of observations. Overall, the MMA estimator performs best, but it could not be implemented in this analysis because the article was published too late. The BMA results are given in table 3.53.
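For comparison, the MMA estimator discussed by Hansen (2007) chooses weights w for a sequence of nested least-squares models by minimizing the Mallows criterion C(w) = ||y − Fw||² + 2σ̂² Σ_m w_m k_m over the unit simplex, where F holds the fitted values of each model and k_m is its number of regressors. The sketch below searches a discrete grid of weights for three nested models on simulated data instead of solving the quadratic program; the simulated data, the grid step of 0.01 and the estimate of σ² from the largest model are assumptions of this illustration and play no role in the analysis reported here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, 0.2, 0.0]) + rng.normal(size=n)

# nested models: first 2, 3 and 4 columns of X
ks = [2, 3, 4]
fits = []
for k in ks:
    beta, *_ = np.linalg.lstsq(X[:, :k], y, rcond=None)
    fits.append(X[:, :k] @ beta)
F = np.column_stack(fits)            # n x M matrix of fitted values

# sigma^2 estimated from the largest model
resid_big = y - F[:, -1]
sigma2 = resid_big @ resid_big / (n - ks[-1])

def mallows(w):
    """Mallows criterion for the weight vector w."""
    r = y - F @ w
    return r @ r + 2.0 * sigma2 * np.dot(w, ks)

# grid over the simplex {w >= 0, sum(w) = 1} with step 1/100
step = 100
best_w, best_c = None, np.inf
for i in range(step + 1):
    for j in range(step + 1 - i):
        w = np.array([i, j, step - i - j]) / step
        c = mallows(w)
        if c < best_c:
            best_w, best_c = w, c
print("MMA weights:", best_w)
```

Unlike the BMA weights, these weights do not depend on priors; they trade off fit against the penalty 2σ̂² k_m directly.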

Table 3.53: Multivariate analysis - bayesian model averaging

Variable var (%) P(β≠0) (%) coef. sd

Study: size of first realized sample 58.8 100.0 −0.0001 0.000

Study: size of second realized sample 2.4 100.0 −0.0004 0.000

Study: tests of significance 9.4 100.0 −0.8635 0.119

Study: user, mw 1.4 100.0 0.8113 0.166

Study: author, Steven D. Levitt 1.7 100.0 1.2320 0.266

Study: author, Daniel S. Nagin 1.3 100.0 1.3570 0.281

Study: author, Isaac Ehrlich 1.0 100.0 −1.9980 0.327

Study: publication, criminology 20.5 100.0 0.4058 0.088

Study: publication, psychology 2.8 93.5 −0.7055 0.271

Study: institute, economics 41.8 100.0 −0.4757 0.088

Study: institute, miscellaneous 14.9 100.0 0.4654 0.101

Study: first population, Canada 4.3 100.0 0.9006 0.160

Study: first population, other country 6.2 100.0 0.59450 0.134

Study: sample unit, first population, states 21.9 100.0 0.5283 0.083

Study: sample individuals, first population, pupils 3.1 100.0 0.7591 0.187

Study: sample individuals, second population, population 2.6 100.0 0.8753 0.200

Study: complete sample 9.8 100.0 −0.8037 0.110

Study: main location>500000 inhabitants 3.5 100.0 −1.6430 0.176

Study: does not check representativeness 26.9 100.0 0.4378 0.074

Study: mixed questions for pretest 2.1 100.0 1.1620 0.228

Estimate: deterrence is covariate 14.8 100.0 0.5168 0.094

Estimate: exogenous, index multiplicative 1.9 100.0 0.9222 0.230

Estimate: study type, death penalty 8.2 100.0 0.6196 0.122

Estimate: exogenous, crime data, arrest rate 9.5 100.0 −0.6628 0.115

Estimate: exogenous, crime data, conviction rate 3.9 100.0 −0.7776 0.168

Estimate: exogenous, crime data, police expenditures 4.0 100.0 0.7563 0.171

Estimate: exogenous, crime data, police strength 7.9 100.0 0.8597 0.128

Estimate: exogenous, crime data, probability dummy (regime shift) 3.5 100.0 −0.9804 0.191

Estimate: exogenous, crime data, severity dummy (regime shift) 3.6 100.0 −1.0500 0.183

Estimate: exogenous, survey, severity of punishment by others 0.4 100.0 2.1570 0.490

Estimate: exogenous, experiment, experimental variation of probability of detection 2.1 100.0 −1.9570 0.232

Estimate: exogenous, in differences 3.1 100.0 0.9154 0.207


Estimate: endogenous, recidivism 0.7 94.7 1.3530 0.491

Estimate: endogenous, accidents 4.2 100.0 0.8311 0.181

Estimate: offense, drug possession (soft) 0.7 100.0 1.6650 0.408

Estimate: offense, drug possession (hard) 0.5 100.0 −2.5740 0.530

Estimate: offense, environmental crimes, violations of prescriptive limits 2.3 92.4 0.7232 0.292

Estimate: endogenous, binary category 9.6 100.0 −0.4611 0.116

Estimate: endogenous, not in logs 32.0 100.0 0.4240 0.077

Estimate: endogenous, other transformation 7.8 100.0 −0.9145 0.136

Estimate: covariate, fixed effects (spatial) 10.0 100.0 −0.66730 0.116

Estimate: covariate, poverty, welfare 6.4 100.0 0.78340 0.137

Estimate: covariate, urbanity 8.2 100.0 0.53860 0.123

Estimate: covariate, population (-growth) 11.5 100.0 0.43300 0.106

Estimate: no correction for simultaneity 19.3 100.0 −0.49180 0.088

Estimate: bivariate method, t-test for independent samples 0.5 100.0 −2.31400 0.427

Estimate: no test of significance 5.5 100.0 −0.65720 0.166

Estimate: square root of sample size for negative values 79.2 100.0 −0.01422 0.001

Estimate: square root of sample size for positive values 82.5 100.0 0.05388 0.002

constant 100.0 −0.96310 0.165

Bayesian model averaging with a maximum of 50 variables per regression. The algorithm supports only 200 variables; therefore, 199 variables were preselected by EBA (weakened version of test D). The column var refers to the variation of a variable (i.e., the percentage of valid observations); the maximum variation for dummy variables is fifty percent. The third column gives the posterior probability (in percent) that a coefficient is non-zero. Properties of the best model: R²: 0.348; BIC: −2364; posterior probability: 0.805. The reference category for dummies is usually the opposite property or, in the case of multiple categories, the missing values.

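The negative BIC of the best model is easier to read with Raftery's BIC approximation for linear regression in mind, which is commonly stated relative to the intercept-only model as BIC′ = n log(1 − R²) + p log n, where p is the number of regressors; negative values indicate a model preferred to the null model. Whether the reported BIC uses exactly this form is an assumption here, and the sample sizes and numbers of regressors below are hypothetical, since neither is given in this section.

```python
import math

def bic_vs_null(r2, n, p):
    """BIC of a linear model relative to the intercept-only model
    (Raftery's approximation): n * log(1 - R^2) + p * log(n)."""
    return n * math.log(1.0 - r2) + p * math.log(n)

# hypothetical sample sizes and model sizes, not those of the meta-analysis
for n, p in [(500, 10), (5000, 50)]:
    print(n, p, round(bic_vs_null(0.348, n, p), 1))
```

For a fixed R², the fit term grows linearly in n while the penalty grows only with log n, so large samples produce strongly negative values.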

The results are in line with the previous findings. Among the authors, Levitt, Nagin and Ehrlich are considered important enough to be included, but only Ehrlich has a negative impact (i.e., his estimates are more in favor of deterrence). Studies published in criminological journals have a positive effect, in contrast to those published in psychological journals. When a Canadian population (or an “other” country) is studied, the results are also more positive, while they are more negative when large cities are studied. When the death penalty is analyzed or the deterrence variable is just a covariate, the findings are less in favor of the deterrence hypothesis.

Among the deterrence variables, the arrest and conviction rates, as well as regime shifts and experimental variation of the detection probability, are considered very important and have a negative effect. Police measures (expenditures and strength) used as deterrence variables keep their positive sign, and the dummy for not correcting for simultaneity (which has a negative effect) is also included. Drug possession, distinguished by soft and hard drugs, is also signed as expected.

The covariates considered most important are poverty, urbanity, population and the use of spatial fixed effects; only the latter has a negative influence. Studies which report tests of significance are associated with lower normalized t-values (i.e., more in favor of the deterrence hypothesis), although this is somewhat put into perspective at the estimate level, which shows the opposite effect: estimates without a test of significance are also associated with lower values. As expected, the relationship between the square root of the number of observations and the t-values is considered to be of great importance.
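The two square-root regressors at the bottom of table 3.53 can be illustrated as follows. If a true effect exists, a t-value grows roughly in proportion to √n, so separate slopes are allowed for negative and positive results. The sketch assumes that the split is made by the sign of the normalized t-value and that each regressor is zero for results of the other sign; both are assumptions about the coding, and the data are hypothetical.

```python
import numpy as np

def split_sqrt_n(t_values, n_obs):
    """Return sqrt(n) separately for negative and positive t-values
    (zero for results of the other sign)."""
    t = np.asarray(t_values, dtype=float)
    root_n = np.sqrt(np.asarray(n_obs, dtype=float))
    neg = np.where(t < 0, root_n, 0.0)
    pos = np.where(t > 0, root_n, 0.0)
    return neg, pos

t_values = [-2.1, 0.8, -4.5, 1.9]   # hypothetical normalized t-values
n_obs = [100, 400, 2500, 900]       # hypothetical numbers of observations
neg, pos = split_sqrt_n(t_values, n_obs)
print(neg, pos)
```

With this coding, the negative coefficient on the first regressor and the positive coefficient on the second in table 3.53 both say that the absolute t-value grows with the sample size.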

In the document Meta Analysis of Crime and Deterrence: A Comprehensive Review of the Literature (pages 177-181)