
Case II - Effect Does Exist

4.2 The Tournament

4.2.2 The Contest

It is far better to foresee even without certainty than not to foresee at all.

Jules Henri Poincaré

In the following we present and compare the performance of the different estimators. The first part, section 4.2.2, contains the measures used to assess the quality of the estimators; they describe how well the estimators predict and fit the actual data. The second part, section 4.2.2, shows the percentages of correctly and incorrectly classified (normalized) t-values. These are used to judge the performance of the estimators in reproducing certain properties of the (normalized) t-values.

Predicting and Fitting Performance

Tables 4.1 and 4.2 contain the resulting values of the loss functions for the 50%-observation based test sets, while tables 4.3 and 4.4 contain those based on the 10%-study based test sets. The rows contain the loss functions, while the competing methods are located in the columns. To simplify the interpretation, the number of variables used in each method is given in the second row. Bold cells contain the best value of all methods (in the case of the error statistics this means closer similarity to the normal distribution). Light (dark) cells indicate that the model is superior (inferior), as described in subsection 4.1.1, to the best naive model (SET0 or SET1).

It is obvious that SETs 7, 8 and 11 perform very well compared to all other methods. They perform best in many criteria; in particular, the backward stepwise regression (based on significance) has a high correlation, the best root mean squared error (for all observations as well as for those with a falsely predicted sign), and the best mean absolute error and log predictive score. Furthermore, the variance of its residuals is the smallest, and the encompassing test also states that it contains relevant information. Considering that the stepwise forward method uses just 43 variables, it performs very well. This is reflected by the best adjusted LPS (but not the adjusted R²), the hit ratio and the relative RMSE. The same applies to the BMS set, which has the highest correlation and, although not the best, many fairly good values. It should be mentioned that the means given for BMA exhibit a very large deviation - in some runs it performed very well, in others extraordinarily badly.
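For concreteness, the main loss functions used above can be sketched as follows. This is a minimal sketch assuming the conventional textbook definitions; in particular, the exact predictive density underlying the LPS in the tables is an assumption here (a normal density centred at the prediction).

```python
import math

def rmse(y, yhat):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y))

def mae(y, yhat):
    """Mean absolute error between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(y, yhat)) / len(y)

def log_predictive_score(y, yhat, sigma):
    """Average negative log-density of the actual values under a normal
    predictive distribution N(yhat_i, sigma^2) -- one common way to
    define the LPS; lower is better. The choice of density is an
    assumption, not taken from the original text."""
    const = 0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const + (a - p) ** 2 / (2 * sigma ** 2)
               for a, p in zip(y, yhat)) / len(y)
```

Under these definitions, a method can dominate on RMSE while losing on the LPS whenever its residual variance is poorly calibrated, which is why both appear in the tables.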

In table 4.2 we test how well the estimators reproduce the data. The whole data set is used to establish the estimator, and a randomly chosen 50% set of the data is then estimated. This is repeated ten times. The same notation as in table 4.3 applies. This is done to study whether and by how much any method performs significantly better or worse than the naive methods3, when the estimators are based on the whole data set.
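The repeated random-split procedure described above can be sketched as follows; the callables `fit` and `evaluate` are placeholders for any of the competing estimators and loss functions, not names from the original analysis.

```python
import random

def repeated_split_eval(data, fit, evaluate, runs=10, frac=0.5, seed=0):
    """Repeatedly draw a random `frac` share of the observations as the
    training set, fit the estimator on it, and evaluate it on the rest.
    The tables report means of such per-run scores over `runs` runs."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        cut = int(len(data) * frac)
        train = [data[i] for i in idx[:cut]]
        test = [data[i] for i in idx[cut:]]
        scores.append(evaluate(fit(train), test))
    return scores
```

For instance, a SET0-style naive estimator would be a `fit` that returns the training-set mean and an `evaluate` that computes the RMSE of that constant prediction on the test set.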

It is not surprising that SET1 has the best RMSE and correlation values and is the only method chosen by the encompassing test - and that its residuals are “the best” - because the model is optimized according to these criteria. Nevertheless, the stepwise estimators perform fairly well: the relative RMSE is best for SET7, while SET9 and SET10 are the only estimators which are not significantly worse than the full OLS estimator and even outperform it in some criteria (Sign. and fsRMSE). As could be expected, estimators based on many variables perform better in fitting the data than those based on only a small set of variables.

Instead of using very large test sets (50% of the data), and to lessen the effect that many observations are not independent (because they belong to the same study), we repeat the procedure with a 90% vs. 10% partition on the study level. Thus, the (normalized) t-values belonging to a randomly chosen 90% set of all studies are assigned to the training set, while the rest remains in the test set. Due to time constraints and the mediocre performance of BMA, we removed the Bayesian model averaging estimator from the sets and did not recalculate the set of variables for the Bayesian model selection approach for each run4.
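A study-level partition of this kind can be sketched as follows (a sketch under the stated 90/10 design; `study_of` is a hypothetical accessor mapping an observation to its study identifier):

```python
import random

def study_level_split(observations, study_of, frac=0.9, seed=0):
    """Assign whole studies, not individual observations, to the training
    set: a random `frac` share of the study IDs goes to training, and all
    observations of the remaining studies form the test set. This keeps
    dependent observations from the same study on one side of the split."""
    rng = random.Random(seed)
    studies = sorted({study_of(o) for o in observations})
    rng.shuffle(studies)
    cut = int(len(studies) * frac)
    train_ids = set(studies[:cut])
    train = [o for o in observations if study_of(o) in train_ids]
    test = [o for o in observations if study_of(o) not in train_ids]
    return train, test
```

Splitting on study IDs rather than on rows is the whole point of the design: a random row-level split would place observations from the same study on both sides and overstate predictive performance.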

3The OLS estimator with all variables and data shows the best performance in many criteria (e.g. RMSE, correlation, encompassing test) because it is constructed that way.

4We use the same set of variables derived from the whole data set and merely recalculate the coefficients for each training set.

Table 4.1: How well the models predict random data sets

Method    SET0     SET1      SET2     SET3     SET4     SET5     SET6     SET7     SET8     SET9     SET10    SET11    SET12
#var.     0        49        277      254      156      119      80       43       80       269      233      49       49
RMSE      3.076    6.557     2.803    2.928    2.813    2.775    2.763    4.025    2.667    4.212    2.742    2.724    6.478
Cor.      0.000    0.295     0.444    0.431    0.464    0.475    0.475    0.377    0.520    0.443    0.520    0.491    0.333
Adj.R2    -0.000   0.169     0.431    0.383    0.437    0.455    0.461    0.368    0.508    0.393    0.483    0.483    0.322
U         0.907    1.937     0.828    0.866    0.832    0.820    0.817    1.213    0.788    1.244    0.811    0.805    1.914
U.bias    0.003    0.004     0.003    0.003    0.003    0.003    0.003    0.003    0.003    0.004    0.004    0.003    0.149
U.var.    0.997    0.109     0.219    0.101    0.147    0.166    0.187    0.292    0.191    0.132    0.095    0.197    0.186
U.cov.    0.000    0.887     0.778    0.897    0.851    0.830    0.810    0.705    0.806    0.865    0.902    0.801    0.664
RMSPE     18.562   27.128    17.988   25.230   22.280   21.650   19.268   16.556   21.319   27.479   26.334   19.757   48.125
CI.hit    0.556    0.479     0.595    0.541    0.570    0.575    0.578    0.617    0.600    0.557    0.577    0.591    0.424
Sign.     0.000    0.494     0.397    0.475    0.463    0.457    0.426    0.471    0.487    0.516    0.513    0.449    0.349
Neg2pos   416.509  1078.900  13.460   12.594   11.616   11.450   11.363   158.391  1.368    277.933  11.010   11.375   620.187
FsRMSE    12.774   149.470   8.073    10.590   8.544    8.092    7.822    65.584   6.903    52.646   9.149    7.570    30.235
Min.Dev.  -18.632  -79.830   -19.350  -19.507  -18.896  -18.643  -18.662  -20.400  -18.363  -47.543  -19.650  -18.690  -17.611
Max.Dev.  54.694   50.853    51.002   50.172   50.061   50.082   50.057   68.295   50.371   49.716   49.554   50.353   75.909
Meanpos.  2.014    2.591     1.869    2.054    1.954    1.916    1.909    2.061    1.833    1.996    1.904    1.855    4.157
Meanneg.  -1.906   -2.511    -1.658   -1.813   -1.704   -1.684   -1.666   -1.568   -1.564   -1.912   -1.651   -1.629   -2.012
Meanabs.  1.963    2.552     1.765    1.932    1.829    1.600    1.787    1.804    1.698    1.954    1.778    1.742    3.891
LPS       0.830    1.056     0.731    0.794    0.742    0.722    0.722    0.713    0.682    0.764    0.718    0.696    1.968
Adj.LPS   0.830    1.132     0.742    0.833    0.766    0.740    0.734    0.720    0.695    0.805    0.753    0.704    1.975
cEncomp.  0.000    -0.165    -0.060   -0.249   0.139    0.027    0.110    0.119    0.350    0.226    0.445    0.223    -0.295
pEncomp.  0.000    0.102     0.015    0.359    0.430    0.323    0.251    0.000    0.048    0.000    0.103    0.083
Mean      -0.003   -0.214    -0.068   -0.084   -0.085   -0.080   -0.063   -0.059   -0.087   -0.123   -0.109   -0.063   -2.314
Median    -3.632   -5.343    -3.549   -4.048   -3.850   -3.745   -3.705   -3.432   -3.391   -3.901   -3.666   -3.561   -8.410
Sd        2.236    2.834     2.067    2.263    2.146    2.095    2.080    2.011    1.963    2.159    2.063    2.018    3.508
Skewness  -0.509   -0.455    -0.462   -0.386   -0.445   -0.439   -0.425   -0.550   -0.458   -0.351   -0.388   -0.400   -0.665
Kurtosis  4.542    4.321     4.588    3.771    4.039    4.020    4.062    5.002    4.201    3.685    3.839    4.024    5.438

10 runs. Training sets contain random 50% of the data (the rest belongs to the test sets). SET0: no variables. SET1: all variables. SET2: all significant variables (<0.1) from SET1. SET3-6: EBA with criterion A-D. SET7/8: stepwise forward/backward (based on significance). SET9/10: stepwise backward/forward (based on AIC improvement). SET11: BMS (maximum of 49 variables at once). SET12: BMA (maximum of 49 variables at once). #var. is the number of variables used by each method. Light/dark grey cells: 95%-CI does not contain the best average of the naive approaches (i.e., is better/worse than the best of SET0/1). Bold cells are the best of each row. Rows below the line are statistics of the error term (excluding the smallest and largest 1% of the errors) and only the best (i.e., similarity towards N(0, σ²)) cells are marked.

Table 4.2: How well the models fit random data sets

Method    SET0     SET1     SET2     SET3     SET4     SET5     SET6     SET7     SET8     SET9     SET10    SET11    SET12
#var.     0        49       277      254      156      119      80       43       80       269      233      49       49
RMSE      3.073    2.228    2.560    2.391    2.442    2.455    2.497    2.611    2.444    2.248    2.257    2.491    2.491
Cor.      0.000    0.689    0.552    0.628    0.607    0.601    0.583    0.527    0.606    0.681    0.678    0.585    0.585
Adj.R2    -0.000   0.633    0.541    0.597    0.587    0.586    0.572    0.521    0.596    0.653    0.653    0.579    0.579
U         0.906    0.658    0.756    0.706    0.722    0.725    0.738    0.771    0.722    0.664    0.667    0.736    0.736
U.bias    0.001    0.000    0.001    0.001    0.001    0.001    0.001    0.001    0.001    0.000    0.000    0.001    0.001
U.var.    1.000    0.176    0.284    0.222    0.235    0.241    0.252    0.301    0.238    0.181    0.184    0.251    0.252
U.cov.    0.000    0.824    0.716    0.778    0.765    0.759    0.748    0.699    0.762    0.819    0.817    0.748    0.748
RMSPE     19.016   21.705   17.168   22.091   20.188   20.826   18.273   16.526   20.714   23.205   23.627   18.224   18.326
CI.hit    0.557    0.664    0.622    0.623    0.631    0.623    0.610    0.629    0.624    0.664    0.655    0.608    0.608
Sign.     0.000    0.567    0.433    0.553    0.522    0.515    0.455    0.483    0.511    0.584    0.584    0.467    0.469
Neg2pos   416.588  8.835    12.879   9.884    10.105   10.119   10.234   12.786   10.800   9.033    9.282    10.372   10.380
FsRMSE    12.947   6.265    6.564    6.128    6.209    6.290    6.463    8.337    6.745    6.099    6.190    6.429    6.383
Min.Dev.  -18.664  -16.042  -17.481  -15.739  -15.760  -15.676  -15.830  -16.648  -16.150  -16.166  -16.011  -15.697  -15.698
Max.Dev.  54.663   48.579   50.551   49.250   49.082   48.994   49.186   49.734   49.757   48.821   48.704   49.695   49.691
Meanpos.  2.002    1.488    1.735    1.627    1.673    1.670    1.740    1.807    1.687    1.489    1.507    1.727    1.724
Meanneg.  -1.919   -1.376   -1.555   -1.506   -1.535   -1.559   -1.563   -1.499   -1.490   -1.410   -1.403   -1.566   -1.568
Meanabs.  1.961    1.431    1.642    1.565    1.603    1.614    1.649    1.642    1.585    1.448    1.454    1.644    1.644
LPS       0.829    0.564    0.684    0.629    0.649    0.654    0.672    0.696    0.645    0.572    0.575    0.669    0.669
Adj.LPS   0.829    0.639    0.708    0.668    0.672    0.673    0.684    0.702    0.657    0.613    0.611    0.677    0.677
cEncomp.  0.000    0.901    0.010    0.063    -0.054   0.052    -0.062   -0.025   0.017    -0.000   0.089    1.406    -1.418
pEncomp.  0.000    0.293    0.504    0.411    0.574    0.402    0.558    0.426    0.626    0.484    0.477    0.480
Mean      0.029    -0.014   -0.001   -0.008   -0.001   -0.005   -0.006   -0.008   -0.019   -0.019   -0.015   -0.009   -0.005
Median    -3.601   -2.760   -3.313   -3.227   -3.226   -3.227   -3.308   -3.292   -3.080   -2.836   -2.831   -3.294   -3.283
Sd        2.236    1.660    1.914    1.823    1.864    1.878    1.915    1.935    1.836    1.678    1.684    1.908    1.911
Skewness  -0.507   -0.312   -0.463   -0.413   -0.449   -0.427   -0.474   -0.543   -0.453   -0.301   -0.347   -0.381   -0.364
Kurtosis  4.543    3.978    4.387    4.026    4.158    4.139    4.139    4.805    4.174    3.934    3.977    3.917    3.939

10 runs. Training sets contain random 50% of the data (the rest belongs to the test sets). SET0: no variables. SET1: all variables. SET2: all significant variables (<0.1) from SET1. SET3-6: EBA with criterion A-D. SET7/8: stepwise forward/backward (based on significance). SET9/10: stepwise backward/forward (based on AIC improvement). SET11: BMS (maximum of 49 variables at once). SET12: BMA (maximum of 49 variables at once). #var. is the number of variables used by each method. Light/dark grey cells: 95%-CI does not contain the best average of the naive approaches (i.e., is better/worse than the best of SET0/1). Bold cells are the best of each row. Rows below the line are statistics of the error term (excluding the smallest and largest 1% of the errors) and only the best (i.e., similarity towards N(0, σ²)) cells are marked.

Table 4.3: How well the models predict random studies

Method    SET0     SET1     SET2     SET3     SET4     SET5     SET6     SET7     SET8     SET9     SET10    SET11
#var.     0        49       277      254      156      119      80       43       80       269      233      49
RMSE      2.867    4.024    2.590    2.852    2.609    2.542    2.510    2.504    2.412    2.584    2.510    2.436
Cor.      0.000    0.205    0.466    0.396    0.476    0.502    0.506    0.511    0.558    0.510    0.534    0.543
Adj.R2    -0.000   0.064    0.453    0.345    0.449    0.483    0.493    0.505    0.546    0.466    0.499    0.535
U         0.882    1.263    0.797    0.886    0.805    0.783    0.773    0.767    0.741    0.800    0.775    0.748
U.bias    0.006    0.008    0.011    0.008    0.008    0.008    0.008    0.008    0.008    0.008    0.005    0.010
U.var.    0.996    0.048    0.229    0.086    0.137    0.172    0.198    0.297    0.187    0.072    0.093    0.223
U.cov.    0.000    0.945    0.762    0.908    0.856    0.821    0.796    0.697    0.806    0.921    0.904    0.769
RMSPE     15.763   35.518   16.151   20.368   19.391   20.154   16.317   14.036   19.931   25.841   24.808   17.368
CI.hit    0.514    0.332    0.557    0.476    0.516    0.530    0.531    0.593    0.565    0.491    0.502    0.567
Sign.     0.000    0.457    0.414    0.483    0.502    0.501    0.424    0.492    0.516    0.521    0.515    0.464
Neg2pos   412.558  22.606   11.636   11.281   10.320   10.216   9.941    10.558   10.395   10.767   10.618   9.640
FsRMSE    11.509   26.665   8.788    11.561   9.308    9.028    8.891    8.888    8.544    10.261   10.203   7.356
Min.Dev.  -11.599  -14.217  -11.136  -11.541  -11.150  -11.185  -10.884  -10.957  -11.106  -11.282  -10.943  -10.624
Max.Dev.  17.879   17.623   15.766   15.831   15.233   15.234   15.160   15.790   15.053   14.407   14.525   15.341
Meanpos.  2.001    2.989    1.843    2.155    1.948    1.887    1.838    1.818    1.744    1.947    1.880    1.803
Meanneg.  -1.841   -2.844   -1.624   -1.873   -1.697   -1.639   -1.630   -1.436   -1.508   -1.755   -1.679   -1.515
Meanabs.  1.914    2.920    1.731    2.012    1.820    1.760    1.733    1.621    1.622    1.855    1.778    1.656
LPS       0.928    1.629    0.801    0.959    0.830    0.796    0.781    0.746    0.726    0.835    0.795    0.739
Adj.LPS   0.928    1.705    0.813    0.998    0.854    0.815    0.794    0.753    0.738    0.876    0.831    0.747
cEncomp.  0.000    -0.354   -0.072   -0.457   0.106    -0.032   0.170    0.239    0.198    0.486    0.510    0.219
pEncomp.  0.000    0.355    0.015    0.461    0.372    0.301    0.197    0.243    0.163    0.080    0.313
Mean      0.002    -0.020   -0.012   -0.092   -0.064   -0.070   -0.071   -0.121   -0.055   -0.107   -0.081   -0.090
Median    -3.272   -6.146   -3.415   -4.092   -3.742   -3.596   -3.421   -3.391   -3.212   -3.814   -3.680   -3.370
Sd        2.161    3.478    1.997    2.314    2.100    2.023    2.000    1.919    1.891    2.109    2.041    1.926
Skewness  -0.589   0.006    -0.538   -0.418   -0.430   -0.476   -0.492   -0.769   -0.527   -0.269   -0.344   -0.548
Kurtosis  5.299    4.123    4.602    3.673    3.825    3.944    4.135    5.789    4.557    3.149    3.423    4.261

10 runs. Training sets contain random 90% of all studies (all other studies belong to the test sets). SET0: no variables. SET1: all variables. SET2: all significant variables (<0.1) from SET1. SET3-6: EBA with criterion A-D. SET7/8: stepwise forward/backward (based on significance). SET9/10: stepwise backward/forward (based on AIC improvement). SET11: BMS (maximum of 49 variables at once). #var. is the number of variables used by each method. Light/dark grey cells: 95%-CI does not contain the best average of the naive approaches (i.e., is better/worse than the best of SET0/1). Bold cells are the best of each row. Rows below the line are statistics of the error term (excluding the smallest and largest 1% of the errors) and only the best (i.e., similarity towards N(0, σ²)) cells are marked.

Compared with table 4.1, the results in table 4.3 are quite similar. However, the good performance of SET8, the backward stepwise regression, is not as dominant as before. Other stepwise procedures (SET7 and SET9) come closer to SET8. Although the variables for the BMS approach are not recalculated, SET11 performs much better.

Table 4.4 is analogous to table 4.2. The estimators are based on the full data set, and their performance in fitting random 10% sets of the studies is compared.

There are not many differences between the results from the test sets based on random 10% of the studies and those based on 50% of the data. Of course, SET1 is the best, while the StepAIC results come second.

Classification Performance

For reasons of parsimony, the classification ratings are given only for the 10%-test sets, which can be interpreted as a simulation of the estimation of unknown studies. Tables 4.5 and 4.6 report the statistics of the precision of the estimates for predicting and fitting unknown studies. Again, the columns contain the models5 while the rows display the categories. SET0 is no longer a constant (which would not make much sense when classifying observations) but consists of random draws from the statistical distribution6 of the (normalized) t-values. The last column (N) contains the number of observations in the various categories. To simplify the interpretation, the number of variables used in each method is given in the second row. Finally, the cells report the precision ratings (the number of correctly classified observations divided by N). Again, bold cells contain the best value of all methods. Light (dark) cells indicate that the model is superior (inferior) to the best naive model (all or no variables used).
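The precision rating of a category, and the moment-matched random-guessing baseline described above (footnote 6: draws from a normal distribution with the sample moments rather than from the sample itself), can be sketched as follows; all function names here are illustrative, not from the original text.

```python
import random
import statistics

def precision(actual, predicted, in_category):
    """Share of the observations actually in a category whose estimate
    also falls into it (correctly classified divided by N, as in the
    classification tables). `in_category` tests membership, e.g.
    lambda t: t > 0 for positive (normalized) t-values."""
    hits = [in_category(p) for a, p in zip(actual, predicted) if in_category(a)]
    return sum(hits) / len(hits) if hits else float("nan")

def random_guesses(tvalues, seed=0):
    """Random-guessing baseline: draw from a normal distribution whose
    mean and standard deviation match the sample of (normalized)
    t-values (an assumption consistent with footnote 6)."""
    rng = random.Random(seed)
    mu = statistics.mean(tvalues)
    sd = statistics.pstdev(tvalues)
    return [rng.gauss(mu, sd) for _ in tvalues]
```

Because a moment-matched normal draw carries no information about individual observations, its per-category precision is essentially the base rate of the category, which is why random guessing fares so poorly below.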

Overall, SET7 (which uses only 43 variables) performs best when classifying negative (normalized) t-values, while the AIC-based stepwise estimators are somewhat better at predicting positive values. The naive approach (SET1) performs worse in almost all categories (except the positive (normalized) t-values which are significant at the 5% level). Random guessing is especially bad.

When all studies are used to construct the estimators, the picture changes somewhat. Although the naive approach is still much better at fitting than at predicting, it does not perform as well. SET1 is the best in only one category, while SET9 and SET10 are not significantly worse in any category. SET9 seems to perform best in this case.

Tables 4.7 and 4.8 contain the classification error statistics of the models for predicting and fitting other studies. The models are in the columns and the statistics are given in the rows (the second row reports the number of variables used in each model). The rows are organized in groups of three lines and contain the average values of all ten runs.

The first line reports the category and the error rate, which is calculated as the percentage of falsely classified estimates in that category. This value is the number of estimates which actually

5Since the results by BMA are unreliable (very high variance in their quality) and would have taken several weeks to compute, we have, again, omitted BMA from the study-based analysis and did not recalculate the sets for BMS.

6In fact, we did not draw from the sample but used the normal distribution with the corresponding moments.

Table 4.4: How well the models fit random studies

Method    SET0     SET1     SET2     SET3     SET4     SET5     SET6     SET7     SET8     SET9     SET10    SET11
#var.     0        49       277      254      156      119      80       43       80       269      233      49
RMSE      2.866    2.020    2.396    2.193    2.243    2.264    2.308    2.421    2.261    2.046    2.055    2.306
Cor.      0.000    0.711    0.562    0.647    0.627    0.617    0.595    0.546    0.618    0.702    0.698    0.597
Adj.R2    -0.000   0.659    0.552    0.617    0.608    0.602    0.585    0.540    0.608    0.675    0.675    0.590
U         0.881    0.621    0.736    0.674    0.690    0.697    0.711    0.742    0.695    0.629    0.632    0.710
U.bias    0.005    0.001    0.008    0.004    0.004    0.004    0.005    0.005    0.005    0.002    0.003    0.006
U.var.    0.997    0.171    0.284    0.203    0.215    0.224    0.241    0.305    0.224    0.176    0.180    0.248
U.cov.    0.000    0.829    0.709    0.795    0.783    0.773    0.756    0.691    0.772    0.823    0.819    0.747
RMSPE     15.850   19.186   15.845   18.174   17.128   18.227   16.141   13.756   18.616   19.444   20.262   16.182
CI.hit    0.515    0.649    0.596    0.604    0.596    0.584    0.564    0.612    0.594    0.640    0.637    0.583
Sign.     0.000    0.556    0.446    0.562    0.547    0.543    0.450    0.499    0.522    0.567    0.566    0.466
Neg2pos   412.580  8.415    10.897   8.701    8.894    9.017    8.960    10.167   9.764    8.528    8.810    9.105
FsRMSE    11.561   6.831    7.763    6.800    7.048    6.898    7.186    8.565    8.414    6.946    7.025    6.747
Min.Dev.  -11.609  -10.589  -11.015  -10.430  -10.677  -10.730  -10.655  -10.913  -10.703  -10.564  -10.587  -10.496
Max.Dev.  17.870   12.745   14.829   13.628   13.645   13.808   13.986   15.162   14.255   12.883   12.908   14.210
Meanpos.  1.994    1.376    1.701    1.564    1.621    1.625    1.654    1.759    1.605    1.401    1.425    1.681
Meanneg.  -1.847   -1.317   -1.476   -1.414   -1.446   -1.481   -1.524   -1.385   -1.407   -1.337   -1.326   -1.484
Meanabs.  1.913    1.342    1.583    1.487    1.530    1.550    1.589    1.560    1.503    1.366    1.371    1.580
LPS       0.927    0.565    0.711    0.640    0.662    0.673    0.694    0.710    0.659    0.576    0.581    0.689
Adj.LPS   0.927    0.640    0.723    0.679    0.686    0.692    0.707    0.717    0.671    0.617    0.616    0.697
cEncomp.  0.000    0.923    0.076    0.043    -0.107   0.041    -0.005   0.098    -0.019   -0.028   0.081    -0.029
pEncomp.  0.014    0.251    0.387    0.433    0.452    0.564    0.213    0.267    0.495    0.354    0.477
Mean      0.009    -0.028   0.013    -0.068   -0.039   -0.048   -0.049   -0.070   -0.024   -0.049   -0.042   -0.068
Median    -3.268   -2.615   -3.135   -3.005   -3.090   -3.088   -3.156   -3.208   -3.030   -2.683   -2.650   -3.241
Sd        2.172    1.587    1.866    1.726    1.758    1.799    1.825    1.877    1.782    1.612    1.605    1.827
Skewness  -0.613   -0.128   -0.429   -0.457   -0.436   -0.473   -0.436   -0.715   -0.334   -0.180   -0.095   -0.565
Kurtosis  5.496    4.495    4.968    4.030    3.888    4.129    3.934    5.991    4.952    4.540    4.524    4.209

10 runs. Training set contains all studies. SET0: no variables. SET1: all variables. SET2: all significant variables (<0.1) from SET1. SET3-6: EBA with criterion A-D. SET7/8: stepwise forward/backward (based on significance). SET9/10: stepwise backward/forward (based on AIC improvement). SET11: BMS (maximum of 49 variables at once). #var. is the number of variables used by each method. Bold cells are the best of each row. Rows below the line are statistics of the error term (excluding the smallest and largest 1% of the errors) and only the best (i.e., similarity towards N(0, σ²)) cells are marked.

Table 4.5: Classification ratings of the precision in predicting random studies

Method     SET0   SET1   SET2   SET3   SET4   SET5   SET6   SET7   SET8   SET9   SET10  SET11  N
#var.      0      49     277    254    156    119    80     43     80     269    233    49
Sign       0.622  0.672  0.813  0.780  0.794  0.797  0.812  0.844  0.819  0.766  0.781  0.816  696
Pos. sign  0.270  0.442  0.386  0.490  0.481  0.487  0.464  0.455  0.477  0.518  0.531  0.486  178
Neg. sign  0.740  0.750  0.957  0.879  0.900  0.903  0.929  0.975  0.934  0.850  0.866  0.927  518
20% sign.  0.454  0.529  0.629  0.587  0.626  0.637  0.645  0.654  0.635  0.599  0.606  0.650  390
20% pos.   0.114  0.289  0.237  0.286  0.255  0.267  0.260  0.223  0.281  0.339  0.349  0.233  68
20% neg.   0.525  0.580  0.716  0.651  0.705  0.717  0.728  0.747  0.711  0.656  0.662  0.739  322
5% sign.   0.344  0.457  0.414  0.483  0.502  0.501  0.424  0.492  0.516  0.521  0.515  0.464  302
5% pos.    0.066  0.281  0.173  0.270  0.218  0.223  0.211  0.182  0.275  0.333  0.242  0.166  43
5% neg.    0.392  0.486  0.457  0.522  0.553  0.552  0.463  0.548  0.561  0.554  0.562  0.513  258

10 runs. Training sets contain random 90% of all studies (all other studies belong to the test sets). SET0: random guessing. SET1: all variables. SET2: all significant variables (<0.1) from SET1. SET3-6: EBA with criterion A-D. SET7/8: stepwise forward/backward (based on significance). SET9/10: stepwise backward/forward (based on AIC improvement). SET11: BMS (maximum of 49 variables at once). N is the number of observations in each category of the test set. #var. is the number of variables used by each method. Light/dark grey cells: 95%-CI does not contain the best average of the naive approaches (i.e., is better/worse than the best of SET0/1). Bold cells are the best of each row.

Table 4.6: Classification ratings of the precision in fitting random studies

Method     SET0   SET1   SET2   SET3   SET4   SET5   SET6   SET7   SET8   SET9   SET10  SET11  N
#var.      0      49     277    254    156    119    80     43     80     269    233    49
Sign       0.622  0.861  0.823  0.835  0.833  0.832  0.833  0.848  0.845  0.863  0.862  0.832  696
Pos. sign  0.269  0.643  0.388  0.537  0.524  0.517  0.474  0.449  0.535  0.645  0.647  0.503  178
Neg. sign  0.740  0.934  0.969  0.937  0.937  0.938  0.954  0.982  0.949  0.936  0.934  0.942  518
20% sign.  0.454  0.676  0.658  0.667  0.669  0.671  0.661  0.692  0.646  0.683  0.665  0.658  390
20% pos.   0.114  0.322  0.232  0.305  0.289  0.295  0.266  0.217  0.267  0.355  0.330  0.228  68
20% neg.   0.525  0.751  0.751  0.744  0.750  0.751  0.745  0.793  0.727  0.752  0.737  0.750  322
5% sign.   0.343  0.556  0.446  0.562  0.547  0.543  0.450  0.499  0.522  0.567  0.566  0.466  302
5% pos.    0.066  0.327  0.163  0.294  0.239  0.292  0.197  0.165  0.286  0.305  0.315  0.153  43
5% neg.    0.392  0.596  0.497  0.609  0.601  0.589  0.495  0.560  0.566  0.612  0.610  0.520  258

10 runs. Training sets contain random 90% of all studies (all other studies belong to the test sets). SET0: random guessing. SET1: all variables. SET2: all significant variables (<0.1) from SET1. SET3-6: EBA with criterion A-D. SET7/8: stepwise forward/backward (based on significance). SET9/10: stepwise backward/forward (based on AIC improvement). SET11: BMS (maximum of 49 variables at once). N is the number of observations in each category of the test set. #var. is the number of variables used by each method. Light/dark grey cells: 95%-CI does not contain the best average of the naive approaches (i.e., is better/worse than the best of SET0/1). Bold cells are the best of each row.

do not belong to that category (given first in the second line) divided by the number of observations estimated to be in that category (given in the second line in parentheses). The number in parentheses in the first column is the actual number of observations in that category. The total miss rate is given in the third line and is calculated as the sum of the not correctly classified (the observation actually belongs to the category but the estimate does not) and falsely classified (the estimate belongs to the category but the actual observation does not) values divided by the total number of actual values belonging to that category (thus 0 indicates a perfect classification, while 2 is the worst case, when every observation is incorrectly classified7).
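The two statistics just defined can be sketched as follows (an illustrative sketch; `in_category` is a hypothetical membership test, e.g. lambda t: t < -1.96 for significantly negative t-values):

```python
def classification_errors(actual, predicted, in_category):
    """Error rate and total miss rate as defined for the classification
    error tables. The error rate divides the falsely classified
    estimates by the number of estimates in the category; the total
    miss rate divides misses of both kinds by the actual category size,
    so 0 is perfect and 2 is the worst case."""
    act = [in_category(a) for a in actual]
    est = [in_category(p) for p in predicted]
    n_actual = sum(act)
    n_estimated = sum(est)
    # estimated to be in the category but actually outside it
    false_in = sum(1 for a, e in zip(act, est) if e and not a)
    # actually in the category but estimated to be outside it
    missed = sum(1 for a, e in zip(act, est) if a and not e)
    error_rate = false_in / n_estimated if n_estimated else float("nan")
    total_miss_rate = (missed + false_in) / n_actual if n_actual else float("nan")
    return error_rate, total_miss_rate
```

Note that the total miss rate can exceed 1 (as several cells in the tables do) because errors of both kinds are counted against the actual category size.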

Looking at table 4.7, the stepwise regressions (SET7 and SET8) appear to perform best, although it should be pointed out that all estimators do much better than the naive approaches. Not surprisingly, the number of positive outcomes is generally underestimated, and the percentage of wrong classifications is better for negative (normalized) t-values than for positive values. It is interesting to note that the number of positive results is underestimated by the stepwise regressions based on significance levels, while it is overestimated by the stepAIC algorithm. For SET7 and SET8 the total miss rates are all below one. Although the EBA results are not as good, they perform fairly well in predicting the sizes of the categories.

Again, the picture changes when all studies are used to construct the estimators (table 4.8). SET1 performs better but is the best in just one category. SET7 and SET8 fall behind SET9 and SET10. All in all, the total miss rates have been reduced by a large margin. Surprisingly, the estimated number of observations in each category has, overall, become worse.

4.2.3 And the Winner. . .

There is no more common error than to assume that, because prolonged and accurate mathematical calculations have been made, the application of the result to some fact of nature is absolutely certain.

Alfred N. Whitehead, Alfred North Whitehead: An Anthology, 1953

. . . depends on the aim of the researcher. Shall the estimator fit (ex post prediction) the existing data as well as possible? Or should the estimator predict (ex ante) unknown data? Is a good fit more or less important than a general classification?

One general conclusion seems to be that selecting fewer variables is better for predicting but worse for fitting the data. Bayesian Model Averaging was restricted to 50 variables and stands out by its high variability: in some runs its predictive/fitting performance is very good and in some cases it is extraordinarily bad. This may come from utilizing too much detailed information from some studies which are rather specialized and not suited to be used for other studies because

7This is the case when every value which is estimated to belong to that category actually does not, and every observation which does not belong to that category is estimated to be part of it.

Table 4.7: Classification ratings of the errors in predicting random studies

Method           SET0      SET1      SET2      SET3      SET4      SET5      SET6      SET7      SET8      SET9      SET10     SET11
#var.            0         49        277       254       156       119       80        43        80        269       233       49
Sign             0.378     0.328     0.187     0.220     0.206     0.203     0.188     0.156     0.181     0.234     0.219     0.184
#false (696)     263(696)  228(696)  130(696)  153(696)  144(696)  141(696)  131(696)  108(696)  126(696)  163(696)  152(696)  128(696)
Total miss rate  0.756     0.655     0.373     0.440     0.413     0.407     0.376     0.311     0.363     0.468     0.437     0.369
Pos. sign        0.741     0.632     0.223     0.420     0.378     0.361     0.298     0.133     0.289     0.461     0.425     0.304
#false (178)     131(177)  154(243)  30(136)   75(179)   67(177)   64(178)   34(114)   10(76)    42(144)   101(219)  85(201)   37(121)
Total miss rate  1.466     1.423     0.785     0.933     0.895     0.875     0.727     0.602     0.756     1.050     0.949     0.722
Neg. sign        0.249     0.200     0.176     0.164     0.163     0.162     0.163     0.158     0.158     0.161     0.154     0.157
#false (518)     129(519)  90(452)   98(559)   84(516)   85(518)   84(517)   95(582)   98(619)   87(552)   77(476)   76(495)   90(574)
Total miss rate  0.510     0.425     0.233     0.284     0.263     0.258     0.254     0.214     0.235     0.298     0.281     0.247
Sig20            0.543     0.512     0.332     0.388     0.348     0.343     0.329     0.317     0.301     0.373     0.356     0.318
#false (390)     239(440)  237(463)  124(375)  144(371)  125(359)  122(356)  125(379)  125(394)  105(348)  143(382)  137(385)  124(389)
Total miss rate  1.158     1.080     0.690     0.782     0.695     0.676     0.675     0.667     0.634     0.766     0.745     0.667
Pos. sig20       0.885     0.813     0.437     0.617     0.609     0.468     0.400     0.450     0.393     0.623     0.582     0.549
#false (68)      66(74)    117(144)  25(57)    44(71)    36(59)    25(54)    10(24)    10(23)    17(44)    61(97)    43(74)    14(25)
Total miss rate  1.853     2.437     1.131     1.356     1.270     1.102     0.881     0.930     0.970     1.553     1.281     0.972
Neg. sig20       0.476     0.417     0.323     0.347     0.318     0.325     0.321     0.305     0.288     0.323     0.313     0.303
#false (322)     174(365)  133(319)  103(318)  104(300)  95(300)   98(303)   114(356)  113(371)  88(304)   92(285)   97(311)   110(363)
Total miss rate  1.015     0.833     0.603     0.672     0.591     0.589     0.626     0.604     0.561     0.630     0.641     0.603
Sig95            0.610     0.579     0.396     0.443     0.380     0.380     0.383     0.350     0.348     0.408     0.391     0.362
#false (302)     194(319)  217(375)  78(198)   113(255)  91(238)   88(232)   80(210)   76(216)   75(217)   103(253)  103(264)  80(221)
Total miss rate  1.299     1.262     0.846     0.892     0.798     0.792     0.842     0.759     0.734     0.821     0.826     0.801
Pos. sig95       0.923     0.841     0.510     0.634     0.726     0.567     0.518     0.484     0.419     0.614     0.616     0.634
#false (43)      39(43)    95(113)   15(30)    32(51)    22(30)    15(26)    9(17)     6(13)     9(21)     34(55)    25(40)    9(14)
Total miss rate  1.850     2.923     1.180     1.484     1.293     1.120     0.993     0.960     0.930     1.454     1.333     1.038
Neg. sig95       0.564     0.501     0.390     0.412     0.355     0.370     0.375     0.341     0.341     0.373     0.360     0.352
#false (258)     156(276)  131(262)  66(168)   84(204)   74(208)   76(206)   72(193)   69(203)   67(196)   74(198)   80(224)   73(207)
Total miss rate  1.211     1.022     0.797     0.803     0.733     0.745     0.817     0.721     0.698     0.732     0.749     0.769

10 runs. Training sets contain random 90% of all studies (all other studies belong to the test sets). SET0: random guessing. SET1: all variables. SET2: all significant variables (<0.1) from SET1. SET3-6: EBA with criterion A-D. SET7/8: stepwise forward/backward (based on significance). SET9/10: stepwise backward/forward (based on AIC improvement). SET11: BMS (maximum of 49 variables at once). #var. is the number of variables used by each method. Light/dark grey cells (first row of each category): 95%-CI does not contain the best average of the naive approaches (i.e., is better/worse than the best of SET0/1). Bold cells are the best of each row. #false: the number of falsely estimated data in the corresponding category; the number in parentheses is the estimated number of observations in that category (the number in the first column is the actual number). The total miss rate is the sum of errors of both kinds divided by the actual number of observations in the corresponding category.