A Small Simulation Study - Binomial Model

3.6 A Small Simulation Study

To illustrate the use of our R package ROptEst (cf. Appendix D.3) and the need for robust estimation in the binomial model, we conclude this chapter with a small simulation study.

We consider three different situations. First, the data are generated by the ideal model Binom(25, 0.25). Second, we replace 2% of the data by Binom(25, 0.75); i.e., we consider the (realistic) gross error model

    0.98 Binom(25, 0.25) + 0.02 Binom(25, 0.75)    (3.6.1)

Hence, we could say that in 2% of the considered cases the failures were noted instead of the successes, a rather realistic situation; confer Subsection 1.2c of Hampel et al. (1986) for the frequency of gross errors in real data sets. In the third situation, we replace the same 2% of the data by I_{25}; i.e., we also study the (“extreme”) gross error model

    0.98 Binom(25, 0.25) + 0.02 I_{25}    (3.6.2)

We generated 1000 samples of size 100 and determined the mean (classically optimal) and the Kolmogorov(–Smirnov) minimum distance (ksMD) estimator. The ksMD estimator then serves as initial estimator for two robust one-step estimators. The first one-step estimator (r = 0.2) is based on the optimally robust IC for radius r = 0.2; i.e., the amount of contamination is known (r/√100 = 0.02). The second one-step estimator (r = r0) was calculated using the radius–minimax IC for the least favorable radius r0; i.e., the radius is completely unknown. For a boxplot of the results see Figure 3.24.
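The three data-generating situations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ROptEst-based code used for the study: it assumes that "replacing 2% of the data" means that each observation is contaminated independently with probability 0.02 (the mixture reading of (3.6.1) and (3.6.2)), and the names `rbinom`, `draw_sample`, `mean_estimator` and `simulate_means` are hypothetical helpers introduced here. Only the mean is computed; the robust estimators are not reproduced.

```python
import random

M, THETA = 25, 0.25   # size and success probability of the ideal model
N = 100               # sample size
EPS = 0.02            # contamination fraction r / sqrt(n) = 0.2 / 10


def rbinom(rng, size, prob):
    """Draw one Binom(size, prob) variate as a sum of Bernoulli trials."""
    return sum(rng.random() < prob for _ in range(size))


def draw_sample(rng, situation):
    """Draw one sample of size N; situation is 'ideal', 'realistic' or 'extreme'."""
    sample = []
    for _ in range(N):
        # Assumption: each observation is contaminated independently.
        contaminated = situation != "ideal" and rng.random() < EPS
        if not contaminated:
            sample.append(rbinom(rng, M, THETA))
        elif situation == "realistic":
            sample.append(rbinom(rng, M, 1 - THETA))  # Binom(25, 0.75), cf. (3.6.1)
        else:
            sample.append(M)                          # point mass at 25, cf. (3.6.2)
    return sample


def mean_estimator(sample):
    """Estimate theta by the sample mean divided by the binomial size M."""
    return sum(sample) / (len(sample) * M)


def simulate_means(situation, runs=1000, seed=1):
    """Return the mean estimates for `runs` independent samples."""
    rng = random.Random(seed)
    return [mean_estimator(draw_sample(rng, situation)) for _ in range(runs)]


if __name__ == "__main__":
    for situation in ("ideal", "realistic", "extreme"):
        estimates = simulate_means(situation)
        print(situation, round(sum(estimates) / len(estimates), 4))
```

Under the mixture assumption, the average of the mean estimates should lie near 0.25, 0.26 and 0.265 in the three situations, which shows the bias that the contamination induces in the mean.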

Remark 3.6.1 If we only admit contaminating distributions which are also concentrated on {0, 1, ..., m}, any value in ℝ \ {0, 1, ..., m} is identified as an outlier with probability 1 and could therefore be omitted from the sample. Thus, in this situation contaminating with I_{m} if θ < 0.5, respectively I_{0} if θ > 0.5, heuristically seems to have the largest effect on the estimation of θ, at least for the mean. As our results indicate, this is not necessarily true for the ksMD estimator; confer Tables 3.1 and 3.2. In addition, in case of the Poisson model (cf. Section 4.6) the ksMD estimator performs even better in the “extreme” than in the realistic situation; hence, this probably also holds in case of the binomial model for larger values of m. Moreover, the robust estimators which use the ksMD estimator as initial estimator show a similar behavior. ////
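To make the Kolmogorov minimum distance idea concrete, the following sketch minimizes the Kolmogorov distance between the empirical cdf and the Binom(m, θ) cdf over a grid of θ values. It is an assumption-laden illustration, not the estimator implementation used in the study: the grid search, the grid size and the helper names `binom_cdf`, `empirical_cdf` and `ksmd_estimate` are all introduced here for demonstration.

```python
import math
import random


def binom_cdf(m, theta):
    """Return [P(X <= k) for k = 0..m] for X ~ Binom(m, theta)."""
    cdf, total = [], 0.0
    for k in range(m + 1):
        total += math.comb(m, k) * theta ** k * (1 - theta) ** (m - k)
        cdf.append(total)
    return cdf


def empirical_cdf(sample, m):
    """Return the empirical cdf of an integer sample at k = 0..m."""
    counts = [0] * (m + 1)
    for x in sample:
        counts[x] += 1
    cdf, total = [], 0
    for c in counts:
        total += c
        cdf.append(total / len(sample))
    return cdf


def ksmd_estimate(sample, m, grid_size=1000):
    """Grid-search minimizer of the Kolmogorov distance over theta in (0, 1)."""
    ecdf = empirical_cdf(sample, m)

    def distance(theta):
        # Both cdfs are step functions that jump only at 0..m, so the
        # supremum over the real line is attained at one of these points.
        return max(abs(e - f) for e, f in zip(ecdf, binom_cdf(m, theta)))

    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    return min(grid, key=distance)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [sum(rng.random() < 0.25 for _ in range(25)) for _ in range(100)]
    sample[:2] = [25, 25]  # two gross errors, as in the "extreme" situation
    print("mean:", sum(sample) / (100 * 25), "ksMD:", ksmd_estimate(sample, 25))
```

Two observations moved to m = 25 change the empirical cdf by at most 0.02 at any point, which is one way to see why a cdf-based estimator reacts less strongly to such gross errors than the mean does.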

As we are estimating a parameter with a very limited range (θ ∈ (0, 1)), the empirical and the asymptotic MSEs of the considered estimators do not differ much in absolute value in any of the three situations.

The results are given in Table 3.1, where we also provide 95% confidence intervals based on the central limit theorem.
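A CLT-based interval of this kind can be sketched as follows. The sketch assumes that the tabulated MSEs are standardized by the sample size n, i.e. that they are averages of n(θ̂ − θ)² over the simulation runs; this scaling is not stated in the text but is suggested by the asymptotic value 0.0075 = 0.25 · 0.75 / 25 for the mean in the ideal model. The function names are hypothetical, and the half-widths obtained here are not meant to reproduce those of Table 3.1.

```python
import math
import random


def standardized_mse(estimates, theta, n, z=1.96):
    """Return (n * empirical MSE, half-width of the CLT-based interval)."""
    # Assumption: the MSE is standardized by n, see the lead-in.
    sq_err = [n * (est - theta) ** 2 for est in estimates]
    runs = len(sq_err)
    mse = sum(sq_err) / runs
    var = sum((e - mse) ** 2 for e in sq_err) / (runs - 1)
    return mse, z * math.sqrt(var / runs)


def ideal_mean_estimates(runs, n=100, m=25, theta=0.25, seed=1):
    """Mean estimates of theta for `runs` samples from the ideal model."""
    rng = random.Random(seed)
    trials = n * m  # a sample of n Binom(m, theta) values uses n * m Bernoulli trials
    return [sum(rng.random() < theta for _ in range(trials)) / trials
            for _ in range(runs)]


if __name__ == "__main__":
    mse, half = standardized_mse(ideal_mean_estimates(1000), theta=0.25, n=100)
    print(f"n * MSE = {mse:.4f} +/- {half:.4f}")
    print("theta * (1 - theta) / m =", 0.25 * 0.75 / 25)
```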

However, with respect to MSE–inefficiency we see clear differences between the considered estimators. In the ideal case, the subefficiency of the ksMD and the first one-step estimator (r = 0.2) with respect to the mean is clearly below 10%, and the second one-step estimator (r = r0) loses only about 20% efficiency; confer Table 3.2. Somewhat surprisingly, the first one-step estimator (r = 0.2) performs even better than the ksMD estimator although we are in the ideal model.

In the contaminated samples the ksMD estimator (≈ 18%) and, even more so, the radius–minimax estimator (≈ 5%) do not lose much efficiency. The mean, however, is strongly affected by the contamination in any case and already in the rather harmless realistic situation has a subefficiency of about 140%. This efficiency loss increases up to about 340% (!) in the “extreme” case. Thus, even if the amount of contamination is small and the contaminating distribution is rather harmless, the mean cannot be regarded as an appropriate estimator. In contrast, the robust estimators do not lose much efficiency in the ideal case, and the computational effort is not much larger than for the mean. Therefore, we recommend using the robust estimators in any case.

[Figure: boxplots of the estimators mean, ksMD, r = 0.2 and r = r0, shown in three panels.]

Figure 3.24: Boxplot for a small simulation study in case of contamination neighborhoods (∗ = c).

situation     mean               ksMD               r = r0             r = 0.2
emp. ideal    0.0078 ± 0.0021    0.0085 ± 0.0023    0.0093 ± 0.0024    0.0082 ± 0.0021
as. ideal     0.0075             -                  0.0087             0.0078
realistic     0.0230 ± 0.0061    0.0114 ± 0.0032    0.0102 ± 0.0026    0.0097 ± 0.0025
extreme       0.0426 ± 0.0111    0.0114 ± 0.0032    0.0102 ± 0.0026    0.0097 ± 0.0025
asympt.       0.0300             -                  0.0093             0.0087

Table 3.1: Empirical and asymptotic MSEs for a small simulation study in case of contamination neighborhoods (∗ = c).

situation     mean     ksMD     r = r0   r = 0.2
emp. ideal    1.000    1.090    1.192    1.051
as. ideal     1.000    -        1.160    1.040
realistic     2.371    1.175    1.052    1.000
extreme       4.392    1.175    1.052    1.000
asymptotic    3.448    -        1.069    1.000

Table 3.2: MSE–inefficiencies for a small simulation study in case of contamination neighborhoods (∗ = c).
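The inefficiencies quoted above can be recomputed from the rounded MSEs of Table 3.1. The sketch below assumes that the MSE–inefficiency of an estimator is its MSE divided by the smallest MSE in the same row; this reading is not spelled out in the text, but it reproduces the entries of Table 3.2. The dictionary layout and the function name `inefficiencies` are illustrative choices.

```python
# Rounded MSEs from Table 3.1; None marks the entries not given for ksMD.
TABLE_3_1 = {
    "emp. ideal": {"mean": 0.0078, "ksMD": 0.0085, "r=r0": 0.0093, "r=0.2": 0.0082},
    "as. ideal":  {"mean": 0.0075, "ksMD": None,   "r=r0": 0.0087, "r=0.2": 0.0078},
    "realistic":  {"mean": 0.0230, "ksMD": 0.0114, "r=r0": 0.0102, "r=0.2": 0.0097},
    "extreme":    {"mean": 0.0426, "ksMD": 0.0114, "r=r0": 0.0102, "r=0.2": 0.0097},
    "asymptotic": {"mean": 0.0300, "ksMD": None,   "r=r0": 0.0093, "r=0.2": 0.0087},
}


def inefficiencies(row):
    """Divide every available MSE in a row by the smallest MSE of that row."""
    best = min(value for value in row.values() if value is not None)
    return {name: None if value is None else value / best
            for name, value in row.items()}


if __name__ == "__main__":
    for situation, row in TABLE_3_1.items():
        ratios = inefficiencies(row)
        shown = {name: None if r is None else round(r, 3) for name, r in ratios.items()}
        print(situation, shown)
```

For example, the mean in the realistic situation gives 0.0230 / 0.0097 ≈ 2.371, i.e. a subefficiency of roughly 140%, and in the “extreme” situation 0.0426 / 0.0097 ≈ 4.392, i.e. roughly 340%.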

Furthermore, we see that the empirical MSE and the empirical MSE–inefficiency are already in good agreement with the corresponding asymptotic values, at least in case of the robust estimators. However, the empirical values, and probably also the exact finite-sample values, appear to be somewhat larger than the asymptotic values, especially in case of the mean. This points in the same direction as the results which we obtain in Part V of this thesis and in Ruckdeschel and Kohl (2005), as well as the results of the higher order studies of Ruckdeschel (2004a), Ruckdeschel (2004b), Ruckdeschel (2004c) and Ruckdeschel (2005e).

Remark 3.6.2 (a) We do not give the asymptotic values for the ksMD estimator as we are not sure about its asymptotic distribution in this setup.

(b) In view of this small simulation study, it might be of interest to take a closer look at the finite-sample and asymptotic behavior of the Kolmogorov(–Smirnov) minimum distance estimator. In addition, one could investigate how much a one-step estimator is influenced by the initial estimator and whether it is worthwhile to use two- or even k-step estimators (k > 2). The higher order studies for one-step estimators in Ruckdeschel (2005e) indeed confirm that one can gain some efficiency by using k-step estimators.

(c) Of course, a similar simulation study can be made in case of total variation neighborhoods (cf. also Section 4.6), where the “extreme” situation is of the form

    k ↦ (Binom(m, θ)({k}) − r/√n) ∨ 0 + (r/√n) I_{m}(k)    if θ < 0.5    (3.6.3)

respectively

    k ↦ (Binom(m, θ)({k}) + r/√n) ∧ 1 + (r/√n) I_{0}(k)    if θ > 0.5    (3.6.4)

similar to Rieder (1994), p. 175. In particular, a larger simulation study may be of interest to compare empirical, respectively finite-sample, and asymptotic results for different (small and medium) sample sizes and to investigate whether there is a difference concerning the speed of convergence towards the asymptotic values between contamination and total variation neighborhoods as encountered in Part V. ////

Chapter 4

From the document: Numerical Contributions to the Asymptotic Theory of Robustness (pages 175-179)