
The Need for and Justification of Robust Statistics

For detailed answers to the question “Why Robust Procedures?”, or “Why Robust Statistics?”, we refer to Section 1.1 of Huber (1981) and Chapter 1 of Huber (1997), respectively Section 1.2 of Hampel et al. (1986).

In addition, Marazzi (1993), in his introduction, gives a nice motivation for robust methods based on linear regression and covariance matrices. We instead use the even simpler one-dimensional normal location model; i.e., P_θ = N(θ, σ²) where σ = 1 is known. Although this is probably the best-known model in robust statistics, some new aspects (finite-sample results, higher order asymptotics) and ideas will be presented.

In our approach, the setup of infinitesimal neighborhoods, the aim of robustness is to safeguard against deviations from the assumptions which are below or near the limits of detectability; confer also p. 61 of Huber (1997). The purpose of this introduction is to demonstrate, in a quantitative manner, that such small deviations may have nontrivial effects on statistical procedures, while they cannot be reliably detected by goodness-of-fit tests; confer Remark 4.2.7 of Rieder (1994). On the other hand, robust procedures are very stable and lose only little efficiency in the ideal model.

Gross Error Model

As noted in Subsection 1.2c of Hampel et al. (1986), 1–10% “wrong values” (gross errors, outliers) are typical in routine data. Such real data sets can be modeled by the well-known gross error model (convex contamination)

    Q = (1 − ε) P_θ + ε H

where H is some arbitrary probability measure and ε ∈ [0,1] is the amount of gross errors (contamination); confer Tukey (1960).
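To make the model concrete, here is a minimal R sketch of drawing one sample from Q; the values θ = 0, ε = 0.05 and the contaminating distribution H = N(5, 1) are illustrative assumptions of ours, since the model leaves H arbitrary.

    ## Draw a sample of size n from Q = (1 - eps) * N(theta, 1) + eps * H.
    ## theta, eps and H (here N(5, 1)) are illustrative choices, not fixed by the model.
    rGrossError <- function(n, theta = 0, eps = 0.05,
                            rH = function(m) rnorm(m, mean = 5)) {
      U <- rbinom(n, size = 1, prob = eps)       # U_i = 1 marks a gross error
      ifelse(U == 1, rH(n), rnorm(n, mean = theta))
    }

    set.seed(1)
    x <- rGrossError(16)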

Infinitesimal Neighborhoods

In our asymptotic setup, which is based on neighborhoods that are shrinking at the rate 1/√n, we have to identify ε with r/√n, where r ∈ [0,∞]. A motivation for this shrinkage in terms of the outlier probability is given in Ruckdeschel (2005a).

Moreover, in the finite-sample setup we use a modification of this model. That is, for sample size n ∈ ℕ and random variables U_1, ..., U_n i.i.d. ∼ Binom(1, r/√n) we instead work with the following conditional probabilities

    Q_n(r) = L{ [(1 − U_i)X_i + U_i Y_i]_{i=1,...,n} | ∑ U_i < n/2 }

where X_1, ..., X_n i.i.d. ∼ P_θ, (Y_1, ..., Y_n) ∼ H_n ∈ M_1(Bⁿ), and all random variables are stochastically independent. This modification is motivated by the observation that no meaningful estimator can draw useful information out of a sample where ∑ U_i ≥ n/2; this is a phenomenon similar to the breakdown point 0.5. An application of Theorem 2 in Hoeffding (1963) shows that

    P( ∑ U_i ≥ n/2 ) ≤ exp( −2n (0.5 − r/√n)² )

i.e., this probability decays exponentially fast. Hence, the above modification is asymptotically negligible; i.e., all results on weak convergence over infinitesimal neighborhoods remain unchanged. For more details we refer to Sections 2.2–2.4 of Ruckdeschel (2004c).
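For n = 16 and r = 0.2 (i.e., r/√n = 0.05), both the exact binomial tail probability and the Hoeffding bound can be evaluated with a few lines of R; this is a small numerical check of ours, not part of the cited references.

    n <- 16; r <- 0.2
    p <- r / sqrt(n)                                  # contamination probability 0.05
    exact <- pbinom(ceiling(n/2) - 1, n, p,
                    lower.tail = FALSE)               # P(sum U_i >= n/2)
    bound <- exp(-2 * n * (0.5 - p)^2)                # Hoeffding (1963), Theorem 2
    c(exact = exact, hoeffding = bound)               # approx. 3.5e-07 and 1.5e-03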

Remark It is a result of Ruckdeschel (2004b) that, with this modification of the 1/√n neighborhoods, the maximum mean square error (MSE) of asymptotically linear estimators with bounded (!) influence curves converges even without clipping of the square loss function. As for the artificial clipping of unbounded loss functions, confer Le Cam (1986), Rieder (1994), Bickel et al. (1998) or van der Vaart (1998).

////

Estimators

As estimators we choose the mean, the median and robust estimators with influence curves (cf. Definition 1.1.1) of Hampel-type form⁵

    η(x) = A [−c ∨ x ∧ c]   with   A = [2Φ(c) − 1]⁻¹

where c ∈ (0,∞) is a suitable clipping bound and Φ is the cumulative distribution function of N(0,1).

⁵ In allusion to the solution derived in Lemma 5 of Hampel (1968).
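In R, this influence curve is a one-liner; the following sketch (our own, using only base R) is reused further below.

    ## Hampel-type influence curve eta(x) = A * max(-c, min(x, c))
    ## with standardizing constant A = [2 * Phi(c) - 1]^{-1}.
    hampelIC <- function(x, c) {
      A <- 1 / (2 * pnorm(c) - 1)
      A * pmax(-c, pmin(x, c))
    }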


Optimality Properties In the case of normal location, extending the list on p. 285 of Huber (1981), estimators with influence curves of Hampel-type form are optimal in various respects. They minimize:

(1) The maximum asymptotic variance for symmetric convex contaminations; confer Huber (1964).

(2) The maximum asymptotic variance subject to a bias bound for infinitesimal contamination and total variation neighborhoods; confer Lemma 5 of Hampel (1968), Section 2.5 of Hampel et al. (1986), Section 5.5 of Rieder (1994) and Subsection 1.3.3.

(3) The maximum finite-sample under-/overshoot probability for contamination/total variation neighborhoods; confer Huber (1968), Rieder (1989) and Part V.

(4) The maximum asymptotic under-/overshoot probability for infinitesimal contamination/total variation neighborhoods; confer Rieder (1980) and Part V.

(5) The maximum asymptotic mean square error (MSE) for infinitesimal contamination and total variation neighborhoods; confer Section 5.5 of Rieder (1994) and Subsection 1.3.4.

(6) More generally: the maximum asymptotic risk for infinitesimal contamination and total variation neighborhoods, where the risk may be any convex and isotone function of asymptotic variance and bias; confer Ruckdeschel and Rieder (2004).

(7) The second order expansion of the maximum asymptotic MSE for infinitesimal contamination neighborhoods; confer Ruckdeschel (2004b). ////

For the purpose of this introduction, we put n = 16 and radius r = 0.2 (i.e., 5% gross errors) and choose quadratic loss (i.e., MSE). We consider the asymptotically optimal-robust estimator for r = 0.2 (i.e., c = 1.492) as well as the radius–minimax estimators for r ∈ [0.1, 0.4] (i.e., radius known up to a factor of 2, ε ∈ [0.025, 0.1]), r ∈ [0, 2.0] (i.e., ε ∈ [0, 0.5]) and r ∈ [0,∞]. The corresponding asymptotically optimal clipping bounds are c = 1.356, c = 0.824 and c = 0.718, respectively. For the definition of the radius–minimax estimator we refer to Section 2.2.
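For fixed radius r the optimal clipping bound can be computed numerically. The sketch below assumes the first-order condition r²c = E(|x| − c)₊ = 2[φ(c) − cΦ(−c)] for the normal location MSE problem (our transcription of the optimality results cited in (5) above) and reproduces c = 1.492 for r = 0.2.

    ## Solve r^2 * c = 2 * (dnorm(c) - c * pnorm(-c)) for the clipping bound c(r).
    optClip <- function(r) {
      uniroot(function(c) r^2 * c - 2 * (dnorm(c) - c * pnorm(-c)),
              interval = c(1e-4, 10))$root
    }
    optClip(0.2)   # approx. 1.492, as stated above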

Choice of Clipping Bounds

(1) We use the asymptotically optimal estimators since we want to demonstrate that these estimators work well down to small sample sizes. But the clipping bound could also be chosen optimally with respect to the finite-sample maximum MSE, or with respect to the second or third order expansion of the asymptotic MSE. As numerical results in Ruckdeschel (2004b) show, the differences between these various choices are small.

(2) There are also only small efficiency losses when we use different asymptotic risks to determine the optimal clipping bound c; confer Section 7.2 of Ruckdeschel and Rieder (2004).

(3) Under an additional homogeneity condition on the loss function, which for instance holds for all L_q risks with q ≥ 1, the radius–minimax estimator for r ∈ [0,∞] is independent of the chosen loss function; confer Sections 6 and 7.3 of Ruckdeschel and Rieder (2004). This is in fact the reason why we included this estimator. Actually, ε ∈ [0,1] entails r ∈ [0,4] at sample size n = 16; i.e., radii r > 4 are not admitted. ////


Finite-Sample Maximum MSE

Our finite-sample investigation proves and makes precise what has been asserted in robust statistics on asymptotic grounds all along: In the ideal situation (i.e., r = 0), suitably chosen asymptotically optimal-robust estimators have a slightly larger finite-sample maximum MSE than the mean. However, they do not lose much efficiency and perform clearly better than the median in the ideal model.

In contrast, for r > 0 the finite-sample maximum MSE of the mean is unbounded, whereas robust estimators have a bounded finite-sample maximum MSE. That is, already small deviations from the ideal model may lead to very large errors in the case of the mean. In particular, the asymptotically optimal-robust estimators again perform better than the median. These are common statements; confer for instance Sections 1.1 and 1.2 of Huber (1981) or Sections 1.1 and 1.2 of Hampel et al. (1986).

The (numerically) exact finite-sample distribution and the corresponding finite-sample maximum risk for robust estimators with Hampel-type influence curves, which are constructed by means of the M principle, can be computed via algorithms developed in Subsection 11.3.2 and Ruckdeschel and Kohl (2005). These procedures use the fast Fourier transform (FFT) in a crucial way. Table 4 contains the finite-sample maximum MSE for n = 16 and r = 0, 0.2. In these situations the median shows an efficiency loss larger than 22% (r = 0) and 16% (r = 0.2), respectively.

    r     mean    r = 0.2   r ∈ [0.1,0.4]   r ∈ [0,2.0]   r ∈ [0,∞]   median
    0     1.000   1.035     1.049           1.145         1.176       1.446
    0.2   ∞       1.450     1.431           1.443         1.465       1.713

Table 4: Finite-sample maximum MSE for normal location and sample size n = 16.

Finite-Sample versus Asymptotic Optimal Clipping Bounds In comparison with the asymptotically optimal clipping bound, our investigation shows that the clipping bound which is optimal in the finite-sample sense is in general smaller, that is, more conservative. This follows from higher order asymptotics and numerical evaluations; confer Part V and Ruckdeschel (2004b). This fact is also reflected by the results contained in Table 4, where the asymptotically optimal-robust estimator (c = 1.492) has a larger finite-sample maximum MSE than the asymptotic radius–minimax estimators for r ∈ [0.1,0.4] (c = 1.356) and r ∈ [0,2.0] (c = 0.824). In fact, the numerically determined finite-sample optimal clipping bound is c = 1.130 and leads to a finite-sample minimax MSE of 1.418. ////

Cniper Contamination

This notion combines nice and pernicious: such contamination threatens the accuracy of estimators in an unexpected and dangerous way, just as a sniper does; confer Section 5 of Ruckdeschel (2004a). Now, we do not admit arbitrary H_n ∈ M_1(Bⁿ) but only contaminations by Dirac measures at a ∈ ℝ; i.e.,

    Q_n(r, a) = [ (1 − r/√n) P_θ + (r/√n) I_{a} ]^{⊗n}

We determine the minimal gross error point a such that, under Q_n(r, a), a given robust estimator beats the classically optimal estimator (in this case: the mean X̄_n); i.e.,

    a = sup{ z > 0 : MSE_{Q_n(r,z)}(X̄_n) ≤ MSE_{Q_n(r,z)}(S_{n,c}) }

where S_{n,c} is a robust estimator with influence curve of Hampel-type form for some given clipping bound c ∈ (0,∞). As a consequence, the robust estimator S_{n,c} has a smaller MSE for any contaminating distribution H with support [a,∞) (or (−∞,−a]); confer Proposition 5.1 of Ruckdeschel (2004a). Under Q_n(r, a) he obtains

    n MSE_{Q_n(r,a)}(X̄_n) = (1 − r/√n) + a² (r² + r/√n − r²/n)

confer Section 5.3 (ibid.). Hence, for M_c := n maxMSE(S_{n,c}) we get

    a = √[ (M_c − (1 − r/√n)) / (r² + r/√n − r²/n) ]

For our robust estimators given in Table 4 this leads to a = 2.391 (r = 0.2), a = 2.345 (r ∈ [0.1, 0.4]), a = 2.374 (r ∈ [0, 2.0]) and a = 2.427 (r ∈ [0,∞]), respectively. These small contaminations lie well within 2.5 standard deviations from zero. Note that under cniper contamination we even encounter fewer outliers, if outliers under the standard normal are defined as observations with absolute value larger than 2.5; more precisely, this identifies the largest 1.24% as outliers in the ideal model, whereas under cniper contamination we obtain (1 − r/√n) · 1.24% ≈ 1.18%. Thus, this situation, which destroys the superiority of the mean, is surely innocent.
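Plugging the maximum MSEs M_c from Table 4 into the formula for a reproduces these values; a quick R check of ours:

    n <- 16; r <- 0.2; p <- r / sqrt(n)
    cniperPoint <- function(Mc) sqrt((Mc - (1 - p)) / (r^2 + p - r^2 / n))
    round(cniperPoint(c(1.450, 1.431, 1.443, 1.465)), 3)
    ## 2.390 2.345 2.374 2.426 -- matching the stated values up to rounding of M_c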

A Small Simulation Study

Next, we present the results of a small simulation study, in the submodel introduced above, of the type II errors of goodness-of-fit tests and of the MSE of location estimators. We computed M = 1e05 = 10⁵ samples of size n = 16 with radius r = 0.2 (i.e., ε = 0.05 and P(∑U_i ≥ 8) = 3.50e−07). In view of the above results, a = 2.45 (Φ(−2.45) ≈ 0.71%) should be sufficient for our robust estimators to outperform the mean. To avoid ties, we used H = Unif([2.45, 2.46]) instead of H = I_{2.45}.
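A minimal sketch of the sample generation (our reconstruction of the setup just described; the conditioning on ∑U_i < n/2 is implemented by rejection):

    n <- 16; r <- 0.2; p <- r / sqrt(n)   # p = 0.05
    rCniperSample <- function() {
      repeat {                            # enforce sum(U_i) < n/2 by redrawing
        U <- rbinom(n, 1, p)
        if (sum(U) < n / 2) break
      }
      X <- rnorm(n)                       # ideal part, P_theta with theta = 0
      Y <- runif(n, 2.45, 2.46)           # H = Unif([2.45, 2.46]), avoiding ties
      (1 - U) * X + U * Y
    }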

First, we tried some diagnostics. That is, we computed well-known tests for normality using the R package fBasics; confer Wuertz et al. (2005). Table 5 contains the empirical type II errors (the null hypothesis is not rejected although it is false) of the considered tests using a significance level of 5%. As we see, the results for the chosen tests are very similar and indicate that the power (the ability to reject the null hypothesis when it is actually false) of goodness-of-fit tests is very small in case of such innocent contaminations. Thus, estimators should also be evaluated and compared under such innocent deviations from the normal.

    Test for Normality                 Type II Error
    Anderson-Darling                   93.3%
    Cramér-von Mises                   93.7%
    Kolmogorov-Smirnov (Lilliefors)    94.2%
    Shapiro-Wilk                       93.4%

Table 5: Empirical type II error of tests for normality under cniper contamination.
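These numbers can be reproduced approximately with base R alone; e.g., for the Shapiro-Wilk test (shapiro.test from package stats), reusing rCniperSample from the sketch above and fewer replications than in the study:

    set.seed(42)
    pvals <- replicate(1e4, shapiro.test(rCniperSample())$p.value)
    mean(pvals >= 0.05)   # empirical type II error, roughly 93% (cf. Table 5)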

Remark These empirical results suggest changing the null hypothesis from exact normality to approximate normality. This is in the spirit of Section 3 of Rieder (1981b), where he extends the null hypothesis of exact symmetry to approximate symmetry and derives a nonparametric asymptotic maximin test. The corresponding modification of goodness-of-fit tests seems open. ////

Second, we computed the empirical MSE, based on 1e05 samples of size 16, and the corresponding 95% confidence intervals (based on the central limit theorem) for the mean, the median and our robust estimators; confer Table 6.

    Estimator                              n × Emp. MSE   95% conf. interval
    mean                                   1.480          [1.467, 1.493]
    r = 0.2:        M principle            1.445          [1.431, 1.458]
                    one-step construction  1.434          [1.420, 1.447]
    r ∈ [0.1,0.4]:  M principle            1.428          [1.414, 1.441]
                    one-step construction  1.423          [1.410, 1.436]
    r ∈ [0, 2.0]:   M principle            1.441          [1.428, 1.454]
                    one-step construction  1.448          [1.435, 1.461]
    r ∈ [0, ∞]:     M principle            1.462          [1.449, 1.476]
                    one-step construction  1.468          [1.455, 1.481]
    median                                 1.712          [1.696, 1.727]

Table 6: Empirical MSE for normal location, sample size n = 16 and radius r = 0.2 under cniper contamination.

This study is similar to the one presented in Section 5 of Ruckdeschel (2004b). It is also in the spirit of the Princeton robustness study; confer Andrews et al. (1972). However, we choose particular asymptotically optimal estimators, compare these estimators with respect to their finite-sample MSE, and consider only cniper contamination. Under the chosen cniper contamination the mean has a (numerically) exact finite-sample MSE of 1.477, which lies well within the given empirical confidence interval. In case of our robust estimators, the corresponding estimates are determined as M estimators, respectively as one-step estimators starting with the median. In view of the general construction problem we also included the corresponding one-step estimates. As we see, our robust estimators indeed outperform mean and median, where the results for the M principle and the one-step method are very similar.
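For a given clipping bound c, both constructions are easy to sketch in base R (our transcription; the thesis itself uses the RobASt implementation), reusing hampelIC and rCniperSample from the sketches above:

    ## M estimate: zero of the monotone M equation sum(clip(x - theta)) = 0.
    mEst <- function(x, c) {
      uniroot(function(theta) sum(pmax(-c, pmin(x - theta, c))),
              interval = range(x))$root
    }
    ## One-step estimate: median plus the mean of the influence curve at the median.
    oneStep <- function(x, c) {
      med <- median(x)
      med + mean(hampelIC(x - med, c))
    }
    x <- rCniperSample()
    c(M = mEst(x, c = 1.492), oneStep = oneStep(x, c = 1.492))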

Remark

(1) M principle and one-step construction work equally well down to even smaller sample sizes. It follows from the work of Ruckdeschel (2004b) and Ruckdeschel (2005e) on higher order asymptotics of the MSE of robust estimators with Hampel-type influence curves that, in normal location, the M estimators and the one-step estimators are asymptotically equivalent up to second order.

(2) The median has the property which is required for initial estimators (√n consistency on full 1/√n Kolmogorov neighborhoods). This will be shown in Subsection 2.3.4. For more details on one-step constructions we refer to Section 6.4 of Rieder (1994) and Section 2.3.

(3) In his Theorem 3.4 (b), Ruckdeschel (2004b) shows that contamination to the right of

    a_n := c (1 + A √(2 log(n)/n))   with   A = [2Φ(c) − 1]⁻¹

is essentially sufficient for a robust estimator with influence curve of Hampel-type form for some given clipping bound c ∈ (0,∞) to achieve its maximum asymptotic MSE up to third order. In case of our robust estimators this leads to a_n = 2.508 (r = 0.2), a_n = 2.324 (r ∈ [0.1,0.4]), a_n = 1.645 (r ∈ [0,2.0]) and a_n = 1.520 (r ∈ [0,∞]), respectively (a quick numerical check follows this remark). Hence, it is not surprising that the previous empirical MSEs under cniper contamination (cf. Table 6) are already very close to the finite-sample maximum MSEs evaluated in Table 4. ////
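The threshold a_n is elementary to evaluate; a quick check of ours for the values just given (deviations in the last digit stem from the rounded clipping bounds):

    n <- 16
    aN <- function(c) c * (1 + sqrt(2 * log(n) / n) / (2 * pnorm(c) - 1))
    round(aN(c(1.492, 1.356, 0.824, 0.718)), 3)
    ## 2.508 2.324 1.646 1.520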

Conclusions

Since 1–10% gross errors are reported as typical in routine data, we draw the following conclusions:

(1) Under cniper contamination our asymptotically optimal-robust estimators outperform mean and median.

(2) Such small deviations cannot be reliably detected by goodness-of-fit tests.

(3) Our asymptotically optimal-robust estimators perform well down to small sample sizes; in particular, the radius–minimax estimators for r ∈ [0.1,0.4] (i.e., ε ∈ [0.025,0.1]) and r ∈ [0,2.0] (i.e., ε ∈ [0,0.5]) seem to be good choices for routine data if the neighborhood radius is only roughly known.

(4) M principle and one-step construction work equally well.

Proposal

In case of routine data stemming from an ideal normal model, the previous conclusions suggest proceeding as follows:


Step 1: Depending on the quality of the data, try to find a rough estimate for the amount ε ∈ [0,1] of gross errors such that ε ∈ [ε̲, ε̄].

Step 2: Compute the influence curve of our asymptotically optimal radius–minimax estimator for r ∈ [√n ε̲, √n ε̄] using the S4 generic function radiusMinimaxIC of package ROptEst, which is part of our R bundle RobASt; confer Appendix D.

Step 3: Choose and evaluate an appropriate initial estimate. Possible implemented candidates are the median, the MAD or the Kolmogorov(–Smirnov) MD estimator (cf. the S4 generic function ksEstimator of package ROptEst).

Step 4: Estimate the parameter of interest by means of the one-step construction using the S4 generic function oneStepEstimator of package ROptEst; a schematic sketch of Steps 1–4 follows below. ////
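The following schematic sketch shows how Steps 1–4 might look in R code. The function names are those mentioned above, but the constructor names (NormLocationFamily, ContNeighborhood, asMSE) and the exact argument lists are assumptions based on the RobASt packages and may differ from the versions documented in Appendix D.

    library(ROptEst)                  # part of the R bundle RobASt

    x <- rCniperSample()              # the data; sketch from the simulation above
    n <- length(x)

    ## Step 1: rough bounds on the amount of gross errors, e.g. 2.5% to 10%
    eps.lo <- 0.025; eps.hi <- 0.10

    ## Step 2: radius-minimax IC for r in [sqrt(n) * eps.lo, sqrt(n) * eps.hi]
    IC <- radiusMinimaxIC(L2Fam = NormLocationFamily(),
                          neighbor = ContNeighborhood(), risk = asMSE(),
                          loRad = sqrt(n) * eps.lo, upRad = sqrt(n) * eps.hi)

    ## Steps 3 and 4: initial estimate (here the median) and one-step construction
    est <- oneStepEstimator(x, IC = IC, start = median(x))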

In this thesis we will show that the proposal given above works not only in the case of normal location but also in the case of general smoothly parameterized ideal models like exponential families or linear regression models. In addition, we provide the implementation of these models and of the corresponding optimally robust estimators by means of our R bundle RobASt.