
But what exactly is an influence curve (IC) as mentioned on the Wikipedia page or used in the two examples above? It is part of the so-called "differential approach" going back to Hampel in [Hampel (1968)] and several subsequent surveys by Hampel, Krasker, Ronchetti, Rousseeuw et al. The differential approach is based on three central concepts:

qualitative robustness, the influence function, and the breakdown point. They correspond, in some sense, to continuity, the first derivative of a function, and the nearest singularity. Without dwelling on these concepts in too much detail, we stay with the influence curve (IC) or influence function (IF). Providing the richest quantitative robustness information, it describes the (approximate and standardized) effect of an additional observation at any point x on a statistic T, given a (large) sample with distribution F. Roughly speaking, the influence function is the first derivative of a statistic T at an underlying distribution F, where the point x plays the role of the coordinate in the infinite-dimensional space of probability distributions. By interpreting the derivative as a linearization of the function, we can use a one-step Taylor expansion to replace our statistic T by a linear statistic or functional, respectively. In the infinite-dimensional setup there are different notions of a derivative. The following notion is taken from [Huber (1981)], p. 35.

Definition 2.1. We say that a statistical functional T is Fréchet differentiable at F if it can be approximated by a linear functional L (defined on the space of finite signed measures) such that, for all distributions G,

|T(G) − T(F) − L(G − F)| = o(d(F, G)) (2.2)

with d a metric on the space M of probability measures such that:

(1) d is compatible with the weak topology in the sense that {F | d(G, F) < ε} is open for all ε > 0.

(2) d is compatible with the affine structure of M: let F_t = (1 − t)F_0 + tF_1; then d(F_t, F_s) = O(|t − s|).

Remark 2.2. a) The usual distance functions metrizing the weak topology satisfy both conditions, i.e. the Lévy metric d_λ (confer (3.7)), the Prokhorov metric d_π (confer (3.5)), and the bounded Lipschitz metric d_BL. For a short proof with respect to Condition (2) in the previous definition we refer to [Huber (1981)], p. 35.

b) Another (strong) concept of differentiability is compact differentiability (confer [Rieder (1994)], [Reeds (1976)], [Serfling (1980)] or [Fernholz (1983)]).

The next proposition is taken from [Huber (1981)], p. 37, where it appears as Proposition 5.1 (ibid.).

Proposition 2.3. If T is weakly continuous in a neighborhood of F and Fréchet differentiable at F, then its Fréchet derivative at F is a weakly continuous linear functional, and it is representable as

L(G − F) = ∫ ψ_F dG (2.3)

with ψ_F bounded and continuous, and ∫ ψ_F dF = 0.

Proof. Confer Proposition 5.1 of [Huber (1981)].
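As a small numerical sketch of the representation (2.3) (our own toy example, not from the original text): take the bounded linear functional T(F) = ∫ arctan dF. Then (2.3) holds with ψ_F(x) = arctan(x) − T(F), which is bounded, continuous, and F-centered. Below, F = N(0, 1) and G = Laplace(1) are arbitrary choices, and the integrals are approximated by grid quadrature:

```python
import numpy as np

# Grid quadrature on [-30, 30]; tails of both densities are negligible there.
xs = np.linspace(-30.0, 30.0, 60001)
dx = xs[1] - xs[0]
f = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)   # density of F = N(0, 1)
g = 0.5 * np.exp(-np.abs(xs - 1.0))           # density of G = Laplace(1)

T_F = np.sum(np.arctan(xs) * f) * dx          # T(F) = integral of arctan dF
T_G = np.sum(np.arctan(xs) * g) * dx          # T(G)
psi_F = np.arctan(xs) - T_F                   # bounded, continuous kernel

centering = np.sum(psi_F * f) * dx            # integral psi_F dF, should vanish
remainder = T_G - T_F - np.sum(psi_F * g) * dx  # T(G)-T(F) = L(G-F) = int psi_F dG
print(centering, remainder)                   # both close to 0
```

For a linear functional the representation is exact, so both printed quantities vanish up to quadrature error.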

This function ψ_F will be called the influence function. First, however, we have to weaken our concept of differentiability, because the concept of Fréchet differentiability unfortunately turns out to be too strong or too elaborate, respectively. A way out is offered by the weakest concept of differentiability, the Gâteaux derivative. The following definition is taken from [Huber (1997)], combined with the definition in [Hampel et al. (1986)], p. 83:

Definition 2.4. A functional T is called Gâteaux differentiable at F if there is a function ψ such that for all G ∈ M,

lim_{t→0} [T((1−t)F + tG) − T(F)] / t = ∫ ψ(x) G(dx),

which may also be written as

∂/∂t [T((1−t)F + tG)]_{t=0} = ∫ ψ(x) dG(x). (2.4)

We take the next remarks from [Huber (1981)] and [Hampel et al. (1986)], p. 84:

Remark 2.5. a) Clearly, if T is Fréchet differentiable, it is also Gâteaux differentiable, and the two derivatives agree.

b) The basic idea of differentiation of statistical functionals goes back to von Mises ([von Mises (1937)], [von Mises (1947)]) and Filippova ([Filippova (1961)]); one says that T is a von Mises functional, with first kernel function ψ.

At this point the practical meaning of ψ(x) is not yet evident, as it appears only implicitly in (2.4). Following Hampel, an explicit expression may be obtained by putting G = δ_x in (2.4). We then get Definition 1 of [Hampel et al. (1986)], p. 84, for the influence function ψ(x):

Definition 2.6. The influence function ψ(x) of T at F is given by

ψ(x; T, F) = lim_{t→0} [T((1−t)F + tδ_x) − T(F)] / t (2.5)

in those x where the limit exists.

The heuristically important interpretation of this form was first pointed out by Hampel in [Hampel (1968)]: ψ(x) gives the suitably scaled differential influence of one additional observation with value x, as the sample size n → ∞. Therefore, Hampel called it the influence curve (IC). Huber notes in this context that there are indeed pathological cases where the influence curve exists but the Gâteaux derivative does not; confer [Huber (1997)]. Therefore we use a more elegant definition of the influence function given by [Rieder (1994)].
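Definition 2.6 can be checked numerically for simple functionals. The following sketch is our own illustration (F = N(0, 1), the evaluation point x, and the step t are arbitrary choices): it evaluates the difference quotient in (2.5) for the mean and variance functionals via the closed-form moments of the contaminated mixture (1−t)F + tδ_x, and recovers the well-known influence curves ψ(x) = x − μ and ψ(x) = (x − μ)² − σ².

```python
# Influence curves via the difference quotient in (2.5), for F = N(0, 1).
# Mean and variance are written as functions of the first two moments, so
# that the contaminated mixture (1 - t) F + t delta_x is handled exactly.

def mean_T(m1, m2):
    return m1

def var_T(m1, m2):
    return m2 - m1 ** 2

def mix_moments(t, x, f_m1=0.0, f_m2=1.0):
    # first two moments of (1 - t) F + t delta_x, with F = N(0, 1)
    return ((1 - t) * f_m1 + t * x, (1 - t) * f_m2 + t * x ** 2)

def ic(T, x, t=1e-6):
    # difference quotient of (2.5): (T((1-t)F + t delta_x) - T(F)) / t
    return (T(*mix_moments(t, x)) - T(*mix_moments(0.0, x))) / t

x = 2.0
print(ic(mean_T, x))  # -> 2.0, i.e. psi(x) = x
print(ic(var_T, x))   # -> about 3.0, i.e. psi(x) = x**2 - 1
```

For the mean the quotient is exactly x for every t; for the variance it is x² − 1 − tx², which converges to x² − 1 as t → 0.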

To be able to give the definition we need the concept of L2 differentiability. To avoid domination assumptions in the definition of L2 differentiability, we employ the following square root calculus that was introduced by Le Cam. The following definition is taken from [Rieder (1994)]; for more details confer Subsection 2.3.1 of [Rieder (1994)].

Definition 2.7. For any measurable space (Ω, A) and k ∈ N we define the following real Hilbert space, which includes the ordinary L_2^k(P):

L_2^k(A) = { ξ √dP | ξ ∈ L_2^k(P), P ∈ M_b(A) } (2.6)

On this space, an equivalence relation is given by identifying ξ √dP and η √dQ whenever ξ √(dP/dμ) = η √(dQ/dμ) μ-a.e. for some measure μ dominating P and Q; linear combinations α ξ √dP + β η √dQ with real coefficients and the scalar product are likewise defined via such a dominating measure μ, and do not depend on its choice; confer Subsection 2.3.1 of [Rieder (1994)].

Then, for fixed θ ∈ Θ, we define L2 differentiability of the family P at θ using the square root calculus; confer Definition 2.3.6 of [Rieder (1994)].

Definition 2.8. Model P is called L2 differentiable at θ if there exists some function Λ_θ ∈ L_2^k(P_θ) such that, as t → 0,

‖ √(dP_{θ+t}) − √(dP_θ) (1 + ½ t^τ Λ_θ) ‖ = o(|t|)

The function Λ_θ is called the L2 derivative, and the k×k matrix I_θ = E_θ Λ_θ Λ_θ^τ the Fisher information of P at θ.
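For illustration (our own numerical sketch, not part of the original text), consider the Gaussian location model P_θ = N(θ, 1), where the L2 derivative is Λ_θ(x) = x − θ and the Fisher information is I_θ = 1. The quadrature below checks that the L2 remainder of the square-root expansion is indeed o(|t|):

```python
import numpy as np

# Gaussian location model: dP_theta has density phi(x - theta),
# L2 derivative Lambda_theta(x) = x - theta.
xs = np.linspace(-12.0, 12.0, 20001)
dx = xs[1] - xs[0]

def phi(x):
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

def remainder(t, theta=0.0):
    # L2 norm of sqrt(dP_{theta+t}) - sqrt(dP_theta) * (1 + 0.5 * t * Lambda)
    lam = xs - theta
    r = np.sqrt(phi(xs - theta - t)) - np.sqrt(phi(xs - theta)) * (1 + 0.5 * t * lam)
    return np.sqrt(np.sum(r ** 2) * dx)

for t in (0.1, 0.01, 0.001):
    print(t, remainder(t) / t)  # the ratio shrinks with t: remainder is o(|t|)
```

The ratio remainder(t)/t decreases roughly linearly in t, consistent with a remainder of order t².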

The following definition corresponds to Definition 4.2.10 of [Rieder (1994)].

Definition 2.9. Suppose P is L2 differentiable at θ, and assume some matrix D ∈ R^{p×k} of full rank p ≤ k. Let α = 2, ∞, respectively.

(a) Then the set Ψ_2(θ) of all square integrable and the subset Ψ_∞(θ) of all bounded influence curves at P_θ, respectively, are

Ψ_α(θ) = { ψ_θ ∈ L_2^k(P_θ) | E_θ ψ_θ = 0, E_θ ψ_θ Λ_θ^τ = I_k } (2.12)

(b) The set Ψ_2^D(θ) of all square integrable and the subset Ψ_∞^D(θ) of all bounded partial influence curves at P_θ, respectively, are

Ψ_α^D(θ) = { ψ_θ ∈ L_2^p(P_θ) | E_θ ψ_θ = 0, E_θ ψ_θ Λ_θ^τ = D } (2.13)

where for α = ∞ the ψ_θ are in addition required to be bounded. For the sake of completeness we add Remark 4.2.11 of [Rieder (1994)], parts (a) to (c):

Remark 2.10. (a) The attribute square integrable will usually be omitted.

(b) The classical scores and the classical partial scores,

ψ_{h,θ} = I_θ^{−1} Λ_θ ∈ Ψ_2(θ) (2.14)

η_{h,θ} = D ψ_{h,θ} = D I_θ^{−1} Λ_θ ∈ Ψ_2^D(θ) (2.15)

are always ICs, respectively partial ICs, at P_θ.

(c) The definition of Ψ_2(θ) and Ψ_∞(θ) requires I_θ ≻ 0 and Λ_θ nondegenerate in the sense that, for all t ∈ R^k,

t^τ Λ_θ = 0 a.e. P_θ ⟹ t = 0 (2.16)
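The defining conditions of (2.12) for the classical scores (2.14) can be verified numerically. The following sketch is our own example: it takes the Gaussian location-scale model N(μ, σ²), whose scores Λ_θ(x) = ((x−μ)/σ², ((x−μ)²−σ²)/σ³) and Fisher information I_θ = diag(1/σ², 2/σ²) are standard results, and checks E_θ ψ = 0 and E_θ ψ Λ_θ^τ = I_2 by quadrature:

```python
import numpy as np

# Gaussian location-scale model N(mu, sigma^2); standard scores and
# Fisher information (a classical textbook result).
mu, sigma = 0.0, 1.0
xs = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 100001)
dx = xs[1] - xs[0]
f = np.exp(-(xs - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

Lam = np.vstack([(xs - mu) / sigma ** 2,
                 ((xs - mu) ** 2 - sigma ** 2) / sigma ** 3])
I = np.diag([1 / sigma ** 2, 2 / sigma ** 2])

psi = np.linalg.inv(I) @ Lam            # classical scores psi_h = I^{-1} Lambda

E_psi = (psi * f).sum(axis=1) * dx      # E_theta psi, should be (0, 0)
E_psiLam = (psi * f) @ Lam.T * dx       # E_theta psi Lambda^tau, should be I_2
print(np.allclose(E_psi, 0, atol=1e-6))            # True
print(np.allclose(E_psiLam, np.eye(2), atol=1e-4))  # True
```

So ψ_h,θ is indeed an IC in the sense of Definition 2.9(a).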

For questions of existence of (square integrable) partial ICs we cite Lemma 1.1.3 from [Kohl (2005)], which gives a necessary and sufficient condition.

Lemma 2.11. It holds that

Ψ_2^D(θ) ≠ ∅ ⟺ ∃ A ∈ R^{p×k} : D = A I_θ ⟺ ker I_θ ⊂ ker D (2.17)

Proof. Lemma 1.1.3 in [Kohl (2005)].

M. Kohl adds two further remarks on this lemma, of which we only cite part (a):

Remark 2.12. (a) The previous lemma shows that we do not necessarily need I_θ ≻ 0 for the existence of partial ICs. But, since rank(D) = p, it has to hold that rank(A I_θ) = p, where

rank(A I_θ) = rank(I_θ) − dim(C(I_θ) ∩ N(A)) (2.18)

with C(I_θ) the column space of I_θ and N(A) the null space of A; confer Theorem 17.5.4 of [Harville (1997)]. Consequently, the Fisher information I_θ at least has to have rank p.
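Lemma 2.11 translates directly into a linear-algebra check. A sketch with our own hypothetical matrices: for a singular Fisher information of rank 2 with k = 3, the equation D = A I_θ is solvable exactly when ker I_θ ⊂ ker D, which can be tested via the Moore–Penrose criterion D I_θ⁺ I_θ = D:

```python
import numpy as np

# Hypothetical singular Fisher information of rank 2 (k = 3):
I_theta = np.array([[2.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 0.0]])   # ker(I_theta) = span{e_3}

D_good = np.array([[1.0, 2.0, 0.0]])    # D_good e_3 = 0: partial ICs exist (p = 1)
D_bad  = np.array([[1.0, 2.0, 1.0]])    # D_bad e_3 != 0: no partial ICs

def solvable(D, I):
    # D = A I has a solution A  iff  D I^+ I == D  (Moore-Penrose criterion),
    # which is equivalent to ker(I) being contained in ker(D), cf. (2.17)
    return np.allclose(D @ np.linalg.pinv(I) @ I, D)

print(solvable(D_good, I_theta))  # True
print(solvable(D_bad, I_theta))   # False
```

This also illustrates Remark 2.12(a): I_θ need not be positive definite, but its rank must be at least p.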

Definition 2.9 turns out to be very useful in the context of robust asymptotic statistics, as most proofs of asymptotic normality in the i.i.d. case head for an estimator expansion in which ICs canonically occur as summands. This leads to the framework of asymptotic statistics in the next section.