Bandwidth selection and Graphical Representation

1.1 Multivariate Kernel Density Estimation

1.1.2 Bandwidth selection and Graphical Representation

The problem of an automatic, data-driven choice of the bandwidth $H$ is of great importance in the multivariate case. In one or two dimensions we may choose an "appropriate" bandwidth interactively by looking at the sequence of density estimates for different bandwidths. But how can this be done in three, four or more dimensions? The problem of graphical representation arises, which we address next.

Theoretically, the bandwidth selection problem can be handled as in the one-dimensional case. Typically, one searches for a global bandwidth $H$ or a local bandwidth $H(t)$. Two approaches are frequently used in both cases:

- plug-in bandwidths, in particular "rule-of-thumb" bandwidths,
- resampling methods, in particular cross-validation and bootstrap.

We will introduce generalizations of Silverman's rule-of-thumb and least squares cross-validation to stress the analogy with the one-dimensional bandwidth selectors.

Rule-of-thumb Bandwidth

Rule-of-thumb bandwidth selection provides a formula arising from a reference distribution. Obviously, the pdf of a multivariate normal distribution $N_q(\mu, \Sigma)$ is a good candidate for a reference distribution in the multivariate case.

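Suppose that the kernel $\mathcal{K}$ is Gaussian, i.e. the pdf of $N_q(0_q, I_q)$. Note that $\mu_2(\mathcal{K}) = 1$ and $\|\mathcal{K}\|_2^2 = 2^{-q}\pi^{-q/2}$ in this case. Hence, from (14) and the fact that

$$\int \left[\operatorname{tr}\{H^T \mathcal{H}_f(t) H\}\right]^2 dt = \frac{1}{2^{q+2}\pi^{q/2}\det(\Sigma)^{1/2}} \left[ 2\operatorname{tr}\{(H^T\Sigma^{-1}H)^2\} + \{\operatorname{tr}(H^T\Sigma^{-1}H)\}^2 \right],$$

where $\mathcal{H}_f$ denotes the Hessian matrix of $f$, we can easily derive rule-of-thumb formulae for different assumptions on $H$ and $\Sigma$.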

In the simplest case, i.e. that we consider $H$ and $\Sigma$ to be diagonal matrices $H = \operatorname{diag}(h_1, \ldots, h_q)$ and $\Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_q^2)$, this leads to

$$\tilde h_j = \left(\frac{4}{q+2}\right)^{1/(q+4)} n^{-1/(q+4)}\, \sigma_j. \tag{15}$$

Note that this formula coincides with Silverman's rule-of-thumb in the case $q = 1$, see Silverman (1986, p. 45). Replacing the $\sigma_j$'s by estimates and noting that the first factor is always between 0.924 and 1.059, we arrive at Scott's rule

$$\hat h_j = n^{-1/(q+4)}\, \hat\sigma_j, \tag{16}$$

see Scott (1992, p. 152).
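As a minimal sketch of how (15) and (16) might be evaluated, the following Python snippet computes the leading factor and the diagonal bandwidths from a data matrix. The function names and the use of the sample standard deviation as the estimate of $\sigma_j$ are assumptions made for illustration; they are not taken from the paper.

```python
import statistics


def rule_of_thumb_factor(q):
    """Leading factor of (15): (4 / (q + 2)) ** (1 / (q + 4))."""
    return (4.0 / (q + 2)) ** (1.0 / (q + 4))


def diagonal_bandwidths(data, scott=True):
    """Return [h_1, ..., h_q] for `data` given as n rows of q values.

    scott=True applies Scott's rule (16); scott=False keeps the leading
    factor of (15). Sigma_j is estimated by the sample standard deviation
    (an assumption of this sketch).
    """
    n = len(data)
    q = len(data[0])
    factor = 1.0 if scott else rule_of_thumb_factor(q)
    rate = n ** (-1.0 / (q + 4))
    sigmas = [statistics.stdev([row[j] for row in data]) for j in range(q)]
    return [factor * rate * s for s in sigmas]


if __name__ == "__main__":
    # The factor equals about 1.059 for q = 1, exactly 1 for q = 2,
    # and stays above about 0.924 for larger q.
    for q in (1, 2, 5, 11, 50):
        print(q, round(rule_of_thumb_factor(q), 4))
```

Because the factor stays so close to one, dropping it (Scott's rule) changes each bandwidth by less than eight percent in any dimension.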

It is difficult to derive the rule-of-thumb for general $H$ and $\Sigma$. However, (15) shows that it might be a good idea to choose the bandwidth matrix $H$ proportional to $\Sigma^{1/2}$. In this case we get as generalization of Scott's rule

$$\hat H = n^{-1/(q+4)}\, \hat\Sigma^{1/2}. \tag{17}$$

We remark that this rule is equivalent to applying a Mahalanobis transformation to the data (to transform the estimated covariance matrix to the identity), then computing the kernel estimate with equal bandwidths $h = n^{-1/(q+4)}$ and finally retransforming the estimated pdf back to the original scale.
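The equivalence between the generalized Scott rule and the Mahalanobis route can be checked numerically. The sketch below assumes a Gaussian kernel, the symmetric square root of the sample covariance matrix, and the estimator $\hat f_H(t) = n^{-1}\sum_i \det(H)^{-1}\mathcal{K}\{H^{-1}(T_i - t)\}$; all function names are hypothetical.

```python
import numpy as np


def scott_bandwidth_matrix(data):
    """Return (H, root) with H = n^(-1/(q+4)) * Sigma_hat^(1/2), cf. (17)."""
    n, q = data.shape
    sigma_hat = np.atleast_2d(np.cov(data, rowvar=False))
    # Symmetric square root via the eigendecomposition of Sigma_hat.
    eigval, eigvec = np.linalg.eigh(sigma_hat)
    root = (eigvec * np.sqrt(eigval)) @ eigvec.T
    return n ** (-1.0 / (q + 4)) * root, root


def gaussian_kde(t, data, H):
    """Kernel density estimate at the point t with bandwidth matrix H."""
    n, q = data.shape
    u = np.linalg.solve(H, (data - t).T)  # columns are H^(-1) (T_i - t)
    k = np.exp(-0.5 * np.sum(u ** 2, axis=0)) / (2.0 * np.pi) ** (q / 2.0)
    return k.sum() / (n * abs(np.linalg.det(H)))


def gaussian_kde_via_mahalanobis(t, data):
    """Same estimate: transform, smooth with h * I, transform back."""
    n, q = data.shape
    _, root = scott_bandwidth_matrix(data)
    z_data = np.linalg.solve(root, data.T).T  # covariance becomes identity
    z_t = np.linalg.solve(root, t)
    h = n ** (-1.0 / (q + 4))
    # Dividing by det(root) is the Jacobian of the back-transformation.
    return gaussian_kde(z_t, z_data, h * np.eye(q)) / np.linalg.det(root)
```

Centering the data is omitted here because a common shift of the data and the evaluation point leaves the estimate unchanged.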

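But before we go on with applications, let us consider what we can do if we want to use a kernel different from the Gaussian. The idea of canonical kernels by Marron and Nolan (1988) can be easily extended to the multivariate case. Consider a kernel $\mathcal{K}$ and all equivalent kernel functions $\mathcal{K}_\delta(\cdot) = \delta^{-q}\mathcal{K}(\cdot/\delta)$ with $\delta > 0$. Although $\delta$ is a scalar, it acts on the $q$-variate argument of $\mathcal{K}$. Now we have $\|\mathcal{K}_\delta\|_2^2 = \delta^{-q}\|\mathcal{K}\|_2^2$ and $\mu_2(\mathcal{K}_\delta) = \delta^2\mu_2(\mathcal{K})$. As in the one-dimensional case we choose $\delta$ such that the bias-variance tradeoff in $\mathrm{AMISE}(H, \mathcal{K})$ is independent of $\mathcal{K}$. This yields

$$\mu_2^2(\mathcal{K}_{\delta_0}) = \|\mathcal{K}_{\delta_0}\|_2^2 \iff \delta_0 = \left\{\frac{\|\mathcal{K}\|_2^2}{\mu_2^2(\mathcal{K})}\right\}^{1/(q+4)}.$$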

$\delta_0$ is again called the canonical bandwidth of the kernel $\mathcal{K}$. Denote now by $\mathcal{K}^A$ a kernel function with canonical bandwidth $\delta_0^A$ and by $\mathcal{K}^B$ a kernel function with canonical bandwidth $\delta_0^B$. Suppose we have used $H^A$ with kernel $\mathcal{K}^A$ and we want to recompute the kernel density estimate with kernel $\mathcal{K}^B$. Then it holds that

$$\mathrm{AMISE}(H^A, \mathcal{K}^A) \approx \mathrm{AMISE}(H^B, \mathcal{K}^B) \iff H^B = \frac{\delta_0^B}{\delta_0^A}\, H^A, \tag{18}$$

which allows us to adjust bandwidths for different kernels as in the one-dimensional case.

Let us consider an example. Suppose we want to replace the $q$-dimensional Gaussian kernel $\mathcal{K}^G$ by the product Quartic kernel $\mathcal{K}^Q$, which is faster in direct computation because of its compact support on $[-1, 1]^q$. What is the rule-of-thumb equivalent to (17) in this case? Here we have $\delta_0^G = \{1/(2\sqrt{\pi})\}^{q/(q+4)}$ and $\delta_0^Q = (49 \cdot 5^q / 7^q)^{1/(q+4)}$, which gives the canonical bandwidths in Table 1 for dimensions $q = 1, \ldots, 5$.

The fourth column of Table 1 gives the factor by which the rule-of-thumb bandwidth matrix in (17) needs to be multiplied to obtain the rule-of-thumb bandwidth for the multiplicative Quartic kernel. Of course, rule-of-thumb bandwidths for all other kernel functions can be calculated in a similar way.

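| $q$ | $\delta_0^G$ | $\delta_0^Q$ | $\delta_0^Q/\delta_0^G$ |
|-----|--------|--------|--------|
| 1   | 0.7764 | 2.0362 | 2.6226 |
| 2   | 0.6558 | 1.7100 | 2.6073 |
| 3   | 0.5814 | 1.5095 | 2.5964 |
| 4   | 0.5311 | 1.3747 | 2.5883 |
| 5   | 0.4951 | 1.2783 | 2.5820 |

Table 1: Bandwidth adjusting factors for Gaussian and multiplicative Quartic kernel for different dimensions $q$.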
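The entries of Table 1 can be recomputed from the canonical bandwidth formula $\delta_0 = \{\|\mathcal{K}\|_2^2/\mu_2^2(\mathcal{K})\}^{1/(q+4)}$. The sketch below uses the standard univariate Quartic constants $\|K\|_2^2 = 5/7$ and $\mu_2(K) = 1/7$ together with the product-kernel relation $\|\mathcal{K}\|_2^2 = (\|K\|_2^2)^q$; the function names are assumptions for illustration.

```python
import math


def canonical_bandwidth(norm_sq, mu2, q):
    """delta_0 = (||K||_2^2 / mu_2(K)^2) ** (1 / (q + 4)) for a q-variate kernel."""
    return (norm_sq / mu2 ** 2) ** (1.0 / (q + 4))


def delta_gauss(q):
    # Gaussian kernel: ||K||_2^2 = 2^(-q) * pi^(-q/2), mu_2(K) = 1.
    return canonical_bandwidth(2.0 ** (-q) * math.pi ** (-q / 2.0), 1.0, q)


def delta_quartic(q):
    # Product Quartic kernel: ||K||_2^2 = (5/7)^q, mu_2(K) = 1/7.
    return canonical_bandwidth((5.0 / 7.0) ** q, 1.0 / 7.0, q)


if __name__ == "__main__":
    for q in range(1, 6):
        g, b = delta_gauss(q), delta_quartic(q)
        print(f"{q}  {g:.4f}  {b:.4f}  {b / g:.4f}")
```

Multiplying the matrix from (17) by `delta_quartic(q) / delta_gauss(q)` gives the corresponding rule-of-thumb bandwidth for the product Quartic kernel, as in (18).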

For a product kernel $\mathcal{K}$ we have $\mu_2(\mathcal{K}) = \mu_2(K)$ and $\|\mathcal{K}\|_2^2 = (\|K\|_2^2)^q$, where $K$ denotes the corresponding univariate kernel. A table of values of $\mu_2(K)$ and $\|K\|_2^2$ can be found in Härdle (1991, p. 239), for example.

In principle, all plug-in methods for one-dimensional kernel density estimation can be extended to the multivariate case. See Wand and Jones (1994) for details on multivariate plug-in bandwidth selection.

Cross-validation

As we mentioned before, the cross-validation method is fairly independent of the special structure of the parameter or function estimate. Considering the bandwidth choice problem, cross-validation techniques allow us to adapt to a wider class of density functions $f$ than the rule-of-thumb approach. (Remember that the rule-of-thumb bandwidth is optimal for the reference pdf, hence it may fail for multimodal densities, for instance.)

Recall that, in contrast to the rule-of-thumb approach, least squares cross-validation for density estimation aims to estimate the ISE-optimal bandwidth. Here we approximate the integrated squared error

$$\mathrm{ISE}(H) = \int \{\hat f_H(t) - f(t)\}^2\, dt = \int \hat f_H^2(t)\, dt - 2\int \hat f_H(t) f(t)\, dt + \int f^2(t)\, dt. \tag{19}$$

Apparently, this is the same formula as in the one-dimensional case, and with the same arguments the last term of (19) can be ignored. The first term can again be easily calculated from the data. Hence, only the second term of (19) is unknown and has to be estimated. However, observe that $\int \hat f_H(t) f(t)\, dt = E \hat f_H(T)$, where the only new aspect now is that $T$ is $q$-dimensional. As in the one-dimensional case we estimate this term by a leave-one-out estimator

$$\widehat{E \hat f_H(T)} = \frac{1}{n} \sum_{i=1}^{n} \hat f_{H,-i}(T_i),$$

where

$$\hat f_{H,-i}(t) = \frac{1}{n-1} \sum_{\substack{j=1 \\ j \neq i}}^{n} \mathcal{K}_H(T_j - t).$$

This yields the multivariate cross-validation criterion as a straightforward generalization of CV in the one-dimensional case:

$$\mathrm{CV}(H) = \frac{1}{n^2 \det(H)} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathcal{K} \star \mathcal{K}\left\{H^{-1}(T_j - T_i)\right\} - \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \mathcal{K}_H(T_j - T_i).$$

The difficulty arises from the fact that the bandwidth is now a $q \times q$ matrix $H$. In the most general case, this means that we have to minimize over $q(q+1)/2$ parameters. Even if we assume $H$ to be a diagonal matrix, this remains a $q$-dimensional optimization problem.
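To make the criterion concrete, here is a minimal sketch of $\mathrm{CV}(H)$ for a diagonal $H$ and a Gaussian kernel. It relies on the fact that for the $N_q(0_q, I_q)$ kernel the convolution $\mathcal{K} \star \mathcal{K}$ is the $N_q(0_q, 2I_q)$ density, and on $\mathcal{K}_H(u) = \det(H)^{-1}\mathcal{K}(H^{-1}u)$. The one-parameter grid search over $H = c \cdot \operatorname{diag}(\hat\sigma_1, \ldots, \hat\sigma_q)$ is a simplifying assumption of this sketch, not the paper's procedure, and the function names are hypothetical.

```python
import numpy as np


def lscv_gaussian(h, data):
    """CV(H) for H = diag(h) with the standard Gaussian kernel."""
    data = np.asarray(data, dtype=float)
    h = np.asarray(h, dtype=float)
    n, q = data.shape
    det_h = np.prod(h)
    # u[i, j, :] = H^(-1) (T_j - T_i)
    u = (data[None, :, :] - data[:, None, :]) / h
    sq = np.sum(u ** 2, axis=2)
    conv = np.exp(-0.25 * sq) / (4.0 * np.pi) ** (q / 2.0)  # K * K, N(0, 2I)
    kern = np.exp(-0.5 * sq) / (2.0 * np.pi) ** (q / 2.0)   # K, N(0, I)
    term1 = conv.sum() / (n ** 2 * det_h)
    off_diag = kern.sum() - np.trace(kern)                  # drop terms j == i
    term2 = 2.0 * off_diag / (n * (n - 1) * det_h)
    return term1 - term2


def lscv_grid_search(data, scales):
    """Minimise CV over H = c * diag(sigma_hat) for c in `scales`."""
    data = np.asarray(data, dtype=float)
    sigma = data.std(axis=0, ddof=1)
    scores = [lscv_gaussian(c * sigma, data) for c in scales]
    return scales[int(np.argmin(scores))], scores
```

The pairwise array has $n^2 q$ entries, so this direct form is only practical for moderate $n$; a full search over all $q$ diagonal elements would replace the one-dimensional grid by a numerical optimizer.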

This holds as well for other cross-validation approaches. Multivariate resampling methods for bandwidth selection are discussed in more detail in Sain, Baggerly and Scott (1994).

Graphical Representation

Consider now the problem of graphically displaying a multivariate density estimate. Assume first $q = 2$. Here we are still able to show the density estimate in a three-dimensional plot. This is particularly useful if the estimated function can be rotated interactively on the computer screen. For a two-dimensional presentation, a contour plot often gives more insight into the structure of the data.

In the following, we will use the credit data from Fahrmeir and Hamerle (1984) and Fahrmeir and Tutz (1994) for illustration. This data set consists of $n = 1000$ clients, of whom 700 paid a credit back without problems and 300 did not. Among a number of categorical variables (running account, previous credits, purpose, personal attributes etc.), three continuous variables are available: duration and amount of credit as well as age.

Figures 3 and 4 (upper panels) display two-dimensional density estimates $\hat f_h(t) = \hat f_h(t_1, t_2)$ for log(duration), log(amount) and for log(amount), log(age), respectively. We use the subscript $h$ to indicate that we used a diagonal bandwidth matrix $H = \operatorname{diag}(h_1, h_2)$.

Additionally, Figures 3 and 4 (lower panels) give contour plots of these density estimates.

It is easily observed that both distributions are rather symmetric. This is due to the logarithmic transformation. In the duration direction a typical bimodal structure can be recognized. This is weakly reproduced in the amount direction. Obviously, both variables are positively correlated.

Here, the bandwidth was chosen according to the general "rule-of-thumb" (17), which tends to oversmooth multimodal structures of the data. In fact, the durations of credits are multiples of 6 months in most cases. The two clear modes that we observe are those for durations of 12 and 24 months. In all applications of this paper we use the Quartic (Biweight) product kernel. Recall that the univariate Quartic kernel is $K(u) = 0.9375\,(1 - u^2)^2\, I(|u| \le 1)$.

[Figure 3. Upper panel: "Density: duration & amount" (X: duration, Y: amount, Z: density estimate, scaled by $10^{-1}$). Lower panel: "Contours: duration & amount".]

Figure 3: Two-dimensional density estimate (upper panel) and density contours (lower panel) for duration and amount. $h_1 = 0.48$, $h_2 = 0.64$. Credit data, Fahrmeir and Hamerle (1984).

[Figure 4. Upper panel: "Density: amount & age" (X: amount, Y: age, Z: density estimate, scaled by $10^{-1}$). Lower panel: "Contours: amount & age".]

Figure 4: Two-dimensional density estimate (upper panel) and density contours (lower panel) for amount and age. $h_1 = 0.64$, $h_2 = 0.25$. Credit data, Fahrmeir and Hamerle (1984).

[Figure 5. Contour surfaces with axes X: duration, Y: amount, Z: age; a legend on the right gives the contour levels.]

Figure 5: Three-dimensional density contours for duration, amount and age. $h_1 = 0.56$, $h_2 = 0.75$, $h_3 = 0.29$. Credit data, Fahrmeir and Hamerle (1984).
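A two-dimensional density estimate with the Quartic product kernel and a diagonal bandwidth matrix, evaluated on a regular grid as input for a surface or contour plot, could be sketched as follows. The grid size, the padding of the grid by one bandwidth, and all function names are assumptions for illustration; the credit data themselves are not reproduced here.

```python
import numpy as np


def quartic(u):
    """Univariate Quartic kernel K(u) = 0.9375 (1 - u^2)^2 I(|u| <= 1)."""
    return np.where(np.abs(u) <= 1.0, 0.9375 * (1.0 - u ** 2) ** 2, 0.0)


def product_quartic_kde(points, data, h):
    """Density estimate at each row of `points` with H = diag(h)."""
    points = np.atleast_2d(points)
    h = np.asarray(h, dtype=float)
    # u[g, i, j] = (T_ij - t_gj) / h_j for grid point g, observation i
    u = (data[None, :, :] - points[:, None, :]) / h
    weights = np.prod(quartic(u), axis=2)  # product over the coordinates
    return weights.mean(axis=1) / np.prod(h)


def density_on_grid(data, h, size=40):
    """Evaluate the estimate on a regular size x size grid (q = 2 only)."""
    data = np.asarray(data, dtype=float)
    h = np.asarray(h, dtype=float)
    # Pad by one bandwidth so the grid covers the support of the estimate.
    lo, hi = data.min(axis=0) - h, data.max(axis=0) + h
    x = np.linspace(lo[0], hi[0], size)
    y = np.linspace(lo[1], hi[1], size)
    gx, gy = np.meshgrid(x, y, indexing="ij")
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    return x, y, product_quartic_kde(grid, data, h).reshape(size, size)
```

The returned arrays `x`, `y` and the density matrix can be passed to any surface or contour plotting routine. Since the Quartic kernel vanishes outside $[-1, 1]$, only observations within one bandwidth of a grid point contribute to the estimate there.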

For three-dimensional density estimates, it is always possible to hold one variable fixed and to plot the density function only in dependence of the other variables. Alternatively, we can again plot contours of the density estimate, which are now three-dimensional surfaces. Figure 5 shows this for the credit scoring variables. In the original version of this plot, red, green and blue surfaces show the values of the density estimate at the levels (in percent) indicated on the right. Colors and the possibility to rotate the contours on the computer screen ease the exploration of the data structures a lot. Of course, we are restricted to two-dimensional plots here. However, one can clearly recognize the ellipsoidal structure of the contours, which indicates a relatively symmetric distribution.
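Holding one variable fixed can be sketched in a few lines: for a product kernel the weights factorise, so a slice $\hat f_h(t_1, t_2, c)$ of a three-dimensional estimate reduces to a weighted two-dimensional estimate. The snippet below assumes the Quartic product kernel with $H = \operatorname{diag}(h_1, h_2, h_3)$ and fixes the third coordinate; the function names and the grid layout are hypothetical.

```python
import numpy as np


def quartic(u):
    """Univariate Quartic kernel K(u) = 0.9375 (1 - u^2)^2 I(|u| <= 1)."""
    return np.where(np.abs(u) <= 1.0, 0.9375 * (1.0 - u ** 2) ** 2, 0.0)


def density_slice(data, h, fixed_value, size=30):
    """Estimate f_hat(t1, t2, fixed_value) on a size x size grid for q = 3."""
    data = np.asarray(data, dtype=float)
    h = np.asarray(h, dtype=float)
    x = np.linspace(data[:, 0].min() - h[0], data[:, 0].max() + h[0], size)
    y = np.linspace(data[:, 1].min() - h[1], data[:, 1].max() + h[1], size)
    # Kernel weights factorise, so each coordinate is handled separately.
    kx = quartic((data[:, 0][None, :] - x[:, None]) / h[0])  # shape (size, n)
    ky = quartic((data[:, 1][None, :] - y[:, None]) / h[1])  # shape (size, n)
    kz = quartic((data[:, 2] - fixed_value) / h[2])          # shape (n,)
    # dens[a, b] = sum_i kx[a, i] * kz[i] * ky[b, i] / (n h1 h2 h3)
    dens = (kx * kz) @ ky.T / (len(data) * np.prod(h))
    return x, y, dens
```

Repeating the call for several values of `fixed_value` gives a sequence of two-dimensional plots that can be compared with the contour surfaces.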
