
Munich Personal RePEc Archive

The Empirical Saddlepoint

Approximation for GMM Estimators

Sowell, Fallaw

Carnegie Mellon University

July 2006

Online at https://mpra.ub.uni-muenchen.de/3356/

MPRA Paper No. 3356, posted 30 May 2007 UTC


The Empirical Saddlepoint Approximation for GMM Estimators

Fallaw Sowell

May 2007 (first draft June 2006)

Abstract

The empirical saddlepoint distribution provides an approximation to the sampling distributions for the GMM parameter estimates and the statistics that test the overidentifying restrictions. The empirical saddlepoint distribution permits asymmetry, non-normal tails, and multiple modes. If identification assumptions are satisfied, the empirical saddlepoint distribution converges to the familiar asymptotic normal distribution. In small sample Monte Carlo simulations, the empirical saddlepoint performs as well as, and often better than, the bootstrap.

The formulas necessary to transform the GMM moment conditions to the estimation equations needed for the saddlepoint approximation are provided.

Unlike the absolute errors associated with the asymptotic normal distributions and the bootstrap, the empirical saddlepoint has a relative error. The relative error leads to a more accurate approximation, particularly in the tails.

KEYWORDS: Generalized method of moments estimator, test of overidentifying restrictions, sampling distribution, empirical saddlepoint approximation, asymptotic distribution.

Tepper School of Business, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213. Phone: (412)-268-3769. fs0v@andrew.cmu.edu

Helpful comments and suggestions were provided by Don Andrews, Roni Israelov, Steven Lugauer and seminar participants at Carnegie Mellon University, University of Pittsburgh, Concordia, Yale, University of Pennsylvania and the 2006 Summer Meetings of the Econometric Society in Minneapolis. This research was facilitated through an allocation of advanced computing resources by the Pittsburgh Supercomputing Center through the support of the National Science Foundation.


Contents

1 INTRODUCTION
2 MODEL AND FIRST ORDER ASYMPTOTIC ASSUMPTIONS
3 MOMENT CONDITIONS TO ESTIMATION EQUATIONS
   3.1 NOTES
4 THE SADDLEPOINT APPROXIMATION
   4.1 DENSITY RESTRICTIONS
   4.2 ADDITIONAL ASSUMPTIONS
   4.3 ASYMPTOTIC NORMAL AND EMPIRICAL SADDLEPOINT APPROXIMATIONS
5 BEHAVIOR IN FINITE SAMPLES
6 ON THE SPECTRAL DECOMPOSITION
7 FINAL ISSUES
   7.1 IMPLEMENTATION AND NUMERICAL ISSUES
   7.2 EXTENSIONS
8 APPENDIX
   8.1 THEOREM 1: ASYMPTOTIC DISTRIBUTION
   8.2 THEOREM 2: INVARIANCE OF ASYMPTOTIC DISTRIBUTION
   8.3 THEOREM 3: THE SADDLEPOINT APPROXIMATION
   8.4 THEOREM 4: EMPIRICAL SADDLEPOINT APPROXIMATION
   8.5 THEOREM 5: EMPIRICAL SADDLEPOINT DENSITY STRUCTURE
   8.6 THEOREM 6: INVARIANCE OF SADDLEPOINT APPROXIMATION
   8.7 THEOREM 7: DERIVATIVE OF THE DECOMPOSITION
   8.8 A DIFFERENTIAL GEOMETRY PERSPECTIVE
   8.9 STEPS TO A SADDLEPOINT APPROXIMATION
9 REFERENCES

1 INTRODUCTION

The empirical saddlepoint density provides an approximation to the sampling distribution for parameters estimated with the generalized method of moments and the statistics that test overidentifying restrictions. Traditional GMM (Hansen (1982)) relies on a central limit theorem. The saddlepoint approach uses a generalization of the central limit theorem and provides a more accurate approximation.

There are three key differences between the results using the commonly used first order normal approximation and the saddlepoint approximation. First, the parameters and the statistics that test the overidentifying restrictions are no longer forced to be independent. The asymptotic normal approximation uses a linear approximation that can be orthogonally decomposed into the identifying space and the overidentifying space, see Sowell (1996). The asymptotic normal approximation inherits the orthogonality from the tangent space; hence, $\hat\theta$ and the $J$ statistic are independent. In contrast, the saddlepoint uses a different approximation at each parameter value. Each approximation generates an orthogonally decomposed tangent space. However, the union of these approximations is not required to give an orthogonal decomposition over the entire space. This lack of independence is a common feature in approximations with higher orders of accuracy. In Newey and Smith (2004) and Rilstone, Srivastava and Ullah (1996), the higher order bias for the parameter estimates involves the parameters for the statistics that test the overidentifying restrictions and vice versa. The lack of independence means the marginal densities are no longer equal to the conditional densities. As simulations in section 5 demonstrate, it is informative to report both the marginal density and the conditional densities (i.e. the sampling distribution for the parameters conditional on the overidentifying restrictions being satisfied).

The second difference is the sampling density does not have to be unimodal. If the GMM objective function has a unique local minimum, then the saddlepoint approximation will be unimodal. However, if the GMM objective function has multiple local minima, then the saddlepoint approximation can have multiple modes and confidence regions can be disjoint. The first order normal approximation using the location and convexity of the global minimum will provide misleading inference concerning the parameter values associated with the other local minima. However, the saddlepoint approximation can be interpreted as the density for the location of local minima of the GMM objective function and will more accurately summarize the available information. These issues are not new to Econometrics. The s-sets of Stock and Wright (2000) can also lead to disjoint confidence intervals.

The third difference is that the saddlepoint approximation has a relative error instead of the absolute error that occurs with the first order normal approximation.

When the underlying distribution is unknown and the empirical distribution is used, the empirical saddlepoint approximation results in a relative error of order $N^{-1/2}$.¹ The relative error results in an improvement over the absolute error in the tails of the distribution. This is important in the calculation of $p$-values. If $f(\alpha)$ denotes the density being approximated, the asymptotic approximation can be written

$$A(\alpha) = f(\alpha) + O\left(N^{-1/2}\right).$$

The empirical saddlepoint approximation can be written

$$S(\alpha) = f(\alpha)\left\{1 + O\left(N^{-1/2}\right)\right\}.$$

When there is a relative error, the error shrinks with the density in the tails of the distribution. When the error is absolute, the same error holds in the tails as at the modes of the density. Hall and Horowitz (1996) show that the bootstrap approximation results in an absolute error of order $N^{-1}$. Hence, theoretically, neither the empirical saddlepoint approximation nor the bootstrap approximation dominates.
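To see what the relative error buys in the tails, consider a hypothetical numerical illustration (the figures are ours, not the paper's). Suppose $f(\alpha_t) = 10^{-4}$ at a tail point $\alpha_t$ and $N = 100$, so $N^{-1/2} = 0.1$:

$$A(\alpha_t) = 10^{-4} + O(0.1), \qquad S(\alpha_t) = 10^{-4}\left\{1 + O(0.1)\right\}.$$

The absolute $O(N^{-1/2})$ term can be orders of magnitude larger than the tail mass itself, while the relative error perturbs the approximated tail probability by only about ten percent.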

The first contribution of this paper is a new procedure that uses Economic moment conditions ($m$ moment conditions to estimate $k$ parameters) to create a set of $m$ estimation equations in $m$ unknowns. The $m$ equations permit the calculation of the empirical saddlepoint distribution. The second contribution is the extension of the saddlepoint approximation to include statistics to test the validity of the overidentifying restrictions. Previous work focused only on the parameter estimates and testing hypotheses concerning parameters. The third contribution is a saddlepoint approximation that allows for models and samples where the GMM objective function has multiple local minima. Multiple local minima are common for GMM objective functions, see Dominguez and Lobato (2004). Previous saddlepoint distribution research restricted attention to a unique local minimum. The final contribution is the interpretation of the saddlepoint density as the joint density for the parameter estimates and the statistics that test the validity of the overidentifying restrictions. In finite samples these will not be independent. The saddlepoint distribution gives a well defined density that accounts for this dependence.

¹When the underlying distribution is known, the relative error from the saddlepoint approximation is typically of order $N^{-1}$. In some special cases, the relative error can be of order $N^{-3/2}$. These dramatic improvements cannot be realized in empirical applications in Economics when the underlying distribution is unknown.

The saddlepoint approximation’s history in the Statistics literature starts with Esscher (1932) and Daniels (1954). General introductions are available in Reid (1988), Goutis and Casella (1999) and Huzurbazar (1999) and the books Field and Ronchetti (1990), Jensen (1995) and Kolassa (1997). The saddlepoint distribution was first used to approximate the sampling distribution of the sample mean when the underlying distribution was known, e.g. Phillips (1978). It was then generalized to maximum likelihood estimators. Field (1982) gives the saddlepoint approximation to the density for parameter estimates defined as the solution to a system of equations where the underlying distribution is known. Ronchetti and Welsh (1994) presents the empirical saddlepoint distribution where the known distribution is replaced with the empirical distribution. In Economic applications the underlying distribution is typically unknown, so the empirical saddlepoint distribution is appropriate. If there exists a unique local minimum, this empirical saddlepoint approximation is an alternative approximation to the GMM sampling density. Field (1982), Ronchetti and Welsh (1994) and Almudevar, Field and Robinson (2000) are the natural bridges to the results in this paper.

The closest paper to the current research is Ronchetti and Trojani (2003), denoted RT. RT shows how to go from the GMM moment conditions to estimation equations used in a saddlepoint approximation. RT also extends the parameter testing results from Robinson, Ronchetti and Young (2003) to situations where the moment conditions are overidentified.

This paper takes a different approach to the saddlepoint approximation that results in a system of only $m$ equations, while the procedure presented in RT uses $\left(\frac{m}{2}+k\right)(m+1)$ equations. In this paper, all the parameters are tilted instead of only a subset. The resulting saddlepoint density approximates the joint density of the parameter estimates and the statistics that test the validity of the overidentifying restrictions.

The next section reviews basic GMM notation and the assumptions necessary to obtain first order asymptotic results. Section 3 is the derivation of the estimation equations from the GMM moment conditions. Section 4 presents the saddlepoint approximation including extensions needed to apply the saddlepoint approximation to the estimation equations built on moment conditions. Section 4 ends with an explanation of how the first order normal approximation is related to the empirical saddlepoint approximation. Section 5 presents Monte Carlo simulations, which demonstrate the small sample performance of the empirical saddlepoint approximation. Situations where the empirical saddlepoint approximation is superior to currently available alternatives are noted. Section 6 shows that the saddlepoint approximation is well defined using the estimation equations presented in section 3. Section 7 concludes with additional interpretations, implementation issues, and directions for future work.

The random vector $x$ will be defined on the probability space $(\Omega, \mathcal{B}, F)$. Let a vector's norm be denoted $\|x\| \equiv \sqrt{x'x}$. Let a matrix's norm be denoted $\|M\| \equiv \sup \|Mx\|/\|x\|$. A $\delta$-ball centered at $x$ will be denoted $B_\delta(x) \equiv \{y : \|x - y\| < \delta\}$ for $\delta > 0$. Let $m(\cdot)$ denote the Lebesgue measure.

2 MODEL AND FIRST ORDER ASYMPTOTIC ASSUMPTIONS

The saddlepoint approximation requires stronger assumptions than those necessary for the first order normal approximation. The basic assumptions for first order asymptotic normality are recorded for the basic model.

Consider an $m$-dimensional set of moment conditions

$$g_i(\theta) \equiv g(x_i, \theta)$$

where $\theta$ is a $k$-dimensional set of parameters with $k \le m$. According to the Economic theory, the moment conditions have expectation zero at the population parameter value, i.e.

$$E[g(x_i, \theta_0)] = 0.$$

The error process driving the system is assumed to be iid.

Assumption 1. The $p$-dimensional series $x_i$ is iid from a distribution $F(x)$.

The functions that create the moment conditions satisfy regularity conditions.

Assumption 2. $g(x, \theta)$ is continuously partially differentiable in $\theta$ in a neighborhood of $\theta_0$. The functions $g(x,\theta)$ and $\frac{\partial g(x,\theta)}{\partial\theta'}$ are measurable functions of $x$ for each $\theta \in \Theta$, and $E\left[\sup_{\theta\in\Theta}\left\|\frac{\partial g(x,\theta)}{\partial\theta'}\right\|\right] < \infty$, $E\left[g(x,\theta_0)g(x,\theta_0)'\right] < \infty$ and $E\left[\sup_{\theta\in\Theta}\|g(x,\theta)\|\right] < \infty$. Each element of $g(x,\theta)$ is uniformly square integrable.

Assumption 3. The parameter space $\Theta$ is a compact subset of $R^k$. The population parameter value $\theta_0$ is in the interior of $\Theta$.

To obtain an estimate from a finite sample, the weighted inner product of the sample analogue of the moment conditions

$$G_N(\theta) = \frac{1}{N}\sum_{i=1}^N g(x_i, \theta)$$

is minimized conditional on $W_N$, a given symmetric positive definite weighting matrix:

$$\hat\theta = \operatorname*{argmin}_{\theta\in\Theta}\; G_N(\theta)' W_N G_N(\theta).$$

The foundation for the first order asymptotic results is a central limit theorem distributional assumption.

Assumption 4. The moment conditions evaluated at the population parameter values satisfy the central limit theorem

$$\sqrt{N}\, G_N(\theta_0) \sim N(0, \Sigma_g).$$

Attention will be restricted to GMM estimates with minimum asymptotic variance.

Assumption 5. The weighting matrix is selected so

$$W_N \to \Sigma_g^{-1}.$$

Assumption 5 is usually satisfied by performing a first round estimation using an identity matrix as the weighting matrix.

Assumption 6. The matrix $M(\theta) \equiv E\left[\frac{\partial g(x_i,\theta)}{\partial\theta'}\right]$ has full column rank for all $\theta$ in a neighborhood of $\theta_0$. The matrix $\Sigma(\theta) \equiv E\left[g(x_i,\theta)g(x_i,\theta)'\right]$ is positive definite for all $\theta$ in a neighborhood of $\theta_0$.

The parameter can be identified from the moment conditions.

Assumption 7. Only $\theta_0$ satisfies $E[g(x_i, \theta)] = 0$.

These assumptions are sufficient to ensure that the GMM estimator is a root-$N$ consistent estimator of $\theta_0$. The GMM estimates are asymptotically distributed as

$$\sqrt{N}\left(\hat\theta - \theta_0\right) \sim N\left(0, \left(M(\theta_0)'\Sigma_g^{-1}M(\theta_0)\right)^{-1}\right).$$

The Economic theory implies all $m$ moment conditions should equal zero. The first order conditions

$$M_N\left(\hat\theta\right)' W_N G_N\left(\hat\theta\right) = 0$$

set $k$ linear combinations of the moments to exactly zero, where $M_N(\theta) = \frac{\partial G_N(\theta)}{\partial\theta'}$. The remaining $(m-k)$ moments can be used to test the Economic theory. The statistic

$$J = N\, G_N\left(\hat\theta\right)' W_N G_N\left(\hat\theta\right)$$

tests these overidentifying restrictions. The $J$-statistic is asymptotically distributed $\chi^2_{m-k}$ when the null hypothesis of the Economic theory being true is correct.
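To connect the notation to computation, the following minimal sketch implements the two-step estimator and the $J$ statistic defined above (Python with NumPy/SciPy; the moment function g and the data x are hypothetical placeholders, not code from the paper):

import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, x, g, W):
    G = g(x, theta).mean(axis=0)              # G_N(theta), an m-vector
    return G @ W @ G                          # G_N(theta)' W_N G_N(theta)

def two_step_gmm(x, g, theta_start):
    m = g(x, np.asarray(theta_start)).shape[1]
    # First step: identity weighting matrix.
    step1 = minimize(gmm_objective, theta_start, args=(x, g, np.eye(m)))
    u = g(x, step1.x)                         # moments at the first-step estimate
    W = np.linalg.inv(u.T @ u / len(x))       # W_N -> Sigma_g^{-1} (Assumption 5)
    step2 = minimize(gmm_objective, step1.x, args=(x, g, W))
    J = len(x) * gmm_objective(step2.x, x, g, W)   # ~ chi^2_{m-k} under the null
    return step2.x, W, J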

3 MOMENT CONDITIONS TO ESTIMATION EQUATIONS

The saddlepoint approximation uses a just identified system of estimation equations.

This section shows how a just identified system of equations can be created from the moment conditions implied by the Economic theory. The estimation equations are created by augmenting the parameters with a set of $(m-k)$ parameters denoted $\lambda$. The $\lambda$ parameters are a local coordinate system², spanning the space of overidentifying restrictions. The $\lambda$ parameters are selected so that under the null hypothesis of the Economic theory being true, the population parameter values are $\lambda_0 = 0$. Thus, the overidentifying restrictions can be tested with the hypothesis $H_0 : \lambda = 0$.

For each value of $\theta$ the sample moment condition $G_N(\theta)$ is an $m$-dimensional vector. As $\theta$ takes different values, the moment conditions create a $k$-dimensional manifold. For a fixed value of $\theta$ the space spanned by the derivative of the $k$-dimensional manifold will be called the identifying space. The orthogonal complement of the identifying space is called the overidentifying space. This decomposition is a generalization of the decomposition used in Sowell (1996) where the tangent space at $\hat\theta$ was decomposed into a $k$-dimensional identifying space and an $(m-k)$-dimensional space of overidentifying restrictions. The generalization is defining the decomposition at each value³ of $\theta$, not only at $\hat\theta$.

For each value of $\theta$, let $M_N(\theta)$ denote the derivative of $G_N(\theta)$ with respect to $\theta'$, scaled (standardized) by the Cholesky decomposition of the weighting matrix,

$$M_N(\theta) = W_N^{1/2}\,\frac{\partial G_N(\theta)}{\partial\theta'}.$$

Using this notation, the GMM first order conditions can be written

$$M_N\left(\hat\theta\right)' W_N^{1/2} G_N\left(\hat\theta\right) = 0.$$

The columns of $M_N(\hat\theta)$ define the $k$ linear combinations used to identify and estimate $\theta$.

²An introduction to the use of local coordinate systems can be found in Boothby (2003). Application of these tools in Statistics and Econometrics can be found in Amari (1985) and Marriott and Salmon (2000).

³When attention is restricted to the empirical saddlepoint approximation, the decomposition only needs to exist for parameters in neighborhoods of the local minima.

The orthogonal complement of the space spanned by the columns of $M_N(\hat\theta)$ will be the $(m-k)$-dimensional space used to test the validity of the overidentifying restrictions and will be spanned by $\lambda$. The augmenting parameters are determined by performing the decomposition for every value of $\theta$. Denote the projection matrix for the space spanned by $M_N(\theta)$ as

$$P_{M(\theta),N} = M_N(\theta)\left(M_N(\theta)'M_N(\theta)\right)^{-1} M_N(\theta)'.$$

$P_{M(\theta),N}$ is a real symmetric positive semidefinite matrix, which is also idempotent. Denote a spectral decomposition⁴

$$P_{M(\theta),N} = C_N(\theta)\Lambda C_N(\theta)' = \begin{bmatrix} C_{1,N}(\theta) & C_{2,N}(\theta) \end{bmatrix}\begin{bmatrix} I_k & 0 \\ 0 & 0_{(m-k)} \end{bmatrix}\begin{bmatrix} C_{1,N}(\theta) & C_{2,N}(\theta) \end{bmatrix}'$$

where $C_N(\theta)'C_N(\theta) = I_m$. The column span of $C_{1,N}(\theta)$ is the same as the column span of $M_N(\theta)$, and the columns of $C_{2,N}(\theta)$ span the orthogonal complement at $\theta$.

Hence, for each value of $\theta$, the $m$-dimensional space containing $G_N(\theta)$ can be locally parameterized by

$$\Psi_N(\theta,\lambda) = \begin{bmatrix} C_{1,N}(\theta)'\, W_N^{1/2} G_N(\theta) \\ \lambda - C_{2,N}(\theta)'\, W_N^{1/2} G_N(\theta) \end{bmatrix}.$$

The first set of equations are the $k$ dimensions of $W_N^{1/2}G_N(\theta)$ that locally vary with $\theta$. The parameters $\theta$ are local coordinates for these $k$ dimensions. The second set of equations gives the $(m-k)$ dimensions of $W_N^{1/2}G_N(\theta)$ that are locally orthogonal to $\theta$. The parameters $\lambda$ are local coordinates for these $(m-k)$ dimensions. For each value of $\theta$, the parameters $\lambda$ span the space that is the orthogonal complement of the space spanned by $\theta$.

⁴The spectral decomposition is not unique, raising a potential concern. However, the invariance of inference with respect to alternative spectral decompositions is documented in Theorem 2 and Theorem 6.

This parameterization of the $m$-dimensional space can be used to obtain parameter estimates by solving

$$\Psi_N(\theta,\lambda) = 0.$$

This set of estimation equations will be used in the saddlepoint approximation. The function can also be written $\Psi_N(\theta,\lambda) = \frac{1}{N}\sum_{i=1}^N \psi(x_i,\theta,\lambda)$ where

$$\psi(x_i,\theta,\lambda) = \begin{bmatrix} C_{1,N}(\theta)'\, W_N^{1/2} g_i(\theta) \\ \lambda - C_{2,N}(\theta)'\, W_N^{1/2} g_i(\theta) \end{bmatrix}.$$

A generic value of $\psi(x_i,\theta,\lambda)$ will be denoted $\psi(x,\theta,\lambda)$. These estimation equations give a just identified system of $m$ equations in $m$ unknowns. The moment conditions $\psi(x,\theta,\lambda)$ summarize the first order conditions for GMM estimation and the overidentifying restrictions statistics.
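To make the construction concrete, here is a minimal numerical sketch (Python; the moment function g, its Jacobian dG, and the data x are hypothetical placeholders, and a symmetric square root stands in for the Cholesky factor of the weighting matrix):

import numpy as np
from scipy.linalg import sqrtm, eigh

def psi_system(theta, lam, x, g, dG, W):
    # W^{1/2}: symmetric square root used here in place of the Cholesky factor.
    W_half = np.real(sqrtm(W))
    M = W_half @ dG(x, theta)                    # M_N(theta), m x k
    P = M @ np.linalg.solve(M.T @ M, M.T)        # projection onto span(M)
    vals, C = eigh(P)                            # eigenvalues are ~0 or ~1
    C1 = C[:, vals > 0.5]                        # spans the identifying space
    C2 = C[:, vals <= 0.5]                       # spans the overidentifying space
    G = g(x, theta).mean(axis=0)                 # G_N(theta)
    return np.concatenate([C1.T @ W_half @ G,
                           lam - C2.T @ W_half @ G])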

The column spans of $C_{1,N}(\theta)$ and $M_N(\theta)$ are the same, so the system of equations

$$C_{1,N}(\theta)'\, W_N^{1/2} G_N(\theta) = 0$$

is equivalent to

$$M_N(\theta)'\, W_N^{1/2} G_N(\theta) = 0,$$

and both imply the same parameter estimates, $\hat\theta$. This system of equations is solved independently of $\lambda$. The system of equations $\lambda - C_{2,N}(\theta)'\, W_N^{1/2} G_N(\theta) = 0$ implies the estimate

$$\hat\lambda = C_{2,N}\left(\hat\theta\right)'\, W_N^{1/2} G_N\left(\hat\theta\right).$$

An alternative differential geometric explanation of the estimation equations is presented in the Appendix section 8.8.

3.1 NOTES

1. The inner product of $\Psi_N(\theta,\lambda)$ will be called the extended-GMM objective function. The estimates are the parameters that minimize the inner product, which can be rewritten

$$Q_N(\theta,\lambda) = \Psi_N(\theta,\lambda)'\Psi_N(\theta,\lambda) = G_N(\theta)'W_N G_N(\theta) - 2\lambda' C_{2,N}(\theta)'\, W_N^{1/2} G_N(\theta) + \lambda'\lambda.$$

Because $\Psi_N(\theta,\lambda)$ is a just identified system of equations there is no need for an additional weighting matrix. When $\lambda = 0$ the extended-GMM objective function reduces to the traditional GMM objective function. The extended-GMM objective function is created from the traditional GMM objective function by appending a quadratic function in the tangent spaces of overidentifying restrictions.

2. The parameter estimates using the extended-GMM objective function agree with traditional results.

Theorem 1. If Assumptions 1-7 are satisfied, the asymptotic distribution for the extended-GMM estimators is

$$\sqrt{N}\begin{pmatrix}\hat\theta - \theta_0 \\ \hat\lambda\end{pmatrix} \sim N\left(\begin{bmatrix}0 \\ 0\end{bmatrix},\; \begin{bmatrix}\left(M(\theta_0)'\Sigma_g^{-1}M(\theta_0)\right)^{-1} & 0 \\ 0 & I_{(m-k)}\end{bmatrix}\right).$$

Proofs are in the appendix.

3. The estimates $\hat\theta$ and $\hat\lambda$ are independent up to first order asymptotics.

4. The overidentifying restrictions can be tested with

$$\begin{aligned}
N\hat\lambda'\hat\lambda &= N\, G_N(\hat\theta)'\, W_N^{1/2\prime}\, C_{2,N}(\hat\theta) C_{2,N}(\hat\theta)'\, W_N^{1/2} G_N(\hat\theta) \\
&= N\, G_N(\hat\theta)'\, W_N^{1/2\prime}\left(I_m - P_{M(\hat\theta),N}\right) W_N^{1/2} G_N(\hat\theta) \\
&= N\, G_N(\hat\theta)'\, W_N^{1/2\prime}\, W_N^{1/2} G_N(\hat\theta) \\
&= N\, G_N(\hat\theta)'\, W_N G_N(\hat\theta) \\
&= J,
\end{aligned}$$

where the third equality uses the first order conditions, $P_{M(\hat\theta),N}\, W_N^{1/2} G_N(\hat\theta) = 0$.

5. The asymptotic inference is invariant to the selection of the spectral decomposition.

Theorem 2. For the extended-GMM estimators, the asymptotic distributions for the parameters, $\theta$, and the parameters that test the overidentifying restrictions, $\lambda$, are invariant to the spectral decomposition that spans the tangent space.

6. To reduce notation let $\alpha \equiv \begin{bmatrix}\theta' & \lambda'\end{bmatrix}'$. A generic element $\psi(x_i,\alpha)$ will be denoted $\psi(x,\alpha)$. The estimation equations used in the saddlepoint approximation can be denoted

$$\Psi_N(\alpha) = 0.$$

The extended-GMM objective function will be denoted $Q_N(\alpha) = \Psi_N(\alpha)'\Psi_N(\alpha)$.

7. The point of departure for the saddlepoint approximation is a system of $m$ estimation equations in $m$ parameters. These will be the first order conditions from minimization of the extended-GMM objective function. The first order conditions imply

$$\frac{\partial Q_N(\hat\alpha)}{\partial\alpha} = 2\,\frac{\partial\Psi_N(\hat\alpha)'}{\partial\alpha}\,\Psi_N(\hat\alpha) = 0.$$

Assuming $M(\theta)$ and $W_N$ are full rank implies the first order conditions are equivalent to

$$\Psi_N(\hat\alpha) = 0.$$

8. A minimum or root of the extended-GMM objective function can be associated with either a local maximum or a local minimum of the original GMM objective function. Attention must be focused on only the minima of the extended-GMM objective function associated with the local minima of the original GMM objective function. This will be done with an indicator function

$$I_{pos}(Q_N(\theta)) = \begin{cases} 1, & \text{if } Q_N(\theta) \text{ is positive definite} \\ 0, & \text{otherwise.} \end{cases}$$

This indicator function uses the original GMM objective function, not the extended-GMM objective function.
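In practice the indicator can be evaluated from a numerical Hessian of the original GMM objective. A minimal sketch (Python; "objective" is a hypothetical callable returning $G_N(\theta)'W_NG_N(\theta)$, not code from the paper):

import numpy as np

def ipos(objective, theta, eps=1e-5):
    # Finite-difference Hessian of the original GMM objective at theta;
    # the indicator equals 1 when the Hessian is positive definite.
    k = len(theta)
    H = np.zeros((k, k))
    I = np.eye(k)
    for i in range(k):
        for j in range(k):
            hi, hj = eps * I[i], eps * I[j]
            H[i, j] = (objective(theta + hi + hj) - objective(theta + hi - hj)
                       - objective(theta - hi + hj) + objective(theta - hi - hj)) / (4 * eps ** 2)
    return float(np.all(np.linalg.eigvalsh((H + H.T) / 2) > 0.0))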

4 THE SADDLEPOINT APPROXIMATION

The saddlepoint density replaces the central limit theorem in the traditional GMM distribution theory. The central limit theorem uses information about the location and convexity of the GMM objective function at the global minimum. The saddlepoint approximation uses information about the convexity of the objective function at each point in the parameter space. The central limit theorem is built on a two-term Taylor series expansion, i.e. a linear approximation, of the characteristic function about the mean. A higher order Taylor series expansion about the mean can be used to obtain additional precision. This results in an Edgeworth expansion. Because the expansion is at the distribution's mean, the Edgeworth expansion gives a significantly better approximation at the mean of the distribution, $O(N^{-1})$ versus $O(N^{-1/2})$. Unfortunately, the quality of the approximation can deteriorate significantly for values away from the mean. The saddlepoint approximation exploits these characteristics of the Edgeworth expansion to obtain an improved approximation. Instead of a single Taylor series expansion, the saddlepoint uses multiple expansions to obtain improved accuracy, one expansion at every value in the parameter space.

The significantly improved approximation of the Edgeworth expansion only occurs at the mean of the distribution. To obtain this improvement at an arbitrary value in the parameter space, a conjugate distribution is used. For the parameter value $\alpha$ the conjugate density is

$$h_{N,\beta}(x) = \frac{\exp\left\{\frac{\beta'}{N}\psi(x,\alpha)\right\} dF(x)}{\int \exp\left\{\frac{\beta'}{N}\psi(w,\alpha)\right\} dF(w)}.$$

The object of interest is the distribution of $\Psi(\alpha)$, not an individual element $\psi(\alpha)$; hence the parameter $\beta$ is normalized by $N$.

At the parameter value of interest, $\alpha$, the original distribution is transformed to a conjugate distribution. The conjugate distribution is well defined for arbitrary values of $\beta$. This is a degree of freedom, i.e., $\beta$ can be selected optimally for a given $\alpha$. A specific conjugate distribution is selected so its mean is transformed back to the original distribution at the value of interest. Thus, $\beta$ is selected to satisfy the saddlepoint equation

$$\int \psi(x,\alpha)\exp\left\{\frac{\beta'}{N}\psi(x,\alpha)\right\} dF(x) = 0.$$

Denote the solution to the saddlepoint equation as $\beta(\alpha)$. An Edgeworth expansion is calculated for the conjugate distribution defined by $\beta(\alpha)$. This Edgeworth expansion is then transformed back to give the saddlepoint approximation to the original distribution at the parameter value of interest, $\alpha$.
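A minimal sketch of solving the sample analogue of the saddlepoint equation by Newton's method (Python; the $N \times m$ matrix psi of values $\psi(x_i,\alpha)$ is a hypothetical input, not an interface from the paper):

import numpy as np

def saddlepoint_beta(psi, tol=1e-10, max_iter=100):
    N, m = psi.shape
    beta = np.zeros(m)
    for _ in range(max_iter):
        w = np.exp(psi @ beta / N)              # exp{beta' psi_i / N}
        eq = psi.T @ w                          # sum_i psi_i exp{...}
        jac = psi.T @ (psi * w[:, None]) / N    # derivative with respect to beta
        step = np.linalg.solve(jac, eq)
        beta -= step                            # Newton update
        if np.linalg.norm(step) < tol:
            return beta                         # saddlepoint equation solved
    return None                                 # no solution: I(beta(alpha)) = 0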

The basic theorems from the Statistics literature are in Almudevar, Field and Robinson (2000), Field and Ronchetti (1994), Field (1982) and Ronchetti and Welsh (1994). To date the saddlepoint distribution theory in the Statistics literature is not well suited for empirical Economics. There are two problems. The first is the assumption that the objective function has a single extreme value. The second is the assumption that the saddlepoint equation has a solution for every value in the parameter space. A common feature of GMM objective functions is the existence of more than one local minimum. In addition, the nonlinearity of the moment conditions can make it impossible to solve the saddlepoint equation for an arbitrary value in the parameter space. The basic theorems from the Statistics literature need slight generalizations to allow for multiple local minima and the non-existence of a solution to the saddlepoint equation. The generalizations are contained in Theorems 3 and 4. The next two subsections elaborate.

4.1 DENSITY RESTRICTIONS

The empirical saddlepoint density applied to GMM moments requires two restrictions: the local behavior of the objective function must be associated with a local minimum and the parameters must be consistent with the observed data.

Identification for GMM implies there will only be one minimum, asymptotically. However, in finite samples there may not be enough data to accurately distinguish this asymptotic structure, i.e., often there are multiple local minima. Traditional GMM ignores multiple minima and restricts attention to the global minimum. The traditional asymptotic normal approximation to the GMM estimator is an approximation built on the local structure (location and convexity) of the global minimum. The saddlepoint approximation takes a different approach. The saddlepoint density approximates the sampling density for the location of solutions to the estimation equations. These include both local maxima and local minima. As with traditional GMM estimation, as the sample size increases one of the local minima will become the unique minimum. For the empirical saddlepoint density, attention is focused on the local minima by setting the saddlepoint density to zero if the first derivative of the estimation equation is not positive definite. The term

$$I_{pos}(Q_N(\theta))$$

will be used to denote the indicator function for the original GMM objective function being positive definite at the $\theta$ value in $\alpha$. A similar restriction was used in Skovgaard (1990)⁵.

The second restriction for the saddlepoint applied to the GMM estimation equations concerns the fact that the empirical saddlepoint equation may not have a solution. In this case the approximate density is set equal to zero. The lack of a solution to the empirical saddlepoint equation means the observed data is inconsistent with the selected parameter, $\alpha$. The indicator

$$I(\beta(\alpha))$$

equals one when the saddlepoint equation for $\alpha$ has a solution.

This type of restriction has occurred recently in the Statistics and Econometrics literature. The inability to find a solution to the empirical saddlepoint equation is equivalent to the restrictions on the sample mean in the selection of the parameters in the exponential tilting/maximum entropy estimation of Kitamura and Stutzer (1997). For the simple case of estimating the sample mean, the parameters must be restricted to the support of the observed sample. It is impossible to select nonnegative weights (probabilities) so that the weighted sum of the sample equals a value outside its observed range.

4.2 ADDITIONAL ASSUMPTIONS

To justify the empirical saddlepoint approximation, four results are needed.

1. The distribution for the local minima of the extended-GMM objective function associated with the local minima of the original GMM objective function.

2. The tilting relationship between the density for the zeros (local minima) of the extended-GMM objective function associated with the local minima of the GMM objective function when the observations are drawn from the population distribution and the density when the observations are drawn from the conjugate distribution.

3. The saddlepoint approximation, assuming the distribution for the observed series is known.

4. The empirical saddlepoint approximation, where the distribution for the observed series is replaced with the empirical distribution.

⁵In Skovgaard (1990) the restriction was applied to the equivalent of the extended-GMM objective function, not the equivalent of the original GMM objective function, i.e., the Skovgaard (1990) density would include both the local maxima and the local minima of the original GMM objective function.

4. The empirical saddlepoint approximation where the distribution for the ob- served series is replaced with the empirical distribution.

The first three are achieved by minor changes to the results in Almudevar, Field and Robinson (2000), denoted AFR. The changes concern the need to restrict attention to solutions of the estimation equations associated with local minima of the original GMM objective function. In AFR the density is for all solutions, even those associated with local maxima. Another change is to set the approximation to zero if the empirical saddlepoint equation does not have a solution.

The fourth result is achieved by minor changes to results in Ronchetti and Welsh (1994). The first change is to set the approximation to zero if the empirical saddlepoint equation does not have a solution. This is achieved by including an indicator function for the existence of a solution to the empirical saddlepoint equation. The other change is to allow multiple solutions to the estimation equations. This is achieved by applying the result in Ronchetti and Welsh (1994) to each solution of the estimation equations associated with a local minimum of the original GMM objective function.

Unlike traditional Econometrics, the saddlepoint approximation is well defined for some models lacking identification. Traditionally, it is assumed that the limiting objective function uniquely identifies the population parameter, e.g. Assumption 7. The saddlepoint approximation does not require such a strong assumption. There may be multiple solutions to the moment conditions associated with local minima of the GMM objective function. Define the set $T = \{\theta\in\Theta : E[g(x_i,\theta)] = 0\}$. If $T$ is a singleton, then this reduces to the traditional identification assumption. An individual element in $T$ will typically be denoted $\theta_0$.

The empirical saddlepoint approximation requires stronger assumptions than the assumptions needed for the first order normal approximation. These additional restrictions concern the existence and integrability of higher order derivatives of the moments. The saddlepoint density requires that the asymptotic behavior in the neighborhood of the global minimum of the GMM objective function also holds for the neighborhood of each element in $T$. Assumptions are also needed to ensure that the Edgeworth expansion of the conjugate density is well defined. Finally, the empirical saddlepoint approximation requires the existence of higher order moments and smoothness to ensure the sample averages converge uniformly to their limits.

The following assumptions are stated in terms of the moment conditions. The assumptions in AFR and RW are stated in terms of a just identified system of estimation equations. The "proofs" in the appendix show that the following assumptions are strong enough to ensure that the assumptions in AFR and RW are satisfied for the estimation equations derived above in section 3.

Assumption 1'. The $p$-dimensional series $x_i$ is iid from a distribution $F(x)$.

The moment conditions satisfy regularity conditions.

Assumption 2'. The function $g(x,\theta)$ is uniformly continuous in $x$ and has three derivatives with respect to $\theta$ which are uniformly continuous in $x$. The functions $g(x,\theta)$ and $\frac{\partial g(x,\theta)}{\partial\theta'}$ are measurable functions of $x$ for each $\theta\in\Theta$, and $E\left[\sup_{\theta\in\Theta}\left\|\frac{\partial g(x,\theta)}{\partial\theta'}\right\|\right] < \infty$, $E\left[g(x,\theta_0)g(x,\theta_0)'\right] < \infty$ and $E\left[\sup_{\theta\in\Theta}\|g(x,\theta)\|\right] < \infty$. Each element of $g(x,\theta)$ is uniformly square integrable.

Assumption 2' implies that $N$ times the GMM objective function will uniformly converge to the nonstochastic function $E[g(x,\theta)]'E[g(x,\theta)]$.

Assumption 3'. The parameter space $\Theta$ is a compact subset of $R^k$. $\theta_0 \in T$. Each element of $T$ is in the interior of $\Theta$. There exists a $\delta > 0$ such that for any two distinct elements of $T$, $\theta_{0,j}$ and $\theta_{0,i}$, $\left\|\theta_{0,j} - \theta_{0,i}\right\| > \delta$.

By Assumption 3', the population parameter is in the set of minima of the extended-GMM objective function associated with the local minima of the original GMM objective function. This assumption also ensures that the limiting objective function can be expanded in a Taylor series about each of its local minima. Finally, this assumption requires that any identification failure results in disjoint solutions.

Assumptions 1'-3' are sufficient to ensure that the GMM estimator is a root-$N$ consistent estimator of an element in $T$.

The traditional distributional assumption must hold for each local minimum.

Assumption 4'. The moment conditions evaluated at $\theta_0 \in T$ satisfy the central limit theorem

$$\sqrt{N}\, G_N(\theta_0) \sim N\left(0, \Sigma_g(\theta_0)\right)$$

where $\Sigma_g(\theta_0)$ may be different for each value of $\theta_0 \in T$.

Assumption 5'. The weighting matrix is selected so that it is always positive definite and

$$W_N \to \Sigma_g(\theta_0)^{-1}$$

where $\theta_0 \in T$.

This assumption can be satisfied by performing a first round estimation using a positive definite matrix as the weighting matrix. If $T$ is a singleton this ensures the first order distribution will be efficient.

The objective function must satisfy restrictions in a neighborhood of each solution to the extended-GMM objective function associated with a local minimum of the original GMM objective function.

Assumption 6'. For each $\theta_0 \in T$,

1. The matrix $M(\theta) \equiv E\left[\frac{\partial g(x,\theta)}{\partial\theta'}\right]$ is continuously differentiable and has full column rank for all $\theta$ in a neighborhood of $\theta_0$.

2. The matrix $\Sigma(\theta) \equiv E\left[g(x,\theta)g(x,\theta)'\right]$ is continuous and positive definite for all $\theta$ in a neighborhood of $\theta_0$.

3. The function

$$\int \frac{\partial g(x,\theta)}{\partial\theta'}\exp\left\{\beta' g(x,\theta)\right\} dF(x)$$

exists for $\beta$ in a set containing the origin.

4. For $1 \le i, j, s_1, s_2, s_3 \le m$ the integrals

$$\int\left\{\frac{\partial g_i(x,\alpha)}{\partial\alpha_{s_3}}\right\}^2 dF(x), \quad \int\left\{\frac{\partial g_i(x,\alpha)}{\partial\alpha_{s_3}}\, g_j(x,\alpha)\right\}^2 dF(x), \quad \int\left\{\frac{\partial^2 g_i(x,\alpha)}{\partial\alpha_{s_2}\partial\alpha_{s_3}}\right\}^2 dF(x),$$

$$\int\left\{\frac{\partial^2 g_i(x,\alpha)}{\partial\alpha_{s_2}\partial\alpha_{s_3}}\, g_j(x,\alpha)\right\} dF(x), \quad \int\left\{\frac{\partial^3 g_i(x,\alpha)}{\partial\alpha_{s_1}\partial\alpha_{s_2}\partial\alpha_{s_3}}\right\} dF(x)$$

are finite.

Assumption 6'.1 and Assumption 6'.2 are straightforward. Assumption 6'.3 ensures the saddlepoint equation will have a solution in the neighborhood of $\theta_0$. Assumption 6'.4 with Assumption 2' ensures the uniform convergence of the sample averages to their limits.

The next assumption restricts attention to parameter values in a $\tau$-ball of a fixed parameter value. Calculate the maximum distance between the derivative at $\alpha_0$ and the derivative of any other parameter value in $B_\tau(\alpha_0)$. If the derivative is zero for any parameter value in $B_\tau(\alpha_0)$, set the distance to infinity.

Define the random variable⁶

$$z(\theta_0,\tau) = \begin{cases}\displaystyle\sup_{\theta\in B_\tau(\theta_0)}\left\|\frac{\partial^2 g(x,\theta_0)'W_N g(x,\theta_0)}{\partial\theta\,\partial\theta'} - \frac{\partial^2 g(x,\theta)'W_N g(x,\theta)}{\partial\theta\,\partial\theta'}\right\|, & \frac{\partial^2 g(x,\theta)'W_N g(x,\theta)}{\partial\theta\,\partial\theta'}\text{ is positive definite } \forall\,\theta\in B_\tau(\theta_0), \\[1ex] \infty, & \text{otherwise.}\end{cases}$$

Now define the events in $\Omega$ such that this maximum deviation between the first derivatives is below some value $\gamma > 0$:

$$H(\theta_0,\gamma,\tau) = \{z(\theta_0,\tau) < \gamma\} \subset \Omega.$$

This restricts attention to events where the objective function has an invertible derivative and is fairly smooth.

Now define

$$R(x,\theta) = \begin{cases}\left(\frac{\partial^2 g(x,\theta_0)'W_N g(x,\theta_0)}{\partial\theta\,\partial\theta'}\right)^{-1}\frac{\partial g(x,\theta)'}{\partial\theta}\, W_N g(x,\theta), & \frac{\partial^2 g(x,\theta)'W_N g(x,\theta)}{\partial\theta\,\partial\theta'}\text{ is positive definite } \forall\,\theta\in B_\tau(\theta_0), \\[1ex] \infty, & \text{otherwise,}\end{cases}$$

and the density

$$f_{R(x,\theta)}\left(z;\, H(\theta,\gamma,\tau)\right) = \frac{\Pr\left\{\{R(x,\theta)\in B_\tau(z)\}\cap H(\theta,\gamma,\tau)\right\}}{m\left(B_\tau(z)\right)}.$$

Assumption 7'. For any compact set $A \subset \mathcal{A}$ and for any $0 < \gamma < 1$, there exist $\tau > 0$ and $\delta > 0$ such that $f_{R(x,\theta)}(z; H(\theta,\gamma,\tau))$ exists and is continuous and bounded by some fixed constant $K$ for any $\alpha\in A$ and $z \in B_\delta(0)$.

⁶Almudevar, Field and Robinson (2000) use $\|B^{-1}A - I\|$ as the distance between the two matrices $A$ and $B$. This paper will use the alternative distance measure $\|A - B\|$, which is defined for non-square matrices. This definition is more commonly used in the Econometrics literature, e.g., see Assumption 3.4 in Hansen (1982).


This is a high level assumption requiring the moment conditions have a bounded density in the neighborhood of zero. This is required to establish the existence of the density for the location of the local minima of the original GMM objective function.

The next assumption permits the moment conditions (and hence estimation equations) to possess a mixture of continuous and discrete random variables. For each $\theta$ let $D_\theta \subset R^m$ be a set with Lebesgue measure zero such that, by the Lebesgue decomposition theorem,

$$\Pr\left\{\frac{\partial g(x,\theta)'}{\partial\theta}\, W_N g(x,\theta) \in A\right\} = \Pr\left\{\frac{\partial g(x,\theta)'}{\partial\theta}\, W_N g(x,\theta) \in A\cap D_\theta\right\} + \int_A f_\theta\, dm$$

where $f_\theta$ may be an improper density. Let $I_i = 0$ if $g(x_i,\theta)\in D_\theta$ and $1$ otherwise. Assume the moment conditions can be partitioned into contributions from the continuous components and the discrete components.

Assumption 8'. There are iid random vectors $U_i^\theta = \left(W_i^\theta, V_{1i}^\theta, V_{2i}^\theta\right)$, where $W_i^\theta$ are jointly continuous random vectors of dimension $m$ and $V_{1i}^\theta$ and $V_{2i}^\theta$ are random vectors of dimension $m$ and $m$, respectively, such that

$$N T_\theta = \sum_{i=1}^N \frac{\partial g(x_i,\theta)'}{\partial\theta}\, W_N g(x_i,\theta) = \sum_{i=1}^N I_i W_i^\theta + \sum_{i=1}^N (1 - I_i) V_{1i}^\theta,$$

$$\operatorname{vec}\left(N S_\theta\right) = \operatorname{vec}\left(\sum_{i=1}^N \frac{\partial^2 g(x_i,\theta)'\, W_N g(x_i,\theta)}{\partial\theta\,\partial\theta'}\right) = A_\theta \sum_{i=1}^N I_i U_i^\theta,$$

where $A_\theta$ is of dimension $m^2$ by $2m + m$.

Let $U_j^{\theta\prime} = \left(W_j^{\theta\prime}, V_j^{\theta\prime}\right)$ have the distribution of $U_i^\theta$ conditional on $I_i = 1$ and $V_{1j}^{\theta\prime\prime}$ have the distribution of $V_{1j}^\theta$ conditional on $I_i = 0$.

Define

$$\tilde{V}_\theta = \frac{1}{N}\sum_{i=1}^K U_i^{\theta\prime},$$

where $K$ is a draw from a binomial distribution with parameters $N$ and $\rho = \Pr\{I_i = 1\}$, conditional on being positive. Define

$$\tilde{T}_\theta = \frac{1}{N}\sum_{j=1}^K W_j^{\theta\prime} + \frac{1}{N}\sum_{j=K+1}^N V_{1j}^{\theta\prime\prime}$$

when $0 < K < N$. Finally, define

$$\operatorname{vec}\left(N\tilde{S}_\theta\right) = A_\theta\,\frac{1}{N}\sum_{j=1}^K U_j^{\theta\prime}.$$

Assumption 9'. $\det\left(\tilde{S}_\theta\right) \neq 0$ and the transformation from $\tilde{T}_\theta$ to $\tilde{S}_\theta^{-1}\tilde{T}_\theta$ given $\tilde{V}_\theta$ is one-to-one with probability one.

Assumption 10'. $E\left[\exp\left\{\beta' U^\theta\right\}\right] < \infty$ for $\|\beta\| < \gamma$ for some $\gamma > 0$ and for all $\theta$.

Let

$$\Lambda(F;\theta,\theta^{*}) = \begin{cases}\left\|E\left[\frac{\partial^2 g(x,\theta)'W_N g(x,\theta)}{\partial\theta\,\partial\theta'}\right] - E\left[\frac{\partial^2 g(x,\theta^{*})'W_N g(x,\theta^{*})}{\partial\theta\,\partial\theta'}\right]\right\|, & E\left[\frac{\partial^2 g(x,\theta^{*})'W_N g(x,\theta^{*})}{\partial\theta\,\partial\theta'}\right]\text{ is positive definite,} \\[1ex] \infty, & \text{otherwise,}\end{cases}$$

and

$$\Lambda(F;\theta,\tau) = \sup_{\theta^{*}\in B_\tau(\theta)} \Lambda(F;\theta,\theta^{*}).$$

Assumption 11'. Given $0 < \gamma < 1$, there is a $\tau$ such that $\sup_{\theta\in B_\tau(\theta_0)}\Lambda(F_0;\theta,\tau) < \gamma$.

Assumption 12'. For fixed $\theta \in B_\tau(\theta_0)$, $\Lambda(\cdot\,;\cdot,\tau)$ is continuous at $(F_0,\theta)$ in the product topology.

Theorem 3. (Almudevar, Field and Robinson (2000)) Under Assumptions 1'-12', for $\theta_0 \in T$ there is, with probability $1 - e^{-cN}$ for some $c > 0$, a uniquely defined M-estimate $\hat\alpha$ on $B_\tau(\alpha_0)$ which has a density, restricted to $B_\tau(\alpha_0)$,

$$f_N(\alpha) = K_N \times I_{pos}(Q_N(\theta)) \times I(\beta(\alpha)) \times \left(\frac{N}{2\pi}\right)^{\frac{m}{2}}\left|E\left[\frac{\partial\psi(x,\alpha)}{\partial\alpha'}\right]\right|\,\left|E\left[\psi(x,\alpha)\psi(x,\alpha)'\right]\right|^{-1/2} \times \exp\left\{N\kappa_N(\beta(\alpha),\alpha)\right\}\left(1 + O\left(N^{-1}\right)\right)$$

where $\beta(\alpha)$ is the solution of the saddlepoint equation

$$\int \psi(x,\alpha)\exp\left\{\frac{\beta'}{N}\psi(x,\alpha)\right\} dF(x) = 0,$$

$$\kappa_N(\beta(\alpha),\alpha) = \ln\int \exp\left\{\frac{\beta(\alpha)'}{N}\psi(x,\alpha)\right\} dF(x),$$

the expectations are with respect to the conjugate density

$$h_N(x) = \frac{\exp\left\{\frac{\beta(\alpha)'}{N}\psi(x,\alpha)\right\} dF(x)}{\int \exp\left\{\frac{\beta(\alpha)'}{N}\psi(w,\alpha)\right\} dF(w)},$$

the term $I_{pos}(Q_N(\theta))$ is an indicator function that sets the approximation to zero if the original GMM objective function is not positive definite at the $\theta$ value in $\alpha$, the term $I(\beta(\alpha))$ is an indicator function that sets the approximation to zero if the saddlepoint equation does not have a solution, and $K_N$ is a constant ensuring the approximation integrates to one.

This theorem shows how the saddlepoint approximation is calculated. The saddlepoint approximation is nonnegative and gives a faster rate of convergence than the asymptotic normal approximation.

The calculation of the saddlepoint density requires knowledge of the distribution $F(x)$. In most Economic applications this is unknown. Replacing the distribution with the observed empirical distribution results in the empirical saddlepoint approximation. This replaces the expectations with respect to the distribution $F(x)$ with sample averages, i.e., the expectations with respect to the empirical distribution. The empirical saddlepoint density is defined to be

$$\hat{f}_N(\alpha) = K_N \times I_{pos}(Q_N(\theta)) \times I(\hat\beta_N(\alpha)) \times \left(\frac{N}{2\pi}\right)^{\frac{m}{2}}\left|\sum_{i=1}^N \frac{\partial\psi(x_i,\alpha)}{\partial\alpha'}\, p_i(\alpha)\right|\,\left|\sum_{i=1}^N \psi(x_i,\alpha)\psi(x_i,\alpha)'\, p_i(\alpha)\right|^{-1/2} \times \exp\left\{N\ln\left(\frac{1}{N}\sum_{i=1}^N \exp\left\{\frac{\hat\beta_N(\alpha)'}{N}\psi(x_i,\alpha)\right\}\right)\right\},$$

where $\hat\beta_N(\alpha)$ is the solution of

$$\sum_{i=1}^N \psi(x_i,\alpha)\exp\left\{\frac{\beta'}{N}\psi(x_i,\alpha)\right\} = 0$$

and

$$p_i(\alpha) = \frac{\exp\left\{\frac{\hat\beta_N(\alpha)'}{N}\psi(x_i,\alpha)\right\}}{\sum_{j=1}^N \exp\left\{\frac{\hat\beta_N(\alpha)'}{N}\psi(x_j,\alpha)\right\}}.$$

The empirical saddlepoint approximation is scaled by its (numerical) integral, $K_N$, to give a density that integrates to one.
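The pieces above can be assembled into a sketch that evaluates the empirical saddlepoint density at a single $\alpha$, up to the normalizing constant $K_N$ and the indicator $I_{pos}$, which are assumed to be handled separately (Python; the interface is hypothetical):

import numpy as np

def esp_density(psi, dpsi_stack, beta):
    # psi: N x m values of psi(x_i, alpha); dpsi_stack: N x m x m derivatives.
    N, m = psi.shape
    if beta is None:                         # saddlepoint equation unsolved
        return 0.0                           # I(beta(alpha)) = 0
    w = np.exp(psi @ beta / N)
    p = w / w.sum()                          # tilted probabilities p_i(alpha)
    A = np.tensordot(p, dpsi_stack, axes=1)  # sum_i p_i dpsi(x_i, alpha)
    B = psi.T @ (psi * p[:, None])           # sum_i p_i psi_i psi_i'
    kappa = np.log(w.mean())                 # empirical kappa_N(beta, alpha)
    return ((N / (2 * np.pi)) ** (m / 2) * abs(np.linalg.det(A))
            * np.linalg.det(B) ** -0.5 * np.exp(N * kappa))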

Using the empirical distribution instead of a known distribution gives a nonparametric procedure. It results in a reduction in accuracy, as noted in the next theorem. The appropriate rate of convergence is achieved by restricting attention to parameters in a shrinking neighborhood of the local minima of the estimation equations, $|\alpha - \alpha_0| < \Delta/\sqrt{N}$ for some $\Delta < \infty$. This can also be thought of as obtaining the density for $u = \sqrt{N}(\alpha - \alpha_0)$.

Theorem 4. (Ronchetti and Welsh (1994)) If Assumptions 1'-12' are satisfied for $\theta_0 \in T$,

$$\frac{f_N\left(\alpha_0 + u/\sqrt{N}\right)}{\hat{f}_N\left(\hat\alpha_N + u/\sqrt{N}\right)} = 1 + O_p\left(N^{-1/2}\right)$$

where the convergence is uniform for $u$ in any compact set.

4.3 ASYMPTOTIC NORMAL AND EMPIRICAL SADDLEPOINT APPROXIMATIONS

This section compares the empirical saddlepoint approximation with the asymptotic normal approximation. The two densities have similar structures. Their differences concern the “means” and “covariances.”

From Theorem 2, the asymptotic normal approximation can be written

$$\hat{f}_A(\alpha) = (2\pi)^{-\frac{m}{2}}\left|\frac{\hat\Sigma}{N}\right|^{-1/2}\exp\left\{-\frac{1}{2}\begin{bmatrix}\theta - \hat\theta \\ \lambda - \hat\lambda\end{bmatrix}'\left(\frac{\hat\Sigma}{N}\right)^{-1}\begin{bmatrix}\theta - \hat\theta \\ \lambda - \hat\lambda\end{bmatrix}\right\}$$

where $\hat\theta$ is the GMM estimator, i.e. the global minimum of the objective function, $\hat\lambda = C_2(\hat\theta)'W^{1/2}G_N(\hat\theta)$ and $\hat\Sigma$ is the covariance matrix given in Theorem 2.

The covariance matrix is the observed second derivative (convexity) of the extended-GMM objective function at the global minimum and can be estimated with $\hat\Sigma = (A'B^{-1}A)^{-1}$ where

$$A = \frac{1}{N}\sum_{i=1}^N \frac{\partial\psi(x_i,\hat\alpha)}{\partial\alpha'} \quad\text{and}\quad B = \frac{1}{N}\sum_{i=1}^N \psi(x_i,\hat\alpha)\psi(x_i,\hat\alpha)'.$$

The asymptotic normal approximation is built on the local behavior (location and convexity) of the objective function at its extreme value. The asymptotic normal approximation delivers a familiar, but restricted, structure for the approximation to the sampling distribution: unimodal, symmetric with the thin tails associated with the normal distribution. The asymptotic normal approximation is constructed from a linear approximation to the first order conditions. Linear first order conditions occur when the objective function is quadratic. So the asymptotic normal approximation is built on a quadratic approximation to the objective function. This asymptotic normal approximation is a poor approximation to the sampling distribution if the objective function is not well approximated by a quadratic over relevant parameter values. The local behavior (location and convexity) of the objective function at its extreme value will not contain enough information to accurately approximate the objective function in a large enough region.

Instead of focusing only on the global minimum, the saddlepoint approximation summarizes the information in the sample using the global shape of the objective function. The similarity of its structure to that of the asymptotic normal approximation is shown in the following theorem.

Theorem 5. The empirical saddlepoint approximation can be written

$$\hat{f}_S(\alpha) = K_N \times I_{pos}(Q_N(\alpha)) \times I(\beta(\alpha)) \times (2\pi)^{-\frac{m}{2}}\left|\frac{\hat\Sigma_C(\alpha)}{N}\right|^{-1/2}\exp\left\{-\frac{1}{2}\begin{bmatrix}\theta - \theta(\alpha) \\ \lambda - \lambda(\alpha)\end{bmatrix}'\left(\frac{\hat\Sigma_e(\alpha)}{N}\right)^{-1}\begin{bmatrix}\theta - \theta(\alpha) \\ \lambda - \lambda(\alpha)\end{bmatrix}\right\} \times \left\{1 + O\left(N^{-\frac{1}{2}}\right)\right\}.$$

The empirical saddlepoint approximation uses the convexity of the objective function at each point in the parameter space: $\hat\Sigma_C(\alpha)$ and $\hat\Sigma_e(\alpha)$. The matrix $\hat\Sigma_e(\alpha)$ is the convexity of the objective function at the parameter values estimated using the empirical distribution, i.e., $\hat\Sigma_e(\alpha) = (A_e(\alpha)'B_e(\alpha)^{-1}A_e(\alpha))^{-1}$ where

$$A_e(\alpha) = \frac{1}{N}\sum_{i=1}^N \frac{\partial\psi(x_i,\alpha)}{\partial\alpha'} \quad\text{and}\quad B_e(\alpha) = \frac{1}{N}\sum_{i=1}^N \psi(x_i,\alpha)\psi(x_i,\alpha)'.$$

The matrix $\hat\Sigma_C(\alpha)$ is the convexity of the objective function at the parameter values estimated using the conjugate distribution, i.e., $\hat\Sigma_C(\alpha) = (A_C(\alpha)'B_C(\alpha)^{-1}A_C(\alpha))^{-1}$ where

$$A_C(\alpha) = \sum_{i=1}^N p_i(\alpha)\frac{\partial\psi(x_i,\alpha)}{\partial\alpha'}, \quad B_C(\alpha) = \sum_{i=1}^N p_i(\alpha)\psi(x_i,\alpha)\psi(x_i,\alpha)'$$

and

$$p_i(\alpha) = \frac{\exp\left\{\beta(\alpha)'\psi(x_i,\alpha)\right\}}{\sum_{j=1}^N \exp\left\{\beta(\alpha)'\psi(x_j,\alpha)\right\}}.$$
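A compact sketch of the two convexity matrices (Python; dpsi_stack is a hypothetical $N \times m \times m$ array holding the derivatives $\partial\psi(x_i,\alpha)/\partial\alpha'$ and p the tilted probabilities):

import numpy as np

def sandwich(A, B):
    # (A' B^{-1} A)^{-1}, the form shared by Sigma_e and Sigma_C.
    return np.linalg.inv(A.T @ np.linalg.solve(B, A))

def convexity_matrices(psi, dpsi_stack, p):
    N = psi.shape[0]
    A_e = dpsi_stack.mean(axis=0)                  # empirical weights 1/N
    B_e = psi.T @ psi / N
    A_C = np.tensordot(p, dpsi_stack, axes=1)      # conjugate (tilted) weights
    B_C = psi.T @ (psi * p[:, None])
    return sandwich(A_e, B_e), sandwich(A_C, B_C)  # Sigma_e(alpha), Sigma_C(alpha)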

In addition, the empirical saddlepoint approximation uses each of the local minima as the "mean" for the parameters connected by a path where the objective function remains positive definite. This "mean" is denoted

$$\alpha(\alpha) = \begin{bmatrix}\theta(\alpha) \\ \lambda(\alpha)\end{bmatrix}.$$

The saddlepoint approximation is a natural generalization of the asymptotic normal approximation. If the objective function is quadratic then the saddlepoint approximation will be equal to the asymptotic normal approximation. However, if the objective function is not quadratic, the saddlepoint approximation will incorporate the global structure of the objective function.

If the objective function is not quadratic, but $T$ is a singleton, consistency implies that the mass of the sampling distribution converges to a shrinking neighborhood of the population parameter value. In this neighborhood the objective function converges to a quadratic. Hence, the saddlepoint approximation will converge to the familiar normal approximation.

The empirical saddlepoint approximation is the sampling distribution for the location of the local minima of the GMM objective function. This may be asymmetric and does not force the tail behavior associated with the normal approximation. A given sample may not be informative enough to distinguish the general location of the population parameter values. If the GMM objective function does not have a unique local minimum then the empirical saddlepoint approximation can have multiple modes.

5 BEHAVIOR IN FINITE SAMPLES

Monte Carlo simulations demonstrate the performance of the empirical saddlepoint distribution for the model presented in Hall and Horowitz (1996). Four different confidence intervals/tests derived from the saddlepoint approximation are compared with confidence intervals/tests created from the bootstrap, two-step GMM and the s-sets. For these simulations, the empirical saddlepoint is superior to the alternatives, although the bootstrap occasionally has performance comparable to the empirical saddlepoint approximation.

The model presented in Hall and Horowitz (1996) has been used in Imbens, Spady and Johnson (1998), Kitamura (2001), Schennach (2007) and elsewhere. The one parameter model is estimated with the two moment conditions

$$g(\theta) = \begin{bmatrix}\exp\{\mu - \theta(X+Z) + 3Z\} - 1 \\ Z\left(\exp\{\mu - \theta(X+Z) + 3Z\} - 1\right)\end{bmatrix}$$

where $\theta_0 = 3$, $X$ and $Z$ are iid scalars drawn from $N(0, s^2)$ and $\mu$ is a known constant set equal to $-\theta_0^2 s^2/2$. With one overidentifying restriction, $\lambda$ is a scalar for these simulations. The simulations consider sample sizes of $N = 50$ and $100$ and $s$ set equal to .2, .3, .4, .5 and .6. The $s$ parameter controls the noise in the system of equations. For a fixed sample size and population parameter value, larger values of $s$ make inference more difficult. This model can possess multiple local minima, a characteristic common in moment estimation. Regularly, two local minima occur because the first moment condition does not uniquely identify $\theta_0$: the first moment condition has two roots, $\theta = 0$ and $\theta = \theta_0$, while the second moment condition has a unique root at $\theta = \theta_0$. Random variability frequently results in a GMM objective function with two local minima.

In finite samples, the sampling distribution of the parameters of interest and the statistics that judge the validity of the overidentifying restrictions are not independent. Hence, both the marginal and conditional distributions are reported. Under the null, the marginal and conditional distributions converge to the same asymptotic distribution. However, in finite samples it is informative to report both. Asymptotically the statistics that test the validity of the overidentifying restrictions converge to zero, so the distribution of $\theta$ conditional on $\lambda = 0$ is reported. As the simulations demonstrate, this may generate appealing confidence intervals.

The empirical saddlepoint density is evaluated on an evenly spaced grid of 341×
