GMM Gradient Tests for Spatial Dynamic Panel Data Models


Munich Personal RePEc Archive


Taspinar, Suleyman and Dogan, Osman and Bera, Anil K.

2017

Online at https://mpra.ub.uni-muenchen.de/83570/

MPRA Paper No. 83570, posted 02 Jan 2018 23:00 UTC

GMM Gradient Tests for Spatial Dynamic Panel Data Models∗

Süleyman Taşpınar† Osman Doğan‡ Anil K. Bera§

September 29, 2016

Abstract

In this study, we formulate adjusted gradient tests for a spatial dynamic panel data (SDPD) model when the alternative model used to construct the tests deviates from the true data generating process. Following Bera et al. (2010), we introduce these adjusted gradient tests along with the standard ones within a GMM framework. These tests can be used to detect the presence of (i) the contemporaneous spatial lag terms, (ii) the time lag term, and (iii) the spatial time lag terms in a higher-order SDPD model. The adjusted tests have two advantages: (i) their null asymptotic distribution is a central chi-squared distribution irrespective of the misspecified alternative model, and (ii) their test statistics are computationally simple and require only the ordinary least-squares (OLS) estimates from a non-spatial two-way panel data model. We investigate the finite sample size and power properties of these tests through Monte Carlo studies. Our results indicate that the adjusted gradient tests have good finite sample properties.

JEL-Classification: C13, C21, C31.


Keywords: Spatial Dynamic Panel Data Model, SDPD, GMM, Robust LM Tests, GMM Gradient Tests, Inference.


∗This research was supported, in part, by a grant of computer time from the City University of New York High Performance Computing Center under NSF Grants CNS-0855217 and CNS-0958379. Please address all correspondence to Süleyman Taşpınar at STaspinar@qc.cuny.edu.

†Economics Program, Queens College, The City University of New York, United States, email: staspinar@qc.cuny.edu.

‡Istanbul Ulasim A.S., Project Department, Istanbul, Turkey, email: odogan10@gmail.com.

§Economics Program, University of Illinois, Illinois, United States, email: a-bera@uiuc.edu.


1 Introduction

In this study, we consider a spatial dynamic panel data (SDPD) model that includes a time lag term, spatial time lag terms, and contemporaneous spatial lag terms. The model is a high-order spatial autoregressive model in that it includes high orders of both the contemporaneous spatial lag term and the spatial time lag term. We formulate the GMM gradient tests, the adjusted GMM gradient tests, and the C(α) test to test hypotheses about the parameters of the time lag term, the spatial time lag terms, and the contemporaneous spatial lag terms.

In the literature, model specifications and estimation strategies, including the ML, GMM, and Bayesian methods, have received considerably more attention than specification testing and other forms of hypothesis tests for SDPD models. For two recent surveys, see Anselin et al. (2008) and Lee and Yu (2010b). Lee and Yu (2010a, 2011, 2012a), Yu and Lee (2010), and Yu et al. (2008, 2012) consider the ML approach for dynamic spatial panel data models when both the number of individuals and the number of time periods are large under various scenarios. The MLE suggested in these studies has an asymptotic bias, and the limiting distributions of the bias-corrected versions are properly centered when the number of time periods grows faster than the number of individuals. Elhorst (2005), Lee and Yu (2015), and Su and Yang (2015) consider the ML approach for dynamic panel data models that have spatial autoregressive processes in the disturbance terms. Parent and LeSage (2011) introduce the Bayesian MCMC method for a panel data model that accommodates dependence across space and time in the error components. Kapoor et al. (2007) extend the GMM approach of Kelejian and Prucha (2010) to a static spatial panel data model with error components. Lee and Yu (2014) consider the GMM approach for an SDPD model that has high orders of the contemporaneous spatial lag term and the spatial time lag term.

To date, the focus has been on specification testing for cross-sectional and static spatial panel data models (Anselin et al. 1996; Baltagi and Yang 2013; Baltagi et al. 2003, 2007; Debarsy and Ertur 2010). In this study, we introduce GMM-based tests for an SDPD model that has high orders of the contemporaneous spatial lag term and the spatial time lag term. In particular, we first consider the GMM-gradient test (or the LM test) of Newey and West (1987), which can be used to test non-linear restrictions on the parameter vector. We also consider the C(α) test within the GMM framework for the same model. While the computation of the GMM-gradient test requires an estimate of the optimal restricted GMME, the computation of the C(α) test statistic requires only a consistent estimate of the parameter vector. For both tests, we provide analytical justification for their asymptotic distributions within the context of our SDPD model.

Within the ML framework, Davidson and MacKinnon (1987), Saikkonen (1989), and Bera and Yoon (1993) show that the usual LM tests are not robust to local misspecifications in the alternative models. That is, the usual LM tests have a non-central chi-squared distribution under the null when the alternative model (locally) deviates from the true data generating process. Bera et al. (2010) extend this result to the GMM framework and show that the asymptotic distribution of the usual GMM-gradient test is a non-central chi-squared distribution when the alternative model deviates from the true data generating process. In such a context, the usual LM and GMM-gradient tests will over-reject the true null hypothesis. Therefore, Bera and Yoon (1993) and Bera et al. (2010) suggest robust (or adjusted) versions that have, asymptotically, central chi-squared distributions irrespective of the local deviations of the alternative models from the true data generating process.

Following Bera et al. (2010), we construct various adjusted GMM-gradient tests for an SDPD model. These tests can be used to detect the presence of (i) the spatial lag terms, (ii) the time lag term, and (iii) the spatial time lag terms in an SDPD model. Besides being robust to local misspecifications, these tests are computationally simple and require only estimates from a non-spatial two-way panel data model. Within the context of our SDPD model, we analytically derive the asymptotic distribution of the robust tests under both the null and local alternative hypotheses. We investigate the size and power properties of our suggested robust tests through a Monte Carlo simulation. The simulation results are in line with our theoretical findings and indicate that the robust tests have good size and power properties.

The rest of this paper is organized as follows. Section 2 presents the SDPD model under consideration and discusses its assumptions. Section 3 lays out the details of the GMM estimation approach for the model specification. Section 4 presents the GMM gradient tests, the adjusted GMM gradient tests, and the C(α) test. Section 5 lays out the details of the Monte Carlo design and presents the results. Section 6 closes with concluding remarks. Some of the technical derivations are relegated to an appendix.

2 The Model Specification and Assumptions

Using the standard notation, an SDPD model with both individual and time fixed effects is stated as

$$Y_{nt} = \sum_{j=1}^{p} \lambda_{j0} W_{nj} Y_{nt} + \gamma_0 Y_{n,t-1} + \sum_{j=1}^{p} \rho_{j0} W_{nj} Y_{n,t-1} + X_{nt}\beta_0 + c_{n0} + \alpha_{t0} l_n + V_{nt} \tag{2.1}$$

for $t = 1, 2, \ldots, T$, where $Y_{nt} = (y_{1t}, y_{2t}, \ldots, y_{nt})'$ is the $n \times 1$ vector of the dependent variable, $X_{nt}$ is the $n \times k_x$ matrix of non-stochastic exogenous variables with a matching parameter vector $\beta_0$, and $V_{nt} = (v_{1t}, \ldots, v_{nt})'$ is the $n \times 1$ vector of disturbances (or innovations). The spatial lags of the dependent variable at time $t$ and $t-1$ are, respectively, denoted by $W_{nj}Y_{nt}$ and $W_{nj}Y_{n,t-1}$ for $j = 1, \ldots, p$. Here, the $W_{nj}$ are $n \times n$ spatial weight matrices of known constants with zero diagonal elements, and $\lambda_0 = (\lambda_{10}, \ldots, \lambda_{p0})'$ and $\rho_0 = (\rho_{10}, \ldots, \rho_{p0})'$ are the spatial autoregressive parameters. The individual fixed effects are denoted by $c_{n0} = (c_{1,0}, \ldots, c_{n,0})'$ and the time fixed effect is denoted by $\alpha_{t0} l_n$, where $l_n$ is the $n \times 1$ vector of ones. For the identification of fixed effects, Lee and Yu (2014) impose the normalization $l_n' c_{n0} = 0$. For the estimation of the model, we assume that $Y_{n0}$ is observable. Let $\Theta$ be the parameter space of the model. In order to distinguish the true parameter vector from other possible values in $\Theta$, we state the model with the true parameter vector $\theta_0 = (\lambda_0', \delta_0')'$, where $\delta_0 = (\gamma_0, \rho_0', \beta_0')'$. Furthermore, for notational simplicity we let $S_n(\lambda) = I_n - \sum_{j=1}^{p} \lambda_j W_{nj}$, $S_n = S_n(\lambda_0)$, $A_n = S_n^{-1}\big(\gamma_0 I_n + \sum_{j=1}^{p} \rho_{j0} W_{nj}\big)$, $G_{nj}(\lambda) = W_{nj} S_n^{-1}(\lambda)$, $G_{nj} = G_{nj}(\lambda_0)$, and $N = n(T-1)$.
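The matrices defined above are easy to build numerically. The following is a minimal sketch, assuming NumPy, a single weight matrix ($p = 1$), and a hypothetical row-normalized ring-neighbour matrix; the parameter values are illustrative and are not taken from the paper. It forms $S_n(\lambda)$, $A_n$, and $G_{n1}$, and reports the row-sum norm of $\lambda_1 W_{n1}$ and the spectral radius of $A_n$.

```python
import numpy as np

def row_normalized_ring(n):
    """Hypothetical weight matrix: the two ring neighbours of each unit get weight 1/2."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def S_n(lams, Ws):
    """S_n(lambda) = I_n - sum_j lambda_j W_nj."""
    n = Ws[0].shape[0]
    return np.eye(n) - sum(l * W for l, W in zip(lams, Ws))

def A_n(lams, gamma, rhos, Ws):
    """A_n = S_n^{-1} (gamma I_n + sum_j rho_j W_nj)."""
    n = Ws[0].shape[0]
    rhs = gamma * np.eye(n) + sum(r * W for r, W in zip(rhos, Ws))
    return np.linalg.solve(S_n(lams, Ws), rhs)

# Illustrative values only (assumptions, not the paper's design).
n = 6
Ws = [row_normalized_ring(n)]
lams, gamma, rhos = [0.3], 0.2, [0.1]

S = S_n(lams, Ws)
A = A_n(lams, gamma, rhos, Ws)
G1 = Ws[0] @ np.linalg.inv(S)          # G_n1 = W_n1 S_n^{-1}

# Row-sum (infinity) norm of sum_j lambda_j W_nj and spectral radius of A_n.
inf_norm = np.linalg.norm(sum(l * W for l, W in zip(lams, Ws)), ord=np.inf)
spectral_radius = max(abs(np.linalg.eigvals(A)))
```

With a row-normalized weight matrix the row-sum norm equals $|\lambda_1|$, so the norm condition used later in the assumptions is straightforward to check in this special case.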

To avoid the incidental parameter problem, the model is transformed to wipe out the fixed effects. The individual effects can be eliminated from the model by employing the orthonormal eigenvector matrix $\big[F_{T,T-1}, \tfrac{1}{\sqrt{T}} l_T\big]$ of $J_T = I_T - \tfrac{1}{T} l_T l_T'$, where $F_{T,T-1}$ is the $T \times (T-1)$ eigenvector matrix corresponding to the eigenvalue one and $l_T$ is the $T \times 1$ vector of ones corresponding to the eigenvalue zero.¹ This orthonormal transformation can be applied by writing the model in an $n \times T$ system. Hence, the dependent variable is transformed as $[Y_{n1}, Y_{n2}, \ldots, Y_{nT}] \times F_{T,T-1} = [Y_{n1}^{*}, Y_{n2}^{*}, \ldots, Y_{n,T-1}^{*}]$, and also $[Y_{n0}, Y_{n1}, \ldots, Y_{n,T-1}] \times F_{T,T-1} = [Y_{n0}^{(*,-1)}, Y_{n1}^{(*,-1)}, \ldots, Y_{n,T-2}^{(*,-1)}]$. Similarly, $[X_{nj,1}, X_{nj,2}, \ldots, X_{nj,T}] \times F_{T,T-1} = [X_{nj,1}^{*}, X_{nj,2}^{*}, \ldots, X_{nj,T-1}^{*}]$ for $j = 1, \ldots, k_x$, $[V_{n1}, V_{n2}, \ldots, V_{nT}] \times F_{T,T-1} = [V_{n1}^{*}, V_{n2}^{*}, \ldots, V_{n,T-1}^{*}]$, and $[\alpha_{10}, \alpha_{20}, \ldots, \alpha_{T0}] \times F_{T,T-1} = [\alpha_{10}^{*}, \alpha_{20}^{*}, \ldots, \alpha_{T-1,0}^{*}]$. Since the columns of $\big[F_{T,T-1}, \tfrac{1}{\sqrt{T}} l_T\big]$ are orthonormal, we have $[c_{n0}, c_{n0}, \ldots, c_{n0}] \times F_{T,T-1} = 0_{n \times (T-1)}$. Thus, the transformed model does not include the individual fixed effects and can be written as

$$Y_{nt}^{*} = \sum_{j=1}^{p} \lambda_{j0} W_{nj} Y_{nt}^{*} + \gamma_0 Y_{n,t-1}^{(*,-1)} + \sum_{j=1}^{p} \rho_{j0} W_{nj} Y_{n,t-1}^{(*,-1)} + X_{nt}^{*}\beta_0 + \alpha_{t0}^{*} l_n + V_{nt}^{*} \tag{2.2}$$

for $t = 1, \ldots, T-1$. We consider the forward orthogonal difference (FOD) transformation for the orthonormal transformation. Hence, the terms in (2.2) can be explicitly stated as

$$V_{nt}^{*} = \Big(\frac{T-t}{T-t+1}\Big)^{1/2}\Big(V_{nt} - \frac{1}{T-t}\sum_{h=t+1}^{T} V_{nh}\Big), \qquad Y_{n,t-1}^{(*,-1)} = \Big(\frac{T-t}{T-t+1}\Big)^{1/2}\Big(Y_{n,t-1} - \frac{1}{T-t}\sum_{h=t}^{T-1} Y_{nh}\Big),$$

and the other terms are defined similarly. Let $V_{n,T-1}^{*} = (V_{n1}^{*\prime}, \ldots, V_{n,T-1}^{*\prime})'$. Then, $\mathrm{Var}\big(V_{n,T-1}^{*}\big) = \big(F_{T,T-1}' \otimes I_n\big)\, E\big(V_{nT} V_{nT}'\big)\big(F_{T,T-1} \otimes I_n\big) = \sigma_0^2 I_N$ by Assumption 1. The transformed model in (2.2) still includes the time fixed effect $\alpha_{t0}^{*} l_n$, which can be eliminated by pre-multiplying the model with $J_n = I_n - \tfrac{1}{n} l_n l_n'$. The resulting model is free of the fixed effects, for $t = 1, \ldots, T-1$,

$$J_n Y_{nt}^{*} = \sum_{j=1}^{p} \lambda_{j0} J_n W_{nj} Y_{nt}^{*} + \gamma_0 J_n Y_{n,t-1}^{(*,-1)} + \sum_{j=1}^{p} \rho_{j0} J_n W_{nj} Y_{n,t-1}^{(*,-1)} + J_n X_{nt}^{*}\beta_0 + J_n V_{nt}^{*}. \tag{2.3}$$

The consistency and asymptotic normality of the GMME of $\theta_0$ are established under Assumptions 1 through 5.²

1 This orthonormal matrix has the following properties: (i) $J_T F_{T,T-1} = F_{T,T-1}$ and $J_T l_T = 0_{T \times 1}$, (ii) $F_{T,T-1}' F_{T,T-1} = I_{T-1}$ and $F_{T,T-1}' l_T = 0_{(T-1) \times 1}$, (iii) $F_{T,T-1} F_{T,T-1}' + \tfrac{1}{T} l_T l_T' = I_T$, and (iv) $F_{T,T-1} F_{T,T-1}' = J_T$.

Assumption 1. The innovations $v_{it}$ are independently and identically distributed across $i$ and $t$, and satisfy $E(v_{it}) = 0$, $E(v_{it}^2) = \sigma_0^2$, and $E|v_{it}|^{4+\eta} < \infty$ for some $\eta > 0$ for all $i$ and $t$.

Assumption 2. The spatial weight matrices $W_{nj}$ are uniformly bounded in row and column sums in absolute value for $j = 1, \ldots, p$, and $\big\|\sum_{j=1}^{p} \lambda_{j0} W_{nj}\big\|_{\infty} < 1$. Moreover, $S_n^{-1}(\lambda)$ exists and is uniformly bounded in row and column sums in absolute value for all values of $\lambda$ in a compact parameter space.

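Assumption 3. Let $\eta > 0$ be a real number. Assume that $X_{nt}$, $c_{n0}$, and $\alpha_{t0}$ are non-stochastic terms satisfying (i) $\sup_{n,T} \frac{1}{nT} \sum_{t=1}^{T}\sum_{i=1}^{n} |x_{it,l}|^{2+\eta} < \infty$ for $l = 1, \ldots, k_x$, where $x_{it,l}$ is the $(i,t)$th element of the $l$th column, (ii) $\lim_{n \to \infty} \frac{1}{n(T-1)} \sum_{t=1}^{T-1} X_{nt}^{*\prime} J_n X_{nt}^{*}$ exists and is non-singular, and (iii) $\sup_{T} \frac{1}{T}\sum_{t=1}^{T} |\alpha_{t0}|^{2+\eta} < \infty$ and $\sup_{n} \frac{1}{n}\sum_{i=1}^{n} |c_{i0}|^{2+\eta} < \infty$.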

Assumption 4. The DGP for the initial observations is $Y_{n0} = \sum_{h=0}^{h^{*}} A_n^{h} S_n^{-1}\big(c_{n0} + X_{n,-h}\beta_0 + \alpha_{-h,0} l_n + V_{n,-h}\big)$, where $h^{*}$ could be finite or infinite.

Assumption 5. The elements of $\sum_{h=0}^{\infty} \mathrm{abs}\big(A_n^{h}\big)$ are uniformly bounded in row and column sums in absolute value, where $[\mathrm{abs}(A_n)]_{ij} = |A_{n,ij}|$.

3 The GMM Estimation Approach

In this section, we summarize the GMM estimation approach for (2.3) under both large $T$ and finite $T$ scenarios. The model in (2.3) indicates that IVs are needed for $W_{nj}Y_{nt}^{*}$, $Y_{n,t-1}^{(*,-1)}$, and $W_{nj}Y_{n,t-1}^{(*,-1)}$ for each $t$. Before we introduce the set of moment functions, it will be convenient to introduce some further notation. Let $Z_{nt}^{*} = \big(Y_{n,t-1}^{(*,-1)}, W_{n1}Y_{n,t-1}^{(*,-1)}, \ldots, W_{np}Y_{n,t-1}^{(*,-1)}, X_{nt}^{*}\big)$, $J_{n,T-1} = I_{T-1} \otimes J_n$, and $V_{n,T-1}^{*}(\theta) = \big(V_{n1}^{*\prime}(\theta), \ldots, V_{n,T-1}^{*\prime}(\theta)\big)'$, where $V_{nt}^{*}(\theta) = S_n(\lambda) Y_{nt}^{*} - Z_{nt}^{*}\delta - \alpha_t^{*} l_n$. We consider the following $(m+q) \times 1$ vector of moment functions

$$g_{nT}(\theta) = \begin{pmatrix} V_{n,T-1}^{*\prime}(\theta) J_{n,T-1} P_{n1,T-1} J_{n,T-1} V_{n,T-1}^{*}(\theta) \\ V_{n,T-1}^{*\prime}(\theta) J_{n,T-1} P_{n2,T-1} J_{n,T-1} V_{n,T-1}^{*}(\theta) \\ \vdots \\ V_{n,T-1}^{*\prime}(\theta) J_{n,T-1} P_{nm,T-1} J_{n,T-1} V_{n,T-1}^{*}(\theta) \\ Q_{n,T-1}' J_{n,T-1} V_{n,T-1}^{*}(\theta) \end{pmatrix}. \tag{3.1}$$

In (3.1), $P_{nj,T-1} = I_{T-1} \otimes P_{nj}$, where $P_{nj}$ is the $n \times n$ quadratic moment matrix satisfying $\mathrm{tr}(P_{nj}J_n) = 0$ for $j = 1, \ldots, m$, and $Q_{n,T-1} = \big(Q_{n1}', \ldots, Q_{n,T-1}'\big)'$ is the $N \times q$ linear IV matrix such that $q \geq k_x + 2p + 1$. Under Assumptions 1-4, it can be shown that $\frac{1}{N}\frac{\partial g_{nT}(\theta_0)}{\partial \theta'} = D_{nT} + R_{nT} + O\big(\tfrac{1}{\sqrt{nT}}\big)$, where $D_{nT}$ is $O(1)$ and $R_{nT}$ is $O\big(\tfrac{1}{T}\big)$.³

2 For interpretations and implications of these assumptions, see Lee and Yu (2014) and Kelejian and Prucha (2010).
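To make the structure of (3.1) concrete, the following sketch, assuming NumPy, stacks the quadratic and linear moments for a given residual vector. The residuals, the IV matrix, and the ring weight matrix are random or hypothetical stand-ins rather than quantities from the paper; the recentring step is one simple way (an assumption here, not necessarily the authors' choice) to obtain a matrix with $\mathrm{tr}(P_{nj}J_n) = 0$.

```python
import numpy as np

def centering(n):
    """J_n = I_n - (1/n) l_n l_n'."""
    return np.eye(n) - np.ones((n, n)) / n

def moment_vector(V_star, P_list, Q, n, T):
    """Stack the m quadratic moments V'J P_j J V and the q linear moments Q'J V."""
    J = np.kron(np.eye(T - 1), centering(n))                   # J_{n,T-1}
    JV = J @ V_star
    quad = [JV @ np.kron(np.eye(T - 1), P) @ JV for P in P_list]
    lin = Q.T @ JV
    return np.concatenate([np.array(quad), lin])

rng = np.random.default_rng(0)
n, T, q = 5, 4, 3
N = n * (T - 1)

# Hypothetical ring weight matrix with zero diagonal.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

# Recentre W so that tr(P J_n) = 0; this uses tr(J_n J_n) = n - 1.
J_n = centering(n)
P = W - (np.trace(W @ J_n) / (n - 1)) * J_n

V_star = rng.standard_normal(N)        # stand-in for V*_{n,T-1}(theta)
Q = rng.standard_normal((N, q))        # stand-in for the linear IV matrix Q_{n,T-1}
g = moment_vector(V_star, [P], Q, n, T)
```

The Kronecker products are formed explicitly only for readability; for large $n$ and $T$ one would apply $J_n$ and $P_{nj}$ period by period instead.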

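Let $\mathrm{vec}_D(\cdot)$ be the operator that creates a column vector from the diagonal elements of an input square matrix. For the optimal GMM estimation, we need to calculate the covariance matrix of the moment functions, $E\big(g_{nT}(\theta_0)\, g_{nT}'(\theta_0)\big)$, which can be approximated by

$$\Sigma_{nT} = \sigma_0^4 \begin{pmatrix} \frac{1}{N}\Delta_{nm,T} & 0_{m \times q} \\ 0_{q \times m} & \frac{1}{\sigma_0^2}\frac{1}{N} Q_{n,T-1}' J_{n,T-1} Q_{n,T-1} \end{pmatrix} + \frac{1}{N}\big(\mu_4 - 3\sigma_0^4\big) \begin{pmatrix} \omega_{nm,T}'\,\omega_{nm,T} & 0_{m \times q} \\ 0_{q \times m} & 0_{q \times q} \end{pmatrix}, \tag{3.2}$$

where $\omega_{nm,T} = \big[\mathrm{vec}_D(J_{n,T-1}P_{n1,T-1}J_{n,T-1}), \ldots, \mathrm{vec}_D(J_{n,T-1}P_{nm,T-1}J_{n,T-1})\big]$, and

$$\Delta_{nm,T} = \big[\mathrm{vec}(J_{n,T-1}P_{n1,T-1}'J_{n,T-1}), \ldots, \mathrm{vec}(J_{n,T-1}P_{nm,T-1}'J_{n,T-1})\big]' \times \big[\mathrm{vec}(J_{n,T-1}P_{n1,T-1}^{s}J_{n,T-1}), \ldots, \mathrm{vec}(J_{n,T-1}P_{nm,T-1}^{s}J_{n,T-1})\big],$$

where $A_n^{s} = A_n + A_n'$ for any square matrix $A_n$.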

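Let $\hat\Sigma_{nT}$ be a consistent estimate of $\Sigma_{nT}$. Then, the optimal GMME is defined by

$$\hat\theta_{nT} = \arg\min_{\theta \in \Theta}\; g_{nT}'(\theta)\,\hat\Sigma_{nT}^{-1}\, g_{nT}(\theta). \tag{3.3}$$

Under Assumptions 1-5, Lee and Yu (2014) show that when both $T$ and $n$ tend to infinity⁴:

$$\sqrt{N}\big(\hat\theta_{nT} - \theta_0\big) \xrightarrow{d} N\Big(0, \big[\mathrm{plim}_{n,T \to \infty}\, D_{nT}'\,\Sigma_{nT}^{-1} D_{nT}\big]^{-1}\Big). \tag{3.4}$$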

When $T$ is finite, the GMME in (3.3) is still consistent and unbiased, but its limiting covariance matrix is different, since the additional term $R_{nT} = O\big(\tfrac{1}{T}\big)$ does not vanish. Hence, when $T$ is finite, the asymptotic covariance matrix of $\sqrt{N}\big(\hat\theta_{nT} - \theta_0\big)$ is given by $\big[\mathrm{plim}_{n \to \infty} \big(D_{nT} + R_{nT}\big)'\,\Sigma_{nT}^{-1}\big(D_{nT} + R_{nT}\big)\big]^{-1}$.

3 The explicit forms for $D_{nT}$ and $R_{nT}$ are not required for our testing results, hence they are not given here. For these terms, see Lee and Yu (2014).

4 Lee and Yu (2014) state the identification conditions. Here, we simply assume that the parameter vector is identified.


4 The GMM Gradient Tests

In this section, we consider various versions of the gradient test (LM test). Let $r: \mathbb{R}^{2p+k_x+1} \to \mathbb{R}^{k_r}$ be a twice continuously differentiable function, and assume that $R(\theta) = \frac{\partial r(\theta)}{\partial \theta'}$ has rank $k_r$. Consider the implicit restrictions denoted by the null hypothesis $H_0: r(\theta_0) = 0$. Define $\hat\theta_{nT,r} = \arg\min_{\{\theta:\, r(\theta) = 0\}} Q_n$, where $Q_n = g_{nT}'(\theta)\,\hat\Sigma_{nT}^{-1}\, g_{nT}(\theta)$, as a restricted (or constrained) optimal GMME.

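In order to give a general argument, consider the following partition of $\theta = (\beta', \psi', \varphi')'$, where $\psi$ and $\varphi$ are, respectively, $k_\psi \times 1$ and $k_\varphi \times 1$ vectors such that $k_\psi + k_\varphi = 2p + 1$. In the context of our model, $\psi$ and $\varphi$ can be any combinations of the remaining parameters, namely, $(\lambda', \gamma, \rho')'$. Let $G_a(\theta) = \frac{1}{N}\frac{\partial g_{nT}(\theta)}{\partial a'}$ and $C_a(\theta) = G_a'(\theta)\,\hat\Sigma_{nT}^{-1}\,\bar g_{nT}(\theta)$, where $a \in \{\beta, \psi, \varphi\}$ and $\bar g_{nT}(\theta) = \frac{1}{N} g_{nT}(\theta)$. Define $G(\theta) = \big(G_\beta(\theta), G_\psi(\theta), G_\varphi(\theta)\big)$, $C(\theta) = \big(C_\beta'(\theta), C_\psi'(\theta), C_\varphi'(\theta)\big)'$, and $B(\theta) = G'(\theta)\,\hat\Sigma_{nT}^{-1}\, G(\theta)$. Finally, let $\mathcal{G}_a = \mathrm{plim}_{n,T \to \infty} \frac{1}{N}\frac{\partial g_{nT}(\theta_0)}{\partial a'}$ for $a \in \{\beta, \psi, \varphi\}$. Define $\mathcal{G} = \big(\mathcal{G}_\beta, \mathcal{G}_\psi, \mathcal{G}_\varphi\big)$ and $\mathcal{H} = \mathrm{plim}_{n,T \to \infty} \big(D_{nT} + R_{nT}\big)'\,\hat\Sigma_{nT}^{-1}\big(D_{nT} + R_{nT}\big)$. We consider the following partition of $B(\theta)$ and $\mathcal{H}$:

$$B(\theta) = \begin{pmatrix} B_\beta(\theta) & B_{\beta\psi}(\theta) & B_{\beta\varphi}(\theta) \\ B_{\psi\beta}(\theta) & B_\psi(\theta) & B_{\psi\varphi}(\theta) \\ B_{\varphi\beta}(\theta) & B_{\varphi\psi}(\theta) & B_\varphi(\theta) \end{pmatrix}, \qquad \mathcal{H} = \begin{pmatrix} \mathcal{H}_\beta & \mathcal{H}_{\beta\psi} & \mathcal{H}_{\beta\varphi} \\ \mathcal{H}_{\psi\beta} & \mathcal{H}_\psi & \mathcal{H}_{\psi\varphi} \\ \mathcal{H}_{\varphi\beta} & \mathcal{H}_{\varphi\psi} & \mathcal{H}_\varphi \end{pmatrix}. \tag{4.1}$$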

With the notation introduced, the standard LM test statistic for $H_0: r(\theta_0) = 0$ is defined in the following way (Newey and West 1987):

$$LM = N\, C'\big(\hat\theta_{nT,r}\big)\, B^{-1}\big(\hat\theta_{nT,r}\big)\, C\big(\hat\theta_{nT,r}\big). \tag{4.2}$$

A similar test is the C(α) test.⁵ This test is designed to deal with nuisance parameters when testing the parameter of main interest (Bera and Bilias 2001). Lee and Yu (2012b) investigate the finite sample properties of this test for a cross-sectional autoregressive model. Their simulation results indicate that this test can be useful for testing the possible presence of spatial correlation through a spatial lag in the spatial autoregressive (SAR) model. Here, we provide a general description of this test within the context of our SDPD model. By the implicit function theorem, the set of $k_r$ restrictions on $\theta_0$ can also be stated as $h(\xi_0) = \theta_0$, where $h: \mathbb{R}^{q} \to \mathbb{R}^{2p+k_x+1}$ is continuously differentiable, $\xi_0$ contains the free parameters, and $q = 2p + k_x + 1 - k_r$. Define $\hat\xi_{nT} = \arg\min_{\xi}\; g_{nT}'(h(\xi))\,\hat\Sigma_{nT}^{-1}\, g_{nT}(h(\xi))$. Then, we have $\hat\theta_{nT,r} = h\big(\hat\xi_{nT}\big)$. Let $\tilde\xi_{nT}$ be a consistent estimate of $\xi_0$. Denote $G_\xi(\theta) = \frac{1}{N}\frac{\partial g_{nT}(\theta)}{\partial \xi'}$, $C_\xi(\theta) = G_\xi'(\theta)\,\hat\Sigma_{nT}^{-1}\,\bar g_{nT}(\theta)$ with $\bar g_{nT}(\theta) = \frac{1}{N} g_{nT}(\theta)$, and $B_\xi(\theta) = G_\xi'(\theta)\,\hat\Sigma_{nT}^{-1}\, G_\xi(\theta)$.

Following the formulation suggested by Breusch and Pagan (1980), we state the C(α) test statistic in the following way:

$$C(\alpha) = N\Big[ C'\big(h(\tilde\xi_{nT})\big)\, B^{-1}\big(h(\tilde\xi_{nT})\big)\, C\big(h(\tilde\xi_{nT})\big) - C_\xi'\big(h(\tilde\xi_{nT})\big)\, B_\xi^{-1}\big(h(\tilde\xi_{nT})\big)\, C_\xi\big(h(\tilde\xi_{nT})\big) \Big]. \tag{4.3}$$

In (4.3), it is important to note that $\tilde\xi_{nT}$ can be any consistent estimator. In the case where $\tilde\xi_{nT}$ is an optimal GMME, the C(α) statistic reduces to the LM statistic, since $C_\xi\big(h(\tilde\xi_{nT})\big) = 0$ by definition.⁶ The asymptotic distributions of C(α) and LM are given in the following proposition.

5 Breusch and Pagan (1980) call this test the pseudo-LM test, since its test statistic is very similar to the form of the LM statistic.

6 In the context of ML estimation, the C(α) statistic reduces to the LM statistic when the restricted MLE is used. For details, see Bera and Bilias (2001).


Proposition 1. Given our stated assumptions, we have the following results under $H_0: r(\theta_0) = 0$:

$$LM \xrightarrow{d} \chi^2_{k_r}, \quad \text{and} \quad C(\alpha) \xrightarrow{d} \chi^2_{k_r}. \tag{4.4}$$

Proof. See Section C.1.

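Next, we consider the following joint null hypothesis:

$$H_0: \lambda_0 = 0,\ \rho_0 = 0,\ \gamma_0 = 0, \qquad H_A: \text{at least one parameter is not equal to zero}. \tag{4.5}$$

Under the joint null hypothesis, the model reduces to a two-way non-spatial panel data model, which can be estimated by OLS (for the estimation of two-way models, see Baltagi (2008) and Hsiao (2014)). The joint null hypothesis can be tested either by LM or by C(α). Let $\tilde\theta_{nT}$ be a constrained optimal GMME under the joint null hypothesis, and let $\hat\theta_{nT}$ be any other consistent estimator of $\theta_0$ under the null hypothesis. As stated in Newey and West (1987), the LM test statistic should be formulated with the optimal constrained GMME. Let $\vartheta = (\lambda', \rho', \gamma)'$. Then, the LM test statistic for the joint null hypothesis can be expressed as

$$LM_J\big(\tilde\theta_{nT}\big) = N\, C_J'\big(\tilde\theta_{nT}\big)\big[B_{\vartheta\cdot\beta}\big(\tilde\theta_{nT}\big)\big]^{-1} C_J\big(\tilde\theta_{nT}\big), \tag{4.6}$$

where $C_J\big(\tilde\theta_{nT}\big) = \big(C_\lambda'(\tilde\theta_{nT}), C_\rho'(\tilde\theta_{nT}), C_\gamma'(\tilde\theta_{nT})\big)'$, $B_{\vartheta\cdot\beta}\big(\tilde\theta_{nT}\big) = B_\vartheta\big(\tilde\theta_{nT}\big) - B_{\vartheta\beta}\big(\tilde\theta_{nT}\big) B_\beta^{-1}\big(\tilde\theta_{nT}\big) B_{\beta\vartheta}\big(\tilde\theta_{nT}\big)$, $B_{\vartheta\beta}\big(\tilde\theta_{nT}\big) = B_{\beta\vartheta}'\big(\tilde\theta_{nT}\big) = \big(B_{\lambda\beta}'(\tilde\theta_{nT}), B_{\rho\beta}'(\tilde\theta_{nT}), B_{\gamma\beta}'(\tilde\theta_{nT})\big)'$, and

$$B_\vartheta\big(\tilde\theta_{nT}\big) = \begin{pmatrix} B_\lambda(\tilde\theta_{nT}) & B_{\lambda\rho}(\tilde\theta_{nT}) & B_{\lambda\gamma}(\tilde\theta_{nT}) \\ B_{\rho\lambda}(\tilde\theta_{nT}) & B_\rho(\tilde\theta_{nT}) & B_{\rho\gamma}(\tilde\theta_{nT}) \\ B_{\gamma\lambda}(\tilde\theta_{nT}) & B_{\gamma\rho}(\tilde\theta_{nT}) & B_\gamma(\tilde\theta_{nT}) \end{pmatrix}. \tag{4.7}$$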

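Similarly, the consistent estimator $\hat\theta_{nT}$ can be used to formulate the following C(α) test for the joint null hypothesis:

$$C_J(\alpha) = N\Big[ C'\big(\hat\theta_{nT}\big)\, B^{-1}\big(\hat\theta_{nT}\big)\, C\big(\hat\theta_{nT}\big) - C_\beta'\big(\hat\theta_{nT}\big)\, B_\beta^{-1}\big(\hat\theta_{nT}\big)\, C_\beta\big(\hat\theta_{nT}\big) \Big]. \tag{4.8}$$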

The properties of the LM test can be investigated under a sequence of local alternatives (Bera and Bilias 2001; Bera and Yoon 1993; Bera et al. 2010; Davidson and MacKinnon 1987; Saikkonen 1989). Bera and Yoon (1993) and Bera et al. (2010) suggest robust LM tests when the alternative model is misspecified. We consider similar robust LM tests within the context of our model. In order to give a general result, we consider the LM test for $H_0^\psi: \psi_0 = 0$ when $H_0^\varphi: \varphi_0 = 0$, which can be stated as

$$LM_\psi = N\, C_\psi'\big(\tilde\theta_{nT}\big)\big[B_{\psi\cdot\beta}\big(\tilde\theta_{nT}\big)\big]^{-1} C_\psi\big(\tilde\theta_{nT}\big), \tag{4.9}$$

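where $B_{\psi\cdot\beta}\big(\tilde\theta_{nT}\big) = B_\psi\big(\tilde\theta_{nT}\big) - B_{\psi\beta}\big(\tilde\theta_{nT}\big) B_\beta^{-1}\big(\tilde\theta_{nT}\big) B_{\beta\psi}\big(\tilde\theta_{nT}\big)$. We investigate the asymptotic distribution of $LM_\psi$ under the sequences of local alternatives $H_A^\psi: \psi = \psi_0 + \delta_\psi/\sqrt{N}$ and $H_A^\varphi: \varphi = \varphi_0 + \delta_\varphi/\sqrt{N}$, where $(\psi_0', \varphi_0')'$ is the vector of hypothesized values under the null, and $\delta_\psi$ and $\delta_\varphi$ are bounded vectors. The distribution of (4.9), under $H_A^\psi$ and $H_A^\varphi$, can be investigated from the first order Taylor expansions of the pseudo-scores $C_\psi\big(\tilde\theta_{nT}\big)$ and $C_\beta\big(\tilde\theta_{nT}\big)$ around $\theta^{*} = \big(\beta_0', \psi_0' + \delta_\psi'/\sqrt{N}, \varphi_0' + \delta_\varphi'/\sqrt{N}\big)'$. These expansions can be written as

$$\sqrt{N}\, C_\psi\big(\tilde\theta_{nT}\big) = \sqrt{N}\, C_\psi(\theta^{*}) - G_\psi'(\theta^{*})\hat\Sigma_{nT}^{-1} G_\psi(\bar\theta)\,\delta_\psi - G_\psi'(\theta^{*})\hat\Sigma_{nT}^{-1} G_\varphi(\bar\theta)\,\delta_\varphi + \sqrt{N}\, G_\psi'(\theta^{*})\hat\Sigma_{nT}^{-1} G_\beta(\bar\theta)\big(\tilde\beta_{nT} - \beta_0\big) + o_p(1), \tag{4.10}$$

$$\sqrt{N}\, C_\beta\big(\tilde\theta_{nT}\big) = \sqrt{N}\, C_\beta(\theta^{*}) - G_\beta'(\theta^{*})\hat\Sigma_{nT}^{-1} G_\psi(\bar\theta)\,\delta_\psi - G_\beta'(\theta^{*})\hat\Sigma_{nT}^{-1} G_\varphi(\bar\theta)\,\delta_\varphi + \sqrt{N}\, G_\beta'(\theta^{*})\hat\Sigma_{nT}^{-1} G_\beta(\bar\theta)\big(\tilde\beta_{nT} - \beta_0\big) + o_p(1),$$ (4.11)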

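where $\bar\theta$ lies between $\tilde\theta_{nT}$ and $\theta^{*}$. Note that $\theta^{*} = \theta_0 + o_p(1)$ implies $\bar\theta = \theta_0 + o_p(1)$. By Lemma 1, we have $B(\theta^{*}) = \mathcal{H} + o_p(1)$ and $G'(\theta^{*})\hat\Sigma_{nT}^{-1} = \mathcal{G}'\Sigma_{nT}^{-1} + o_p(1)$. Then, from (4.10) and (4.11), we get the following fundamental result:

$$\sqrt{N}\, C_\psi\big(\tilde\theta_{nT}\big) = \Big[\mathcal{G}_\psi'\Sigma_{nT}^{-1} - \mathcal{H}_{\psi\beta}\mathcal{H}_\beta^{-1}\mathcal{G}_\beta'\Sigma_{nT}^{-1}\Big]\frac{1}{\sqrt{N}}\, g_{nT}(\theta_0) - \Big[\mathcal{H}_\psi - \mathcal{H}_{\psi\beta}\mathcal{H}_\beta^{-1}\mathcal{H}_{\beta\psi}\Big]\delta_\psi - \Big[\mathcal{H}_{\psi\varphi} - \mathcal{H}_{\psi\beta}\mathcal{H}_\beta^{-1}\mathcal{H}_{\beta\varphi}\Big]\delta_\varphi + o_p(1). \tag{4.12}$$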

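By Lemma 1, we have $\frac{1}{\sqrt{N}}\, g_{nT}(\theta_0) \xrightarrow{d} N\big(0, \mathrm{plim}_{n \to \infty}\Sigma_{nT}\big)$, and thus (4.12) implies that $\sqrt{N}\, C_\psi\big(\tilde\theta_{nT}\big) \xrightarrow{d} N\big(-\mathcal{H}_{\psi\cdot\beta}\delta_\psi - \mathcal{H}_{\psi\varphi\cdot\beta}\delta_\varphi,\ \mathcal{H}_{\psi\cdot\beta}\big)$, where $\mathcal{H}_{\psi\cdot\beta} = \mathcal{H}_\psi - \mathcal{H}_{\psi\beta}\mathcal{H}_\beta^{-1}\mathcal{H}_{\beta\psi}$ and $\mathcal{H}_{\psi\varphi\cdot\beta} = \mathcal{H}_{\psi\varphi} - \mathcal{H}_{\psi\beta}\mathcal{H}_\beta^{-1}\mathcal{H}_{\beta\varphi}$. Hence, $LM_\psi\big(\tilde\theta_{nT}\big) \xrightarrow{d} \chi^2_{k_\psi}(\vartheta_1)$ under $H_A^\psi$ and $H_A^\varphi$, where $\vartheta_1 = \delta_\psi'\mathcal{H}_{\psi\cdot\beta}\delta_\psi + \delta_\psi'\mathcal{H}_{\psi\varphi\cdot\beta}\delta_\varphi + \delta_\varphi'\mathcal{H}_{\psi\varphi\cdot\beta}'\delta_\psi + \delta_\varphi'\mathcal{H}_{\psi\varphi\cdot\beta}'\mathcal{H}_{\psi\cdot\beta}^{-1}\mathcal{H}_{\psi\varphi\cdot\beta}\delta_\varphi$ is the non-centrality parameter.⁷ We provide the distributional results for $LM_\psi\big(\tilde\theta_{nT}\big)$ and its robust version in the following proposition.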

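Proposition 2. Given our stated assumptions, the following results hold.

1. Under $H_A^\psi$ and $H_A^\varphi$, we have

$$LM_\psi\big(\tilde\theta_{nT}\big) \xrightarrow{d} \chi^2_{k_\psi}(\vartheta_1), \tag{4.13}$$

where $\vartheta_1 = \delta_\psi'\mathcal{H}_{\psi\cdot\beta}\delta_\psi + \delta_\psi'\mathcal{H}_{\psi\varphi\cdot\beta}\delta_\varphi + \delta_\varphi'\mathcal{H}_{\psi\varphi\cdot\beta}'\delta_\psi + \delta_\varphi'\mathcal{H}_{\psi\varphi\cdot\beta}'\mathcal{H}_{\psi\cdot\beta}^{-1}\mathcal{H}_{\psi\varphi\cdot\beta}\delta_\varphi$.

2. Under $H_A^\psi$ and $H_0^\varphi$, we have

$$LM_\psi\big(\tilde\theta_{nT}\big) \xrightarrow{d} \chi^2_{k_\psi}(\vartheta_2), \tag{4.14}$$

where $\vartheta_2 = \delta_\psi'\mathcal{H}_{\psi\cdot\beta}\delta_\psi$.

3. Under $H_0^\psi$ and $H_A^\varphi$, we have

$$LM_\psi\big(\tilde\theta_{nT}\big) \xrightarrow{d} \chi^2_{k_\psi}(\vartheta_3), \tag{4.15}$$

where $\vartheta_3 = \delta_\varphi'\mathcal{H}_{\psi\varphi\cdot\beta}'\mathcal{H}_{\psi\cdot\beta}^{-1}\mathcal{H}_{\psi\varphi\cdot\beta}\delta_\varphi$.

4. Let $C_\psi^{\star}\big(\tilde\theta_{nT}\big) = C_\psi\big(\tilde\theta_{nT}\big) - B_{\psi\varphi\cdot\beta}\big(\tilde\theta_{nT}\big) B_{\varphi\cdot\beta}^{-1}\big(\tilde\theta_{nT}\big) C_\varphi\big(\tilde\theta_{nT}\big)$ be the adjusted pseudo-score, where $B_{\psi\varphi\cdot\beta}\big(\tilde\theta_{nT}\big) = B_{\psi\varphi}\big(\tilde\theta_{nT}\big) - B_{\psi\beta}\big(\tilde\theta_{nT}\big) B_\beta^{-1}\big(\tilde\theta_{nT}\big) B_{\beta\varphi}\big(\tilde\theta_{nT}\big)$ and $B_{\varphi\cdot\beta}\big(\tilde\theta_{nT}\big) = B_\varphi\big(\tilde\theta_{nT}\big) - B_{\varphi\beta}\big(\tilde\theta_{nT}\big) B_\beta^{-1}\big(\tilde\theta_{nT}\big) B_{\beta\varphi}\big(\tilde\theta_{nT}\big)$. Under $H_0^\psi$ and irrespective of whether $H_0^\varphi$ or $H_A^\varphi$ holds, we have

$$LM_\psi^{\star}\big(\tilde\theta_{nT}\big) = N\, C_\psi^{\star\prime}\big(\tilde\theta_{nT}\big)\Big[B_{\psi\cdot\beta}\big(\tilde\theta_{nT}\big) - B_{\psi\varphi\cdot\beta}\big(\tilde\theta_{nT}\big) B_{\varphi\cdot\beta}^{-1}\big(\tilde\theta_{nT}\big) B_{\psi\varphi\cdot\beta}'\big(\tilde\theta_{nT}\big)\Big]^{-1} C_\psi^{\star}\big(\tilde\theta_{nT}\big) \xrightarrow{d} \chi^2_{k_\psi}. \tag{4.16}$$

5. Under $H_A^\psi$ and $H_0^\varphi$, we have

$$LM_\psi^{\star}\big(\tilde\theta_{nT}\big) \xrightarrow{d} \chi^2_{k_\psi}(\vartheta_4), \tag{4.17}$$

where $\vartheta_4 = \delta_\psi'\big(\mathcal{H}_{\psi\cdot\beta} - \mathcal{H}_{\psi\varphi\cdot\beta}\mathcal{H}_{\varphi\cdot\beta}^{-1}\mathcal{H}_{\psi\varphi\cdot\beta}'\big)\delta_\psi$.

Proof. See Section C.2.

7 For the definition of the non-central chi-square distribution, see Anderson (2003, p. 81-82).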
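The algebra in Proposition 2 can be sketched in a few lines. The code below, assuming NumPy, takes a pseudo-score vector $C$ and a matrix $B$ partitioned by index sets for $\beta$, $\psi$, and $\varphi$, and returns both the one-directional statistic in (4.9) and the adjusted statistic in (4.16), using the middle factor $B_{\varphi\cdot\beta}^{-1}$ that matches the form of $\vartheta_4$. The function names and the random inputs are hypothetical; in an application $C$ and $B$ would be evaluated at the restricted estimator, where $C_\beta = 0$.

```python
import numpy as np

def partial(B, a, b, c):
    """B_{ab.c} = B_ab - B_ac B_c^{-1} B_cb for index lists a, b, c."""
    return B[np.ix_(a, b)] - B[np.ix_(a, c)] @ np.linalg.solve(B[np.ix_(c, c)], B[np.ix_(c, b)])

def lm_tests(C, B, N, idx_beta, idx_psi, idx_phi):
    """Return (LM_psi, LM_psi_star) from pseudo-scores C and the partitioned matrix B."""
    B_psi = partial(B, idx_psi, idx_psi, idx_beta)       # B_{psi.beta}
    B_phi = partial(B, idx_phi, idx_phi, idx_beta)       # B_{phi.beta}
    B_psiphi = partial(B, idx_psi, idx_phi, idx_beta)    # B_{psi phi.beta}
    C_psi, C_phi = C[idx_psi], C[idx_phi]

    # One-directional statistic, as in (4.9).
    lm = N * C_psi @ np.linalg.solve(B_psi, C_psi)

    # Adjusted pseudo-score and its variance adjustment, as in (4.16).
    C_star = C_psi - B_psiphi @ np.linalg.solve(B_phi, C_phi)
    V_star = B_psi - B_psiphi @ np.linalg.solve(B_phi, B_psiphi.T)
    lm_star = N * C_star @ np.linalg.solve(V_star, C_star)
    return lm, lm_star

# Illustrative inputs only: random positive definite B and a random score vector.
rng = np.random.default_rng(1)
k = 6
idx_beta, idx_psi, idx_phi = [0, 1], [2, 3], [4, 5]
G = rng.standard_normal((20, k))
B = G.T @ G / 20
C = rng.standard_normal(k) / 10

lm, lm_star = lm_tests(C, B, 100, idx_beta, idx_psi, idx_phi)
# With no cross terms between psi and phi, the adjustment has no effect.
lm_id, lm_star_id = lm_tests(C, np.eye(k), 100, idx_beta, idx_psi, idx_phi)
```

Linear systems are solved with `np.linalg.solve` rather than by forming explicit inverses, which is a numerical convenience and does not change the statistics.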

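There are three important observations regarding the results presented in Proposition 2. First, under $H_0^\psi$ the one-directional test has a non-central chi-square distribution when the alternative model is misspecified, i.e., when $\varphi_0$ deviates locally from its hypothesized value. The non-centrality parameter is $\vartheta_3 = \delta_\varphi'\mathcal{H}_{\psi\varphi\cdot\beta}'\mathcal{H}_{\psi\cdot\beta}^{-1}\mathcal{H}_{\psi\varphi\cdot\beta}\delta_\varphi$, which is zero for every $\delta_\varphi$ if and only if $\mathcal{H}_{\psi\varphi\cdot\beta} = 0$. Second, the robust test $LM_\psi^{\star}\big(\tilde\theta_{nT}\big)$ has a central chi-square distribution even when the alternative model is locally misspecified. Finally, $LM_\psi^{\star}\big(\tilde\theta_{nT}\big)$ cannot have more asymptotic power than $LM_\psi\big(\tilde\theta_{nT}\big)$ when the alternative model is correctly specified, since $\vartheta_2 - \vartheta_4 \geq 0$ under $H_A^\psi$ and $H_0^\varphi$.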
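The inequality behind the last observation can be made explicit. The short derivation below subtracts the two non-centrality parameters from Proposition 2; it assumes that $\mathcal{H}_{\varphi\cdot\beta}$ is positive definite, which is not stated in this section but is needed for its inverse to exist.

```latex
\begin{align*}
\vartheta_2 - \vartheta_4
  &= \delta_\psi' \mathcal{H}_{\psi\cdot\beta}\,\delta_\psi
     - \delta_\psi'\big(\mathcal{H}_{\psi\cdot\beta}
       - \mathcal{H}_{\psi\varphi\cdot\beta}\,\mathcal{H}_{\varphi\cdot\beta}^{-1}\,
         \mathcal{H}_{\psi\varphi\cdot\beta}'\big)\delta_\psi \\
  &= \big(\mathcal{H}_{\psi\varphi\cdot\beta}'\,\delta_\psi\big)'\,
     \mathcal{H}_{\varphi\cdot\beta}^{-1}\,
     \big(\mathcal{H}_{\psi\varphi\cdot\beta}'\,\delta_\psi\big) \;\geq\; 0 .
\end{align*}
```

Under this assumption, equality holds exactly when $\mathcal{H}_{\psi\varphi\cdot\beta}'\delta_\psi = 0$, so the robust test gives up no asymptotic power in that case, for example when $\mathcal{H}_{\psi\varphi\cdot\beta} = 0$.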

Proposition 2 provides a template that can be used to determine the test statistics for the following hypotheses:

1. The null hypothesis for the contemporaneous spatial lag terms: $H_0^\lambda: \lambda_0 = 0$ in the presence of $\rho_0$ and $\gamma_0$.

2. The null hypothesis for the spatial lag terms at time $t-1$: $H_0^\rho: \rho_0 = 0$ in the presence of $\lambda_0$ and $\gamma_0$.

3. The null hypothesis for the time lag term: $H_0^\gamma: \gamma_0 = 0$ in the presence of $\lambda_0$ and $\rho_0$.

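In the following, we provide the test statistic for each hypothesis and leave the detailed derivations to Appendix B. We start with $H_0^\lambda: \lambda_0 = 0$. In the context of this hypothesis, $\varphi = (\rho', \gamma)'$. Then, the one-directional test can be written as

$$LM_\lambda\big(\tilde\theta_{nT}\big) = N\, C_\lambda'\big(\tilde\theta_{nT}\big)\big[B_{\lambda\cdot\beta}\big(\tilde\theta_{nT}\big)\big]^{-1} C_\lambda\big(\tilde\theta_{nT}\big), \tag{4.18}$$

where $B_{\lambda\cdot\beta}\big(\tilde\theta_{nT}\big) = B_\lambda\big(\tilde\theta_{nT}\big) - B_{\lambda\beta}\big(\tilde\theta_{nT}\big) B_\beta^{-1}\big(\tilde\theta_{nT}\big) B_{\beta\lambda}\big(\tilde\theta_{nT}\big)$. Then, $LM_\lambda\big(\tilde\theta_{nT}\big) \xrightarrow{d} \chi^2_p(\vartheta_2)$ under $H_A^\lambda$ and $H_0^\varphi$, and $LM_\lambda\big(\tilde\theta_{nT}\big) \xrightarrow{d} \chi^2_p(\vartheta_3)$ under $H_0^\lambda$ and $H_A^\varphi$, where $\vartheta_2 = \delta_\lambda'\mathcal{H}_{\lambda\cdot\beta}\delta_\lambda$ and $\vartheta_3 = \delta_\varphi'\mathcal{H}_{\lambda\varphi\cdot\beta}'\mathcal{H}_{\lambda\cdot\beta}^{-1}\mathcal{H}_{\lambda\varphi\cdot\beta}\delta_\varphi$. The robust version is stated as

$$LM_\lambda^{\star}\big(\tilde\theta_{nT}\big) = N\, C_\lambda^{\star\prime}\big(\tilde\theta_{nT}\big)\Big[B_{\lambda\cdot\beta}\big(\tilde\theta_{nT}\big) - B_{\lambda\varphi\cdot\beta}\big(\tilde\theta_{nT}\big) B_{\varphi\cdot\beta}^{-1}\big(\tilde\theta_{nT}\big) B_{\lambda\varphi\cdot\beta}'\big(\tilde\theta_{nT}\big)\Big]^{-1} C_\lambda^{\star}\big(\tilde\theta_{nT}\big), \tag{4.19}$$

where $C_\lambda^{\star}\big(\tilde\theta_{nT}\big) = C_\lambda\big(\tilde\theta_{nT}\big) - B_{\lambda\varphi\cdot\beta}\big(\tilde\theta_{nT}\big) B_{\varphi\cdot\beta}^{-1}\big(\tilde\theta_{nT}\big) C_\varphi\big(\tilde\theta_{nT}\big)$ is the adjusted score. Irrespective of whether $H_0^\varphi$ or $H_A^\varphi$ holds, $LM_\lambda^{\star}\big(\tilde\theta_{nT}\big)$ has an asymptotic $\chi^2_p$ distribution under $H_0^\lambda$ by Proposition 2. Finally, under $H_A^\lambda$ and $H_0^\varphi$, we have $LM_\lambda^{\star}\big(\tilde\theta_{nT}\big) \xrightarrow{d} \chi^2_p(\vartheta_4)$, where $\vartheta_4 = \delta_\lambda'\big(\mathcal{H}_{\lambda\cdot\beta} - \mathcal{H}_{\lambda\varphi\cdot\beta}\mathcal{H}_{\varphi\cdot\beta}^{-1}\mathcal{H}_{\lambda\varphi\cdot\beta}'\big)\delta_\lambda$.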