
Notation

The notation $|b|$, where $b$ is a vector in $\mathbb{R}^K$, denotes the vector of absolute values of $b$, whereas $\|b\|$ denotes the Euclidean length of $b$. An inequality between two vectors, such as $b < v$, indicates that every element of $b$ is less than the corresponding element of $v$, and $\max(b)$ denotes the maximum element of the vector $b$. An open (closed) ball of radius $\varepsilon$ in $\mathbb{R}^K$ around a point $b$ is denoted $S(b,\varepsilon)$ ($S[b,\varepsilon]$). That is, $S(b,\varepsilon)$ is the set of all vectors $x$ for which $\|x-b\| < \varepsilon$, where $b, x \in \mathbb{R}^K$; the closed ball is defined in the same way with $\|x-b\| \le \varepsilon$. If $B_{f,T}$ denotes an open set, then $B^c_{f,T}$ denotes the closure of $B_{f,T}$ (as in Apostol, 1974, 3.19). $\mathbf{1}$ denotes a conformable vector of ones. All other quantities are as defined in the main text.

The negative cross-entropy function maximised in the paper, (3.10),

$$E_T(b) = f_T(b) + g(b), \qquad (8.1)$$

is the sum of the two entropy functions. The first is:

$$g(b) = -\sum_{k=1}^{K}\Big[\,p_k(b)\ln\big(2p_k(b)\big) + w_k(b)\ln\big(2w_k(b)\big)\Big],$$

the negative cross-entropy of $\big(p_k(b), w_k(b)\big)$ relative to the uniform weights $(\tfrac12,\tfrac12)$.

$g(\cdot)$ is a non-stochastic function which depends only on a $K\times 1$ vector $b$, whereas $f_T(\cdot)$ is a stochastic sequence of functions, since $v(\cdot)$ and $M$ are stochastic. For this reason it is useful to subscript $f(\cdot)$ with $T$. Consequently, $E_T(\cdot)$, $f_T(\cdot)$ and their domains usefully acquire $T$ subscripts here, although not in the main text. The following can be verified straightforwardly:

a) The domain of $g(\cdot)$ is $B_g = \{b : b_l < b < b_u\}$;

b) The codomain of $g(\cdot)$ is $G = (-\infty, g(b^*)]$, where $g(b^*) = 0$.

Lemma 1: $f_T$ and $g$ are finitely twice continuously differentiable (w.r.t. $b$) everywhere within their domains, for all $T$.

Proof of Lemma 1: $f_T(\cdot)$ and $g(\cdot)$ (defined in [3.9]) are differentiable on $I^K = (0,1)\times(0,1)\times\cdots\times(0,1)$, and $p(b)$ and $w(b)$ ([3.6], [3.7]) are differentiable with respect to $b_k$ at any point in $\mathbb{R}^K$. Therefore, for any value $b$ for which $p(b)\in I^K$ and $w(b)\in I^K$, the derivatives of the composite functions $f_T$ and $g$ must exist (Apostol, 1974, Theorem 5.5). Applying the chain rule, the partial derivatives are defined. The first-order derivatives are composites of continuously differentiable functions on the domains of $g(b)$ and $f_T(b)$ respectively. Therefore, the second-order derivatives are (for $d_{k,i} = 1$ if $k = i$, and zero otherwise)

$$f''_{k,i,T}(b) = \big[M'\Lambda(b)M\big]_{k,i}, \qquad g''_{k,i}(b) = \frac{-d_{k,i}}{(b_k - b_{l,k})(b_{u,k} - b_k)},$$

with $\Lambda(b)$ as given in (8.10) below, and $g''_{k,i}(b)$ is defined everywhere on $\mathbb{R}^K$ except at the boundary of $B_g$.
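As a numerical sanity check, properties (a), (b) and the diagonal second derivative of $g$ can be verified directly. The closed form used below is an assumption reconstructed from the stated properties (a maximum of zero at the support midpoint and the Hessian $g''_{k,k} = -1/((b_k-b_{l,k})(b_{u,k}-b_k))$); the exact expression in (3.9) of the main text may differ in notation.

```python
import numpy as np

# Reconstructed bounded-support entropy (an assumption: inferred from
# g(b*) = 0 at the midpoint and the stated diagonal Hessian).
def g(b, bl, bu):
    p = (b - bl) / (bu - bl)              # p_k(b); interior points give p in (0,1)
    w = 1.0 - p                           # w_k(b)
    return -np.sum(p * np.log(2.0 * p) + w * np.log(2.0 * w))

def g_hess_diag(b, bl, bu):
    # Diagonal of the Hessian: g''_{k,k}(b) = -1 / ((b_k - b_{l,k})(b_{u,k} - b_k))
    return -1.0 / ((b - bl) * (bu - b))

bl = np.array([-1.0, 0.0])                # illustrative support bounds
bu = np.array([3.0, 2.0])
b_star = 0.5 * (bl + bu)                  # midpoint: p = w = 1/2
print(g(b_star, bl, bu))                  # ~ 0, the maximum of g
print(g_hess_diag(np.array([0.0, 0.5]), bl, bu))  # strictly negative entries
```

Any interior point other than the midpoint returns a strictly negative value, matching the stated codomain $(-\infty, 0]$.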

Lemma 2: Negative cross-entropy is a concave function w.r.t. $b$ everywhere on its domain.

Proof of Lemma 2:

Under Lemma 1, the condition that the Hessian matrices of $f_T(\cdot)$ and $g(\cdot)$ are negative definite is sufficient for concavity (Magnus and Neudecker, 1994, Theorem 7, note 2).

The Hessian for $g(b)$ is

$$\nabla^2 g(b) = \big\{g''_{k,i}(b)\big\}_{k,i} = \left\{\frac{-d_{k,i}}{(b_k - b_{l,k})(b_{u,k} - b_k)}\right\}_{k,i}, \qquad d_{k,i} = 1 \text{ where } i = k, \text{ and } 0 \text{ otherwise}, \qquad (8.8)$$

which is a diagonal matrix with negative diagonal elements (and therefore negative definite). The Hessian matrix for $f_T(\cdot)$ is

$$\nabla^2 f_T(b) = M'\Lambda(b)M, \qquad (8.9)$$

where the centre matrix $\Lambda(b)$ is also diagonal with negative diagonal elements:

$$\Lambda(b) = \{\lambda_{ij}\}, \qquad \lambda_{jj} = \frac{-1}{s^2 - v_j(b)^2}, \qquad \lambda_{ij} = 0 \text{ otherwise.} \qquad (8.10)$$

Since $M$ is invertible, for any non-zero vector $z$, $z'\nabla^2 f_T(b)z = z'M'\Lambda(b)Mz = y'\Lambda(b)y < 0$, where $y = Mz \neq 0$. Noting that the sum of two convex (concave) functions is also convex (concave) (Berck and Sydsaeter, 12.10) completes the proof.
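The quadratic-form argument in the proof can be illustrated numerically: for a diagonal $\Lambda$ with negative entries and an invertible $M$, the matrix $M'\Lambda M$ of (8.9) has strictly negative eigenvalues. The matrices below are arbitrary illustrative values, not quantities from the paper.

```python
import numpy as np

# Arbitrary illustrative values: M invertible, |v_j| < s so that
# lambda_jj = -1/(s^2 - v_j^2) < 0 as in (8.10).
rng = np.random.default_rng(0)
K, s = 4, 2.0
M = rng.normal(size=(K, K)) + K * np.eye(K)   # dominated diagonal: invertible
v = rng.uniform(-1.5, 1.5, size=K)
Lam = np.diag(-1.0 / (s**2 - v**2))           # negative diagonal entries
H = M.T @ Lam @ M                             # Hessian (8.9), symmetric

eigs = np.linalg.eigvalsh(H)
z = rng.normal(size=K)
print(eigs)                                   # all strictly negative
print(z @ H @ z)                              # z'M'(Lam)Mz = y'(Lam)y < 0
```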

Lemma 3: If $B^c_{f,T} \subseteq B_g$, then $g(\cdot)$ is bounded (above and below) on $B_{f,T}$.

Proof of Lemma 3: If $B^c_{f,T} \subseteq B_g$ then $g(\cdot)$ is defined on $B^c_{f,T}$, and since $B^c_{f,T}$ is a compact set and $g(\cdot)$ is continuous on $B^c_{f,T}$, $g(\cdot)$ is bounded on $B^c_{f,T}$ (Apostol, 1974, Theorem 4.25). Therefore, $g(\cdot)$ has a finite supremum and infimum on $B_{f,T}$.

Lemma 4: If $B^c_{f,T} \subseteq B_g$, then $E_T(\cdot)$ is defined on $B_{f,T}$, and $B_{f,T}$ contains a maximum point $\tilde b$ at which $\nabla E_T(\tilde b) = 0$.

Proof of Lemma 4: From Lemma 3, $g(\cdot)$ has a finite infimum and supremum on $B_{f,T}$. Consequently,

$$-K\ln\tfrac{1}{2} + \sup_{B_{f,T}} g(b) \;\ge\; \sup\big(E_T(b)\big) \;\ge\; -K\ln\tfrac{1}{2} + \inf_{B_{f,T}} g(b).$$

As $b$ approaches the boundary of $B_{f,T}$ from any direction, $f_T(b) \to -\infty$. Consequently, $E_T(\cdot) \to -\infty$ as $b$ approaches the boundary from any direction, since $g(b)$ is bounded above and below. Therefore, a point can always be chosen sufficiently close to the boundary of $B_{f,T}$ at which $E_T(\cdot)$ is less than $\sup(E_T(b))$. The supremum must therefore be attained within $B_{f,T}$ and must therefore be a maximum. The second part of the Lemma, $\nabla E_T(\tilde b) = 0$, follows from the fact that, under Lemma 1, the derivatives of $E_T$ are finite over $B_{f,T}$ (though not bounded). Using Apostol (1974, p. 362, Ex. 12), if $E_T$ attains a maximum within $B_{f,T}$, then the existence of finite partial derivatives within $B_{f,T}$ is sufficient to ensure that the derivatives are zero at the maximum point.
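The topological argument of Lemma 4 can be seen in a one-dimensional toy example (not the paper's $f_T$ and $g$, merely functions with the same two properties: one component diverging to $-\infty$ at the boundary, one bounded component): the supremum of the sum is attained strictly inside the interval, at a point of zero derivative.

```python
import numpy as np

# Toy versions of the Lemma 4 ingredients on (-1, 1):
#   f(b) = ln(1 - b^2)  -> -inf at the boundary (stands in for f_T)
#   g(b) = b            -> bounded above and below (stands in for g)
E = lambda b: np.log(1.0 - b**2) + b

grid = np.linspace(-0.999, 0.999, 200001)
b_max = grid[np.argmax(E(grid))]
dE = -2.0 * b_max / (1.0 - b_max**2) + 1.0    # E'(b) at the grid maximiser

print(b_max)   # ~ sqrt(2) - 1 = 0.414..., strictly interior
print(dE)      # ~ 0: the derivative vanishes at the maximum
```

Here the maximiser solves $E'(b) = -2b/(1-b^2) + 1 = 0$, i.e. $b = \sqrt{2} - 1$, strictly interior as the lemma requires.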

Lemma 5:

$$\lim_{T\to\infty}\operatorname{Prob}\big(B^c_{f,T} \subseteq B_g\big) = 1. \qquad (8.11)$$

Proof of Lemma 5:

The proof of Lemma 5 is in two parts. The first part shows that any point at a fixed distance from $\hat b$ will asymptotically not belong to $B^c_{f,T}$, with probability one. Conversely, the second part shows that any point within a radius of $\varphi/2$ of $\hat b$ will asymptotically be a member of $B_g$, with probability one. Therefore, any point which is close enough to $\hat b$ to be a member of $B^c_{f,T}$ must asymptotically also be a member of $B_g$, with probability one.

Part 1:

For any point $b$ (using notation defined at the beginning of Section 3),

$$b = \hat b - M^{-1}v(b). \qquad (8.12)$$

An open $K$-ball around $\hat b$ can be expressed as

$$S\big(\hat b, \varepsilon\big) = \Big\{ b : \big(b-\hat b\big)'\big(b-\hat b\big) = v(b)'M^{-2}v(b) < \varepsilon^2 \Big\}. \qquad (8.13)$$

The closure of $B_{f,T}$ is

$$B^c_{f,T} = \Big\{ b : |v(b)| = \big|M\big(b-\hat b\big)\big| \le s_0 T\,\mathbf{1} \Big\}. \qquad (8.14)$$

Define the vector

$$\big(b-\hat b\big)'N^{-1} = h(b)'. \qquad (8.15)$$

Under C2 and C5, $M^2 = N^{-1}Q^{-1}N^{-1}$. Therefore, any point in $B^c_{f,T}$ has the property that

$$v(b)'v(b) = h(b)'Q^{-1}h(b) \le K s_0^2 T^2. \qquad (8.16)$$

The middle part of [8.16] can be decomposed as

$$h(b)'Q^{-1}h(b) = h(b)'\big(Q^{-1} - Q_N^{-1}\big)h(b) + \big(b-\hat b\big)'N^{-1}Q_N^{-1}N^{-1}\big(b-\hat b\big). \qquad (8.17)$$

Under C2,

$$Q_N^{-1} \to_p Q^{-1}, \qquad (8.18)$$

where $Q^{-1}$ is positive definite. Under C5, $T^{-2}N^{-1}$ diverges. Therefore, a small positive $\delta$ exists for which

$$\big(b-\hat b\big)'\,T^{-2}N^{-1}\big(b-\hat b\big) > \delta T\big(b-\hat b\big)'\big(b-\hat b\big),$$

and for $\big(b-\hat b\big)'\big(b-\hat b\big) > \varepsilon^2$,

$$\lim_{T\to\infty}\operatorname{Prob}\Big(\delta T\big(b-\hat b\big)'\big(b-\hat b\big) \le K s_0^2\Big) = 0. \qquad (8.19)$$

Therefore, any point more than a fixed Euclidean distance $\varepsilon > 0$ from $\hat b$ will asymptotically not lie within $B^c_{f,T}$, with probability one. Consequently,

$$\lim_{T\to\infty}\operatorname{Prob}\Big(B^c_{f,T} \subseteq S\big(\hat b, \varepsilon\big)\Big) = 1. \qquad (8.20)$$

Part 2:

Under C1, $S(\beta, \varphi) \subseteq B_g$, where $\beta$ denotes the true parameter vector. Therefore, given the consistency of $\hat b$, for any $\varepsilon > 0$,

$$\lim_{T\to\infty}\operatorname{Prob}\big(\|\hat b - \beta\| < \varepsilon\big) = 1. \qquad (8.21)$$

C1-C5, in turn, imply that for any $\varepsilon \in (0, \varphi/2)$,

$$\lim_{T\to\infty}\operatorname{Prob}\Big(S\big(\hat b, \varepsilon\big) \subseteq S(\beta, \varphi) \subseteq B_g\Big) = 1. \qquad (8.22)$$

Therefore,

$$\lim_{T\to\infty}\operatorname{Prob}\Big(B^c_{f,T} \subseteq S\big(\hat b, \varepsilon\big) \subseteq B_g\Big) = 1. \qquad (8.23)$$

$B_{E,T}$ will become non-empty (in probability), since it becomes equivalent to $B_{f,T}$, which by definition is a non-empty $K$-ball around $\hat b$.

The following Lemmas are most easily stated and proved as a group.

Lemmas 6.1, 6.2: Under C1-C5:

$$6.1)\;\; \nabla g\big(\hat b\big) \to_d \nabla g(\beta); \qquad (8.24)$$

$$6.2)\;\; \nabla^2 g\big(\hat b\big) \to_d \nabla^2 g(\beta).$$

Proof of Lemmas 6.1 and 6.2:

From Lemma 1, $\nabla g(\beta)$ and $\nabla^2 g(\beta)$ exist and are finite. If cross-entropy is defined at $\hat b$, then by the continuous mapping theorem (Davidson, 1994, Theorem 22.11), the consistency of $\hat b$, and Lemma 5, 6.1 and 6.2 hold.

Lemmas 7.1, 7.2: Under C1-C5:

$$7.1)\;\; \nabla f_T\big(\hat b\big) = 0; \qquad (8.25)$$

$$7.2)\;\; T^{-2}N\,\nabla^2 f_T\big(\hat b\big)\,N = -s_0^{-2}Q.$$

Proof of Lemmas 7.1 and 7.2:

Lemma 7.1 is trivially proved by observing that $v(\hat b) = 0$, and therefore

$$\nabla f_T\big(\hat b\big) = \frac{1}{2s}M'\eta\big(\hat b\big) = 0, \qquad (8.26)$$

where $\eta(\cdot)$ denotes the vector of first-order entropy terms, which vanishes when $v(\hat b) = 0$.

Lemma 7.2 follows from

$$\nabla^2 f_T\big(\hat b\big) = M'\Lambda\big(\hat b\big)M = -\frac{1}{s^2}M'M. \qquad (8.27)$$

From C2 and C5,

$$\nabla^2 f_T\big(\hat b\big) = -\frac{1}{s^2}N^{-1}G'G\,N^{-1} = -\frac{T^2}{s_0^2}N^{-1}Q\,N^{-1}. \qquad (8.28)$$

Therefore,

$$T^{-2}N\,\nabla^2 f_T\big(\hat b\big)\,N = -s_0^{-2}Q. \qquad (8.29)$$

Lemma 8.1 and 8.2: Under C1 to C5 (and defining two new quantities $W_1(\hat b)$ and $W_2(\hat b)$):

$$8.1:\;\; W_1\big(\hat b\big) = T^{-2}N\,\nabla E_T\big(\hat b\big) \to_d 0;$$

$$8.2:\;\; W_2\big(\hat b\big) = T^{-2}N\,\nabla^2 E_T\big(\hat b\big)\,N \to_d -s_0^{-2}Q.$$

Proof of Lemma 8.1:

From Lemma 7.1,

$$W_1\big(\hat b\big) = T^{-2}N\,\nabla E_T\big(\hat b\big) = T^{-2}N\Big(\nabla g\big(\hat b\big) + \nabla f_T\big(\hat b\big)\Big) = T^{-2}N\,\nabla g\big(\hat b\big). \qquad (8.30)$$

The remaining component

$$T^{-2}N\,\nabla g\big(\hat b\big) \to_d 0 \qquad (8.31)$$

follows from Lemma 6.1 and $T^{-2}N \to 0$ (under C5).

Proof of Lemma 8.2:

Expanding $W_2(\hat b)$ and then using Lemma 7.2:

$$W_2\big(\hat b\big) = T^{-2}N\,\nabla^2 f_T\big(\hat b\big)\,N + T^{-2}N\,\nabla^2 g\big(\hat b\big)\,N \qquad (8.32)$$

$$= -s_0^{-2}Q + T^{-2}N\,\nabla^2 g\big(\hat b\big)\,N.$$

From Lemma 6.2, $\nabla^2 g(\hat b) \to_d \nabla^2 g(\beta)$, and under C5, $T^{-2}N \to 0$. Therefore:

$$T^{-2}N\,\nabla^2 g\big(\hat b\big)\,N \to_d 0. \qquad (8.33)$$

Theorem 1 claimed that under C1-C5 the estimator $\hat b$ and the cross-entropy estimator $\tilde b$ have the property

$$N^{-1}\big(\tilde b - \hat b\big) \to_d 0. \qquad (8.34)$$

Proof of Theorem 1:

Lemmas 1 through 4 establish that if $B^c_{f,T} \subseteq B_g$ then cross-entropy will be defined, the derivatives will exist, and negative cross-entropy will have a maximum at a point where the derivatives are equal to zero. Lemma 5 establishes that $B^c_{f,T} \subseteq B_g$ will be met asymptotically with probability one.

Therefore, Lemmas 1 through 5 establish that the cross-entropy estimator will exist in the interior of $B_{f,T}$ asymptotically with probability one, that the functions are concave everywhere on $B_{f,T}$, and that the maximum will have a derivative of zero. Therefore, using an expansion for $\tilde b$ (the entropy estimate) around $\hat b$,

$$\nabla E\big(\tilde b\big) = 0 = \nabla E\big(\hat b\big) + \nabla^2 E\big(\hat b\big)\big(\tilde b - \hat b\big) + o\big(\tilde b - \hat b\big), \qquad (8.35)$$

a manipulation of [8.35] gives

$$N^{-1}\big(\tilde b - \hat b\big) = -N^{-1}\Big(\nabla^2 E\big(\hat b\big)\Big)^{-1}\nabla E\big(\hat b\big) - \Big(T^{-2}N\,\nabla^2 E\big(\hat b\big)\,N\Big)^{-1} T^{-2}N\,o\big(\tilde b - \hat b\big). \qquad (8.36)$$

Using the definitions in Lemmas 8.1 and 8.2,

$$N^{-1}\big(\tilde b - \hat b\big) = -W_2\big(\hat b\big)^{-1}W_1\big(\hat b\big) - W_2\big(\hat b\big)^{-1} T^{-2}N\,o\big(\tilde b - \hat b\big). \qquad (8.37)$$

By Lemmas 8.1, 8.2 and C5, each of the components on the right-hand side converges to zero in distribution. Therefore,

$$N^{-1}\big(\tilde b - \hat b\big) \to_d 0, \qquad (8.38)$$

which completes the proof of Theorem 1.

Remark.

Note that the above also suggests an approximate relationship between the entropy estimate $\tilde b$ and $\hat b$:

$$\tilde b \approx \hat b - \Big(\nabla^2 E\big(\hat b\big)\Big)^{-1}\nabla g\big(\hat b\big), \qquad (8.39)$$

which may be a useful approximation in practice.
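A minimal numerical sketch of the one-step approximation (8.39), under assumptions: a toy quadratic stands in for $f_T$ (so its gradient vanishes at $\hat b$, as in Lemma 7.1), and the bounded-support entropy form is assumed for $g$. The support bounds, curvature matrix and $\hat b$ below are arbitrary illustrative values, not quantities from the paper.

```python
import numpy as np

bl, bu = np.array([-1.0, 0.0]), np.array([3.0, 2.0])   # assumed supports
A = np.array([[5.0, 1.0], [1.0, 4.0]])   # positive definite: -Hessian of toy f
b_hat = np.array([0.8, 1.1])             # toy f is maximised at b_hat

def grad_g(b):                           # gradient of the assumed entropy term
    p = (b - bl) / (bu - bl)
    return -np.log(p / (1.0 - p)) / (bu - bl)

def hess_g(b):
    return np.diag(-1.0 / ((b - bl) * (bu - b)))

grad_E = lambda b: -A @ (b - b_hat) + grad_g(b)
hess_E = lambda b: -A + hess_g(b)

b = b_hat.copy()                         # full maximiser via Newton iterations
for _ in range(50):
    b = b - np.linalg.solve(hess_E(b), grad_E(b))

one_step = b_hat - np.linalg.solve(hess_E(b_hat), grad_g(b_hat))  # (8.39)
print(b)          # exact maximiser of E = f + g
print(one_step)   # one-step value: close to the exact maximiser
```

Because $\nabla f$ vanishes at $\hat b$, the first Newton step away from $\hat b$ uses only $\nabla g(\hat b)$, which is exactly (8.39); the remaining discrepancy is the higher-order term in (8.36).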

9. Weak Convergence Results

These results are outlined in the work of Phillips, for which Phillips and Hansen (1990) is a starting reference. Using notation similar to that in Balcombe and Tiffin (2002), equations A1 and A2 give

$$N\sum Z_{t-1}Z'_{t-1}N = G'G \to_d \int W_{e\cdot 1}W'_{e\cdot 1} = \bar G'\bar G \qquad (8.40)$$

and

$$T^{-1}\sum Z_{t-1}\begin{pmatrix} u_t \\ e_t \end{pmatrix}' \to_d \int W_{e\cdot 1}\,d\omega + \operatorname{Vec}(\,\cdot\,), \qquad (8.41)$$

where the $\operatorname{Vec}(\,\cdot\,)$ term is the one-sided long-run covariance correction (notation as in Balcombe and Tiffin, 2002), $W_e = I_k \otimes \omega_e$, and $\omega_e$ and $\omega$ are vectors of Brownian motions.

The construction of the corrected residuals ensures that $\omega$ is independent of $\omega_e$, and therefore $\int W_{e\cdot 1}\,d\omega$ is mixed normal with mean zero and covariance matrix $\int W_{e\cdot 1}W'_{e\cdot 1}$. Therefore, given $N = T^{-1}I$, it follows that

$$N\sum z_{t-1}\begin{pmatrix} u_t \\ e_t \end{pmatrix}' \to_d MN\big(\operatorname{Vec}(\,\cdot\,),\; \bar G'\bar G\big), \qquad (8.42)$$

a mixed normal centred on the correction term. It follows that $v$ constructed as in [5.11] weakly converges to a multivariate normal.
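The scaling behind (8.40) can be illustrated by simulation (purely illustrative, not the paper's data construction): for a scalar driftless random walk, the normalisation $N(\cdot)N$ with $N = T^{-1}I$ keeps the sample second moment of the levels $O_p(1)$, with a weak limit of the form $\int W^2$, whose mean is $1/2$.

```python
import numpy as np

rng = np.random.default_rng(42)

def scaled_sum_squares(T):
    # T^{-2} * sum_t z_{t-1}^2 for a driftless random walk z_t = z_{t-1} + e_t
    e = rng.normal(size=T)
    z = np.cumsum(e)
    return np.sum(z[:-1] ** 2) / T**2

draws = np.array([scaled_sum_squares(10_000) for _ in range(200)])
print(draws.mean())   # near E[ \int_0^1 W(r)^2 dr ] = 0.5
```

Unlike a stationary regressor, the unnormalised sum here grows like $T^2$; the $T^{-2}$ factor (the two $N$ matrices) is what delivers a non-degenerate limit.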

REFERENCES

Andrews, D.W.K. (1991). Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation. Econometrica, 59, 817-858.

Apostol, T.M. (1974). Mathematical Analysis. Addison-Wesley Publishing.

Balcombe, K.G. and Tiffin, R. (2002). Fully Modified Estimation with Cross Equation Restrictions. Economics Letters, 74, 257-263.

Berck, P. and Sydsaeter, K. (1993). Economists' Mathematical Manual, Second Edition. Springer-Verlag.

Davidson, J. (1994). Stochastic Limit Theory. Advanced Texts in Econometrics, Oxford University Press.

Golan, A., Judge, G. and Miller, D. (1996). Maximum Entropy Econometrics: Robust Estimation with Limited Data. Series in Financial Economics and Quantitative Analysis. Wiley.

Golan, A., Judge, G. and Perloff, J. (1997). Estimation and Inference with Censored and Ordered Multinomial Response Data. Journal of Econometrics, 79, 23-51.

Golan, A. and Perloff, J. (2002). Comparison of Maximum Entropy and Higher-Order Entropy Estimators. Journal of Econometrics, 107(1-2), 195-211.

Golan, A., Moretti, E. and Perloff, J.M. (1999). An Information Based Sample Selection Estimation of Agricultural Workers' Choice between Piece Rate and Hourly Work. American Journal of Agricultural Economics, 81(3), 735-741.

Golan, A. (2002). Information and Entropy Econometrics, Editor's View. Journal of Econometrics, 107, 1-15.

Golan, A. and Gzyl, H. (1999). A Generalized Maxentropic Inversion Procedure for Noisy Data. Applied Mathematics and Computation (forthcoming).

Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press, New Jersey.

Harmon, A., Preckel, P.V. and Eales, J. (1998). Entropy Based Seemingly Unrelated Regression. Staff Paper #98-8, Dept of Agricultural Economics, Purdue University.

Haug, A. (1999). Testing Linear Restrictions on Cointegrating Vectors: Sizes and Powers of Wald and Likelihood Ratio Tests in Finite Samples. Working Paper, University of Canterbury.

Kullback, S. (1959). Information Theory and Statistics. John Wiley, New York.

Magnus, J.R. and Neudecker, H. (1994). Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley Series in Probability and Mathematical Statistics, Wiley and Sons.

Marsh, L., Mittelhammer, R.C. and Cardell, S. (1998). A Structural-Equation GME Estimator. Selected Paper, 1998 AAEA Annual Meeting, Salt Lake City.

Moon, H.R. (1999). A Note on Fully-Modified Estimation of Seemingly Unrelated Regressions Models with Integrated Regressors. Economics Letters, 65, 25-31.

Paris, Q. (2001). MELE: Maximum Entropy Leuven Estimators. Working Paper 01-003, California Agricultural Experiment Station, Giannini Foundation for Agricultural Economics.

Preckel, P.V. (2001). Least Squares and Entropy: A Penalty Function Perspective. American Journal of Agricultural Economics, 83(2), 366-377.

Phillips, P.C.B. and Hansen, B. (1990). Statistical Inference in Instrumental Variable Regressions with I(1) Processes. Review of Economic Studies, 57, 99-125.

Shannon, C.E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27, 379-423.

Thirtle, C., Sartorius von Bach, H. and van Zyl, J. (1993). Total Factor Productivity in South African Agriculture, 1947-1992. Development Southern Africa, 10, 301-318.

Xiao, Z. and Phillips, P.C.B. (2002). Higher Order Approximations for Wald Statistics in Time Series Regression with Integrated Processes. Journal of Econometrics, 108, 157-198.

Zellner, A. (1996). Models, Prior Information, and Bayesian Analysis. Journal of Econometrics, 75, 51-68.

Zellner, A. (1997). A Bayesian Method of Moments (BMOM): Theory and Applications. In Fomby, T.B. and Hill, R.C. (eds), Applying Maximum Entropy to Econometric Problems, Advances in Econometrics, vol. 12, pp. 85-105. Greenwich: JAI Press.

Zellner, A. (1999). New Information Based Econometric Methods in Agricultural Economics: Discussion. American Journal of Agricultural Economics, 81(3), 742-746.

Zellner, A. (2002). Information Processing and Bayesian Analysis. Journal of Econometrics, 107, 41-50.
