
Joint with Harald Uhlig

2.3 Estimation and Inference

2.3.3 Choosing the Starting Values

In general one can start the iteration cycle with any arbitrary, randomly drawn set of parameters, as the joint and marginal empirical distributions of the generated parameters converge at an exponential rate to their joint and marginal target distributions as $S \rightarrow \infty$. This has been shown by Geman and Geman [18]. Since Gelman and Rubin [17] have shown that a single chain of the Gibbs sampler might give a "false sense of security", it has become common practice to try out different starting values. One of many convergence diagnostics involves testing the fragility of the results with respect to the starting values: for the results to be reliable, estimates based on different starting values should not differ. Strictly speaking, the different chains should represent the same target distribution. We therefore check our results based on four different strategies regarding the set of starting values, and start our Gibbs sampler with the following starting values respectively.

(i) Randomly draw $\theta_0$ from an (over)dispersed distribution

(ii) Set $\theta_0$ to rather "agnostic values", which involves setting 0's for coefficients and 1's for variances⁵

(iii) Set $\theta_0$ to the results of a principal component analysis.⁶ In this way the number of draws required for convergence can be reduced considerably.

⁴ For a survey and more details see Kim and Nelson [21], Eliasz [13] and Bernanke, Boivin, and Eliasz [6].

⁵ This strategy has been applied by Belviso and Milani [3].

⁶ This strategy is particularly suited for large models such as the ones studied here and has been proposed by Eliasz [13].

(iv) Set $\theta_0$ to the parameters of the last iteration of the previous run.

Despite the strategies above, convergence is never guaranteed, particularly in large models. Hence it is recommended to restart the chain many times applying strategy (iv).
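To make strategies (i) and (iii) concrete, the following Python sketch shows one way to construct an overdispersed random draw and a principal-component initialization of the factors. The function names, the standardization step and the dispersion scale are illustrative assumptions and not part of the original estimation code.

```python
import numpy as np

def pca_factor_start(X, K):
    """Strategy (iii): initialize the factors with the first K principal
    components of the standardized data X (T x N)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each series
    eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
    W = eigvec[:, ::-1][:, :K]                     # eigenvectors of the K largest eigenvalues
    return Z @ W                                   # T x K factor starting values

def dispersed_start(dim, scale=5.0, rng=None):
    """Strategy (i): draw theta_0 from an overdispersed distribution
    (here a scaled normal; the scale is an arbitrary illustrative choice)."""
    rng = np.random.default_rng() if rng is None else rng
    return scale * rng.standard_normal(dim)

# toy usage with simulated data
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
F0 = pca_factor_start(X, K=3)            # starting factors for the Gibbs sampler
theta0 = dispersed_start(dim=50, rng=rng)
```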

2.3.4 Conditional density of the factors $F^T$ given $X^T$ and $\theta$

In this subsection we want to sample from $p(F^T \mid X^T, \theta)$, assuming that the data and the hyperparameters of the parameter space $\theta$ are given. Hence we describe Bayesian inference on the dynamic evolution of the factors $f_t$ conditional on $X^c_t$ for $t = 1, \dots, T$ and conditional on $\theta$. The transformations required to draw the factors have been carried out in the previous section. The conditional distribution from which the state vector is generated can be expressed as a product of conditional distributions by exploiting the Markov property of state space models in the following way:

$$p(F^T \mid X^T, \theta) = p(F_T \mid X^T, \theta) \prod_{t=1}^{T-1} p(F_t \mid F_{t+1}, X^T, \theta)$$

The state space model is linear and Gaussian, hence we have:

$$F_T \mid X^T, \theta \sim N(F_{T|T}, P_{T|T}) \qquad (2.3.10)$$
$$F_t \mid F_{t+1}, X^T, \theta \sim N(F_{t|t,F_{t+1|T}}, P_{t|t,F_{t+1|T}}) \qquad (2.3.11)$$

with

$$F_{T|T} = E(F_T \mid X^T, \theta) \qquad (2.3.12)$$
$$P_{T|T} = \mathrm{Cov}(F_T \mid X^T, \theta) \qquad (2.3.13)$$
$$F_{t|t,F_{t+1|T}} = E(F_t \mid F_{t|t}, F_{t+1}, \theta) \qquad (2.3.14)$$
$$P_{t|t,F_{t+1|T}} = \mathrm{Cov}(F_t \mid F_{t|t}, F_{t+1}, \theta). \qquad (2.3.15)$$

We first run the Kalman filter generating $F_{t|t}$ and $P_{t|t}$ for $t = 1, \dots, T$. For the initialization we set $F_{1|0} = 0_{KP \times 1}$ and $P_{1|0} = I_{KP}$ and iterate through the Kalman filter according to

$$F_{t|t} = F_{t|t-1} + P_{t|t-1}\Lambda' H_{t|t-1}^{-1}\eta_{t|t-1} \qquad (2.3.16)$$
$$P_{t|t} = P_{t|t-1} - P_{t|t-1}\Lambda' H_{t|t-1}^{-1}\Lambda P_{t|t-1} \qquad (2.3.17)$$

where $\eta_{t|t-1} = (X_t - \Lambda F_{t|t-1})$ is the conditional forecast error and its covariance is denoted by $H_{t|t-1} = (\Lambda P_{t|t-1}\Lambda' + R_e)$. Furthermore let

$$F_{t|t-1} = \Phi F_{t-1|t-1} \qquad (2.3.18)$$
$$P_{t|t-1} = \Phi P_{t-1|t-1}\Phi' + Q_u. \qquad (2.3.19)$$

The last iteration of the Kalman filter yields $F_{T|T}$ and $P_{T|T}$, required in (2.3.10) to draw the last observation and to start the Kalman smoother according to (2.3.11), going backwards through the sample for $F_t$, $t = T-1, T-2, \dots, 1$, and updating the filtered estimates


with the sampled factors one period ahead subject to

$$F_{t|t,F_{t+1|T}} = F_{t|t} + P_{t|t}\Phi^{*\prime} J_{t+1|t}^{-1}\xi_{t+1|t} \qquad (2.3.20)$$
$$P_{t|t,F_{t+1|T}} = P_{t|t} - P_{t|t}\Phi^{*\prime} J_{t+1|t}^{-1}\Phi^{*} P_{t|t}, \qquad (2.3.21)$$

where $\xi_{t+1|t} = F^{*}_{t+1} - \Phi^{*} F_{t|t}$ and $J_{t+1|t} = \Phi^{*} P_{t|t}\Phi^{*\prime} + Q^{*}$. Note that $Q^{*}$ refers to the upper $K \times K$ block of $Q$, and $\Phi^{*}$ and $F^{*}_t$ denote the first $K$ rows of $\Phi$ and $F_t$ respectively.

This is required when $Q$ is singular, which is the case for the companion form when there is more than one lag in (2.3.3). Here we closely follow Kim and Nelson [21], where a detailed explanation and derivation can be found.
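The following Python sketch summarizes the two passes just described: a forward Kalman filter implementing (2.3.16)–(2.3.19) and a backward sampling step implementing (2.3.10), (2.3.11), (2.3.20) and (2.3.21). It assumes the loading matrix has already been expanded to the companion state (zeros on the lagged blocks); the function names, the symmetrization safeguard and the use of NumPy are assumptions of this sketch rather than the authors' implementation.

```python
import numpy as np

def kalman_forward(X, Lam, Phi, Re, Qu):
    """Forward pass (2.3.16)-(2.3.19). X is T x N, the companion state has
    dimension KP, and Lam (N x KP) loads on the companion state."""
    T = X.shape[0]
    KP = Phi.shape[0]
    F_pred, P_pred = np.zeros(KP), np.eye(KP)        # F_{1|0} = 0, P_{1|0} = I
    F_filt = np.zeros((T, KP))
    P_filt = np.zeros((T, KP, KP))
    for t in range(T):
        if t > 0:                                    # prediction step (2.3.18)-(2.3.19)
            F_pred = Phi @ F_filt[t - 1]
            P_pred = Phi @ P_filt[t - 1] @ Phi.T + Qu
        eta = X[t] - Lam @ F_pred                    # conditional forecast error
        H = Lam @ P_pred @ Lam.T + Re                # its covariance H_{t|t-1}
        G = P_pred @ Lam.T @ np.linalg.inv(H)
        F_filt[t] = F_pred + G @ eta                 # update (2.3.16)
        P_filt[t] = P_pred - G @ Lam @ P_pred        # update (2.3.17)
    return F_filt, P_filt

def draw_factors(F_filt, P_filt, Phi, Q, K, rng):
    """Backward multi-move sampling. Only the first K rows of the companion
    objects (Phi*, Q*, F*) enter (2.3.20)-(2.3.21) because Q is singular."""
    T, KP = F_filt.shape
    Phi_s, Q_s = Phi[:K, :], Q[:K, :K]
    F = np.zeros((T, KP))
    # companion-state covariance may be singular; a production code would handle this explicitly
    F[-1] = rng.multivariate_normal(F_filt[-1], P_filt[-1])     # (2.3.10)
    for t in range(T - 2, -1, -1):
        xi = F[t + 1, :K] - Phi_s @ F_filt[t]                   # xi_{t+1|t}
        J = Phi_s @ P_filt[t] @ Phi_s.T + Q_s                   # J_{t+1|t}
        G = P_filt[t] @ Phi_s.T @ np.linalg.inv(J)
        mean = F_filt[t] + G @ xi                               # (2.3.20)
        cov = P_filt[t] - G @ Phi_s @ P_filt[t]                 # (2.3.21)
        cov = (cov + cov.T) / 2                                 # numerical symmetrization
        F[t] = rng.multivariate_normal(mean, cov)               # (2.3.11)
    return F
```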

2.3.5 Conditional density of the parameters $\theta$ given $X^T$ and $F^T$

Sampling from the conditional distribution of the parameters $p(\theta \mid X^T, F^T)$ requires blocking the parameters into two parts, one referring to the observation equation and one to the state equation. Conditional on the extracted factors and the data, the two blocks can be sampled independently of each other, as sketched in the schematic loop below.
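The overall structure of the sampler can be summarized by the following schematic loop; the three sampler callables correspond to the blocks of Sections 2.3.4, 2.3.5.1 and 2.3.5.2, and their names and signatures are placeholders introduced only for this sketch.

```python
def gibbs_favar(X, n_draws, theta0, sample_factors, sample_obs_block, sample_state_block, rng):
    """Two-block Gibbs sampler: draw the factors given the parameters, then
    the observation- and state-equation parameters given the factors."""
    Lam, Re, Phi, Qu = theta0
    draws = []
    for _ in range(n_draws):
        F = sample_factors(X, Lam, Re, Phi, Qu, rng)   # Section 2.3.4
        Lam, Re = sample_obs_block(X, F, rng)          # Section 2.3.5.1
        Phi, Qu = sample_state_block(F, rng)           # Section 2.3.5.2
        draws.append((F, Lam, Re, Phi, Qu))
    return draws
```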

2.3.5.1 Conditional density of $\Lambda$ and $R_e$

This part refers to the observation equation of the state space model which, conditional on the estimated factors and the data, specifies the distribution of $\Lambda$ and $R_e$. The errors of the observation equation are mutually orthogonal with diagonal $R_e$. Hence we can apply equation-by-equation OLS to obtain the OLS estimates $\hat{\Lambda}_n$ and $\hat{e}_{c,n}$, as the observation equation amounts to a set of independent regressions. Note that the subscript $n$ refers to the $n$-th equation and all hatted variables refer to the respective OLS estimates. We assume conjugate priors

$$p(R_{nn}) = IG(\delta_0/2, \eta_0/2)$$
$$p(\Lambda_n \mid R_{nn}) = N(\Lambda_{n0}, R_{nn} M_{n0}^{-1})$$

which, by standard Bayesian results, lead to the following conditional posterior distributions

$$p(R_{nn} \mid \tilde{X}^T, \tilde{F}^T) = IG(\delta_n/2, \eta_n/2)$$
$$p(\Lambda_n \mid \tilde{X}^T, \tilde{F}^T, R_{nn}) = N(\bar{\Lambda}_n, R_{nn}\bar{M}_n^{-1})$$

with

$$\eta_n = \eta_0 + T$$
$$\delta_n = \delta_0 + \hat{e}_{c,n}'\hat{e}_{c,n} + (\hat{\Lambda}_n - \Lambda_{n0})'\left[M_{n0}^{-1} + (F_n^{T\prime} F_n^T)^{-1}\right]^{-1}(\hat{\Lambda}_n - \Lambda_{n0})$$
$$\bar{M}_n = M_{n0} + F_n^{T\prime} F_n^T$$
$$\bar{\Lambda}_n = \bar{M}_n^{-1}(M_{n0}\Lambda_{n0} + F_n^{T\prime} F_n^T \hat{\Lambda}_n)$$

where we set the same prior specification $(\delta_0 = 6,\ \eta_0 = 10^{-3},\ M_{n0} = I_{K_c},\ \Lambda_{n0} = 0_{K_c \times 1})$ as in Bernanke, Boivin, and Eliasz [6] in order to allow an adequate comparison. $M_{n0}$ denotes the matrix in the prior on the coefficients $\Lambda_n$ of the $n$-th equation. The factor normalization discussed earlier requires setting $M_{n0} = I$. The regressors of the $n$-th equation are represented by $F_n^T$ and the fitted errors of the $n$-th equation are represented by $\hat{e}_{c,n}$.
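As an illustration, the sketch below draws $(\Lambda_n, R_{nn})$ equation by equation under the prior specification just stated ($\Lambda_{n0} = 0$, $M_{n0} = I$). It assumes an inverse-gamma parameterization in which $R_{nn}$ given the data is distributed as $\delta_n/\chi^2_{\eta_n}$; if the opposite convention is intended, the roles of $\delta_n$ and $\eta_n$ swap. Function and variable names are our own.

```python
import numpy as np

def draw_obs_block(X, F, delta0=6.0, eta0=1e-3, rng=None):
    """Equation-by-equation draw of Lambda_n and R_nn (Section 2.3.5.1)
    with prior mean Lambda_n0 = 0 and prior precision M_n0 = I."""
    rng = np.random.default_rng() if rng is None else rng
    T, N = X.shape
    Kc = F.shape[1]
    FtF = F.T @ F
    Lam = np.zeros((N, Kc))
    Rnn = np.zeros(N)
    for n in range(N):
        lam_hat = np.linalg.solve(FtF, F.T @ X[:, n])       # OLS estimate
        e_hat = X[:, n] - F @ lam_hat                       # fitted errors
        M_bar = np.eye(Kc) + FtF                            # M_bar_n = M_n0 + F'F
        lam_bar = np.linalg.solve(M_bar, FtF @ lam_hat)     # posterior mean (Lambda_n0 = 0)
        # delta_n = delta_0 + e'e + lam_hat' [M_n0^{-1} + (F'F)^{-1}]^{-1} lam_hat
        mid = np.linalg.inv(np.eye(Kc) + np.linalg.inv(FtF))
        delta_n = delta0 + e_hat @ e_hat + lam_hat @ mid @ lam_hat
        eta_n = eta0 + T
        Rnn[n] = delta_n / rng.chisquare(eta_n)             # R_nn | data (assumed IG convention)
        cov = Rnn[n] * np.linalg.inv(M_bar)
        Lam[n] = rng.multivariate_normal(lam_bar, cov)      # Lambda_n | R_nn, data
    return Lam, Rnn
```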

2.3.5.2 Conditional density of $\mathrm{vec}(\phi)$ and $Q_u$

The next Gibbs block requires drawing $\mathrm{vec}(\phi)$ and $Q_u$ conditional on the most current draws of the factors and the data. We employ the Normal-Inverse-Wishart prior according to Uhlig [30]

$$p(Q_u) = IW(S_0, \nu_0)$$
$$p(\mathrm{vec}(\phi) \mid Q_u) = N(\mathrm{vec}(\bar{\phi}_0), Q_u \otimes N_0^{-1})$$

which results in the following posterior:

$$p(Q_u \mid X^T, F^T) = IW(S_T, \nu_T)$$
$$p(\mathrm{vec}(\phi) \mid X^T, F^T, Q_u) = N(\mathrm{vec}(\bar{\phi}_T), Q_u \otimes N_T^{-1})$$

with

$$\nu_T = T + \nu_0$$
$$N_T = N_0 + F_{-1}^{T\prime} F_{-1}^{T}$$
$$\bar{\phi}_T = N_T^{-1}(N_0\bar{\phi}_0 + F_{-1}^{T\prime} F_{-1}^{T}\hat{\phi})$$
$$S_T = \frac{\nu_0}{\nu_T} S_0 + \frac{T}{\nu_T}\hat{Q}_u + \frac{1}{\nu_T}(\hat{\phi} - \bar{\phi}_0)' N_0 N_T^{-1}(F_{-1}^{T\prime} F_{-1}^{T})(\hat{\phi} - \bar{\phi}_0),$$

where $\hat{\phi}$ and $\hat{Q}_u$ denote the corresponding OLS estimates. This prior has the following specification

$$\nu_0 = K + 2, \qquad N_0 = 0_{K \times K}$$

where the choices of $S_0$ and $\bar{\phi}_0$ are arbitrary as they cancel out in the posterior. We alternatively also implemented the Normal-Wishart prior according to Kadiyala and Karlsson [20], where the diagonal elements of $Q_0$ are set to the residual variances $\sigma_i^2$ of the corresponding $p$-lag univariate autoregressions. The diagonal elements of $\Omega_0$ are constructed such that the prior variance of the parameter on the $k$-th lag of the $j$-th variable in the $i$-th equation equals $\sigma_i^2/(k\sigma_j^2)$. Hence $S_0 = Q_0$ and $\bar{\phi}_0 = 0$. Results were virtually the same. To ensure stationarity, we truncate the draws by discarding those draws of $\phi$ whose largest eigenvalue is greater than 1 in absolute value.
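A corresponding sketch of this block is given below. It uses the diffuse choice $N_0 = 0$, so the posterior moments collapse to the OLS quantities, and it enforces the stationarity truncation by rejection on the companion matrix. The lag construction, the scaling convention of `scipy.stats.invwishart`, and the rejection cap are assumptions of the sketch, not the authors' implementation.

```python
import numpy as np
from scipy.stats import invwishart

def draw_state_block(F, p=1, nu0=None, max_tries=1000, rng=None):
    """Draw (phi, Q_u) for the factor VAR (Section 2.3.5.2) under a diffuse
    Normal-Inverse-Wishart prior with N_0 = 0, discarding non-stationary draws."""
    rng = np.random.default_rng() if rng is None else rng
    T, K = F.shape
    nu0 = K + 2 if nu0 is None else nu0
    Y = F[p:]                                                   # left-hand side
    Z = np.hstack([F[p - l:T - l] for l in range(1, p + 1)])    # lagged regressors
    ZtZ = Z.T @ Z
    phi_hat = np.linalg.solve(ZtZ, Z.T @ Y)                     # OLS, (K*p) x K
    resid = Y - Z @ phi_hat
    nuT = nu0 + Y.shape[0]
    L_row = np.linalg.cholesky(np.linalg.inv(ZtZ))              # N_T^{-1} = (Z'Z)^{-1}
    for _ in range(max_tries):
        Qu = np.atleast_2d(invwishart.rvs(df=nuT, scale=resid.T @ resid, random_state=rng))
        # vec(phi) | Q_u ~ N(vec(phi_hat), Q_u kron (Z'Z)^{-1}), drawn as a matrix normal
        phi = phi_hat + L_row @ rng.standard_normal((K * p, K)) @ np.linalg.cholesky(Qu).T
        # stationarity check on the companion matrix
        comp = np.zeros((K * p, K * p))
        comp[:K, :] = phi.T
        if p > 1:
            comp[K:, :K * (p - 1)] = np.eye(K * (p - 1))
        if np.max(np.abs(np.linalg.eigvals(comp))) < 1.0:
            return phi, Qu
    raise RuntimeError("no stationary draw of phi found")
```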

2.4 Identification

The major objective of this paper is to identify monetary policy shocks in a data-rich environment by imposing sign restrictions as introduced by Uhlig [31] for the VAR framework. The issue of how to identify structural shocks through the decomposition of the prediction error $u^c_t$, and in particular monetary policy shocks, has been the subject of much debate in the literature. From an economic perspective it seems desirable to have an identification scheme which guarantees that the impulse responses satisfy conventional wisdom. In FAVAR models the task is essentially the same, with the main distinction that the structural shocks are not deduced from the innovation of the reduced-form observation equation but from the FAVAR innovation, including the factors that summarize the crucial dynamics of the observed data. For comparison purposes we will employ two identification schemes, namely the factor generalization of the aforementioned sign restrictions and the factor generalization of the recursive Cholesky identification. These two identification schemes shall be explained in the following.