
Three Essays on Bayesian Nonparametric Modeling in Microeconometrics


Academic year: 2022


Three Essays on Bayesian Nonparametric Modeling in Microeconometrics

Dissertation

submitted for the degree of

Doctor of Economics (Dr. rer. pol.) at the Department of Economics

of the University of Konstanz

submitted by:

Markus Jochmann, Mainaustraße 61, 78464 Konstanz

Date of oral examination: 26 July 2006
First referee: Prof. Dr. Winfried Pohlmeier
Second referee: Prof. Gary Koop


This thesis was written during my time at the Chair of Econometrics, Department of Economics, University of Konstanz.

I thank my doctoral supervisor Winfried Pohlmeier for his help, his support, and the academic freedom he granted me. No less do I thank all my colleagues during this time. We always had a special atmosphere at the chair, which I would not have wanted to miss.

Since this dissertation was written in the diaspora, it was essential for me to have some Bayesian discussion partners and friends.

Here my thanks go above all to Gary Koop, who agreed to act as second referee. I also thank my coauthor Roberto León-González. Finally, I could always turn to Luc Bauwens with silly questions.

I further thank the Deutsche Forschungsgemeinschaft and the Universitätsgesellschaft Konstanz for financial support of my research projects. I am also indebted to Robert Lee for his help with English spelling.

Last but not least, I thank my family and my friends.


Abstract

Summary

1 Introduction - Summary of the Literature
Bibliography

2 Estimating the Demand for Health Care with Panel Data: A Semiparametric Bayesian Approach (Essay 1)
2.1 Introduction
2.2 A Parametric Benchmark Model
2.3 A Semiparametric Extension
2.4 Bayesian MCMC Sampling
2.5 The Data
2.6 Results
2.7 Conclusion
Bibliography
Appendix

3 Nonparametric Bayesian Inference for Count Data Treatment Effects (Essay 2)
3.1 Introduction
3.2 The Model
3.3 Bayesian MCMC Sampling
3.4 Empirical Illustration: Number of Trips by Households
3.5 Conclusion
Bibliography
Appendix

4 Nonparametric Bayesian Inference for Quantile Treatment Effects (Essay 3)
4.1 Introduction
4.2 The Model
4.3 Bayesian Computation
4.4 Empirical Application
4.5 Conclusions
Bibliography

Complete Bibliography

Declaration

Delimitation of Own Contributions


This dissertation comprises three essays on nonparametric Bayesian modeling in microeconometrics. The introduction discusses some basic concepts of Bayesian nonparametrics, including the Dirichlet process and the mixture of Dirichlet processes model. Further, it summarizes the literature on estimating the demand for health care using count data and the literature on treatment effect models.

The first essay specifies a Bayesian nonparametric random effects model for count data. This model is based on a mixture distribution with a random number of components, and is therefore a natural extension of prevailing latent class models. We propose a Markov chain Monte Carlo (MCMC) algorithm for posterior inference and apply the framework to data from Germany.

The second essay is also concerned with count data but focuses on the estimation of causal treatment effects. A potential outcomes model is specified in a nonparametric Bayesian fashion using a mixture of Dirichlet processes. Posterior inference is carried out using MCMC simulation methods. We illustrate the proposed techniques with a real data set on the mobility of households.

The third essay analyzes distributional treatment effects. The model is based on the potential outcomes framework and focuses on the estimation of quantile treatment effects. Flexibility is achieved by including random intercepts in the outcome equations. These random intercepts are assumed to be drawn from Dirichlet processes. We apply the model to real data using MCMC techniques for posterior simulation.


This dissertation comprises three essays on nonparametric Bayesian methods in microeconometrics. The introduction discusses basic concepts of Bayesian nonparametrics, in particular the Dirichlet process and the Dirichlet process mixture model. It further summarizes the literature on estimating the demand for health care services with count data models and the literature on treatment effects.

The first essay presents a Bayesian nonparametric random effects model for count data. The model is based on a mixture distribution with a random number of components and can therefore be regarded as an extension of the latent class models already discussed in the literature. An algorithm based on Markov chain Monte Carlo (MCMC) methods is developed for analyzing the posterior distribution. Finally, the model framework is applied to German data.

The second essay also deals with count data but focuses on the estimation of causal treatment effects. A model based on Rubin's causal model is presented and formulated nonparametrically using the Dirichlet process mixture model. The posterior distribution is analyzed with MCMC methods. The essay concludes with an application concerned with the mobility of households.

Finally, the third essay analyzes the causal effect of a treatment on the distribution of the outcome variable of interest. The estimation of quantile treatment effects is the focus of the analysis. To this end, the proposed model framework is again based on Rubin's causal model. Introducing random terms that follow Dirichlet processes allows a flexible formulation of the relationship under consideration. The model is estimated using MCMC methods and illustrated with real data.


Introduction - Summary of the Literature

This dissertation comprises three essays on nonparametric Bayesian methods in microeconometrics. Nonparametric methods have become one of the central areas of Bayesian research and are widely used in fields such as biometrics and machine learning. In contrast, like their parametric counterparts, they have so far been less influential in the econometric literature. Thus, one purpose of this thesis is to demonstrate the usefulness of Bayesian nonparametric methods for applied econometric research.

In a literal sense we should not use the term Bayesian ‘nonparametric’ methods. It is an ‘oxymoron and misnomer’, as Müller and Quintana (2004) put it. Bayesian inference is always based on a well-defined probability model and is thus inherently parametric. Rather, the commonly used definition of Bayesian nonparametrics refers to models with infinitely many parameters (Bernardo and Smith (1994)).¹

¹ For a nice discussion of parametric versus nonparametric modeling from a Bayesian point of view, see Koop (2003), who uses the term ‘flexible models’ for his chapter on Bayesian nonparametrics.

The Dirichlet process is the most frequently used prior in the Bayesian nonparametric literature (see MacEachern and Müller (2000), and Müller and Quintana (2004) for recent surveys). Accordingly, we also employ the Dirichlet process and the closely related mixture of Dirichlet processes model in the three essays of this dissertation. The first essay estimates the demand for health care from panel data. In this application the Dirichlet process is used to relax the distributional assumption on the random effects. The second and third essays deal with treatment effect models. Here, the distribution of the error terms is modeled in a flexible way using a mixture of Dirichlet processes.

The remainder of this introduction gives a review of the literature related to this dissertation. First, the Dirichlet process and issues of its application in Bayesian nonparametrics are discussed. Second, we review the literature on count data regression for estimating the demand for health care. Finally, the fundamental aspects of treatment effect models are summarized.

The Dirichlet Process

The Dirichlet process was introduced by Ferguson (1973) as a prior distribution on spaces of probability measures. It is defined by the following property: a random probability measure $G$ is generated by a Dirichlet process with precision parameter $\nu > 0$ and base distribution $G_0$ if, for any partition $B_1, \dots, B_m$ of the support of $G_0$, the vector of probabilities $(G(B_1), \dots, G(B_m))$ follows a Dirichlet distribution with parameter $(\nu G_0(B_1), \dots, \nu G_0(B_m))$. The expectation and variance of $G$ are then given by

$$E(G) = G_0, \tag{1.1}$$

and, for any event $A$,

$$\operatorname{Var}[G(A)] = \frac{G_0(A)\,[1 - G_0(A)]}{\nu + 1}. \tag{1.2}$$
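The defining property can be checked numerically on a concrete partition. The sketch below is illustrative only: it assumes a three-cell partition with hypothetical base probabilities $G_0(B_k)$, draws $(G(B_1), \dots, G(B_m))$ from the Dirichlet distribution by normalizing independent Gamma variates (a standard construction), and compares the Monte Carlo moments of $G(B_1)$ with (1.1) and (1.2). All function names are hypothetical.

```python
import random


def dirichlet_sample(alphas, rng):
    """Draw one vector from Dirichlet(alphas) by normalizing independent Gamma(alpha_k, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def dp_partition_moments(g0_probs, nu, n_draws=20000, seed=0):
    """Monte Carlo mean and variance of G(B_1), where
    (G(B_1), ..., G(B_m)) ~ Dirichlet(nu * G0(B_1), ..., nu * G0(B_m))."""
    rng = random.Random(seed)
    alphas = [nu * p for p in g0_probs]
    draws = [dirichlet_sample(alphas, rng)[0] for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    var = sum((d - mean) ** 2 for d in draws) / (n_draws - 1)
    return mean, var


def dp_exact_moments(g0_a, nu):
    """Equations (1.1)-(1.2): E[G(A)] = G0(A) and Var[G(A)] = G0(A)(1 - G0(A)) / (nu + 1)."""
    return g0_a, g0_a * (1.0 - g0_a) / (nu + 1.0)


# Hypothetical example: three cells with G0-probabilities 0.3, 0.5, 0.2 and nu = 4.
mc_mean, mc_var = dp_partition_moments([0.3, 0.5, 0.2], nu=4.0)
exact_mean, exact_var = dp_exact_moments(0.3, 4.0)
```

Larger values of $\nu$ shrink the variance in (1.2), so realizations of $G$ concentrate more tightly around $G_0$.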

More aspects of the Dirichlet process are discussed, among others, in Ferguson (1973), Antoniak (1974), Cifarelli and Melilli (2000) and Ghosh and Ramamoorthi (2003).

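To better understand the properties of the Dirichlet process, it is useful to look at two of its representations. Sethuraman (1994) shows that any $G \sim DP(\nu, G_0)$ can be represented as

$$G = \sum_{h=1}^{\infty} \omega_h \delta_{\theta_h}, \tag{1.3}$$

$$\omega_h = U_h \prod_{j<h} (1 - U_j) \quad \text{with} \quad U_h \overset{iid}{\sim} \text{Beta}(1, \nu), \tag{1.4}$$

$$\theta_h \overset{iid}{\sim} G_0. \tag{1.5}$$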

That is, realizations of the Dirichlet process can be represented as infinite mixtures of point masses $\delta_{\theta_h}$. The random weights $\omega_h$ are generated from a ‘stick-breaking’ prior and the locations $\theta_h$ are drawn from the base distribution $G_0$. To obtain the ‘stick-breaking’ prior, we start with a stick of length 1 and break off a fraction $U_1$ of it; the broken-off piece has length $\omega_1 = U_1$. We then break off a fraction $U_2$ of the remaining piece, which gives $\omega_2 = U_2(1 - U_1)$, and so on. If we continue for an infinite number of breaks, we obtain an infinite set of stick lengths that sum to 1 with probability 1. Note that Sethuraman's construction suggests a simple way to sample (approximately, by truncating the sum after finitely many terms) from a Dirichlet process.
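A minimal sketch of the truncated version of (1.3)-(1.5) follows. The truncation level, the convention of setting the last $U_h$ to 1 so that the leftover stick mass goes to the final atom (a common truncation device), and the helper name `stick_breaking_dp` are assumptions for illustration; `base_sampler` stands for any routine that draws from $G_0$.

```python
import random


def stick_breaking_dp(nu, base_sampler, truncation=200, seed=None):
    """Approximate draw G ~ DP(nu, G0) via Sethuraman's construction (1.3)-(1.5),
    truncated after `truncation` atoms. The last break uses U = 1, so the leftover
    stick mass goes to the final atom and the weights sum to one."""
    rng = random.Random(seed)
    weights, atoms = [], []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for h in range(truncation):
        u = rng.betavariate(1.0, nu) if h < truncation - 1 else 1.0  # U_h ~ Beta(1, nu)
        weights.append(remaining * u)  # omega_h = U_h * prod_{j<h} (1 - U_j)
        atoms.append(base_sampler(rng))  # theta_h ~ G0
        remaining *= 1.0 - u
    return weights, atoms


# Example with G0 = N(0, 1), an assumption for illustration.
weights, atoms = stick_breaking_dp(2.0, lambda r: r.gauss(0.0, 1.0), truncation=100, seed=1)
```

Under this construction $E(\omega_1) = 1/(1+\nu)$, so small values of $\nu$ put most of the mass on a few atoms.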

The Pólya urn representation of Blackwell and MacQueen (1973) specifies the conditional distribution of $\theta_n$ given $\theta_1, \dots, \theta_{n-1}$:

$$\theta_n \mid \theta_1, \dots, \theta_{n-1} \sim \frac{1}{n-1+\nu} \sum_{i=1}^{n-1} \delta_{\theta_i} + \frac{\nu}{n-1+\nu}\, G_0. \tag{1.6}$$

Thus, with probability proportional to $\nu$, the parameter $\theta_n$ is drawn from the base distribution $G_0$, and with probability proportional to 1 for every $\theta_i$, the new parameter value is set equal to $\theta_i$.

Both Sethuraman's constructive representation and the Pólya urn representation show that a draw from the Dirichlet process is almost surely a discrete probability measure. Consequently, some $\theta_i$'s share the same value with positive probability, and the number of distinct values is typically smaller than the sample size.
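The urn scheme (1.6) translates directly into a sequential sampler, which makes the discreteness visible: counting the distinct values among the draws gives a number well below $n$. Since the $i$-th draw is a new value with probability $\nu/(\nu+i-1)$, the expected number of distinct values is $\sum_{i=1}^{n} \nu/(\nu+i-1)$, which the sketch uses as a check. The function names are hypothetical, and $G_0$ is assumed to be continuous so that fresh draws never coincide.

```python
import random


def polya_urn(n, nu, base_sampler, seed=None):
    """Draw theta_1, ..., theta_n sequentially from the Polya urn scheme (1.6)."""
    rng = random.Random(seed)
    thetas = []
    for i in range(n):  # i values have been drawn so far
        if rng.random() < nu / (i + nu):
            thetas.append(base_sampler(rng))  # new value from G0, probability nu / (i + nu)
        else:
            thetas.append(rng.choice(thetas))  # copy an earlier theta_j, probability 1 / (i + nu) each
    return thetas


def expected_distinct(n, nu):
    """Expected number of distinct values: sum_{i=1}^{n} nu / (nu + i - 1)."""
    return sum(nu / (nu + i - 1) for i in range(1, n + 1))


draws = polya_urn(50, 1.0, lambda r: r.gauss(0.0, 1.0), seed=7)
n_distinct = len(set(draws))  # typically far fewer than 50
```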

Given that this discreteness is inappropriate for many applications, mixture of Dirichlet processes (MDP) models, which were introduced by Antoniak (1974), add a further deconvolution, leading to the hierarchical model

$$F(x) = \int f(x \mid \theta)\, dG(\theta), \tag{1.7}$$

$$G \sim DP(\nu, G_0), \tag{1.8}$$

where $f(x \mid \theta)$ denotes a parametric family of densities indexed by $\theta$.
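To see how the mixture in (1.7) smooths out the discreteness, one can draw $G$ approximately by truncated stick-breaking and evaluate the implied density. The sketch assumes a normal kernel $f(x \mid \theta) = N(x; \theta, 0.5^2)$ and $G_0 = N(0, 2^2)$ purely for illustration; these choices, the truncation level and the function names are not taken from the essays.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def draw_mdp_density(nu, kernel_sd=0.5, truncation=150, seed=None):
    """Return x -> sum_h omega_h * N(x; theta_h, kernel_sd^2), i.e. (1.7) with a normal
    kernel and G drawn approximately from DP(nu, G0), G0 = N(0, 2^2), by truncated
    stick-breaking."""
    rng = random.Random(seed)
    weights, atoms, remaining = [], [], 1.0
    for h in range(truncation):
        u = rng.betavariate(1.0, nu) if h < truncation - 1 else 1.0
        weights.append(remaining * u)
        atoms.append(rng.gauss(0.0, 2.0))
        remaining *= 1.0 - u

    def density(x):
        return sum(w * normal_pdf(x, t, kernel_sd) for w, t in zip(weights, atoms))

    return density


f = draw_mdp_density(1.0, seed=3)  # one random, smooth density drawn from the MDP prior
```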

The Pólya urn representation of the Dirichlet process is the cornerstone of most computational methods for fitting mixture of Dirichlet processes models.

Escobar (1994) and Escobar and West (1995) show that one can use Markov chain Monte Carlo (MCMC) techniques for that purpose.² They construct a Gibbs sampler (Geman and Geman (1984), Gelfand and Smith (1990)) by repeatedly drawing $\theta_i$ from its conditional posterior distribution given the data and the $\theta_j$ for $j \neq i$ (written as $\theta_{-i}$). This distribution can be obtained by combining the likelihood and the prior conditional on $\theta_{-i}$:

$$\theta_i \mid \theta_{-i} \sim \frac{1}{n-1+\nu} \sum_{j \neq i} \delta_{\theta_j} + \frac{\nu}{n-1+\nu}\, G_0. \tag{1.9}$$

This conditional prior distribution can be derived from equation (1.6) by treating observation $i$ as the last observation (which we can do since the observations are exchangeable). Denoting the likelihood by $F(y_i, \theta_i)$, we obtain the following conditional posterior distribution:

$$\theta_i \mid \theta_{-i}, y_i \sim \sum_{j \neq i} q_j\, \delta_{\theta_j} + q_0 H, \tag{1.10}$$

where $H$ denotes the posterior distribution of $\theta$ given the prior $G_0$ and the single observation $y_i$. The weights $q_j$ and $q_0$ are given by

$$q_j \propto F(y_i, \theta_j), \tag{1.11}$$

and

$$q_0 \propto \nu \int F(y_i, \theta)\, dG_0(\theta), \tag{1.12}$$

and are normalized to sum to 1. Note that in order to set up the Gibbs sampler we have to evaluate the integral in (1.12) and sample from $H$. Both steps are tractable when we choose $G_0$ to be the conjugate prior. Gibbs samplers for non-conjugate choices of $G_0$ have been proposed by, for instance, MacEachern and Müller (1998) and Neal (2000).

² See Robert and Casella (1999), Chen, Shao, and Ibrahim (2000) or Liu (2001) for comprehensive surveys on MCMC methods.
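For a conjugate choice of $G_0$, all quantities in (1.10)-(1.12) are available in closed form. The sketch below assumes the normal location model $y_i \mid \theta_i \sim N(\theta_i, \sigma^2)$ with known $\sigma^2$ and $G_0 = N(\mu_0, \tau_0^2)$, for which $\int F(y_i, \theta)\, dG_0(\theta) = N(y_i; \mu_0, \sigma^2 + \tau_0^2)$ and $H$ is normal with variance $v = (1/\tau_0^2 + 1/\sigma^2)^{-1}$ and mean $v(\mu_0/\tau_0^2 + y_i/\sigma^2)$. The model, the toy data and the function names are illustrative assumptions, not the specification used in the essays.

```python
import math
import random


def npdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gibbs_sweep(y, theta, nu, sigma2, mu0, tau02, rng):
    """One sweep of the Polya-urn Gibbs sampler (1.10)-(1.12) for the conjugate model
    y_i | theta_i ~ N(theta_i, sigma2), theta_i | G ~ G, G ~ DP(nu, N(mu0, tau02))."""
    n = len(y)
    v = 1.0 / (1.0 / tau02 + 1.0 / sigma2)  # variance of H, the posterior of theta given y_i alone
    for i in range(n):
        others = [theta[j] for j in range(n) if j != i]
        weights = [npdf(y[i], t, sigma2) for t in others]  # q_j, eq. (1.11)
        weights.append(nu * npdf(y[i], mu0, sigma2 + tau02))  # q_0, eq. (1.12)
        k = rng.choices(range(n), weights=weights)[0]  # choices() normalizes the weights
        if k < n - 1:
            theta[i] = others[k]  # set theta_i equal to an existing value theta_j
        else:
            mean_h = v * (mu0 / tau02 + y[i] / sigma2)
            theta[i] = rng.gauss(mean_h, math.sqrt(v))  # fresh draw from H
    return theta


# Illustrative data: two well-separated groups of 20 observations each.
rng = random.Random(42)
y = [-5.0 + 0.05 * (k - 10) for k in range(20)] + [5.0 + 0.05 * (k - 10) for k in range(20)]
theta = list(y)
for _ in range(200):
    theta = gibbs_sweep(y, theta, nu=1.0, sigma2=1.0, mu0=0.0, tau02=25.0, rng=rng)
```

This basic sampler can mix slowly, because a group of tied values only moves when its members are reassigned one at a time; in practice it is often supplemented by a step that redraws each distinct value given the observations currently sharing it.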

However, one problem with employing the Pólya urn representation is that integrating out the Dirichlet process rules out the computation of functionals of the posterior Dirichlet process that cannot be expressed as expectations of $G$, such as quantiles. Thus, some authors propose to employ a finite approximation of the Dirichlet process instead of integrating it out (see, for instance, Ishwaran and James (2002)). Gelfand and Kottas (2002) combine Pólya urn based sampling with a finite approximation of the Dirichlet process for calculating posterior quantities.
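With a finite approximation, a quantile of $G$ is simply a quantile of a discrete distribution with finitely many atoms, so it can be computed for every draw of $G$. The sketch below illustrates this under the prior (no data), assuming $G_0 = N(0, 1)$, $\nu = 5$ and a truncation level of 100; applied to posterior draws of $G$, the same function would yield posterior draws of the quantile. Names and settings are hypothetical.

```python
import random


def truncated_dp(nu, base_sampler, truncation, rng):
    """Finite (truncated stick-breaking) approximation of G ~ DP(nu, G0)."""
    weights, atoms, remaining = [], [], 1.0
    for h in range(truncation):
        u = rng.betavariate(1.0, nu) if h < truncation - 1 else 1.0
        weights.append(remaining * u)
        atoms.append(base_sampler(rng))
        remaining *= 1.0 - u
    return weights, atoms


def quantile_of_discrete(weights, atoms, p):
    """p-quantile of sum_h w_h * delta_{atom_h}: the smallest atom a with G((-inf, a]) >= p."""
    cumulative = 0.0
    for atom, w in sorted(zip(atoms, weights)):
        cumulative += w
        if cumulative >= p:
            return atom
    return max(atoms)  # guards against rounding error when p is close to 1


# Prior draws of the median of G.
rng = random.Random(0)
medians = []
for _ in range(2000):
    w, a = truncated_dp(5.0, lambda r: r.gauss(0.0, 1.0), 100, rng)
    medians.append(quantile_of_discrete(w, a, 0.5))
```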

As an alternative to MCMC techniques for fitting MDP models, other approaches have been proposed in the literature. For example, MacEachern, Clyde, and Liu (1999) apply sequential importance sampling methods, and Blei and Jordan (2004) discuss variational methods.

Several approaches for eliciting a prior on the precision parameter $\nu$ are discussed in the literature. Escobar and West (1995) assume a Gamma distribution for $\nu$,

$$\pi(\nu) = \text{Gam}(a_0, b_0), \tag{1.13}$$

and choose the prior parameters $a_0$ and $b_0$ based on the implied prior distribution of the number of distinct values among $\theta_1, \dots, \theta_n$. One advantage of their approach is that $\nu$ can be easily sampled with the data augmentation technique (Tanner and Wong (1987)).
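The data augmentation step of Escobar and West (1995) can be sketched as follows, given the current number $k$ of distinct values among $n$ observations. An auxiliary variable $\eta \sim \text{Beta}(\nu+1, n)$ is drawn first; then $\nu$ is drawn from a two-component mixture of Gamma distributions with rate $b_0 - \log\eta$, shapes $a_0 + k$ and $a_0 + k - 1$, and mixing odds $(a_0+k-1)/\{n(b_0-\log\eta)\}$. The sketch assumes $b_0$ is a rate parameter; the function names are hypothetical.

```python
import math
import random


def sample_nu_escobar_west(nu, k, n, a0, b0, rng):
    """One data-augmentation update of the DP precision nu under a Gamma(a0, b0) prior
    (shape a0, rate b0), given k distinct values among n observations."""
    eta = rng.betavariate(nu + 1.0, n)  # auxiliary variable eta | nu ~ Beta(nu + 1, n)
    rate = b0 - math.log(eta)
    odds = (a0 + k - 1.0) / (n * rate)  # pi_eta / (1 - pi_eta)
    pi_eta = odds / (1.0 + odds)
    shape = a0 + k if rng.random() < pi_eta else a0 + k - 1.0
    return rng.gammavariate(shape, 1.0 / rate)  # random.gammavariate expects a scale, not a rate


def posterior_mean_nu(k, n, a0=1.0, b0=1.0, iters=3000, seed=0):
    """Average of nu over a short chain of updates (illustrative, no burn-in)."""
    rng = random.Random(seed)
    nu, total = 1.0, 0.0
    for _ in range(iters):
        nu = sample_nu_escobar_west(nu, k, n, a0, b0, rng)
        total += nu
    return total / iters
```

In a full MDP sampler this update would be interleaved with the Gibbs sweeps, with $k$ recomputed after each sweep.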

Another possibility is to interpret $\nu$ as a ‘prior sample size’ and specify a prior for $\nu/(\nu+n)$, which is the mass of the base distribution in the posterior predictive distribution (see Carota and Parmigiani (2002) and Griffin and Steel (2004)). Let $\xi = \nu/(\nu+n_0)$ and assume that $\xi \sim \text{Beta}(a_0, b_0)$, where $a_0$, $b_0$ and $n_0$ are hyperparameters to be chosen. This results in an inverted Beta distribution for $\nu$:

$$\pi(\nu) = \frac{n_0^{b_0}\,\Gamma(a_0+b_0)}{\Gamma(a_0)\,\Gamma(b_0)}\, \frac{\nu^{a_0-1}}{(\nu+n_0)^{a_0+b_0}}, \tag{1.14}$$

with

$$E(\nu) = \frac{n_0 a_0}{b_0 - 1}, \quad b_0 > 1, \tag{1.15}$$

$$\operatorname{Var}(\nu) = \frac{n_0^2\, a_0\, (a_0+b_0-1)}{(b_0-1)^2\, (b_0-2)}, \quad b_0 > 2, \tag{1.16}$$

and

$$\operatorname{Mode}(\nu) = \frac{n_0\, (a_0-1)}{b_0+1}, \quad a_0 > 1. \tag{1.17}$$

Griffin and Steel suggest choosing $a_0 = b_0 = \eta_0$, which means that the prior median of $\nu$ is $n_0$. Thus, the prior is centered around $n_0$, with its dispersion decreasing in $\eta_0$. In this case $\nu$ can be sampled using the Metropolis-Hastings algorithm.
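A minimal random-walk Metropolis-Hastings sketch for $\nu$ under the inverted Beta prior (1.14) follows. It assumes that $\nu$ enters the likelihood only through the number $k$ of distinct values among $n$ observations, with $p(k \mid \nu) \propto \nu^k \Gamma(\nu)/\Gamma(\nu+n)$ (Antoniak (1974)), and it proposes moves on $\log\nu$, so the target includes the Jacobian term $\log\nu$. Step size, chain length and function names are illustrative choices.

```python
import math
import random


def log_inv_beta(nu, a0, b0, n0):
    """Log of the inverted Beta density (1.14), for nu > 0."""
    return (b0 * math.log(n0) + math.lgamma(a0 + b0) - math.lgamma(a0) - math.lgamma(b0)
            + (a0 - 1.0) * math.log(nu) - (a0 + b0) * math.log(nu + n0))


def log_k_likelihood(nu, k, n):
    """log p(k | nu, n) up to an additive constant: k log(nu) + log Gamma(nu) - log Gamma(nu + n)."""
    return k * math.log(nu) + math.lgamma(nu) - math.lgamma(nu + n)


def mh_nu(k, n, a0, b0, n0, iters=5000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings on log(nu) targeting p(nu | k)."""
    rng = random.Random(seed)

    def log_target(log_nu):
        nu = math.exp(log_nu)
        # posterior on the nu scale plus the Jacobian log(nu) of the log transform
        return log_inv_beta(nu, a0, b0, n0) + log_k_likelihood(nu, k, n) + log_nu

    current = math.log(n0)  # start at the prior median when a0 = b0
    current_lp = log_target(current)
    draws = []
    for _ in range(iters):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        draws.append(math.exp(current))
    return draws
```

For example, with $a_0 = b_0 = 2$ and $n_0 = 1$, (1.14) evaluated at $\nu = 1$ equals $\Gamma(4)/(\Gamma(2)^2 \cdot 2^4) = 6/16$.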

One restriction of the Dirichlet process is that it does not allow one to model the relationship between covariates and the unknown distribution directly. However, this limitation has received increasing attention in the literature, and several extensions have been developed. Cifarelli and Regazzini (1978) consider the case of discrete covariates and propose the Product of Dirichlet Processes model. They use Dirichlet process priors at each level of the covariate but link them through a common regression component in the base distribution.

A similar approach has been used in the econometric literature by Griffin and Steel (2004). An alternative approach is proposed by MacEachern (1999), who discusses the Dependent Dirichlet process (DDP). He starts from the stick-breaking representation and assumes that the distribution of the locations $\theta_h$ is dependent across different levels of the covariate. Another strategy is to model dependence in the weights of the stick-breaking representation. This approach is followed by Dunson and Pillai (2004) and Griffin and Steel (2006).

Estimating the Demand for Health Care from Count Data

The first essay of this thesis, “Estimating the Demand for Health Care with Panel Data: A Semiparametric Bayesian Approach” (Chapter 2), employs a mixture of Dirichlet processes model in order to estimate the demand for health services from count data. Given that the utilization of health care is often measured as the number of visits to a physician or another institution, count data models are common in the empirical literature.

A natural starting point for modelling count data is the Poisson distribution. However, simple models based on the Poisson distribution have a number of shortcomings. One is that they do not allow for unobserved heterogeneity: the Poisson distribution restricts the conditional variance to equal the conditional mean, whereas unobserved heterogeneity typically produces overdispersion. Alternative assumptions concerning the underlying probability distribution may fit the data better. Following this reasoning, some papers employ a negative binomial model or a Poisson-log-normal model.
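The link between unobserved heterogeneity and overdispersion can be illustrated by simulation. The sketch assumes a multiplicative Gamma-distributed heterogeneity term with mean one, which yields the negative binomial model mentioned above with variance $\mu + \mu^2/\alpha$; the sample variance then clearly exceeds the sample mean, which a pure Poisson model cannot reproduce. The helper names are hypothetical, and the simple Poisson sampler is only suitable for moderate means.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(mean, alpha, n, seed=0):
    """Counts y_i ~ Poisson(mean * u_i) with u_i ~ Gamma(alpha, scale 1/alpha), so E[u_i] = 1.
    Marginally y_i is negative binomial with variance mean + mean**2 / alpha."""
    rng = random.Random(seed)
    return [poisson_draw(mean * rng.gammavariate(alpha, 1.0 / alpha), rng) for _ in range(n)]


counts = simulate_counts(mean=3.0, alpha=1.0, n=20000, seed=0)
sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)  # close to 3 + 9 = 12
```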

However, these two alternatives still cannot fully account for one feature of health care data, namely the high proportion of zero usage. To overcome this problem, two-part models (also called hurdle models) were proposed in the literature (Mullahy (1986), Pohlmeier and Ulrich (1995), and Gurmu (1997)). The first part of these models consists of a binary outcome equation that distinguishes between users and non-users. The second part then specifies the distribution of usage conditional on it being positive.
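The structure of a two-part model is easy to see from its probability mass function. The sketch assumes, for illustration, a zero-truncated Poisson distribution for the positive counts (other count distributions, such as the negative binomial, are also used); `p_use` denotes the probability of being a user and is a hypothetical name.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(Y = k) computed on the log scale."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def hurdle_poisson_pmf(k, p_use, lam):
    """Two-part (hurdle) model: the binary part gives P(Y > 0) = p_use; positive
    counts follow a zero-truncated Poisson(lam)."""
    if k == 0:
        return 1.0 - p_use
    return p_use * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
```

Because the zeros are governed by `p_use` alone, the model can produce many more zeros than a Poisson distribution with the same mean for the positive counts.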

Two-part models are attractive since they can be interpreted in terms of a principal-agent model. In a first step the patient decides whether to visit the physician, and in a second step the physician determines the level of care. However, the fact that the data are usually recorded over a fixed time period rather than over an illness spell makes this interpretation problematic. Still, the two-part setup is a reasonable modelling approach that allows a richer specification of the data generating process.

Another type of model that is able to capture the high proportion of zero usage is the latent class model (Deb and Trivedi (1997, 2002)). This finite mixture model does not distinguish between users and non-users but between frequent and infrequent users (in the case of two groups). Thus, the latent class model is appealing if the mixture components can be interpreted in a meaningful way. However, such an interpretation is not required: like the two-part model, the latent class model can simply be regarded as a more flexible framework for modelling count data.
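In the latent class model, the probability of a count is a weighted sum of class-specific count distributions. The sketch assumes two Poisson classes with hypothetical shares and means, and also computes the posterior probability that an individual with a given count belongs to each class.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(Y = k) computed on the log scale."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def latent_class_pmf(k, shares, means):
    """Finite mixture of Poissons: P(Y = k) = sum_c pi_c * Poisson(k; lambda_c)."""
    return sum(p * poisson_pmf(k, lam) for p, lam in zip(shares, means))


def class_probabilities(k, shares, means):
    """Posterior probability that an individual with count k belongs to each latent class."""
    joint = [p * poisson_pmf(k, lam) for p, lam in zip(shares, means)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-class example: infrequent users (mean 0.5) and frequent users (mean 4).
shares, means = [0.7, 0.3], [0.5, 4.0]
```

With these values the mixture has mean 1.55 but a variance of about 4.12, so the latent classes generate both excess zeros and overdispersion.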

A further shortcoming of the standard Poisson regression model is that it ignores a possible panel structure of the data. Riphahn, Wambach, and Million (2003) take up this problem and discuss a bivariate random effects framework for estimating the demand for health care with count data. López-Nicolás (1998) is another study that uses panel data to infer the demand for health care.

Finally, a growing part of the literature on the demand for health care allows for endogeneity of the insurance status (see, for example, Miller and Luft (1994), and Meer and Rosen (2004)). Munkin and Trivedi (2003) consider the case of count data. They analyze a self-selection model with two outcome variables, one of which is a count and the other a continuous variable. They use Bayesian methods for inference and motivate this choice by the computational problems they encountered in a simulated maximum likelihood framework.

The first essay in this thesis (Chapter 2) does not account for endogeneity but addresses the first two shortcomings of standard Poisson models for estimating the demand for health care. First, it develops a random effects panel data model and thus controls for different attitudes and genetic diversity across individuals. The model is formulated in a Bayesian nonparametric fashion using a mixture of Dirichlet processes. In this way, an arbitrary specification of the random effects distribution is avoided. Second, employing the Dirichlet process prior generalizes latent class models by allowing the mixture distribution to have a random number of components. Thus, the problem of selecting the number of classes is avoided.

Bayesian Treatment Effect Models

The second essay, “Nonparametric Bayesian Inference for Count Data Treatment Effects” (Chapter 3), and the third essay, “Nonparametric Bayesian Inference for Quantile Treatment Effects” (Chapter 4), discuss nonparametric Bayesian methods for econometric program evaluation. At the heart of econometric program evaluation are ‘what if’ questions. These play a major role in many fields of economics. For example, a classical ‘what if’ question in labor economics concerns the wage effect of an additional year of schooling: what would an individual earn if he or she went to school for one more year?

Estimation of causal effects involves a comparison of two states of the world. However, only one of those two states can be observed for any individual, since an individual either participates in the program or does not. Thus, econometric program evaluation can be seen as a missing data problem.

The literature on econometric program evaluation is large and steadily growing. Heckman, LaLonde, and Smith (1999), Heckman and Vytlacil (2007) and Blundell and Costa Dias (2002) give excellent surveys. Broadly, three different approaches to program evaluation can be distinguished: (i) social and natural experiments, (ii) matching methods, and (iii) instrumental variable methods.

In social experiments a small subsample of the population is randomly assigned to treatment and control groups. The treatment group is then subjected to the program, and the difference in mean outcomes between the two groups provides an estimate of the causal effect of the program. Hausman and Wise (1985) discuss the advantages of this approach. In the case of a natural experiment (also called a quasi-experiment), the researcher is able to observe a group of individuals that behaves like a control group in a properly randomized experiment. The ‘difference-in-differences’ estimator is often used to evaluate natural experiments (see, for example, the famous study of Card and Krueger (1994) on minimum wages).
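Both estimators mentioned above are simple contrasts of group means. The sketch below uses made-up toy numbers purely to show the arithmetic; the function names are hypothetical.

```python
from statistics import fmean


def difference_in_means(treated, control):
    """Randomized experiment: mean outcome of the treated minus mean outcome of the controls."""
    return fmean(treated) - fmean(control)


def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Difference-in-differences: change in the treated group's mean outcome minus the
    change in the control group's mean outcome over the same period."""
    return (fmean(treat_after) - fmean(treat_before)) - (fmean(control_after) - fmean(control_before))


# Toy numbers, made up for illustration only.
experiment_effect = difference_in_means([5.0, 7.0], [4.0, 4.0])  # 6 - 4 = 2
did_effect = diff_in_diff([10.0, 12.0], [15.0, 17.0], [9.0, 11.0], [11.0, 13.0])  # (16 - 11) - (12 - 10) = 3
```

The difference-in-differences contrast removes time-constant group differences and common time trends, which is why it is useful when treatment is not randomized within the sample.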

The matching literature assumes that individuals select themselves into treatment solely on the basis of variables that can be observed by the researcher (selection on observables). Thus, the researcher can match each treated individual with a non-treated individual with the same values of the matching variables in order to estimate the effect of the treatment. The two most common approaches are propensity score matching (Rosenbaum and Rubin (1983)) and multivariate matching based on the Mahalanobis distance (Cochran and Rubin (1973)).
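Nearest-neighbour matching can be sketched in a few lines. The example assumes a scalar matching variable, such as an already estimated propensity score, matches with replacement, and averages the treated-minus-matched-control differences, which estimates the effect of the treatment on the treated; multivariate matching would replace the absolute difference with the Mahalanobis distance. Data and names are illustrative.

```python
def matching_att(treated, controls):
    """One-to-one nearest-neighbour matching (with replacement) on a scalar matching
    variable, e.g. an estimated propensity score. Each element is a (score, outcome)
    pair. Returns the average outcome difference over the treated units."""
    diffs = []
    for score, outcome in treated:
        _, matched_outcome = min(controls, key=lambda c: abs(c[0] - score))
        diffs.append(outcome - matched_outcome)
    return sum(diffs) / len(diffs)


# Hypothetical (score, outcome) pairs.
treated_units = [(0.2, 5.0), (0.8, 9.0)]
control_units = [(0.1, 3.0), (0.25, 4.0), (0.75, 6.0)]
att_estimate = matching_att(treated_units, control_units)  # (5 - 4 + 9 - 6) / 2 = 2
```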

Finally, the instrumental variable approach builds on an exclusion restriction. That is, there needs to be at least one variable that influences treatment choice but is excluded from the outcome equation. The seminal papers on instrumental variable estimation in treatment effect models are Imbens and Angrist (1994) and Heckman (1997).
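With a binary instrument and a binary treatment, the simplest instrumental variable estimator is the Wald ratio of the reduced-form difference to the first-stage difference, which Imbens and Angrist (1994) interpret as a local average treatment effect under their assumptions. The sketch uses hypothetical toy data.

```python
from statistics import fmean


def wald_estimator(y, d, z):
    """IV (Wald) estimate with binary instrument z and binary treatment d:
    [E(y | z=1) - E(y | z=0)] / [E(d | z=1) - E(d | z=0)]."""
    y1 = fmean([yi for yi, zi in zip(y, z) if zi == 1])
    y0 = fmean([yi for yi, zi in zip(y, z) if zi == 0])
    d1 = fmean([di for di, zi in zip(d, z) if zi == 1])
    d0 = fmean([di for di, zi in zip(d, z) if zi == 0])
    if d1 == d0:
        raise ValueError("the instrument does not shift treatment take-up")
    return (y1 - y0) / (d1 - d0)


# Hypothetical toy data: outcome y, treatment d, instrument z.
y = [1.0, 2.0, 1.0, 4.0, 2.0, 5.0, 4.0, 5.0]
d = [0, 0, 0, 1, 0, 1, 1, 1]
z = [0, 0, 0, 0, 1, 1, 1, 1]
late_estimate = wald_estimator(y, d, z)  # (4 - 2) / (0.75 - 0.25) = 4
```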

Given that econometric program evaluation can be seen as a missing data problem, following the Bayesian approach seems natural, since in the Bayesian framework the distinction between missing data and model parameters becomes blurred. This becomes even more apparent when Markov chain Monte Carlo simulation methods are applied for Bayesian inference: modern sampling techniques augment the parameter space with the missing data and sample both in turn.

Despite this appeal, there are only a few Bayesian papers that deal with econometric program evaluation and treatment effects. Chib and Hamilton (2000) consider the potential outcomes framework (Neyman (1923), Fisher (1935), Roy (1951), Cox (1958) and Rubin (1974)) and extensions of it from a Bayesian viewpoint. In a subsequent paper (Chib and Hamilton (2002)) they extend their approach in a nonparametric way. Chib (2003) also analyzes treatment effects from a Bayesian perspective. Instead of modelling the two potential outcomes separately, he considers a model with an endogenous dummy variable indicating treatment choice. Thus, this model is more restrictive in that, for example, it equates the variances of the two potential outcomes.

The potential outcomes model is also used by Koop and Poirier (1997), who focus on the correlation between the two potential outcomes. Given that only one of the potential outcomes is observable, this quantity is inherently unidentified. Koop and Poirier show how one can learn about this parameter in a Bayesian setup with a proper prior and suggest MCMC techniques for drawing inference. Poirier and Tobias (2003) also follow this approach and focus on predictive distributions of the outcome gains. Finally, Li, Poirier, and Tobias (2004) extend the analysis and look at non-normal selection models. Specifically, they propose Student-t selection models and a finite mixture of Gaussian selection models.

Imbens and Rubin (1997) estimate treatment effects in the case of randomized experiments with noncompliance. Their approach is then extended by Hirano, Imbens, Rubin, and Zhou (2000), who allow for the presence of covariates. They also consider violations of the identifying exclusion restrictions.


The second essay of this dissertation (Chapter 3) considers treatment effect estimation in situations where the outcome variable is a count. Terza (1998) discusses classical inference for a count data regression model with an endogenous dummy variable. We extend his approach by formulating a potential outcomes model for count data. Again, we use a mixture of Dirichlet processes to obtain a robust model framework. MCMC simulation methods are also used for posterior inference in this model.

Most of the program evaluation literature focuses on mean treatment effects. The average treatment effect (ATE), which gives the effect of the treatment on a randomly picked individual, and the effect of the treatment on the treated (TT), which gives the effect on a randomly chosen participant, are the two most common. However, in many situations researchers and policymakers are also interested in the effect of the treatment on the distribution of the outcome variable. To address this point, quantile treatment effects can be estimated. Abadie, Angrist, and Imbens (2002) and Chernozhukov and Hansen (2004, 2005) discuss appropriate model frameworks for doing so. The third essay (Chapter 4) of this thesis proposes a Bayesian alternative to their classical approaches. To let the data determine the shape of the distribution of the outcome variable, we introduce a nonparametric mixture of Dirichlet processes model. Posterior inference is carried out using MCMC techniques.
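The difference between mean and quantile treatment effects is easy to illustrate. Under random assignment, the quantile treatment effect at level $p$ can be estimated as the difference between the $p$-quantiles of the treated and control outcome distributions (in general this is not a quantile of the individual gains). The toy data below are hypothetical: the treatment raises only the top outcome, so the mean effect is positive while the median effect is zero. Function names are illustrative.

```python
import math


def empirical_quantile(values, p):
    """Smallest x with empirical CDF F_n(x) >= p (the left-continuous inverse)."""
    xs = sorted(values)
    idx = max(0, math.ceil(p * len(xs)) - 1)
    return xs[idx]


def quantile_treatment_effect(treated_outcomes, control_outcomes, p):
    """QTE at level p: difference between the p-quantiles of the treated and
    control outcome distributions (interpretable causally under random assignment)."""
    return empirical_quantile(treated_outcomes, p) - empirical_quantile(control_outcomes, p)


# Hypothetical outcomes: identical except for the largest value.
control = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
treated = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
mean_effect = sum(treated) / len(treated) - sum(control) / len(control)  # 1.0
median_effect = quantile_treatment_effect(treated, control, 0.5)  # 0.0
upper_effect = quantile_treatment_effect(treated, control, 0.95)  # 10.0
```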

Bibliography

Abadie, A., J. Angrist, and G. Imbens (2002): “Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings,” Econometrica, 70, 91–117.

Antoniak, C. E. (1974): “Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems,” The Annals of Statistics, 2, 1152–1174.

Bernardo, J. M., and A. F. M. Smith (1994): Bayesian theory. Wiley, New York.

Blackwell, D., and J. MacQueen (1973): “Ferguson distributions via Pólya urn schemes,” The Annals of Statistics, 1, 353–355.

Blei, D., and M. Jordan (2004): “Variational methods for the Dirichlet process,” in Proceedings of the 21st International Conference on Machine Learning.

Blundell, R., and M. Costa Dias (2002): “Alternative approaches to evaluation in empirical microeconomics,” Portuguese Economic Journal, 1, 91–115.

Card, D., and A. B. Krueger (1994): “Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania,” American Economic Review, 84, 772–793.

Carota, C., and G. Parmigiani (2002): “Semiparametric regression for count data,” Biometrika, 89, 265–281.

Chen, M. H., Q. M. Shao, and J. G. Ibrahim (2000): Monte Carlo methods in Bayesian computation. Springer, New York.

Chernozhukov, V., and C. Hansen (2004): “The impact of 401K participation on savings: an iv-qr analysis,” The Review of Economics and Statistics, 86, 735–751.

Chernozhukov, V., and C. Hansen (2005): “An iv model of quantile treatment effects,” Econometrica, 73, 245–261.

Chib, S. (2003): “On inferring effects of binary treatments with unobserved confounders (with discussion),” in Bayesian Statistics 7, ed. by J. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, and M. West, pp. 66–84. Oxford University Press, Oxford.

Chib, S., and B. H. Hamilton (2000): “Bayesian analysis of cross-section and clustered data treatment models,” Journal of Econometrics, 97, 25–50.

Chib, S., and B. H. Hamilton (2002): “Semiparametric Bayes analysis of longitudinal data treatment models,” Journal of Econometrics, 110, 67–89.

Cifarelli, D. M., and E. Melilli (2000): “Some new results for Dirichlet priors,” The Annals of Statistics, 28, 1390–1413.

Cifarelli, D. M., and E. Regazzini (1978): “Problemi statistici non parametrici in condizioni di scambiabilità parziale e impiego di medie associative,” Annali dell'Istituto di Matematica Finanziaria dell'Università di Torino, Serie III, 12, 1–36.

Cochran, W. G., and D. B. Rubin (1973): “Controlling bias in observational studies: A review,” Sankhya, Ser. A, 35, 417–446.

Cox, D. R. (1958): The planning of experiments. Wiley, New York.

Deb, P., and P. Trivedi (1997): “Demand for medical care by the elderly: A finite mixture approach,” Journal of Applied Econometrics, 12, 313–336.

Deb, P., and P. Trivedi (2002): “The structure of demand for health care: Latent class versus two-part model,” Journal of Health Economics, 21, 601–625.

Dunson, D. B., and N. Pillai (2004): “Bayesian density regression,” ISDS Discussion Paper 2004-33.

Escobar, M. D. (1994): “Estimating normal means with a Dirichlet process prior,” Journal of the American Statistical Association, 89, 268–277.

Escobar, M. D., and M. West (1995): “Bayesian density estimation and inference using mixtures,” Journal of the American Statistical Association, 90, 577–588.

Ferguson, T. S. (1973): “A Bayesian analysis of some nonparametric problems,” The Annals of Statistics, 1, 209–230.

Fisher, R. A. (1935): Design of experiments. Oliver and Boyd.

Gelfand, A. E., and A. Kottas (2002): “A computational approach for full nonparametric Bayesian inference under Dirichlet process mixture models,” Journal of Computational and Graphical Statistics, 11, 289–305.

Gelfand, A. E., and A. F. M. Smith (1990): “Sampling-based approaches to calculating marginal densities,” Journal of the American Statistical Association, 85, 398–409.

(21)

Geman, S., and D. Geman (1984): “Stochastic relaxation, Gibbs distri- butions and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

Ghosh, J. K.,and R. V. Ramamoorthi(2003): Bayesian nonparametrics.

Springer, New York.

Griffin, J. E., and M. F. J. Steel (2004): “Semiparametric Bayesian inference for stochastic frontier models,”Journal of Econometrics, 123, 121–

152.

Griffin, J. E.,andM. F. J. Steel(2006): “Order-based dependent Dirich- let processes,” Journal of the American Statistical Association, 101, 179–

194.

Gurmu, S. (1997): “Semiparametric estimation of hurdle regression models with an application to Medicaid utilization,” Journal of Applied Economet- rics, 12, 225–242.

Hausman, J. A., and D. A. Wise (1985): Social experimentation, NBER conference report. University of Chicago Press, Chicago.

Heckman, J. (1997): “Instrumental variables: A study of implicit behav- ioral assumptions used in making program evaluations,” Journal of Human Resources, 32, 441–462.

Heckman, J., R. LaLonde, and J. Smith (1999): “The economics and econometrics of active labour market programs,” in Handbook of Labour Economics, Volume 3, ed. by O. Ashenfelter and D. Card. Elsevier, Amsterdam.

Heckman, J., and E. Vytlacil (2007): “Econometric evaluation of social programs,” in Handbook of Econometrics, Volume 6, ed. by J. Heckman and E. Leamer. North Holland, Amsterdam.

Hirano, K., G. W. Imbens, D. B. Rubin, and X.-H. Zhou (2000): “Assessing the effect of an influenza vaccine in an encouragement design,” Biostatistics, 1, 69–88.

Imbens, G. W., and J. D. Angrist (1994): “Identification and estimation of local average treatment effects,” Econometrica, 62, 467–475.

Imbens, G. W., and D. B. Rubin (1997): “Bayesian inference for causal effects in randomized experiments with noncompliance,” The Annals of Statistics, 25, 305–327.

Ishwaran, H., and L. F. James (2002): “Approximate Dirichlet process computing in finite normal mixtures: smoothing and prior information,” Journal of Computational and Graphical Statistics, 11, 508–532.

Koop, G. (2003): Bayesian econometrics. Wiley, Chichester.

Koop, G., and D. J. Poirier (1997): “Learning about the across-regime correlation in switching regression models,” Journal of Econometrics, 78, 217–227.

Li, M., D. Poirier, and J. Tobias (2004): “Do dropouts suffer from dropping out? Estimation and prediction of outcome gains in generalized selection models,” Journal of Applied Econometrics, 19, 203–225.

Liu, J. S. (2001): Monte Carlo strategies in scientific computing. Springer, New York.

López-Nicolás, A. (1998): “Unobserved heterogeneity and censoring in the demand for private health care,” Health Economics, 7, 429–437.

MacEachern, S. (1999): “Dependent nonparametric processes,” in ASA Proceedings of the Section on Bayesian Statistical Science. American Statistical Association, Alexandria.

MacEachern, S. N., M. Clyde, and J. S. Liu (1999): “Sequential importance sampling for nonparametric Bayes models: The next generation,” Canadian Journal of Statistics, 27, 251–267.

MacEachern, S. N., and P. Müller (1998): “Estimating mixtures of Dirichlet process models,” Journal of Computational and Graphical Statistics, 7, 223–238.

MacEachern, S. N., and P. Müller (2000): “Efficient MCMC schemes for robust model extensions using encompassing Dirichlet process mixture models,” in Robust Bayesian analysis, ed. by F. Ruggeri and D. Ríos-Insúa. Springer.

Meer, J., and H. S. Rosen (2004): “Insurance and the utilization of medical services,” Social Science and Medicine, 58, 1623–1632.

Miller, R. H., and H. S. Luft (1994): “Managed care plan performance since 1980,” The Journal of the American Medical Association, 271, 1512–1519.

Mullahy, J. (1986): “Specification and testing in some modified count data models,” Journal of Econometrics, 33, 341–365.

Müller, P., and F. A. Quintana (2004): “Nonparametric Bayesian data analysis,” Statistical Science, 19, 95–110.

Munkin, M. K., and P. K. Trivedi (2003): “Bayesian analysis of a self-selection model with multiple outcomes using simulation-based estimation: an application to the demand for healthcare,” Journal of Econometrics, 114, 197–220.

Neal, R. M. (2000): “Markov chain sampling methods for Dirichlet process mixture models,” Journal of Computational and Graphical Statistics, 9, 249–265.

Neyman, J. (1923): “Statistical problems in agricultural experiments,” Journal of the Royal Statistical Society, 2, 107–180.

Pohlmeier, W., and V. Ulrich (1995): “An econometric model of the two-part decisionmaking process in the demand for health care,” The Journal of Human Resources, 30, 339–361.

Poirier, D. J., and J. L. Tobias (2003): “On the predictive distribution of outcome gains in the presence of an unidentified parameter,” Journal of Business and Economic Statistics, 21, 258–268.

Riphahn, R. T., A. Wambach, and A. Million (2003): “Incentive effects in the demand for health care: A bivariate panel count data estimation,” Journal of Applied Econometrics, 18, 387–405.

Robert, C. P., and G. Casella (1999): Monte Carlo statistical methods. Springer, New York.

Rosenbaum, P. R., and D. B. Rubin (1983): “The central role of the propensity score in observational studies for causal effects,” Biometrika, 70, 41–55.

Roy, A. (1951): “Some thoughts on the distribution of earnings,” Oxford Economic Papers, 3, 135–146.

Rubin, D. B. (1974): “Estimating causal effects of treatments in randomized and nonrandomized studies,” Journal of Educational Psychology, 66, 688–701.

Sethuraman, J. (1994): “A constructive definition of Dirichlet priors,” Statistica Sinica, 4, 639–650.

Tanner, M. A., and W. Wong (1987): “The calculation of posterior distributions by data augmentation (with discussion),” Journal of the American Statistical Association, 82, 528–550.

Terza, J. V. (1998): “Estimating count data models with endogenous switching: Sample selection and endogenous treatment effects,” Journal of Econometrics, 84, 129–154.

Estimating the Demand for Health Care with Panel Data - A Semiparametric Bayesian Approach (Essay 1)

Abstract

This paper is concerned with the problem of estimating the demand for health care with panel data. A random effects model is specified within a semiparametric Bayesian approach using a Dirichlet process prior. This results in a very flexible distribution for both the random effects and the count variable. In particular, the model can be seen as a mixture distribution with a random number of components, and is therefore a natural extension of prevailing latent class models. A full Bayesian analysis using Markov chain Monte Carlo (MCMC) simulation methods is proposed. The methodology is illustrated with an application using data from Germany.

JEL classifications C14, C23, I10

Keywords

random effects model, Dirichlet process prior, MCMC


2.1 Introduction

This paper is concerned with the problem of estimating the demand for health care. It advances on previous cross-sectional studies by explicitly incorporating unobserved heterogeneity through a random effects panel data model (see López-Nicolás (1998) and Riphahn, Wambach, and Million (2003) for other studies that use panel data to analyze the demand for health care). This approach allows us to control for differences in behavioral attitudes or genetic make-up across individuals, both of which are very likely to influence the demand for health care. One aim of our analysis is to develop a semiparametric framework that avoids the arbitrary specification of a particular distribution for the random effects.

Another purpose of this paper is to extend the recently advocated latent class models (e.g., Deb and Trivedi (1997) or Jimenez-Martin, Labeaga, and Martinez-Granado (2002)) by allowing the population to be split into more than a small number of classes. An argument in favor of latent class models is that they allow for a heterogeneous population while avoiding the sharp distinction between “users” and “non-users” that is assumed in two-part hurdle models (see, for example, Pohlmeier and Ulrich (1995) or Gurmu (1997)). Deb and Trivedi (2002) use data from the RAND Health Insurance Experiment (RHIE) and find that latent class models outperform two-part models in terms of in-sample and cross-validation model selection tests.

However, in practice latent class models allow for only a small number of classes. Moreover, selecting the number of classes is not straightforward, especially with small samples. In the literature on health care demand, it is common to estimate models with just two classes representing the ‘ill’ and the ‘healthy’ (e.g., Deb and Trivedi (1997, 2002)). This assumption may be too restrictive in some circumstances.

Our model overcomes this limitation by specifying a Dirichlet process prior (Ferguson (1973) and Antoniak (1974)) for the distribution of the random effects. The resulting mixture distribution of the random effects has a random number of components, and hence it is very flexible while remaining tractable. By having a random number of components, we extend the ‘healthy versus ill’ dichotomy into a richer classification.

We apply the proposed model to analyze equity in the delivery of health care using five waves of the German Socio-Economic Panel Study (GSOEP). In particular, we focus on horizontal equity. The delivery of health care is equitable in the horizontal sense if individuals with equal need, in terms of health status, receive the same treatment irrespective of their income and other socio-economic characteristics. For that purpose, we analyze the importance of income and socio-economic characteristics in explaining health care utilization while controlling for health status.

The plan of the paper is as follows. Section 2 introduces a parametric random effects count data model that assumes a multivariate Normal distribution for the random effects; this model serves as a benchmark throughout the paper. Section 3 presents a semiparametric extension of the model that allows for a wide range of distributions for the random effects. Section 4 describes the numerical procedures (Markov chain Monte Carlo techniques) that we use to obtain the model estimates. Section 5 describes the data, and Section 6 presents the results of the empirical analysis. Section 7 draws some conclusions and provides an outlook on future research.

2.2 A Parametric Benchmark Model

In this section, we describe a parametric Bayesian model for panel count data, which sets the benchmark for the semiparametric extension discussed later (Chib and Winkelmann (2001) analyze a similar model using Bayesian inference, Zeger and Karim (1991) propose a Bayesian approach to generalized linear models, and Richardson, Viallefont, and Green (2002) analyze a Bayesian finite mixture Poisson model). We assume that the observed count outcomes $y_{it}$ for individual $i = 1, \dots, N$ over time periods $t = 1, \dots, T_i$ follow a Poisson distribution, that is,

$$ y_{it} \mid \theta_{it} \sim \text{Poisson}\big(\exp(\theta_{it})\big). \tag{2.1} $$

The logarithm of the conditional mean, $\theta_{it}$, is defined as

$$ \theta_{it} = x_{it}'\beta + w_{it}'b_i + \varepsilon_{it}, \tag{2.2} $$

where $x_{it}$ is a $k \times 1$ vector of covariates, $\beta$ is the corresponding parameter vector, $w_{it}$ is a $p \times 1$ vector of covariates for the corresponding vector of unobserved random effects $b_i$, and $\varepsilon_{it}$ is an error term. We assume that $b_i$ and $\varepsilon_{it}$ are independent and that each random effects vector $b_i$ follows a $p$-dimensional multivariate Normal distribution with mean zero and variance-covariance matrix $D$:

$$ b_i \sim N_p(0, D). \tag{2.3} $$

The error term $\varepsilon_{it}$ is drawn from a Normal distribution with mean 0 and precision parameter $\tau$,

$$ \varepsilon_{it} \sim N(0, \tau^{-1}). \tag{2.4} $$

The presence of $\varepsilon_{it}$ relaxes the Poisson restriction that the conditional variance equals the conditional mean by allowing for over-dispersion. The model is completed by specifying the following priors for $\beta$, $D$ and $\tau$:

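$$ \beta \sim N_k(\beta_0, \Sigma_0), \tag{2.5} $$

$$ D^{-1} \sim \text{Wishart}(\nu_0, S_0), \tag{2.6} $$

$$ \tau \sim \text{Gamma}\left(\frac{\alpha_0}{2}, \frac{\alpha_0}{2}\right). \tag{2.7} $$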
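To make the data-generating process (2.1) to (2.4) concrete, here is a minimal NumPy sketch that simulates a balanced panel from the benchmark model. The covariate design (standard Normal $x_{it}$, and $w_{it}$ made of a constant plus the first $p-1$ covariates), the function name `simulate_benchmark` and all parameter values are hypothetical choices for illustration. Note that NumPy's `normal` takes a standard deviation, so the precision $\tau$ enters as $1/\sqrt{\tau}$.

```python
import numpy as np


def simulate_benchmark(n_ind, n_periods, beta, D, tau, seed=0):
    """Simulate a balanced panel from the benchmark model (2.1)-(2.4).

    Hypothetical design: x_it holds k standard Normal covariates and
    w_it = (1, first p - 1 columns of x_it), i.e. a random intercept plus
    random slopes on the first covariates (requires p - 1 <= k).
    """
    rng = np.random.default_rng(seed)
    k, p = len(beta), D.shape[0]
    x = rng.standard_normal((n_ind, n_periods, k))
    w = np.concatenate([np.ones((n_ind, n_periods, 1)), x[:, :, : p - 1]], axis=2)
    b = rng.multivariate_normal(np.zeros(p), D, size=n_ind)             # (2.3)
    eps = rng.normal(0.0, 1.0 / np.sqrt(tau), size=(n_ind, n_periods))  # (2.4), precision tau
    theta = x @ beta + np.einsum("itp,ip->it", w, b) + eps              # (2.2)
    y = rng.poisson(np.exp(theta))                                       # (2.1)
    return y, x, w, theta


y, x, w, theta = simulate_benchmark(
    n_ind=500,
    n_periods=5,
    beta=np.array([0.5, -0.3]),
    D=np.array([[0.2, 0.0], [0.0, 0.05]]),
    tau=5.0,
)
print(y.mean(), y.var())
```

Comparing `y.var()` with `y.mean()` in such a sample shows the over-dispersion that $\varepsilon_{it}$ (together with $b_i$) induces relative to a plain Poisson model.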

2.3 A Semiparametric Extension

It has been shown both from the classical and the Bayesian perspectives that in many situations the assumption of a particular functional form for the random effects is too restrictive and may lead to wrong parameter estimates (see for example Heckman and Singer (1984) who make this point for duration models or Verbeke and Lesaffre (1996) in the context of linear mixed-effects models).

For this reason, we now propose a Dirichlet process mixture (DPM) model in the spirit of Ibrahim and Kleinman (1998) that generalizes the parametric benchmark model of the previous section. The model uses a Dirichlet process (Ferguson (1973)), which is a prior over probability distributions and thus reflects beliefs about the distribution of the random effects. Instead of imposing a parametric probability function for the individual effects, the Dirichlet process represents the uncertainty about the true probability function of the random effects. In addition, the Dirichlet process is very flexible: its support (in the weak topology) contains every distribution whose support lies within that of the base distribution, so with a Normal base distribution it can approximate essentially any distribution of the random effects.

The DPM model removes the parametric Normal prior assigned to the random effects $\{b_i\}$ and replaces it with a general distribution $G$:

$$ b_i \sim G. \tag{2.8} $$

The prior distribution on $G$ is then defined to be a Dirichlet process with concentration parameter $M$ and base distribution $G_0$:

$$ G \sim DP(M \cdot G_0). \tag{2.9} $$

The base distribution $G_0$ is specified as a $p$-dimensional multivariate Normal distribution:

$$ G_0 = N_p(0, D). \tag{2.10} $$

We therefore add a further stage to the model that allows us to take into account possible deviations of the true distribution of the random effects, $G$, from the “baseline” multivariate Normal distribution $G_0$. In other words, $G_0$ is our prior guess for $G$ (the prior mean of $G$ is $G_0$), and $G$ is allowed to depart from it. The concentration parameter $M$ reflects our prior belief about how similar $G$ is to $G_0$. Large values of $M$ make $G$ very likely to be close to $G_0$; small values of $M$ allow $G$ to deviate more from $G_0$ and to put most of its probability mass on just a few atoms.

In order to illustrate some properties of the model, Figure 2.1 plots several probability functions for $b_i$ and $y_{it}$, which are obtained from two random draws from the Dirichlet process with $M = 10$. Note that each draw from the Dirichlet process represents a probability function for the random effects ($b_i$). This probability function in turn implies a probability function for the count variable. The two draws shown in Figure 2.1 are just two of the infinitely many scenarios considered in the prior. To obtain these draws, we used a truncated version of the sum (stick-breaking) representation of the Dirichlet process proposed by Sethuraman (1994). The graph shows that the model can approximate both unimodal and multimodal probability functions for the count variable. Figure 2.1 also illustrates that the distribution of the random effects is almost surely discrete (Sethuraman (1994)). When $M$ is small, the number of mass points with non-negligible probability is smaller than when $M$ is large. As $M$ increases, the probability mass is spread more evenly over a larger set of mass points, and the distribution resembles the continuous distribution $G_0$ more closely.

Figure 2.1: Two draws of $b_i$ (left column) and $y_{it}$ (right column) from the prior
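The truncated stick-breaking construction mentioned above can be sketched as follows. This is a minimal NumPy illustration, assuming a scalar random intercept with a hypothetical base distribution $G_0 = N(0, 0.2)$ and a truncation level of 500 atoms; `stick_breaking_draw`, `n_nonnegligible` and `g0` are names introduced here, not taken from the essay. It counts atoms of $G$ itself with weight above 0.01 and is not claimed to be the calculation behind the prior interval reported in Section 2.6.

```python
import numpy as np


def stick_breaking_draw(M, base_sampler, n_atoms=500, rng=None):
    """Approximate draw G ~ DP(M * G0) by truncated stick-breaking (Sethuraman, 1994).

    G = sum_h pi_h * delta(phi_h), V_h ~ Beta(1, M), pi_h = V_h * prod_{l<h} (1 - V_l),
    phi_h ~ G0. Setting the last V_h to 1 assigns the leftover mass to the final
    atom, so the truncated weights sum to one.
    """
    rng = np.random.default_rng(rng)
    v = rng.beta(1.0, M, size=n_atoms)
    v[-1] = 1.0
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])  # prod_{l<h} (1 - V_l)
    weights = v * remaining
    atoms = base_sampler(rng, n_atoms)
    return weights, atoms


def n_nonnegligible(weights, eps=0.01):
    """Number of atoms carrying more than eps of the probability mass."""
    return int(np.sum(weights > eps))


def g0(rng, n):
    """Hypothetical scalar base distribution G0 = N(0, 0.2) for a random intercept."""
    return rng.normal(0.0, np.sqrt(0.2), size=n)


rng = np.random.default_rng(1)
for M in (1.0, 10.0):
    counts = [n_nonnegligible(stick_breaking_draw(M, g0, rng=rng)[0]) for _ in range(200)]
    print(f"M = {M}: mean number of atoms with weight > 0.01 = {np.mean(counts):.1f}")
```

Averaging `n_nonnegligible` over repeated draws shows the pattern described in the text: larger values of $M$ spread the mass over more atoms with non-negligible weight.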

Looking at two key features of the Dirichlet process helps to clarify the implications of this setup. First, some of the $b_i$ are identical with positive probability. Thus, each $b_i$ takes one of $l \le N$ distinct values, which we denote by $\kappa = (\kappa_1, \dots, \kappa_l)$. A cluster then contains all random effects that take the same value. To discuss the second feature, some additional notation is necessary. Let $b_{-i}$ denote the random effects excluding the random effect of individual $i$, and let the set $\kappa^{-i}$ consist of the distinct elements of $b_{-i}$, with each value $\kappa^{-i}_j$ appearing $m^{-i}_j$ times. It can then be shown that, integrating over $G$, the prior distribution of $b_i$ conditional on $b_{-i}$ and $G_0$ can be expressed as

$$ b_i \mid b_{-i}, G_0 \sim \frac{M}{M+N-1}\, G_0 + \frac{1}{M+N-1} \sum_{j=1}^{l} m^{-i}_j\, \delta\big(\kappa^{-i}_j\big), \tag{2.11} $$

where δ(κ) represents a degenerate distribution with point mass at κ.

Therefore, $b_i$ is set to a new value drawn from the base distribution with probability $M/(M+N-1)$, whereas $b_i$ takes the value of an already existing cluster $\kappa^{-i}_j$ with probability $m^{-i}_j/(M+N-1)$.
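The sampling rule just described (the Pólya urn scheme implied by (2.11)) fits in a few lines. This is an illustration assuming scalar random effects stored in a NumPy array; `polya_urn_draw` and its `g0_sampler` callback are hypothetical names. Choosing a uniformly random element of $b_{-i}$ automatically selects each distinct value $\kappa^{-i}_j$ in proportion to its multiplicity $m^{-i}_j$.

```python
import numpy as np


def polya_urn_draw(b_minus_i, M, g0_sampler, rng=None):
    """Draw b_i from the conditional prior (2.11), given the other N - 1 effects.

    Returns (value, is_new). With probability M / (M + N - 1) a fresh value is
    drawn from G0; otherwise b_i copies an element of b_minus_i chosen uniformly,
    which selects each distinct value kappa_j with probability m_j / (N - 1) and
    hence, overall, with probability m_j / (M + N - 1).
    """
    rng = np.random.default_rng(rng)
    n_minus_1 = len(b_minus_i)
    if rng.uniform() < M / (M + n_minus_1):
        return g0_sampler(rng), True
    return b_minus_i[rng.integers(n_minus_1)], False


# Hypothetical example: N - 1 = 4 existing effects with kappa = (0.1, -0.4), m = (3, 1).
b_prev = np.array([0.1, 0.1, 0.1, -0.4])
rng = np.random.default_rng(0)
draws = [polya_urn_draw(b_prev, 2.0, lambda r: r.normal(0.0, 1.0), rng) for _ in range(20000)]
p_new = np.mean([is_new for _, is_new in draws])                       # theory: 2 / 6
p_first = np.mean([(not is_new) and v == 0.1 for v, is_new in draws])  # theory: 3 / 6
print(p_new, p_first)
```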

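Combining this result with equations (2.2) and (2.4), we obtain the following expression for the conditional distribution of $\theta_{it}$, marginalized over $b_i$ and $G$:

$$ \theta_{it} \mid \beta, D, G_0, b_{-i} \sim \int f_N\big(\theta_{it} \mid x_{it}'\beta + w_{it}'b_i,\; \tau^{-1}\big)\, d[b_i \mid b_{-i}, G_0]. \tag{2.12} $$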

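Performing the integration, we end up with

$$ \theta_{it} \mid \beta, D, G_0, b_{-i} \sim \frac{M}{M+N-1}\, f_N\big(\theta_{it} \mid x_{it}'\beta,\; w_{it}'Dw_{it} + \tau^{-1}\big) + \frac{1}{M+N-1} \sum_{j=1}^{l} m^{-i}_j\, f_N\big(\theta_{it} \mid x_{it}'\beta + w_{it}'\kappa^{-i}_j,\; \tau^{-1}\big), \tag{2.13} $$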

where $f_N$ denotes the Normal density. We see that $\theta_{it}$ follows a mixture distribution with a random number of components, where the components differ with respect to both their means and their variances. Equation (2.13) illustrates that the proposed DPM model can be seen as a mixture model with an infinite number of classes (see Neal (2000) for a more formal presentation of this point). Thus, it contributes to the existing literature on latent class models for estimating the demand for health care. Also note that by using the Dirichlet process as a prior on the distribution of the random effects, we are able to relax the restrictive parametric assumption inherent in the benchmark model in a tractable manner.
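As a numerical check on (2.13), the sketch below evaluates the conditional density of $\theta_{it}$ for given $x_{it}'\beta$, $w_{it}$, $D$, $\tau$, $M$ and clusters $\kappa^{-i}_j$ with counts $m^{-i}_j$. The function names and all inputs in the example are hypothetical; the point is that the weights $M/(M+N-1)$ and $m^{-i}_j/(M+N-1)$ sum to one because $\sum_j m^{-i}_j = N-1$, so the mixture integrates to one.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Univariate Normal density f_N(x | mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def theta_conditional_density(theta, xb, w, D, tau, M, kappas, counts):
    """Density of theta_it given beta, D, G0 and b_{-i}, following (2.13).

    xb: scalar x_it' beta; w: w_it (length p); D: p x p covariance of G0;
    kappas: (l, p) array of distinct values kappa_j^{-i}; counts: multiplicities m_j^{-i}.
    """
    w = np.atleast_1d(np.asarray(w, dtype=float))
    denom = M + np.sum(counts)  # M + N - 1, since sum_j m_j = N - 1
    # New-cluster component: b_i ~ G0 integrated out.
    dens = M / denom * normal_pdf(theta, xb, w @ D @ w + 1.0 / tau)
    # Existing clusters: point masses at kappa_j, weighted by m_j.
    for kappa, m in zip(np.atleast_2d(kappas), counts):
        dens = dens + m / denom * normal_pdf(theta, xb + w @ kappa, 1.0 / tau)
    return dens


# Hypothetical scalar random intercept (p = 1) with two existing clusters.
grid = np.linspace(-10.0, 10.0, 20001)
f = theta_conditional_density(grid, xb=0.2, w=[1.0], D=np.array([[0.5]]), tau=5.0,
                              M=10.0, kappas=np.array([[-1.0], [1.5]]),
                              counts=np.array([30, 12]))
area = np.sum(f) * (grid[1] - grid[0])  # Riemann sum; should be close to 1
print(area)
```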

2.4 Bayesian MCMC Sampling

Having specified the prior distribution and the likelihood function, we now turn to the analysis of the posterior distribution, which is proportional to the product of these two terms. In the Bayesian approach, the posterior distribution of a model contains all the relevant information and can be used to make probability statements about the parameters.

However, due to the complexity of the proposed models, we are not able to analyze their posterior distributions analytically. This problem can be overcome by applying Markov chain Monte Carlo (MCMC) techniques: we draw large samples from the posterior distributions and use these samples to summarize them. We do this by employing the Gibbs sampler, in which each element of the parameter vector is updated conditional on the current values of the other components. After an initial burn-in period, whose draws are discarded, the resulting Markov chains can be regarded as draws from the posterior distributions. We refer to Chen, Shao, and Ibrahim (2000) or Robert and Casella (1999) for comprehensive surveys of MCMC methods.

In order to keep the Gibbs sampler computations simple, we apply the data augmentation technique put forward by Tanner and Wong (1987). This means that we include the random effects {bi} and the latent variables {θit} in the parameter space. Thus, we end up with full conditional distributions which take convenient functional forms.

The resulting Gibbs sampler for the parametric benchmark model can be summarized as follows (further details on the algorithm are given in the appendix of this paper):

0. Choose starting values for $\tau$, $\{b_i\}$, $D^{-1}$ and $\{\theta_{it}\}$.

1. Sample $\beta$ from $[\beta \mid \{b_i\}, \tau, \{\theta_{it}\}]$, which is a Normal distribution.

2. Sample $\tau$ from $[\tau \mid \{b_i\}, \beta, \{\theta_{it}\}]$, which is a Gamma distribution.

3. Sample $\{\theta_{it}\}$ from $[\theta_{it} \mid y_{it}, b_i, \beta, \tau]$ using the Metropolis-Hastings algorithm, independently for $i = 1, \dots, N$ and $t = 1, \dots, T_i$.

4. Sample $\{b_i\}$ from $[b_i \mid \beta, \tau, D, \{\theta_{it}\}]$, which is a Normal distribution, independently for $i = 1, \dots, N$.

5. Sample $D^{-1}$ from $[D^{-1} \mid \{b_i\}]$, which is a Wishart distribution.

6. Repeat Steps 1-5 using the updated values of the conditioning variables.
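Step 3 is the only step without a standard full conditional. A minimal sketch, assuming a random-walk Metropolis-Hastings proposal (the essay's appendix describes the proposal it actually uses, which may differ): the target for each $\theta_{it}$ is the Poisson likelihood (2.1) times the Normal density implied by (2.2) and (2.4), with mean $\mu_{it} = x_{it}'\beta + w_{it}'b_i$. The names `log_target` and `mh_update_theta` and the default step size are hypothetical.

```python
import numpy as np


def log_target(theta, y, mu, tau):
    """Log full conditional of theta_it, up to a constant:
    Poisson(y | exp(theta)) from (2.1) times N(theta | mu, 1/tau) from (2.2) and (2.4)."""
    return y * theta - np.exp(theta) - 0.5 * tau * (theta - mu) ** 2


def mh_update_theta(theta, y, mu, tau, step=0.5, rng=None):
    """One random-walk Metropolis-Hastings sweep over all theta_it at once.

    mu holds x_it' beta + w_it' b_i. Given (beta, b_i, tau) the full conditionals
    factorise over (i, t), so all elements can be updated in parallel. The
    random-walk proposal and step size are illustrative choices.
    Returns the updated array and the acceptance rate of this sweep.
    """
    rng = np.random.default_rng(rng)
    proposal = theta + step * rng.standard_normal(theta.shape)
    log_ratio = log_target(proposal, y, mu, tau) - log_target(theta, y, mu, tau)
    accept = np.log(rng.uniform(size=theta.shape)) < log_ratio
    return np.where(accept, proposal, theta), accept.mean()
```

In a full sampler, `theta` would be the current array of $\theta_{it}$ values and `mu` would be rebuilt from the latest draws of $\beta$ and $b_i$ before each sweep; `step` would be tuned so that the returned acceptance rate is moderate.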

Since $G_0$ is chosen to be a conjugate prior distribution (a conjugate prior yields a posterior distribution in the same class of distributions), we can easily set up a Gibbs sampler for the semiparametric model as well. Examples of MCMC methods applied to the semiparametric setting are Escobar and West (1995) and MacEachern and Müller (1998). In particular, we have to modify Steps 4 and 5 as follows (further details are also given in the appendix):

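4’a. Sample $\{b_i\}$ from $[b_i \mid b_{-i}, G_0, D, \beta, \tau, \{\theta_{it}\}]$, sequentially for $i = 1, \dots, N$ (each draw conditions on the current values of the other random effects).

5’. Sample $D^{-1}$ from $[D^{-1} \mid \{\kappa_j\}]$, which is a Wishart distribution.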

In order to improve the mixing behavior of the modified algorithm, we follow a strategy proposed by Bush and MacEachern (1996) and resample the cluster values $\{\kappa_j\}$ after determining how the $b_i$ are grouped. This is achieved by including the following step:

4’b. Sample $\{\kappa_j\}$ from $[\kappa_j \mid \beta, \tau, D, \{\theta_{it}\}]$, which is a Normal distribution, independently for $j = 1, \dots, l$.
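Step 4’a can be sketched for the special case of a scalar random intercept ($p = 1$, $w_{it} = 1$), in the spirit of the conjugate algorithms of Escobar and West (1995) and MacEachern and Müller (1998). This is an illustration under that simplifying assumption, not the essay's implementation, and the function names are hypothetical. With residuals $r_{it} = \theta_{it} - x_{it}'\beta$, individual $i$ joins existing cluster $j$ with probability proportional to $m^{-i}_j \prod_t f_N(r_{it} \mid \kappa_j, \tau^{-1})$ and opens a new cluster with probability proportional to $M \int \prod_t f_N(r_{it} \mid b, \tau^{-1})\, dG_0(b)$, which has a closed form because $G_0$ is conjugate.

```python
import numpy as np


def log_lik_given_b(r, b, tau):
    """log prod_t N(r_t | b, 1/tau) for residuals r_t = theta_it - x_it' beta."""
    return 0.5 * len(r) * np.log(tau / (2 * np.pi)) - 0.5 * tau * np.sum((r - b) ** 2)


def log_marginal_new(r, tau, D):
    """log of the integral of prod_t N(r_t | b, 1/tau) N(b | 0, D) db (closed form)."""
    T, S, Q = len(r), np.sum(r), np.sum(r ** 2)
    P = T * tau + 1.0 / D  # posterior precision of b
    return (0.5 * T * np.log(tau / (2 * np.pi)) - 0.5 * np.log(D * P)
            - 0.5 * tau * Q + 0.5 * (tau * S) ** 2 / P)


def update_b_i(r, kappas, counts, M, tau, D, rng=None):
    """Sample b_i from [b_i | b_{-i}, G0, D, beta, tau, {theta_it}], scalar intercept case.

    kappas, counts: distinct values of b_{-i} and their multiplicities m_j^{-i}.
    Returns the new b_i and the chosen index (len(kappas) means "new cluster").
    """
    rng = np.random.default_rng(rng)
    log_w = [np.log(m) + log_lik_given_b(r, k, tau) for k, m in zip(kappas, counts)]
    log_w.append(np.log(M) + log_marginal_new(r, tau, D))
    log_w = np.array(log_w)
    probs = np.exp(log_w - log_w.max())  # stabilised before normalising
    probs /= probs.sum()
    j = rng.choice(len(probs), p=probs)
    if j < len(kappas):
        return kappas[j], j
    P = len(r) * tau + 1.0 / D  # new value: draw from the posterior of b given theta_i
    return rng.normal(tau * np.sum(r) / P, 1.0 / np.sqrt(P)), j
```

The same conjugate calculation, pooling the residuals of all individuals currently in cluster $j$, gives the Normal full conditional of Step 4’b in this scalar case: precision $n_j\tau + 1/D$ and mean $\tau \sum r_{it} / (n_j\tau + 1/D)$, where $n_j$ is the number of person-year observations in the cluster.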

We would like to point out that the Bayesian approach and its implementation via MCMC techniques offer several advantages. First, the Bayesian approach allows for full and exact small-sample inference in both the parametric and the semiparametric version of the model and does not rely on asymptotic approximations. Second, numerical integration methods are avoided in the evaluation of the model. Finally, by using data augmentation we easily obtain estimates of the random effects. This becomes important when analyzing extensions of the model in which the estimates of the random effects play a central role of their own (see Ibrahim and Kleinman (1998) and the literature cited therein). One possible extension is in the direction of causal effect modelling; in this case, MCMC methods would allow us to calculate individual treatment effects (see Chib and Hamilton (2002)). In addition, the estimates of the random effects in our model can be used to obtain predictions in a simple way. Thus, our framework can easily be used to analyze the impact of institutional changes on the individual demand for health care.

2.5 The Data

In the following, the proposed methodology is used to estimate the demand for health care by the elderly in Germany. There are many existing studies analyzing the demand for health care, but only a few of them focus on the elderly population (Deb and Trivedi (1997) is one exception). Nevertheless, this group is of particular interest, since elderly people typically have higher medical care needs and costs, and their population share is steadily growing in many countries.

The data set used in this study stems from five waves (1997-2001) of the German Socio-Economic Panel Study (GSOEP). The GSOEP, conducted by the German Institute for Economic Research in Berlin, is a representative longitudinal survey of German households (for more information, see SOEP Group (2001)). It contains detailed information about the health care utilization of the respondents and the insurance schemes under which they are covered.

We restrict our analysis to retired men who are older than 65 years. After eliminating all observations with missing values on any of the variables of interest, we obtain a final sample of 1854 individuals and 4761 person-year observations. Note that the observations are not equally distributed across the five years, since the GSOEP was expanded with new subsamples in both 1998 and 2000. The variable definitions and summary statistics are reported in Table 2.1.

| Variable | Definition | Mean | Std. Dev. |
|---|---|---|---|
| VISITS | Number of doctor visits in last 3 months | 4.120 | 5.534 |
| AGE | Age in years | 72.371 | 6.041 |
| AGE2 | Age squared / 1000 | 5.274 | 0.913 |
| EDUCATION | Years of education | 11.300 | 2.306 |
| SATISFAC | Self-reported health satisfaction (0 = low to 10 = high) | 5.667 | 2.323 |
| LOWS | 1 if SATISFAC < 4 | 0.187 | |
| HIGHS | 1 if SATISFAC > 6 | 0.400 | |
| HANDICAP | 1 if individual is handicapped | 0.337 | |
| HDEGREE | Degree of handicap in percentage points | 21.800 | 33.102 |
| NOPARTNER | 1 if individual has no partner | 0.145 | |
| PENSION | Monthly pension payments in DM / 1000 | 2.639 | 1.295 |
| PUBLICIN | 1 if individual is in public health insurance | 0.920 | |
| ADDON | 1 if individual purchased add-on insurance | 0.055 | |
| FOREIGN | 1 if individual is a foreigner | 0.056 | |
| YEAR97 | 1 if year = 1997 | 0.118 | |
| YEAR98 | 1 if year = 1998 | 0.138 | |
| YEAR99 | 1 if year = 1999 | 0.153 | |
| YEAR00 | 1 if year = 2000 | 0.298 | |
| YEAR01 | 1 if year = 2001 | 0.293 | |

N = 1854, $\sum_i T_i$ = 4761

Table 2.1: Variable definitions and summary statistics

The dependent variable in our study is the number of visits to a doctor in the three months prior to the survey (VISITS); visits to a dentist are included in this count. The explanatory variables consist of socioeconomic characteristics and variables that describe the health condition of the individual. In particular, we include a self-perceived health satisfaction index (SATISFAC), as well as variables measuring disability (HANDICAP and HDEGREE). In order to capture nonlinear and threshold effects of SATISFAC, we include the dummy variables LOWS and HIGHS.

In the German health care system, only individuals above a certain earnings level (3,825 euros gross monthly earnings in 2003), civil servants, and self-employed individuals can opt out of the public insurance scheme (PUBLICIN) and choose a private insurance plan or remain uninsured. Individuals in the public insurance scheme can purchase add-on insurance (ADDON) that, for example, covers extra costs for dental prostheses or glasses.

Given this institutional setup, the decisions to choose a private insurance plan and to purchase add-on insurance may be endogenous. However, since we control for the health condition of the individual, this concern is mitigated (Deb and Trivedi (1997) argue along the same lines). The possibility of endogeneity should nevertheless not be overlooked when interpreting the results.

2.6 Results

We analyze these data using both the parametric benchmark model and its semiparametric extension. Prior elicitation is done in the following way: we randomly choose 250 individuals from the data set and analyze this “training sample” using the parametric benchmark model with uninformative priors (i.e., priors with large variances). In this way we mimic the usual Bayesian approach in which the results of a previous study with different data are used to select prior distributions (Chib and Hamilton (2002) and Ibrahim and Kleinman (1998) also follow the training sample strategy). For a discussion of the training sample strategy see, for example, Gelfand, Dey, and Chang (1992) and Ghosh and Samanta (2002).

To analyze the remaining data, we select a prior distribution on $D^{-1}$ by setting $\nu_0 = 250$ and $S_0 = \hat{D}\,\nu_0^{-1}$, where $\hat{D}$ is the training sample posterior mean.

Cowles, Carlin, and Connett (1996) argue that a flatter prior on the variance matrix of the random effects can slow down the convergence of the algorithm (see also Ibrahim and Kleinman (1998)), which motivates this fairly informative choice. In addition, the prior means and variances of the slope parameters in $\beta$ are set to the corresponding estimates obtained with the training sample. In order to facilitate the calculation of Bayes factors (Verdinelli and Wasserman (1995)), the off-diagonal elements of $\Sigma_0$ are set equal to zero. Finally, in order to represent prior ignorance, we set $\alpha_0 = 0.001$.

We then estimate the parametric benchmark model and the semiparametric model with $M$ equal to 10. Recall that a Dirichlet process prior implies that we expect the distribution of the individual effects to be discrete (Figure 2.1 shows several draws from the prior on the distribution of the random constant).

Given our choice of $M$, $S_0$ and $\nu_0$, the number of mass points with probability larger than 0.01 is between 2 and 9 with probability 0.95 (we calculate this a priori credible interval by Monte Carlo simulation).

We specify the models with VISITS as the dependent variable. All other variables (including the year dummies) plus a constant are included in the population mean vectors. The random effects include a constant and the effects of SATISFAC, HIGHS and LOWS. The models are then estimated using the MCMC sampling algorithms described in Section 4. We ran each sampler for 30,000 iterations, keeping the last 25,000 iterations each time. To give an indication of the performance of the algorithm for the semiparametric model, Figure 2.2 reports the posterior histograms and autocorrelation functions of $\beta_{AGE}$, $\tau$ and $D_C$, where $D_C$ is the variance of the intercept in the base measure ($D_{SATISFAC}$, $D_{HIGHS}$ and $D_{LOWS}$ denote the variances of the SATISFAC, HIGHS and LOWS effects, respectively). The mixing behavior of the sampler is satisfactory, since the autocorrelations decline steadily as the number of lags increases. The algorithm for the parametric model displays even better mixing behavior.
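Autocorrelation diagnostics like those in Figure 2.2 can be computed from stored draws with a few lines of NumPy. The sketch below is an assumption about how one might compute them, not the code behind the figure; `effective_sample_size` uses a simple truncation rule (stop at the first negative autocorrelation), which is only one of several conventions.

```python
import numpy as np


def autocorrelation(chain, max_lag=50):
    """Sample autocorrelation function of a scalar MCMC chain for lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[: n - k], x[k:]) / (n * var) for k in range(max_lag + 1)])


def effective_sample_size(chain, max_lag=200):
    """Crude ESS: n / (1 + 2 * sum of autocorrelations before the first negative one)."""
    rho = autocorrelation(chain, max_lag)[1:]
    cut = np.argmax(rho < 0) if np.any(rho < 0) else len(rho)
    return len(chain) / (1.0 + 2.0 * rho[:cut].sum())
```

With the draws of one parameter stored in a hypothetical array `draws` of length 30,000, one would first discard the burn-in (`draws[5000:]`) and then pass the remaining 25,000 draws to these functions.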

Table 2.2 shows the posterior estimates for the parametric and semiparametric models. The medians and 95% credible intervals (2.5% and 97.5% posterior quantiles) for some marginal effects are quite similar. However, the estimates of the effects of SATISFAC, LOWS, HANDICAP, NOPARTNER, PUBLICIN and FOREIGN differ substantially between the parametric and semiparametric models. This indicates that the posterior distributions of the effects of the binary covariates are the most affected by the relaxation of the parametric assumptions. With regard to the effect of NOPARTNER, zero is excluded from the 95% interval in the parametric case but included in the semiparametric case. However, the posterior distributions of the effects of continuous covariates also change. This is illustrated in Figure 2.3, which compares the posterior density of the coefficient of SATISFAC in both models. Note that the semiparametric point estimate receives very little density weight in the parametric model and that there is more uncertainty in the estimate when the parametric assumptions are relaxed. Additionally, the posterior distributions of the elements of the covariance matrix $D$ are noticeably different in the two models. One has to keep in mind, however, that $D$ plays a different role in the semiparametric model, and hence a meaningful comparison is difficult.

Figure 2.2: Autocorrelation functions and posterior histograms for $\tau$ (top row), the marginal effect of AGE2 (middle row) and $D_C$ (bottom row)

| Variable | M = ∞: 2.5% | M = ∞: 50% | M = ∞: 97.5% | M = 10: 2.5% | M = 10: 50% | M = 10: 97.5% |
|---|---|---|---|---|---|---|
| AGE | 0.151 | 0.576 | 0.990 | 0.163 | 0.571 | 0.991 |
| AGE2 | −6.094 | −3.343 | −0.530 | −6.140 | −3.347 | −0.648 |
| EDUCATION | −0.006 | 0.066 | 0.140 | −0.009 | 0.057 | 0.123 |
| SATISFAC | −0.584 | −0.460 | −0.338 | −0.708 | −0.563 | −0.423 |
| LOWS | −0.354 | 0.108 | 0.571 | −0.601 | −0.073 | 0.440 |
| HIGHS | −0.845 | −0.424 | 0.002 | −0.899 | −0.379 | 0.126 |
| HANDICAP | −0.690 | −0.002 | 0.676 | −0.709 | −0.045 | 0.617 |
| HDEGREE | 0.010 | 0.019 | 0.029 | 0.011 | 0.020 | 0.030 |
| NOPARTNER | −0.945 | −0.493 | −0.046 | −0.783 | −0.342 | 0.085 |
| PENSION | −0.198 | −0.057 | 0.086 | −0.163 | −0.034 | 0.092 |
| PUBLICIN | −0.526 | 0.054 | 0.651 | −0.383 | 0.209 | 0.767 |
| ADDON | −0.442 | 0.124 | 0.699 | −0.444 | 0.099 | 0.621 |
| FOREIGN | −0.314 | 0.408 | 1.106 | −0.434 | 0.240 | 0.890 |
| τ | 4.760 | 5.478 | 6.245 | 4.809 | 5.560 | 6.387 |
| D_C | 0.177 | 0.212 | 0.256 | 0.261 | 0.560 | 1.195 |
| D_SATISFAC | 0.016 | 0.018 | 0.021 | 0.018 | 0.027 | 0.045 |
| D_LOWS | 0.105 | 0.123 | 0.146 | 0.035 | 0.069 | 0.168 |
| D_HIGHS | 0.130 | 0.153 | 0.183 | 0.055 | 0.119 | 0.292 |

Note: We report marginal effects for the coefficient vector.

Table 2.2: Posterior quantiles for the parametric benchmark model (M = ∞) and the MDP model with M = 10

The estimated coefficients on AGE and AGE2 imply that the number of doctor visits increases with age until the age of 85 and decreases thereafter.
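The turning point quoted above can be recovered from the AGE and AGE2 medians in Table 2.2. A small sketch, assuming AGE2 = age²/1000 as in Table 2.1 and plugging in posterior medians; `age_turning_point` is a hypothetical helper, and a fuller analysis would compute the turning point draw by draw and summarize its posterior distribution.

```python
def age_turning_point(effect_age, effect_age2):
    """Age at which the AGE/AGE2 profile peaks, with AGE2 = age**2 / 1000.

    d/d(age) [a * age + c * age**2 / 1000] = a + 2 * c * age / 1000 = 0
    gives age* = -1000 * a / (2 * c). A common scaling of a and c (as when
    reporting marginal effects of an exponential mean) cancels in the ratio.
    """
    return -1000.0 * effect_age / (2.0 * effect_age2)


print(round(age_turning_point(0.571, -3.347), 1))  # M = 10 medians: 85.3
print(round(age_turning_point(0.576, -3.343), 1))  # M = infinity medians: 86.2
```

The semiparametric medians give the age of 85 stated in the text; the parametric medians imply a peak about one year later.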

Figure 2.3: Posterior distributions of $\beta_{SATISFAC}$: parametric benchmark model (dashed curve) and MDP model with M = 10 (solid curve)

There is a large probability that the effect of NOPARTNER is negative, but positive values cannot be ruled out. Similarly, there is some uncertainty regarding the sign of the effect of education. The evidence on the effect of disability is twofold: the sign of the dummy variable (HANDICAP) is not clearly determined, whereas the degree of handicap (HDEGREE) has an unambiguously positive effect. An increase of 10 percentage points would lead to about 0.2 more visits on average. The variable SATISFAC has, as expected, a negative effect, whereas the signs of the threshold effects (LOWS and HIGHS) are uncertain.
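The 0.2 figure is a linear approximation based on the median marginal effect of HDEGREE in the M = 10 column of Table 2.2 (0.020 per percentage point):

$$ \Delta\, \mathrm{E}[\text{VISITS}] \approx 10 \times 0.020 = 0.20 $$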

Note that the variance of the time-varying error term $\varepsilon_{it}$ is small compared with the variance of the individual effects. Thus, individual heterogeneity accounts for a large proportion of the variability in the data, which illustrates the importance of modelling the distribution of the individual effects correctly.

There is substantial uncertainty regarding the signs of the coefficients of the variables FOREIGN, ADDON, PUBLICIN and PENSION. Riphahn, Wambach, and Million (2003) argue that the result for ADDON is not surprising and can be explained by the benefit packages of the German add-on insurance plans.

In order to determine whether the delivery of health care for the elderly is equitable, we test the hypothesis that the variables EDUCATION, FOREIGN, ADDON, PUBLICIN and PENSION all have a zero effect. We calculate a Bayes factor for this hypothesis following the method proposed by Verdinelli and Wasserman (1995). We find that the hypothesis of equitable delivery of health care is much more likely than the alternative (the posterior probability of this hypothesis versus the alternative is 0.9993). Note, however, that the model does not account for the possible endogeneity of the variable PUBLICIN. An extension in the direction of causal modelling using the potential outcomes approach is one direction for future research.
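Verdinelli and Wasserman (1995) generalize the Savage-Dickey density ratio. When the prior on the restricted coefficients is independent of the other parameters (which the diagonal $\Sigma_0$ helps to ensure), their correction factor equals one and the Bayes factor reduces to the ratio of the posterior to the prior density at zero. The sketch below implements only that special case, estimating the posterior density at zero with a Gaussian kernel density estimate (Scott's rule bandwidth); the function names and the kernel choice are assumptions for illustration. It also converts a Bayes factor into a posterior probability: assuming equal prior odds, the reported 0.9993 corresponds to a Bayes factor of about 0.9993/0.0007 ≈ 1428.

```python
import numpy as np


def savage_dickey_bf01(post_draws, prior_mean, prior_sd):
    """Bayes factor for H0: gamma = 0 via the Savage-Dickey density ratio.

    post_draws: (S, q) MCMC draws of the q restricted coefficients. Assumes
    independent Normal priors N(prior_mean_j, prior_sd_j^2) on them, independent
    of all other parameters, so that BF01 = p(gamma = 0 | y) / p(gamma = 0).
    """
    draws = np.asarray(post_draws, dtype=float)
    S, q = draws.shape
    H = np.cov(draws, rowvar=False).reshape(q, q) * S ** (-2.0 / (q + 4))  # Scott's rule
    Hinv = np.linalg.inv(H)
    quad = np.einsum("sj,jk,sk->s", draws, Hinv, draws)  # Mahalanobis distance of draws to 0
    post_at_zero = np.mean(np.exp(-0.5 * quad)) / np.sqrt(np.linalg.det(2 * np.pi * H))
    sd = np.asarray(prior_sd, dtype=float)
    z = (0.0 - np.asarray(prior_mean, dtype=float)) / sd
    prior_at_zero = np.prod(np.exp(-0.5 * z ** 2) / (np.sqrt(2 * np.pi) * sd))
    return post_at_zero / prior_at_zero


def prob_null(bf01, prior_odds=1.0):
    """Posterior probability of H0 from BF01 and prior odds P(H0) / P(H1)."""
    post_odds = bf01 * prior_odds
    return post_odds / (1.0 + post_odds)
```

Kernel smoothing biases the density estimate at zero slightly downward, so with few draws or many restrictions a Rao-Blackwellized estimate of the posterior density, where available, would be preferable.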

2.7 Conclusion

This paper developed a semiparametric Bayesian framework for estimating the demand for health care with panel data by specifying a Dirichlet process prior for the distribution of the random effects. The framework thus allows explicitly for individual heterogeneity without imposing strong distributional assumptions.
