

4.1.2 The Bayesian model

In Bayesian inference, a prior probability of a hypothesis is combined with the compatibility of some observed data with this particular hypothesis to determine the probability of the hypothesis given the observed data.

Consider a general problem where we want to specify a sampling model for N observations y = (y1, ..., yN) depending on a vector of r unknown model parameters θ = (θ1, ..., θr) in a known way. This dependency can be expressed in the form of a probability density function f(y|θ), with $f(y\mid\theta) = \prod_{i=1}^{N} f(y_i\mid\theta)$ if the N observations y1, ..., yN are independent of each other conditional on θ. The function f(y|θ) is a function of y that theoretically represents the probability of observing the data y under given fixed values of θ. However, in classical statistics it is regarded as a function of θ for fixed data y, also denoted by L(θ) and called the likelihood, representing the probability of observing the given data y (Gelman et al., 1995; Robert, 1994). Frequentist statistics is based on this likelihood. Estimates of the unknown parameters θ are obtained by choosing the values which maximize the likelihood (maximum likelihood estimation, MLE) and hence make the data most likely to occur (Dehling and Haupt, 2004; Robert, 1994). For example, assuming normally distributed data with known variance σ² and unknown expectation θ, the MLE for the expected value is given by the mean of the data, $\tilde{\theta} = \frac{1}{N}\sum_{i=1}^{N} y_i$.
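For completeness, the maximization behind this last statement can be written out; the following is only a sketch of the standard argument for the normal likelihood with known variance σ².

\begin{align*}
\log L(\theta) &= -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i-\theta)^2,\\
\frac{d}{d\theta}\log L(\theta) &= \frac{1}{\sigma^2}\sum_{i=1}^{N}(y_i-\theta) = 0
\quad\Longrightarrow\quad \tilde{\theta} = \frac{1}{N}\sum_{i=1}^{N} y_i.
\end{align*}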

In the Bayesian context we are interested in the unknown quantities θ as well, but we do not want to know the parameters that make the data most likely to occur; we want the parameters that are most likely given the data. Therefore, we need to reverse the conditional probability f(y|θ) of y given θ into a conditional probability π(θ|y) of θ given y, which can be done by Bayes' Theorem. Hence, we need additional prior beliefs about the parameter values θ which we want to take into account, expressed in terms of a probability density function. We suppose that θ is a random quantity with a probability distribution π(θ) that formalizes the available prior information. This function π(θ) is called the prior density function of θ.

Regarding the prior information, two different interpretations can be contrasted: the population interpretation and the knowledge interpretation. From the first perspective, the prior represents a population of possible parameter values from which the model parameters have been drawn. From the latter, more subjective viewpoint, the prior expresses the knowledge about the model parameters as if their values could be thought of as a random realization from a prior distribution (Gelman et al., 1995).

Having specified the prior distribution and having observed the data, we can use Bayes' Theorem to transform the prior belief about the model parameters before the observation into a posterior belief that takes the newly observed data into account.

Therefore, we multiply the prior π(θ) by the likelihood f(y|θ) (the contribution of the observed data) to obtain the joint distribution h(y, θ) = f(y|θ)π(θ) of y and θ. Normalizing the joint distribution by the marginal m(y) of the data, we obtain the posterior

\[
\pi(\theta \mid y) = \frac{h(y,\theta)}{m(y)} = \frac{h(y,\theta)}{\int h(y,\theta)\,d\theta} = \frac{f(y\mid\theta)\,\pi(\theta)}{\int f(y\mid\theta)\,\pi(\theta)\,d\theta}. \tag{4.2}
\]

In general, the posterior distribution has no closed-form expression, and in particular the computation of the normalizing constant m(y) may be difficult because of the integration. Therefore, the unnormalized posterior density, simply given by model times prior, π(θ|y) ∝ f(y|θ)π(θ), is often used (Gelman et al., 1995). The posterior distribution is the main tool of Bayesian inference (Robert, 1994). It summarizes the current state of knowledge about the parameter of interest by updating or weighting the prior opinion, formalized by π(θ), according to the new evidence given by the experimental data, represented by the likelihood f(y|θ) (Gelman et al., 1995; Robert, 1994).
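To make Equation (4.2) concrete, the following sketch approximates a posterior numerically on a grid for a normal sampling model. All numerical values (the data vector, σ and the prior parameters) are hypothetical choices for illustration and are not taken from the text.

\begin{verbatim}
import numpy as np
from scipy.stats import norm

# Hypothetical data and prior settings (illustrative values only)
y = np.array([82.0, 85.0, 79.0])   # observed data
sigma = 5.0                        # known sampling standard deviation
mu0, tau0 = 80.0, 10.0             # prior mean and standard deviation of theta

# Grid of candidate parameter values theta
theta = np.linspace(50.0, 110.0, 2001)
dtheta = theta[1] - theta[0]

# Unnormalized posterior: likelihood f(y|theta) times prior pi(theta)
likelihood = np.prod(norm.pdf(y[:, None], loc=theta, scale=sigma), axis=0)
prior = norm.pdf(theta, loc=mu0, scale=tau0)
unnormalized = likelihood * prior

# Normalizing constant m(y), approximated by a Riemann sum over the grid (Eq. 4.2)
m_y = np.sum(unnormalized) * dtheta
posterior = unnormalized / m_y

print("approximate posterior mean:", np.sum(theta * posterior) * dtheta)
\end{verbatim}

In this conjugate normal setting, the grid result can be compared with the closed-form posterior derived in Example 1a) below.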

In comparison to simple maximum likelihood (ML) parameter estimates, the Bayesian approach results in a whole distribution for the parameters, from which any information can be extracted and on which inferences can be based. To give a closer understanding of the Bayesian approach, two common simple models and corresponding examples are illustrated in the following. Since the examples will be taken up later in this chapter, we number their different parts 1a), 1b) and 2a), 2b).

Example 1a): Normal-normal model (Gelman et al., 1995)

The simplest combination of model and prior uses normal distributions for both. Assume that a woman wants to know her diastolic blood pressure and obtains a value of y in a blood pressure measurement. Because of measurement error, this value is normally distributed with unknown mean θ, representing her true blood pressure, and known variance σ²,

y|θ ∼ N(θ, σ²).

In a frequentist manner, the observed score would be used as an estimate of her true blood pressure. However, experts say that the diastolic blood pressure of women of that age is in general a random variable with known mean µ and variance τ². The woman can use this information as a prior

θ ∼ N(µ, τ²),

to get a Bayesian solution for her problem. We obtain the marginal distribution m(y) by multiplying the model f(y|θ) and the prior π(θ) and integrating over the parameter of interest θ. This results again in a normal distribution

y ∼ N(µ, σ² + τ²).

The posterior distribution π(θ|y) is obtained by dividing the joint distribution h(y, θ) = f(y|θ)π(θ) by the marginal m(y), resulting in

\[
\theta \mid y \sim N\!\left(\frac{\sigma^2}{\sigma^2+\tau^2}\,\mu + \frac{\tau^2}{\sigma^2+\tau^2}\,y,\; \frac{\sigma^2\tau^2}{\sigma^2+\tau^2}\right), \tag{4.3}
\]

another normal distribution.
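A minimal numerical sketch of the update in Equation (4.3) follows; the measurement value, the measurement variance and the prior parameters below are made-up numbers used only for illustration.

\begin{verbatim}
def normal_posterior(y, sigma2, mu, tau2):
    """Posterior mean and variance of theta for one observation y (Eq. 4.3)."""
    post_mean = sigma2 / (sigma2 + tau2) * mu + tau2 / (sigma2 + tau2) * y
    post_var = sigma2 * tau2 / (sigma2 + tau2)
    return post_mean, post_var

# Hypothetical numbers: measured value 85, measurement variance 25,
# prior mean 80 and prior variance 16
print(normal_posterior(y=85.0, sigma2=25.0, mu=80.0, tau2=16.0))
\end{verbatim}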

Let us now assume that the woman repeated the measurement N times, obtaining values y = (y1, ..., yN) that are independently and identically distributed (iid) conditional on the true blood pressure,

\[
y_i \overset{\text{iid}}{\sim} N(\theta, \sigma^2), \qquad i = 1, \ldots, N.
\]

The joint distribution is given by $f(y\mid\theta) = \prod_{i=1}^{N} f(y_i\mid\theta)$. The MLE for her true blood pressure is then given by the mean of the data, $\tilde{\theta} = \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$. The marginal distribution of each single score is $y_i \sim N(\mu, \sigma^2+\tau^2)$, with the marginal over all observed data given by

\[
m(y) = \frac{1}{\sqrt{2\pi(\sigma^2+\tau^2)}}\, e^{-\frac{\sum_{i=1}^{N}(y_i-\mu)^2}{2(\sigma^2+\tau^2)}}.
\]

The corresponding posterior distribution π(θ|y) is given by

\[
\theta \mid y \sim N\!\left(\frac{\sigma^2/N}{(\sigma^2/N)+\tau^2}\,\mu + \frac{\tau^2}{(\sigma^2/N)+\tau^2}\,\bar{y},\; \frac{(\sigma^2/N)\,\tau^2}{(\sigma^2/N)+\tau^2}\right).
\]
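The N-measurement case thus only replaces y by the sample mean ȳ and σ² by σ²/N. The following sketch (again with hypothetical numbers) illustrates how the posterior mean moves from the prior mean µ towards ȳ and the posterior variance shrinks as N grows.

\begin{verbatim}
def normal_posterior_n(ybar, n, sigma2, mu, tau2):
    """Posterior mean and variance of theta given n measurements with mean ybar."""
    s2n = sigma2 / n  # variance of the sample mean
    post_mean = s2n / (s2n + tau2) * mu + tau2 / (s2n + tau2) * ybar
    post_var = s2n * tau2 / (s2n + tau2)
    return post_mean, post_var

# Hypothetical setting: ybar = 85, sigma^2 = 25, prior N(80, 16)
for n in (1, 5, 50):
    print(n, normal_posterior_n(ybar=85.0, n=n, sigma2=25.0, mu=80.0, tau2=16.0))
\end{verbatim}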

Example 2a): Binomial-beta model (Gelman et al., 1995)

Another Bayesian model that is often used in practice is the combination of a binomial model with a beta-distributed prior. Assume that we have a study to evaluate the risk of tumors in laboratory rats. The sample consists of N rats from the same strain that were treated under identical conditions. Of these N rats, y developed a tumor. We can model the experiment as a realization of a binomial distribution, y|p ∼ Bin(N, p). The MLE for the tumor risk is given by $\tilde{p} = y/N$.

From historical data we know that the tumor risk among laboratory rats from this strain under varying experimental conditions is approximately beta distributed, Beta(α, β), with known mean µ and variance σ². The parameters α and β of the distribution are related to mean and variance by µ = α/(α+β) and σ² = αβ/((α+β)²(α+β+1)). We can use this information as prior p ∼ Beta(α, β) for our unknown probability p. For convenience, we reparameterize the distribution by the expected mean µ and M = α+β as a measure of prior precision, so that the prior density is given by

\[
\pi(p) = \frac{\Gamma(M)}{\Gamma(M\mu)\,\Gamma(M(1-\mu))}\, p^{M\mu-1}(1-p)^{M(1-\mu)-1}.
\]

The marginal distribution for y is a beta-binomial with density

\[
m(y) = \binom{N}{y}\, \frac{\Gamma(M)}{\Gamma(M\mu)\,\Gamma(M(1-\mu))}\, \frac{\Gamma(y+M\mu)\,\Gamma(N-y+M(1-\mu))}{\Gamma(N+M)}.
\]

The posterior distribution of the tumor risk is again beta distributed, p|y ∼ Beta(y+Mµ, N−y+M(1−µ)), with density

\[
\pi(p \mid y) = \frac{\Gamma(N+M)}{\Gamma(y+M\mu)\,\Gamma(N-y+M(1-\mu))}\, p^{y+M\mu-1}(1-p)^{N-y+M(1-\mu)-1}.
\]
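A short numerical sketch of this binomial-beta update in the (µ, M) parameterization follows; the counts N and y as well as the prior mean and precision below are hypothetical values chosen for illustration.

\begin{verbatim}
from scipy.stats import beta

# Hypothetical experiment: 20 rats, 4 tumors; hypothetical prior mean 0.15, precision M = 10
N, y = 20, 4
mu, M = 0.15, 10.0

# Prior Beta(alpha, beta) in the (mu, M) parameterization: alpha = M*mu, beta = M*(1-mu)
alpha_post = y + M * mu              # posterior shape parameter alpha
beta_post = (N - y) + M * (1 - mu)   # posterior shape parameter beta

print("MLE:", y / N)
print("posterior mean:", beta.mean(alpha_post, beta_post))
print("95% credible interval:", beta.interval(0.95, alpha_post, beta_post))
\end{verbatim}

The posterior mean (y + Mµ)/(N + M) lies between the MLE y/N and the prior mean µ, so M can be read as a prior sample size.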

The main advantage of the Bayesian approach in comparison to the classical frequentist procedure is that it provides a simple conceptual framework with high flexibility and generality that allows one to deal with very complex problems (Gelman et al., 1995). In addition, MLE estimates from classical statistics often have the drawback that the estimators can be quite unstable and may lack smoothness (Robert, 1994), whereas Bayes estimators are more stable. Furthermore, a key aspect of Bayesian methods is that sequential analyses can easily be performed by applying Bayes' formula sequentially (Gelman et al., 1995). When a posterior distribution has been calculated and new data become available, the previous posterior can be used as a prior for the new data (Lee, 1997). In the blood pressure example with different measurements, we have f(y1, y2|θ) = f(y1|θ)f(y2|θ) when y1 and y2 are independent of each other conditional on θ. We can rewrite π(θ|y1, y2) ∝ π(θ)f(y1, y2|θ) as π(θ|y1, y2) ∝ π(θ)f(y1|θ)f(y2|θ) ∝ π(θ|y1)f(y2|θ), treating the posterior given y1 as the prior for y2.
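This sequential-updating property can be checked numerically in the normal-normal blood pressure setting: updating with y1 first and reusing that posterior as the prior for y2 gives the same result as a single update based on both measurements. All numbers below are hypothetical.

\begin{verbatim}
def update(y, sigma2, mu, tau2):
    """One conjugate normal-normal update: new (mean, variance) of theta."""
    post_mean = sigma2 / (sigma2 + tau2) * mu + tau2 / (sigma2 + tau2) * y
    post_var = sigma2 * tau2 / (sigma2 + tau2)
    return post_mean, post_var

sigma2 = 25.0              # known measurement variance (hypothetical)
mu0, tau2_0 = 80.0, 16.0   # prior mean and variance (hypothetical)
y1, y2 = 85.0, 83.0        # two measurements (hypothetical)

# Sequential: the posterior after y1 becomes the prior for y2
m1, v1 = update(y1, sigma2, mu0, tau2_0)
print(update(y2, sigma2, m1, v1))

# Joint: one update based on the mean of both measurements,
# whose sampling variance is sigma^2 / 2
print(update((y1 + y2) / 2, sigma2 / 2, mu0, tau2_0))
\end{verbatim}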