
Log-linear Model

In the document He shall lift you up? (pages 81-85)

An additive relationship of independent and dependent variables is assumed. A look at the income distribution graph (figure 13) suggests that the dependent variable, inc, is not normally distributed. In order to improve the distribution of the data, a BOX-COX (1964) test is conducted on inc. This test runs a series of power transformations on the data in order to determine which transformation would bring the distribution of the data closest to normal (OSBORNE 2010). Since SPSS itself does not contain a module for the BOX-COX test, it was done using the BOX-COX transformations dialog provided by GRÜNER (2012). The test yields a value of λ = 0 at the lowest absolute log-likelihood value of −1810.208 (cf. the SPSS output in appendix 7). This means that a natural logarithmic transformation is appropriate. Hence, inc is transformed to the variable ln_inc = ln(inc), which is used in all estimations.
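The idea behind the BOX-COX search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the income data are synthetic (log-normal by construction), the helper name `boxcox_llf` and the grid of λ values are hypothetical choices, and nothing here reproduces the SPSS dialog used in the study. The sketch evaluates the profile log-likelihood on a grid and picks the λ with the highest value; λ = 0 corresponds to the natural logarithm.

```python
import math
import random


def boxcox_llf(lmb, x):
    """Profile log-likelihood of the Box-Cox parameter lmb for positive data x."""
    n = len(x)
    logs = [math.log(v) for v in x]
    # lambda = 0 is the limiting case: the natural logarithm.
    y = logs if lmb == 0 else [(v ** lmb - 1) / lmb for v in x]
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    return (lmb - 1) * sum(logs) - n / 2 * math.log(var_y)


random.seed(0)
# Synthetic, strictly positive, right-skewed "income" values (assumption).
inc = [random.lognormvariate(7.0, 0.8) for _ in range(500)]

# Candidate exponents from -2.0 to 2.0 in steps of 0.1 (hypothetical grid).
lambdas = [round(-2 + 0.1 * i, 1) for i in range(41)]
best_lambda = max(lambdas, key=lambda l: boxcox_llf(l, inc))

# With a best lambda at or near 0, the log transformation is applied.
ln_inc = [math.log(v) for v in inc]
print("best lambda on the grid:", best_lambda)
```

Because the synthetic data are log-normal, the selected λ should land at or close to 0, which mirrors the situation described above.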

First, the log-linear income equation (23), which includes dummy variables for church membership and traditional religious practice, is estimated using ordinary least squares (OLS) estimation.

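$$
\begin{aligned}
\text{ln\_inc}_i ={}& \beta_0 + \beta_1\,\text{gender}_i + \beta_2\,\text{age}_i + \beta_3\,\text{school}_i + \beta_4\,\text{tertiary}_i + \beta_5\,\text{mem\_groups}_i + \beta_6\,\text{rel\_chief}_i + \beta_7\,\text{hmems}_i \\
&+ \beta_8\,\text{s\_hmems\_young}_i + \beta_9\,\text{s\_hmems\_y51}_i + \beta_{10}\,\text{hmems\_school\_ms}_i + \beta_{11}\,\text{hmems\_learn}_i + \beta_{12}\,\text{hmems\_acad}_i + \beta_{13}\,\text{dist\_road}_i \\
&+ \beta_{14}\,\text{dist\_shop}_i + \beta_{15}\,\text{clinic}_i + \beta_{16}\,\text{church}_i + \beta_{17}\,\text{trad\_rel}_i + e_i
\end{aligned}
\tag{23}
$$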

It is assumed that all Gauß-Markov conditions for OLS estimation are fulfilled.17 In particular, these assumptions are that the error terms $e_i$ have an expected value of 0 as well as constant variance (homoscedasticity) and are neither autocorrelated nor correlated with the explanatory variables. The estimation results are displayed in table 12, column A.
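As a rough illustration of what an OLS estimation of a log-linear equation involves, the following dependency-free Python sketch solves the normal equations on synthetic data. Everything in it is an assumption made for the example: the data are simulated, only two regressors (a continuous one standing in for age and a dummy standing in for trad_rel) are used, and the "true" coefficients are invented. It does not reproduce the SPSS estimation or the results in table 12.

```python
import random


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[a] * row[b] for row in X) for b in range(k)]
        + [sum(row[a] * yi for row, yi in zip(X, y))]
        for a in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


random.seed(1)
# Invented coefficients: constant, hypothetical "age", hypothetical "trad_rel".
true_beta = [6.0, 0.02, 0.25]
X, y = [], []
for _ in range(400):
    row = [1.0, random.uniform(20, 80), float(random.random() < 0.3)]
    X.append(row)
    # Simulated ln_inc with a homoscedastic, uncorrelated error term.
    y.append(sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0, 0.5))

beta_hat = ols(X, y)
residuals = [yi - sum(b * x for b, x in zip(beta_hat, row)) for row, yi in zip(X, y)]
print("estimated coefficients:", [round(b, 3) for b in beta_hat])
```

Since the regression includes a constant, the residuals average to zero by construction, and the estimates should lie close to the invented coefficients because the simulated errors satisfy the conditions listed above.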

17 The Gauß-Markov theorem contains four conditions under which OLS yields unbiased results (see VERBEEK 2012, 15–20).

The adjusted coefficient of determination (adjusted R²) of the estimation is adj. R² = 0.45. The F-statistic of F = 9.63 shows that the model has significant explanatory value.18 Since the p-value of the Kolmogorov-Smirnov test (KS-test) on normal distribution of the residuals is pKS = 0.65, the null hypothesis that the residuals are normally distributed cannot be rejected.19 Both the F-test and the KS-test yield equivalent results in all subsequent estimations presented; their results are therefore not elaborated on in the subsequent paragraphs.
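The two goodness-of-fit figures can be computed from R², N, and K with the standard formulas (the F formula is the one given in footnote 18). The sketch below is illustrative only: the function names are hypothetical, and the inputs R² = 0.5 and N = 180 are made-up values, not figures reported in the study. K = 18 corresponds to the 17 regressors plus the constant in equation (23).

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (incl. constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


def f_statistic(r2, n, k):
    """Empirical F-value for H0: all coefficients except the intercept are zero."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


# Made-up inputs for illustration; not taken from the study.
r2, n, k = 0.5, 180, 18
print(f"adj. R2 = {adjusted_r2(r2, n, k):.3f}")
print(f"F       = {f_statistic(r2, n, k):.3f}")
```

The empirical F-value would then be compared with the critical value of the F-distribution with K−1 and N−K degrees of freedom.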

Furthermore, values for the Akaike Information Criterion (AkIC) and the Bayesian Information Criterion (BIC) are computed (cf. VERBEEK 2012, 66). These criteria are based on Bayesian statistics and allow comparison of different, non-nested models.

They are calculated as

$$
\text{AkIC} = \log\left(\frac{1}{N}\sum_{i=1}^{N} e_i^2\right) + \frac{2K}{N}
\quad\text{and}\quad
\text{BIC} = \log\left(\frac{1}{N}\sum_{i=1}^{N} e_i^2\right) + \frac{K}{N}\log N,
\tag{24}
$$

where N is the number of observations, K the number of regressors, and i indexes the observations (households). In both criteria, a lower value indicates a better model, and in both cases the value increases with an increasing number of regressors. This increase is larger in the BIC, which thus tends to favor simpler models. The AkIC provides better results in small samples. Values of both criteria are calculated and used in the model comparison.
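Equation (24) translates directly into code. The following minimal sketch assumes the residuals are available as a plain list; the function name `info_criteria` and the example residuals are hypothetical and serve only to show the calculation.

```python
import math


def info_criteria(residuals, k):
    """Return (AkIC, BIC) as defined in equation (24)."""
    n = len(residuals)
    # Log of the mean squared residual, shared by both criteria.
    log_mse = math.log(sum(e * e for e in residuals) / n)
    akic = log_mse + 2 * k / n
    bic = log_mse + k / n * math.log(n)
    return akic, bic


# Artificial residuals: 100 values alternating between +0.5 and -0.5.
residuals = [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
akic, bic = info_criteria(residuals, k=3)
print(f"AkIC = {akic:.4f}, BIC = {bic:.4f}")
```

The BIC penalty per regressor, log N / N, exceeds the AkIC penalty, 2 / N, whenever log N > 2, that is, for N of 8 or more, which is why the BIC comes out higher in this example.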

18 The F-test tests the hypothesis that all coefficients except for the intercept are equal to zero. The empirical F-value $F_{emp} = \frac{R^2/(K-1)}{(1-R^2)/(N-K)}$ (where K is the number of regressors and N the number of observations) is compared to the critical value of the F-distribution. For a detailed description see VERBEEK (2012, 26–28).

19 On the KS-test see MASSEY (1951).

                     A       B       D       E       G       H
AkIC                 -0.88   -0.91   -0.93   -0.91   -0.90   -0.89
BIC                  -0.56   -0.51   -0.64   -0.59   -0.61   -0.59
KS-Test (p-value)    0.65    0.87    0.70    0.82    0.69    0.68

Dependent variable: ln_inc.

Columns A, B: standard errors in parentheses; columns D, E, G, H: corrected standard errors in parentheses. *, **, and *** indicate significance at the 10%, 5%, and 1% level, respectively.

Table 12: Estimation Results Log-linear Model

Eight of the coefficients are significant at least at the 5 percent level.20 All of the significant coefficients have the expected sign. The coefficient of gender is negative; if the household head is a woman, household income is lower. The variable age has a positive coefficient: the older the household head, the higher the household income.

Higher education leads to higher income as well. This is expressed in the positive coefficients of school, tertiary, and hmems_acad. Furthermore, the number of household members (hmems) has a positive coefficient: larger households have higher incomes. Looking at the religiosity variables, the coefficient of church membership is positive, but it is not significant. The coefficient on traditional religion, however, is positive and significant. Church membership in general does not seem to have an impact on household income.

Due to the natural logarithmic transformation of the dependent variable, the interpretation of the coefficients is not as straightforward as without such a transformation. In order to calculate the effect a variable has on household income, one needs to reverse the transformation by taking the exponential function of the natural logarithm of income: $\text{inc}_i = e^{\text{ln\_inc}_i}$. Since ln_inc is composed additively of the variables multiplied by their respective coefficients, income of household i can be written as

$$
\text{inc}_i = e^{\beta_0 + \beta_1 x_{1i} + \dots + \beta_K x_{Ki}} = e^{\beta_0} \cdot e^{\beta_1 x_{1i}} \cdot \ldots \cdot e^{\beta_K x_{Ki}},
\tag{25}
$$

where $\beta_1, \dots, \beta_K$ are the coefficients from equation (23) and $x_{1i}, \dots, x_{Ki}$ the values of the variables for household i. K is the number of explanatory variables including the constant. The effect of a specific variable can be calculated as $e^{\beta_j x_{ji}}$. Correspondingly, the effect of traditional religion is $e^{0.258 \cdot 1} = 1.294$. That is, where the household head practices traditional religion, predicted household income is 29.4 percent higher than in a household with otherwise equal characteristics whose head does not practice traditional religion.
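The back-transformation of a dummy coefficient can be checked with a short calculation. The sketch below uses the coefficient 0.258 reported above; the variable names are chosen for illustration only. It also shows, as an aside, that reading the coefficient directly as a percentage (100·β) would understate the effect for a coefficient of this size.

```python
import math

# Coefficient of the traditional-religion dummy as reported in the text.
beta_trad_rel = 0.258

# Multiplicative effect on income when the dummy switches from 0 to 1.
factor = math.exp(beta_trad_rel * 1)
percent_exact = (factor - 1) * 100

# Naive reading of the coefficient as a percentage change.
percent_naive = beta_trad_rel * 100

print(f"factor           = {factor:.3f}")
print(f"exact effect     = {percent_exact:.1f} percent")
print(f"naive (100*beta) = {percent_naive:.1f} percent")
```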

Second, the same estimation is done including dummy variables for the different church categories:

20 The significance of the variables is calculated with a two-sided t-test, which tests the hypothesis that the coefficient is equal to zero by comparing the empirical t-value $t_{emp} = \beta_j / s_j$ (where $s_j$ is the standard error of coefficient $\beta_j$) to the critical value of the t-distribution. See VERBEEK (2012, 23–25).

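$$
\begin{aligned}
\text{ln\_inc}_i ={}& x_i\beta + \beta_{16}\,\text{church\_m}_i + \beta_{17}\,\text{church\_zn}_i + \beta_{18}\,\text{church\_zl}_i \\
&+ \beta_{19}\,\text{church\_a}_i + \beta_{20}\,\text{church\_b}_i + \beta_{21}\,\text{church\_o}_i + \beta_{22}\,\text{trad\_rel}_i + e_i,
\end{aligned}
\tag{26}
$$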

where $x_i$ is a 1×16 vector composed of all regressors except for the religiosity variables and $\beta$ a 16×1 vector of the respective coefficients. The results are displayed in table 12, column B. Although the number of regressors has increased to K = 23, the model seems to have better explanatory power. The adjusted R² has gone up to adj. R² = 0.48 and the AkIC down to AkIC = −0.91, indicating a better model.

This result, however, is not unequivocal, since the BIC, which imposes a harsher penalty for a higher number of regressors, has increased to BIC = −0.51. With one exception, the same coefficients as before are significant: the variable dist_road is now significant at the 10 percent level. On closer inspection, however, the change from the previous model is marginal. The coefficient changed by 0.01, causing it to slide just below the 10 percent significance threshold. Only one coefficient of a church variable is significant: that of church_zn, the dummy variable for membership in the ZCC. In terms of magnitude, this coefficient is exceeded only by the coefficient of tertiary education. Traditional religion is still significant, with a slightly larger coefficient than before. Four of the other religiosity dummies have positive coefficients, and the coefficient of the dummy for membership in Apostolic Churches is negative (but not significantly so).
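The mixed verdict of the two information criteria can be read off mechanically from the values in table 12, columns A and B. The small sketch below only restates those reported values; the dictionary layout and names are illustrative choices.

```python
# AkIC and BIC as reported in table 12 for models A and B.
criteria = {
    "A": {"AkIC": -0.88, "BIC": -0.56},
    "B": {"AkIC": -0.91, "BIC": -0.51},
}

# For each criterion, the preferred model is the one with the lower value.
preferred = {
    name: min(criteria, key=lambda model: criteria[model][name])
    for name in ("AkIC", "BIC")
}
print(preferred)
```

The AkIC prefers model B, while the BIC, with its heavier penalty on the five additional regressors, prefers model A.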
