
For the analysis of the data, Ordinary Least Squares (OLS) regression models are used. In the regression equation, the dependent variable (or regressand) y (here daily per capita expenditures) is explained by a function of x, where x denotes the independent or predictor variables (here the different variables derived from the indicators). OLS minimises the sum of the squared prediction errors (Davidson 2000). In other words, the fitted regression line is the one for which the sum of the squared discrepancies between the line and the observed values is minimised (Draper et al. 1981).
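The least-squares principle can be illustrated with a minimal pure-Python sketch that solves the normal equations (X'X)b = X'y for a small data set. The function names `ols` and `sum_squared_errors` and the toy data are illustrative assumptions. This is not the SAS routine used in the study.

```python
def ols(X, y):
    """OLS coefficients [b0, b1, ..., bk] minimising the sum of squared errors.

    X: one row of k regressor values per observation; y: observed values.
    An intercept column of ones is added automatically.
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for k in range(p):
            if k != col:
                f = a[k][col] / a[col][col]
                a[k] = [vk - f * vc for vk, vc in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def sum_squared_errors(b, X, y):
    """Sum of squared prediction errors of the linear predictor with coefficients b."""
    return sum((yi - b[0] - sum(bj * xj for bj, xj in zip(b[1:], r))) ** 2
               for r, yi in zip(X, y))
```

For example, `ols([[1], [2], [3], [4]], [3, 5, 7, 9])` recovers an intercept of 1 and a slope of 2. With noisy data, any perturbation of the returned coefficients increases `sum_squared_errors`, which is exactly the property OLS is defined by.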

To find a suitable initial set of indicators for Model 1, one-step OLS was conducted. The regression models were run in SAS using the MAXR technique, which seeks to maximise the explained variance of the dependent variable. This study searches for a set of the 15 best indicators for predicting whether a household in Central Sulawesi is poor. MAXR therefore seeks the largest achievable improvement of R² within a set of 15 variables. R² is the variance of the dependent variable explained by the model and its regressors, divided by the total observed variance of the dependent variable. It ranges between 0 and 1.
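The ratio described above can be written out for n households with observed values y_i, fitted values ŷ_i and sample mean ȳ. The notation is introduced here only for illustration. The second form is the equivalent expression that holds for an OLS model with an intercept:

```latex
R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```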

Consequently, an R² of 1 would mean that the predicted values of the dependent variable equal the observed values for all households. An R² of 0.7 would imply that 70% of the observed variance in the dependent variable is explained by the model and its regressors. The MAXR procedure seeks to maximise R² and considers pairwise exchanges of regressors to move from one step to the next. First, MAXR finds the one-variable model that provides the highest R². In the next step, another variable is added: the one that yields the greatest improvement in R². Within this two-variable model, each variable in the model is then compared with each variable not in the model.

Based on these comparisons, MAXR ‘decides’ whether to remove a variable and replace it with another in order to obtain the maximal R². This is repeated until no further improvement of R² is possible. Variables are then added in the same way until a specified number of variables (e.g. 15) is reached. This MAXR procedure was used in the selection of the indicators. However, an important drawback of the procedure is that it cannot handle sampling weights.
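The search described above can be sketched in pure Python. The sketch adds the variable with the greatest R² gain. It then repeatedly makes the single best switch between a variable in the model and one outside it, stopping when no switch increases R². After that it moves on to the next model size. This is an illustrative reimplementation under those assumptions, not SAS's own code, and details such as tie-breaking may differ. The names `maxr`, `r_squared` and `forced` are hypothetical.

```python
from itertools import product


def r_squared(cols, y):
    """R^2 of an OLS regression of y on the given columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        if abs(a[piv][col]) < 1e-12:
            return float("-inf")  # collinear regressors: never preferred
        a[col], a[piv] = a[piv], a[col]
        for k in range(p):
            if k != col:
                f = a[k][col] / a[col][col]
                a[k] = [vk - f * vc for vk, vc in zip(a[k], a[col])]
    b = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(bj * xj for bj, xj in zip(b, r)) for r in rows]
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - sse / sst


def maxr(candidates, y, max_vars, forced=()):
    """Maximum-R^2-improvement search in the spirit of SAS MAXR.

    candidates: dict mapping variable name -> column of values.
    forced: columns always kept in the model (e.g. control variables).
    Returns the best model found for each size 1..max_vars as (names, R^2).
    """
    def score(names):
        return r_squared(list(forced) + [candidates[v] for v in names], y)

    model, history = [], []
    while len(model) < min(max_vars, len(candidates)):
        outside = [v for v in candidates if v not in model]
        # Add the variable that yields the greatest improvement in R^2
        model.append(max(outside, key=lambda v: score(model + [v])))
        while True:
            # Compare each variable in the model with each variable outside it
            current = score(model)
            best_gain, best_model = 1e-12, None
            outside = [v for v in candidates if v not in model]
            for i, new in product(range(len(model)), outside):
                trial = model[:i] + [new] + model[i + 1:]
                gain = score(trial) - current
                if gain > best_gain:
                    best_gain, best_model = gain, trial
            if best_model is None:  # no switch increases R^2
                break
            model = best_model  # make the best switch, then compare again
        history.append((list(model), score(model)))
    return history
```

In the setting of this study, `forced` would hold the nine control variables and `max_vars` would be 15. Like MAXR itself, this sketch ignores sampling weights.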

After obtaining the results from a regression, a number of checks and related adjustments have to be made. One such check is whether each coefficient carries the sign one would expect from theory. For example, the variable ‘bed ownership (yes/no)’ must have a positive sign, because richer households are more likely to own beds than poorer households. Conversely, the variable ‘share of total expenditures on food’ must have a negative sign, because poorer households normally spend a higher share of their total expenditures on food than richer households do.
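The sign check can be expressed as a small Python sketch. It compares estimated coefficients with a table of expected signs. Because the dependent variable is log per capita expenditure, +1 marks a variable expected to go with higher expenditure. The variable names `owns_bed` and `food_share` are hypothetical stand-ins for the two examples in the text.

```python
# Hypothetical variable names; expected signs follow the two examples in the text
# (+1: associated with higher log per capita expenditure, -1: with lower).
EXPECTED_SIGNS = {"owns_bed": +1, "food_share": -1}


def wrong_signs(coefficients, expected=EXPECTED_SIGNS):
    """Return, in input order, the variables whose coefficient contradicts the expected sign.

    Variables without an expected sign are not checked.
    """
    return [name for name, coef in coefficients.items()
            if name in expected and coef * expected[name] < 0]
```

For instance, `wrong_signs({"owns_bed": 0.12, "food_share": 0.30, "hh_size": -0.05})` returns `["food_share"]`. In that case the food-share variable would be a candidate for adjustment.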

Any of the variable sets found in this way can be described as a poverty assessment tool for identifying the poverty status of a household. The variables or indicators are derived from the composite questionnaire. The dependent variable (daily per capita expenditures) is expressed in IDR, the national currency of Indonesia, and transformed into its natural logarithm. The same applies to every other variable measured in monetary values (such as expenditures or asset values). All ordinal variables, such as the ‘type of wall’, where lower values indicate inferior materials and higher values superior materials, are converted into dummy variables, one for each sub-type (Zeller et al. 2005).
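Both transformations can be sketched in a few lines of Python. The first takes the natural logarithm of monetary values, and the second turns one ordinal variable into one 0/1 dummy per sub-type. The wall categories and the `wall` prefix are hypothetical labels, not the questionnaire's actual codes.

```python
import math


def log_money(values_idr):
    """Natural logarithm of monetary values in IDR (values must be positive)."""
    return [math.log(v) for v in values_idr]


def to_dummies(values, categories, prefix):
    """One 0/1 dummy column per sub-type of an ordinal variable."""
    return {f"{prefix}_{c}": [1 if v == c else 0 for v in values]
            for c in categories}
```

For example, `to_dummies(["wood", "brick", "wood"], ["bamboo", "wood", "brick"], "wall")` yields the columns `wall_bamboo`, `wall_wood` and `wall_brick`. The dummies of one variable sum to one for every household, so not all of them can enter a model with an intercept at the same time.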

In the analysis presented here, the first step was to try out different ways of generating an initial indicator set for Model 1. Because of the comparatively small sample size (281 households), it was not possible to include all 278 regressors (including 9 control variables). Together with the intercept, such a model would leave only 281 - 278 - 1 = 2 residual degrees of freedom.

The following selection methods were compared:

- A: The best 86 indicators among all variables were selected with the SAS MAXR technique. The adjustments, i.e. the checks of whether the sign of each regression coefficient is theoretically correct, were made within the 15 best variables out of the 86 best indicators returned by SAS.

- B: Before the set of the 86 best indicators was created, a regression with all variables was run, and adjustments were made within the best 5-, 10- and 15-indicator sets.

- C: This method determined the number of variables used in all methods (86 variables). All variables were split into seven dimensions, namely ‘education’, ‘food, health and clothing’, ‘demography and occupation’, ‘assets and durable goods’, ‘agricultural assets and land ownership’, ‘housing’ and ‘finances, social capital and others’. The best indicators of each dimension were selected by MAXR, and adjustments were made within the dimensions. All variables of a dimension were included in its selection process. The adjustments were continued until every sign was as expected and the MAXR result could not be improved any further.


Table 21: Number of indicators in dimensions of selection

| Dimension | Total number of variables in this dimension | Number of selected variables |
|---|---|---|
| Education | 38 | 13 |
| Food, health and clothing | 43 | 12 |
| Demography and occupation | 33 | 9 |
| Assets and durable goods | 49 | 12 |
| Agricultural assets and land ownership | 38 | 6 |
| Housing | 36 | 17 |
| Finances, social capital and others | 31 | 17 |
| Total | 269 | 86 |

- D: In this method, the best indicators were again taken from each of the dimensions. The number of indicators from each dimension was the same as in method C, but this time no adjustments were made within the dimensions. Again, only the best 15 variables out of the set of the best 86 variables were adjusted.

- E: This set involves no adjustments at all. It is the best set of 86 variables as selected, without any changes. Out of this set, the 15 best indicators were again chosen by SAS MAXR. Since E includes no sign adjustments, it cannot properly be used to run the models. It was created only as a reference.
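Method D can be read as a two-stage pipeline: an unadjusted per-dimension selection using the quotas of Table 21, followed by an adjusted selection of the final 15 variables. The Python sketch below assumes that an ‘adjustment’ means dropping a variable with a theoretically wrong sign and re-running the selection. `select_best` (for instance a MAXR-style search) and `wrong_signs` are hypothetical callables supplied by the user, not functions from SAS.

```python
# Number of variables selected per dimension (Table 21)
QUOTAS = {
    "education": 13,
    "food, health and clothing": 12,
    "demography and occupation": 9,
    "assets and durable goods": 12,
    "agricultural assets and land ownership": 6,
    "housing": 17,
    "finances, social capital and others": 17,
}


def select_with_sign_adjustment(candidates, k, select_best, wrong_signs):
    """Select k variables; drop wrong-signed ones from the pool and reselect until all fit."""
    pool = list(candidates)
    while True:
        chosen = select_best(pool, k)
        # Only variables actually chosen can be dropped, so the loop always terminates
        bad = set(wrong_signs(chosen)) & set(chosen)
        if not bad:
            return chosen
        pool = [v for v in pool if v not in bad]


def method_d(dimensions, quotas, select_best, wrong_signs, final_k=15):
    """Sketch of method D: unadjusted per-dimension selection, adjusted final selection."""
    pool = []
    for name, variables in dimensions.items():
        # Stage 1: best variables of each dimension, no sign adjustments
        pool.extend(select_best(variables, quotas[name]))
    # Stage 2: best final_k variables of the pooled set, with sign adjustments
    return select_with_sign_adjustment(pool, final_k, select_best, wrong_signs)
```

Method C would differ only in applying `select_with_sign_adjustment` already in stage 1, within each dimension.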

In all steps of the indicator-set selection, the INCLUDE= option in SAS was used to force nine control variables into the regression. Four of them capture the influence of demographic factors that previous research has found to be powerful in explaining per capita expenditure at the household level: household size, household size squared, the age of the household head and the age of the household head squared. The other five are regional dummies, which seek to capture agro-ecological, cultural and socio-economic differences between regions (Zeller et al. 2005).
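The nine controls can be built per household as in the Python sketch below. It assumes that the five regional dummies refer to five regions measured against an omitted reference region; the source does not state this. The region labels are hypothetical. The function only constructs the values; forcing them into every model is what the INCLUDE= option does in SAS.

```python
def control_variables(household_size, head_age, region, dummy_regions):
    """Return the nine control regressors for one household.

    dummy_regions: the five regions that receive a dummy (hypothetical labels);
    a household from the assumed reference region scores 0 on all five.
    """
    if len(dummy_regions) != 5:
        raise ValueError("expected five regional dummies")
    demographic = [household_size, household_size ** 2, head_age, head_age ** 2]
    regional = [1 if region == r else 0 for r in dummy_regions]
    return demographic + regional
```

For example, `control_variables(5, 40, "R2", ["R1", "R2", "R3", "R4", "R5"])` returns `[5, 25, 40, 1600, 0, 1, 0, 0, 0]`.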

To make good and valid predictions, a regression model or poverty assessment tool needs a certain level of accuracy. The following seven accuracy measures are used to assess the models:


- Total Accuracy is the percentage of the total sample of 281 households whose poverty status is correctly predicted by the regression model.

- Poverty Accuracy is the accuracy among the very poor: the number of households correctly predicted as very poor, expressed as a percentage of the total number of very poor households.

- Non-poverty Accuracy is the accuracy among the not very poor: the number of households correctly predicted as not very poor, expressed as a percentage of the total number of not very poor households.

- Undercoverage represents the error of predicting very poor households as not very poor, expressed as a percentage of the total number of very poor households.

- Leakage reflects the error of predicting not very poor households as very poor, expressed as a percentage of the total number of very poor households.

- Poverty Incidence Error (abbreviated in tables as PIE) is the difference between the predicted and the actual (observed) poverty incidence (here the headcount ratio), measured in percentage points.

- Balanced Poverty Accuracy Criterion (abbreviated in tables as BPAC) is defined as Poverty Accuracy minus the absolute difference between Undercoverage and Leakage, each expressed as a percentage of the total number of very poor households:

BPAC = Poverty Accuracy - |Undercoverage - Leakage|

When Undercoverage and Leakage are equal, BPAC equals Poverty Accuracy. BPAC is measured in percentage points (Zeller et al. 2005; The IRIS Centre 2005).
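All seven measures can be derived from the four cells of a cross-tabulation of observed against predicted poverty status. The Python sketch below computes them from per-household booleans (True = very poor). The function name and input format are assumptions for illustration, and the toy data in the test are invented rather than taken from the survey.

```python
def accuracy_measures(actual_poor, predicted_poor):
    """Seven accuracy measures (percent or percentage points) from per-household booleans."""
    pairs = list(zip(actual_poor, predicted_poor))
    n = len(pairs)
    correct_poor = sum(a and p for a, p in pairs)              # very poor, predicted very poor
    missed_poor = sum(a and not p for a, p in pairs)           # undercoverage cases
    leaked = sum((not a) and p for a, p in pairs)              # leakage cases
    correct_nonpoor = sum((not a) and (not p) for a, p in pairs)
    poor = correct_poor + missed_poor
    nonpoor = leaked + correct_nonpoor
    poverty_accuracy = 100 * correct_poor / poor
    undercoverage = 100 * missed_poor / poor
    leakage = 100 * leaked / poor                              # relative to the very poor
    return {
        "total_accuracy": 100 * (correct_poor + correct_nonpoor) / n,
        "poverty_accuracy": poverty_accuracy,
        "non_poverty_accuracy": 100 * correct_nonpoor / nonpoor,
        "undercoverage": undercoverage,
        "leakage": leakage,
        # predicted minus observed headcount ratio, in percentage points
        "pie": 100 * (correct_poor + leaked) / n - 100 * poor / n,
        "bpac": poverty_accuracy - abs(undercoverage - leakage),
    }
```

Note that Leakage is divided by the number of very poor households, not by the number of not very poor. This follows the definition above and allows it to be set against Undercoverage in the BPAC.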

To compare the different methods of creating a set of the 15 best indicators, (preliminary) accuracy tests were run on these sets. The results can be summarised as follows:


Table 22: Accuracy performance of different selection methods

| Measure | Method A | Method B | Method C | Method D | Method E |
|---|---|---|---|---|---|
| Total accuracy | 84.41% | 84.41% | 84.34% | 84.34% | 84.34% |
| Poverty accuracy | 40.74% | 40.74% | 37.04% | 40.74% | 44.44% |
| Non-poverty accuracy | 96.04% | 95.6% | 94.27% | 94.71% | 93.83% |
| Undercoverage | 59.26% | 59.26% | 62.96% | 59.26% | 55.56% |
| Leakage | 16.67% | 16.67% | 18.52% | 22.22% | 25.93% |
| PIE (percentage points) | -8.19 | -8.19 | -8.54 | -7.12 | -5.69 |
| BPAC (percentage points) | -1.85 | -1.85 | -7.41 | 3.70 | 14.82 |
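As a consistency check, the BPAC row of Table 22 can be recomputed from the poverty accuracy, undercoverage and leakage rows using the definition BPAC = Poverty Accuracy - |Undercoverage - Leakage|. The Python sketch below does this. Its results agree with the table to within 0.01 percentage points, the remaining difference being due to rounding of the published inputs. The sketch also confirms that method D has the highest BPAC among methods A to D.

```python
# Poverty accuracy, undercoverage and leakage per method, copied from Table 22 (in %)
TABLE_22 = {
    "A": (40.74, 59.26, 16.67),
    "B": (40.74, 59.26, 16.67),
    "C": (37.04, 62.96, 18.52),
    "D": (40.74, 59.26, 22.22),
    "E": (44.44, 55.56, 25.93),
}


def bpac(poverty_accuracy, undercoverage, leakage):
    """Balanced Poverty Accuracy Criterion in percentage points."""
    return poverty_accuracy - abs(undercoverage - leakage)


recomputed = {m: round(bpac(*vals), 2) for m, vals in TABLE_22.items()}
# Method E is only a reference set, so it is excluded from the choice
best_feasible = max(["A", "B", "C", "D"], key=lambda m: bpac(*TABLE_22[m]))
```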

In the first two methods (A and B), SAS MAXR selected the same indicator set, even though the preceding adjustments differed. Method E, which involves no adjustments at all, appears to be the most accurate in predicting the poor. Nonetheless, method D is used for the further calculation of Model 1. In method D, the best variables within each dimension are first selected without adjustments, using the number of variables per dimension from method C, to create a set of 86 indicators. Adjustments are then made within the 15 variables selected as best by the MAXR procedure. Although its total accuracy is slightly lower than that of methods A and B, method D provides the highest BPAC among the feasible methods.

The full list of all variables and the variables of method D are given in Annexes 3 and 4.
