3.3 Examining gender inequalities in factors associated with in-

3.3.5 Model specification

cases with missingness before modeling (a description of this data process can be found in Supplementary information 5.12). In sum, the final data set for the study on rural poverty comprises 4,434 women-headed and 14,877 men-headed households. Each of these two data sets contains 42 theoretical poverty risk factors describing the individual/household, community, and regional levels. These data are freely available from Figshare at https://doi.org/10.6084/m9.figshare.21183271.

• $\beta_{1\tau}, \ldots, \beta_{14\tau}$ represent the parametric component for estimating linear effects of the 14 categorical variables (see Section 2.1.1);

• $s_{1\tau}(z_1), \ldots, s_{28\tau}(z_{28})$ is the model component for the 28 univariate continuous variables (see Section 2.1.2);

• $s_{geo\tau}(lon, lat)$ are the spatial effects estimated based on the geographic coordinates $lon$ and $lat$ corresponding to the centroid of each municipality (see Section 2.1.3);

• $s_{int1\tau}(varying_1)$ and $s_{int2\tau}(varying_2)$ denote the components for interaction effects (see Section 2.1.4):

– age of the head by education level, and

– age of the head by marital status;

• $\varphi_{0i\tau}$ denotes the cluster-specific random intercept due to the hierarchical data structure, in which individual/household observations are connected to the information for the communities, and these, in turn, to the regional information (see Section 2.1.5); and,

• $\varepsilon_{\tau i}$ represents the quantile-specific regression errors.
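Collecting the components listed above, the additive structure of the model can be sketched as follows. This is a reconstruction from the component definitions only, not a verbatim copy of Equation 3.3: the response symbol $y_i$, the intercept $\beta_{0\tau}$, and the names $x_1, \ldots, x_{14}$ for the categorical covariates are assumptions introduced here for illustration.

```latex
y_i = \beta_{0\tau}
      + \sum_{k=1}^{14} \beta_{k\tau}\, x_k
      + \sum_{j=1}^{28} s_{j\tau}(z_j)
      + s_{geo\tau}(lon, lat)
      + s_{int1\tau}(varying_1)
      + s_{int2\tau}(varying_2)
      + \varphi_{0i\tau}
      + \varepsilon_{\tau i}
```

Each term on the right-hand side corresponds to one bullet above, in the same order.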

In this way, the structured additive quantile regression model expressed in Equation 3.3 has a total of 75 potential effects associated with 42 theoretical covariates (see Tables 3.13, 3.14, 3.15 and 3.16) plus spatial and random effects. Table 3.17 lists the alternative effects included in the full model.

Table 3.17 List of alternative effects by covariate in the full model

| Level | Variable | Alternative effects |
|---|---|---|
| Individual/household | Head’s age in years | Linear and/or nonlinear |
| Individual/household | Education level | Linear |
| Individual/household | Marital status | Linear |
| Individual/household | Head’s age by education level | Interaction |
| Individual/household | Head’s age by marital status | Interaction |
| Individual/household | Indigenous origin | Linear |
| Individual/household | Social networks | Linear |
| Individual/household | Credit card | Linear |
| Individual/household | Disability | Linear |
| Individual/household | Type of household | Linear |
| Individual/household | Access to food | Linear |
| Individual/household | Access to health services | Linear |
| Individual/household | Dwelling with adequate quality and sufficient space | Linear |
| Individual/household | Educational lag | Linear |
| Individual/household | Access to basic housing services | Linear |
| Individual/household | Access to social security | Linear |
| Individual/household | Weekly housework hours | Linear and/or nonlinear |
| Community | Social marginalization | Linear |
| Community | Emergencies due to weather | Linear and/or nonlinear |
| Community | Gini index | Linear and/or nonlinear |
| Community | Human development index | Linear and/or nonlinear |
| Community | Municipal functional capacities | Linear and/or nonlinear |
| Community | Women-to-men ratio of housework hours | Linear and/or nonlinear |
| Community | Women’s political participation | Linear and/or nonlinear |
| Community | Migration of women | Linear and/or nonlinear |
| Community | Migration of men | Linear and/or nonlinear |
| Community | Women’s household headship | Linear and/or nonlinear |
| Community | Women’s economically active population | Linear and/or nonlinear |
| Community | Men’s economically active population | Linear and/or nonlinear |
| Community | Women working in the primary sector | Linear and/or nonlinear |
| Community | Men working in the primary sector | Linear and/or nonlinear |
| Community | Women working in the secondary sector | Linear and/or nonlinear |
| Community | Men working in the secondary sector | Linear and/or nonlinear |
| Community | Women working in the trade sector | Linear and/or nonlinear |
| Community | Men working in the trade sector | Linear and/or nonlinear |
| Community | Women working in the service sector | Linear and/or nonlinear |
| Community | Men working in the service sector | Linear and/or nonlinear |
| Community | Municipality of residence | Random |
| Community | Centroid coordinates: longitude, latitude | Spatial |
| Regional | Corruption | Linear and/or nonlinear |
| Regional | Satisfaction with public services | Linear and/or nonlinear |
| Regional | Violence against women and girls in the community | Linear and/or nonlinear |
| Regional | Violence against women and girls at school | Linear and/or nonlinear |
| Regional | Violence against women and girls in the workplace | Linear and/or nonlinear |
| Regional | Violence against women and girls by an intimate partner | Linear and/or nonlinear |
| Regional | Violence against women and girls in the family context | Linear and/or nonlinear |
| Regional | State of residence | Random |
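One way to arrive at the total of 75 potential effects is the tally sketched below. The breakdown is read off Table 3.17 and is an assumption about how the total was obtained: one linear effect per categorical covariate, a linear and a nonlinear candidate per continuous covariate, two interactions, two random intercepts (municipality and state), and one spatial surface. The function and constant names are illustrative only.

```python
# Tally of candidate effects in the full model, using counts taken from the text
# and from Table 3.17. The grouping itself is an assumption made for illustration.
N_CATEGORICAL = 14   # categorical covariates: linear effect only
N_CONTINUOUS = 28    # continuous covariates: linear and nonlinear candidates
N_INTERACTIONS = 2   # head's age by education level, head's age by marital status
N_RANDOM = 2         # municipality of residence, state of residence
N_SPATIAL = 1        # surface over centroid longitude and latitude


def count_candidate_effects() -> dict:
    """Return the number of candidate effects per type, plus their total."""
    counts = {
        "linear_categorical": N_CATEGORICAL,
        "linear_continuous": N_CONTINUOUS,
        "nonlinear_continuous": N_CONTINUOUS,
        "interaction": N_INTERACTIONS,
        "random": N_RANDOM,
        "spatial": N_SPATIAL,
    }
    # The total is computed before it is stored, so it is not counted twice.
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    for name, n in count_candidate_effects().items():
        print(f"{name:22s} {n}")
```

Under these assumptions the tally reproduces the 75 effects and the 42 covariates (14 categorical plus 28 continuous) stated in the text.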

Three types of alternative effects are taken into account for the 42 covariates considered in the full model. First, purely parametric (linear) effects are used for the categorical independent variables (Section 2.1.1). Second, for continuous covariates we proceed as described in Section 2.1.2: given that no functional form is imposed a priori on continuous variables, both linear effects and nonlinearities are considered as competing modeling options for each of them. The motivation for this is found in existing research pointing to nonlinear effects on income of covariates such as the head’s age (Deyshappriya & Minuwanthi, 2020). Third, we also introduce interaction effects between the head’s age and the categorical covariates education level and marital status. Previous studies have found that the effects of both education and marital status vary over the lifetime (Torres Munguía & Martínez-Zarzoso, 2020). Moreover, the level of education of the head depends by definition on her/his age. The same applies to the interaction of age and marital status.

The model in Equation 3.3 has a large number of parameters to estimate, leading to a complex, high-dimensional setting for which traditional regression methods cannot find a solution. We therefore apply the three-step methodology described in Section 2.2. Details on the implementation of this methodology in this study of rural poverty are given below under Implementation details. Findings are described in Section 3.3.6.

Implementation details

We apply the three-step methodology to the model expressed in Equation 3.3 as follows. For each of the four models estimated in this research, 5000 initial boosting iterations are performed. After cross-validation to prevent overfitting, the prediction accuracy of the models is optimized at the number of iterations shown in Table 3.18.

Table 3.18 Number of boosting iterations optimizing the models

| Head’s sex | Poverty level | Number of iterations |
|---|---|---|
| Man | Extremely poor | 813 |
| Man | Poor | 2846 |
| Woman | Extremely poor | 364 |
| Woman | Poor | 617 |
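The selection of the stopping iteration by cross-validation can be illustrated with a minimal sketch: given the out-of-sample risk of each fold along the boosting path, the optimal number of iterations is the one that minimizes the risk averaged over folds. The function name, the fold structure, and the toy risk values are assumptions for illustration; they are not the cross-validation scheme or the results of this study.

```python
from statistics import mean


def optimal_iterations(cv_risk):
    """Pick the boosting iteration with the lowest mean out-of-sample risk.

    cv_risk[k][m] is the risk of fold k after m + 1 boosting iterations.
    Returns (iteration count, mean risk at that iteration), 1-based.
    """
    n_iter = len(cv_risk[0])
    if any(len(path) != n_iter for path in cv_risk):
        raise ValueError("all folds must have risk paths of equal length")
    # Average the risk over folds separately at every iteration.
    mean_risk = [mean(path[m] for path in cv_risk) for m in range(n_iter)]
    # min() returns the first minimiser, i.e. the earliest stopping point on ties.
    best = min(range(n_iter), key=mean_risk.__getitem__)
    return best + 1, mean_risk[best]


# Toy risk paths for three folds and five iterations (made-up numbers).
toy_paths = [
    [1.00, 0.80, 0.70, 0.72, 0.75],
    [1.10, 0.85, 0.74, 0.71, 0.78],
    [0.95, 0.82, 0.69, 0.73, 0.80],
]

if __name__ == "__main__":
    m_opt, risk = optimal_iterations(toy_paths)
    print(f"optimal number of iterations: {m_opt} (mean risk {risk:.3f})")
```

In the toy example the mean risk falls until the third iteration and rises afterwards, so continuing beyond that point would only fit noise in the training folds.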

Once the models are fitted at their optimal number of iterations, we execute stability selection to avoid the erroneous selection of non-relevant variables. Specifically for this study on rural poverty, we use 50 complementary pairs for the error bounds and a threshold for the relative selection frequency of 0.8, which corresponds to a significance level of 0.0381.
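The mechanics of stability selection with complementary pairs can be sketched as follows. The data are repeatedly split into two disjoint halves, the selection procedure is run on each half, and an effect is kept only if its relative selection frequency reaches the threshold. The `select` callback stands in for the boosting fit and is hypothetical, as is the number `q` of effects selected per run. The error bound shown is the classical Meinshausen and Bühlmann bound; whether the reported level of 0.0381 was derived from this bound or from a tighter one is not stated here, so the sketch does not attempt to reproduce it.

```python
import random


def complementary_pairs_frequencies(n_obs, n_effects, select, n_pairs=50, seed=1):
    """Relative selection frequency of each effect over 2 * n_pairs half-samples.

    `select(rows)` is a user-supplied (hypothetical) routine that fits the model
    on the given row indices and returns the ids of the selected effects.
    """
    rng = random.Random(seed)
    hits = [0] * n_effects
    half = n_obs // 2
    for _ in range(n_pairs):
        rows = list(range(n_obs))
        rng.shuffle(rows)
        # Two disjoint halves of one shuffle form a complementary pair.
        for subset in (rows[:half], rows[half:2 * half]):
            for j in select(subset):
                hits[j] += 1
    return [h / (2 * n_pairs) for h in hits]


def stable_set(frequencies, cutoff=0.8):
    """Effects whose relative selection frequency reaches the cutoff."""
    return [j for j, f in enumerate(frequencies) if f >= cutoff]


def pfer_upper_bound(q, n_effects, cutoff=0.8):
    """Meinshausen-Buhlmann bound on the expected number of false selections,
    q**2 / ((2 * cutoff - 1) * n_effects), valid for cutoff in (0.5, 1]."""
    if not 0.5 < cutoff <= 1.0:
        raise ValueError("cutoff must lie in (0.5, 1]")
    return q ** 2 / ((2 * cutoff - 1) * n_effects)


def toy_select(rows):
    """Toy selector: effects 0 and 1 always, effect 2 only if row 0 is present."""
    return {0, 1} | ({2} if 0 in rows else set())


if __name__ == "__main__":
    freq = complementary_pairs_frequencies(10, 4, toy_select)
    print("frequencies:", freq)
    print("stable effects:", stable_set(freq))
    print("PFER bound for q = 6, 75 effects:", pfer_upper_bound(6, 75))
```

In the toy run, effect 2 is selected in exactly one half of every pair, so its frequency of 0.5 falls below the 0.8 threshold and it is discarded. Dividing an error bound of this kind by the number of candidate effects gives a per-effect error level.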

Finally, 95% confidence intervals for the subset of effects selected as stable in step 2 are calculated by drawing 1000 random samples from the empirical distribution of the data using a bootstrap approach based on pointwise quantiles.
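A percentile bootstrap of this kind can be sketched for a single scalar estimate as follows. Observations are resampled with replacement, the estimator is recomputed on each resample, and the 2.5% and 97.5% quantiles of the resulting estimates form the interval. For an estimated curve, the same computation would be repeated at each grid point, which is what makes the interval pointwise. The function names, the plain row-wise resampling, and the toy data are assumptions for illustration; the study's own resampling scheme may differ, for example by respecting the cluster structure.

```python
import random


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of pre-sorted values, with p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_ci(data, estimator, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for a scalar estimator.

    Draws n_boot resamples of len(data) observations with replacement and
    returns the lower and upper percentile of the re-estimated values.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    return percentile(estimates, alpha), percentile(estimates, 1 - alpha)


def sample_mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    toy_data = [float(x) for x in range(100)]  # made-up observations
    lower, upper = bootstrap_ci(toy_data, sample_mean)
    print(f"95% bootstrap interval for the mean: [{lower:.2f}, {upper:.2f}]")
```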

All computations are implemented in the R package “mboost” (Hothorn et al., 2020). The corresponding code to replicate these results is freely available from Figshare at https://doi.org/10.6084/m9.figshare.21183271.