
3. Who Got What, Then and Now? A Fifty Years Overview from the Global Consumption and Income Project

3.2. Data

We use data from the Global Consumption and Income Project (GCIP), constructed according to the methods presented in Lahoti, Jayadev and Reddy (2016) and the associated online materials (available on gcip.info), which together detail the procedures used, ongoing revisions to the methods employed, and how the data differ from other available sources. We construct annual estimates from 1960 to 2015 for each percentile of the population in 161 countries, covering 97 percent of the global population in 2015. The GCIP is a complete ‘time-space system’ that produces estimates for every country-year; this is essential for us to be able to use it flexibly to construct estimates for country aggregates.
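Because the GCIP supplies a per-capita mean for every percentile of every country-year, estimates for a group of countries can be built by pooling those percentiles. The sketch below is a minimal illustration under stated assumptions, not the GCIP's own code: each percentile is taken to represent one hundredth of its country's population, everyone in a percentile is assigned the percentile mean, and all means are assumed to be in a common PPP unit. The `CountryYear` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CountryYear:
    """Hypothetical record: one country-year with 100 percentile means."""
    country: str
    population: float        # total population of the country-year
    percentile_means: list   # 100 per-capita means, poorest to richest


def pool_percentiles(rows):
    """Return (mean, weight) pairs for every country percentile, sorted by mean.

    Each percentile counts as 1% of its country's population, with everyone in
    it at the percentile mean (within-percentile inequality is ignored).
    """
    pooled = []
    for row in rows:
        if len(row.percentile_means) != 100:
            raise ValueError(f"{row.country}: expected 100 percentile means")
        weight = row.population / 100.0
        pooled.extend((m, weight) for m in row.percentile_means)
    pooled.sort(key=lambda pair: pair[0])
    return pooled


def aggregate_mean(pooled):
    """Population-weighted mean of the pooled distribution."""
    total_w = sum(w for _, w in pooled)
    return sum(m * w for m, w in pooled) / total_w


def bottom_share(pooled, fraction):
    """Share of the aggregate total held by the poorest `fraction` of people."""
    total_w = sum(w for _, w in pooled)
    total_y = sum(m * w for m, w in pooled)
    cutoff = fraction * total_w
    acc_w = acc_y = 0.0
    for m, w in pooled:
        take = min(w, cutoff - acc_w)  # a percentile may be split at the cutoff
        if take <= 0:
            break
        acc_w += take
        acc_y += m * take
    return acc_y / total_y
```

For example, pooling two hypothetical countries, one with 100 people all at 1.0 and one with 300 people all at 3.0, gives an aggregate mean of 2.5, and the poorest quarter of the pooled population holds 10 percent of the total.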

The GCIP was also, distinctively, built with the goals of transparency, replicability and flexibility foremost. We aim to document fully the assumptions and choices made in generating the database. The database is constructed so that users can adopt alternative assumptions and thus test the sensitivity of the results to the choices made; we demonstrate some examples of this in the paper.

Construction of the GCIP datasets involves several decisions on the selection of data and methods; some of the most important ones are discussed here briefly. First, we restrict ourselves to surveys that provide household per-capita data, as data constructed with equivalence scales use widely varying and incomparable methods and constitute a smaller proportion of the available data4. Second, for country-years with no consumption or income survey, we interpolate or extrapolate the consumption or income profile using data from the closest survey years and appropriate growth rates5 from the national accounts. This method is similar to the one used in PovcalNet and is described in detail in Lahoti, Jayadev and Reddy (2016)6. We are aware that these extensions may raise valid concerns; hence we are careful in our assumptions and provide an option for users to rely only on actual survey data. After the mid-1980s surveys are dense, so our interpolation/extrapolation is unlikely to affect results significantly. A larger share of the data before 1980 is interpolated or extrapolated because survey information is sparse; these estimates must therefore be treated with caution, but they are still indicative of the trends during this period.

4 The majority of developing countries report grouped survey data in per-capita terms, and hence we cannot apply any other equivalence scale to such grouped data. In the case of developed countries, where data might be reported using an equivalence scale (for example, surveys from LIS, SILC and ECHP), we have instead used the underlying unit data to estimate per-capita measures. Findings based on an equivalence scale might differ from those based on per-capita data, as the size and composition of households can differ systematically across regions. For example, Sub-Saharan African households tend to be larger and to have more children than those in Latin America or other regions. But there is no consensus on the appropriate equivalence scale to use or on how it should vary by country or region. Determining an appropriate equivalence scale is outside the scope of this paper, although, as we note in Jayadev, Lahoti and Reddy (2016), the choice of scale is highly consequential and the arguments for choosing any particular one are weak. To maintain comparability we use per-capita measures, as is also done in Lakner and Milanovic (2015) and in the World Bank’s PovcalNet database.

5 Where available, we use the growth rate of household final consumption expenditure per capita for interpolation/extrapolation. Unlike Ferreira et al. (2016), we do not apply an adjustment factor to the growth rate, but this should have minimal impact on our interpolations, as the means are bounded on both sides.

Table 3.1: Summary Statistics for Surveys in the Global Consumption Database (GCD), by decade (1960-69, 1970-79, 1980-89, 1990-99, 2000-09, 2010-13) and in total
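The second construction step, filling country-years that lack a survey from the nearest surveys and national-accounts growth rates, can be sketched as follows. This is a hypothetical, distribution-neutral illustration: every percentile is scaled by the growth of household final consumption expenditure per capita, and between two surveys the two projections are averaged with weights proportional to proximity in time. That weighting rule is an assumption modeled on PovcalNet-style line-ups; the exact GCIP procedure is documented in Lahoti, Jayadev and Reddy (2016). The names `scale_profile` and `line_up` are illustrative.

```python
def scale_profile(profile, base_year, target_year, na_per_capita):
    """Distribution-neutral shift: every percentile mean grows at the rate of the
    national-accounts per-capita aggregate between base_year and target_year."""
    factor = na_per_capita[target_year] / na_per_capita[base_year]
    return [p * factor for p in profile]


def line_up(target_year, surveys, na_per_capita):
    """Estimate a percentile profile for target_year (hypothetical rule).

    surveys: {survey_year: list of percentile means}
    na_per_capita: {year: household final consumption expenditure per capita}
    """
    if not surveys:
        raise ValueError("need at least one survey")
    if target_year in surveys:
        return list(surveys[target_year])
    before = [y for y in surveys if y < target_year]
    after = [y for y in surveys if y > target_year]
    if before and after:
        # Interpolation: project both neighboring surveys to the target year,
        # then weight each projection by its proximity (assumed rule).
        t0, t1 = max(before), min(after)
        from_t0 = scale_profile(surveys[t0], t0, target_year, na_per_capita)
        from_t1 = scale_profile(surveys[t1], t1, target_year, na_per_capita)
        w0 = (t1 - target_year) / (t1 - t0)
        return [w0 * a + (1 - w0) * b for a, b in zip(from_t0, from_t1)]
    # Extrapolation: carry the nearest survey forward or backward.
    nearest = max(before) if before else min(after)
    return scale_profile(surveys[nearest], nearest, target_year, na_per_capita)
```

Because an interpolated profile is a weighted average of two projected profiles, each interpolated percentile mean lies between the values projected from the earlier and the later survey.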

Third, the GCIP uses a regression-based ‘standardization’ method to predict the consumption shares of each quintile of the population for the Global Consumption Database (GCD) in country-years that have an income survey but no consumption survey, and conversely the income shares for the Global Income Database (GID) in country-years that have a consumption survey but no income survey. The distribution of income is known to be more concentrated than that of consumption, but almost none of the existing databases correct for this (Ferreira et al., 2016; Lakner & Milanovic, 2015). We use information from 204 country-years across 71 countries in which both a consumption and an income survey are reported by the same statistical agency to derive a regression relationship between consumption and income for each quintile, using a Seemingly Unrelated Regressions (SUR) approach. We use various controls, methods and specifications to test the robustness of the relationship between income and consumption for each quintile. The countries for which we have such information are spread across geographical and income groupings and over time. We find that the regressions are fairly accurate for in-sample prediction, and we therefore apply the same relationship to make out-of-sample predictions for other country-years. After doing so, we find that the difference between the income and consumption Gini coefficients (obtained by comparing the two standardized datasets for the same country-year) is between 7 and 10 Gini points, which is comparable to estimates in the literature based on simple comparisons of average differences7. A detailed description of the standardization method is provided in Lahoti, Jayadev and Reddy (2016)8.
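The SUR step can be illustrated with a two-step feasible GLS estimator: estimate each quintile equation by OLS, use the residuals to estimate the cross-equation error covariance, then re-estimate all equations jointly by GLS. This is a generic NumPy sketch, assuming a hypothetical specification in which the consumption share of quintile q is regressed on a constant and the income share of the same quintile; the controls and specifications actually used for the GCIP are not reproduced here. If every equation had identical regressors, SUR would reduce to equation-by-equation OLS, so the gains come from differing regressors combined with correlated errors.

```python
import numpy as np


def sur_fgls(X_list, y_list):
    """Two-step feasible GLS estimator for a Seemingly Unrelated Regressions system.

    X_list[k] is the (n, p_k) regressor matrix of equation k and y_list[k] its
    (n,) outcome; all equations share the same n observations (country-years).
    Returns a list with one coefficient vector per equation.
    """
    n, m = y_list[0].shape[0], len(y_list)
    # Step 1: equation-by-equation OLS to obtain residuals.
    resid = np.empty((n, m))
    for k, (X, y) in enumerate(zip(X_list, y_list)):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid[:, k] = y - X @ beta
    # Inverse of the estimated cross-equation residual covariance.
    sigma_inv = np.linalg.inv(resid.T @ resid / n)
    # Step 2: GLS on the stacked system with Omega = Sigma kron I_n, built block by block.
    offsets = np.concatenate([[0], np.cumsum([X.shape[1] for X in X_list])])
    A = np.zeros((offsets[-1], offsets[-1]))
    b = np.zeros(offsets[-1])
    for i in range(m):
        rows = slice(offsets[i], offsets[i + 1])
        for j in range(m):
            cols = slice(offsets[j], offsets[j + 1])
            A[rows, cols] = sigma_inv[i, j] * (X_list[i].T @ X_list[j])
            b[rows] += sigma_inv[i, j] * (X_list[i].T @ y_list[j])
    coef = np.linalg.solve(A, b)
    return [coef[offsets[k]:offsets[k + 1]] for k in range(m)]


def predict_shares(coefs, X_list):
    """Out-of-sample prediction: one predicted share vector per quintile equation."""
    return [X @ c for X, c in zip(X_list, coefs)]
```

Fitting on the country-years that have both survey types and then calling `predict_shares` with the regressors of country-years that have only an income survey yields out-of-sample quintile shares. This sketch does not force the predicted shares to sum to 100; a real implementation would need to decide how to handle that.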

6 Almost all databases, even those using benchmark years, do some degree of interpolation/extrapolation to line up surveys for a particular year in order to make comparisons across countries.

7 In the full PovcalNet data, the average Gini index of consumption surveys is approximately 10 Gini points lower than the average Gini over the income surveys (Lakner and Milanovic, 2015). Li, Squire, and Zou (1998) suggest an adjustment of 6.6 Gini points based on comparison of average values in the Deininger and Squire (1996) database.

8 We do not standardize the various income concepts (gross, disposable, monetary), as we do not have enough data to ascertain the relationships between gross, disposable and monetary shares for different quintiles, as we could for consumption and income. When we have a choice between surveys with different income concepts for a country-year, we prefer disposable income over other income concepts (see Lahoti, Jayadev and Reddy (2016) for more details on our preference ordering).
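The 7 to 10 point gap between income and consumption Gini coefficients compares Gini coefficients computed from two distributions for the same country-year. As a minimal sketch, a Gini coefficient can be computed from quintile (or any equal-sized group) shares with the trapezoidal approximation to the Lorenz curve, G = 1 - (1/n) * sum over k of (L_k + L_{k-1}), where L_k is the cumulative share of the poorest k groups. Because it treats everyone within a group as equal, this grouped formula understates the true Gini; it is an illustration, not necessarily the computation behind the GCIP figures, and the function name is hypothetical.

```python
def gini_from_shares(shares):
    """Gini coefficient (0-100 scale) from the shares of equal-sized population groups.

    shares: group totals or percentages ordered from poorest to richest group
    (e.g. five quintile shares). Uses the trapezoidal area under the Lorenz curve,
    which assumes equality within each group and so gives a lower bound.
    """
    if any(s < 0 for s in shares) or any(a > b for a, b in zip(shares, shares[1:])):
        raise ValueError("shares must be non-negative and ordered poorest to richest")
    total = float(sum(shares))
    n = len(shares)
    area, lorenz_prev, cumulative = 0.0, 0.0, 0.0
    for s in shares:
        cumulative += s / total                        # Lorenz ordinate L_k
        area += (lorenz_prev + cumulative) / (2 * n)   # trapezoid of width 1/n
        lorenz_prev = cumulative
    return 100.0 * (1.0 - 2.0 * area)
```

For hypothetical quintile shares of 6, 10, 15, 22 and 47 percent, the function returns a Gini of 37.6; subtracting the consumption-based value from the income-based value for the same country-year gives the gap in Gini points.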


Figure 3.1: Survey Means and GDP Per Capita for Survey Years from 1960 to 2013

Finally, we use estimates of consumption or income levels from surveys wherever they are available. This is a consequential choice, since survey means often differ from (and are typically lower than) national accounts means. Figure 3.1 shows the relation between GDP per capita and survey means for country-years in our dataset. Generally, survey means are much lower than the household consumption component of GDP per capita, and thus than GDP per capita itself9. Investigation of the data suggests that this phenomenon holds across decades and world regions. For this reason among others, the estimates of the absolute level of income that we arrive at, as well as of its distribution, must be taken with the proverbial grain of salt. We also standardize the means: for the Global Consumption Database (GCD), we convert means from income surveys (whose distributions were standardized in the previous step) into consumption means, using the share of consumption in the national accounts as a multiplicative conversion factor; for the Global Income Database (GID), we use the reciprocal of this share to convert consumption means into income means.
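The mean standardization is a single multiplicative step in each direction. The sketch below assumes, hypothetically, that the "share of consumption in the national accounts" is household final consumption expenditure divided by GDP; the denominator is not spelled out here. It also shows how a standardized set of group shares combines with a converted mean to give per-capita group means: the conversion rescales the level of the distribution without changing its shares. All function names are illustrative.

```python
def consumption_share(hfce, gdp):
    """Hypothetical definition: household final consumption expenditure / GDP."""
    return hfce / gdp


def income_to_consumption_mean(income_mean, share):
    """GCD: convert an income-survey mean to a consumption mean."""
    return income_mean * share


def consumption_to_income_mean(consumption_mean, share):
    """GID: convert a consumption-survey mean to an income mean (reciprocal factor)."""
    return consumption_mean / share


def group_means(shares, overall_mean):
    """Per-capita mean of each equal-sized group (quintile, percentile, ...)
    from its share of the total (fractions summing to 1) and the overall mean."""
    n = len(shares)
    return [s * n * overall_mean for s in shares]
```

With a share of 0.6, an income-survey mean of 500 becomes a consumption mean of 300, and dividing by the same share maps it back to 500.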

9 The growth rates of survey means and of GDP per capita also differ substantially (although it is more difficult to identify a clear pattern in the growth of survey means relative to GDP per capita, as shown in our discussion of fast-growing countries later).


We present the main results using 2005 PPPs rather than the more recent 2011 PPPs, as there is still debate in the literature on which are more appropriate (Deaton and Aten, 2017; Ravallion, 2014), and we ourselves take an agnostic view with respect to PPP base years, especially where comparisons over long periods of time as well as across space are involved. However, we have reproduced all of our primary results using both PPP base years and report the differences where they are noteworthy.
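Switching PPP base years changes how local-currency values are converted into international dollars. A common approach, assumed here rather than taken from the GCIP documentation, first deflates a local-currency value to base-year prices with the consumer price index and then divides by the base-year PPP conversion factor (local currency units per international dollar). The sketch computes the same value under several bases; the function names are hypothetical and all price indices and PPP factors in the usage below are made-up numbers.

```python
def to_constant_ppp(value_lcu, cpi_year, cpi_base, ppp_base):
    """Express a local-currency value observed in some year in constant
    international dollars of the base year: deflate with the CPI, then divide
    by the base-year PPP factor (LCU per international dollar)."""
    return value_lcu / (cpi_year / cpi_base) / ppp_base


def compare_bases(value_lcu, cpi_year, bases):
    """bases: {label: (cpi_base, ppp_base)}; returns {label: value in that base's PPP dollars}."""
    return {
        label: to_constant_ppp(value_lcu, cpi_year, cpi_base, ppp_base)
        for label, (cpi_base, ppp_base) in bases.items()
    }
```

For a hypothetical 1,100 local-currency units in 2010 with a CPI of 110, a 2005 base (CPI 100, PPP factor 5) gives 200 international dollars, while a 2011 base (CPI 115.5, PPP factor 6) gives 192.5.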

The Global Consumption Database (GCD) has wide geographical and temporal coverage, with data from various secondary and primary sources, as seen in Table 3.1. The GCIP presently contains survey data for 1,998 country-years spanning the period 1960-2013 for 161 countries.

Most of the surveys are nationally representative (96% cover the complete geographical area and 93% cover the entire population). Our data are drawn from various secondary sources, such as the World Bank’s PovcalNet database, UNU-WIDER’s World Income Inequality Database (WIID), the Socio-Economic Database for Latin America and the Caribbean (SEDLAC), the European Community Household Panel (ECHP), the European Union Statistics on Income and Living Conditions (EU-SILC), the World Income Distribution database (WYD) and the LIS database, as well as from data obtained directly from national statistical offices.

The density of surveys varies drastically across the decades. The 1960s and 1970s have the lowest density, with only 79 and 82 country-year observations respectively, from 44 countries. This is largely because of the paucity of household surveys during this period, especially in developing countries. Our choice of using only per-capita surveys also restricts the number of country-years, as this period has several surveys in which only total income is reported at the household level, with no adjustment for household size. The formerly communist countries also have sparse data on income or consumption distribution prior to 1990. Given these data limitations, we advise caution in interpreting results that encompass the earlier period.

The GCIP datasets used in this paper reflect further improvements to the methods of construction described in Lahoti, Jayadev and Reddy (2016). In particular, we have made systematic efforts to reduce the volatility of the data arising from surveys being selected from different data sources, by using specific methods to select surveys that belong to long, consistent survey series. We have also extended the datasets by adding new survey sources (European surveys from SILC and ECHP) and incorporating surveys from years after 2010.
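The "specific methods" used to keep surveys within consistent long series are not detailed here, so the following is only a hypothetical illustration of one such rule: for each country, treat the source with the most survey years as the main series, fill the remaining years from other sources, and within a year rank income concepts so that disposable income is preferred, in line with the stated preference for disposable income. The record layout, `CONCEPT_RANK` and `select_series` are assumptions, not the GCIP's implementation.

```python
from collections import defaultdict

# Hypothetical ranking of income concepts, reflecting the stated preference for
# disposable income; the order of the remaining concepts is an assumption.
CONCEPT_RANK = {"disposable": 0, "gross": 1, "monetary": 2}


def select_series(records):
    """records: dicts with keys 'country', 'year', 'source', 'concept'.

    For each country, the source with the most survey years is treated as the
    main series; other sources only fill years the main series lacks. Within a
    year, income concepts are ranked by CONCEPT_RANK.
    Returns {(country, year): chosen record}.
    """
    by_country = defaultdict(list)
    for r in records:
        by_country[r["country"]].append(r)

    chosen = {}
    for country, recs in by_country.items():
        years_by_source = defaultdict(set)
        for r in recs:
            years_by_source[r["source"]].add(r["year"])
        # Longest series first; ties broken alphabetically for determinism.
        main = min(years_by_source, key=lambda s: (-len(years_by_source[s]), s))

        def priority(r):
            # Main-series records sort first, then by income-concept rank.
            return (r["source"] != main, CONCEPT_RANK.get(r["concept"], len(CONCEPT_RANK)))

        for r in sorted(recs, key=priority):
            chosen.setdefault((country, r["year"]), r)  # first (best) record per year wins
    return chosen
```

Anchoring each country on a single main source trades some coverage for consistency: a year is taken from another source only when the main series has no survey for it.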
