
The Method Utilized in the Analysis

Since the goal of this paper is to classify the countries of the world according to specific variables of interest, the statistical task has to include a reduction in dimensionality: the outcome will characterize each observation, i.e. each country, not by its values of the respective variables but by its membership in one particular group. The underlying country data give a value for each variable and country but contain no hint of an existing grouping. Hence, the question is whether “natural groups” of countries with similar characteristics are hidden in the data set.

The statistical methodology that fulfills exactly this task, namely identifying groups (or clusters) within a given data set without prior assumptions about the data and without prespecified groups, is cluster analysis. This methodology is therefore used for the analysis and is presented in detail in Chapter 4.
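To make the idea of grouping without prespecified groups concrete, the following is a minimal sketch on invented data. It uses k-means, which is only one common clustering algorithm and is an assumption made for this illustration; the procedure actually applied in this paper is the one described in Chapter 4. The function name `k_means`, the toy values and the choice of two clusters are all hypothetical.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Group points into k clusters (toy k-means, illustration only).

    Each point is repeatedly assigned to its nearest centroid, and each
    centroid is then moved to the mean of the points assigned to it.
    """
    # Deterministic start: the first k points serve as initial centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of the members (an empty cluster keeps its centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Invented, already standardized values of two variables for six "countries".
toy_data = [(0.1, 0.2), (2.0, 2.1), (0.0, 0.3), (2.2, 1.9), (0.2, 0.1), (1.9, 2.0)]
labels, centroids = k_means(toy_data, k=2)
print(labels)
```

No group labels enter the computation; the two groups emerge only from the distances between the observations, which is the sense in which the groups are "natural".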

After the countries have been clustered, it may be of interest to detect the variables, or combinations of variables, that are mainly responsible for the subdivision of the countries, i.e. the combinations of variables that contributed most to the specific outcome of the cluster analysis. This task is likewise tackled by dimension reduction. One way of achieving this is to detect the principal components of the data set using principal component analysis. Another way is to determine the key factors11 that are most strongly correlated with the data matrix. This is done by so-called factor analysis.
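As a rough illustration of what detecting a principal component means, the sketch below extracts the first principal component of a small invented data set by power iteration on the sample covariance matrix. This is a simplified stand-in and not the factor analysis procedure of Chapter 6; the helper names, the toy data and the use of the unstandardized covariance matrix are assumptions made for the example.

```python
from math import sqrt


def covariance_matrix(rows):
    """Sample covariance matrix of observations given as rows."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
        for i in range(m)
    ]


def first_principal_component(cov, steps=200):
    """Dominant eigenvector and eigenvalue of cov via power iteration."""
    m = len(cov)
    v = [1.0] * m
    for _ in range(steps):
        # Multiply by the covariance matrix, then rescale to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: variance captured along the direction v.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)
    )
    return v, eigenvalue


# Invented data: the first two variables move together, the third barely varies.
toy_rows = [
    (1.0, 2.0, 0.5),
    (2.0, 4.1, 0.4),
    (3.0, 5.9, 0.6),
    (4.0, 8.2, 0.5),
    (5.0, 9.9, 0.5),
]
cov = covariance_matrix(toy_rows)
loadings, eigenvalue = first_principal_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_share = eigenvalue / total_variance
print(loadings, explained_share)
```

In this toy case a single combination of the first two variables accounts for almost all of the variance, which is the kind of key combination the dimension reduction is meant to reveal. In practice the variables would usually be standardized first, so that the result does not depend on their units.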

For the purpose of this paper, I decided to perform a factor analysis after the clustering procedure in order to find the key combinations responsible for the outcome of the cluster analysis and to provide an interpretational framework for the combination of the two analyses. This will follow in Chapter 6.12

3 The Data

The aim of this thesis is to classify the countries of the world according to the categories wealth, urbanization and infrastructure; hence, data are needed for each country of the world. The theoretical background as to which variables should be included in the analysis has been given in Chapter 2; the task that remains is to collect the required data for these variables. For the purpose of this paper I consider the chosen variables at one point in time, i.e. the countries are clustered according to present data. No time-series analysis is carried out. However, further research could evaluate how the resulting clusters evolve over time.

When dealing with data from such different countries, each with its own national statistical institution, one has to pay special attention to the comparability of the procedures and data. It will not be possible to obtain perfectly comparable data sets; hence, the analysis has to rely on the best available approximations.

The raw data underlying the analysis of this paper are given in Table I in Appendix B and cover 204 countries.

The countries included in the analysis are based on those included

11 Factors can be understood as combinations of variables.

12 The factor analysis procedure will be explained in Chapter 6; for more details on principal component analysis, Härdle & Simar (2002) or Johnson & Wichern (1998) are suggested.

in the statistics of the United Nations.13

3.1 GDP per capita - PPP

The values for the variable GDP per capita measured at purchasing power parity (GDP) are taken from the CIA World Factbook (2002).14 They are based on the nominal GDP divided by the population and by the corresponding PPP estimate for the country.

\[ y^P = \frac{\varepsilon P^{*}}{P} \, y^N \tag{4} \]

Here y^P is the measure of GDP per capita that is directly comparable with the GDP per capita of other countries thanks to the PPP adjustment, P is the national price level, ε is the nominal exchange rate to the reference or base country, P^* is the price level in the base country and y^N is the nominal GDP per capita in the country of reference. For PPP to hold, the real exchange rate εP^*/P should be constant over time.15 Most PPP estimates used to calculate the GDP per capita values underlying the analysis of this paper stem from an extrapolation of PPP estimates published by the UNICP. The PPP estimates are generally reliable for OECD economies, whereas the estimates for developing countries are “often a rough approximation”.16 Even though the PPP estimates cannot be considered fully reliable, they are the best available approximations. Since nominal GDP values would not yield satisfactorily comparable measures of well-being across economies, I decided to use these PPP-adjusted values of GDP per capita for the analysis. An

13 The source of the listing is http://unstats.un.org/unsd/demographic/social/population.htm. From that list, East Timor, Gibraltar, Hong Kong, Macao, the Occupied Palestinian Territories and Western Sahara are left out due to missing data points or incompatibility of the data.

Other extraterritorial areas, states or areas with unclear status that are left out of the analysis due to lacking data are Anguilla, Aruba, the British Virgin Islands, the Cayman Islands, Christmas Island, the Cocos Islands, the Falkland Islands, the Faroe Islands, the Gaza Strip, Guernsey, Jersey, the Isle of Man, Mayotte, Montserrat, Niue, Norfolk Island, Pitcairn, Saint Helena, Saint Pierre and Miquelon, Taiwan, Tokelau, the Turks and Caicos Islands, Wallis and Futuna and the West Bank.

14 The CIA World Factbook can be found at http://www.cia.gov/cia/publications/factbook/index.html.

15 This model is further discussed in Obstfeld & Rogoff (1996). A lot of empirical research has been done on whether the PPP hypothesis holds over time. A discussion of this can be found in Herwatz & Reimers (2002).

16 See also the Notes and Definitions of the CIA World Factbook.

equal degree of care is required with respect to how the data were gathered. As mentioned above, each country has its own statistical institutions and thus its own approaches. The GDP data from the CIA World Factbook 2002 are mainly estimates for 2001, with some estimates dated earlier.
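The adjustment in equation (4) can be illustrated with a small worked example. The sketch below is not the procedure used by the CIA World Factbook; it only evaluates the formula for invented numbers. It assumes, which the text does not spell out, that the exchange rate is quoted as national currency units per unit of base currency and that the nominal GDP per capita has already been converted to the base currency at the market exchange rate. The function and parameter names are hypothetical.

```python
def gdp_per_capita_ppp(nominal_gdp_per_capita, exchange_rate,
                       base_price_level, national_price_level):
    """Evaluate equation (4): y^P = (eps * P_star / P) * y^N.

    Assumptions (not stated in the text): exchange_rate is national currency
    per unit of base currency, and nominal_gdp_per_capita is expressed in
    base currency at the market exchange rate.
    """
    real_exchange_rate = exchange_rate * base_price_level / national_price_level
    return real_exchange_rate * nominal_gdp_per_capita


# Invented numbers: a basket costing 1.0 in the base country costs 4.0 in
# national currency, while one unit of base currency buys 10 units of it.
y_ppp = gdp_per_capita_ppp(
    nominal_gdp_per_capita=2000.0,
    exchange_rate=10.0,
    base_price_level=1.0,
    national_price_level=4.0,
)
print(y_ppp)  # 5000.0
```

In this invented case goods are cheaper at home than the market exchange rate suggests, so the real exchange rate is 2.5 and the PPP-adjusted GDP per capita is two and a half times the nominal value. This is why nominal values alone tend to understate well-being in countries with low price levels.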