Table B.1 gives an overview of the availability of the explanatory variables. The construction of these variables is then described in detail.

Table B.1 - Number of observations by country

uz       6    0    0    0    6    0    0    5
yu       6    6    4    0   20   20   18    8
za       6    0    0    0    9   11    9   10
total  270  218   96   45  783  786  636  632

N – Population figures up to 1800 are from McEvedy and Jones (1978). In most cases, the latter explicitly provide figures in 50-year intervals. Occasionally, however, the values need to be inferred from curves. For the year 1550, they can be interpolated. A few exceptions require mentioning:

• McEvedy and Jones (1978) provide a common curve for Belgium and Luxembourg. Hence, the figures were divided up based on today's population of both countries, i.e. a fraction of 95.5% was assigned to Belgium and the remainder to Luxembourg.

• In the case of Syria and Lebanon, 80% of the population in 1500 and 1600 was assigned to Syria.

• UK population was computed as the sum of the figures for England, Wales, and Scotland.

• Russian figures were chosen to reflect only the population on European territory, given that Russian innovation events occurred primarily in this area.

• Population figures for Armenia, Estonia, Latvia, Lithuania, and Uzbekistan are unavailable from McEvedy and Jones (1978). Here, the 1950 figures by Maddison (1995) were used as a starting point to impute the remaining observations such that the population growth rates equal those of Russia.

• Indian figures include the whole of British India, i.e. also Pakistan and Bangladesh.

• Turkish population covers both the Asian and European part.
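The growth-rate imputation used for Armenia, Estonia, Latvia, Lithuania, and Uzbekistan can be illustrated with a minimal sketch. It assumes that "equal growth rates" means a constant ratio between the imputed series and the reference series (here Russia). The function name and all numbers are hypothetical and serve only as an illustration.

```python
def backcast_with_reference_growth(anchor_year, anchor_value, reference):
    """Impute a series whose growth rates equal those of `reference`.

    reference: dict mapping year -> value of the reference series.
    anchor_year, anchor_value: the single known observation of the
    series to be imputed (e.g. the 1950 figure).
    """
    ref_anchor = reference[anchor_year]
    # Identical growth rates imply a constant ratio between both series.
    return {year: anchor_value * value / ref_anchor
            for year, value in sorted(reference.items())}


# Hypothetical figures, not taken from any of the cited sources.
russia = {1900: 100.0, 1950: 150.0}
imputed = backcast_with_reference_growth(1950, 3.0, russia)
print(imputed)
```

The same idea applies to the GDP per capita series that were imputed with the growth rates of a reference aggregate.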

Innovation and Growth on a Macro Level, 1500-1990 149

Starting with 1820, the figures are based on Maddison (1995). If two or fewer consecutive decades were missing, the values were filled in by interpolation.
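The rule of filling only short gaps can be sketched as follows. The text does not state the interpolation method, so linear interpolation between the two neighboring observations is an assumption here, as are the function name and the representation of missing decades as None.

```python
def interpolate_short_gaps(values, max_gap=2):
    """Fill runs of None of length <= max_gap by linear interpolation.

    values: list of decadal observations in chronological order.
    Runs longer than max_gap and runs at either end of the series are
    left missing, i.e. nothing is extrapolated.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i                          # first missing index of the run
        while i < n and filled[i] is None:
            i += 1
        end = i                            # first observed index after the run, or n
        gap = end - start
        if start > 0 and end < n and gap <= max_gap:
            left, right = filled[start - 1], filled[end]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                filled[start + k] = left + step * (k + 1)
    return filled


print(interpolate_short_gaps([1.0, None, None, 4.0]))
print(interpolate_short_gaps([1.0, None, None, None, 5.0]))
```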

Y – Prior to 1820, Maddison (1995) gives GDP per capita only for the years 1500, 1600, and 1700. Thus, the years 1550 and 1650 as well as 1750 and 1800 were interpolated for all countries. No extrapolation was made, however. As of 1820, decadal figures are available for many countries from Maddison (1995). Two or fewer consecutive missing decades were filled in by interpolation. Some exceptions require mentioning:

• In the case of Russia, Mexico, India, Japan, Korea (South), Iran, Iraq, Syria, Turkey, Egypt, and South Africa, the four consecutive decades between 1820 and 1870 were filled in by interpolation.

• GDP per capita for Bulgaria, Hungary, Poland, Romania, and Yugoslavia is available only from 1870 onward, and for Czechoslovakia from 1820 onward. All preceding periods were imputed such that the growth rates equal those of the collective category 'Eastern European Countries' according to Maddison (1995).

• USSR figures by Maddison (1995) were used for Russia.

• Figures for Armenia, Estonia, Latvia, Lithuania, and Uzbekistan are available from 1973 onward. The missing periods were imputed such that the growth rates equal those of the USSR.

Inst – After 1800, this variable is equal to the measure of executive constraints from the Polity IV project. It assigns a score of between 1 and 7, the former indicating unlimited authority of the executive, and the latter representing executive parity. Further details are described in Marshall and Jaggers (2002). Some adjustments were made to the original coding:

• Instances of "standardized authority scores", such as cases of transition, interregnum, or interruption, have been prorated, converted to missing, or converted to a zero score as suggested by Marshall and Jaggers (2002, p. 16).

• For Russia, inst contains the Soviet Union scores from 1930 through 1990.

• In the case of Germany, inst contains Prussian scores prior to 1870 and the scores of West Germany between 1950 and 1990.

• For Yugoslavia, values prior to 1921 were set equal to the scores of Serbia.

• For Korea after 1950, the scores were taken from South Korea.

For all periods up to 1800, as well as periods thereafter that are not covered by Polity IV, inst is equal to the measure used by Acemoglu, Johnson and Robinson (2005, 2002). It is derived from different sources, but follows the same general concept.

geo1-geo5 – Five dummy variables proxy for the distance of a country to the innovative center of Europe. geo1j = 1 for the UK, France, Germany, Belgium, and the Netherlands. geo2j = 1 for all countries which were, for most of the observation period, immediate neighbors to one of the previous five countries, i.e. Austria, Switzerland, Czechoslovakia, Denmark, Spain, Hungary (respectively Austria-Hungary), Ireland, Italy, Poland, and Sweden. geo3j = 1 for those countries that have been an immediate neighbor to one of the geo2 countries for most of the time, namely Latvia, Norway, Portugal, Romania, and Yugoslavia. geo4j = 1 for countries in the periphery of Europe. Those are Bulgaria, Estonia, Finland, Greece, Lithuania, and Russia. Finally, geo5j = 1 for countries that are located outside of Europe, such as Armenia, Argentina, Australia, Brazil, Canada, China, Colombia, Egypt, Israel, India, Iraq, Japan, Korea, Mexico, New Zealand, Panama, Syria, Turkey, Trinidad and Tobago, the US, Uzbekistan, and South Africa.

h – Human capital is defined as the average number of school years divided by the potential maximum of 17 years of schooling. Schooling duration, djt, was taken from Barro and Lee (2001) for the period 1960-1990. Prior to 1960, or if no information was available from Barro and Lee (2001), schooling duration was constructed based on three different measures:

1. Primary and secondary enrollment rates. Those are available from Lindert (2004) for 1830-1930. Countries and periods not covered by Lindert (2004) were supplemented with primary school enrollment rates by Benavot and Riddle (1988). In rare cases, the latter were used instead of Lindert's because the figures appeared more plausible. Detailed information on those exceptions is available from the author upon request.

Further, if only primary enrollment rates were available, secondary enrollment rates were completed as follows:

a. Missing values were interpolated, if only one consecutive decade was vacant.

b. Else, if at least one observation with secondary enrollment was available for a specific country, the average ratio of secondary to primary enrollment for this country was used to impute the missing secondary enrollment rates.

c. If, after steps 1a and 1b, secondary enrollment was still vacant, the missing value was converted to zero. Admittedly, this causes underestimation of years of schooling. But the error is small given that secondary enrollment was regularly less than 10% of primary enrollment. Further, given that unobserved heterogeneity is controlled for in the empirical analysis, no bias is entailed in this approach as long as changes in secondary enrollment are in line with changes in primary enrollment.
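Steps a to c can be sketched as follows. The sketch assumes that the "average ratio" in step b is the simple mean of the observed secondary-to-primary ratios of the country and that single gaps in step a are filled linearly; neither detail is stated in the text. The function name and the use of None for missing values are hypothetical.

```python
def impute_secondary(primary, secondary):
    """Complete a country's secondary enrollment series.

    primary, secondary: equal-length lists of enrollment rates by
    decade, with None marking a missing value.
    """
    n = len(secondary)
    out = list(secondary)
    # Step a: interpolate isolated single-decade gaps.
    for t in range(1, n - 1):
        if (secondary[t] is None and secondary[t - 1] is not None
                and secondary[t + 1] is not None):
            out[t] = (secondary[t - 1] + secondary[t + 1]) / 2
    # Step b: apply the average observed secondary-to-primary ratio.
    ratios = [s / p for s, p in zip(secondary, primary)
              if s is not None and p]
    if ratios:
        avg_ratio = sum(ratios) / len(ratios)
        for t in range(n):
            if out[t] is None and primary[t] is not None:
                out[t] = avg_ratio * primary[t]
    # Step c: whatever is still missing is set to zero.
    return [0.0 if s is None else s for s in out]


# Hypothetical enrollment rates, for illustration only.
result = impute_secondary([0.2, 0.4, 0.6, 0.8], [0.02, None, 0.06, None])
print(result)
```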

Average duration of total schooling, dejt, was computed according to

[...]

[...] secondary schooling last 6 years each in every country of the world. This assumption is simple, but unlikely to cause a bias in a regression analysis if unobserved heterogeneity is controlled for. Further, tertiary education was neglected; prior to 1950, this does not seem to be a major problem. Finally, a lag of three decades was applied to account for the delayed effect of enrollment rates on the average education of the labor force. Of course, a more accurate measure could be obtained by considering birth-cohort-specific enrollment rates and birth cohort sizes. For the purpose of this study, however, this refinement seems dispensable; constructing a more precise measure of average schooling duration would be a project in its own right.

2. Numeracy levels. The numeracy concept exploits the phenomenon of age-heaping, i.e. the tendency of people to round their age when asked for it in censuses or on other occasions. One way to express the extent of heaping is the Whipple Index (see A'Hearn, Baten and Crayen, 2008). It sums up the frequencies, n_i, of all ages ending in 0 or 5, and expresses the result relative to one-fifth of the sample size. It must be defined over an interval that contains each terminal digit an equal number of times, such as 23 to 62:

$$W = 100 \cdot \frac{n_{25} + n_{30} + \dots + n_{60}}{\frac{1}{5} \sum_{i=23}^{62} n_i}$$
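As an illustration, the Whipple Index over the age interval 23 to 62 can be computed with the following minimal sketch. It uses the usual scaling by 100, so that a value of 100 indicates no age-heaping and 500 indicates that all reported ages end in 0 or 5. The function name and the input format (a dict of age counts) are assumptions.

```python
def whipple_index(age_counts, low=23, high=62):
    """Whipple Index for the age interval [low, high].

    age_counts: dict mapping reported age -> number of persons.
    """
    ages = range(low, high + 1)
    total = sum(age_counts.get(age, 0) for age in ages)
    if total == 0:
        raise ValueError("no observations in the age interval")
    # Frequencies of ages ending in 0 or 5.
    heaped = sum(age_counts.get(age, 0) for age in ages if age % 5 == 0)
    return 100.0 * heaped / (total / 5.0)


# Ten persons at every age: no heaping.
uniform = {age: 10 for age in range(23, 63)}
print(whipple_index(uniform))

# Everyone reports an age ending in 0 or 5: full heaping.
print(whipple_index({30: 50, 45: 50}))
```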

Whipple indices are available from Crayen and Baten (2008) for the period 1820-1949 and 165 countries (henceforth referred to as late numeracy levels). The data (including additionally the decades 1800 and 1810) were kindly provided by the authors for the purpose of this study. Another collection of numeracy levels is available from A'Hearn, Baten, and Crayen (2008) for the period 1300-1800 (henceforth called early numeracy levels). It is based on relatively small regional samples and thus less reliable than the late numeracy levels. To estimate a general relationship between numeracy levels and years of schooling, however, some common observations between both were needed.

Morrison and Murtin (2007) provide years of schooling on a world-region level for the years 1870, 1910, and 1950.[116] By computing population-weighted average Whipple Indices, i.e. by aggregating the country-level late numeracy data for the same eight world regions, 20 observations were obtained, on which the estimation of the relationship could be based.[117] Arguably, the relationship must satisfy the functional form

[...]

This is because the theoretical lower bound for Whipple values is 100, which reflects a situation with no age-heaping. Hence, at higher schooling durations, the curve must asymptotically approach a vertical line through W = 100. Likewise, years of schooling cannot be lower than zero; hence, at low levels of schooling the curve is likely to [...]

[...] decades earlier, because the latter refer to birth cohorts. The OLS results imply a = 6.22 and b = 0.46. With these values, B.3 could be used to convert the numeracy levels [...]

[...] by Morrison and Murtin (2007). It must arguably satisfy the functional form

[...]

[...] b should be zero, too. The respective regression equation is

[...]

[...] those in B.5 gives the rule according to which the literacy rates were converted to years of schooling. Figure B.1 depicts scatter charts of schooling duration versus numeracy levels and literacy rates, respectively, as well as predicted years of schooling. The latter follow from B.3 and B.5 and the calculated parameters a and b.

116 [...] downloaded for this study in April 2008 from http://www.stanford.edu/-~murtin/EducationInequality.pdf. The only alternative to using the data by Morrison and Murtin (2007) would have been to estimate the relationship with the help of the enrollment-based measure of years of schooling.

117 No aggregate numeracy level could be computed for South-East Asia in 1870, Latin America in 1870 and 1910, as well as Africa in 1910. Also, two observations had to be excluded from the regression. The explanatory variable as given by C.24 was undefined in those cases because of a Whipple value below 100.

118 For the original sources of the data, see A'Hearn, Baten and Crayen (2008).

119 No aggregate literacy levels were available for 1950 as well as for Africa in 1910, Latin America in 1870 and 1910, and South-East Asia in 1870.

Figure B.1. The relationship between numeracy/literacy and years of schooling

Now, to obtain series of schooling duration without breaks that are consistent over time, the heterogeneous measures had to be brought in line. Therefore, if possible, adjustment factors were derived based on observations common to at least two series. Those were applied to the series before appending them. Further, depending on the reliability of the base-measure and the quality of the estimate, the different series had to be ranked according to their priority.

The following rules describe in detail the procedure applied to construct years-of-schooling time series, dt.

1. Set dt equal to the Barro and Lee (2001) measure of schooling attainment.

2. If, after step 1, a common observation of dt and enrollment-based years of schooling exists, apply the factor which is implied by the ratio of both measures and replace the missing values in dt by the adjusted enrollment-based schooling duration series. If more than one common observation exists, choose the year 1960 to derive the adjustment factor.

3. If, after step 2, a common observation of dt and late numeracy-based years of schooling exists, apply the factor, which is implied by the ratio of both measures and replace the missing values in dt by the adjusted late numeracy-based schooling duration series. If more than one common observation exists, choose the earliest possible data point common to both series.

4. If, after step 3, a common observation of dt and literacy-based years of schooling exists, apply the factor which is implied by the ratio of both measures and replace the missing values in dt by the adjusted literacy-based schooling duration series. If more than one common observation exists, choose the one with the lowest estimated value for literacy-based years of schooling, because the estimates are more accurate at lower levels of literacy.

5. If, after step 4, a common observation of dt and early numeracy-based years of schooling exists, apply the factor which is implied by the ratio of both measures and replace the missing values in dt by the adjusted early numeracy-based schooling duration series. If more than one common observation exists, choose the one with the lowest estimated value for early-numeracy-based years of schooling.

6. If, after step 5, no common observation of dt and early numeracy-based years of schooling exists, apply the factor from step 3 and replace the missing values in dt by the adjusted early numeracy-based schooling duration series. Further, interpolate or extrapolate the adjusted series based on literacy-based schooling duration, if the latter is available and early numeracy-based schooling duration is not.

7. Replace missing values in dt by the unadjusted late numeracy-based schooling duration series interpolated with the help of literacy-based schooling duration.

8. Replace missing values in dt by the unadjusted late numeracy-based schooling duration series interpolated with the help of early numeracy-based schooling duration.

9. Replace missing values in dt by the unadjusted early numeracy-based schooling duration series interpolated with the help of literacy-based schooling duration.
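The splicing logic that underlies steps 2 to 5 can be sketched as follows. The sketch assumes that the adjustment factor is the ratio of the higher-priority series to the lower-priority series at the common observation, so that both coincide there after rescaling. The function name, the dict-based representation, and all numbers are hypothetical.

```python
def splice(target, donor, link_year):
    """Fill gaps in `target` with `donor`, rescaled to match at `link_year`.

    target, donor: dicts mapping year -> years of schooling; missing
    years are simply absent. `link_year` must be present in both.
    """
    factor = target[link_year] / donor[link_year]   # adjustment factor
    merged = dict(target)
    for year, value in donor.items():
        if year not in merged:                      # only replace missing values
            merged[year] = factor * value
    return merged


# Hypothetical series, for illustration only.
barro_lee = {1960: 6.0, 1970: 7.0}
enrollment_based = {1940: 3.0, 1950: 4.0, 1960: 5.0}
d = splice(barro_lee, enrollment_based, link_year=1960)
print(sorted(d.items()))
```

Applying the function repeatedly, each time with the next series in the priority ranking, reproduces the order of steps 2 to 5. The unadjusted replacements in steps 7 to 9 correspond to a factor of one.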

Next, a graphical inspection of the resulting time series suggested a few corrections to achieve overall plausible curves:

• For China in 1950, 1960, and 1970, the values were replaced by the respective figures from Morrison and Murtin (2007).

• In the case of Czechoslovakia, the converted years of schooling measures could not be matched with the series by Barro and Lee (2001) on the basis of a common observation. Hence, it was aligned with the figure by Morrison and Murtin (2007) for Eastern European countries in 1870.

• In the case of Romania, the literacy-based measure was given priority over the late numeracy-based measure.

• For Sweden, the literacy-based measure has been appended without applying the respective adjustment factor, because this would lead to implausibly high values for the literacy-based schooling estimates.

Finally, the final series were interpolated wherever one or two consecutive decades were missing.