
2. Land inequality and numeracy in Spain during the 17th and 18th centuries

2.3 Methodology and data

The regions considered in this research are illustrated in Figure 2.1. Table 2.1 specifies the number of observations by province and period. Table 2.6 in the appendix contains a description of the sources18.

To measure land equality, we use the ratio between the number of farmers and the overall agricultural population, an indicator suggested by Clark and Gray (2014). Our definition of farmers depends on the contemporaneous naming of occupations. “Farmers” (labradores) were not only those who owned land, but also those who rented land and ran a farm of substantial area. Hence, a day labourer (jornalero), who usually did not own or control land, would not have been identified as a farmer by contemporary census takers (Tollnek and Baten 2017). Although they are quantitatively almost irrelevant, we also include hortelanos in the same category as farmers, since they usually also had some control over plots of land that were intensively farmed, and they could provide better nutrition to their children in crisis situations19. Although hortelanos were, strictly speaking, not farmers, we included them for simplicity in the variable “farmers’ share” (justified by their small number). In order to assess the plausibility of the farmers’ shares based on our

18 Within these sources, we analysed a convenient sample and we took care not to select only special groups.

19 The difference between “labrador” and “hortelano” lies in the type of land they own: for the former it was rain-fed, for the latter it was irrigated (Bermúdez Méndez and Martín Chicano 2007).

sample, we can calculate a similar farmers’ share for the Floridablanca census (even though that census was recorded somewhat later, in 1785-87). The correlation is very strong (Figure 2.2, aggregated at the province level). A large share of both our sample-based farmers’ shares and the Floridablanca-based farmers’ shares are in the 20 to 40 percent range. Our sample is slightly more urban (hence a lower farmers’ share for Sevilla, for example) and more Andalusian. This difference is mostly compensated for by our weighting procedure.
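The farmers’ share described above can be illustrated with a minimal sketch. The occupation labels follow the text, but the set of occupations treated as “agricultural” here (labradores, hortelanos and jornaleros only) and all counts are hypothetical assumptions for illustration, not figures from the sources.

```python
from collections import Counter

# Occupations counted as "farmers" in the text: labradores plus the few
# hortelanos. Treating only these and jornaleros as the agricultural
# population is an assumption made for this sketch.
FARMER_LABELS = {"labrador", "hortelano"}
AGRICULTURAL_LABELS = FARMER_LABELS | {"jornalero"}


def farmers_share(occupations):
    """Return farmers as a percentage of the agricultural population."""
    counts = Counter(o for o in occupations if o in AGRICULTURAL_LABELS)
    agricultural = sum(counts.values())
    if agricultural == 0:
        raise ValueError("no agricultural occupations in sample")
    farmers = sum(counts[label] for label in FARMER_LABELS)
    return 100.0 * farmers / agricultural


# Hypothetical local community: 28 labradores, 2 hortelanos, 70 jornaleros,
# and 15 non-agricultural workers, who are ignored in the ratio.
sample = (["labrador"] * 28 + ["hortelano"] * 2
          + ["jornalero"] * 70 + ["tejedor"] * 15)
share = farmers_share(sample)
print(round(share, 1))
```

In this made-up community, 30 of 100 agricultural workers count as farmers, so the share falls inside the 20 to 40 percent range mentioned in the text.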

In order to assess numeracy, we employ the “age heaping” methodology using the ABCC index20. This method considers the share of individuals who are able to state their precise age in years, in contrast to those who report an age rounded to a multiple of five. For instance, an individual who was in fact 44 but did not know it exactly might state “I am 45”. Numeracy and literacy are robustly correlated, though basic mathematical skills diffused earlier than literacy. In addition, the potential biases caused by counting cultures and the institutional settings of censuses have been thoroughly discussed throughout the numeracy literature, but the results did not invalidate the age heaping method (Tollnek and Baten 2017). Accordingly, we can argue that, just as signature rates in official documents, despite their limitations, can serve as a proxy for basic literacy (Reis 2005; Rodríguez and Bennassar 1978), age heaping can serve as a proxy for basic numeracy.

The ABCC index is a simple linear transformation of the Whipple index (1), derived by A'Hearn et al. (2009). The ABCC index (2) allows for an easier interpretation and yields an estimate of the share of individuals who state their age precisely:

20 The term “ABCC” results from the initials of the authors’ last names plus that of Gregory Clark, who commented on their paper.

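(1) $Wh = \left( \dfrac{Age_{25} + Age_{30} + Age_{35} + \dots + Age_{60}}{\frac{1}{5} \times \left( Age_{23} + Age_{24} + Age_{25} + \dots + Age_{62} \right)} \right) \times 100$

(2) $ABCC = \left( 1 - \dfrac{Wh - 100}{400} \right) \times 100 \quad \text{if } Wh \geq 100; \quad \text{else } ABCC = 100$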

This index ranges from 0 to 100, where 100 indicates no heaping on multiples of five, meaning that the entire society has basic numeracy skills. The age groups we use are in increments of ten years: 23 to 32, 33 to 42, etc. We omitted the age range 63 to 72, as this group offers relatively few observations, especially for the seventeenth and eighteenth centuries, when mortality was relatively high (Schofield and Reher 1994). Crayen and Baten (2010) analysed age effects carefully and found that they do not have a strong influence once the birth cohort effect is controlled for: older individuals may round more strongly, but mostly because they were born earlier. The only exception is the youngest group, age 23-32, which needs an adjustment of 25% that we calculated in our sample (Crayen and Baten 2010)21.
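As an illustration of equations (1) and (2), the following minimal sketch computes the Whipple and ABCC indices from a list of reported ages. The function names and the example data are hypothetical, and the 25% adjustment for the youngest age group is deliberately not implemented, since the text does not spell out its exact form.

```python
def whipple_index(ages):
    """Whipple index for reported ages 23 to 62, following equation (1)."""
    in_range = [a for a in ages if 23 <= a <= 62]
    if not in_range:
        raise ValueError("no ages between 23 and 62")
    multiples_of_five = sum(1 for a in in_range if a % 5 == 0)
    # One fifth of all ages in 23..62 end in 0 or 5 if nobody rounds.
    return multiples_of_five / (len(in_range) / 5) * 100


def abcc_index(ages):
    """ABCC index following equation (2), capped at 100 when Wh < 100."""
    wh = whipple_index(ages)
    if wh < 100:
        return 100.0
    return (1 - (wh - 100) / 400) * 100


# Hypothetical samples: one person at every age, everyone on a multiple
# of five, and an equal mix of the two.
no_heaping = list(range(23, 63))
full_heaping = [25, 30, 35, 40, 45, 50, 55, 60] * 5
mixed = no_heaping + full_heaping

for label, ages in [("none", no_heaping), ("full", full_heaping), ("mixed", mixed)]:
    print(label, whipple_index(ages), abcc_index(ages))
```

The three cases span the range of the index: no heaping gives Wh = 100 and ABCC = 100, complete heaping gives Wh = 500 and ABCC = 0, and the equal mix gives Wh = 300 and ABCC = 50.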

While the ABCC index refers to averages of groups (by region and birth decade, for example), it is also possible to analyse the likelihood of individuals to report a rounded

21 Moreover, a potential bias could result from counter-checking by the officials who collected the local censuses. We looked at each source by itself to assess whether numeracy was close to 100 percent in local communities and times in which this could not be expected. This phenomenon of counter-checking occurred in some Russian and Korean sources, for example, as described by Baten, Szołtysek and Campestrini (2017) as well as Baten and Sohn (2017). They therefore decided to discard a part of their sources. In Spain, government officials were not counter-checking sources to the same extent, as we do not observe this phenomenon of numeracy being very close to 100 percent.

age. This can be done by assigning the binary variable “numerate”, which is coded as 1 for those who report an unrounded age and 0 otherwise (Juif and Baten 2013; Tollnek and Baten 2017). The binary variable can be analysed with Logit or Probit regression models or by using a linear probability model (LPM) with heteroskedasticity-robust standard errors. For an LPM coefficient to be interpreted in ABCC values, it needs to be multiplied by 125: by 100 to move from a fraction between 0 and 1 to a percentage, and by a further factor of 1.25 to account for the fact that 20% of the population actually do have ages ending in 0 or 5.
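The factor of 125 follows directly from equation (2): for Wh ≥ 100, the ABCC equals 125 times the share of individuals reporting an age that is not a multiple of five. The sketch below checks this numerically on hypothetical data. It uses the fact that an LPM with an intercept and a single 0/1 regressor has a slope equal to the difference in group means, so no regression library is needed; the robust standard errors mentioned in the text are not computed here, and all names and ages are illustrative assumptions.

```python
def numerate(age):
    """1 if the reported age is not a multiple of five, else 0."""
    return 0 if age % 5 == 0 else 1


def abcc_from_ages(ages):
    """ABCC via the Whipple index; assumes ages in 23..62 and Wh >= 100."""
    heaped = sum(1 for a in ages if a % 5 == 0)
    wh = heaped / (len(ages) / 5) * 100
    return (1 - (wh - 100) / 400) * 100


def lpm_dummy_coefficient(y, d):
    """OLS slope of y on an intercept and one 0/1 regressor d.

    With a single dummy regressor the slope equals the difference
    between the two group means.
    """
    y1 = [yi for yi, di in zip(y, d) if di == 1]
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


# Hypothetical reported ages for two groups (d = 1 and d = 0).
group1_ages = list(range(23, 63)) * 3 + [30, 40, 50, 60] * 5   # mild heaping
group0_ages = list(range(23, 63)) + [30, 40, 50, 60] * 10      # strong heaping

ages = group1_ages + group0_ages
d = [1] * len(group1_ages) + [0] * len(group0_ages)
y = [numerate(a) for a in ages]

coefficient = lpm_dummy_coefficient(y, d)
abcc_gap = abcc_from_ages(group1_ages) - abcc_from_ages(group0_ages)
print(round(coefficient * 125, 2), round(abcc_gap, 2))
```

The two printed numbers agree: the LPM coefficient scaled by 125 reproduces the difference in group-level ABCC values.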

How representative is the sample? Fortunately, the availability of evidence in Spain resulted in a fairly wide geographic distribution (Figure 2.1). Most regions can be covered in the seventeenth and eighteenth centuries, except the northwestern coast and Catalonia. We have more observations on Andalusia, but we can adjust for this overrepresentation by assigning smaller weights to Andalusian observations and larger weights to the other provinces (see the notes in Table 2.3 for details). Socially, our local censuses are quite representative, because they include all social strata, as can be seen from the occupational information. We also took care not to record only a special sub-population in the Cadaster (such as the nuns in a monastery or the merchant quarter of a city, for example). Rather, we drew samples that cover various parts of cities and villages, if the archival situation allowed us to do so. In the following, we refer to cities and villages as “local communities”. In general, we distinguish between local communities, provinces and regions (as in Figure 2.1).
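The actual weighting scheme is documented in the notes to Table 2.3 and is not reproduced here. As a generic illustration of how overrepresentation can be corrected, the sketch below uses simple post-stratification weights (population share divided by sample share). This choice of technique, the region labels other than Andalusia, and all numbers are assumptions made for illustration only.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per region = population share / sample share.

    Overrepresented regions receive weights below 1, underrepresented
    regions receive weights above 1.
    """
    total = sum(sample_counts.values())
    return {
        region: population_shares[region] / (count / total)
        for region, count in sample_counts.items()
    }


# Hypothetical sample sizes and population shares (not historical data).
sample_counts = {"Andalusia": 600, "Region B": 300, "Region C": 100}
population_shares = {"Andalusia": 0.2, "Region B": 0.5, "Region C": 0.3}

weights = poststratification_weights(sample_counts, population_shares)
weighted_total = sum(weights[r] * n for r, n in sample_counts.items())

for region, weight in weights.items():
    print(region, round(weight, 3))
```

With these made-up figures, Andalusian observations are down-weighted and the other regions are up-weighted, while the weighted sample size stays equal to the unweighted one.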

Finally, is the population of each local community sufficiently covered by at least some observations? We calculated the approximate share of our sample relative to the total population in the earliest reliable census, the Floridablanca census (1785-87)22. In only 10 local communities did our sample represent less than 10% of the total population older than 25 years of age, while for 48 local communities we could obtain more than one tenth of the overall population (see Table 2.7 in the appendix)23. As there were differences in archival survival rates across local communities, we needed to weight the samples in order to obtain regional representativeness anyway.

In addition, we analysed whether the observations for which we have occupations and those for which we do not are comparable. The numeracy index of those with occupations was 64.3, and that of those without occupations was 66. Hence the difference in the numeracy index is only 1.7 points, which is small enough to be caused by composition effects.

22 Using this census, we calculated the number of inhabitants who were more than 25 years old by local community (given the way in which the Floridablanca census aggregates the information, it is not possible to start from 23 years of age). We divided the number of persons in our sample by the census total, even if our sample refers to an earlier period. Due to the lack of reliable census sources for occupations in the sixteenth, seventeenth and early eighteenth centuries, it is not possible to obtain reliable census totals per local community for earlier periods.

23 The ten cases of less than 10% refer mostly to Andalusia, for which we have a very high number of observations overall anyway. In other words, if we had a 10 percent share for these Andalusian local communities, our regional representativeness would actually be lower. The same is the case for the urban share: our sample has slightly more urban cases than the general Spanish population, hence we would have a less representative sample if Écija, Córdoba, etc. were represented by a 10% sample.