A.3 Supplementary information
3.4 Econometric methods
3.4.2 Few treated clusters
It is well known that in difference-in-differences setups it is crucial to account for potential intra-cluster dependence. In our application, there is clustering in both the state and the time dimension. Ignoring intra-cluster dependence biases standard errors downward and leads to over-rejection of the null hypothesis (Bertrand, Duflo & Mullainathan, 2004). As explained in the previous section, we include a large set of time-varying covariates at the state level as well as additional time and state effects in our difference-in-differences regressions to pick up potentially differential time trends across states. This already removes a fair amount of intra-cluster correlation, mitigating potential problems of cluster inference.
Our application is characterized by a small number of clusters, of which only one is treated. For this and similar cases, MacKinnon and Webb (2017) compare the wild bootstrap, the wild cluster bootstrap, and an intermediate case called the wild subcluster bootstrap. Their results suggest that for our scenario (one treated cluster, thirteen untreated clusters), the ordinary (= individual) wild bootstrap performs best.
MacKinnon and Webb (2017) also advocate comparing restricted and unrestricted bootstrapped p-values (i.e., with and without imposing the null hypothesis) as a diagnostic test for the validity of the p-values. If the two coincide, this can be taken as an indication of their validity. Following this procedure, we found the ordinary (= individual) wild bootstrap to be the most adequate for our application. We therefore report p-values and confidence intervals based on the ordinary wild bootstrap (unrestricted version) throughout our results. Given the chosen procedure, the results using the restricted version are similar and available on request.
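The diagnostic comparison of restricted and unrestricted p-values can be illustrated with a minimal sketch. It is not the code used for the results in this chapter. It assumes Rademacher bootstrap weights drawn at the individual level, a t-statistic based on a CR1-type cluster-robust standard error, and symmetric bootstrap p-values. The function names (`ols`, `cluster_se`, `wild_bootstrap_pvalues`) and the argument `k` (the column index of the difference-in-differences term) are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def cluster_se(X, resid, clusters, k):
    """CR1-type cluster-robust standard error of coefficient k (assumed variant)."""
    n, p = X.shape
    bread = np.linalg.inv(X.T @ X)
    labels = np.unique(clusters)
    meat = np.zeros((p, p))
    for g in labels:
        score = X[clusters == g].T @ resid[clusters == g]
        meat += np.outer(score, score)
    G = len(labels)
    scale = G / (G - 1) * (n - 1) / (n - p)
    V = scale * bread @ meat @ bread
    return np.sqrt(V[k, k])


def wild_bootstrap_pvalues(X, y, clusters, k, reps=999, seed=0):
    """Ordinary (individual-level) wild bootstrap p-values for H0: beta_k = 0.

    Returns (unrestricted, restricted) symmetric p-values.
    """
    rng = np.random.default_rng(seed)

    # Unrestricted fit and the t-statistic in the original sample.
    beta_u, resid_u = ols(X, y)
    t_hat = beta_u[k] / cluster_se(X, resid_u, clusters, k)

    # Restricted fit: impose beta_k = 0 by dropping column k.
    X_r = np.delete(X, k, axis=1)
    beta_r, resid_r = ols(X_r, y)

    fitted_u, fitted_r = X @ beta_u, X_r @ beta_r
    exceed_u = exceed_r = 0
    for _ in range(reps):
        # One Rademacher draw per observation (not per cluster).
        v = rng.choice([-1.0, 1.0], size=len(y))

        # Unrestricted DGP: bootstrap t-statistic is centered at beta_u[k].
        b, e = ols(X, fitted_u + resid_u * v)
        t_u = (b[k] - beta_u[k]) / cluster_se(X, e, clusters, k)

        # Restricted DGP: the null holds, so the t-statistic is centered at 0.
        b, e = ols(X, fitted_r + resid_r * v)
        t_r = b[k] / cluster_se(X, e, clusters, k)

        exceed_u += abs(t_u) >= abs(t_hat)
        exceed_r += abs(t_r) >= abs(t_hat)
    return exceed_u / reps, exceed_r / reps
```

If the two returned p-values are close, this is in line with the diagnostic described above. A large gap would suggest that the chosen bootstrap variant is unreliable for the design at hand.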
Chapter 3. 3.5. Data
3.5 Data
The data for this chapter were provided by the Centre for Higher Education Research and Science Studies (DZHW), see Baillet et al. (2017, 2019). The DZHW starts a new representative survey of German university graduates every four years. The survey includes rich information on parental background, the individual’s higher education entrance qualification, choices during university study, and labor market entry. The target population comprises all higher education graduates from institutions that are approved by the state. This includes universities as well as applied universities and similar institutions.
The sample was drawn at the level of the institution using stratified cluster sampling (Baillet et al., 2017). For our analysis, we use the cohorts 2005 and 2013. This provides a clear separation into four groups of university graduates who completed their final high school years in either the pre- or the post-reform period and in either the reform state Baden-Württemberg or other states. Table 3.1 summarizes the four groups.
Table 3.1 – Categorization of DiD groups for the analysis

Group       Before treatment                                  After treatment
Control     HEEQ obtained before 2004 in control states       HEEQ obtained after 2004 in control states
Treatment   HEEQ obtained before 2004 in Baden-Württemberg    HEEQ obtained in and after 2004 in Baden-Württemberg
Note: The table specifies the four categories needed for the empirical analysis. HEEQ: Higher education entrance qualification, i.e. high school graduation (Abitur). Note that the year of the HEEQ is not necessarily the year of enrollment at university. Cohort 2005 includes only individuals with HEEQ years between 1997 and 2001, cohort 2013 only those with HEEQ years between 2005 and 2009.
Note that each cohort includes students with different HEEQ years, as study durations differ and students do not necessarily start their studies immediately after obtaining the higher education entrance qualification. The HEEQ year represents the year in which the higher education entrance qualification was obtained, not the year in which the person enrolled in tertiary education. In our analysis, we exclude individuals with a HEEQ obtained before 1997 or after 2001 for the 2005 cohort, and before 2005 or after 2009 for the 2013 cohort, in order to drop unrepresentative long- and short-term students.
In this way, we also exclude high school graduates who might have experienced an announcement effect, as well as the year 2004, in which only the Gymnasium implemented the reform but not certain other institutions that may also grant a higher education entrance qualification (Fachschulen).
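The sample restriction and the group assignment of Table 3.1 can be written down compactly. The following is only an illustrative sketch. The function name `did_indicators`, the argument names, and the returned keys are hypothetical and do not correspond to variable names in the survey data.

```python
# HEEQ-year windows per graduate cohort, as stated in the text.
HEEQ_WINDOWS = {2005: (1997, 2001), 2013: (2005, 2009)}
TREATED_STATE = "Baden-Württemberg"
REFORM_YEAR = 2004  # excluded from the sample by the windows above


def did_indicators(cohort, heeq_year, heeq_state):
    """Return the DiD dummies for one graduate, or None if the graduate is dropped."""
    first, last = HEEQ_WINDOWS[cohort]
    if not first <= heeq_year <= last:
        return None  # unrepresentative long- or short-term student
    treated_state = int(heeq_state == TREATED_STATE)
    post = int(heeq_year > REFORM_YEAR)
    return {
        "treated_state": treated_state,
        "post": post,
        "did": treated_state * post,  # interaction term of the DiD regression
    }
```

For example, a graduate of the 2013 cohort with a HEEQ obtained in Baden-Württemberg in 2007 receives a value of one for all three dummies, whereas a graduate of the 2005 cohort with a HEEQ from 2003 is dropped.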
Table 3.2 shows some basic sample information by gender. The two cohorts have approximately the same size. The individual-level covariates included in our difference-in-differences regressions are gender, age, parental education in four categories, parental occupation in two categories as well as state and year of the HEEQ. Table 3.2 further presents summary statistics for the degree and occupational outcome variables used in the regressions. The degree variables are dummies indicating whether or not a particular individual obtained a degree in a particular field. Labels such as ‘at least one STEM degree’ mean that we have a small number of individuals with more than one degree but count them as STEM if at least one of their degrees is in STEM. Following common practice, we include in STEM all fields in science, technology, engineering, and mathematics. More precisely, our STEM category includes the sciences (biology, chemistry, pharmacy, geosciences, physics), technology (computer science), engineering (all subfields of engineering), and mathematics. As indicated above, we also consider smaller subsets of STEM fields: mathematics and natural sciences (MatNat) and engineering and computer sciences (EngComp).
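The ‘at least one degree’ rule can be sketched as follows. The field labels are hypothetical stand-ins for the field-of-study codes in the data, and ‘engineering’ stands for all subfields of engineering.

```python
# Hypothetical field labels; the survey uses its own field-of-study codes.
MATNAT_FIELDS = {"mathematics", "biology", "chemistry", "pharmacy",
                 "geosciences", "physics"}
ENGCOMP_FIELDS = {"engineering", "computer science"}


def degree_outcomes(fields):
    """Map the fields of all degrees held by one graduate to the outcome dummies."""
    held = set(fields)
    matnat = int(bool(held & MATNAT_FIELDS))
    engcomp = int(bool(held & ENGCOMP_FIELDS))
    return {
        "stem": int(matnat or engcomp),  # at least one degree in STEM
        "matnat": matnat,                # at least one degree in MatNat
        "engcomp": engcomp,              # at least one degree in EngComp
    }
```

A graduate with degrees in history and physics is thus counted as STEM and MatNat but not as EngComp.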
For occupational outcomes, our data include the KldB occupation code (German classification of occupations). For 2005, this is the KldB 1992, whereas for the other cohorts it is the KldB 2010. The German Federal Employment Agency provides a categorization into STEM and non-STEM occupations, but only for the KldB 2010 (Bundesagentur für Arbeit, 2019). For the KldB 1992 codes, we followed a translation from KldB 1992 to KldB 2010. This left us with a small number of cases for which it was not possible to assign a clear STEM or non-STEM status based on the 2010 STEM classification (because these occupations were more or less specific in the KldB 1992 classification than in the KldB 2010 classification). In order to resolve these cases, we employed a specific algorithm, the details of which are available on request.
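To see why some KldB 1992 codes cannot be assigned a clear status, consider a translation table in which one 1992 code maps to several 2010 codes. The sketch below only flags such cases and does not reproduce the algorithm used to resolve them. All codes (`old_A`, `new_1`, and so on) are hypothetical placeholders, not actual KldB codes.

```python
# Hypothetical placeholder codes; the real crosswalk and STEM list are not shown here.
CROSSWALK_1992_TO_2010 = {
    "old_A": ["new_1"],
    "old_B": ["new_2", "new_3"],
    "old_C": ["new_3"],
    "old_D": ["new_1", "new_2"],
}
STEM_2010 = {"new_1", "new_2"}  # 2010 codes classified as STEM (assumed)


def stem_status_1992(code_1992):
    """Classify a 1992 code via its 2010 targets: 'stem', 'non-stem',
    'ambiguous' (targets disagree), or 'unmapped' (no translation)."""
    targets = CROSSWALK_1992_TO_2010.get(code_1992)
    if not targets:
        return "unmapped"
    flags = {target in STEM_2010 for target in targets}
    if flags == {True}:
        return "stem"
    if flags == {False}:
        return "non-stem"
    return "ambiguous"
```

Only the codes flagged as ambiguous require a further rule of the kind mentioned above.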
Table 3.2 – Descriptive statistics

                                                                    Males              Females
Variables                                                         Mean        SD      Mean        SD
DiD
  Treated individuals                                            0.053     0.223     0.059     0.235
  HEEQ after treatment                                           0.414     0.493     0.446     0.497
  HEEQ in treatment state Baden-Württemberg                      0.144     0.351     0.138     0.345
Age and Parents
  Age                                                           26.545     1.796    25.919     1.701
  Highest parental education: Other                              0.014     0.116     0.011     0.104
  Highest parental education: Vocational training                0.357     0.479     0.355     0.478
  Highest parental education: HS Diploma                         0.049     0.216     0.050     0.218
  Highest parental education: PhD, Uni & AU                      0.581     0.493     0.584     0.493
  Highest parental occupation status: White collar               0.944     0.230     0.951     0.216
  Highest parental occupation status: Blue collar and other      0.056     0.230     0.049     0.216
State variables, year of HEEQ
  Non working population per capita                              0.047     0.023     0.047     0.024
  Labor force participation per capita                           0.502     0.025     0.503     0.025
  Unemployment rate by gender                                    9.187     3.569     9.969     4.563
  GDP per capita                                                27.485     6.633    27.847     6.948
  Share of producing sector                                      0.087     0.029     0.086     0.030
  Share of manufacturing sector                                  0.199     0.045     0.195     0.045
  R&D per capita                                                 0.092     0.045     0.096     0.048
  Exports per capita                                             7.055     3.511     7.261     3.699
  Imports per capita                                             6.619     4.321     6.912     4.589
  Density of universities                                        2.067     0.454     2.071     0.473
  Density of applied universities                                3.501     1.092     3.578     1.159
  Year                                                        2002.094     4.374  2002.712     4.373
Mediators
  Final grade of HEEQ                                            2.248     0.602     2.155     0.594
  Other path than academic HS                                    0.140     0.347     0.080     0.271
  Employment before university                                   0.249     0.432     0.246     0.430
  Vocational training before university                          0.145     0.352     0.114     0.318
  Applied university                                             0.298     0.458     0.205     0.404
  Degree type: teaching profession                               0.047     0.212     0.146     0.353
Outcomes
  At least one degree in STEM                                    0.554     0.497     0.263     0.440
  At least one degree in MatNat                                  0.130     0.336     0.145     0.352
  At least one degree in EngComp                                 0.425     0.494     0.119     0.324
  Current or last occupation in STEM                             0.427     0.486     0.157     0.351
  Current or last occupation in MatNat                           0.025     0.151     0.028     0.159
  Current or last occupation in EngComp                          0.401     0.482     0.129     0.321
Note: HEEQ: Higher education entrance qualification. The two German states Sachsen-Anhalt and Mecklenburg are not included because they had a different reform during the period of interest. For all variables and the degree outcomes, we have 5199 male and 7652 female observations. For the occupation outcomes, we have 3664 male and 5470 female observations. For the regressions using the occupations as outcomes, we merge the state variables to the year of the degree. The German occupation classification KldB is used for classifying individuals into different fields of occupation. Information on the states of the respective HEEQs is included in the appendix.
In order to control for potential time-varying differences between federal states and in order to minimize remaining intra-cluster correlation, we also include a set of state- and time-specific variables, as shown in Table 3.2. All variables are measured at the state level. They are merged to the year of the HEEQ for the degree regressions and to the year of the degree for the occupation regressions.
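The two merge rules can be illustrated with a small lookup on a state-by-year panel. This is a sketch under assumptions: the record keys (`heeq_state`, `heeq_year`, `degree_year`), the covariate name, and the numbers in the usage example are made up for illustration. It also assumes that the state of the HEEQ is used in both cases, which the text does not specify for the occupation regressions.

```python
def attach_state_covariates(person, panel, outcome):
    """Merge state-level covariates onto one individual record.

    panel maps (state, year) to a dict of state-level covariates.
    outcome is 'degree' (merge on year of HEEQ) or
    'occupation' (merge on year of degree).
    """
    if outcome == "degree":
        year = person["heeq_year"]
    elif outcome == "occupation":
        year = person["degree_year"]
    else:
        raise ValueError("outcome must be 'degree' or 'occupation'")
    # Assumption: the state of the HEEQ identifies the relevant state in both cases.
    covariates = panel[(person["heeq_state"], year)]
    return {**person, **covariates}
```

With a made-up panel `{("Bayern", 1999): {"gdp_per_capita": 25.0}, ("Bayern", 2005): {"gdp_per_capita": 30.0}}` and a person with a HEEQ from Bayern in 1999 and a degree in 2005, the degree regression would use the 1999 value and the occupation regression the 2005 value.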
In the last step, we include variables whose realization was after the reform and which may, therefore, have been mediators of reform effects. As these variables might have been affected by the reform, their inclusion should proceed with caution. However, we also ran our difference-in-differences regressions taking each of these variables as an outcome but did not find any significant reform effects on them. Note that by including these variables, and all other individual-level variables, we control for potential compositional differences in the population before and after the reform.