

3.4 Econometric methods

3.4.2 Few treated clusters

It is well known that difference-in-differences setups require controlling for potential intra-cluster dependence. In our application, there is clustering in both the state and the time dimension. Ignoring intra-cluster dependence biases standard errors downward and leads to over-rejection of the null hypothesis (Bertrand, Duflo & Mullainathan, 2004). As explained in the previous section, we include a large set of time-varying covariates at the state level as well as additional time and state effects in our difference-in-differences regressions to pick up potentially differential time trends across states. This already removes a fair amount of intra-cluster correlation, mitigating potential problems of cluster inference.

Our application is characterized by a small number of clusters, only one of which is treated. For this and similar cases, MacKinnon and Webb (2017) compare the wild bootstrap, the wild cluster bootstrap, and an intermediate case called the wild subcluster bootstrap. Their results suggest that for our scenario (one treated cluster, thirteen untreated clusters), the ordinary (= individual) wild bootstrap performs best.

MacKinnon and Webb (2017) also advocate comparing restricted and unrestricted bootstrapped p-values (i.e., with and without imposing the null hypothesis) as a diagnostic test for the validity of p-values. If the two coincide, this can be taken as an indication of their validity. Following this procedure, we found the ordinary (= individual) wild bootstrap to be the most adequate for our application. We therefore report p-values and confidence intervals based on the ordinary wild bootstrap (unrestricted version) throughout our results. Given the chosen procedure, the results using the restricted version are similar and available on request.
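To illustrate the idea of an ordinary (individual-level) wild bootstrap in its unrestricted form, the following is a minimal sketch only. It assumes a saturated 2x2 difference-in-differences design without covariates or fixed effects, Rademacher weights, and heteroskedasticity-robust (HC0) standard errors; none of these choices is confirmed as the specification used in this chapter. The function names `did_estimate` and `wild_bootstrap_p` are hypothetical.

```python
import random
from statistics import mean

def did_estimate(y, cell):
    """DiD estimate, HC0 standard error and cell means for a saturated 2x2 design.

    cell[i] is a (treated_state, post) pair of 0/1 flags for observation i.
    Assumption: no covariates, so the OLS fitted values are the cell means.
    """
    groups = {}
    for yi, c in zip(y, cell):
        groups.setdefault(c, []).append(yi)
    m = {c: mean(v) for c, v in groups.items()}
    beta = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])
    # In a saturated model the HC0 variance of a cell mean is sum(e^2) / n^2.
    var = sum(sum((yi - m[c]) ** 2 for yi in v) / len(v) ** 2
              for c, v in groups.items())
    return beta, var ** 0.5, m

def wild_bootstrap_p(y, cell, reps=999, seed=0):
    """Symmetric p-value from an unrestricted individual wild bootstrap."""
    rng = random.Random(seed)
    beta, se, m = did_estimate(y, cell)
    t_hat = beta / se
    fitted = [m[c] for c in cell]
    resid = [yi - f for yi, f in zip(y, fitted)]
    exceed = 0
    for _ in range(reps):
        # Flip the sign of each individual residual with probability 1/2.
        y_star = [f + e * rng.choice((-1.0, 1.0)) for f, e in zip(fitted, resid)]
        b_star, se_star, _ = did_estimate(y_star, cell)
        # Unrestricted version: centre the bootstrap statistic at the estimate.
        t_star = (b_star - beta) / se_star
        if abs(t_star) >= abs(t_hat):
            exceed += 1
    return beta, exceed / reps
```

The restricted variant would instead generate the bootstrap samples from a regression that imposes a zero treatment effect and would not recentre the statistic; it is not shown here.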

Chapter 3. 3.5. Data

3.5 Data

The data for this chapter were provided by the Centre for Higher Education Research and Science Studies (DZHW), see Baillet et al. (2017, 2019). The DZHW starts a new representative survey of German university graduates every four years. The survey includes rich information on parental background, the individual's higher education entrance qualification, choices during university study, and labor market entry. The target population consists of all higher education graduates from state-approved institutions. This includes universities as well as applied universities and similar institutions.

The sample was drawn at the level of the institution using stratified cluster sampling (Baillet et al., 2017). For our analysis, we use the cohorts 2005 and 2013. This provides a clear separation into four groups of university graduates who completed their final high school years in either the pre- or the post-reform period and in either the reform state Baden-Württemberg or other states. Table 3.1 summarizes the four groups.

Table 3.1 – Categorization of DiD groups for the analysis

| Group | Before treatment | After treatment |
|---|---|---|
| Control | HEEQ obtained before 2004 in control states | HEEQ obtained after 2004 in control states |
| Treatment | HEEQ obtained before 2004 in Baden-Württemberg | HEEQ obtained in and after 2004 in Baden-Württemberg |

Note: The table specifies the four categories needed for the empirical analysis. HEEQ: Higher education entrance qualification, i.e. high school graduation (Abitur). Note that the year of HEEQ is not necessarily the year of enrollment at university. Cohort 2005 includes only individuals with HEEQ years between 1997 and 2001, cohort 2013 only those with HEEQ years between 2005 and 2009.

Note that each cohort includes students with different HEEQ years, as study durations differ and students do not necessarily start their studies immediately after obtaining the higher education entrance qualification. The HEEQ year is the year in which the qualification was obtained, not the year in which the person enrolled in tertiary education. In our analysis, we exclude individuals whose HEEQ was obtained before 1997 or after 2001 for the 2005 cohort, and before 2005 or after 2009 for the 2013 cohort, in order to drop unrepresentative long- and short-term students. In this way, we also exclude high school graduates who might have experienced an announcement effect, as well as the year 2004, in which only the Gymnasium implemented the reform but not certain other institutions that may also grant a higher education entrance qualification (Fachschulen).
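The sample restriction and the assignment to the four DiD groups can be sketched as follows. This is an illustrative sketch, not the code used for the analysis: the function name `did_flags`, the argument names, and the state label string are assumptions, while the year windows and the reform year 2004 are taken from the text.

```python
# Admissible HEEQ years per graduate cohort, as stated in the text.
HEEQ_WINDOW = {2005: (1997, 2001), 2013: (2005, 2009)}
TREATED_STATE = "Baden-Württemberg"  # assumed spelling of the state label
REFORM_YEAR = 2004

def did_flags(cohort, heeq_year, heeq_state):
    """Return (treated_state, post, treated) as 0/1 flags.

    Returns None if the record falls outside the cohort's HEEQ window
    and would therefore be dropped from the sample.
    """
    window = HEEQ_WINDOW.get(cohort)
    if window is None or not window[0] <= heeq_year <= window[1]:
        return None
    in_treated_state = int(heeq_state == TREATED_STATE)
    post = int(heeq_year >= REFORM_YEAR)
    # The interaction identifies treated individuals in the DiD sense.
    return in_treated_state, post, in_treated_state * post
```

Because the windows stop in 2001 and restart in 2005, the transition years around the reform, including 2004 itself, never enter the sample.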

Table 3.2 shows some basic sample information by gender. The two cohorts have approximately the same size. The individual-level covariates included in our difference-in-differences regressions are gender, age, parental education in four categories, parental occupation in two categories, and state and year of the HEEQ. Table 3.2 further presents summary statistics for the degree and occupational outcome variables used in the regressions. The degree variables are dummies indicating whether or not a particular individual obtained a degree in a particular field. Labels such as 'at least one STEM degree' reflect that a small number of individuals hold more than one degree; we count them as STEM if at least one of their degrees is in STEM. Following common practice, we include in STEM all fields in science, technology, engineering, and mathematics. More precisely, our STEM category includes the sciences (biology, chemistry, pharmacy, geosciences, physics), technology (computer science), engineering (all subfields of engineering), and mathematics. As indicated above, we also consider smaller subsets of STEM fields: mathematics and natural sciences (MatNat) and engineering and computer sciences (EngComp).
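The 'at least one degree' coding can be sketched as below. The field labels are simplified placeholders (the survey's actual field codes are not given here), and `degree_dummies` is a hypothetical helper; only the grouping of fields into MatNat and EngComp follows the text.

```python
# Placeholder field labels; the survey's actual coding is not reproduced here.
MATNAT = {"mathematics", "biology", "chemistry", "pharmacy",
          "geosciences", "physics"}
ENGCOMP = {"engineering", "computer science"}
STEM = MATNAT | ENGCOMP

def degree_dummies(fields):
    """Map a person's degree fields to 0/1 outcome dummies.

    A dummy equals 1 if at least one of the person's degrees falls
    into the respective category.
    """
    held = {f.lower() for f in fields}
    return {
        "stem": int(bool(held & STEM)),
        "matnat": int(bool(held & MATNAT)),
        "engcomp": int(bool(held & ENGCOMP)),
    }
```

For example, a graduate with degrees in history and physics is counted as STEM and MatNat but not as EngComp.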

For occupational outcomes, our data include the KldB occupation code (German classification of occupations). For 2005, this is the KldB 1992, whereas for the other cohorts it is the KldB 2010. The German Federal Employment Agency provides a categorization into STEM and non-STEM occupations, but only for the KldB 2010 (Bundesagentur für Arbeit, 2019). For the KldB 1992 codes, we followed a translation from KldB 1992 to KldB 2010. This left us with a small number of cases for which it was not possible to assign a clear STEM or non-STEM status based on the 2010 STEM classification, because these occupations were defined more or less specifically in the KldB 1992 than in the KldB 2010. In order to resolve these cases, we employed a specific algorithm, the details of which are available on request.
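The following sketch shows one way the unclear cases could be identified when a KldB 1992 code maps to several KldB 2010 codes. It is not the algorithm referred to above, which is not reproduced here, and it only flags ambiguous cases without resolving them. The function name and the example codes in the usage note are hypothetical placeholders, not real KldB codes.

```python
def stem_status(code_1992, crosswalk, stem_2010):
    """Classify a KldB 1992 code via its KldB 2010 counterparts.

    crosswalk: dict mapping a 1992 code to a set of 2010 codes.
    stem_2010: set of 2010 codes classified as STEM.
    Returns 'stem', 'non_stem', 'ambiguous' or 'unmapped'.
    """
    targets = crosswalk.get(code_1992, set())
    if not targets:
        return "unmapped"
    flags = {t in stem_2010 for t in targets}
    if flags == {True}:
        return "stem"
    if flags == {False}:
        return "non_stem"
    # Some counterparts are STEM and others are not: no clear status.
    return "ambiguous"
```

With a placeholder crosswalk such as `{"A": {"x1"}, "B": {"x1", "y1"}}` and `stem_2010 = {"x1"}`, code `"A"` is classified as STEM while code `"B"` is flagged as ambiguous.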

Table 3.2 – Descriptive statistics

| Variables | Males: Mean | Males: SD | Females: Mean | Females: SD |
|---|---|---|---|---|
| DiD | | | | |
| Treated individuals | 0.053 | 0.223 | 0.059 | 0.235 |
| HEEQ after treatment | 0.414 | 0.493 | 0.446 | 0.497 |
| HEEQ in treatment state Baden-Württemberg | 0.144 | 0.351 | 0.138 | 0.345 |
| Age and parents | | | | |
| Age | 26.545 | 1.796 | 25.919 | 1.701 |
| Highest parental education: Other | 0.014 | 0.116 | 0.011 | 0.104 |
| Highest parental education: Vocational training | 0.357 | 0.479 | 0.355 | 0.478 |
| Highest parental education: HS Diploma | 0.049 | 0.216 | 0.050 | 0.218 |
| Highest parental education: PhD, Uni & AU | 0.581 | 0.493 | 0.584 | 0.493 |
| Highest parental occupation status: White collar | 0.944 | 0.230 | 0.951 | 0.216 |
| Highest parental occupation status: Blue collar and other | 0.056 | 0.230 | 0.049 | 0.216 |
| State variables, year of HEEQ | | | | |
| Non working population per capita | 0.047 | 0.023 | 0.047 | 0.024 |
| Labor force participation per capita | 0.502 | 0.025 | 0.503 | 0.025 |
| Unemployment rate by gender | 9.187 | 3.569 | 9.969 | 4.563 |
| GDP per capita | 27.485 | 6.633 | 27.847 | 6.948 |
| Share of producing sector | 0.087 | 0.029 | 0.086 | 0.030 |
| Share of manufacturing sector | 0.199 | 0.045 | 0.195 | 0.045 |
| R&D per capita | 0.092 | 0.045 | 0.096 | 0.048 |
| Exports per capita | 7.055 | 3.511 | 7.261 | 3.699 |
| Imports per capita | 6.619 | 4.321 | 6.912 | 4.589 |
| Density of universities | 2.067 | 0.454 | 2.071 | 0.473 |
| Density of applied universities | 3.501 | 1.092 | 3.578 | 1.159 |
| Year | 2002.094 | 4.374 | 2002.712 | 4.373 |
| Mediators | | | | |
| Final grade of HEEQ | 2.248 | 0.602 | 2.155 | 0.594 |
| Other path than academic HS | 0.140 | 0.347 | 0.080 | 0.271 |
| Employment before university | 0.249 | 0.432 | 0.246 | 0.430 |
| Vocational training before university | 0.145 | 0.352 | 0.114 | 0.318 |
| Applied university | 0.298 | 0.458 | 0.205 | 0.404 |
| Degree type: teaching profession | 0.047 | 0.212 | 0.146 | 0.353 |
| Outcomes | | | | |
| At least one degree in STEM | 0.554 | 0.497 | 0.263 | 0.440 |
| At least one degree in MatNat | 0.130 | 0.336 | 0.145 | 0.352 |
| At least one degree in EngComp | 0.425 | 0.494 | 0.119 | 0.324 |
| Current or last occupation in STEM | 0.427 | 0.486 | 0.157 | 0.351 |
| Current or last occupation in MatNat | 0.025 | 0.151 | 0.028 | 0.159 |
| Current or last occupation in EngComp | 0.401 | 0.482 | 0.129 | 0.321 |

Note: HEEQ: Higher education entrance qualification. The two German states Sachsen-Anhalt and Mecklenburg are not included because they had a different reform during the period of interest. For all variables and the degree outcomes, we have 5199 male and 7652 female observations. For the occupation outcomes, we have 3664 male and 5470 female observations. For the regressions using the occupations as outcomes, we merge the state variables to the year of the degree. The German occupation classification KldB is used for classifying individuals into different fields of occupation. Information on the states of the respective HEEQs is included in the appendix.

In order to control for potential time-varying differences between federal states and to minimize remaining intra-cluster correlation, we also include a set of state- and time-specific variables, as shown in Table 3.2. All variables are measured at the state level. They are merged to the year of the HEEQ for the degree regressions and to the year of the degree for the occupation regressions.
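The two merge rules can be sketched as a simple lookup. The record layout, the key names, and the function `attach_state_vars` are assumptions made for illustration; in particular, the sketch assumes that the state of the HEEQ is used as the state key in both cases, which the text does not specify for the occupation regressions.

```python
def attach_state_vars(person, state_panel, outcome):
    """Return a copy of `person` with state-level covariates attached.

    state_panel: dict mapping (state, year) to a dict of covariates.
    outcome: 'degree' merges on the HEEQ year, 'occupation' merges on
    the year of the degree.
    """
    if outcome not in ("degree", "occupation"):
        raise ValueError("outcome must be 'degree' or 'occupation'")
    year = person["heeq_year"] if outcome == "degree" else person["degree_year"]
    # Assumption: the HEEQ state is the state key in both regressions.
    covariates = state_panel[(person["heeq_state"], year)]
    return {**person, **covariates}
```

A person with an HEEQ from 2006 and a degree from 2012 would thus receive the 2006 state covariates in the degree regressions and the 2012 values in the occupation regressions.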

In the last step, we include variables that were realized after the reform and which may, therefore, have been mediators of reform effects. As these variables might themselves have been affected by the reform, their inclusion calls for caution. To assess this concern, we ran our difference-in-differences regressions taking each of these variables as an outcome and did not find any significant reform effects on them. Note that by including these variables, together with all other individual-level variables, we control for potential compositional differences in the population before and after the reform.

3.6 Empirical results