
SCALING METHODOLOGY AND SCALE REPORTING IN THE TREE2 PANEL SURVEY

DOCUMENTATION OF SCALES IMPLEMENTED IN THE BASELINE SURVEY (2016)

Stefan Sacchi
Dominique Krebs-Oesch

Transitionen von der Erstausbildung ins Erwerbsleben
Transitions de l’Ecole à l’Emploi
Transitions from Education to Employment


Sacchi & Krebs-Oesch Scaling methodology and scale reporting in the TREE2 panel survey

Imprint

Published by TREE (Transitions from Education to Employment).

University of Bern
Fabrikstr. 8
3012 Bern, Switzerland
www.tree.unibe.ch
tree@soz.unibe.ch

Suggested citation

Sacchi, Stefan, Krebs-Oesch, Dominique (2021). Scaling methodology and scale reporting in the TREE2 panel survey. Documentation of scales implemented in the baseline survey (2016). Bern: TREE, University of Bern. doi: 10.48350/152055


Abstract

This paper outlines the methods and the estimation procedures that we have adopted for the calculation of the student scores in the database of the second TREE cohort (TREE2). In addition, we describe the calculation and the reporting of scale-specific statistics and quality measures given in the technical appendix and provide some clues for their interpretation. The appendix covers all questionnaire-based scales and item-based composites that have been administered in the baseline survey of TREE2 in 2016.


Table of Contents

Some practical guidelines for using the scales
Introduction
1 Survey Design and Database
2 Selection and Adaptation of Scales
3 Statistical Modelling
3.1 Estimation of the confirmatory factor models
3.1.1 Two-step estimation based on polychoric inter-item correlations
3.1.2 Generalised structural equation model for short response scales
3.2 Student scores
3.2.1 Calculation and robustness of student scores
3.2.2 Inclusion of student scores in multivariate statistical models
4 Scale-specific reporting: content and interpretation
References
Scale Appendix

Table of Figures

Figure 1: Design of the TREE2 baseline survey
Figure 2: One-dimensional confirmatory factor model
Figure 3: Example of the reported scale-specific results (initial results page)
Figure 4: Example of the reported scale-specific results (second results page)

Table of Tables


Some practical guidelines for using the scales

For each scale, the technical appendix of this documentation provides a selection of relevant statistics and measures. Section 4 of the introductory text describes the type and calculation of the reported measures and gives some clues as to their interpretation. It is of course up to the data users to decide whether a scale shows the measurement properties required for their analysis.

The reported scale-specific measures focus primarily on reliability (in the sense of internal consistency) and measurement invariance across survey settings, modes and languages. What we do not address in this documentation is scale validity, as TREE mostly uses commonly accepted, well-established scales and validity is therefore not likely to be a major problem. In addition, the database offers researchers many opportunities to conduct external validations tailored to their specific analytical needs.
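Internal consistency is commonly summarised by coefficients such as Cronbach's alpha. Purely as an illustration of what such a reliability coefficient captures (the measures actually reported in the appendix are described in section 4, and need not be alpha), a minimal sketch with made-up ratings:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Classical internal-consistency estimate for a respondents x items matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# toy data: 6 respondents rating a 3-item scale on a 5-point response scale
x = np.array([[1, 2, 1],
              [2, 2, 2],
              [3, 3, 2],
              [4, 3, 4],
              [4, 4, 3],
              [5, 4, 4]])
print(round(cronbach_alpha(x), 2))  # 0.93
```

Values close to 1 indicate that the items covary strongly, i.e. that the sum or factor score is a consistent measure of the common dimension.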

In some cases, several scales in the TREE2 scientific use file partly draw on one and the same items. The scales in question should therefore not be used simultaneously within the same multivariate model. This concerns some scales for which several versions exist (cf. section 2: scales surrounded by dotted lines in Table 3) as well as other scales composed of main and subdimensions (cf. section 2, Table 4).

Regarding the use of student scores in the context of multivariate models, we refer the reader to the remarks on this issue in section 3.2.2. Some scores represent item composites rather than scale scores (cf. Table 5), which may, however, be used similarly. The variable names and labels of all items, student scores and composite variables in the technical appendix correspond with those in the scientific use file for the second TREE cohort (short variable names without wave-specific prefix).

When estimating the confirmatory factor models and calculating the student scores, we imputed all missing item information, provided that at least one item of a given scale had a valid rating (see section 3.1.1b for details).
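The actual imputation procedure is the one described in section 3.1.1b. Purely as an illustration of the inclusion rule (at least one valid item rating per scale), a simple person-mean sketch with made-up data:

```python
import numpy as np

def impute_items(items: np.ndarray) -> np.ndarray:
    """Illustration only: drop respondents without any valid rating, then
    replace remaining missings (NaN) with the respondent's mean over their
    valid items. TREE2's actual procedure is described in section 3.1.1b."""
    items = np.asarray(items, dtype=float)
    keep = ~np.isnan(items).all(axis=1)         # rule: >= 1 valid item rating
    items = items[keep]
    row_mean = np.nanmean(items, axis=1, keepdims=True)
    return np.where(np.isnan(items), row_mean, items)

x = np.array([[3.0, np.nan, 4.0],
              [np.nan, np.nan, np.nan],         # no valid rating: dropped
              [2.0, 2.0, np.nan]])
print(impute_items(x))
```

The first respondent's missing item is filled with 3.5 (the mean of 3 and 4), while the all-missing respondent is excluded from the scale altogether.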


Introduction

This paper documents the questionnaire-based scales and item-based composites that have been collected on the occasion of the baseline survey administered to the second TREE cohort (TREE2) in 2016. First, the paper focuses on the methods and the estimation procedures that we have adopted for the calculation of the scale values published in the scientific use data files. Second, we describe the calculation of the scale-specific key figures and quality parameters (see appended tables) and provide some useful information for their interpretation.

The TREE2 baseline survey is composed of two surveys carried out at a short interval in spring/summer 2016. The first survey is a large-scale national assessment of mathematics skills administered to students who had reached the end of compulsory school (Assessment of the Attainment of Educational Standards, henceforth AES).1 Beyond the assessment itself, the AES survey programme included a comprehensive student background questionnaire that collected a wide range of student background characteristics presumed to influence maths skills development and/or educational and labour-market pathways in the further (post-compulsory) life course. The second survey, which we refer to as the extension survey, was conducted shortly after the first one. Its main purpose was to complete some student background characteristics that had not been collected among all respondents of the first survey. In doing so, TREE was able to substantially extend the size of the TREE2 starting cohort (see section 1 for details).

All parts of the AES student questionnaire include numerous item-based measures designed to capture latent (i.e., not directly observable) respondent, family or context characteristics. Instrument selection was largely restricted to instruments validated by previous research in the relevant research fields (see section 2 for details).

The documentation of scales pertaining to the AES survey has been previously published along with the AES data in 2017 (Sacchi & Oesch, 2017).2 The present docu-


setting supervised by carefully instructed test administrators; the extension survey, by contrast, took place in an unproctored individual setting outside of school. Furthermore, the latter employed two sequentially applied survey modes (web survey and paper-and-pencil questionnaire). With regard to scaling, this incongruence requires that we carefully check for measurement invariance across survey settings and modes. Consequently, this documentation includes a number of relevant invariance tests and parameters for all scales that are based on data from the extension survey.

Beyond psychometric scales stricto sensu, this documentation also includes a number of item sum scores based on two or more single items. However, we have not included scores of test results and other types of composite variables.3

For all scales and composites drawing exclusively on data of the AES assessment survey, we report the previously calculated parameters (Sacchi & Oesch 2017) in the technical appendix of this documentation. In doing so, we provide TREE2 data users with an overview of all scales and composite variables available in the TREE2 baseline survey in one single document (see particularly section 2). The introductory text describing the methods of calculation and estimation used and the parameters reported in the technical appendix largely corresponds to the 2017 AES documentation (ibid.).

For each of the scales, we report estimates (i.e., scores) of the individual scale values for all participating students. In addition, our documentation aims at enabling data users to assess the scales’ quality and measurement invariance (cf. particularly the technical appendix). Last but not least, our documentation ought to allow scholars to replicate, if they wish to do so, the calculation of models, tests and scale parameters and compare them with alternative specifications.

In the following sections, we first specify some relevant aspects of the TREE2 baseline survey’s design (1), the selection and adaptation of the scales (2) as well as the statistical modelling and calculation of the scale values (3). Finally, we specify how the scale-specific results, reliability and quality checks were calculated and give some information on how to interpret them (4).

3 As for the scales, the extension survey considerably enlarges the database on which these scores rely.


1 Survey Design and Database

The data of the AES survey were collected by means of a computer-based classroom survey among a random sample of approximately 22,000 students who were in their last year of lower secondary education (i.e., the 11th year4 of compulsory schooling).5

The survey included a comprehensive test of basic mathematical skills, along with a computer-assisted self-interview (CASI) of approximately 45 minutes. Among other things, the student questionnaire covered a broad selection of psychometric and other item-based measures, which are the subject of this documentation.

AES implemented a modular design with two different versions of the questionnaire, each of which was administered to a randomised split-half of the total sample.6 The main building block of one version was the mathematics module, which mainly covered student, teacher and classroom characteristics relevant to the successful acquisition of mathematical skills during compulsory education and to related didactical and pedagogical research. The core of the second version was a student background module co-designed by TREE to collect information on a broad range of resources of the surveyed students, their families and the schools they were attending at the moment of the survey. This module was specifically developed for the TREE2 panel survey in order to measure, as comprehensively as possible, the starting conditions deemed to be relevant for the respondents’ further education and labour-market careers and their life courses in general. Both questionnaire versions included a common core (“general questions”) that was completed by all students participating in AES. The common core incorporated items that are of general interest for the research objectives of both modules.

Due to the modular design of the AES questionnaire, a substantial part of the questionnaire pertaining to TREE-relevant starting conditions of post-compulsory pathways was administered to only half of the AES sample (see Figure 1). In order to complete the missing items for the respondents to the other half (termed “maths sample


was equivalent to that of the background module in the AES survey, which was implemented in two “standalone” versions, either in the form of a web or a paper-and-pencil questionnaire. The minor adaptations of the questionnaire under these changed setting and mode conditions included slightly modifying the order of instruments and adding a newly designed scale that had not been administered in the AES survey.7 Apart from that, the web implementation was largely indistinguishable from the CASI instrument used by the AES.8

Figure 1: Design of the TREE2 baseline survey

[Figure: schematic of the two-part baseline survey. AES (May–June 2016), a proctored classroom survey: maths test plus student questionnaire; all students completed the assessment of maths skills and the general questionnaire, with the maths module administered to the maths sample split (≈ 50%) and the background module to the background sample split (≈ 50%). Extension survey (June–August 2016), a student questionnaire: the background module was administered to the maths sample split (with some non-response); it was not administered to the background sample split.]

In every canton, the extension survey was carried out as soon as the AES survey had been concluded in all sampled schools.9 The web survey was implemented as the primary mode. Students who did not participate in the web survey received the questionnaire’s paper-and-pencil version by mail as a secondary mode. As both survey

7 Two additional elements were placed at the end of the questionnaire: a brief cognitive skills test (KFT 4–12 + R; Heller & Perleth, 2000) as well as an experimentally varied repeated measurement of parental education.

8 To maximise comparability with the AES CASI (and contrary to the web surveys in later TREE2 waves), the web mode was not adapted for smartphones (and respondents were asked to complete it on a computer).

9 The median lag between the AES and extension survey was 29 days. 98 % of respondents completed the questionnaire between June and August, with a few pencil-and-paper questionnaires being returned up to the end of October.


modes are self-administered, they are well suited for the partly sensitive questionnaire items included in the extension survey. With this mixed-mode design, the extension survey achieved a total response rate of almost 75% (73.3% if we consider only complete questionnaires; see also Table 1). Taking the relevant methodological literature into consideration, we do not expect significant mode effects (de Leeuw & Hox, 2011; de Leeuw, 2018; for proctored surveys see also Colosante et al., 2019).

As Table 1 illustrates, the extension survey enabled us to substantially enlarge the available initial TREE2 sample base with a comprehensive measurement of relevant starting conditions. Among other things, this also allows for a more precise estimation of the scaling models and parameters that are at the centre of this documentation.10

In light of the sample structure displayed in Table 1, it is important to address the issue of measurement invariance across the various survey settings and modes. That is why this documentation also provides statistical tests and quality measures that are relevant to this end (see section 4 and the technical appendix). The estimation of setting effects draws exclusively on the CASI and the web survey, which rely on virtually interchangeable survey modes (i.e., it excludes the paper-and-pencil questionnaires; n = 15 608). The estimation of mode effects, in turn, draws exclusively on the extension survey (i.e., it excludes the classroom setting; n = 5 119). In doing so, we avoid the risk that the estimates of mode and setting effects are mutually confounded.
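The two disjoint estimation samples just described amount to simple row filters on the survey data. A minimal sketch with hypothetical flag variables (the actual variable names in the scientific use file are not assumed here):

```python
import pandas as pd

# hypothetical setting/mode flags per respondent; real variable names
# in the TREE2 scientific use file may differ
df = pd.DataFrame({
    "setting": ["classroom", "classroom", "individual", "individual", "individual"],
    "mode":    ["CASI",      "CASI",      "web",        "web",        "paper"],
})

# setting effects: compare CASI vs web only (paper-and-pencil excluded)
setting_sample = df[df["mode"].isin(["CASI", "web"])]

# mode effects: extension survey only (proctored classroom setting excluded)
mode_sample = df[df["setting"] == "individual"]

print(len(setting_sample), len(mode_sample))  # 4 3
```

Because each comparison varies only one factor (setting or mode) while holding the other fixed, the two effects cannot contaminate each other.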

Table 1: Sample size and structure of the TREE2 baseline survey

                          AES                          Extension survey 1)                    Total
Survey setting:           Proctored classroom survey   Unproctored individualised setting
Survey mode:              CASI                         Web survey      P&P questionnaire
(Sub-)sample size 2):     11124 3)                     4484            635                   16243

1) Including 89 incomplete questionnaires (with data for some scales only), which are treated as nonresponses when it comes to response statistics and the published sample weights (see also FN 10). 2) The number of cases for particular scales will generally be lower due to non-imputable missing values. 3) Background sample split (cf. Figure 1).


These considerations do not affect the calculation of any of the scales administered in the general questionnaire and the AES maths module, as these scales do not rely on the extension survey. For calculations based on the general questionnaire, we can draw on data of the complete AES sample (approx. 22 000 students) and, for calculations based on the AES maths module, on the subsample to which the maths module was administered (approx. 11 000 students; cf. Figure 1). To ensure a statistically efficient estimate, the scaling models generally draw on the entire available sample base, including cases which, for various reasons, are not included in the scientific use files of the TREE2 dataset (Hupka-Brunner et al. 2021).11

Table 2: Breakdown of estimation samples by survey language 1)

Scales implemented in:           General questionnaire   Background module    Math module
Available estimation sample:     Full AES sample         Baseline survey 2)   Math subsample

Survey language:
  German                         16349                   11698                8106
  French                         5235                    3927                 2646
  Italian                        755                     618                  379

1) Number of cases for specific scales will in general be lower due to non-imputable missing values. 2) Cf. Table 1.

In a survey administered in several languages, we also have to be careful regarding measurement invariance across survey languages (in our case German, French and Italian), which concerns all scales administered.12 Basically, variance across languages can be the result of ‘real’ cultural or linguistic differences between language regions but also of inaccurate translations. That is why we report language-specific invariance tests and parameters (section 4 and appendix). As Table 2 reveals, sample size varies substantially across survey languages.

11 Data users who wish to estimate or replicate scaling models drawing on the complete database may do so. As the data excluded from the published data files are highly confidential, however, this is possible only on the premises of the study’s headquarters in Bern and using a specially protected computer workstation.

12 In the AES, the survey language is identical with the teaching language of the sampled schools. In the extension survey, respondents were able to choose the survey language. In a few cases, the extension survey was therefore not completed in the same (national) language as the AES survey.
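The formal invariance tests reported in the appendix rest on the multi-group factor models of section 3. As a crude first screen, data users can also compare simple descriptive statistics, such as internal-consistency coefficients, across the language groups. A sketch with simulated, purely hypothetical data:

```python
import numpy as np

def alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items matrix."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(0)
alphas = {}
# simulate one 3-item scale with identical loadings in every language group,
# with group sizes loosely mirroring the German/French/Italian imbalance
for lang, n in {"German": 200, "French": 120, "Italian": 60}.items():
    latent = rng.normal(size=(n, 1))                      # common factor
    items = latent + rng.normal(scale=0.7, size=(n, 3))   # equal loadings/noise
    alphas[lang] = alpha(items)
print({k: round(v, 2) for k, v in alphas.items()})
```

Comparable coefficients across groups are consistent with invariance but do not establish it; only the multi-group model tests reported in the appendix do.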


2 Selection and Adaptation of Scales

The AES questionnaire incorporated a broad range of more than 90 item-based instruments from relevant research areas (for theoretical considerations regarding the selection of instruments, see Hupka-Brunner et al. [2015] and Hascher et al. [2015]). As a general rule, preference was given to well-established instruments validated across disciplines and used in surveys both in Switzerland and abroad.

A first selection of instruments was thoroughly pretested in the year preceding the main survey (2015).13 One important objective of the pretest was to assess the measurement properties of the preliminary selection of questionnaire instruments and scales in the Swiss context. This included assessments of the dimensionality, reliability and cross-language measurement invariance of the scales. Some of the scales had to be newly translated to make them available in all survey languages. In these cases, the pretest was used to check measurement invariance across language versions and to improve inadequate translations. Moreover, the pretest was used to clean up scales with poorly performing items, to shorten others and, lastly, to narrow down and optimise the selection of instruments for the main survey. We shortened many scales to three or four items to ensure comprehensive coverage of relevant concepts without unduly increasing response burden and interview duration.

Wherever possible, the original instruments were implemented without modification in order to preserve the measurement properties of the selected scales and to maximise data comparability. However, given the multitude of aspects to be considered in questionnaire construction (Dillman, Smyth & Christian, 2014), slight adaptations of the original instruments often could not be avoided.14

13 The main objective of the pretest was to improve the assessment of mathematical skills, the design of the student questionnaire and the fieldwork for the main survey. The pretest sample was split evenly across the three test languages, German, French and Italian, and included more than 2000 students from 70 schools.


Table 3 conveys a topically ordered overview of all scales and item-based instruments that were implemented in the AES main field. The ‘Positive Attitude towards Life’ scale was administered in the extension survey only. In a few cases, several scales partly rely on the same items. Consequently, they should not be introduced in one and the same multivariate model. Apart from scales involving main and sub-dimensions, the scales in question are framed by a dotted line in Table 3.

To enable comparative analyses between TREE1 and TREE2, the range of implemented instruments also includes some original scales used in the PISA 2000 survey, the baseline survey of the first TREE cohort (TREE1). For some of these scales (family wealth, social and cultural communication within the family), we implemented both the original version already used in PISA 2000 and an adapted version that was optimised for TREE2. The former is preferable for comparative analyses of both cohorts, the latter for analyses of the second cohort only.

Table 3: Item-based scales and composites (without scales for subdimensions)

Survey topic / Scale or composite | AES questionnaire module 1) | Source 2)

Family background
  Family climate
    Emotional closeness to parents | Background module | TREE1, based on Szydlik, 2008
    Parental pressure to achieve | Background module | Böhm-Kasper et al., 2000
    Parents' achievement expectations | Math module | Hascher et al., 2019
    Mother's achievement expectations | Math module | Hascher et al., 2019
    Father's achievement expectations | Math module | Hascher et al., 2019
    Mother's social norms about mathematics | Math module | PISA 2012
    Father's social norms about mathematics | Math module | PISA 2012
    Family educational support (PISA2000) 3) | Background module | PISA 2000
    Social communication (PISA2000) 3) | Background module | PISA 2000
    Social communication (adapted TREE2) | Background module | PISA 2000 (adapted TREE2)

Social, cultural & economic resources
  Social capital (own)
    Perceived social network support | Background module | TREE2 (BHPS, ISSP 2003)
  Cultural capital (family of origin)
    Parents: reading interest | Background module | TREE2
    Cultural communication (PISA2000) 3) | Background module | PISA 2000
    Cultural communication (adapted TREE2) | Background module | PISA 2000 (adapted TREE2)
    Household possessions: classical culture (PISA2000) 3) | Background module | PISA 2000
  Cultural capital (own)
    Embodied cultural capital | Background module | TREE2
    Cultural activities 4) | Background module | PISA 2000 (partially adapted)

1) Database by module: general → full AES sample; background module → TREE2 baseline sample; math module → AES math sample split. 2) See technical appendix for a detailed list of sources. 3) Scales administered in the surveys of the first TREE cohort (TREE1). 4) A subscale of this scale has been adopted as is from PISA 2000 / TREE1 (cf. Table 4).


Table 3 (continued): Item-based scales and composites

Survey topic / Scale or composite | AES questionnaire module 1) | Source 2)

Social, cultural & economic resources (continued)
  Economic capital (family of origin)
    Household possessions: family wealth (PISA2000) 3) | Background module | PISA 2000
    Household possessions: family wealth (adapted TREE2) | Background module | PISA 2000 (adapted TREE2)
    Family affluence scale (FASIII) | Background module | Hobza et al., 2017

Satisfaction and well-being
  Satisfaction
    Capabilities | Background module | Sen, 1985; Anand & van Hees, 2006
  Well-being
    Positive attitude towards school | General questionnaire | Hascher, 2004
    Enjoyment in school | General questionnaire | Hascher, 2004
    Physical complaints in school | General questionnaire | Hascher, 2004
    Worries about school | General questionnaire | Hascher, 2004
    Social problems in school | General questionnaire | Hascher, 2004
    School reluctance | General questionnaire | Hagenauer & Hascher, 2012 (modified)

Non-cognitive factors
  Motivational concepts
    Intrinsic achievement motivation | General questionnaire | IGLU 2001
    Extrinsic achievement motivation | General questionnaire | IGLU 2001
    Instrumental learning motivation (PISA2000) 3) | General questionnaire | PISA 2000
    Interest in reading (PISA2000) 3) | General questionnaire | PISA 2000
    ICT interest | Math module | ICILS 2013
    Dispositional interest | Math module | COACTIV 2008
    Identified motivation (mathematics) | Math module | PISA 2012
    External motivation regulation | Math module | Ryan & Connell, 1989
    Classroom participation | Math module | Eder, 1995, 2007
    Performance-approach goals (SELLMO) | Math module | SELLMO 2012
    Learning goal orientation (SELLMO) | Math module | SELLMO 2012
    Work avoidance (SELLMO) | Math module | SELLMO 2012
    Avoidance performance goals (SELLMO) | Math module | SELLMO 2012
  Self-perception
    Global self-esteem | Background module | Rosenberg, 1979
    General perceived self-efficacy scale (GSES) | Background module | GSES (adapted TREE1)
    Academic self-efficacy | General questionnaire | Hascher, 2004


Table 3 (continued): Item-based scales and composites

Survey topic / Scale or composite | AES questionnaire module 1) | Source 2)

Non-cognitive factors (continued)
  Emotions related to maths classes
    Mathematics anxiety | Math module | PISA 2012
    Mathematics boredom | Math module | AEQ-M (short version)
    Mathematics anger | Math module | AEQ-M (short version)
    Mathematics enjoyment | Math module | AEQ-M (short version)
  Volitional strategies
    Perseverance | General questionnaire | PISA 2012
    Effort: learning (PISA2000) 3) | Background module | PISA 2000
  Personality characteristics
    Big five: extraversion | Background module | Rammstedt et al., 2014
    Big five: agreeableness | Background module | Rammstedt et al., 2014
    Big five: conscientiousness | Background module | Rammstedt et al., 2014
    Big five: neuroticism | Background module | Rammstedt et al., 2014
    Big five: openness | Background module | Rammstedt et al., 2014
    Internal locus of control | Background module | GESIS (short version)
    External locus of control | Background module | GESIS (short version)
  Values & attitudes
    Work-related extrinsic value | Background module | TREE1, based on Watermann, 2000
    Work-related intrinsic value | Background module | TREE1, based on Watermann, 2000
    Family value | Background module | TREE1
    Positive attitude towards life | AES extension survey | TREE1; Grob et al., 1991
  Attitudes related to mathematics classes
    Reality-based learning | Math module | Girnat, 2015, 2017
    Discovery / exploratory learning | Math module | Girnat, 2015, 2017
    Social learning | Math module | Girnat, 2015, 2017
    Instructivist learning | Math module | Girnat, 2015, 2017
    System aspect | Math module | Girnat, 2015, 2017
    Scheme aspect | Math module | Girnat, 2015, 2017
    Application aspect | Math module | Girnat, 2015, 2017

Education and training
  Characteristics of maths lessons (end of lower secondary education)
    Teacher: cognitive activation | Math module | COACTIV 2008
    Teacher: classroom management | Math module | COACTIV 2008
    Teacher: individual learning support | Math module | COACTIV 2008
    Teacher: instruction quality | Math module | PISA 2006
    Situational interest | Math module | COACTIV 2008
    Perceived autonomy support | Math module | Seidel, Prenzel & Kobarg, 2005
    Perceived competence support | Math module | Seidel, Prenzel & Kobarg, 2005
    Perceived social relatedness | Math module | Seidel, Prenzel & Kobarg, 2005
    Classmates' appreciation of mathematics | Math module | PISA 2012
  Absenteeism / intention to change education
    Absenteeism / truancy 3) | General questionnaire | PISA 2000, PISA 2012

1) Database by module: general → full AES sample; background module → TREE2 baseline sample; math module → AES math sample split. 2) See technical appendix for a detailed list of sources. 3) Scales administered in the surveys of the first TREE cohort (TREE1).


In principle, all scales listed in Table 3 are one-dimensional, that is, they have been designed to measure one theoretical construct or latent dimension each.15 However, some of the scales are composed of several sub-dimensions, each representing a facet of one overarching construct. As researchers may wish to distinguish between the sub-dimensions of these scales, the scientific use files of TREE2 also include student scores for each sub-dimension. The following table lists both the main and sub-dimensions of the scales in question.

Table 4: Scales with sub-dimensions

Scale (main dimension) [variable name 1)] and sub-dimensions [variable name 1)]

Background module scales
Global self-esteem 2) [sel_fs]
  - Positive global self-esteem 3) [sele_fs]
  - Negative global self-esteem / depression 3) [seld_fs]
Embodied cultural capital [inccap_fs]
  - Embodied cultural capital: manners [manners_fs]
  - Embodied cultural capital: verbal skills [verbskill_fs]
Cultural activities [cult_fs]
  - "Lowbrow" cultural activities [cultlow_fs]
  - "Highbrow" cultural activities (PISA2000) 4) [culthigh_fs]

Math module scales
Parents’ achievement expectations [expectp_fs]
  - Mother's achievement expectations [expectm_fs]
  - Father's achievement expectations [expectf_fs]
Instructivist learning [instreplearn_fs]
  - Instructivist learning: teacher's instructions [instrlearn_fs]
  - Instructivist learning: repetitive practice [replearn_fs]
Social learning [soccomlearn_fs]
  - Social learning: social arrangement [soclearn_fs]
  - Social learning: communication [comlearn_fs]
System aspect [sysformasp_fs]
  - System aspect: logical thinking [systasp_fs]
  - System aspect: formalism [formasp_fs]
Teacher: cognitive activation 5) [cogself_fs]
  - Cognitive activation: finding solutions & arguing [cogself1_fs]
  - Cognitive activation: strategies and learning from mistakes [cogself2_fs]

1) The short names of the student score variables in the TREE2 scientific use file are given in brackets. 2) In accordance with Huang et al. (2012) and Donnellan et al. (2016), this scale is clearly two-dimensional in the TREE2 baseline survey. 3) Sub-dimension labels according to Huang et al. (2012). 4) Corresponds to the ‘Cultactv’ scale in PISA 2000/TREE1. 5) As this scale is not one-dimensional in the AES survey, we distinguish two (inductively optimised) sub-dimensions.


Some of the instruments described in this documentation are based on two items only, making it impossible to fit any scaling model to the data. Henceforward, we call scores derived from such mostly short, item-based instruments item-based composites (for an overview, see Table 5).16 In the case of the ‘Family affluence scale’ in Table 5, the term ‘scale’ is a misnomer, as it de facto represents a sum score, i.e. an item-based composite (for details, see Hobza et al., 2017).17

Table 5: Item-based composites

Concept 1)                           Dimension                       Variable name 2)   Number of items
Big Five Inventory                   Extraversion                    [big5_e_comp]      2
                                     Agreeableness                   [big5_a_comp]      3 3)
                                     Conscientiousness               [big5_c_comp]      2
                                     Neuroticism                     [big5_n_comp]      2
                                     Openness                        [big5_o_comp]      2
Locus of control                     Internal locus of control       [loci_comp]        2
                                     External locus of control       [loce_comp]        2
Effort: learning (PISA2000) 4)                                       [effper_comp]      2
Family values                                                        [vafa_comp]        2
Parents: reading interest                                            [joyreadp_comp]    2
Emotional closeness to parents                                       [closep_comp]      2
Family affluence scale (FASIII) 17                                   [fasIII_comp]      6

1) With the exception of 'Effort: learning' (general questionnaire, full sample), all composites belong to the background module. 2) The short variable names of the composite scores in the scientific use file are reported in brackets. 3) For the composite with one extra item, see Rammstedt and John (2007: 210). 4) This composite has been previously administered in the surveys of the first TREE cohort (TREE1).

16 For item composites, student scores are calculated from imputed item ratings (cf. 3.1.1b).

17 Note that this composite partly draws on the same items as the wealth scales in Table 3.
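To illustrate how such an item-based composite might be computed, the following is a minimal sketch only: it assumes a simple unweighted mean of the (already imputed) item ratings, which this documentation does not prescribe for every composite (the FASIII, for instance, is described as a sum score). The example ratings and the respondent data are hypothetical.

```python
from statistics import mean

def composite_score(item_ratings):
    """Item-based composite as the mean of the (already imputed) item ratings.

    `item_ratings` holds one respondent's numeric ratings; imputation is
    assumed to have been done beforehand (cf. section 3.1.1b), so no
    values are missing.
    """
    return mean(item_ratings)

# Hypothetical two-item 'internal locus of control' ratings on a 1-4 scale:
loci_comp = composite_score([3, 4])
print(loci_comp)  # 3.5
```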


3 Statistical Modelling

As mentioned above, the scales in the AES questionnaire are item-based instruments intended to measure one theoretical construct each. Confirmatory factor analysis (CFA) is a common approach to the empirical estimation of latent (i.e., not directly observable) characteristics captured by such measurement instruments (see, e.g., Long, 1983; Schmitt, 2011). As our selection of scales is restricted to validated instruments that were designed to measure a common latent dimension, we limit ourselves to fitting a straightforward one-dimensional CFA model (see Figure 2 and Aichholzer, 2017: 80–84) to each scale-specific item set. The CFA model illustrated in Figure 2 relies on n items (i₁, i₂, …, iₙ) with associated item-level measurement errors ε₁, ε₂, …, εₙ, which all measure the same latent dimension ξ. For scales with several subdimensions (see Table 4 above), a separate CFA model is fitted to each subdimension.18
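To make the one-dimensional model concrete, the sketch below fits a one-factor solution to a small correlation matrix and derives regression factor scores. Note the caveat: it uses iterated principal-axis factoring as a simplified stand-in for the maximum likelihood estimation described in section 3.1.1, so it illustrates the model structure, not the procedure actually applied to the TREE2 data.

```python
import numpy as np

def one_factor_pa(R, n_iter=500, tol=1e-10):
    """One-factor solution via iterated principal-axis factoring.

    Approximates loadings lambda such that R ~ lambda*lambda' + diag(psi),
    i.e. the one-dimensional CFA structure of Figure 2.
    """
    R = np.asarray(R, dtype=float)
    # initial communalities: largest absolute off-diagonal correlation per row
    h2 = np.max(np.abs(R - np.eye(len(R))), axis=1)
    for _ in range(n_iter):
        Rs = R.copy()
        np.fill_diagonal(Rs, h2)            # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rs)     # eigenvalues in ascending order
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        if lam.sum() < 0:                   # resolve sign indeterminacy
            lam = -lam
        converged = np.max(np.abs(lam**2 - h2)) < tol
        h2 = lam**2                         # updated communalities
        if converged:
            break
    return lam

def regression_scores(Z, R, lam):
    """Regression factor scores for standardised item data Z: Z @ R^-1 @ lam."""
    return Z @ np.linalg.solve(R, lam)

# Correlation matrix implied by true loadings (.8, .7, .6):
R = np.array([[1.00, 0.56, 0.48],
              [0.56, 1.00, 0.42],
              [0.48, 0.42, 1.00]])
print(one_factor_pa(R).round(3))  # ~ [0.8 0.7 0.6]
```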

Figure 2: One-dimensional confirmatory factor model

[Path diagram: the indicators i₁, i₂, …, iₙ each load on the single latent factor ξ and carry an associated error term ε₁, ε₂, …, εₙ.]

For every model estimated hereafter, selected model parameters, fit statistics and scale quality measures are reported in the technical appendix (p. 34ff.). This includes a test of one-dimensionality, various measures of internal scale consistency as well


dimensional) models. It is up to the data user to judge whether the one-dimensional CFA models are appropriate and whether the scales have the required properties.

3.1 Estimation of the confirmatory factor models

In its standard form, structural equation modelling (including CFA as a special case) relies on a number of quite restrictive assumptions that are hardly ever met in practice. Basically, the observations should be independent, and the indicators should be measured on a continuous scale (interval-level measurement) and follow a multivariate normal distribution (see, e.g., Hoyle, 2000). As regards the database of the AES and the TREE2 baseline survey, none of these assumptions holds: The two-stage sampling procedure implies that observations are clustered within schools (see Verner & Helbling, 2019) and hence are not independent. Moreover, measurement of the indicators is at ordinal (or binary) level, as it mostly relies on Likert-type rating scales. And last but not least, the skewed univariate distributions of many ratings are hardly consistent with the required multivariate normality.

The methodological literature offers a wide range of suggestions on how to relax some of the assumptions of the standard SEM model and how to deal with ordinal, binary or skewed indicators and clustered observations (cf., e.g., Bryant & Jöreskog, 2016).19 In particular, the suggestions include two-stage estimation methods that exploit polychoric correlations and generalised structural equation models (GSEM) that are suited for short response scales and categorical indicators (Rhemtulla, Brosseau-Liard & Savalei, 2012; Bryant & Jöreskog, 2016). However, there is currently no well-established, generally accepted estimation approach tailored to both ordinal indicators that are not normally distributed and a complex sample with clustered observations.

We therefore follow the recommendations of Rhemtulla et al. (2012; similarly Harpe, 2015: 843) regarding the accurate estimation of CFA models on the basis of ordinal, Likert-type indicators. They suggest two different estimation strategies depending on the length of the rating scales. For item responses that rely on a rating scale with at least five points (i.e., ordered discrete response categories), they suggest a two-step estimation based on polychoric correlations. For item evaluations that rely on shorter rating scales with four or fewer points, a generalised structural equation model (GSEM) is in order. Below, we describe these estimation strategies in more detail.20 As our primary goal is to estimate accurate student scores, we also implement some sensitivity checks to assess the equivalence of student scores obtained via alternative model-estimation strategies (see section 3.2.1).

19 Clustered observations may not only affect variance estimation and model fit but also bias the estimation of model parameters (i.e., factor loadings; cf. Stochl et al., 2016; Muthén & Satorra, 1995; Wu & Kwok, 2012).

3.1.1 Two-step estimation based on polychoric inter-item correlations

The two-step approach starts with the estimation of a matrix of polychoric correlations between all items of a given scale (tetrachoric correlations, respectively, in the case of dichotomous items).21 In the second step, maximum likelihood estimation is used to fit the one-dimensional CFA model from Figure 2 to the resulting correlation matrix.22 The models are identified by setting the loading of the first item and the variance of the latent factor to one. The CFA models are also estimated separately for each of the three language subsamples. This allows for multi-group analysis designed to test and assess measurement invariance across the survey languages (see section 4 and, e.g., Steinmetz et al., 2008; Milfont & Fischer, 2015).
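The flavour of the first step can be illustrated for the binary (tetrachoric) case. The sketch below uses the classic cosine approximation to the tetrachoric correlation and a simple quantile-based threshold estimate; the actual estimation relies on full maximum likelihood, so this is illustrative only, and the cross-table counts are hypothetical.

```python
from math import cos, pi, sqrt
from statistics import NormalDist

def threshold(p_low):
    """Threshold of a dichotomised standard-normal variable:
    the normal quantile of the share of 'low' responses."""
    return NormalDist().inv_cdf(p_low)

def tetrachoric_approx(a, b, c, d):
    """Classic cosine approximation to the tetrachoric correlation for a
    2x2 table [[a, b], [c, d]], where a and d are the concordant cells.
    Independence (ad = bc) yields 0; strong concordance approaches 1."""
    return cos(pi / (1.0 + sqrt((a * d) / (b * c))))

# Hypothetical cross-table of two binary items:
print(round(tetrachoric_approx(40, 10, 10, 40), 2))  # ~ 0.81
print(round(threshold(0.5), 2))                      # 0.0 for a 50/50 split
```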

Below, we briefly describe how we deal with (a) the complex AES sample and (b) missing item values in the context of the two-step estimation approach.

(a) Complex sample design and survey weighting

The AES survey relies on a random sample of students that was disproportionally stratified by cantons and type of cantonal curriculum (Verner & Helbling, 2019).23 Furthermore, the samples analysed here are also affected by sample attrition. An unbiased estimation of any population characteristic therefore requires the application of an appropriate survey weight to account for the disproportional sampling design as well as for unit nonresponse. This also pertains to the estimation of polychoric correlations or the parameters of the CFA models to be estimated (e.g., factor loadings).24


When estimating the polychoric correlations, we therefore use one out of three different survey weights, depending on whether a given scale is embedded in the background module, in the maths module or in the general questionnaire. For the scales from the latter two, we rely on the suitable AES weights.25 With regard to AES, module-specific analyses require particular weights, as the sampling design of the randomised sample split for the distinct questionnaire modules (according to Figure 1) differs with respect to the shape of disproportional cantonal stratification.26 On the basis of the module-specific AES weights, we have constructed an additional weight for the TREE2 baseline survey, which accounts not only for the AES sampling design and nonresponse but also for sample attrition in the extension survey.27
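The construction of the module-specific weights (general survey weight multiplied by the inverse of the within-canton subsampling fraction, as described in footnote 26) can be sketched as follows; the numeric values are hypothetical.

```python
def module_weight(general_weight, subsampling_fraction):
    """Module-specific design weight: the general AES survey weight
    multiplied by the inverse of the within-canton subsampling fraction
    of the module split (cf. footnote 26)."""
    if not 0 < subsampling_fraction <= 1:
        raise ValueError("subsampling fraction must be in (0, 1]")
    return general_weight / subsampling_fraction

# Hypothetical student: general weight 3.0, and half of the canton's
# sample allocated to the maths-module split:
print(module_weight(3.0, 0.5))  # 6.0
```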

As regards the two-step estimation approach, it should be noted that variance estimation does not account for the clustering of observations within schools implied in the two-stage sampling (see Verner & Helbling, 2019).

(b) Handling of missing item values

Missing item values are not a major problem affecting the scales in the AES survey. As usual in surveys, however, there is a small share of missing item values, owing mainly to item non-response. With the exceptions mentioned below, the share of cases with missing information on at least one item of the scale does not exceed 5%. For two out of three scales, the percentage is below 1%.

A considerably higher share of missing values results for half of the items of each of the four scales that measure different facets of ‘specific self-efficacy’ in mathematics.

This is a direct consequence of the questionnaire design (and therefore not a matter of methodological concern28), as half of the items of each of these scales were incorporated into the general questionnaire and the other half into the maths module. This implies that the share of missing item information is close to zero for the general questionnaire, whereas it rises to around 50% for the items implemented in the maths module.

weights would be limited to inflating the variances of the estimates to some degree (Bollen, Tueller & Oberski, 2013). Given the huge AES sample, this would not be too disturbing.

25 We use the respective non-response adjusted weights from the AES scientific use file ('smp_w_nrastubw' for the scales of the general questionnaire and 'smp_w_qmath' for the scales of the maths module).

26 The reason is that the design of the two complementary sample splits has been optimised for two different purposes: The sample split drawn for the background module is designed to maximise statistical power at the national level, whereas the maths module split is optimised for separate analyses of cantons. In a nutshell, this was achieved by developing a disproportional subsampling scheme that further reinforces the general overrepresentation of small cantons among the sample split with the maths module and reduces it among the sample split with the background module. The weights for the sample splits then correspond to the general survey weight from the AES scientific use file ('smp_w_nrastubw') multiplied by the inverse of the within-canton subsampling fraction (see also Verner & Helbling, 2019).

27 For the baseline survey, we use an entropy-balancing weight (cf. Hainmueller, 2012; Hainmueller & Xu, 2013) that compensates for the AES disproportionate sampling design (incl. nonresponse adjustments) and, as far as the maths-sample split is concerned, for the non-response related to willingness to be (re-)contacted and to participate in the extension survey (for details, see the TREE2 documentation on weighting: Sacchi, forthcoming). For the purpose of scaling, the e-balancing weight for the TREE2 baseline survey was re-estimated by taking into account the somewhat looser definition of survey participation employed throughout the scaling process (see Table 1 and the explanatory text).

A relatively high share of missing values is also observed for two measures in which students evaluate the items on a rating scale that includes an explicit 'don't know' option. This pertains to the scale measuring 'perceived social network support' (closupp_fs) and the two-item composite for parents' reading interest (joyreadp_comp). For both instruments, the share of missing information rises to 10.4 and 8.7%, respectively, when explicit don't-know answers are included.29

Finally, there are four instruments containing some items that could not be administered to a minor portion of the sample.30 With one exception, the overall share of cases with at least one missing item does not exceed 5% in these instances.31

These special cases and exceptions notwithstanding, the fraction of missing items is low to very low for the bulk of the scales. Hence, the impact of missing item information is presumably limited.

We applied multiple imputation to cope with missing values when estimating the scaling models (Rubin, 1996; White, Royston & Wood, 2011). Basically, missing item information was imputed, scale by scale, on the basis of all valid items pertaining to the same scale. The imputed samples thus cover all cases with a valid response for at least one of the items of a given scale. Given the ordinal measurement level of the item ratings, we applied chained equations with an ordinal (or, in a few cases, binary) logit link to create samples with imputed values (Royston, 2011). Following the rules of thumb given in White et al. (2011: 388), we set the number of imputations to five.32

28 The randomised allocation of students to questionnaire modules ensures that the missing-at-random assumption (MAR), which is crucial for the imputation of missing values, is almost perfectly met here.


For each imputed dataset, we separately calculated a matrix of polychoric correlations; these matrices were then combined to estimate the CFA models.33
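The combination rule from footnote 33 (Fisher's z-transformation, averaging, back-transformation) can be sketched for a single inter-item correlation; the five correlation values below, one per imputed sample, are hypothetical.

```python
from math import atanh, tanh

def pool_correlations(r_values):
    """Pool one correlation across imputed datasets (cf. footnote 33):
    Fisher z-transform each estimate, average, and transform back."""
    z_mean = sum(atanh(r) for r in r_values) / len(r_values)
    return tanh(z_mean)

# The same inter-item correlation estimated in five imputed samples:
print(round(pool_correlations([0.42, 0.44, 0.43, 0.45, 0.41]), 3))  # ~ 0.43
```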

For each scale-specific CFA model, we calculated statistics and indices describing factor structures, model fit and scale properties (see section 4 and the technical appendix).

3.1.2 Generalised structural equation model for short response scales

If scales rely on item evaluations with short response scales of four or fewer points (including binary items), they were analysed using a generalised structural equation model (GSEM), as recommended in the literature (Rhemtulla, Brosseau-Liard & Savalei, 2012; Bryant & Jöreskog, 2016). Model parameter estimates were derived in one step directly from the microdata through numeric integration.34 Contrary to the two-step approach, this amounts to a full-information, true maximum likelihood method (Bryant & Jöreskog, 2016: 192). We henceforth adopted the GSEM version of a one-dimensional CFA model, mostly with an ordinal logit link to account for the ordinal measurement level of the item sets to be analysed.35
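A minimal sketch of an ordinal-logit measurement model of this kind: given a latent score, a loading and a set of ordered cutpoints, the cumulative-logit link yields the response-category probabilities. The parameter values are hypothetical, and the parameterisation is a generic textbook one, not necessarily Stata's exact internal form.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def category_probs(xi, loading, cutpoints):
    """Response-category probabilities of one ordinal indicator under a
    cumulative (ordinal) logit measurement model:
    P(y <= k | xi) = logistic(tau_k - loading * xi)."""
    cum = [sigmoid(tau - loading * xi) for tau in cutpoints] + [1.0]
    # category probabilities as successive differences of the cumulative ones
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Hypothetical 4-point item (3 cutpoints), latent score xi = 0.5:
p = category_probs(0.5, loading=1.2, cutpoints=[-1.5, 0.0, 1.5])
print([round(q, 3) for q in p])  # four probabilities summing to 1
```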

(a) Accounting for the complex survey design

GSEM, as implemented in Stata, is able to account for complex sample designs. In particular, we used survey weights (as described in 3.1.1a) to obtain unbiased population estimates of the model parameters and applied cluster-robust variance estimation, which controls for the clustering of students within schools. Still, we assume that there is no substantive variation in the measurement model across schools (cf. Wu & Kwok, 2012).

(b) Handling of missing item values

GSEM estimation proceeds on an equation-by-equation basis. In the context of a simple one-dimensional CFA model, this amounts to an implicit treatment (i.e., imputation) of missing item values, as each item is represented by a separate equation.

suggested by White et al. (2011: 387), indicate that the polychoric correlations and other point estimates are highly stable for an even smaller number of imputations.

33 After applying Fisher's z-transformation, we simply average the correlation matrices and transform them back (see also footnote 31).

34 Integration mostly relies on mean–variance Gauss–Hermite quadrature with seven integration points (StataCorp, 2017: 562).

35 The ordinal logit link reduces to a simple logit link for the two scales that include binary items.


One drawback of the GSEM approach is that the calculation of most established statistics to describe model fit and scale properties is not straightforward. This is why we complemented the GSEM estimations for the item sets with short response scales by a separately estimated two-step model, as described in section 3.1.1. If the resulting factor structures and student scores do not substantially differ from those obtained via the GSEM approach, this may be taken as indirect evidence that the two-step approach works sufficiently well and its assumptions are met (in the appendix, we therefore also check for the equivalence of both types of student scores). Hence, the model and scale statistics taken from the two-step CFA model are likely to be valid approximations as well.

3.2 Student scores

3.2.1 Calculation and robustness of student scores

For instruments relying on item rating scales of 5 or more points, the student scores in the scientific use file (and the related descriptive statistics in the appendix) represent regression factor scores (see StataCorp, 2017: 582f. for details) from the two-step CFA models described in section 3.1.1. For scales based on item sets with short response scales (four or fewer categories), the student scores in the SUF are empirical Bayes means based on the GSEM models (ibid.: 566). The variable names assigned to the student scores in the scientific use file are composed of a prefix indicating the survey wave (e.g. 't0' in case of the baseline survey, 't2' for the 2nd follow-up wave), the root of the variable names of the involved items and the suffix '_fs', which is used as a marker for student score variables. The corresponding suffix for the item composites from Table 5 is '_comp'. The variable labels assigned to the student scores and item composites correspond to those contained in the scale-specific documentation in the appendix. For an unequivocal interpretation of the student scores in the TREE2 scientific use file, we recommend inspecting the factor loadings (see section 4). As a


SEM and GSEM) as measured by the coefficient of determination (CD) (see appendix: Equivalence of Scores from Two-Step Approach). If their shared variance is close to 100% (i.e., CD approaches 1), one may safely conclude, first, that the different modelling strategies have a negligible impact on student scores and, second, that it also seems reasonable to take the various fit and scale statistics obtained from two-step estimation as good approximations. As documented scale by scale in the appendix, the coefficient of determination is indeed close to 1 for most scales (> .94 for 42 out of 48 involved scales). There are six exceptions, however, in which the shared variance is substantially lower (between 60 and 90%), thus indicating that some of the additional assumptions needed for the two-step model have probably been violated. This pertains to the scales measuring absenteeism (truancy_fs), family wealth as indicated by home possessions (both scale versions: wealth_fs, wealth_m_fs), cultural activities including one of its subscales (cult_fs, culthigh_fs) and students' maths self-concept (matcon_fs). For these scales, the model and scale statistics reported in the appendix should be interpreted with great caution, if at all. Still, this does not indicate that the student scores estimated via the GSEM approach are biased in any way.
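The shared-variance criterion used in these checks is simply the squared Pearson correlation between two versions of a student score. A self-contained sketch with hypothetical score vectors:

```python
def shared_variance(x, y):
    """Squared Pearson correlation between two versions of a student score,
    i.e. the 'shared variance' (CD) used in the robustness checks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Two hypothetical score versions for the same five students:
two_step = [0.2, -1.1, 0.9, 1.4, -0.4]
gsem     = [0.25, -1.0, 0.85, 1.5, -0.45]
print(round(shared_variance(two_step, gsem), 3))  # close to 1 -> equivalent scores
```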

For an additional robustness check of the student scores, we re-estimated the confirmatory factor models in a single step directly from the student microdata by using the MLMV method (StataCorp, 2017: 574). This allows us to control for the complex survey design through weighting and cluster-robust estimation and, at the same time, to implement an alternative full-information maximum-likelihood approach to account for missing item values.

Let us again look at the shared variances between the student scores obtained via the MLMV method and those via the two-step approach described in section 3.1.1 (see appendix: Equivalence of Scores from Robust MLMV).36 With the exception of the aforementioned wealth scale (both scale versions), the shared variances uniformly exceed 96% (i.e., CD > .96) for all of the 87 scales in this documentation. This can again be taken as indirect evidence that the additional assumptions of the two-step approach regarding multivariate normal distributions and the measurement level are mostly met and, hence, that the statistics and indices derived from it are valid. To sum up, the robustness checks imply that, with the few exceptions mentioned above, student-score estimates are very robust across the three different estimation methods recommended for the type of data analysed here.37

36 A disadvantage of this method is that many statistics to judge model fit and scale qualities are unavailable.

37 This may be due to the fact that we analyse short, one-dimensional scales based on a large sample.


3.2.2 Inclusion of student scores in multivariate statistical models

Instead of using the scale-specific student scores, there are often good reasons to embed scale-specific CFA models into a more comprehensive structural equation model of substantive interest and to fit them all together in one step (cf., e.g., Aichholzer, 2017). It should be noted, however, that simultaneous estimation of both the measurement and the substantive part of a structural equation model is not necessarily always the best choice (cf. Devlieger & Rosseel, 2017): When one analyses a subsample of limited size, for instance, robust estimation of more complex models may be impossible. Moreover, even when the sample is large, misspecification bias in one part of a complex model may spread to other parts when they are fitted in a single step. A two-step approach employing previously estimated factor scores to investigate the substantive part of the model may have methodological merits in this respect (ibid.). This approach also has methodological drawbacks, however, basically because it implicitly treats factor scores as error-free measures of the latent dimensions to be analysed.38 Some of the resulting problems, possible biases and correction methods are discussed, for example, by Croon (2002), Lu and Thomas (2008), Jin et al. (2016), and Devlieger and Rosseel (2017).


4 Scale-specific reporting: content and interpretation

In this section, we outline the various statistics, indices and quality measures reported in the scale-specific appendix. For each scale (or subscale; cf. Table 4), this report includes two pages with a variety of scale-specific statistics. Below, we take the scale that measures 'Parental pressure to achieve' as an example to illustrate the scope and interpretation of scale-specific results. Figure 3 displays the results for this scale as they appear in the appendix. If nothing else is mentioned, all reported results refer to the two-step estimation of the CFA model according to Figure 2. However, the student-score descriptives refer to the scores obtained from the GSEM model, as the 'press' items are rated on a four-point scale (see section 3.2.1). The header of each scale-specific results section includes the name of the scale that is also used to label the related student-score variable in the scientific use file. Furthermore, the headers specify the sample basis on which the calculations for the respective scales draw (baseline survey, full AES sample or maths sample split).

The model and fit statistics reported include two likelihood-ratio tests as well as various common goodness-of-fit statistics, as discussed in the SEM literature (cf. Schreiber et al., 2006). The likelihood-ratio tests compare the current against the saturated model and the baseline model (basically postulating uncorrelated items), respectively. Ideally, we would expect a non-significant likelihood-ratio test of the current against the saturated model, which, for the reasons given above, is an unlikely result, however (see also van der Eijk & Rose, 2015). Moreover, for a well-fitting model, we expect the comparative fit index (CFI) and the Tucker–Lewis index (TLI) to approach 1, whereas the root mean square error of approximation (RMSEA) and the standardised root mean squared residual (SRMR) should be close to 0. Conventional cut-off criteria indicating a good fit between the hypothesised model and the observed data are ≥ .95 for CFI and TLI, ≤ .06 for RMSEA and ≤ .08 for SRMR (see Hu & Bentler, 1999). Regarding Figure 3, one could tentatively conclude that the one-dimensional CFA model fits the achievement-pressure scale sufficiently well, with some reservations regarding RMSEA and TLI, however. Two fit measures designed to compare different models, Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), are also reported. They may serve as a point of reference if data users wish to fit alternative scaling models to the data. Finally, the coefficient of determination (CD) may be considered as an alternative measure of composite reliability (in the sense of internal consistency; cf. Bollen, 1989: 220f.), to be interpreted similarly to the reliability measures below.
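The textbook formulas behind CFI, TLI and RMSEA can be sketched from the model and baseline chi-square statistics. The input values below are hypothetical, and the formulas are the standard ones from the SEM literature, not output reproduced from the appendix.

```python
def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """Common SEM fit indices from the current model ('m') and the
    baseline model ('b') chi-square statistics (textbook formulas)."""
    d_m = max(chi2_m - df_m, 0.0)   # non-centrality of the current model
    d_b = max(chi2_b - df_b, 0.0)   # non-centrality of the baseline model
    cfi = 1.0 - d_m / max(d_b, d_m, 1e-12)
    tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)
    rmsea = (d_m / (df_m * (n - 1))) ** 0.5
    return {"CFI": round(cfi, 3), "TLI": round(tli, 3), "RMSEA": round(rmsea, 3)}

# Hypothetical scale: model chi2 = 35 on 5 df, baseline chi2 = 1200 on 10 df,
# n = 2000 students:
print(fit_indices(35.0, 5, 1200.0, 10, 2000))
# CFI/TLI near 1 and RMSEA around .055 -> acceptable by the cutoffs above
```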


Figure 3: Example of the reported scale-specific results (initial results page)


1989: 209ff.). We report three alternative measures of internal scale consistency: Cronbach's Alpha is still the most widespread, although much criticised, consistency measure (ibid.: 217; Sijtsma, 2009; Revelle & Zinbarg, 2009; Trizano-Hermosilla & Alvarado, 2016). In a nutshell, it is widely recognised that alpha underestimates internal consistency if the indicators are ordinal or congeneric (i.e., not tau-equivalent), as is typical of most practical research situations. We nevertheless do report the classical version of alpha, as it is part of most survey documentations and, if interpreted as a lower-bound estimate of internal scale consistency, may still be useful for comparative purposes.39 In addition, we also report Ordinal Cronbach's Alpha, which is calculated the same way as classical alpha but from the matrix of polychoric instead of Pearson correlations (see Gadermann, Guhn & Zumbo, 2012: 5). This avoids downward bias owing to ordinal measurement. Finally, we also report McDonald's Omega, which is one of the most recommended measures of internal consistency. Omega is calculated on the basis of the factor loadings of the one-dimensional CFA model (according to formula 1 in Trizano-Hermosilla & Alvarado, 2016), which implies that it is adjusted for ordinal measurement. As omega is appropriate for congeneric indicators, it is probably the most adequate overall measure of internal scale consistency in our context (see also Yang & Green, 2015). Basically, for all three measures, values close to 1 indicate high internal consistency. Looking at Figure 3, many researchers would probably interpret the identical ordinal alpha and omega values of .810 each as an indication of a 'good', consistent scale. It should be noted, however, that the widely used rules of thumb to determine whether internal scale consistency can be considered 'acceptable' or 'good' (usually values above .7 and .8, respectively) are not without problems. First, there exist various such rules of thumb with different critical thresholds. Second, and more importantly, such rules should not be applied blindly, as the acceptable level of internal consistency depends strongly on the type of analysis to be performed (Lance, Butts & Michels, 2006).40
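The two key consistency measures can be sketched directly: classical alpha from the item and total-score variances, and omega from the standardised loadings of a one-factor model (following formula 1 in Trizano-Hermosilla & Alvarado, 2016). The input values below are hypothetical.

```python
def cronbach_alpha(item_vars, total_var, k):
    """Classical Cronbach's alpha for a k-item scale, computed from the
    item variances and the variance of the total score."""
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

def mcdonald_omega(loadings):
    """McDonald's omega for a one-factor model with standardised loadings:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of uniquenesses)."""
    s = sum(loadings)
    unique = sum(1.0 - l * l for l in loadings)
    return s * s / (s * s + unique)

# Hypothetical 4-item scale with standardised loadings around .7:
print(round(mcdonald_omega([0.7, 0.75, 0.65, 0.7]), 3))  # ~ 0.79
```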

A crucial assumption of the estimated CFA models is that the analysed item set captures only one latent construct. Therefore, we have also included a test of the assumed one-dimensionality. However, assessing the dimensionality of Likert-type items is quite a "risky business", as van der Eijk and Rose (2015) put it. We used explorative factor analysis of polychoric correlations followed by Horn's parallel analysis to assess the dimensionality of the item sets, which proves to be a comparatively well-performing method (ibid.; Garrido, Abad & Ponsoda, 2013).41 Basically, we applied an eigenvalue criterion that was corrected for random factors to account for sampling variance to determine the number of factors to be retained. In Figure 3, this approach gives us no reason to believe that the achievement-pressure scale is not one-dimensional, as only the eigenvalue of the first factor exceeds the critical value of zero. If we leave aside the scales composed of several sub-dimensions (cf. Table 4), the eigenvalues of the second factor are mostly below or only very slightly above zero for most of the scales in this documentation.42 This being the case, we have no clear indication that the one-dimensionality assumption is violated.

39 The Stata package "Alphawgt", which allows for weights, was used to calculate alpha (Jann, 2004).

40 There are some rather dubious rules of thumb that distinguish different levels of internal scale consistency (i.e., of Cronbach's alpha). A popular variant is: α < .5: unacceptable; .5 ≤ α < .6: poor; .6 ≤ α < .7: questionable; .7 ≤ α < .8: acceptable; .8 ≤ α < .9: good; .9 ≤ α: excellent (cf. https://en.wikipedia.org/wiki/Internal_consistency, accessed on June 23, 2020).
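Horn's parallel analysis, as the retention criterion described here, compares the observed eigenvalues with the mean eigenvalues obtained from random data of the same dimensions; a factor is retained when its random-corrected eigenvalue exceeds zero. A minimal sketch on simulated one-factor data (using Pearson rather than polychoric correlations, for simplicity):

```python
import numpy as np

rng = np.random.default_rng(1)

def parallel_adjusted_eigs(X, n_sims=50):
    """Horn's parallel analysis (sketch): eigenvalues of the observed
    correlation matrix minus the mean eigenvalues of correlation matrices
    computed from random normal data of the same shape."""
    n, k = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    sims = np.zeros(k)
    for _ in range(n_sims):
        Z = rng.standard_normal((n, k))
        sims += np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    return obs - sims / n_sims

# Simulated one-factor data: 500 students, 4 items:
f = rng.standard_normal((500, 1))
X = 0.7 * f + 0.7 * rng.standard_normal((500, 4))
adj = parallel_adjusted_eigs(X)
print(adj.round(2))  # only the first adjusted eigenvalue clearly exceeds zero
```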

The section below the model-fit statistics in Figure 3 documents the standardised factor loadings for each item, including standard errors and confidence intervals. The item names correspond to those in the scientific use file (without the prefix marker for the survey wave). High standardised loadings above, say, .6 or .7 indicate that neither measurement errors nor strong unique factors contribute excessively to the variance of the observed indicators. Almost all loadings reported in the appended scales reach this level. Occasionally, however, items show noticeably weaker loadings below .5 or even below .4, which some researchers may consider problematic. Ultimately, the definition of an acceptable factor loading remains arbitrary and depends on the type of analysis, the number of scale items affected and the quality as well as the overall internal consistency of the scale (ibid.). As in other respects, we prefer to leave it to the data users to judge a particular scale's qualities.

To the right of the loadings, a number of item descriptives are reported, including the mean, the standard deviation, the range of the rating scale applied for item evaluation (min., max.) and the number of students with valid item data (see section 3.1.1b).

At the bottom of the first page of our scale-specific results, we report the parameters of the categorical GSEM model (cf. section 3.1.2) where it is estimated. Note that for this model, there are two types of item-specific parameters, namely, factor coefficients ('coef') that measure the effect of the latent variable on the indicator rating,


section 3.1) where students' item evaluations rely on short rating scales with four or fewer points (as documented by the item descriptives).

A second page of scale-specific results (see Figure 4 below) is dedicated to tests and indices that assess measurement invariance across survey languages and, where appropriate, across survey settings and modes. This is an important facet of measurement quality, as student scores obviously should be comparable – i.e., measure the same concepts on a possibly invariant scale – across all kinds of measurement conditions and subsamples of the underlying student population. We focus on some of the most crucial tests suggested in the literature on the multi-group analysis of measurement invariance (e.g., Vandenberg & Lance, 2000; Milfont & Fischer, 2015) to assess cross-language measurement equivalence. On top of the second results page, we first report a chi-square test of the equality of the item-covariance matrices across survey

Figure 4: Example of the reported scale-specific results (second results page)
