Journal of Psychoeducational Assessment 2014, Vol. 32(4) 342–355

© 2013 SAGE Publications

Reprints and permissions: sagepub.com/journalsPermissions.nav

DOI: 10.1177/0734282913508244

jpa.sagepub.com

Article

Multidimensional Item Response Theory Models in Vocational Interest Measurement: An Illustration Using the AIST-R

Eunike Wetzel¹ and Benedikt Hell¹

Abstract

Vocational interest inventories are commonly analyzed using a unidimensional approach, that is, each subscale is analyzed separately. However, the theories on which these inventories are based often postulate specific relationships between the interest traits. This article presents a multidimensional approach to the analysis of vocational interest data, which takes these relationships into account. Models in the framework of Multidimensional Item Response Theory (MIRT) are explained and applied to a widely used German vocational interest inventory based on the RIASEC model, the AIST-R. MIRT models were more appropriate to describe the data than unidimensional models. It follows that responses to some items were not only influenced by the interest type they were designed to measure but also by another dimension. The advantages of MIRT models are discussed.

Keywords

vocational interests, Multidimensional Item Response Theory (MIRT), RIASEC

Introduction

Theories of vocational interests posit differing numbers of interest dimensions. For example, according to Holland’s popular RIASEC model (Holland, 1959, 1997), six interest types describe vocational interests, namely, Realistic, Investigative, Artistic, Social, Enterprising, and Conventional. Numerous vocational interest inventories exist that assess these interest types (e.g., Self-directed Search; Holland, Fritzsche, & Powell, 1994). Analyses of data from vocational interest inventories usually treat the subscales as separate constructs with each item only assessing one construct. However, the theories on which these inventories are based often postulate specific relationships between the interest dimensions. For example, Holland (1997) theorizes that his RIASEC types can be represented by a hexagon in which spatial distances between interest domains indicate the degree to which they are related (see Figure 1). In the hexagon, adjacent interests (e.g., Realistic and Investigative) are more highly related than alternate interests (e.g., Realistic and Artistic), which are in turn more highly related than opposite interests (e.g., Realistic and Social). The hexagonal (or circumplex; Guttman, 1954) structure of vocational interests based on the RIASEC model has been confirmed empirically (e.g., Rounds, Tracey, & Hubert, 1992). This raises the question whether relationships between interest domains should be taken into account in analyses of vocational interest data.

¹University of Konstanz, Baden-Württemberg, Germany

Corresponding Author:

Eunike Wetzel, Center for Educational Science and Psychology, Eberhard Karls University Tuebingen, Europastraße 6, 72072 Tübingen, Germany.

Email: eunike.wetzel@uni-tuebingen.de

Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-0-263272

Published in: Journal of Psychoeducational Assessment, 32 (2014), 4, pp. 342-355

A promising development for the analysis of data from distinct, but related, latent traits, such as the RIASEC factors, is Multidimensional Item Response Theory (MIRT). MIRT has been used extensively in the ability domain to analyze responses on items that require more than one ability. For example, Walker and Beretvas (2003) compared a one-dimensional model for general mathematical ability with a two-dimensional model (additionally including the ability to communicate in mathematics) for data from a large-scale mathematics test. Their results indicated that the two-dimensional model was more appropriate and, importantly, that inferences about student ability levels differed depending on which model was chosen. In one of the few applications of MIRT outside the ability domain, Osteen (2010) used MIRT to assess the latent factor structure of the Social Work Community of Practice Scale. As no applications of MIRT to the vocational interest domain could be found, it seems important to broaden knowledge here.

The general aim of this article is to demonstrate the application of MIRT to vocational interest measurement. As an example, we analyze a widely used German vocational interest inventory, namely, the General Interest Structure Test (AIST-R; Bergmann & Eder, 2005) using unidimensional and multidimensional models. This also allows us to investigate whether the unidimensional or multidimensional modeling of vocational interest data is more adequate for the AIST-R, although these results may not generalize to other interest inventories. The remainder of this article is structured as follows. First, an introduction to MIRT is given in which the differences between unidimensional and multidimensional Item Response Theory (IRT) will be depicted. Second, the specific IRT models used in this paper’s analyses will be explained briefly. Third, results from unidimensional and multidimensional models for the AIST-R will be compared. Finally, the results and their implications for vocational interest measurement will be discussed.

Unidimensional and Multidimensional IRT

Figure 1. Hexagonal representation of the RIASEC model (corners labeled R, I, A, S, E, and C).

Source. Adapted from Holland (1997, p. 6).

In the framework of IRT, latent traits are related to responses using probabilistic models. These models describe the interaction between persons and items (Reckase, 1997). In the most basic IRT model, the one-parameter logistic model or Rasch model (Rasch, 1960), the probability of a response of 1 (correct response or agreement) is a function of the distance between a person’s trait level and the difficulty of the item. Many extensions of this basic IRT model exist (for an overview, see Embretson & Reise, 2000).
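As a minimal sketch of the Rasch model just described, the probability of a response of 1 can be written as a logistic function of the difference between the person's trait level and the item difficulty. The function name and the example values are illustrative assumptions and are not taken from the AIST-R analyses.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a response of 1 under the Rasch model.

    theta      -- person's trait level (hypothetical value)
    difficulty -- item difficulty (hypothetical value)
    """
    # Logistic function of the distance between trait level and difficulty.
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# A person whose trait level equals the item difficulty has an even chance.
print(rasch_probability(0.0, 0.0))
# The probability rises as the trait level exceeds the difficulty.
print(rasch_probability(1.0, 0.0))
```

Only the difference between trait level and difficulty enters the function, which is why persons and items can be placed on the same scale.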

The main difference between unidimensional IRT (UIRT) and MIRT is that UIRT models posit that only one latent trait influences responses while MIRT models posit that several latent traits influence responses. Thus, UIRT models the probability of a certain response as a function of a single latent trait, while MIRT models the probability of a response depending on a vector of multiple latent trait dimensions. MIRT analyses enable researchers to investigate and take into account the underlying relationships in the data because MIRT models link multiple latent traits to item responses (Finch, 2010; Reckase, 2009). Besides viewing MIRT as an extension of UIRT, it can also be conceptualized as a special case of factor analysis. MIRT and factor analysis “define hypothetical scales that can be used to reproduce certain features of the data that are the focus of the analysis” (Reckase, 2009, p. 63). However, MIRT and factor analysis have different goals. While MIRT aims at modeling the interaction between persons’ latent trait levels and the responses to items, factor analysis usually aims at data reduction, that is, explaining relationships between items using a small number of factors. One further difference is that in MIRT, item characteristics (e.g., item difficulty and item discrimination) are an object of study, whereas in factor analysis, item characteristics are not of interest. For an extensive treatise of MIRT and its relationship to factor analysis, the interested reader is referred to Reckase (2009).

MIRT models, as described above, are sometimes called within-MIRT models to distinguish them from the so-called between-MIRT models. Between-MIRT models can be considered a special case of within-MIRT models because several dimensions are modeled simultaneously, but each item only measures one specific dimension. These two cases of multidimensionality are contrasted with the unidimensional case in Figures 2 to 4 for the first three RIASEC factors, namely, Realistic, Investigative, and Artistic. As can be seen in the comparison of Figures 2 and 3, between-MIRT models are similar to unidimensional models in that each dimension is assessed by a unique set of items. However, because between-MIRT models include several dimensions simultaneously, they allow the researcher to model the correlations between dimensions as well. In contrast, within-item multidimensionality assumes that responses to some items depend on the trait levels on more than one latent trait (see Figure 4).
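The distinction between between-item and within-item multidimensionality can be sketched as an item-by-dimension indicator pattern. The example below is only an illustration: the assignment of three items per trait and the two cross-loading items are hypothetical and do not reproduce the specific paths drawn in Figures 3 and 4.

```python
# Rows are items 1-9, columns are the latent traits R, I, A.
# A 1 means the item is modeled as depending on that trait.
DIMENSIONS = ["R", "I", "A"]

# Between-item pattern: every item is tied to exactly one trait.
# The assignment of three items per trait is illustrative.
between_pattern = [
    [1, 0, 0], [1, 0, 0], [1, 0, 0],
    [0, 1, 0], [0, 1, 0], [0, 1, 0],
    [0, 0, 1], [0, 0, 1], [0, 0, 1],
]

# Within-item pattern: copy the between pattern, then let two
# hypothetical items depend on a second trait as well.
within_pattern = [row[:] for row in between_pattern]
within_pattern[2][1] = 1  # item 3: R and I (hypothetical)
within_pattern[5][2] = 1  # item 6: I and A (hypothetical)


def is_between_item(pattern):
    """True if every item depends on exactly one trait."""
    return all(sum(1 for w in row if w != 0) == 1 for row in pattern)


def cross_loading_items(pattern):
    """1-based numbers of items that depend on more than one trait."""
    return [i + 1 for i, row in enumerate(pattern)
            if sum(1 for w in row if w != 0) > 1]


print(is_between_item(between_pattern))
print(is_between_item(within_pattern))
print(cross_loading_items(within_pattern))
```

Seen this way, a between-MIRT model is the special case of a within-MIRT pattern in which every row contains a single nonzero entry.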

Unidimensional and Multidimensional Partial Credit Model

In this article, unidimensional and multidimensional models based on the Partial Credit Model (PCM) will be used. The unidimensional PCM (uPCM) was formulated by Masters (1982) and forms an extension of the dichotomous Rasch Model (Rasch, 1960) to polytomous data with ordered response categories. Because the uPCM is a direct extension of the Rasch Model, all the desirable properties of the Rasch Model, such as the sum score being a sufficient statistic and the independence of item and person parameters (Embretson & Reise, 2000), are preserved. The multidimensional PCM (mPCM) was developed as a generalization of the uPCM to multidimensional data by Kelderman (1996). The main difference between the uPCM and the mPCM is that in the multidimensional case, more than one trait parameter exists. Furthermore, weights are included that indicate whether the response to an item depends on a certain trait. The uPCM and the mPCM assume that items only differ in their level of difficulty but not in their ability to differentiate between different trait levels. The Generalized Partial Credit Model (GPCM; Muraki, 1992) is an extension of the PCM that relaxes this assumption by additionally modeling discrimination parameters for each item. Model fit comparisons between the PCM and the GPCM indicate whether the assumption of equal item discriminations in the PCM holds. Detailed accounts of different versions of the PCM can be found in Masters (1982), Kelderman (1996), and Muraki (1992).
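The category probabilities of these models can be sketched as follows, assuming the common step-difficulty parameterization of the PCM. A discrimination of 1 gives a PCM-style item, and any other value gives a GPCM-style item. The `composite_trait` helper is a simplifying assumption that shows how weights could link one item to several traits; it is not claimed to be Kelderman's (1996) exact formulation, and all parameter values are hypothetical.

```python
import math


def pcm_probabilities(theta, thresholds, discrimination=1.0):
    """Category probabilities for one polytomous item.

    theta          -- trait level (or a weighted composite of traits)
    thresholds     -- step difficulties; m thresholds give m + 1 categories
    discrimination -- 1.0 for a PCM-style item, free for a GPCM-style item
    """
    # Category 0 has exponent 0; each higher category adds one step term.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + discrimination * (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(exponents)
    numerators = [math.exp(e - largest) for e in exponents]
    total = sum(numerators)
    return [n / total for n in numerators]


def composite_trait(thetas, weights):
    """Weighted sum of trait levels (assumed multidimensional link)."""
    return sum(w * t for w, t in zip(weights, thetas))


# Hypothetical 5-category item (4 thresholds), as on a 5-point rating scale.
steps = [-1.5, -0.5, 0.5, 1.5]
print(pcm_probabilities(0.0, steps))

# Hypothetical item that depends on the first and second of three traits.
theta_item = composite_trait([0.8, -0.2, 1.1], [1, 1, 0])
print(pcm_probabilities(theta_item, steps, discrimination=1.3))
```

With a single threshold and a discrimination of 1, the function reduces to the dichotomous Rasch model, which mirrors the statement that the uPCM is a direct extension of the Rasch Model.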


Application of MIRT to Vocational Interest Measurement

Most interest inventories (e.g., Self-directed Search; Holland et al., 1994) consist of subscales that are all purported to measure related, but distinct, latent traits. In these inventories, each item is assigned to one particular subscale and none of the items are common between subscales. The dominant approach is to analyze these subscales separately, ignoring the possibly existing multidimensionality between them. In this approach, relationships between latent traits are not modeled. According to Adams, Wilson, and Wang (1997), when subscales measure highly correlated latent traits, “a joint analysis with an appropriate model will lead to improved estimation of the item parameters and ability predictions” (p. 11). Thus, the unidimensionality assumption may be inappropriate for analyzing instruments assessing (at least in part) highly intercorrelated latent traits such as the RIASEC factors. Therefore, in this article UIRT and MIRT models will be applied. First, pre-analyses will be conducted to investigate which items may load on additional factors to the one they were designed to measure. Second, unidimensional and multidimensional models will be compared for each RIASEC factor to analyze whether multidimensional models provide a more adequate representation of the data. Third, a between-MIRT model and a within-MIRT model that incorporate all RIASEC factors simultaneously will be compared regarding their fit. Finally, correlations between the RIASEC factors will be contrasted between unidimensional models and a between-MIRT model to investigate which model best represents the relationships posited by Holland’s model.

Figure 2. Unidimensional models (diagram of the latent traits R, I, and A with Items 1 to 9).

Method

Instrument

Participants filled out the General Interest Structure Test (“Allgemeiner Interessen-Struktur-Test”; AIST-R; Bergmann & Eder, 2005). The AIST-R is applied to assist occupational and career decisions (e.g., in the context of career counseling). Its 60 items (10 per dimension) describe occupational activities. For example, a Realistic item reads “to manufacture something according to a plan or a sketch.” Interests in the Investigative domain are, for example, assessed by the item “to examine the behavior of animals or plants.” The AIST-R’s rating scale is a 5-point Likert-type scale with the poles “I am not interested in this at all; I do not enjoy doing this at all” and “I am very interested in this; I enjoy doing this very much.” It takes about 10 to 15 min to complete. The AIST-R was developed according to Holland’s RIASEC model (Holland, 1959, 1997). A (three-letter) Holland code is derived from the test-taker’s subscale scores and can subsequently be matched with a list of occupations classified by their Holland codes. Thus, the test-taker is provided with information on occupations whose interest profiles are identical or similar to his or her own profile. Evidence supporting the validity of the AIST-R is reported in its manual (Bergmann & Eder, 2005) and includes relationships between the AIST-R’s RIASEC scales and different personality traits, as well as its satisfactory fit to the circumplex structure. Convergent and discriminant validity is supported by the relationships of the AIST-R’s RIASEC scales with the RIASEC scales in the German version of the Self-directed Search (Explorix; Jörin, Stoll, Bergmann, & Eder, 2003). IRT reliabilities for the RIASEC scales resulting from uPCMs and a between-MIRT PCM are reported in Table 1. These can be interpreted analogously to Cronbach’s alpha.

Figure 3. Between-multidimensional model (diagram of the latent traits R, I, and A with Items 1 to 9).

Samples

Two samples were used, one for the pre-analyses and another for the other (main) analyses. The pre-analyses sample comprised 797 students (62.4% women) who were enrolled either in college or vocational training. Participants were between 16 and 48 years old (M = 21.42, SD = 2.89). Data were collected online between October 2010 and March 2011.

The main analyses sample contained 3,997 students in their last year of school (56.7% women) with an age range from 17 to 30 years (M = 19.21, SD = 1.10). These data were collected between 2003 and 2010 at the University of Linz in Austria. Participants were seeking vocational counseling and filled out a battery of tests (the AIST-R, personality inventories, and cognitive ability tests) described in Bergmann (2008). Participants received extensive feedback on their results. Raw mean scores and standard deviations for this sample on the AIST-R’s RIASEC scales are displayed in Table 1.

Figure 4. Within-multidimensional model (diagram of the latent traits R, I, and A with Items 1 to 9).

Pre-Analyses

Prior to estimating the IRT models, exploratory factor analyses (EFAs) using maximum likelihood estimation with Promax rotation were performed on the correlation matrix using SPSS (IBM, 2012). The purpose of these EFAs was to generate hypotheses concerning which items were measuring another interest domain besides the one they were intended to measure. Items that showed high loadings (>.30) on another factor in addition to the one they were designed to measure were modeled to assess two dimensions in the within-MIRT models: (a) the interest dimension they belonged to by instrument design and (b) the secondary dimension they had a high loading on.
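The screening rule described above can be sketched in a few lines. This is an illustration only: the loadings, item labels, and function name are hypothetical, and comparing the absolute value of a loading with the .30 cutoff is an assumption, since the text states only that loadings above .30 counted as high.

```python
def flag_cross_loadings(loadings, intended, threshold=0.30):
    """Map each item to the factors it loads on besides its intended one.

    loadings  -- {item: {factor: loading}} from a rotated EFA solution
    intended  -- {item: factor the item was designed to measure}
    threshold -- cutoff for a "high" loading (.30 in the text)
    """
    flagged = {}
    for item, row in loadings.items():
        # Assumption: the cutoff is applied to the absolute loading.
        secondary = [factor for factor, value in row.items()
                     if factor != intended[item] and abs(value) > threshold]
        if secondary:
            flagged[item] = secondary
    return flagged


# Hypothetical loadings for three items on three factors.
example_loadings = {
    "item_a": {"R": 0.62, "I": 0.35, "A": 0.02},
    "item_b": {"R": 0.05, "I": 0.71, "A": 0.10},
    "item_c": {"R": -0.04, "I": 0.12, "A": 0.58},
}
example_intended = {"item_a": "R", "item_b": "I", "item_c": "A"}

print(flag_cross_loadings(example_loadings, example_intended))
```

Items returned by such a screen would be the candidates for a second dimension in a within-MIRT specification.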

Model Fit Comparisons

Model fit comparisons between unidimensional and multidimensional models were conducted first for each RIASEC factor separately and second for models that incorporated all RIASEC factors simultaneously. Two types of unidimensional models were estimated for each RIASEC factor: (a) uPCMs that only assume differences in item difficulty and (b) uGPCMs that additionally allow differences in item discrimination. For RIASEC factors for which the EFA indicated items loading on a second dimension, within-MIRT models were estimated (mPCM and mGPCM). Thus, this model fit comparison allowed us to test the dimensionality of the data (uPCM vs. mPCM and uGPCM vs. mGPCM) and whether equal item discriminations could be assumed (uPCM vs. uGPCM and mPCM vs. mGPCM). These models were analyzed using the mirt package (Chalmers, 2012) in R (R Core Team, 2013). mirt uses the Metropolis-Hastings Robbins-Monro algorithm for model estimation.

Then, a between-MIRT model and a within-MIRT model based on the mPCM that contained six dimensions (one for each RIASEC factor) were compared regarding their fit. In the between-MIRT model, each RIASEC factor was modeled to be assessed by the items it was designed to measure, with no items overlapping between dimensions. The within-MIRT model also contained six dimensions, one for each RIASEC factor, but here, items with substantial cross-loadings in the EFA were modeled to measure more than one dimension. These two models were estimated using ConQuest (Wu, Adams, Wilson, & Haldane, 2007) because latent correlations (see below) could only be obtained in ConQuest. ConQuest applies marginal maximum likelihood estimation. The identification constraint was set on the latent trait parameters (i.e., they have a normal distribution), whereas item parameters were estimated freely.

Table 1. Means, Standard Deviations, and Item Response Theory Reliability Estimates for the AIST-R Scales.

                                  IRT Reliability
Interest Type    M (SD)           uPCM    mPCM
Realistic        24.51 (8.59)     0.86    0.85
Investigative    30.37 (7.93)     0.84    0.83
Artistic         30.28 (8.55)     0.84    0.83
Social           32.77 (7.91)     0.88    0.87
Enterprising     37.02 (7.64)     0.86    0.86
Conventional     30.86 (7.15)     0.84    0.84

Note. uPCM = unidimensional Partial Credit Model; mPCM = multidimensional Partial Credit Model (between-MIRT model).

Two information criteria, namely, Akaike’s Information Criterion (AIC; Akaike, 1973) and the Bayesian Information Criterion (BIC; Schwarz, 1978), were consulted to evaluate model fit. For the AIC and the BIC, the better-fitting model is the one with a lower AIC or BIC value, respectively.
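For readers who want to see how the two criteria are computed, the sketch below uses their standard definitions. The log-likelihood and sample size are the values reported for the Realistic uPCM in Table 2; the number of free parameters (44) is not reported in the text and is an assumption, inferred here from the reported log-likelihood and AIC.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: -2 logL + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2 logL + k ln(N)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Realistic uPCM: log-likelihood and N as reported in Table 2.
LOG_LIKELIHOOD = -49896.66
N_OBS = 3997
# Assumption: 44 free parameters, inferred from the reported AIC.
N_PARAMS = 44

# Both results should be close to the reported 99881.32 and 100158.23.
print(round(aic(LOG_LIKELIHOOD, N_PARAMS), 2))
print(round(bic(LOG_LIKELIHOOD, N_PARAMS, N_OBS), 2))
```

Because the BIC multiplies the parameter count by ln(N) rather than by 2, it penalizes additional parameters more heavily than the AIC at this sample size.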

Comparison of Correlations

Correlations between the RIASEC types were compared between the unidimensional models and the between-MIRT model. For the unidimensional models, correlations between latent traits could not be estimated directly as in the between-MIRT model. Thus, the two-step procedure described in Wang, Chen, and Cheng (2004) was applied. First, the Pearson product–moment correlation between the weighted likelihood estimates (WLEs) for two latent traits was computed. WLEs (Warm, 1989) are estimates for the person parameters in the framework of IRT, that is, they indicate a person’s standing on the latent trait. Second, the disattenuation formula proposed by Spearman (1904) was applied to correct the correlations for the unreliability of the measures. The reliability coefficients used here were the WLE person separation reliabilities computed in ConQuest for each model. This two-step procedure was used for all bivariate correlations between the RIASEC interest domains. In the between-MIRT model, correlations can be estimated directly and therefore do not contain measurement error due to unreliability as in the unidimensional approach. These latent correlations between the factors were compared with the attenuated and disattenuated correlations obtained from the UIRT models.
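The second step of this procedure can be sketched with Spearman's correction, which divides the observed correlation by the square root of the product of the two reliabilities. The example pairs the attenuated correlations from Table 3 with the uPCM reliabilities from Table 1; treating those table entries as the inputs to the correction is our reading of the text, and the function name is illustrative.

```python
import math


def disattenuate(r_xy, reliability_x, reliability_y):
    """Spearman's correction for attenuation.

    r_xy          -- observed correlation between two trait estimates
    reliability_x -- reliability of the first measure
    reliability_y -- reliability of the second measure
    """
    return r_xy / math.sqrt(reliability_x * reliability_y)


# Realistic and Investigative: attenuated r = .66, reliabilities .86 and .84.
print(round(disattenuate(0.66, 0.86, 0.84), 2))

# Artistic and Social: attenuated r = .45, reliabilities .84 and .88.
print(round(disattenuate(0.45, 0.84, 0.88), 2))
```

Rounded to two decimals, the two results agree with the disattenuated values of .78 and .52 reported in Table 3.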

Results

First, results from the pre-analyses using EFA will be reported. Second, model fit comparisons between the unidimensional models and multidimensional models will be described. Third, correlations obtained from unidimensional models and the between-MIRT model will be compared.

Pre-Analyses

The EFA conducted using the pre-analyses sample indicated that all RIASEC factors except for Social contained at least one item that loaded substantially (>.30) on a second factor besides the one it was designed to measure. These items were modeled to measure a second dimension in addition to the one they were designed to measure in the within-MIRT models (see Table 2).

Model Fit Comparisons

Table 2 shows the model fit comparisons for each RIASEC factor. Model comparisons between the uPCM, uGPCM, mPCM, and mGPCM yielded the best fit for the mGPCM according to AIC and BIC for Realistic, Investigative, Artistic, and Enterprising. Thus, for these interest types, a multidimensional model that accounts for differing item discriminations was better able to represent the data than a unidimensional model. Note, however, that for Enterprising the difference in AIC and BIC between the uGPCM and the mGPCM was minuscule. For Social, no MIRT models were estimated because none of the items showed substantial cross-loadings on a second factor. Here, the comparison between the uPCM and the uGPCM resulted in a better fit for the uGPCM. The uGPCM was also the best-fitting model for Conventional although the difference between uGPCM and mGPCM was again very small. Thus, the GPCM consistently yielded a better fit than the PCM for all interest types, indicating that the assumption of equal item discriminations made in the PCM did not hold. Concerning dimensionality, Realistic, Investigative, and Artistic were best represented using multiple dimensions, whereas Social appeared to be unidimensional.

Drawing on these results, the within-MIRT model including all six interest types was modeled with items loading on another dimension besides the one they were designed to measure for Realistic, Investigative, Artistic, and Enterprising. The comparison of model fit between the between-MIRT model and the within-MIRT model yielded an AIC (BIC) of 625,728 (627,370) for the former and 458,374 (459,639) for the latter. Thus, the within-MIRT model showed a better fit than the between-MIRT model, indicating that the within-MIRT model was more adequate to describe the data.

In addition, WLE person separation reliabilities for the RIASEC scales were compared between the unidimensional models and the between-MIRT model. As can be seen in Table 1, these were practically identical. Furthermore, fit analyses were also conducted at the item level using the weighted mean square computed by ConQuest. Five items exceeded the criteria for low-stake tests according to which MNSQ values between 0.70 and 1.30 are acceptable (Wright & Linacre, 1994). For the between-MIRT and within-MIRT models, fewer items (three and two, respectively) demonstrated an MNSQ below 0.70 or above 1.30. Detailed results on the item fit analyses can be obtained from the first author on request.

Table 2. Model-Fit Comparison Between Unidimensional and Multidimensional Models.

Interest Type   Model   Items on Second Dimension   Log-Likelihood   AIC          BIC
Realistic       uPCM    –                           −49896.66        99881.32     100158.23
                uGPCM   –                           −49145.62        98399.24     98739.08
                mPCM    7, 19, 49, 55               −50570.91        101227.82    101498.43
                mGPCM   7, 19, 49, 55               −49031.61        98177.22*    98535.94*
Investigative   uPCM    –                           −54223.90        108543.80    108845.90
                uGPCM   –                           −53161.61        106439.20    106804.20
                mPCM    38, 44                      −54794.32        109682.60    109978.40
                mGPCM   38, 44                      −53126.35        106370.70*   106742.00*
Artistic        uPCM    –                           −56746.00        113588.00    113890.10
                uGPCM   –                           −55610.48        111337.00    111702.00
                mPCM    9, 15, 27, 39, 51           −57240.79        114575.60    114871.40
                mGPCM   9, 15, 27, 39, 51           −54754.22        109632.40*   110022.60*
Social          uPCM    –                           −50305.09        100714.18    101041.43
                uGPCM   –                           −49459.88        99043.76*    99433.95*
Enterprising    uPCM    –                           −48812.71        97717.42     98006.91
                uGPCM   –                           −48336.67        96785.35     97137.77
                mPCM    23                          −48743.69        97577.37     97860.57
                mGPCM   23                          −48336.40        96784.80*    97137.22*
Conventional    uPCM    –                           −53561.42        107216.80    107512.60
                uGPCM   –                           −53038.84        106191.70*   106550.40*
                mPCM    48                          −53641.42        107374.80    107664.30
                mGPCM   48                          −53041.46        106196.90    106555.60

Note. AIC and BIC values of the best-fitting model are marked with an asterisk (italics in the original). N for all models was 3,997. AIC = Akaike’s Information Criterion; BIC = Bayesian Information Criterion; uPCM = unidimensional Partial Credit Model; mPCM = multidimensional Partial Credit Model.

Comparison of Correlations

Attenuated and disattenuated correlations between latent traits using WLEs from the uPCMs1 are reported in Table 3. As would be expected, disattenuated correlations were generally higher than attenuated correlations. Overall, correlations confirm the RIASEC circumplex structure, for example, the adjacent traits Realistic and Investigative correlated higher than the opposite traits Realistic and Social (r = .78 and r = −.24 for disattenuated uPCM correlations).

Correlations between latent traits were estimated directly in the between-MIRT model. Table 3 contrasts these latent correlations with the attenuated and disattenuated correlations resulting from the separate uPCMs. Correlations obtained in the between-MIRT model were higher than uPCM correlations in most cases. However, for disattenuated correlations, this difference is reduced compared with the attenuated correlations. For example, the latent correlation between Realistic and Investigative is .83 (between-MIRT model), while it is .66 (disattenuated .78) for the uPCM.

Discussion

In this article, the unidimensional approach to analyzing interest inventories was extended by analyses in the framework of MIRT. Model fit comparisons indicated that for Realistic, Investigative, Artistic, and Enterprising, item responses were influenced by more than one interest type. Enterprising and Conventional only contained one item loading on a second dimension, whereas for Realistic, Investigative, and Artistic, two or more items loaded on a second dimension. This may have led to the very similar AIC and BIC values between the uGPCM and the mGPCM for Enterprising and Conventional, indicating that without this one item, unidimensionality may also exist for Enterprising. This illustrates that MIRT can be used as a tool to explore the underlying dimensionality of the data (Reckase, 1997; Yao & Schwarz, 2006). Despite the promising developments in the area of MIRT, it should not be overlooked that having items that measure more than one trait can still be considered problematic from a measurement perspective, especially if the intention was to construct unidimensional scales. If scales are shown to be unidimensional (as in our study for Social and Conventional), analyzing each scale separately is justified. Nevertheless, between-MIRT models can still be useful in this case because they provide latent estimates of the correlations between scales.

The model fit comparison between the between-MIRT model and the within-MIRT model showed that the within-MIRT model, which allowed several items on Realistic, Investigative, Artistic, and Enterprising to load on secondary dimensions, was more adequate to describe the data. This confirms the results obtained from separate analyses of the interest types that responses to the AIST-R’s items on these subscales are influenced not only by the interest dimension they were designed to measure but also by a second or third interest dimension. These results also coincide with a recent analysis of the AIST-R regarding differential item functioning (DIF; Wetzel & Hell, 2013), which found that, for some items showing gender-DIF, the differential functioning of the items appears to be related to a second interest dimension.

Table 3. Correlations Between RIASEC Factors for Separate uPCMs and Between-MIRT Model.

                Realistic     Investigative   Artistic    Social      Enterprising   Conventional
Realistic       –             .83             −.22        −.28        −.25           .05
Investigative   .66; .78      –               −.04        −.14        −.20           .00
Artistic        −.16; −.19    −.00; −.00      –           .56         .24            −.00
Social          −.21; −.24    −.11; −.13      .45; .52    –           .47            .17
Enterprising    −.19; −.22    −.15; −.18      .20; .24    .37; .43    –              .53
Conventional    .05; .06      .01; .02        .00; .00    .14; .16    .44; .52       –

Note. Above diagonal: latent correlations for the between-MIRT model, below diagonal: attenuated and disattenuated correlations from uPCMs. uPCM = unidimensional Partial Credit Model.

WLE person separation reliabilities were practically identical between the unidimensional and multidimensional (between) approach. Thus, in our study, no differences in measurement precision were found between these two approaches. However, other studies (e.g., Cheng, Wang, & Ho, 2009) showed that MIRT models usually yield higher measurement precision. Item fit was problematic for several of the AIST-R’s items, especially in the unidimensional PCMs. In the MIRT models, fewer of these items demonstrated problematic item fit.

When item content overlaps between traits, multidimensional models may be more appropriate than unidimensional models to analyze response data because they explicitly take these underlying relationships into account. Thus, by using MIRT models, we can model the theoretical assumptions made by Holland concerning the circumplex structure of the RIASEC factors. This is also reflected by the correlations between the interest traits that confirmed Holland’s (1997) predictions. In MIRT models, these can be estimated directly and are not attenuated by measurement error due to unreliability. Consequently, correlations were higher compared to the attenuated correlations obtained in the unidimensional models. However, they were often also higher than the disattenuated unidimensional correlations, which indicates that the multidimensional models have a stronger capability of capturing the real relationship between the traits than the unidimensional models.

As noted in the introduction, MIRT and factor analysis models share a number of properties although they follow different goals. One of the benefits of MIRT over factor analysis is that it addresses the complexity in the interaction between persons and items by modeling variation in item properties such as difficulty and discrimination. On the other hand, factor analysis usually analyzes a correlation matrix and thus ignores other item characteristics. Thus, MIRT can provide a better understanding of the items.

Implications of MIRT Models to Vocational Interest Measurement

There are several reasons why we think vocational interest measurement could benefit from analyses in the framework of MIRT. From a psychometric perspective, it is important to consider multidimensionality in the data. If the measured latent trait is multidimensional, the fundamental assumption of local independence made by UIRT models may be violated, resulting in biased item parameter estimates (Finch, 2010) and person trait estimates (Ackerman, 1992). In contrast, MIRT makes use of the relationships between dimensions, which enables it to produce more accurate item and person parameter estimates (Cheng et al., 2009). Furthermore, multidimensionality can cause DIF (Ackerman, 1992). The multidimensional approach allows a direct estimation of the correlations between latent traits (without measurement error; Wang et al., 2004), which is preferable to correlating trait estimates derived from unidimensional models. Kirisci, Hsu, and Yu (2001) recommend the use of MIRT models when correlations between latent traits vary as is the case in many RIASEC-based interest inventories.

From a substantive perspective, MIRT analyses may deepen the understanding of vocational interest data compared with traditional methods. In MIRT, items provide information about multiple latent traits simultaneously. This additional information that results from linking item responses to multiple traits can have diagnostic value. For example, diagnostic profiles of participants’ latent trait levels can be estimated more precisely when relationships between traits are taken into account (Yao & Schwarz, 2006). This is especially important in the context of counseling because using unidimensional models to model multidimensional data can lead to different inferences concerning trait levels compared to when the appropriate multidimensional model is applied (Walker & Beretvas, 2003). MIRT allows modeling RIASEC data in a way that is more consistent with the theoretical assumptions of the model because relationships between dimensions are taken into account in the estimation of correlations and trait levels. However, more research on the application of MIRT to vocational interest measurement is clearly needed.

Limitations

In this study, both samples consisted of students. It is not clear to what extent the results can be generalized to persons from other populations. Moreover, MIRT analyses were only conducted on one interest inventory, namely, the AIST-R. Future research could analyze other widely used interest inventories regarding the multidimensionality of their data. Furthermore, this study mainly addressed two aspects, namely, model fit comparisons and comparisons of correlations. Further research could compare the trait estimates obtained from different models, although a simulation study would be needed to be able to judge which trait estimates are more accurate. MIRT models can also be extended to include background variables (e.g., gender, cognitive abilities), which may yield further information on relationships between trait variables and background characteristics. This was not implemented in this study as it would have added complexity to the models and their presentation.

Conclusion

This article attempted to demonstrate the benefits of MIRT analyses to vocational interest measurement. MIRT analyses (when applied to multidimensional data) have several advantages, for example, greater precision in the estimation of item and person parameters, direct estimation of the correlations between dimensions, and a more accurate representation of theoretical assumptions. From a practical perspective, it is especially important to apply multidimensional models to multidimensional data to draw correct inferences concerning test-takers’ trait levels.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the German Federal Ministry of Education and Research and by the European Social Fund of the European Union (grant agreement no.: 01FP0930).

Note

1. For the correlations to be comparable between the unidimensional models and the between-Multidimensional Item Response Theory models, all estimations were based on the Partial Credit Model, despite the Generalized Partial Credit Model showing a better fit for four interest types.
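The relationship between the two models named in this note can be sketched in a few lines. The function below is an illustration only, not the estimation routine used in this study; the function name, the step parameters, and the slope value are hypothetical. It computes category probabilities for a single item under the Generalized Partial Credit Model (Muraki, 1992) and shows that fixing the discrimination at 1 yields the Partial Credit Model (Masters, 1982).

```python
import math


def gpcm_probs(theta, deltas, a=1.0):
    """Category probabilities for one item under the (Generalized) PCM.

    theta  : person trait level
    deltas : step parameters, one per step between adjacent categories
    a      : item discrimination; a == 1.0 gives the Partial Credit Model
    """
    # Cumulative sums of a * (theta - delta_j); category 0 has an empty sum.
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + a * (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical five-category item (four steps) with symmetric step parameters.
steps = [-1.5, -0.5, 0.5, 1.5]
pcm = gpcm_probs(theta=0.0, deltas=steps)          # discrimination fixed at 1
gpcm = gpcm_probs(theta=0.0, deltas=steps, a=1.8)  # hypothetical free slope
```

Because the Partial Credit Model constrains all items to the same discrimination, the trait scale is defined identically across the unidimensional and multidimensional models, which is a plausible reading of why the note treats it as the basis for comparable correlations.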

References

Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67-91. doi:10.1111/j.1745-3984.1992.tb00368.x

Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1-23. doi:10.1177/0146621697211001

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & B. F. Csaki (Eds.), Second international symposium on information theory (pp. 267-281). Budapest, Hungary: Academiai Kiado.

Bergmann, C. (2008). Beratungsorientierte Diagnostik zur Unterstützung der Studienentscheidung studierwilliger Maturanten [Counseling-oriented assessment to support study decisions of high-school graduates]. In H. Schuler & B. Hell (Eds.), Studierendenauswahl und Studienentscheidung [Student selection and study decisions] (pp. 67-77). Göttingen, Germany: Hogrefe.

Bergmann, C., & Eder, F. (2005). Allgemeiner Interessen-Struktur-Test mit Umwelt-Struktur-Test (AIST-R/UST-R) [General Interest-Structure-Test]. Göttingen, Germany: Beltz Test.

Chalmers, R. P. (2012). MIRT: A Multidimensional Item Response Theory package for the R environment. Journal of Statistical Software, 48, 1-29.

Cheng, Y., Wang, W., & Ho, Y. (2009). Multidimensional Rasch analysis of a psychological test with multiple subtests: A statistical solution for the Bandwidth-Fidelity dilemma. Educational and Psychological Measurement, 69, 369-388. doi:10.1177/0013164408323241

Embretson, S. E., & Reise, S. P. (2000). Item Response Theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.

Finch, H. (2010). Multidimensional Item Response Theory parameter estimation with nonsimple structure items. Applied Psychological Measurement, 35, 67-82. doi:10.1177/0146621610367787

Guttman, L. (1954). A new approach to factor analysis: The radex. In P. Lazarsfeld (Ed.), Mathematical thinking in the social sciences (pp. 258-348). New York, NY: Russell & Russell.

Holland, J. L. (1959). A theory of vocational choice. Journal of Counseling Psychology, 6, 35-45. doi:10.1037/h0040767

Holland, J. L. (1997). Making vocational choices: A theory of vocational personalities and work environments (3rd ed.). Odessa, FL: Psychological Assessment Resources.

Holland, J. L., Fritzsche, B., & Powell, A. (1994). Self-directed search: Technical manual. Odessa, FL: Psychological Assessment Resources.

IBM. (2012). SPSS Statistics (Version 21) [Computer software]. Chicago, IL: Author.

Jörin, S., Stoll, F., Bergmann, C., & Eder, F. (2003). Explorix: Das Werkzeug zur Berufswahl und Laufbahnplanung [Explorix: A tool for occupational choices and career planning]. Bern, Switzerland: Huber.

Kelderman, H. (1996). Multidimensional Rasch models for partial-credit scoring. Applied Psychological Measurement, 20, 155-168. doi:10.1177/014662169602000205

Kirisci, L., Hsu, T., & Yu, L. (2001). Robustness of item-parameter estimation programs to assumptions of unidimensionality and normality. Applied Psychological Measurement, 25, 146-162. doi:10.1177/01466210122031975

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174. doi:10.1007/BF02296272

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176. doi:10.1177/014662169201600206

Osteen, P. (2010). An introduction to using Multidimensional Item Response Theory to assess latent factor structures. Journal of the Society for Social Work and Research, 1, 66-82. doi:10.5243/jsswr.2010.6

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danmarks Paedagogiske Institute.

R Core Team. (2013). R: A language and environment for statistical computing (Version 3.0.1) [Computer software]. Vienna, Austria: R Foundation for Statistical Computing. Available from http://www.R-project.org/

Reckase, M. D. (1997). The past and future of Multidimensional Item Response Theory. Applied Psychological Measurement, 21, 25-36. doi:10.1177/0146621697211002

Reckase, M. D. (2009). Multidimensional Item Response Theory. New York, NY: Springer.

Rounds, J., Tracey, T. J., & Hubert, L. (1992). Methods for evaluating vocational interest structural hypotheses. Journal of Vocational Behavior, 40, 239-259. doi:10.1016/0001-8791(92)90073-9

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464. doi:10.1214/aos/1176344136

Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72-101. doi:10.2307/1412159

Walker, C. M., & Beretvas, S. N. (2003). Comparing multidimensional and unidimensional proficiency classifications: Multidimensional IRT as a diagnostic aid. Journal of Educational Measurement, 40, 255-275. doi:10.1111/j.1745-3984.2003.tb01107.x

Wang, W.-C., Chen, P.-H., & Cheng, Y.-Y. (2004). Improving measurement precision of test batteries using multidimensional item-response models. Psychological Methods, 9, 116-136. doi:10.1037/1082-989X.9.1.116

Warm, T. A. (1989). Weighted likelihood estimation of ability in Item Response Theory. Psychometrika, 54, 427-450. doi:10.1007/BF02294627

Wetzel, E., & Hell, B. (2013). Gender-related differential item functioning in vocational interest measurement: An analysis of the AIST-R. Journal of Individual Differences, 34, 170-183. doi:10.1027/1614-0001/a000112

Wright, B. D., & Linacre, J. M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8, 370.

Wu, M. L., Adams, R. J., Wilson, M. R., & Haldane, S. (2007). ConQuest (Version 2.0) [Computer software]. Camberwell, Australia: Australian Council for Educational Research.

Yao, L., & Schwarz, R. D. (2006). A multidimensional partial credit model with associated item and test statistics: An application to mixed-format tests. Applied Psychological Measurement, 30, 469-492. doi:10.1177/0146621605284537
