

In the document A practical introduction to statistics (pages 119-122)

6.3 Generalized linear models

6.3.2 Ordinal logistic regression

Logistic regression is appropriate for dichotomous response variables. ORDINAL REGRESSION is appropriate for dependent variables that are factors with ordered levels. For a factor such as gender in German, the factor levels 'masculine', 'feminine' and 'neuter' are not intrinsically ordered. In contrast, vowel length in Estonian has the ordered levels 'short', 'long' and 'extra long'. Regression models for such ORDERED FACTORS are available. The technique that we introduce here, ORDINAL LOGISTIC REGRESSION, is a generalization of the logistic regression technique.


[Figure 6.11 here: eight panels plotting Pr(regular) against WrittenFrequency, FamilySize, NcountStem, InflectionalEntropy, Auxiliary, Valency, NVratio, and WrittenSpokenRatio.]

Figure 6.11: Partial effects of the predictors for the log odds ratio of a Dutch simplex verb from the native (Germanic) stratum being regular.


As an example, we consider the data set studied by Tabak et al. [2005]. The model predicting regularity for Dutch verbs developed in the preceding section showed that the likelihood of regularity decreased with increasing valency. An increase in valency (here, the number of different subcategorization frames in which a verb can be used) is closely related to an increase in the verb’s number of meanings.

Irregular verbs are generally described as the older verbs of the language. Hence, it could be that they have more meanings and a greater valency because they have had a longer period of time in which to spawn new meanings and uses. Irregular verbs also tend to be more frequent than regular verbs, and it is reasonable to assume that this high frequency protects irregular verbs through time against regularization.

In order to test these lines of reasoning, we need some measure of the age of a verb.

A rough indication of this age is the kind of cognates a Dutch verb has in other Indo-European languages. On the basis of an etymological dictionary, Tabak et al. [2005] established whether a verb appears only in Dutch, in Dutch and German, in Dutch, German and other West-Germanic languages, in any Germanic language, or in Indo-European. This classification according to etymological age is available in the column labeled EtymAge in the data set etymology.

> colnames(etymology)

[1] "Verb" "WrittenFrequency" "NcountStem"

[4] "MeanBigramFrequency" "InflectionalEntropy" "Auxiliary"

[7] "Regularity" "LengthInLetters" "Denominative"

[10] "FamilySize" "EtymAge" "Valency"

[13] "NVratio" "WrittenSpokenRatio"

When a data frame is read into R, the levels of any factor are assumed to be unordered by default. In order to make EtymAge into an ORDERED FACTOR with the levels in the appropriate order, we use the function ordered():

> etymology$EtymAge = ordered(etymology$EtymAge,
+   levels = c("Dutch", "DutchGerman", "WestGermanic",
+   "Germanic", "IndoEuropean"))

When we inspect the factor,

> etymology$EtymAge
...
[276] WestGermanic Germanic     IndoEuropean Germanic     Germanic
[281] Germanic     WestGermanic Germanic     Germanic     DutchGerman
Levels: Dutch < DutchGerman < WestGermanic < Germanic < IndoEuropean

we see that the ordering relation between its levels is now made explicit. We leave it as an exercise to the reader to verify that etymological age is a predictor for whether a verb is regular or irregular over and above the predictors studied in the preceding section. Here, we study whether etymological age itself can be predicted from frequency, regularity, family size, etc. We create a data distribution object, set the appropriate variable to point to this object,


> etymology.dd = datadist(etymology)

> options(datadist = "etymology.dd")

and fit a logistic regression model to the data with lrm().

> etymology.lrm = lrm(EtymAge ~ WrittenFrequency + NcountStem +
+   MeanBigramFrequency + InflectionalEntropy + Auxiliary +
+   Regularity + LengthInLetters + Denominative + FamilySize +
+   Valency + NVratio + WrittenSpokenRatio,
+   data = etymology, x = T, y = T)

> anova(etymology.lrm)

Wald Statistics Response: EtymAge

Factor Chi-Square d.f. P

WrittenFrequency 0.45 1 0.5038

NcountStem 3.89 1 0.0487

MeanBigramFrequency 1.89 1 0.1687

InflectionalEntropy 0.94 1 0.3313

Auxiliary 0.38 2 0.8281

Regularity 14.86 1 0.0001

LengthInLetters 0.30 1 0.5827

Denominative 8.84 1 0.0029

FamilySize 0.42 1 0.5191

Valency 0.26 1 0.6080

NVratio 0.07 1 0.7894

WrittenSpokenRatio 0.18 1 0.6674

TOTAL 35.83 13 0.0006

The anova table suggests three significant predictors: Regularity, as expected; the neighborhood density of the stem (NcountStem); and whether the verb is denominative (Denominative). We simplify the model and inspect the summary.

> etymology.lrmA = lrm(EtymAge ~ NcountStem + Regularity + Denominative,
+   data = etymology, x = T, y = T)

> etymology.lrmA

Frequencies of Responses

Dutch DutchGerman WestGermanic Germanic IndoEuropean

8 28 43 173 33

Obs Max Deriv Model L.R. d.f. P C

285 2e-08 30.92 3 0 0.661

Dxy Gamma Tau-a R2 Brier

0.322 0.329 0.189 0.114 0.026

                    Coef     S.E.    Wald Z   P
y>=DutchGerman    4.96248  0.59257   8.37  0.0000
y>=WestGermanic   3.30193  0.50042   6.60  0.0000
y>=Germanic       2.26171  0.47939   4.72  0.0000


y>=IndoEuropean    -0.99827  0.45704  -2.18  0.0289
NcountStem          0.07038  0.02014   3.49  0.0005
Regularity=regular -1.03409  0.25123  -4.12  0.0000
Denominative=N     -1.48182  0.43657  -3.39  0.0007

The summary lists the frequencies with which the different levels of our ordered factor for etymological age are attested, followed by the usual measures for gauging the predictivity of the model. The values of C, Dxy, and R2 are all low, so we have to be careful when drawing conclusions.
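As an aside, the reported Dxy is simply a rescaling of the concordance index C: Dxy = 2(C − 0.5), so a model that discriminates no better than chance (C = 0.5) has Dxy = 0. A quick check (in Python, for illustration) against the values printed in the summary above:

```python
# Somers' Dxy as reported by lrm() is a linear rescaling of the
# concordance index C: Dxy = 2 * (C - 0.5).
C = 0.661                 # value from the model summary above
Dxy = 2 * (C - 0.5)
assert abs(Dxy - 0.322) < 1e-3   # matches the reported Dxy
```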

The first four lines of the table of coefficients are new, and specific to ordinal logistic regression. These four lines represent four intercepts. The first intercept is for a normal binary logistic model that contrasts data points with Dutch as etymological age with all other data points, for which the etymological age (represented by y in the summary) is greater than or equal to DutchGerman. For this standard binary model, the probability of greater age increases with neighborhood density, and it is smaller for regular verbs and for denominative verbs. The second intercept represents a second binary split, now between Dutch and DutchGerman on the one hand, and WestGermanic, Germanic and IndoEuropean on the other. Again, the coefficients for the three predictors show how the probability of having a greater etymological age has to be adjusted for neighborhood density, regularity, and whether the verb is denominative. The remaining two intercepts work in the same way, each shifting the criterion for 'young' versus 'old' further towards the greatest age level.
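The four binary splits can be made concrete with a small sketch (in Python, for illustration; the level names are those of EtymAge): each cutoff turns the single ordered response into a dichotomous one, "at least this old" versus "younger".

```python
# Each intercept in the ordinal model corresponds to one dichotomization
# of the ordered response into 'at least as old as the cutoff' (1) vs.
# 'younger' (0).
levels = ["Dutch", "DutchGerman", "WestGermanic", "Germanic", "IndoEuropean"]
rank = {lev: i for i, lev in enumerate(levels)}

def dichotomize(ages, cutoff):
    """1 if the etymological age reaches the cutoff level, else 0."""
    return [int(rank[a] >= rank[cutoff]) for a in ages]

sample = ["Dutch", "Germanic", "WestGermanic", "IndoEuropean"]
assert dichotomize(sample, "DutchGerman") == [0, 1, 1, 1]  # first split
assert dichotomize(sample, "Germanic")    == [0, 1, 0, 1]  # third split
```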

There are two things to note here. First, the four intercepts are steadily decreasing. This simply reflects the distribution of successes (old etymological age) and failures (young etymological age) as we shift our cutoff point for old versus young further towards IndoEuropean. To see this, we first count the data points classified as 'old' versus 'young'.

> tab = xtabs(˜etymology$EtymAge)

> tab

etymology$EtymAge

Dutch DutchGerman WestGermanic Germanic IndoEuropean

8 28 43 173 33

> sum(tab)
[1] 285

For the cutoff point between Dutch and DutchGerman, we have 285 − 8 = 277 old observations (successes) and 8 young observations (failures), and hence a log odds ratio of 3.54. The following code loops through the different cutoff points and lists the counts of old and young observations, and the corresponding log odds ratio.

> for (i in 0:3) {
+   cat(sum(tab[(2 + i) : 5]), sum(tab[1 : (1 + i)]),
+     log(sum(tab[(2 + i) : 5]) / sum(tab[1 : (i + 1)])), "\n")
+ }

277 8 3.544576


249  36  1.933934
206  79  0.9584283
 33 252 -2.032922

We see the same downwards progression in the logits as in the table of intercepts. The numbers are not the same, as our logits do not take into account any of the other predictors in the model. In other words, the progression of intercepts is by itself not of interest, just as the intercept in least squares regression or standard logistic regression is generally not of interest by itself.
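The same cutoff logits can be reproduced in a few lines of Python, using the observed level counts (8, 28, 43, 173, 33) from the table above:

```python
import math

# Counts per level: Dutch, DutchGerman, WestGermanic, Germanic, IndoEuropean.
tab = [8, 28, 43, 173, 33]

logits = []
for i in range(4):
    old = sum(tab[i + 1:])     # 'successes': at least as old as the cutoff
    young = sum(tab[:i + 1])   # 'failures': younger than the cutoff
    logits.append(math.log(old / young))

# The values match the R loop's output: 3.544576, 1.933934, 0.9584283, -2.032922.
assert abs(logits[0] - 3.544576) < 1e-4
assert abs(logits[3] - (-2.032922)) < 1e-4
```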

The second thing to note is that lrm() assumes that the effects of our predictors, NcountStem, Regularity and Denominative, are the same, irrespective of the cutoff point for etymological age. In other words, these predictors are taken to have the same proportional effect across all levels of our ordered factor. Hence, this kind of model is referred to as a PROPORTIONAL ODDS MODEL. The assumption of proportionality should be checked. One way of doing so is to plot, for each cutoff point, the mean of the partial binary residuals together with their 95% confidence intervals. If the proportionality assumption holds, these means should be close to zero. As can be seen in the first three panels of Figure 6.12, the proportionality assumption is not violated for our data. The means are very close to zero in all cases. The last panel takes a closer look at our continuous predictor, NcountStem. For each successive factor level, two points are plotted. The circles connected by the solid line show the means as actually observed; the dashed line shows what these means should be if the proportionality assumption were satisfied perfectly. There is a slight discrepancy for the first level, Dutch, for which we also have the lowest number of observations. But since the two lines are otherwise quite similar, we conclude that a proportional odds model is justified. The diagnostic plots shown in Figure 6.12 were produced with two functions from the Design package, resid() and plot.xmean.ordinaly(), as follows:

> par(mfrow = c(2, 2))

> resid(etymology.lrmA, "score.binary", pl = T)

> plot.xmean.ordinaly(EtymAge ˜ NcountStem, data = etymology)

> par(mfrow = c(1, 1))
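The proportionality property itself can be illustrated numerically (in Python, with hypothetical intercepts and slope): under the model, a one-unit increase in a predictor multiplies the cumulative odds at every cutoff by the same factor, exp(beta), whatever the intercept.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def cumulative_odds(alpha, beta, x):
    """Odds of being at or above a cutoff with intercept alpha."""
    p = logistic(alpha + beta * x)
    return p / (1.0 - p)

alphas = [3.0, 1.0, -0.5, -2.0]   # hypothetical, one intercept per cutoff
beta = 0.7                        # hypothetical shared slope

# Odds ratio for a one-unit increase in the predictor, at each cutoff.
ratios = [cumulative_odds(a, beta, 1.0) / cumulative_odds(a, beta, 0.0)
          for a in alphas]

# All four ratios equal exp(beta): the 'proportional odds' property.
assert all(abs(r - math.exp(beta)) < 1e-9 for r in ratios)
```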

Bootstrap validation calls attention to changes in slope and intercept,

> validate(etymology.lrmA, bw=T, B=200)

1 2 3

2 7 191

          index.orig  training       test    optimism index.corrected
Dxy        0.3222059 0.3314785 0.31487666  0.01660182      0.30560403
R2         0.1138586 0.1227111 0.10597692  0.01673422      0.09712436
Intercept  0.0000000 0.0000000 0.04821578 -0.04821578      0.04821578
Slope      1.0000000 1.0000000 0.95519326  0.04480674      0.95519326
Emax       0.0000000 0.0000000 0.01871305  0.01871305      0.01871305
D          0.1049774 0.1147009 0.09714786  0.01755301      0.08742437

but the optimism is fairly small, and pentrace() recommends a penalty of zero,


[Figure 6.12 here: four diagnostic panels, one each for NcountStem, Regularity=regular, and Denominative=N by cutoff of EtymAge, and one for the mean of NcountStem by level of EtymAge.]

Figure 6.12: Diagnostics for the proportionality assumption for the ordinal logistic regression model for etymological age. The lower right panel compares observed (solid) and expected (given proportionality, dashed) mean neighborhood density for each level of etymological age; the remaining panels plot for each predictor the distribution of residuals for each cutoff point.


> pentrace(etymology.lrmA, seq(0, 0.8, by = 0.05))

Best penalty:

penalty df
      0  3

so we accept etymology.lrmA as our final model, and plot the partial effects (Figure 6.13).

> plot(etymology.lrmA, fun = plogis, ylim = c(0.8, 1))

We conclude that the neighborhood density of the stem is a predictor for the age of a verb. Words with a higher neighborhood density are phonologically more regular, and easier to articulate. Apparently, phonological regularity and ease of articulation contribute to a verb's continued existence through time, in addition to morphological regularity. It is remarkable that frequency is not predictive at all.
