

4 Chapter Four: Data Analysis

4.3 The synchronic analysis of regularization

4.3.1 Regularization and word frequency

In this section, I examine whether regularization processes can be observed in a synchronic snapshot of Contemporary English, and whether verbs tend to be regularized, with high word frequency making them resistant to regularization. For this purpose, I use the same sample of verbs as for question 1 (see 3.2): 500 IVs with their word frequencies in the past and perfect forms (see appendix 2). I then collect the word frequencies of the corresponding RFs (see appendix 4, and chapter 3 for more details). Table 17 gives an overview of the word frequency distributions of IVs and RFs, split by form and frequency group. In addition, I compute relative frequencies of RFs, because raw RF counts depend on the size of the selected sample.

Table 17: Frequency distributions of IVs and RFs in the selected sample (word frequencies; percentages give the RFs as a share of IVs and RFs combined)

Type/Form        High-frequency verbs   Low-frequency verbs   Total
IVs              14,728,139             55,439                14,783,578
RFs              64,260 (0.43 %)        81,670 (60 %)         145,930 (0.98 %)
IVs / past       10,778,120             14,758                10,792,878
RFs / past       36,348 (0.34 %)        46,841 (76 %)         83,189 (0.76 %)
IVs / perfect    3,950,019              40,681                3,990,700
RFs / perfect    27,912 (0.70 %)        34,829 (46 %)         62,741 (1.55 %)

The overall picture shows that regularization processes take place in Contemporary English. The IVs in our sample have a total word frequency of 14,783,578, against 145,930 instances of RFs, so RFs make up 0.98 % of all past and perfect tokens (IVs and RFs combined).

Turning to frequency, the total word frequency of RFs is larger in the low-frequency group (81,670) than in the high-frequency group (64,260), even though the low-frequency IVs themselves occur far less often. Accordingly, the regularization rate is much higher in the low-frequency group (60 %) than in the high-frequency group (only 0.43 %). The same pattern holds in both forms: low-frequency verbs reach 76 % in the past and 46 % in the perfect, against 0.34 % and 0.70 % for high-frequency verbs.
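The percentages in Table 17 are consistent with a regularization rate defined as RF / (IV + RF), i.e. the share of regularized tokens among all tokens of a form; this definition is inferred from the counts rather than stated in the text. The following Python sketch recomputes the rates from the table counts under that assumption; the dictionary layout and the function names are illustrative only.

```python
# Word frequencies from Table 17: form -> frequency group -> (IV count, RF count).
TABLE_17 = {
    "past": {"high": (10_778_120, 36_348), "low": (14_758, 46_841)},
    "perfect": {"high": (3_950_019, 27_912), "low": (40_681, 34_829)},
}


def regularization_rate(iv_count: int, rf_count: int) -> float:
    """Percentage of regularized tokens among all tokens: 100 * RF / (IV + RF)."""
    return 100 * rf_count / (iv_count + rf_count)


def pooled_rate(table, forms, groups) -> float:
    """Pool IV and RF counts over the given forms and groups, then compute the rate."""
    iv_total = sum(table[f][g][0] for f in forms for g in groups)
    rf_total = sum(table[f][g][1] for f in forms for g in groups)
    return regularization_rate(iv_total, rf_total)


# Print one row per form (and pooled over both forms): high, low, total.
FORMS, GROUPS = ("past", "perfect"), ("high", "low")
ROWS = {"past": ("past",), "perfect": ("perfect",), "both forms": FORMS}
for label, row_forms in ROWS.items():
    cells = [pooled_rate(TABLE_17, row_forms, (g,)) for g in GROUPS]
    total = pooled_rate(TABLE_17, row_forms, GROUPS)
    print(f"{label:<11}", " ".join(f"{x:7.2f}%" for x in cells), f"{total:7.2f}%")
```

Rounded to the precision used in the table, these values reproduce Table 17 (for example 0.43 % and 60 % when both forms are pooled, and 0.98 % overall). Including the RFs in the denominator matters: dividing by the IV count alone would give roughly 147 % for the low-frequency group (81,670 / 55,439).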

The differences between the frequency distributions of RFs in the high- and low-frequency groups may point to a relationship between regularization processes and word frequency: in the selected sample, high-frequency IVs may be more resistant to regularization than low-frequency IVs.

Figure 15 shows boxplots of the relative frequencies of RFs, split by form and frequency: red boxplots represent RFs of high-frequency verbs and blue boxplots RFs of low-frequency verbs. I apply a logarithmic transformation to reduce the skewness of the distributions. In both forms, the relative frequencies of RFs of low-frequency verbs vary more than those of high-frequency verbs, both in interquartile range (IQR) (low: around 0.6 versus high: around 0.2 for both forms) and in range (low: around 1.0 versus high: around 0.4 for both forms).

Moreover, the boxplots show that the distributions for both frequency groups are right-skewed in both forms, and that the high-frequency distributions contain some high values lying above a relative frequency of 0.5. Overall, the distributions of RFs for high- and low-frequency verbs appear to differ in both forms, which again suggests that less frequent IVs are regularized more often than more frequent IVs in our sample.

Figure 15: Box plots of relative frequencies of RFs split by form and frequency
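The variability measures read off Figure 15 (IQR and range) can be computed as in the Python sketch below. It is illustrative only: the text does not say which logarithm was used, so `log1p` (log(1 + x), which keeps zero values defined) is a hypothetical choice, and the quartiles use the "inclusive" interpolation of `statistics.quantiles`, which may differ slightly from the software that drew the figure. The helper names are invented for this example.

```python
import math
from statistics import quantiles


def log_transform(values):
    """Hypothetical transform: log(1 + x), which stays defined for zero values."""
    return [math.log1p(v) for v in values]


def box_summary(values):
    """Quartiles, interquartile range and range, as displayed by a boxplot."""
    # method="inclusive" interpolates between order statistics (like R's default quantile type 7).
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }


def compare_groups(groups):
    """Map each group key, e.g. ("past", "low"), to (IQR, range) on the log scale."""
    summaries = {}
    for key, values in groups.items():
        s = box_summary(log_transform(values))
        summaries[key] = (s["iqr"], s["range"])
    return summaries
```

Passing the per-verb relative frequencies of each form and frequency group to `compare_groups` yields the kind of IQR and range comparison described for Figure 15.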

Nevertheless, in the analysis of relative frequency many verbs are never regularized, so many data points have a value of zero. Such a large number of zeros can considerably distort the shape of the distribution. In addition, data that are not normally distributed violate one of the assumptions of a linear model (see footnote 12). Figure 16 shows that, after the removal of the zero points, the relative frequencies of RFs of low-frequency verbs (IQR = 0.6, range = 1.0) still vary more than those of high-frequency verbs (IQR = 0.3, range = 0.5). Moreover, the distributions for the two frequency groups have different centers in both forms (low: around 0.7 versus high: around 0.2).

This again indicates that the distributions of RFs for high- and low-frequency verbs probably differ in both forms: low-frequency IVs may be regularized more often than high-frequency IVs in both the past and the perfect.
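The effect of the zero points on the shape of a distribution can be checked numerically. The Python sketch below reports the share of zeros in a group and a moment coefficient of skewness, g1 = m3 / m2^1.5, before and after the zeros are removed; the function names and the choice of g1 are assumptions for illustration, not the measure used in this study. Once the zeros are gone, a plain logarithm is defined for every remaining value.

```python
import math


def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (needs two distinct values)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def drop_zeros(values):
    """Return the share of zero values and the list of remaining non-zero values."""
    nonzero = [v for v in values if v != 0]
    return 1 - len(nonzero) / len(values), nonzero


def log_nonzero(values):
    """Natural log of the non-zero values only; math.log(0) would raise ValueError."""
    _, nonzero = drop_zeros(values)
    return [math.log(v) for v in nonzero]
```

For example, `skewness([1, 2, 3])` is 0, whereas adding four zeros, `skewness([0, 0, 0, 0, 1, 2, 3])`, gives a positive (right-skewed) value.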

12 In a linear model, the errors (residuals) are assumed to be normally distributed; strongly non-normal variables may distort estimated relationships and significance tests (for more details, see http://pareonline.net/getvn.asp?n=2&v=8).

Figure 16: Box plots of relative frequencies of RFs (without zeros) split by form and frequency

To obtain statistical results that are less affected by the skewness of the distribution, I ran the following analysis on the relative frequencies of RFs after removing the zero points. I fitted a linear mixed model to investigate the effects of form and frequency on the relative frequencies of the verbs in the selected sample, with relative frequency as the dependent variable and form (two levels: past and perfect) and frequency (two levels: high and low) as fixed factors. A logarithmic transformation was applied to the data to remove most of the skewness of the distribution. The model shows a highly significant main effect of frequency (β = 0.16, t = 4.97, p = 1.06e-06). By contrast, neither the main effect of form (β = 0.02, t = 1.30, p = 0.20) nor the interaction between form and frequency (β = -0.02, t = -0.67, p = 0.50) is significant. These findings confirm that the distributions of RFs for high- and low-frequency verbs differ significantly, whereas the distributions of RFs in the past and in the perfect do not.
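The reported coefficients can be read through treatment coding. The Python sketch below assumes "past" and "high" as reference levels (this matches the positive β for frequency, since low-frequency verbs show higher relative frequencies, and R's default alphabetical ordering, but the text does not state it) and ignores the random-effects part of the mixed model, whose structure is not specified here. For a saturated 2 × 2 fixed-effects model fitted by ordinary least squares, each coefficient is a contrast of cell means of the log-transformed relative frequencies; in a mixed model the estimates need not coincide exactly, but each coefficient keeps the same interpretation.

```python
from statistics import fmean


def treatment_coefficients(cells):
    """
    Fixed-effect coefficients of y ~ form * frequency under treatment coding.

    cells maps (form, frequency) to lists of log-transformed relative frequencies.
    Assumed reference levels: form="past", frequency="high".
    For OLS on this saturated 2x2 design, the estimates equal these cell-mean contrasts.
    """
    m = {key: fmean(values) for key, values in cells.items()}
    intercept = m[("past", "high")]
    # Low vs high frequency within the reference form (past).
    b_frequency = m[("past", "low")] - intercept
    # Perfect vs past within the reference frequency group (high).
    b_form = m[("perfect", "high")] - intercept
    # How far the frequency effect in the perfect departs from that in the past.
    b_interaction = (m[("perfect", "low")] - m[("perfect", "high")]) - (
        m[("past", "low")] - intercept
    )
    return {
        "intercept": intercept,
        "frequency[low]": b_frequency,
        "form[perfect]": b_form,
        "form[perfect]:frequency[low]": b_interaction,
    }
```

Because the model includes an interaction, the frequency coefficient in this coding is the low-versus-high difference within the past form. A full fit with random effects could be done with, for example, `lmer` from the R package lme4 or `mixedlm` in statsmodels, once a grouping factor is chosen (the verb would be a plausible but hypothetical choice).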

Summing up, the information collected in this synchronic study leads me to conclude that there is a relationship between word frequency and regularization processes in our sample.

Specifically, low-frequency IVs are generally regularized more often than high-frequency IVs in Contemporary English. I also conclude that regularization does not differ clearly between the past and the perfect forms, and that the frequency effect is similar in both forms. These findings are in line with the predictions of the dual mechanism approach, in which low-frequency IVs are predicted to be more prone to regularization. This prediction follows from the hypothesis that word frequency strengthens the memory representations of IVs, making them easier to access and therefore less likely to be regularized.

In the next section, I explore the salience of the vowel change involved in the shift from IVs to RFs as a factor that may influence whether irregular forms are retained or regularized.