5.1 Preliminary considerations

5.1.2 Learner corpus research

A suitable approach to investigating learner language, as well as cross-linguistic influence in second and third language acquisition, is the use of a learner corpus. In a paper directed at researchers working in the area of multilingualism, Wulff (2017) discusses how learner corpus research (LCR) can be useful in the study of multilingualism. Building on this, this chapter gives a short overview of learner corpus research and its development, and it illustrates the suitability of the methods employed in learner corpus research for answering the research questions of the current study. Hence, we will include some theoretical points on how LCR can add to foreign language acquisition research and how learner corpora can be used in L2 and L3 acquisition studies. We also explain the details of the current study and the specific methods that are used subsequently.

Let us start by defining what a corpus is: a corpus is a collection of digitized or machine-readable texts that can include spoken or written material; usually the texts or oral recordings are transcribed and stored in individual files (McEnery & Hardie 2012: 1-2; Wulff 2017: 734). A learner corpus is then a specialized corpus that includes a specific genre, namely learner language, i.e. texts or spoken language produced by learners of a (foreign) language.

More specifically, learner corpora can be defined as collections of texts produced in a (near) natural setting by language learners (Granger 2008: 338; Granger et al. 2015: 1). Hence, they stand in direct opposition to general reference corpora, such as the BNC (Davies 2004-) or COCA (Davies 2008-), as examples of English reference corpora. In the following, we will clarify what is meant by “(near) natural setting” and how this relates to the current study.

As Granger (2008: 337) puts it, “[a]nalysing learner language is a key component of second and foreign language education research.” It allows us to investigate the development and mechanisms of foreign language acquisition, and it is also a valuable resource for language teachers. The former is especially relevant for the current investigation. Understanding the mechanisms of additional language acquisition, here specifically the differences (or similarities) between second and third language acquisition, is what propels this study.

In the past, research investigating learner language, especially studies in the area of second language acquisition (SLA), mainly relied on controlled experimental data and a limited number of participants (Granger 2017: 2). In such controlled settings, learners were typically asked to produce a very specific target form, usually with only one or a limited number of possible correct answers, as in “fill-in-the-blanks exercises” and “reading-aloud tasks” (Gilquin & Granger 2015: 419). There is a clear advantage to using experimental data, namely the possibility of controlling several variables (such as the contextual setting of the experiment, the topic, and the language acquisition history of the language learners), which then facilitates the analysis of the learner output (Granger 2017: 2). Without any doubt, these structured tasks make it possible in most or even all cases to decide whether something is correct, i.e. target-like, or incorrect, i.e. non-target-like, in the respective language. Yet, the usually small number of language learners investigated often raised doubts about the representativeness of such studies (Granger et al. 2015: 1).

Driven by the desire to include larger numbers of learners, on the one hand, and to include more naturally, or at least near-naturally, produced language, on the other hand, a new field emerged in the late 1980s (Granger et al. 2015: 1). Learner corpus research could be classified as an “offshoot of corpus linguistics” (Granger et al. 2015: 1). Corpus linguistics had relied until then on native language varieties (Granger et al. 2015: 1), and the methods employed had proven to contribute significantly to the field of linguistics in general (see for example McEnery & Hardie 2012). In that sense, learner corpus research is the merger of SLA research and corpus linguistics: it extends corpus linguistics to a new subfield and adds new approaches and methods to SLA research.

Granger et al. (2015: 1) explain that access to large collections of data from second language learners enables researchers to go beyond small-scale studies based on a limited number of language learners and paves the way for producing representative results. In addition, since these data are in an electronic format, they can be easily accessed with computer software, and manifold analyses can be performed much faster and more efficiently than before (Granger et al. 2015: 1). Admittedly, the learner corpus that is used in this study is also fairly small and only includes 42,887 word tokens (see Chapter 5.3). One reason for this, which will be discussed in more detail further down, is the lack of freely available third language acquisition corpora (Wulff 2017: 751).

In addition, we do not need to rely on experimental settings; instead, and this was one of the main goals mentioned above, LCR focuses on learners’ more naturally occurring language use. By naturally occurring language, or language produced in a natural surrounding, we refer to “authentic” language use, i.e. one of the principles of corpus linguistics (Gilquin & Granger 2015: 419). This means that, ideally, a corpus consists of language that was not produced for the sake of corpus compilation but that had a communicative function. However, for learner language, it is not always possible to collect such data, simply because in many contexts, learners who formally acquire a foreign language may never actually use this language outside the classroom setting (Gilquin & Granger 2015: 419). This means that “the criterion of authenticity therefore needs to be relaxed in case of learner corpora” (Gilquin & Granger 2015: 419). Therefore, we largely find learner corpora that contain essays, a typical classroom activity (Granger et al. 2015: 2). This is in several respects a useful text type, because learners engage quite regularly in essay writing, at least after a certain amount of formal training. Also, essays usually contain not just a few words but a larger number of sentences or even paragraphs, which results in enough production data for a quantitative analysis.

Yet, there are also text types that are less naturalistic, or even peripheral, such as picture description tasks (Gilquin & Granger 2015: 419). Such tasks are definitely more controlled than essay writing, because not only is the topic given, but the specific setting cannot be freely chosen by the learners either. Moreover, certain vocabulary is triggered or even required. Still, the output is considered to represent near-naturalistic language, because the students are not forced to use a specific word, as would be the case in a fill-in-the-blanks exercise, but can use their own words and demonstrate lexical as well as structural variety. There are also some spoken corpora available, but strikingly fewer than written corpora (Gilquin & Granger 2015: 419).

Obviously, the transcription of spoken data is more time consuming than transcribing written data, and to collect a large number of essays is easier and quicker than to record the same number of participants. Also, students may even submit essays already in a text format, which would then eliminate the additional step of transcription.

Another bias found in LCR studies is that they largely focus on advanced learners and that there are only a few studies that include less advanced learners or beginners (Gilquin & Granger 2015: 419). When we link this back to the most commonly used text type, it is understandable why we largely find advanced learners: early foreign language learners have not yet acquired the necessary language skills to write long essays. Hence, more controlled text types may be useful to also include younger and less advanced learners in research. Therefore, as was pointed out before, the current study does not rely on freely written essays but uses the more controlled text type of a picture description task (see Chapters 5.2.1 and 5.2.2). On the one hand, we can control the topic and trigger specific vocabulary items, and even beginners or intermediate learners of English are already capable of producing some words for these pictures.

With their still limited proficiency levels in English, the younger learners of the study would not be able to write a long argumentative essay. In order to use the same text type throughout the entire corpus, all learners were presented with the same task. On the other hand, however, we are aware that this is somewhat artificial and not a free writing task, but a written task that was designed for the sake of corpus creation. Nevertheless, we are convinced that this peripheral text type still produces interpretable output. What is more, we include both written texts and oral production, based on two different picture stories. Hence, we add an additional dimension and can thus compare written and spoken language use, a comparison that has not been frequently used in LCR studies so far.

Up until now, most learner corpora contain English as a foreign language, though the number of corpora of other languages is slowly but steadily increasing (Granger et al. 2015: 2). This is by no means surprising, given the attention that English, as a native language, a foreign language, and a lingua franca, receives around the globe. The current study, as should have become apparent by now, adds to this bias, because we also focus on the acquisition of English as a foreign or additional language and we use a learner corpus that includes learner English. Furthermore, we make use of a cross-sectional corpus, i.e. the corpus includes two sets of learners at two different developmental stages, which is again one of the most common types of corpora. There are fewer corpora available that are made up of longitudinal data, but these are also on the rise (Granger et al. 2015: 2).

The last bias that we mention here is that many of the corpora that are currently available include second language learners; these are either mono-L1 corpora, which means that all learners share the same native language, or multi-L1 corpora, with different L1s represented in one corpus (Gilquin & Granger 2015: 419). Yet, there are only a few corpora available that include third language learners (Wulff 2017: 751). The current study combines both types, because the learner corpus that was compiled for this project includes both L2 and L3 learners and a number of different native languages.

So far, we have said that learner corpus research is a sub-field of corpus linguistics and that, therefore, many methods used in LCR are taken from corpus linguistics. In spite of this, there are also a number of approaches that originated in LCR; one of them is called Contrastive Interlanguage Analysis (CIA) (Gilquin & Granger 2015: 425). This method is especially important for the current study, and we will briefly discuss it, following Gilquin and Granger (2015) and Granger (2015). Within CIA, we find two approaches: one is a comparison between a learner corpus and a reference corpus that includes native language; the other is a comparison of different interlanguages, i.e. different (foreign language) learner populations (Gilquin & Granger 2015: 425).

The L1 versus L2 comparison is a very popular approach. It is commonly used to identify errors of advanced learners; in particular, the overuse of certain features or constructions in comparison to native speakers of a language provides useful indications of non-target-like uses by learners (Granger 2015: 11). Instead of relying on comparisons with reference corpora (such as the BNC and COCA as large English reference corpora), a number of researchers rely on novice writing, i.e. language samples that do not come from academically trained, expert native speakers but that were produced by younger, novice native speakers, such as students (Granger 2015: 12). The reason for using such corpora, i.e. corpora that include students’ essays like the Louvain Corpus of Native English Essays (LOCNESS)22, is that the language used in them better reflects the text types produced by learners (usually argumentative essays, as explained above) (Granger 2015: 17).
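The notion of overuse can be made concrete as a comparison of normalized frequencies. The following is only a minimal sketch in Python and not the procedure used in this study: the function names, the simple tokenizer, the choice of the word form "is" as the feature, and the two toy texts are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens (a crude assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def per_thousand(texts: list[str], feature: str) -> float:
    """Relative frequency of one word form per 1,000 tokens across all texts."""
    tokens = [tok for text in texts for tok in tokenize(text)]
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[feature] / len(tokens)


# Invented toy data, not taken from any real corpus.
learner_texts = ["The boy is going to the park and the dog is running"]
reference_texts = ["The boy goes to the park while his dog runs ahead of him"]

learner_rate = per_thousand(learner_texts, "is")
reference_rate = per_thousand(reference_texts, "is")

# A higher learner rate would be a first hint of possible overuse.
print(f"learner: {learner_rate:.1f}, reference: {reference_rate:.1f} per 1,000 tokens")
```

Normalizing per 1,000 tokens keeps corpora of different sizes comparable; whether an observed difference is meaningful would still have to be checked with an appropriate statistical test.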

However, the use of a native speaker baseline is not uncontroversial (Granger 2015: 11). Numerous researchers have recently criticized this L1/L2 comparison, especially if it takes an idealized native speaker as its point of departure and portrays learners as deficient (Granger 2015: 13). In answer to this criticism, Granger rightly argues that it is useful to be aware that learner language, or interlanguage, is a phenomenon in its own right and that it should be studied as such; yet, the L1/L2 comparison should not be abandoned but rather “be used to bring to light features of learner language that, once uncovered, can be analyzed from a strictly L2 perspective” (Granger 2015: 14). What is more, she stresses that native speaker performance should not be confused with the norm, or even the target for learners, but should simply serve as a reference (Granger 2015: 18). This becomes even more relevant if we take English as the foreign or additional language under investigation. Especially against the background of World Englishes, and the fact that there is clearly not just one variety of English, we realize that it becomes more and more difficult to define a native speaker norm for learners of that language (we come back to this issue in the following Chapter 5.1.3; for a more detailed discussion see Granger 2012; 2015).

22 Available online at <https://uclouvain.be/fr/node/11973>.

The second approach of Contrastive Interlanguage Analysis is a comparison between different L2 varieties (Granger 2015: 12). The rationale for this comparison is to differentiate between features of the L2 variety that are influenced by the L1 and features that do not depend on cross-linguistic influence from the L1 but reflect general learner difficulties (Granger 2015: 12). Clearly, the L1 is not the only variable that affects the acquisition of an additional language, and therefore, further variables need to be added to this comparison (Granger 2015: 12).

For the current study, we will mainly concentrate on the latter type, i.e. the comparison of different interlanguages or, put differently, of different sub-corpora that include learner language from various L1s. We also consider a novice native speaker baseline rather than a reference corpus based on expert native language, because the learners in the current corpus are not yet advanced and we wanted a text type that is comparable with the text type the foreign language learners produced. However, as our main focus, we investigate several L2 and several L3 learners of English to “identify the possible source of certain non-standard features” (Gilquin & Granger 2015: 425). Hence, with such an error analysis, we aim to find indications of cross-linguistic influence in the learners that we can trace back to their native language or their two previously acquired languages.

Furthermore, corpora usually include metadata, i.e. non-linguistic information about the text itself (McEnery & Hardie 2012: 29). Metadata could include the author of the text, the author’s gender, the year of publication, and the genre, for instance. The same applies to learner corpora, of course; we also need metadata to substantiate the analysis and to interpret the language production (Gilquin & Granger 2015: 430; Granger 2015: 12). The more informative the metadata, the more informative the analysis and the comparison of different learner groups can be; with the help of statistical measures and approaches, we can determine how this background information affects the language production (Gilquin & Granger 2015: 430). Possible background information in learner corpus research could be the L1, further previously acquired languages, age, gender, country of origin, country of current stay, proficiency level, and socio-economic background, to name just a few.
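To illustrate how such metadata can structure a comparison of learner groups, the following minimal Python sketch attaches background information to each text and sums token counts per group. It is a hypothetical illustration rather than the data format of the current corpus: the class and field names, the rule that two or more previously acquired languages make English the L3, the whitespace tokenization, and the three sample entries are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LearnerText:
    """One learner production plus hypothetical metadata fields."""
    text: str
    previous_languages: tuple[str, ...]  # languages acquired before English
    age: int
    mode: str  # "written" or "spoken"

    @property
    def learner_type(self) -> str:
        # Assumption: two or more previously acquired languages make English the L3.
        return "L3" if len(self.previous_languages) >= 2 else "L2"


def tokens_by_group(corpus: list[LearnerText]) -> dict[tuple[str, str], int]:
    """Sum whitespace-separated tokens per (learner type, mode) group."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for item in corpus:
        totals[(item.learner_type, item.mode)] += len(item.text.split())
    return dict(totals)


# Invented sample entries, not taken from the corpus of this study.
corpus = [
    LearnerText("the man is fishing", ("German",), 12, "written"),
    LearnerText("he catch a big fish", ("Russian", "German"), 12, "written"),
    LearnerText("the boy go home", ("Turkish", "German"), 16, "spoken"),
]

totals = tokens_by_group(corpus)
print(totals)
```

Keeping the metadata next to each text makes it straightforward to regroup the same data by any other background variable, for example by age instead of learner type.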

In summary, with the help of learner corpora, it is possible to understand language acquisition processes. Learner corpus research combines second language acquisition research, corpus linguistics, and also a more applied perspective, i.e. foreign language teaching (Gilquin & Granger 2015: 428). It is therefore the ideal approach to investigate the current research questions. The biases and limitations mentioned earlier that we currently find in LCR can be attributed to the fact that learner corpus research is still a young field (Gilquin & Granger 2015: 427-429). Therefore, the current study can add to this new research area by investigating the use of tense and aspect in a small learner corpus that includes written and spoken English production data of intermediate second and third language learners.