
As mentioned before, the data come mainly from the E-LiPS project, which is part of the English LiMA Panel Study, carried out at the University of Hamburg. It was conducted from 2009 until 2013 and directed by Peter Siemund and Ingrid Gogolin.25 The following researchers were also involved in the data collection process: Simone Lechner, Sharareh Rahbari, Jessica Terese Mueller, Mark Gerken, and Anika Lloyd-Smith.26 Their help is greatly appreciated. This chapter describes the data collection process of both the written and the oral production data and the process of building the English learner corpus, and it briefly comments on the questionnaires the students had to fill in, in addition to completing the written and oral tasks in English.

Most of the data were collected between 2009 and 2013 in Germany, Russia, Turkey, and in the UK. Then, in 2016 and 2017, additional data collections were carried out. These additional interviews were conducted in Hamburg (2016) with an English native speaker control group, and in Hanoi, Vietnam (2017), with monolingual Vietnamese learners of English. This was necessary to complete the data set.

25 The financial support of Hamburg’s “Behörde für Wissenschaft und Forschung” is gratefully acknowledged.

26 In addition, the following student assistants were also involved in the data collection and/or transcription process of the handwritten texts and oral recordings: Perihan Akpinar, Sevilay Arabaci, Aybül Babat, Merve Bas, Julia Benz, Alexij Benz, Can Bilici, Phan-Ngoc Binh, Bartu Bosdurmoz, Philip Braun, Eugenia Budnik, Viktoria Diana Bui, Irem Bulut, Ayregul Cokiroglu, Thi Tan Dang, Halil Demir, Jana Endres, Volker Englich, Mark Gerken, Onur Gündüz, The Hung Huynh, Anna Kaiser, Sara Kalitina, Tülay Karakaya, Cham Anh Khoung, Lena Knutz, Shari Knutz, Thieu Lien Kong, Sengül Kotan, Viktoria Kronhard, Cem Kücük, Svenja Lubinski, Tarik Meric, Alexander Michaelis, Mehmet Moderba, Olga Neufeld, Tuyet Mai Nguyen, Thi Phuong Hon Nguyen, Efekan Nodasbas, Begüm Oktay, Akin Özbek, Tansel Öztürk, Dao Ngoc Phuong, Ton Kom Phuong, Tran T Phuong, Süreyya Polat, Martina Ruß, Volka Sacok, Malis Sahmanija, Kathrin Sarudko, Jennifer Schemtschuk, Sophia Spiewok, Inci Toksoy, Maria Tschistjakova, Beyla Urgun, Nadja Victoria, Anna Vinets, Hai-Van Vu, Hoai Nam Vu, Paula Marie Walter, Sophie Wedemeyer, Berfin Yavuz, Mihriban Yavuz, Merve Yücel.

Even though a considerable number of people were involved in the data collection process over a long period of time, all researchers and student assistants strictly adhered to a set of defined rules to ensure a uniform data collection process. The exact procedure will be explained in the three sections that follow.

Written Task

The main subject matter of the study is the analysis of texts written by learners of English. In the English LiMA Panel Study (E-LiPS), one of the exercises the participants had to perform was to write a narrative based on a picture story by Erich Ohser, “Gut gemeint…” (English: “Good intentions”), see Figure 10 (Ohser 2003). For the study, we used a colored version of the story. The participants had a time limit of 30 minutes to complete the task: they were asked to write at least two sentences for each of the six pictures of the story. The students were required to complete the task without additional help. Hence, they were not allowed to use any grammar book or dictionary, and they were not allowed to ask the interviewer or the teacher for vocabulary. If such a question came up, the interviewer did not provide an answer but reassured the participant and motivated him or her to think again and to do the task as well as possible.

During the 30 minutes, the interviewer and the teacher made sure that each participant focused only on their own sheet of paper and was not able to talk to their neighbors or to look at their neighbors’ writing. Some children refused to write or gave up writing early. They were kindly encouraged to continue and to consider whether they might be able to write down a little more. It was always stressed that they should not be afraid of any consequences or bad school grades and that they should write as freely as possible, putting down whatever came to mind at that moment.

The main aim of this task was to elicit natural learner language in a guided setting. This may at first seem impossible, especially when keeping in mind the premises of learner corpus research and the definition of naturally occurring language explained earlier (see Chapter 5.1.2). However, the advantage of such a directed writing task (and as we will see later, this is also true for the speaking task) is that all participants have, to a certain extent, the same activity setting (see Coughlan & Duff 1994 for a critical look at learner tasks and replicability).

What is more, by selecting a specific set of pictures, the topic and the potential vocabulary can be manipulated, and the specific context of the writing task is known to the researcher, which facilitates a comparison across different learners (Bardovi-Harlig 2000: 199). Hence, the availability of the task and the exact pictures provides useful guidance for the analysis of the written texts. Therefore, we will be able to compare the language production of the different groups with this peripheral text type.27

In addition, picture descriptions or writing short stories are activities that secondary-school students are familiar with, because such tasks are introduced in the English classroom early on (see for instance Seidl 2006 as one example of an English workbook, school year 5). Using picture stories to elicit written (and also spoken) language has proven useful and effective for analyzing a number of linguistic features (Pallotti 2010: 171). Yet, Pallotti (2010: 171) remarks that the analysis of tense and aspect may prove difficult, because using either simple present or simple past would be acceptable, and that with such data one can only analyze “the forms that are used, not those that are missing”.28 Nevertheless, this method of using a picture sequence to elicit written production data seems suitable for comparing learner language.

27 Peripheral text type refers to the premise of learner corpus research to use production data from a naturalistic language production context; see again Chapter 5.1.2.

Figure 10: "Gut gemeint..." by Erich Ohser.

Furthermore, we are convinced that certain vocabulary items or grammatical structures are triggered because of the story that is portrayed in the pictures. However, we are aware of the fact that even if participants are presented with one and the same task, the results need not necessarily be the same. As Coughlan and Duff (1994: 185) explain, “the basic task can be conceptualized differently by different people.” Having said this, we have to interpret the results carefully, because every task or activity is always part of a specific sociocultural setting and this context affects the task fulfillment and the outcome (Coughlan & Duff 1994: 190).

Oral Task

Some of the children not only participated in the written task but were also presented with a second picture sequence to retell orally (Figure 11). This picture story was created by Simone Lechner (2013), based on Gagarina et al. (2012), as part of the LiMA project. The oral task was conducted after the written task. This way, the participants had already met the interviewer and were familiar both with him or her and with participating in such a study. This was especially crucial for the oral task: a writing assignment is something the students already know, because they do similar tasks in their foreign language classes. Being recorded while saying something in a foreign language, however, is much more intimidating, and in order to familiarize the students with the setting as much as possible, this task was presented last.

The assignment was as follows: Please tell me what you can see happening in the pictures! Before the actual recording, the student was given some minutes to have a closer look at the pictures and to think about what he or she could say about them. When the participant was ready, the oral production was recorded. Again, as in the written task, the interviewer was not allowed to answer any questions related to vocabulary or grammar. Here, however, we must acknowledge that the context and especially the presence of the interviewer clearly interfere with the performance of the students. Many different interviewers were involved in the data collection process, and small differences, such as smiling or nodding encouragingly, be it consciously or unconsciously, potentially influence the participants (see again Coughlan & Duff 1994). This is a variable that we cannot control for in this study.

28 We come back to this issue in Chapter 5.3, where we discuss the annotation of the learner corpus data.

The comparison with the written texts should show in detail how far writing and speaking differ for each student and, on a more general level, for each language group.

Figure 11: Fox and Chicken by Simone Lechner

Questionnaire

In addition to describing the two picture stories, the children had to fill in two questionnaires. One was about personal information such as age, native language(s), foreign language(s), years of studying English, profession of mother and father, etc. The other was about their attitudes towards English and situations in which English is used in their daily lives. This background and demographic information is relevant for the analysis and the comparison of the different groups.

For this task, in contrast to the other two tasks, the students were allowed to ask content questions and to ask for vocabulary. They frequently did so; for the question about the profession of their parents, for instance, the students often asked for help. Here again, if students refused to fill in the questionnaires, or if it seemed as if they had not filled them in completely, they were gently encouraged to have another look and to try their best to help with the study. The handwritten answers in the questionnaires were later copied into an Excel spreadsheet.

The following background variables and questions, taken from the questionnaires the participants had to fill in, were selected for this study. They are discussed and explained in more detail in Chapter 6.1:

a. Age of onset of learning German

b. School type

c. School grades in German and English

d. Socio-economic status of mother, father, and highest socio-economic status value (HISEI) per family

e. Number of books per household

f. Language use at home: language use of parents with each other; language use of participants with mother, father, and sibling(s)

g. Which statement would you agree with?

English is a beautiful language. (Yes/No)

English is a useful language. (Yes/No)

In addition, the interviewer filled in a form for each participant containing the following information:

a. Age

b. Gender

c. Language Group

Ideally, every participant would have filled in the necessary information. Unfortunately, this ideal was not met. As we will later see (in Chapter 6.1), a lot of information is missing due to non-response. These non-responses cluster around certain participants, and especially specific groups of learners, which can probably be traced back to the data collection.

Nevertheless, we decided to keep the entire data set, even if that means that a number of background variables are missing and can therefore not be used to explain and support the written and oral data. Too many responses from too many participants were missing; therefore, we could not use data imputation methods to fill the blank spaces (see for example Rubin 2004).
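To make the extent of such non-response visible, the share of missing answers can be tabulated per learner group and background variable. The following is only a minimal sketch under stated assumptions: the rows, the group labels, the column names (`group`, `books`, `hisei`), and the function name are all hypothetical and invented for illustration; they are not taken from the E-LiPS spreadsheet.

```python
from collections import defaultdict

# Hypothetical rows as they might be exported from the spreadsheet;
# None stands for a question the participant left unanswered.
rows = [
    {"group": "A", "books": "26-100", "hisei": 51},
    {"group": "A", "books": None, "hisei": None},
    {"group": "B", "books": None, "hisei": None},
    {"group": "B", "books": None, "hisei": 43},
]

def nonresponse_rates(rows, group_key, variables):
    """Return the share of missing values per group and variable."""
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        for var in variables:
            # A value of None (or an absent key) counts as non-response.
            if row.get(var) is None:
                missing[group][var] += 1
    return {
        group: {var: missing[group][var] / totals[group] for var in variables}
        for group in totals
    }

rates = nonresponse_rates(rows, "group", ["books", "hisei"])
for group, by_var in sorted(rates.items()):
    print(group, by_var)
```

A table of this kind only describes where answers are missing; it does not by itself justify or replace an imputation procedure.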

5.3 Transcription and manual annotation

Transcription

This chapter describes the transcription process of the handwritten texts and the oral recordings of the learners of English and it explains the manual annotation process of the learner corpus.

It was necessary to transcribe the handwritten texts and the oral recordings in order to create a machine-readable learner corpus that can be accessed with concordance programs such as AntConc (Anthony 2016).
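What a machine-readable corpus makes possible can be illustrated with a very small keyword-in-context (KWIC) search. The sketch below is not how AntConc works internally; it is a simplified stand-in, and the function name `kwic` and its `width` parameter are assumptions made for illustration. The sample text is the learner sentence quoted as example (2) later in this section.

```python
def kwic(text, keyword, width=3):
    """Return (left context, hit, right context) for each match of keyword."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively, ignoring attached punctuation.
        if token.strip(".,;:!?").lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

text = "A mann catcht a fish off the water."
for left, hit, right in kwic(text, "fish"):
    print(f"{left:>20} | {hit} | {right}")
```

Because the learner's spelling is preserved in the transcription, a search of this kind finds non-target-like forms such as catcht exactly as they were written.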

For both the written and the oral transcriptions, a text editor was used that did not have the function of automatically correcting spelling mistakes. This was crucial, especially for the written texts, because one of the most important points when creating a learner corpus is to copy the learner’s writings as exactly as possible and to include all (spelling) errors. In addition, in the written texts, we paid special attention to capitalization and punctuation.29 The structures of the texts were kept, i.e. if a student started a new paragraph, this was copied in the text document. Furthermore, some students did not write a coherent story but rather wrote one or more sentences for each picture and started each picture with the corresponding number. These numbers were included in the learner corpus. We made sure that the texts were copied as exactly as possible; however, if a student crossed out a mistake in his or her writing, this was not marked in the corpus. Hence, a sentence such as example (1) appears as sentence (2) in the E-LiPS learner corpus.

(1) A mann catcht a fisȼh off the water.

(2) A mann catcht a fish off the water.

If a word or individual letters were illegible, the @-symbol was used; the number of @-symbols within a word represents the number of letters that were unreadable, and a sequence of four @-symbols (@@@@) indicates that the entire word was not decipherable.
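The illegibility convention can be checked automatically once the texts are transcribed. The following minimal sketch counts fully and partially illegible words in a transcribed text; the function name and the sample sentence are invented for illustration (the sentence is modelled on example (2)), and the handling of attached punctuation is our own assumption.

```python
# Assumption: a token consisting of exactly four @-symbols marks a word that
# was entirely illegible; any other @ inside a token marks one illegible letter.
FULLY_ILLEGIBLE = "@@@@"

def illegibility_counts(text):
    """Return (fully illegible words, partially illegible words, illegible letters)."""
    full = partial = letters = 0
    for token in text.split():
        # Strip surrounding punctuation so that "@@@@." still counts as a whole word.
        core = token.strip(".,;:!?\"'()")
        if core == FULLY_ILLEGIBLE:
            full += 1
        elif "@" in core:
            partial += 1
            letters += core.count("@")
    return full, partial, letters

sample = "A mann catcht a f@sh off the @@@@."
print(illegibility_counts(sample))
```

Note that a word in which exactly four letters and nothing else were unreadable is indistinguishable from a fully illegible word under this convention, so the sketch counts it as the latter.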

The procedure for transcribing the oral data was slightly different. All grammatical and lexical mistakes were transcribed; yet, we did not pay attention to pronunciation. Hence, a non-target-like pronounced {th}, as in this, was still written down as this and not as dis. Short pauses of up to two seconds within the recordings were marked with square brackets, i.e. […]. If the pause was longer than two seconds, the approximate duration was included within the square

29 This was not of importance for the current study; yet, it may be relevant for future studies.