
3.4 The Experiment

3.4.2 Method

3.4.2.1 Subjects

A total of 57 subjects, all native speakers of British English, were recruited through mailing-list postings, fliers, and personal invitation. All subjects were students from King’s College London and Queen Mary, University of London. Participation was voluntary and paid.

The data of nine subjects were excluded for the following reasons. Two informants were born not in England but in the U.S. and Wales, respectively, which raised concerns about interference from other varieties of English. One subject was bilingual. Three informants did not fill out the questionnaire completely. The data of another three were excluded because they did not apply the open-ended thermometer scale but set up their own scale with fixed maximum and minimum scores. This left 48 subjects for evaluation, 22 female and 26 male, aged between 18 and 30 years.

3.4.2.2 Materials and Design

In order to obtain introspective data that can be considered valid and reliable, the experimental design followed the design of psycholinguistic experiments as described in Cowart (1997) and Schütze (1996) and exemplified in Hoffmann (2007). It is well known that an informant’s judgment of an individual sentence may be affected by many different syntactic, semantic, pragmatic, and even extralinguistic factors (see Section 3.2). In order to control for these influences and minimize their confounding effects, the extraneous systematic factors were distributed uniformly across the experimental conditions by means of a counterbalanced factorial design. This ensures that any observed difference between two sentence types can be attributed to the intended factor.

Following these conditions, a factorial design was employed that crossed the factors definiteness of antecedent NP (DEFINITENESS), verb class (VERB CLASS), and grammatical function (GRAMMATICAL FUNCTION). The factor DEFINITENESS included two levels, [+ definite] (using the definite article the) and [- definite] (using the indefinite article a). The factor VERB CLASS had two levels, [+ appearance] (for verbs of appearance) and [- appearance] (for non-appearance verbs). For each level, three different verbs (appear, enter, arrive for [+ appearance], and faint, stumble, and smile for [- appearance]) were shown in order to exclude the possibility that the results might be confounded by verb preference. The factor GRAMMATICAL FUNCTION comprised two levels, “subject” (i.e., the antecedent NP is the subject) and “non-typical direct object” (i.e., the antecedent NP is a non-typical direct object).

8Obviously, the NP the boy is not unambiguously the direct object of saw, but could also be the subject of arrive. The reason why I used this particular construction is given in Section 3.4.2.

CHAPTER 3. A PSYCHOLINGUISTIC EXPERIMENT

I use the term “non-typical direct object” to emphasize that the antecedent NP in the experimental sentences was indeed not a typical direct object. Sentences with a typical direct object, as, e.g., I called a man yesterday who was wearing a hat, were not used in this experiment since they could have incurred other confounding factors such as distance between the relative clause and its antecedent, and the length and meaning of the sentence.

Moreover, it was decided not to use canonical transitive verbs since the factor VERB CLASS included the level “verb of appearance”, and verbs of appearance are usually intransitive.

For these reasons, the accusative and infinitive structure [I + saw + NPacc + Vinf + RC], as in, e.g., I saw a man arrive who was wearing a hat, was used instead, even though this structure is controversial in that the status of the NP is ambiguous: it can either be analyzed as the direct object of saw in the main clause, or it may be the subject of the infinitival verb. However, for the above-mentioned reasons this structure was considered to exert fewer confounding influences on this kind of experiment than a structure with an ordinary transitive verb.

The experimental conditions are exemplified in (136). The conditions are shown for one verb of appearance (arrive) and one non-appearance verb (faint).

(136) a. A girl arrived who was hugging a doll. [-def, SUBJ, +app]

b. The girl arrived who was hugging a doll. [+def, SUBJ, +app]

c. I saw a girl arrive who was hugging a doll. [-def, DO, +app]

d. I saw the girl arrive who was hugging a doll. [+def, DO, +app]

e. A girl fainted who was hugging a doll. [-def, SUBJ, -app]

f. The girl fainted who was hugging a doll. [+def, SUBJ, -app]

g. I saw a girl faint who was hugging a doll. [-def, DO, -app]

h. I saw the girl faint who was hugging a doll. [+def, DO, -app]
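The eight conditions in (136) follow mechanically from crossing the three factors. The following Python sketch regenerates them from a sentence template. The function and variable names are illustrative assumptions, not part of the original study materials, and only the two verbs shown in (136) are included.

```python
from itertools import product

# Verb forms for the two verbs shown in (136): (past tense, bare infinitive).
VERBS = {"+app": ("arrived", "arrive"), "-app": ("fainted", "faint")}
RELATIVE_CLAUSE = "who was hugging a doll"


def build_sentence(definite: bool, function: str, verb_class: str) -> str:
    """Return one experimental sentence for the given factor levels."""
    past, infinitive = VERBS[verb_class]
    article = "the" if definite else "a"
    if function == "SUBJ":
        sentence = f"{article} girl {past} {RELATIVE_CLAUSE}."
    else:  # "DO": accusative-and-infinitive frame after "I saw"
        sentence = f"I saw {article} girl {infinitive} {RELATIVE_CLAUSE}."
    return sentence[0].upper() + sentence[1:]


def conditions_136() -> list[tuple[str, str]]:
    """List (label, sentence) pairs in the order (136a) to (136h)."""
    rows = []
    for verb_class, function, definite in product(
        ("+app", "-app"), ("SUBJ", "DO"), (False, True)
    ):
        label = f"[{'+' if definite else '-'}def, {function}, {verb_class}]"
        rows.append((label, build_sentence(definite, function, verb_class)))
    return rows


for label, sentence in conditions_136():
    print(sentence, label)
```

Keeping the relative clause as a single constant mirrors the requirement that only the three manipulated factors vary within a token set.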

Thus, each token set contained a total of DEFINITENESS × GRAMMATICAL FUNCTION × VERB CLASS (each level realized by three different verbs) = 2 × 2 × (2 × 3) = 24 cells. Eight lexicalizations were constructed, which were adapted to the various syntactic conditions, yielding a total of 192 stimuli. The stimulus set was then divided into eight material sets of 24 stimuli by placing the items in a Latin square (see Keller (2000, 60n6)). Each subject thus saw a version of a material set that contained 24 experimental sentences in which each condition was represented once and each lexicalization appeared three times.
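The counts in this paragraph can be checked with a short Python sketch. It builds the 24 cells and distributes the eight lexicalizations over eight material sets by simple rotation. The rotation scheme is an assumption made for illustration: the text states only that a Latin square was used, not how the items were arranged in it, and the integer lexicalization IDs are hypothetical.

```python
from itertools import product

DEFINITENESS = ("+def", "-def")
FUNCTION = ("SUBJ", "DO")
VERBS = ("appear", "enter", "arrive", "faint", "stumble", "smile")
N_LEXICALIZATIONS = 8

# 2 x 2 x (2 x 3) = 24 cells per token set.
CELLS = list(product(DEFINITENESS, FUNCTION, VERBS))


def material_sets() -> list[list[tuple[tuple[str, str, str], int]]]:
    """Assign lexicalizations to cells by rotation (a Latin-square scheme)."""
    sets = []
    for s in range(N_LEXICALIZATIONS):
        sets.append(
            [(cell, (i + s) % N_LEXICALIZATIONS) for i, cell in enumerate(CELLS)]
        )
    return sets


sets = material_sets()
all_stimuli = {item for material_set in sets for item in material_set}
print(len(CELLS), len(all_stimuli), len(sets))  # 24 192 8
```

Under this rotation every material set contains each cell exactly once and each lexicalization exactly three times, and the eight sets together cover all 192 stimuli without repetition.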

In order to minimize other confounding factors and balance the acceptability of relative clause extraposition more closely, in the experimental materials both subjects and non-typical direct objects were animate, the relative pronoun was always who, the structure of the relative clause was kept constant (who + was + Xing + a + Y), and the length, meaning, plausibility, imagery content, and prosody of the words and sentences were matched.

Since the experiment was conducted employing questionnaires, eight different sets of questionnaires were constructed, each comprising one of the eight material sets. For each set of questionnaires, the experimental materials were arranged in two different random orders, so that there were 16 different versions of questionnaires altogether, each of which was seen by three informants. In addition to the experimental items, 24 filler items were used that covered a wide range of acceptability and were randomly mixed among the experimental sentences in each of the 16 questionnaire versions. Each subject thus saw a total of 48 items: 24 experimental and 24 filler sentences.
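The version counts can be illustrated with a small Python sketch that assembles 16 questionnaire versions from eight material sets. The item IDs, the fixed seed, and the single joint shuffle of experimental and filler items are simplifying assumptions. The original randomization procedure is not described in more detail than above.

```python
import random

N_MATERIAL_SETS = 8
ORDERS_PER_SET = 2
N_EXPERIMENTAL = 24
N_FILLERS = 24


def build_versions(seed: int = 0) -> list[list[str]]:
    """Return 16 questionnaire versions, each a shuffled list of 48 item IDs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    fillers = [f"filler-{k}" for k in range(N_FILLERS)]
    versions = []
    for s in range(N_MATERIAL_SETS):
        experimental = [f"set{s}-item{k}" for k in range(N_EXPERIMENTAL)]
        for _ in range(ORDERS_PER_SET):
            items = experimental + fillers
            rng.shuffle(items)  # mixes fillers among experimental items
            versions.append(items)
    return versions


versions = build_versions()
informants_per_version = 3
print(len(versions), len(versions[0]), len(versions) * informants_per_version)
# 16 48 48
```

With three informants per version, the 16 versions account for exactly the 48 subjects retained for evaluation.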

3.4.2.3 Procedure

In this experiment, introspective judgments of naïve informants were gathered using the method of thermometer judgments (Featherston, 2007), which is a variant of magnitude estimation (Bard et al., 1996). It allows informants to use a linear scale to produce relative numerical judgments. The scale is open-ended, i.e., it has no maximum or minimum scores.

Two reference items fix the location and the amplitude of the scale, one set at twenty and the other at thirty so that informants do not have to give judgments near zero, where distortion has been shown to occur (Featherston, 2007, 76).9

Each subject took part in an experimental session that lasted approximately 20 minutes.

The experiment was conducted using printed questionnaires that were personally distributed to the informants by the experimenter. The questionnaire consisted of five parts: an instruction part, a short demographic questionnaire, a training session (to familiarize subjects with the concept of thermometer judgments), a practice session (to familiarize subjects with applying thermometer judgments to linguistic stimuli), and the main experimental phase.

The experiment proceeded as follows: The subjects first saw a set of instructions explaining the nature of the experiment and outlining the procedure. It was emphasized that the informants could not give any right or wrong answers, and that they were not being evaluated but were being used as a source of scientific information. The instructions familiarized the subjects with the concept of giving numerical judgments relative to two reference items. It was explained that in the first training session, the reference items consisted of two horizontal lines with the length of twenty and thirty “units”.10 Subjects were instructed to estimate the length of each further line relative to these two reference lines and assign it a number that would express how long the line was relative to the two reference lines. Three example lines and corresponding numerical values were provided to illustrate the concept of relativity.

9The similarity of this scale to the temperature scale, with the two reference items of freezing point and boiling point, has led to the name “thermometer judgments” (Featherston, 2007).

10The use of a “real” measuring unit of length, e.g., inches, was omitted in order not to mislead subjects into trying to estimate the real lengths of the lines. Instead, subjects were to be familiarized with the method of providing relative judgments.

For the practice session and the main experiment, subjects were told to use numbers to assess some English sentences relative to two reference items. The criterion they were to use to judge the sentences was how “natural” they sound. The reference items were two sentences that were considered to be worth 20 and 30 units, respectively. As an illustration, three example sentences were provided together with numerical estimates that could possibly be assigned to them depending on how natural one feels each sentence sounds relative to the two reference sentences.

It was stressed that any numbers, including decimals, could be given and that there was no upper or lower limit to the numbers, i.e., numbers above thirty and below twenty could be used, too. It was pointed out that the sentences were not always right or wrong, but that the different “in-between” judgments were of interest. Since the experiment was self-paced, subjects were asked not to think too long about any one sentence (spending less than 10 seconds on each one) but to provide their spontaneous (“gut”) feeling. They were to imagine somebody saying the sentences to them, since the object of interest was the spoken language rather than the written form. Subjects were instructed not to go back if they made mistakes, but to keep on with the experiment.

Following the instructions was a short demographic questionnaire including age, sex, nationality, place of birth (city, county, country), regional dialect, job or subject studied, and languages spoken. Subjects could optionally provide their names or remain anonymous.

The first training session consisted of judging line lengths. Since instructions were given in detail in the instruction part, the subjects were only reminded to judge the lengths of the lines on the following two pages relative to the two reference lines on top of each page. Four lines of different lengths were to be judged on each page, and the numerical estimation had to be written in a space below each line. All subjects saw the same lines in the same order, and the pair of reference lines was the same on each page and for all subjects.
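As a rough illustration of what a judgment relative to two reference lines amounts to, the following Python sketch computes the value that a perfectly linear judge would assign to a line. The physical lengths are hypothetical, since none are reported here. The linear interpolation is likewise an assumption, based on the description of the scale as linear with reference points at twenty and thirty.

```python
def linear_judgment(length: float, ref20: float, ref30: float) -> float:
    """Value on the 20/30-anchored scale for a line of the given length.

    ref20 and ref30 are the physical lengths of the two reference lines
    (hypothetical values; any unit works as long as it is consistent).
    """
    if ref20 == ref30:
        raise ValueError("reference lines must differ in length")
    return 20.0 + 10.0 * (length - ref20) / (ref30 - ref20)


# Hypothetical reference lines of 4 cm and 6 cm.
print(linear_judgment(5.0, ref20=4.0, ref30=6.0))  # 25.0
print(linear_judgment(8.0, ref20=4.0, ref30=6.0))  # 40.0
print(linear_judgment(3.0, ref20=4.0, ref30=6.0))  # 15.0
```

The last two calls show why the scale is open-ended: lines longer or shorter than both reference lines naturally receive values above thirty or below twenty.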

For the practice session, subjects were reminded to judge the naturalness of the following sentences relative to two reference sentences. It was recalled that any whole numbers or decimals, also those above thirty and below twenty, could be used. The two reference sentences and their assigned units (twenty and thirty) were given at the top of the following page. The reference items were the same as those provided in the instructions because it was considered easier to deal with familiar sentences and in order to avoid confusion. Eight practice sentences were presented on the same page. Subjects’ personal judgments had to be written as numerical values (“units”) below each sentence. Each subject judged the whole set of practice items, which were presented in the same order on each questionnaire. The practice materials were carefully chosen, on the one hand, to accustom the subjects to the subsequent experimental sentences and the method to be used and, on the other hand, to avoid confronting them with some of the less grammatical sentence types at this early stage of the experiment. The practice set contained sentences with relative clauses in canonical and extraposed position as well as items with unrelated constructions. It aimed to cover the full range of acceptability.

Only after this practice phase did the main part of the experiment begin, in which the judgments for the relevant structures were elicited. The procedure in this phase was the same as in the practice session. As a reminder, subjects were given instructions to judge the naturalness of the sentences relative to the two reference items, and to use any whole numbers or decimals. The following six pages showed the same two reference sentences as before on the top of each page, and eight further sentences per page. Thus, subjects saw 48 test items, which consisted of 24 experimental items and 24 fillers.