
2. Corpus and methodology

2.1 The corpus

2.1.1 Research on historical texts

A first survey of the literature reveals various issues that are addressed in this context. Some of them are unanimously accepted, whereas others seem to give rise to controversy. As for the former, there is general consensus that a sufficient amount of data, i.e., according to most authors, a “certain length of the text” (Zimmermann 2014: 27), is a conditio sine qua non for research on the history of a language. As an example of the latter, one might take what can be called the discussion of the “accessibility” or “material state” of the text. This concerns the question of whether an edited text is needed or whether a non-edited version is preferable, or rather, whether one should work on manuscripts instead of editions. The term “need” anticipates a crucial and far from trivial point of the whole discussion that has to be kept in mind: there is not (yet) a canonical way to handle (historical) corpora, because the design of a corpus essentially depends on the research purposes for which it is created. Recall that the central aim of my study is to understand the nature of the phenomenon of “Stylistic Fronting” in embedded contexts in relation to information structure, which serves as the basis for the syntactic modelling of the data.

A first consequence is that the total amount of data needs to be large, since the proportion of relative and comparative clauses is expected to be low. For the present study, therefore, only edited texts can be taken into account, as they facilitate the digitization and coding of the data. A second consequence is that the necessary syntactic and information-structural annotation imposes the following requirements on the texts used. On the one hand, verse texts must be excluded, since the fronting of elements may be due to metrical rather than syntactic constraints, which would distort the observed occurrence of fronting. On the other hand, the texts have to be available in their entirety in order to allow the pragmatic-contextual annotation of the data.1 Third, since variation in the data may be due not only to language change over time but also to other factors,2 it is indispensable to select texts that are as linguistically homogeneous as possible. Zimmermann (2014) suggests that texts should be written in a single dialect and contain as much direct speech as possible. For the 14th and 15th centuries, however, it can be observed that texts written in the same French dialect vary considerably owing to the lack of standardization, in the form of both intra-speaker and inter-speaker variation (Völker 2003). Consequently, preference is given to texts that belong to the same genre, that can be dated and located, and whose scribes can be identified. The focus is on legal texts since, according to Balon and Larrivée (2014), they generally provide the required characteristics, in contrast to the literary texts on which research on French is predominantly based:3 “Datés, localisés, avec un auteur souvent identifié, rarement réécrits, d’une édition plus près de la lettre” [‘dated, localized, often with an identified author, rarely rewritten, with an edition closer to the letter’] (Balon and Larrivée 2014: 4-5).

1 See the second part of the present chapter for more details on the principles of annotation.

2 Cf. the part on the “linguistic interest of LDR” in this chapter, and Koch and Oesterreicher (1985, 1994) for the different dimensions of linguistic variation.

3 However, as Balon and Larrivée (2014) observe, the number of diachronic studies based on French legal texts has increased over the last decade; cf. Völker (2003, 2007), Wirth-Jaillard (2014), Ingham and Larrivée (2015).

Finally, there remains the question of “accessibility”, which was briefly touched upon above. In fact, the potential contradictions between the above-mentioned needs are negligible. The common demand is the “uniform depiction of any possible evolution of the language” (Zimmermann 2014: 26). While Stanovaïa (2003) insists on the fundamental difference between the text, i.e., what is commonly called the archetype, and the manuscripts, which, given the gap between the two, are the only directly accessible object of study, Zimmermann (2014) suggests using carefully prepared editions based on a single manuscript that was written no more than 50 years after the original document. Accordingly, one needs to check case by case whether the existing editions meet essential editorial requirements, such as those postulated by Völker (2003), and to discuss possible deviations explicitly.4 In doing so, one takes into account the divergences between texts and manuscripts pointed out by Stanovaïa (2003).

To sum up, the aim of my study requires the use of edited or even already digitized texts that subsequently have to be coded. The first option would be to resort to existing annotated corpora covering the period under investigation (14th and 15th centuries).

The next section surveys freely accessible corpora of Medieval French that allow on-line queries and considers whether and to what extent they can serve the present work and conform to the requirements established here, i.e., whether they allow queries on entire legal texts that are dated, can be located, and meet the edition criteria mentioned above.

If possible, further preference will be given to corpora that are annotated syntactically and/or information-structurally.

2.1.1.2 Existing on-line corpora

The Consortium international pour les corpus de français médiéval5 was founded in 2004 by seven research institutions: the University of Ottawa, the École Normale Supérieure of Lyon, the University of Stuttgart, the University of Zürich, the research centre ATILF, Aberystwyth University and the École nationale des chartes. It hosts an Internet portal that assembles information on corpora of Medieval French, such as a bibliography, suggestions for and examples of corpus research in practice (transcription, design, description, coding and categorization of corpora), a mailing list, and links to the on-line corpora provided by its members. All of these corpora comprise texts of different genres and can be queried via on-line query forms. Apart from literary and hagiographic texts, some also include chronicles and legal or scientific texts. The query facilities that the corpora offer their users depend on the purpose for which each corpus was originally designed. For instance, three corpora were used to compile dictionaries, namely the Base textuelle du Moyen Français, which served as the basis of the Dictionnaire du Moyen Français (DMF), the Anglo-Norman On-Line Hub for the Anglo-Norman Dictionary, and the Base textuelle du Dictionnaire Électronique de Chrétien de Troyes. They provide access to the database and, in part, allow searching for a single word or a sequence of words and displaying the results in context.

4 For an introduction to the handling of complex textual transmission in the historical sciences, cf. Daniel (2006).

5 For the web presence of the Consortium international pour les corpus de français médiéval and the corpora mentioned in the following, see the bibliography.

Furthermore, the DMF makes it possible to analyse verbal morphology and its spelling, whereas the Anglo-Norman On-Line Hub and the Textes de Français Ancien database allow concordance searches. However, the facilities of these corpora did not meet the requirements with regard to the completeness of the legal texts, the editions used, and the possibility of locating the respective texts. For the purpose of the present work, corpora that allow searching for syntactic criteria and provide full access to the texts used were of particular interest, since they would facilitate the research. For Medieval French, there are two projects that provide syntactic annotation of the data. On the one hand, there is the Syntactic Reference Corpus of Medieval French, directed by Sophie Prévost and Achim Stein, which consists of syntactically annotated parts of two text corpora of Medieval French, namely the Base de Français Médiéval and the Nouveau Corpus d'Amsterdam. The annotated texts, which date from the 9th to the 13th century, are mainly literary, together with some hagiographic texts and one legal text (the Serments de Strasbourg); the majority of them are written in verse.

Hence, this corpus does not meet the criteria established above either, above all with respect to the time frame. On the other hand, there is the corpus Modéliser le changement: les voies du français, which comprises texts from the Middle Ages to the 18th century. Some of its parts are identical to parts of other corpora, among them the Base de Français Médiéval.

According to Martineau (2008), the corpus covers different dialects and includes, at least for the medieval part, meta-textual data and information about the scribe. She points out that, despite this information, the corpus is characterized by a certain heterogeneity owing to the availability of the texts and to the different editions used. As queries after registration show, the corpus consists of both literary and non-literary texts. With respect to legal texts, the corpus includes four texts altogether, two of which are from the 15th century, both from the region of Paris.6 There is no legal text that covers the 14th century. As for the editions, additional information was available for only one of these two texts, whereas for the other no information could be found.

To conclude, the present review shows that none of the existing corpora meets the criteria established above: either the texts were not available in their entirety and the query options could not satisfy our needs, or the time frame and the available genres did not correspond to our requirements. Hence, I decided to draw on a digitized corpus that meets the above-mentioned requirements, comprising legal texts from a delimited region and covering the 14th and 15th centuries.