DATABASE OF DIGITIZED MANUSCRIPTS AS A RESULT OF AND RESOURCE FOR
SCIENTIFIC RESEARCH
Dr. Marija Prokopčik
Tartu, 2015 05 18
TOPICS
• DigiCourt – digitization of GDL (Grand Duchy of Lithuania) court books held at VUL (Vilnius University Library) and creation of a script database: aims
• Collection in brief – why this collection
• Communication and/or Information paradigm
• Potential users and project team
• Challenges and solutions – TEI P5: flexibility, specificity, customization
• Lessons learned
DIGICOURT – DIGITIZATION OF GDL COURT BOOKS HELD AT VUL AND CREATION OF A SCRIPT DATABASE
The first research project: aims
• Digitization of 50 000 leaves (ca. 100 court books, CB)
• Exhaustive description of 2500 documents and a DB
• 100 CB – full description
• Creation of a DB of CB scripts
COLLECTION OF GDL COURT BOOKS
• 540 manuscript books (1540–1845).
• The majority were written in the GDL, later the Russian Empire, on the present-day territory of Lithuania.
• Voivodship courts, pavietai courts, castle courts, city courts of Trakai, Kaunas, Vilnius, Samogitia, Upytė, Ukmergė, etc.
• Historically part of the Central Old Acts’ Archive of Vilnius (COAAV)
www.teismuknygos.mb.vu.lt
A BIT OF HISTORY....
• COAAV (10 000 CB)
• Preservation of CB in GDL
• The 3rd partition of the Polish–Lithuanian Commonwealth
• 1852 order of Tsar Nicholas I and establishment of central archives of the old court acts in Vilnius, Kaunas, and Vitebsk
• Closure of COAAV in 1915
• Lithuanian State Historical Archives
COLLECTION OF GDL COURT BOOKS
• Wills, sale/purchase documents, donations, complaints, lawsuits, subpoenas, statements, litigations, etc.
• Political, social, and economic life; everyday details and way of life; social and individual psychology; culture of communication and interrelations; property and family relations; living standards;
• Use for synchronic (16th c. Samogitia, Ruthenian) and diachronic (16th–17th c. Kaunas Voivodship / province) research;
• Source of knowledge about material culture to supplement and illustrate information found in other sources;
• Given their time span and linguistic peculiarities, an important source for research in history, historical geography, genealogy, and linguistics.
LANGUAGES
• Latin
• Ruthenian (руський; Office Slavic)
• Polish
• Russian
WHY THIS PARTICULAR COLLECTION?
In great demand: 450 requests per year
Users:
• Researchers
• Students
WHY THIS PARTICULAR COLLECTION?
• Continuation of the digitization started in 2004 (2004 DB: poor physical condition, essential metadata, no advanced search);
http://gluosnis.vu.lt/biblio/dperziura.sarasas
• Advantage of digitization over transliterated printed publication;
• Reconstruction/recreation of the court books collection in electronic space (10 000 in and 540 in VUL, other places?)
WHY THIS PARTICULAR COLLECTION?
Conformity with the priorities, principles, and directions identified in the VUL digitization guidelines:
• integral and important to VU and to Lithuanian history and culture;
• included in the VUL collections priority list.
Criteria listed in the State Strategy of Lithuanian Cultural Heritage Digitization: unique and rare; content of historical, cultural, and scientific value; in great demand; old; in poor physical condition.
COMMUNICATION AND/OR INFORMATION PARADIGM?
Information paradigm
• digitization of scientific data
• relevant to research institutions
• object is considered a medium of impartial data
• data extracted from the source/object
• target group – professionals
• specialists with high/narrow qualifications, time, and fair financial resources.
Communication paradigm
• digitization of cultural heritage data
• relevant to memory institutions
• the object itself is more important than data
• extensive “Albums of pictures” with minimum metadata (DC)
• target group – general public (who?)
• easy to fill with data; from the user's point of view, not easy or attractive.
COMMUNICATION AND/OR INFORMATION PARADIGM: A “CONCORD” CHOICE
Whether the information or the communication paradigm dominates, or how the two are balanced, depends on the purposes of a specific project, the type and amount of digitized documents, the number and type of metadata, the methods of data extraction from the digitized documents, and their organization in the information system.
The specific nature of the VUL court books (plain texts, nothing attractive) calls for a “concord” between the communication and information paradigms.
WHY
• VUL as a memory institution: preservation and communication (promotion)
• VUL as an academic institution: studies and research
• VUL can involve the necessary professionals from the institution itself, which gives more weight to the information paradigm
Digitization is not just the production of images.
It is a source for further research and investigation.
TARGET GROUP
• “systems are not adapted to individual needs of users. <…> Users are perceived as a society altogether (mystic general public), without dividing them into different professional <…> groups, regarding on their needs”
• The CB project is aimed at researchers: historians, linguists, and cultural scholars.
• User groups determine the specific features of document description and presentation.
• Research – historians and philologists with specific knowledge and skills
THE TEAM
• Historians, philologists, librarians
• Restorers
• Digitizers
• IT specialists
CHALLENGE: LANGUAGE OF DESCRIPTIONS
Multilingualism – an instrument of promotion and use:
• Language(s) of descriptions
• Language(s) of researchers
Challenge of multilingualism: the same persons, places, and topics are expressed differently in different languages, which requires proper identification, relation, and connection.
CHALLENGE: LANGUAGES OF DOCUMENTS
• Unnormalised orthography
• Different traditions of original text transcriptions
• 4 × 3: Lithuanian, Polish, Russian, Belarusian × Old Polish, Latin, Office Slavic.
• Several traditions of transcription of historical sources in each country …
WANTED!
• Dawidowicz
or
• Davidovič, Davidovičius, Davidavičius, Davidaitis, Dovydaitis
or
• Давидович
ORIGINAL FORMS IN OFFICE SLAVIC
• iva timofvijorlov
• marti kgat
• gana martinovna hrebtovia pani martinovaja pocotkovskaja
• jori fedoroi hrebtovia
SOLUTION - TEI
“Live” description with the possibility of advanced search
An instrument/tool
• To create metadata
• To create a search tool
  -- Data from document
  -- Data of user (modern interpretation)
• To make system user-friendly and ensure interoperability with other existing similar systems
TEI (Text Encoding Initiative) www.tei-c.org
PERSON
  <person key="#GaCHrMaPo">
    <persName xml:lang="sla">
      <forename>gana</forename>
      <surname type="complex">
        <surname type="paternal_surname">hrebtovna</surname>
        <surname type="married_name" notBefore="1548">martinovaja</surname>
        <surname type="married_surname" notBefore="1548">pocotkovskaja</surname>
      </surname>
    </persName>
The key: possibility to present different versions of the same person’s name
Language (xml:lang)
PERSON
  <persName xml:lang="lt" n="1">
    <forename>Ona</forename>
    <surname type="complex">
      <surname type="paternal">Chrebtavičiūtė</surname>
      <surname type="married_name">Martinienė</surname>
      <surname type="married_surname">Podcotkovienė</surname>
    </surname>
    <roleName type="nobility">Žemaičių žemės žemionė</roleName>
  </persName>
Morphological transcription of surnames
Language (xml:lang)
Form no. (n)
PERSON
  <persName xml:lang="lt" n="2">
    <forename>Ona</forename>
    <surname type="complex">
      <surname type="paternal">Chrebtavičiaus dukra</surname>
      <surname type="married_name">Martino Pocotkovskio žmona</surname>
    </surname>
    <roleName type="nobility">Žemaičių žemės žemionė</roleName>
  </persName>
Daughter (dukra)
Wife (žmona)
Language (xml:lang)
Form no. (n)
PERSON
  <persName xml:lang="ru" n="1">
    <forename>Ганна</forename>
    <surname type="complex">
      <surname type="paternal">Хребтовна</surname>
      <surname type="married_name">Мартиновая</surname>
      <surname type="married_surname">Поцотковская</surname>
    </surname>
    <roleName type="nobility">земянка земли Жемойтской</roleName>
  </persName>
Transcription
PERSON
  <persName xml:lang="ru" n="2">
    <forename>Анна</forename>
    <surname type="complex">
      <surname type="paternal">Хребтовна</surname>
      <surname type="married_name">Мартиновая</surname>
      <surname type="married_surname">Поцотковская</surname>
    </surname>
    <roleName type="nobility">земянка земли Жемойтской</roleName>
  </persName>
Modern form
RELATIONS BETWEEN PERSONS
  <relation name="spouse" mutual="#GaCHrMaPo #MaPo" notBefore="1548" notAfter="1590"/>
  <relation name="parent" active="#MaCHr" passive="#GaCHrMaPo"/>
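The person and relation fragments above can be read together as one structure. The following is a minimal sketch under stated assumptions, not the project's actual schema: it assumes the standard TEI P5 containers listPerson and listRelation, uses xml:id (rather than the key attribute shown on the slides) as the target of the #-pointers, and abbreviates the name forms to two. The persons MaPo and MaCHr are referenced by the relations but not described on the slides, so they appear only as empty placeholders.

```xml
<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Assumption: xml:id is the target of the "#GaCHrMaPo" pointers -->
  <person xml:id="GaCHrMaPo">
    <!-- Original form as spelled in the source document -->
    <persName xml:lang="sla">
      <forename>gana</forename>
      <surname type="paternal_surname">hrebtovna</surname>
    </persName>
    <!-- Modern Lithuanian form, variant 1 -->
    <persName xml:lang="lt" n="1">
      <forename>Ona</forename>
      <surname type="paternal">Chrebtavičiūtė</surname>
    </persName>
  </person>

  <!-- Placeholders: these persons are only referenced, not described, on the slides -->
  <person xml:id="MaPo"/>
  <person xml:id="MaCHr"/>

  <listRelation>
    <!-- Mutual relation: both persons are spouses of each other -->
    <relation name="spouse" mutual="#GaCHrMaPo #MaPo" notBefore="1548" notAfter="1590"/>
    <!-- Directed relation: active is the parent, passive is the child -->
    <relation name="parent" active="#MaCHr" passive="#GaCHrMaPo"/>
  </listRelation>
</listPerson>
```

Keeping every name form inside a single person entry means a search on any variant (original, Lithuanian, or Russian) can resolve to the same identifier, and the relations then apply regardless of which form the user typed.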
RESULTS AND LESSONS LEARNED
• www.teismuknygos.mb.vu.lt
• Preparation of documents for digitization is an important though often undervalued stage of each digitization project.
• The Text Encoding Initiative (TEI P5) manuscript description schemas can be used to create a DB with originally spelled and structured personal names and place names, and to record interpersonal relations and links between persons and places.
• The new data aggregation can be integrated into the common Lithuanian infrastructure of digitized cultural heritage using an already approved and functioning controlled tool, the BAVIC thesaurus.