<title>Long-term Preservation of ETDs in Algeria: Discussion Through the CERIST
Deposit System</title>
<author>
Bakelli Yahia
</author><coauthor>Benrahmoun
Sabrina</coauthor>
<role>Researcher</role>
<affiliation>CERIST Research Center</affiliation>
<address>BenAknoun, Algiers (ALGERIA)</address>
<e-mail>E-mail: ybakelli@mail.cerist.dz</e-mail>
<date> May 2003
</date>
CERIST Research Centre Initiatives Concerning Algerian Academic Literature
(1985-2000)
Bibliographic databases & union catalogs:
- Algerian Scientific Abstracts
- ALGERIANA
- CAT (Algerian Theses Catalogue)
- FNT (National Theses Repository)
- BDRC (Current Researches Database)
- …
www.dctd.cerist.dz
[Figure: Example of a bibliographic record from the FNT database]
However, CERIST needs to go beyond bibliographic records,
because local users and scholars clearly need
access to full texts and digital content.
An official decree was issued in August 2000
by the Ministry of Higher Education
and Scientific Research.
A new ETD processing chain has been set up,
but without a professional
archiving plan.
Questions
• How does the CERIST ETD system operate?
• How are thesis files saved?
• What would guarantee the long-term preservation of the digital materials deposited by students?
• How can international standards, rules and techniques
of digital archiving be applied to this system?
The Volume of Electronic Theses (Oct. 2001 - Mar. 2003)
• A collection of 1463 electronic media items
a) By kind of media:
- 1269 floppy disks (87%)
- 194 CD-ROMs (13%)
b) By language of the theses:
- Arabic: 1161 floppy disks and 97 CD-ROMs (86%)
- French: 108 floppy disks and 97 CD-ROMs (14%)
On average, 54 theses are submitted per month.
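The counts above are internally consistent. As a quick check, the short Python sketch below recomputes the total and the rounded percentages from the slide's own figures; it assumes nothing beyond those counts.

```python
# Counts taken from the slide (Oct. 2001 - Mar. 2003).
floppy = {"Arabic": 1161, "French": 108}
cdrom = {"Arabic": 97, "French": 97}

total = sum(floppy.values()) + sum(cdrom.values())


def share(n: int) -> int:
    """Percentage of the whole collection, rounded to the nearest integer."""
    return round(100 * n / total)


by_media = {"floppy": sum(floppy.values()), "cdrom": sum(cdrom.values())}
by_language = {lang: floppy[lang] + cdrom[lang] for lang in floppy}

print(total)                                          # 1463
print({k: share(v) for k, v in by_media.items()})     # {'floppy': 87, 'cdrom': 13}
print({k: share(v) for k, v in by_language.items()})  # {'Arabic': 86, 'French': 14}
```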
Current Chain of the CERIST ETD System
• Acquisition of ETDs
(two modes of acquisition: deposit by the student in person, or through a CERIST representation).
• Control & inventory
(checking the integrity of the electronic media).
• Codification
(e.g. “THA.3.905”).
• Bibliographic recording
(UNIMARC, SYNGEB software, .THE).
• Conversion and storage of files
(PDF, ARN-A & ARN-F).
Anomalies Concerning the Integrity of Submitted Digital Media
a) Examples of physical anomalies:
- Interruption of the upload process (from the floppy disk to the hard disk).
- Virus infection.
b) Examples of content anomalies:
- Missing cover page.
- Missing parts or chapters (TOC, bibliography, …).
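Content anomalies of this kind could be flagged automatically before the mending stage. The Python sketch below is a hypothetical heuristic, not the CERIST control procedure: it searches the extracted text of a thesis for marker phrases of each required part. The marker lists in `REQUIRED_PARTS` and the `cover_window` size are assumptions chosen for illustration.

```python
# Hypothetical marker phrases for each required part of a thesis
# (French and Arabic). Illustrative heuristics, not CERIST's actual rules.
REQUIRED_PARTS = {
    "cover page": ["République Algérienne", "الجمهورية الجزائرية"],
    "table of contents": ["Table des matières", "Sommaire", "فهرس"],
    "bibliography": ["Bibliographie", "Références", "المراجع"],
}


def missing_parts(text: str, cover_window: int = 2000) -> list[str]:
    """Return the required parts for which no marker phrase was found.

    The cover page is only searched for in the first `cover_window`
    characters; the other parts are searched for anywhere in the text.
    """
    missing = []
    for part, markers in REQUIRED_PARTS.items():
        haystack = text[:cover_window] if part == "cover page" else text
        # Case-insensitive match; casefold() leaves Arabic script unchanged.
        if not any(m.casefold() in haystack.casefold() for m in markers):
            missing.append(part)
    return missing


sample = "République Algérienne Démocratique et Populaire\nThèse\nTable des matières\n"
print(missing_parts(sample))  # ['bibliography']
```

A keyword search only signals that a part is probably missing; a librarian would still have to confirm it against the printed copy.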
The survey conducted at the CERIST Library (April 2003) shows that:
5% of the deposited floppy disks and 32% of the submitted CD-ROMs cannot be directly integrated and archived into the current collection.
Some repair operations must be carried out first.
These operations consist of two main kinds of actions:
a) Repairing faulty disks and disinfecting infected files.
b) Digitising missing chapters and content from the
printed copies.
Three Sets of Experiments
A sample of 430 submitted digital media items (about 30% of the whole collection).
Two levels:
a) Conservation of the digital media themselves.
b) Preservation of the data and content of the ETDs.
-I- The Media of Backup and Storage
Aim: to identify, for the CERIST ETD case, the appropriate:
• backup and storage technology solution;
• archiving system architecture;
in order to optimise the archiving activity with a minimal risk of data loss.
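The risk of data loss can be made concrete with a simple model. Assuming each stored copy fails independently with probability p over a given period (an assumption; the value of p used below is hypothetical, not a CERIST measurement), all k copies are lost together with probability p^k:

```python
def loss_probability(p: float, copies: int) -> float:
    """Probability that all `copies` are lost in the same period,
    assuming each copy fails independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return p ** copies


P_FAIL = 0.01  # hypothetical per-copy failure probability, not measured data

for k in (1, 2, 3):
    print(k, f"{loss_probability(P_FAIL, k):.0e}")
# 1 1e-02
# 2 1e-04
# 3 1e-06
```

The independence assumption is the weak point: copies written on the same batch of media, or kept in the same room, tend to fail together, which is why the choice of architecture (separate media types and locations) matters as much as the number of copies.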
-II- Refreshing, Migration or Emulation?
Two important criteria:
• The cost of the technique.
• The skills required of librarians.
Our assumption:
“Refreshing” seems cheap and simple.
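Refreshing amounts to copying the bit stream unchanged onto new media. A minimal Python sketch, assuming the files are already readable from the old medium: it copies a directory tree and verifies each copy with a SHA-256 checksum, which also catches an interrupted transfer. The function names, the file name and the manifest format are illustrative, not part of the CERIST system.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source_dir: Path, target_dir: Path) -> dict[str, str]:
    """Copy every file under source_dir to target_dir unchanged and verify
    each copy against the checksum of its source.

    Returns a manifest {relative path: SHA-256}; raises OSError when a copy
    does not match its source (e.g. after an interrupted transfer).
    """
    manifest: dict[str, str] = {}
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    for src in files:
        rel = src.relative_to(source_dir)
        dst = target_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies the bytes plus timestamps
        expected = sha256_of(src)
        if sha256_of(dst) != expected:
            raise OSError(f"fixity check failed for {rel}")
        manifest[rel.as_posix()] = expected
    return manifest


# Usage: simulate an old medium and a new archive volume in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    old_medium = Path(tmp, "floppy")
    old_medium.mkdir()
    (old_medium / "THA.3.905.doc").write_bytes(b"thesis content")  # hypothetical file
    manifest = refresh(old_medium, Path(tmp, "archive"))

print(manifest)
```

Keeping the returned manifest lets the next refreshing cycle check the files against the same checksums.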
-III- The Content of ETDs
• How should the structure of submitted dissertations be reformatted?
Axiom: XML is the most suitable standard.
• We are comparing two existing XML DTDs:
- DiML (of the Humboldt University of Berlin).
- TEI Lite (adopted by PUM, Univ. Lyon 2…).
• Two parameters for the comparison:
- Ease of interpretation.
- Suitability for a wide range of disciplines.
Process of Archiving ETD Content in XML: long-term preservation of data and value-added possibilities
a) The thesis as submitted by the student (DOC, ASCII, …).
b) Check the conformity of the thesis structure with the adopted XML DTD.
c) Reinsert the thesis content into an XML file (using an XML editor).
d) Generate a valid XML thesis.
e) Store it in the CERIST ETD repository.
f) Apply XSL Transformations (plugins and scripts) to export to the ETD website (XML, HTML, PDF, …) and to produce ETD derivative content (TOC, bibliography, index, maps, statistics, …).
Suggested by Y. Bakelli and S. Benrahmoun (CERIST, Algiers, May 2003).
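Checking the thesis structure and generating derivative content from the XML can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under stated assumptions: the toy thesis uses TEI Lite-style element names (`div`, `head`) but is not a complete TEI Lite or DiML document, and `missing_elements` is only a stand-in for real DTD validation, which ElementTree does not perform (a validating parser such as lxml's `etree.DTD` would be needed). `derive_toc` shows how one piece of derivative content, the table of contents, can be produced from the markup; in a production chain this step would more likely be an XSLT stylesheet.

```python
import xml.etree.ElementTree as ET

# Toy thesis with TEI Lite-style element names (illustrative only;
# not a complete or valid TEI Lite document).
THESIS_XML = """
<text>
  <front><titlePage><docTitle>Titre de la thèse</docTitle></titlePage></front>
  <body>
    <div type="chapter"><head>Introduction</head><p>Texte.</p></div>
    <div type="chapter"><head>Méthodologie</head>
      <div type="section"><head>Corpus</head><p>Texte.</p></div>
    </div>
  </body>
  <back><div type="bibliography"><head>Bibliographie</head></div></back>
</text>
"""

# Hypothetical minimal structure; a stand-in for validation against a DTD.
REQUIRED_PATHS = ["front/titlePage", "body", "back"]


def missing_elements(root: ET.Element) -> list[str]:
    """Return the required paths that are absent from the document."""
    return [path for path in REQUIRED_PATHS if root.find(path) is None]


def derive_toc(parent: ET.Element, depth: int = 0) -> list[tuple[int, str]]:
    """Collect (depth, heading) pairs from nested <div><head> elements."""
    toc: list[tuple[int, str]] = []
    for div in parent.findall("div"):
        head = div.find("head")
        if head is not None and head.text:
            toc.append((depth, head.text.strip()))
        toc.extend(derive_toc(div, depth + 1))
    return toc


root = ET.fromstring(THESIS_XML.strip())
print(missing_elements(root))  # []
for depth, title in derive_toc(root.find("body")):
    print("  " * depth + title)
# Introduction
# Méthodologie
#   Corpus
```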
Generation of ETD metadata,
following the ETD-MS model of Virginia Tech, in XML syntax.
ETD-MS Metadata for Arabic texts
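A minimal sketch of an ETD-MS record for an Arabic-language thesis, built with Python's `xml.etree.ElementTree`. The namespace URI and element names (`thesis`, `title`, `creator`, `language`, `identifier`) follow our reading of ETD-MS 1.0 and should be checked against the NDLTD specification; the title and creator values are placeholders, and the identifier reuses the codification example “THA.3.905”. The Arabic title is stored as Unicode text with an `xml:lang` attribute.

```python
import xml.etree.ElementTree as ET

# Namespace as we understand it from ETD-MS 1.0; verify against the spec.
ETDMS = "http://www.ndltd.org/standards/metadata/etdms/1.0/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("etdms", ETDMS)


def q(tag: str) -> str:
    """Qualify a tag name with the ETD-MS namespace (Clark notation)."""
    return f"{{{ETDMS}}}{tag}"


def etdms_record(title: str, title_lang: str, creator: str,
                 language: str, identifier: str) -> ET.Element:
    """Build a minimal ETD-MS-style record (a subset of the elements)."""
    thesis = ET.Element(q("thesis"))
    title_el = ET.SubElement(thesis, q("title"))
    title_el.text = title
    title_el.set(XML_LANG, title_lang)  # e.g. "ar" for an Arabic title
    ET.SubElement(thesis, q("creator")).text = creator
    ET.SubElement(thesis, q("language")).text = language
    ET.SubElement(thesis, q("identifier")).text = identifier
    return thesis


record = etdms_record(
    title="عنوان الأطروحة",  # placeholder Arabic title ("title of the thesis")
    title_lang="ar",
    creator="Placeholder, Author",  # placeholder, not real data
    language="ar",
    identifier="THA.3.905",  # codification example from the slides
)
print(ET.tostring(record, encoding="unicode"))
```

To store the record, `ET.ElementTree(record).write(path, encoding="utf-8", xml_declaration=True)` writes it as UTF-8, which keeps the Arabic script intact.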
A “Naming Scheme”
As a protocol specifying how files must be stored in directories.
• The “URN handle-server” technique is currently applied to the CERIST ETD sample.
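One possible naming scheme, sketched in Python under hypothetical rules: each segment of a codification such as “THA.3.905” becomes a directory level, and the persistent identifier takes the Handle form `<prefix>/<suffix>`. The directory layout, the code pattern and the prefix `0000` are assumptions for illustration, not the scheme actually used at CERIST.

```python
import re
from pathlib import PurePosixPath

# Hypothetical conventions, for illustration only.
HANDLE_PREFIX = "0000"  # placeholder; a real prefix would be assigned to the institution
CODE_PATTERN = re.compile(r"([A-Z]+)\.(\d+)\.(\d+)")  # e.g. "THA.3.905"


def _segments(code: str) -> tuple[str, ...]:
    """Split a codification into its three segments, rejecting bad codes."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"unexpected codification: {code!r}")
    return match.groups()


def storage_path(code: str, fmt: str) -> PurePosixPath:
    """'THA.3.905' + 'pdf' -> THA/3/905/THA.3.905.pdf (one level per segment)."""
    return PurePosixPath(*_segments(code), f"{code}.{fmt}")


def handle(code: str) -> str:
    """Persistent identifier in Handle form: '<prefix>/<suffix>'."""
    _segments(code)  # validate the code only
    return f"{HANDLE_PREFIX}/{code}"


print(storage_path("THA.3.905", "pdf"))  # THA/3/905/THA.3.905.pdf
print(handle("THA.3.905"))               # 0000/THA.3.905
```

Deriving both the path and the identifier from the same codification means a file can always be located from its bibliographic record, and vice versa.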
Conclusions
• A digital archiving plan must be developed, to avoid another, more complex problem: “managing the retrospective”, or “managing the past”.
• International standards and concepts of digital archiving can effectively be applied to the local context of the CERIST ETD system.
• We need to adopt one generic and exhaustive ETD chain, on the model of the Virginia Tech ETD program, Cybertheses, etc.
• XML will be a major option for the CERIST ETD system.