Beverley Hong-Fincher
Ning Chunyan
On Selecting and Featurizing Chinese Compounds for Computer Database
Compounds are generally defined as syntactically unanalyzable constituents which consist of more than one lexical component. Roughly speaking, the most salient characteristic of a compound is the boundness among its components. In all languages compounds constitute an indispensable part of the lexicon, and compounding is an inevitable lexical or morphological device both in language acquisition and in linguistic theorizing, though compounds are not necessarily the main body of the lexicon. Modern Chinese, however, employs a lexicon containing an extremely high percentage of compounds, and the boundness between the components of its compounds appears to be much greater than in other languages: they behave like single items in syntactic processes and are deprived of their potential creativity, just like linguistic mono-entities established prior to acquisition and beyond creation. This idiosyncratic aspect of the language makes the study of Chinese compounds a project of both theoretical necessity and practical importance. But since Chinese is one of the least inflectional languages, a language without lexical morphology and with a rather impoverished syntacto-morphology, the study of its compounds comes up against problems unlikely to be encountered in the study of other languages such as English.
One of the problems is that the distinction between a compound and a phrase has always been controversial and subtly implicit, thus blurring the modularity of the lexicon as distinct from syntax and vice versa. It seems universally acknowledged that no satisfactory theory, whether descriptive or explanatory, can be obtained until the distinction between a compound and a phrase is made consistently. As for the much pursued enterprise of linguistic computerization, no feasible computer project related to the language can succeed until such a distinction is made, because otherwise what is to be stored as data and what is to be manipulated as operation will be indistinguishable.
The present study is a computer-aided project attempting to provide an overall characterization of Chinese compounds, including their recognition and all other desired descriptions such as their phonetic (tonal) patterns, their syntactic properties and distributions, and their intrinsic properties as far as our linguistic intuition can possibly reach. 20,000 compounds have been selected for observation according to the algorithms stipulated in Section 1. A tonal description and some observations on the phonetic properties of the compounds selected by the algorithms appear in Section 2. Section 3 gives an account of the intrinsic properties of those compounds with a full typology for their classification. Sections 4 through 12 provide a detailed description of their intrinsic properties in terms of semantics and their corresponding classifications. For each class or type, a statistical result is given as a percentage. Appendix 1 enumerates all the possible candidates for each class or type. Appendix 2 gives a descending frequency list of the commonly used han-zi (characters) occurring in compounds.
The database for the project includes a computer dictionary consisting of 30,000 featurized Chinese compounds. The programme was written in Fortran and built into the Coombs computer system (DEC-10) at the Australian National University. The programme used in the project is the Oxford Concordance Program, Version 4.
Since most Chinese compounds are bisyllabic, compounds with four syllables, that is, idioms, are not within the coverage of the present project and are left to be dealt with elsewhere. Trisyllabic compounds will be studied separately in the final section.
A. Wezler/E. Hammerschmidt (Hrsg.): Proceedings of the XXXII International Congress for Asian and North African Studies, Hamburg, 25th-30th August 1986 (ZDMG-Suppl. 9).
© 1992 Franz Steiner Verlag Stuttgart
The paper is divided into the following sections:
0. The idiosyncratic aspects of Chinese pose serious problems for the construction of its computer database. These problems can ultimately be attributed to the following characteristics:
(1) (i) implicitness in sentential segmentation
(ii) implicitness in lexical segmentation
(iii) irregular distribution of empty or deleted elements.
Among these linguistic idiosyncrasies, the first two are mingled together, thus blurring the modularity of syntax as distinct from the lexicon, i.e. the distinction between data and operation. No computer project competitive with those for English can be realized until such a distinction is explicitly made. To solve these problems we concentrate on the behaviour of Chinese compounds, which we find to be one of the major causes of (i) and (ii), with its arbitrary selections and featurizations. As preliminary work, we have built a dictionary of 20,000 compounds into the computer system (DEC-10) at the Australian National University. The programme is based on the selectional and featurizing principles which we shall address in this paper.
1. The Selection of compounds. The selection principle subsumes the exclusions in (2)
and the algorithms in (3):
(2) a is not a compound if it is (i) a proper noun or (ii) a technical term
(3) Algorithms:
Given a bisyllabic string ab, ab is a compound if
(i) the occurrence of ab is exhaustively restricted to the occurrence of a'b' in aba'b' or a'b'ab, where a'b' is another bisyllabic string;
and
(ii) the candidates for a/b in the occurrence of ab are analogically enumerable.
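The two conditions can be sketched roughly as follows. This is a hypothetical Python rendering, not the authors' Fortran implementation, and it makes one simplifying assumption: condition (i)'s "another bisyllabic string a'b'" is approximated by membership in a supplied set of bisyllabic items.

```python
def satisfies_condition_i(ab, corpus, bisyllabic_items):
    """Condition (i), roughly: every occurrence of the bisyllabic
    string ab must sit inside a four-syllable frame ab+a'b' or
    a'b'+ab, where a'b' is some *other* bisyllabic string (here
    assumed to come from a given set of bisyllabic items)."""
    for text in corpus:
        start = 0
        while (i := text.find(ab, start)) != -1:
            after = text[i + 2:i + 4]
            before = text[i - 2:i] if i >= 2 else ""
            licensed = ((after in bisyllabic_items and after != ab) or
                        (before in bisyllabic_items and before != ab))
            if not licensed:
                return False  # ab also occurs outside the required frame
            start = i + 1
    return True

# Condition (ii), analogical enumerability of the candidates for a/b,
# is a matter of linguistic judgment and is not mechanized here.
```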
2. Each selected compound is featurized in terms of its syntactic properties and its intrinsic properties. We assume that syntax is a modular process consisting of two interactive components: one, called the inherent component, is responsible for deriving the syntactic representation, and the other, the derivational component, is derived from the former. The syntactic property of a compound is then determined by its distribution in the inherent component, while its intrinsic features are specified, as far as human intuition can reach and natural language understanding requires, in terms of syntax, semantics and, where possible, pragmatics.
3. In case a compound exhibits more than one syntactic property, we choose one and only one as the effective syntactic property at any one time and leave the rest to be specified as features.
4. The featurization is general-purpose. Since the compounding elements in a compound contribute to the meaning of the whole sentence in which they are embedded, featurizations are represented as semantic relations among the elements themselves and in relation to other potential constituents.
5. Ambiguous compounds are treated as distinct mono-referential compounds.
Reinhard Wonneberger
TEX for Philological Typesetting
TEX is a document processing program providing professional typesetting quality as well as source and result exchangeability on a variety of computers from personal to mainframe [1,4].
LaTEX is a macro package allowing logical markup of documents and document support
functions [2,3,14]. All relevant information is covered by a Newsletter of the worldwide TEX
Users Group (TUG) [5]; for free initial information apply to: TEX Users Group, P.O. Box
9506, Providence, RI 02940, USA.
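The "logical markup" that LaTEX adds on top of TEX can be illustrated by a minimal sketch (shown here in today's \documentclass syntax; the 1986 LaTeX described in [2] used \documentstyle):

```latex
\documentclass{article}
\begin{document}
\section{Promise and Covenant} % logical: "this is a section" --
                               % its layout is decided elsewhere
Philological terms are \emph{emphasized}, % a logical role, not a
                                          % hand-chosen italic
so a single style change reflows the whole document consistently.
\end{document}
```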
Though TEX was mainly developed to typeset mathematics, there is a growing interest in philological applications [9; see also other articles in TUGboat [5]]. The typesetting process of a theological book also containing Greek and Hebrew [6] is described in [7,8,10], and TEX's use for a new edition of the Biblia Hebraica is discussed in [13]. In [11], the method of Normaltext [12], developed specifically for philological research on ancient texts, is implemented using LaTEX and TEX.
[1] Donald E. Knuth: The TEXbook. A Document Preparation System. Computers &
Typesetting / A. Reading, Massachusetts etc.: Addison Wesley 1986. ISBN 0-201-13447-0
[2] Leslie Lamport: LATEX. A Document Preparation System. Reading, Massachusetts
usw.: Addison-Wesley 1986, ISBN 0-201-15790-X
[3] LATEX. Local Guide.
[4] Norbert Schwarz: Einführung in TEX. Bonn usw.: Addison-Wesley 1987. ISBN 3-
925118-25-X
[5] TUGboat. The TEX Users Group Newsletter. Providence, Rhode Island. Bd. 1ff., 1980ff.
[6] Reinhard Wonneberger / Hans Peter Hecht: Verheißung und Versprechen. Eine
theologische und sprachanalytische Klärung. Göttingen: Vandenhoeck & Ruprecht 1986.
[7] R.W.: "Verheißung und Versprechen" — A third generation approach to theological
typesetting. Jacques Désarménien (ed.): TEX for Scientific Documentation. Second
European Conference, Strasbourg, France, June 1986. Lecture Notes in Computer Science
236. Berlin/Heidelberg/London etc.: Springer 1986. ISBN 3-540-16807-9 / ISBN 0-387-16807-9. p. 180-198.
[8] R.W.: Stream lists and related list types for LATEX. TUGboat 6 (1985/3) 156-67.
[9] R.W.: Towards a TEX Philology Group. TUGboat 7 (1986/3) 132-133.
[10] R.W.: Chapter Mottos and Optional Semi-Parameters in General and for LATEX.
TUGboat 7 (1986/3) 177-185.
[11] R.W.: Typesetting 'Normaltext' TUGboat 8 (1987/1) 63-72.
[12] R.W.: Normaltext und Normalsynopse — Neue Wege bei der Darstellung alttestamentlicher Texte. Zeitschrift für Sprachwissenschaft (ZS) 3 (1984/2) 203-233. Forsch.ber. (siehe 1984) 01.010.07 Referate: ZAW 97 (1985/2) S. 273 (G.W.).
[13] R.W.: Überlegungen zu einer maschinenlesbaren Neuausgabe der Biblia Hebraica
Stuttgartensia. Association Internationale Bible et Informatique (ed.): Actes du Premier
Colloque International "Bible et Informatique: le texte", Louvain-la-Neuve (Belgique) 2-3-4
septembre 1985. (Travaux de linguistique quantitative 37 = Debora 3) Paris / Genève:
Champion-Slatkine 1986, 363-379. ISBN 2-05-100769-1 ISSN 0773-3968
[14] R.W.: Kompaktführer LATEX. Bonn etc.: Addison-Wesley 1987. ISBN 3-925118-46-2.
Kai-Uwe Günther
Wolfram W. Latsch
Transcription Program for Writing and Printing Ethiopic Script
This program was devised on a Sinclair QL computer (385 kBytes RAM, 68008 proc.) for
text processing of Amharic and Ethiopic texts.
The main idea was to keep costs as low as possible and to find the easiest way of programming. This was achieved by using BASIC, the best-known language, which also makes it possible for the user to modify the program for individual use.
The hardware required (Sinclair QL, Pinwriter, monitor/TV-set) does not exceed $ 600.
At the present stage the program already incorporates the main text processing features
needed for laying out Ethiopic texts, either in roman or Ethiopic letters.
Originally, the program was developed for a blind person who had hitherto been able to write nothing but Braille script, and the program was equipped with various sound features in order to make orientation on the screen possible. The input is controlled, and mistakes in key combination are acoustically indicated.
Ethiopic, being a syllabic script, is put in by a combination of consonants and vowels. Exceptions from this dual system were made necessary for labiovelars (e.g. hwe or hwa) and for specifying different letters with the same phonetic value, which is done by prefixing a number.
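The dual consonant+vowel input might be sketched like this. The fidel table below is a tiny hypothetical fragment, not the program's actual BASIC tables, which cover the full syllabary plus the labiovelar and digit-prefix exceptions:

```python
# Tiny hypothetical fragment of the fidel table mapping
# (consonant, vowel) keystroke pairs to Ethiopic syllables.
FIDEL = {
    ("h", "a"): "ሀ", ("h", "u"): "ሁ", ("h", "i"): "ሂ",
    ("l", "a"): "ለ", ("l", "u"): "ሉ",
}

def compose(consonant, vowel):
    """Compose one Ethiopic syllable from a consonant + vowel pair."""
    syllable = FIDEL.get((consonant, vowel))
    if syllable is None:
        # the original program signals such mistakes acoustically
        raise KeyError(f"no syllable for {consonant}+{vowel}")
    return syllable
```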
The intricate Ethiopic number system has been incorporated in the program.
A speech synthesizer is in preparation as an additional means for the blind user to control the input. The Ethiopic language is made up of phonetically more or less isolated syllables, a feature which makes it possible to agglutinate these entities acoustically without completely obscuring the meaning to the listener, as would be the problem with English. It is planned to make this addition available at a cost of about $70.
Compatibility with a VERSA-BRAILLE computer is desirable and is near to completion.
The basic structure of our program makes it possible to transcribe roman script into any syllabic script after minor alterations.
Donald A. Becker
The Use of Microcomputers and Dot Matrix Printers in Printing South Asian Texts
Two approaches to the production of camera-ready manuscripts in South Asian writing
systems were outlined in this presentation. The first approach utilizes an IBM PC to drive a Toshiba dot matrix printer. Specially constructed printer driver software transforms ASCII files of Romanized Hindi, Sanskrit, Telugu, Urdu etc. into printouts in the appropriate script.
Using a Romanization for input purposes has several advantages, the most important of which are increased typing ease and the possibility of printing the same text file in totally different
writing systems. The second approach employs an Apple Macintosh computer and Apple
ImageWriter printer. The chief advantages of the Macintosh approach are its emphasis on the WYSIWYG principle ("What you see is what you get") and its ability to manipulate the size and appearance of individual fonts.
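The advantage of romanized input — one ASCII file printable in totally different writing systems — can be sketched as follows. The syllable tables here are tiny and hypothetical; the actual printer driver software is not reproduced:

```python
# Hypothetical mini-tables: the same romanized ASCII input can be
# routed through different script tables, as the abstract describes.
DEVANAGARI = {"ka": "क", "ma": "म", "la": "ल"}
TELUGU     = {"ka": "క", "ma": "మ", "la": "ల"}

def render(roman_syllables, script_table):
    """Render a romanized syllable sequence in one target script."""
    return "".join(script_table[s] for s in roman_syllables)

# The same input ["ka", "ma", "la"] renders as Devanagari "कमल"
# or Telugu "కమల", depending only on the table chosen.
```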
Print specimens of several different South Asian scripts produced with each of the two
hardware configurations were presented. The construction of printer driver programs for the
IBM PC was summarized and the use of Macintosh font editors was outlined. A simple
strategy for the construction of alphabetizing utilities for non-Roman scripts was presented
and some glossary-constructing programs based upon this strategy were illustrated. The
presentation concluded with a discussion of keyboard-customizing software for the Macintosh and of a means of simplifying the process of data entry in Macintosh fonts for the complex writing systems of South Asia.
Klaus Boekels
Using a Personal Computer for Wordprocessing and Database Management of
Oriental Languages
Working with a personal computer opens up new horizons for scientific work. Every area of research in Oriental languages using a personal computer faces its own specific problems. One of the main problems is the editing of texts in Oriental languages and their analysis in a database. There are three basic problems to be mentioned:
1. An enhanced set of characters, which is not accessible in the normal mode of editing.
2. Different writing direction in the area of Semitic languages and the problem of many
ligatures in Sanskrit and other languages.
3. Different order of the alphabet.
There exist different solutions to these problems.
One way to overcome these problems is to work with a text programme which provides
many different character sets by working in graphic mode.
Another way to overcome them is to transliterate the texts.
Yet another way is to use a board which allows the downloading of user-defined fonts.
What is essential is the interchangeability of text and database. For linguistic research the text is a database and the database is a text. It must be possible to analyze texts by importing them
word by word into a database, and then to read the formatted results back into a text programme, where a glossary or a list may be prepared from this material for a final printout. When interchangeability is not assured, it is impossible to retrieve and analyze texts after their input.
Accordingly, working with a text programme which allows the use of characters in graphic mode is no solution for linguistic work with Oriental languages, because the data cannot be exported into a database without great problems.
The method of transliterating texts using the characters provided by the character ROM
results in text that can be read only with difficulty.
The best way of dealing with the multiple characters of Oriental languages is to use a graphic board which allows the modification of characters through a RAM font. In this RAM font one can define the characters used in the international transcription of the Oriental language in question. This input can then be read into a database and, for the final output, converted into the original script. In this way the difficult problems of different writing directions and of characters that take different shapes in different positions are easily solved. The data are entered and retrieved in the transcription of the specific language, and for the final output the original script is generated from the transcription. One thus has a transcription that is easy to handle and can optionally be transformed into the original script. I wrote a programme which converts Arabic and Persian in transcription into the original script, with the combined letters, vowels and shadda that are not accessible in the normal mode of typing. In the same way it will be possible to provide solutions of this kind for other Oriental languages.
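The transcription-to-script conversion might look, in outline, like the following sketch. The letter table is a tiny hypothetical fragment; the author's actual programme, with its handling of vowel signs, shadda and position-dependent letter shapes, is not reproduced:

```python
# Hypothetical mini-table from transliteration to Arabic letters.
TRANSLIT_TO_ARABIC = {
    "b": "ب", "t": "ت", "k": "ك", "ā": "ا", "sh": "ش",
}

def to_arabic(translit):
    """Convert a transliterated word to Arabic script, trying the
    longest match first so a digraph like "sh" wins over "s"+"h"."""
    out, i = [], 0
    while i < len(translit):
        if translit[i:i + 2] in TRANSLIT_TO_ARABIC:
            out.append(TRANSLIT_TO_ARABIC[translit[i:i + 2]])
            i += 2
        else:
            out.append(TRANSLIT_TO_ARABIC[translit[i]])
            i += 1
    return "".join(out)
```

The data stay searchable in transcription; the script form is generated only for final output, exactly the division of labour the abstract argues for.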
Urs App
Scholarly Use of a Japanese Microcomputer System
Multilingual word- and dataprocessing which involves European languages as well as Japanese and Chinese is needed by scholars and institutions, but no comprehensive and satisfactory solution has been found so far. The system presented by the speaker is not ideal either, but it functions well within certain limits, and it necessitates only moderate investment in hard- and software.
Hardware:
— NEC 9801 VM2 or similar model with inbuilt kanji chips, 384K memory, V30
processor at 8/10 MHz, two 1.2 MB drives, 640x400 display, and a 2 MB RAM board
to facilitate multitask work
— EPSON VP 80K or similar model with kanji chips (JIS levels 1+2), italics, and the
possibility of loading user-defined characters

Software:
— Japanese MS-DOS v. 2.11 (includes utilities to create new characters) which accepts
Chinese/Japanese characters on system level
— VJE device driver (VACS Corp. Tokyo) for convenient kanji input
— MXI-Plus RAM-disk driver (Megasoft) for faster work
— Wordstar 2000 J (Twinstar) v. 2 (MicroPro Japan) for multilingual wordprocessing
(windows, footnotes, etc.)
— dBase II or III (Ashton-Tate), whose Japanese versions also accept Chinese and
Japanese characters
— ConCur 98 (VACS Corp., Tokyo) for multitask operation (for instance running
Twinstar and dBase II concurrently)

Advantages:
— With this basic setup, it is possible to write multilingual articles and create databases which consist of or include Chinese and Japanese characters. Top text search speed
on RAM is well over 10000 Chinese characters per second. Multitask capability
allows database or text search work while writing.
— The flexible VJE system assures easy Japanese text input (from roman letters).
Enlargement of the basic dictionary (partly by the speaker) allows speedy input of Chinese or pre-war Japanese characters using the four-comer system, pinyin with and without tone, and radical number with stroke count.
— Specialized dictionaries can be merged with or subtracted from existing ones.
— The use of internal standardized character codes (JIS; Japanese Industrial Standard)
assures easy transfer of data between different systems (incl. PCs and mainframes) and stability in the future.
— The NEC 9801 series dominates the Japanese PC market; thus continuity in the future
can be expected.
— In the present absence of scanners that handle this kind of text, a mass of book
manuscripts, catalogues, bibliographies, Chinese texts etc. which are today input on wordprocessors in Japan can be made available as data for databases.
— Easy creation of up to 188 additional characters is possible on the system level. The
impact of the limited number of characters can be softened considerably if much used but lacking characters are thus created.
Disadvantages:
— The gravest disadvantage is the limited number of available characters: only 6350
Chinese characters. For Sinological work this is never sufficient, and in some cases Japanese post-war simplifications must be used. In this way, ordinary classical Chinese
texts can be written without much trouble; if the most frequent 188 additional
characters for any one text are created, a great number of texts can be input
completely.
— While Russian and Greek letters can be used, it is still somewhat awkward (though
possible) to write French, German etc. Sorting which includes such characters may
produce incorrect alphabetical order.
— Machine memory is limited to 640 KB; with the size of these programs, it is for
instance not possible to use Twinstar and dBase III concurrently, but Twinstar and dBase II work well.
Hardware service for NEC products, though available in the West, may be no stronger than sales figures would suggest.
Uwe Glessmer
Greek, Hebrew and Other Fonts in Theological Texts
The lecture is divided into three main sections (followed by the demonstration of a
programme):
1. A short description of the well-known programme WordStar and its text processing possibilities: editing and correction of manuscripts, planning of layout with different Latin character sets and pitches, underlining, bolding and italicisation, as well as indenting, headers and footers, etc.
2. For scientific text processing with WordStar, page-oriented footnotes and different non-Latin character sets are a desideratum. An enlargement integrating both into this system is possible by using a dot-matrix printer with a special driver programme (Drucke). Such a programme, in combination with a dot-matrix printer with graphics capabilities, is able to print any desired character in graphics mode. A special pattern in an 8*8 grid must be designed for each new foreign letter, and conventions for its application in WordStar must be defined. In this way e.g. Hebrew (with automatic retroversion from right to left) and Greek, as well as other fonts of scientific symbols, can be integrated into this "normal" text processing system.
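Designing a letter in an 8*8 grid and handing it to a printer in graphics mode can be sketched as follows. The glyph is hypothetical, and the actual printer escape sequences, which vary by model, are omitted:

```python
# A hypothetical 8*8 pattern (an "A"-like shape), one integer per
# row; bit 7 is the leftmost dot of the row.
GLYPH = [
    0b00111100,
    0b01000010,
    0b01000010,
    0b01111110,
    0b01000010,
    0b01000010,
    0b01000010,
    0b00000000,
]

def columns(glyph):
    """Transpose the row pattern into the eight column bytes that a
    dot-matrix print head expects in graphics mode."""
    cols = []
    for c in range(8):
        byte = 0
        for r in range(8):
            if glyph[r] & (0x80 >> c):
                byte |= 0x80 >> r
        cols.append(byte)
    return cols
```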
3. The third step concerns the preparation of similar character sets for a high-quality printer (such as a laser printer) when they are not available on the market for Oriental studies. Here, too, a single letter is defined as a small picture in a grid of points, but the resolution is extremely high: 300 dots per inch. The definition of an appropriate bitmap can therefore hardly be done by hand, because 1500 single pixels must be defined for each character in a matrix of 50*30 dots. To avoid this laborious manual work, a much easier procedure of scanning (i.e. digitization, an electronic photograph) of printed prototypes can be chosen. Printed pages can serve as models from which to develop laser-printer fonts for "exotic" characters. A sample programme shows how to manipulate such scanned data and to isolate single character bitmaps from a whole scanned page. The source code (written in Turbo Pascal) has been put into the public domain and is available on the "Hamburg-Souvenir-Diskette" which we prepared as a gift for each participant of the section "personal computers and Oriental studies" (available through the author or R.E. Emmerick).
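The isolation of single-character bitmaps from a scanned line can be sketched in outline. This is our Python re-sketch of the idea, not the Turbo Pascal source distributed on the diskette:

```python
def glyph_boxes(bitmap):
    """Isolate single-character spans in one scanned line of text:
    a glyph is a maximal run of columns containing at least one
    black (1) pixel.  bitmap is a list of equal-length 0/1 rows."""
    width = len(bitmap[0])
    ink = [any(row[c] for row in bitmap) for c in range(width)]
    boxes, start = [], None
    for c, black in enumerate(ink + [False]):  # sentinel closes a final run
        if black and start is None:
            start = c
        elif not black and start is not None:
            boxes.append((start, c))           # half-open column span
            start = None
    return boxes
```

Each column span, cropped out of the page, yields one candidate character bitmap for the font editor.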
Hans-Peter Vietze
Ein Beispiel des Nutzens rechnergestützter Lexikographie (An Example of the Benefits of Computer-Assisted Lexicography)
The "Wörterbuch Deutsch-Mongolisch" (H.-P. VIETZE, Z. DAMDINSÜREN, G. LUWSAN, G. NAGY, Wörterbuch Deutsch-Mongolisch, Verlag Enzyklopädie Leipzig, 1981), with about 80,000 entries, was "turned around" with the help of a computer, so that the German and the Mongolian entries exchanged places and the Mongolian headwords, together with all subordinate derivations, phraseologisms and usage examples, moved, sorted according to the Mongolian alphabet, to the left-hand side of the word lists.
Of course, the printed lists did not constitute a complete new Mongolian-German dictionary. That ideal case would not occur even when reversing a dictionary between closely related languages. Because of the unavoidable non-idiomatic translation of German idioms, the fact that usage examples for German headwords usually do not fit the Mongolian equivalents equally well, the frequent accumulation of citations, and the thousands of new and first-time translations, some with elaborate paraphrases (e.g. balildach 'to reinforce the threefold hobbling of a horse by an additional strap between the forelegs' and the like), only about 50% of the printed material was directly usable. The remaining 50% of the lists required extensive post-editing.
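The "reversal" step itself can be sketched as follows. The entries are illustrative, not material from the dictionary, and plain lexicographic sorting stands in for sorting by the Mongolian alphabet:

```python
def invert(entries):
    """Reverse a dictionary given as {German headword: [equivalents]}
    into {equivalent: [German headwords]}, sorted for the word lists.
    (Real entries carry sub-entries and usage examples along too.)"""
    reversed_dict = {}
    for headword, equivalents in entries.items():
        for eq in equivalents:
            reversed_dict.setdefault(eq, []).append(headword)
    return {k: sorted(v) for k, v in sorted(reversed_dict.items())}
```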
The usefulness of such computer work comes first of all from the power of word-processing and relational database programs themselves. For lexicographic work these programs have to be combined in a suitable way that keeps the load on RAM as low as possible, since database programs as a rule do not have a good editor.

All the advantages of computer-assisted text processing, such as on-screen error correction; deleting, replacing and inserting letters, words, linear text segments and columns; layout design; changing typefaces; replacing shorter passages with various options; and much more, already come fully into their own in work on the raw manuscript. With powerful commands such as "replace all ... with ... for ...", database programs offer possibilities that are practically tailor-made for lexicographic work. The reason lies in the permanent need for revision, demonstrated in practice again and again and unavoidable even with the best conceptual preparation, which means that every dictionary must be worked through from the beginning up to dozens of times.
We see the main advantage of the computer work, however, not in the editorial solutions, but in the effortless reuse of the research invested in the "Wörterbuch Deutsch-Mongolisch" for the "Wörterbuch Mongolisch-Deutsch". This research essentially concerned the valency information and the clarification of special grammatical problems. Without the help of a computer, the newly compiled valency information for about 6,000 words would have had to be listed or carded separately by hand and then inserted at various places in the card file for the new dictionary.

Like the grammatical information, the subject-field labels (Lit, Zool, Phys etc.) were also preserved in the "reversal" of the "Wörterbuch Deutsch-Mongolisch". This opens up the possibility of producing smaller specialized dictionaries for any desired subject field automatically.
Finally, it should be pointed out that dictionary material, once stored on a data medium, can be further processed by computer, i.e. selected, counted, collated and sorted, from many other points of view and with other programs. Immediately possible are distributional studies of graphemes, morphemes, words and syntagms; the extraction of sets of examples for many morphological studies; language-statistical investigations; the automated production of reverse dictionaries; and the determination of grammatical, semantic or phonetic regularities wherever clusters of the corresponding phenomena can be detected, among other things.

Conclusion: with the powerful data-processing technology and user-friendly relational database programs now available, the use of a computer should at least be considered for every lexicographic project.
Boris Oguibenine
First Results of Vedic Grammar Processing by Computer
The incentive for the computer treatment of Vedic grammar has been the compilation of an
index of mythological motifs in the Rgveda hymns. An utterance with two slots filled by a
noun and a verb respectively is considered as the linguistic implementation of a mythological
motif; this bipolar structure comprehensively reproduces the content of the minimal
mythological information provided in the hymns.
In terms of computer processing, the recognition of the data can be achieved by distinguishing two language categories: verb forms and noun forms.
Distinguishing the categories signalled in the occurring words is tantamount to discovering those changing features which carry grammatical meaning, while leaving aside features which also change but carry only lexical meaning. In the language of the hymns the categories are signalled by cumulative morphs, i.e. by sets of language signs which carry several grammatical meanings simultaneously. In Vedic the cumulative morphs occur most frequently at the end of words. This means that, using the graphic representation of the Aufrecht edition of the hymns, the task is to analyze the words proceeding from right to left and to extract the minimal adequate information about the category status of each Vedic word.
The main question is thus: how long must the final graphemic/phonemic sequences be to
enable the definition of the categorical attribution of the respective words? Examples are shown in the paper.
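The right-to-left, longest-match analysis can be sketched as follows. The suffix table is a small hypothetical fragment; the real cumulative morphs, and sandhi, complicate matters considerably:

```python
# Hypothetical suffix table: final graphemic sequences mapped to the
# bundle of grammatical meanings their cumulative morph carries.
SUFFIXES = {
    "ebhis": {"case": "instrumental", "number": "plural"},
    "asya":  {"case": "genitive", "number": "singular"},
    "anti":  {"person": 3, "number": "plural", "tense": "present"},
}

def categorize(word):
    """Scan the word from the right, trying the longest final
    sequence first; return the match and its category bundle."""
    for length in range(len(word), 0, -1):
        tail = word[-length:]
        if tail in SUFFIXES:
            return tail, SUFFIXES[tail]
    return None  # no categorical attribution from the table
```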
Some of the specific problems of Vedic and of general linguistics are worth mentioning:
(1) There are striking analogies between the proposed technique and the technique of the
Ancient Indian grammarians (Vedic Prätisäkhyas and Panini's system). They need of course further investigation.
(2) The sequences enabling identification belong to the so-called sub-morphic level, although linguists in fact place little faith in interrelations between the smallest fragments of the grammatical signifiant and their signifié. In a full-fledged description of a language, the analyst must take into consideration the cases in which a part of a morph, an individual phoneme (and correspondingly an individual grapheme), becomes a meaning carrier. In my analysis the sub-grammatical sequences are parts of morphs which do not belong wholly to either morph at the juncture of two morphs; they comprise morphs preceded by a unit belonging to another morph, so that the morph boundaries overlap.
(3) The study of the subgrammatical sequences is of prime importance for the analysis of the Vedic poetic language.
(4) The problem of what is traditionally called the transfer stems in Vedic needs serious reconsideration. Solutions for individual cases are suggested.
The contribution is not meant to enrich computer science, but to show that the use of
computers furthers new approaches to the traditional linguistic and philological problems.
G.A. Oddie
Christian Conversion Movements in India: Some Recent Interpretations
Conversion movements involving changes in religious affiliation, whereby individuals or
groups opt out of one community and join another, have long been evident in India. In recent years a number of scholars, including R.L. Hardgrave and, more recently still, R.M. Eaton and F.S. Downs, have attempted to explore and explain these movements.¹
One possibility which is beginning to emerge from this and other research is that there may be important differences between what has happened among Hindus converting to Christianity within the caste system and movements which have taken place in the more loosely structured and less oppressive tribal societies. It is well known, for example, that a high proportion of
Hindu converts to Christianity in the nineteenth and twentieth centuries were drawn from
among the depressed classes, and Hardgrave, referring to the Nadars, one of the more
marginal groups in Hindu society, argues that the missionaries offered them not only the
gospel of a new religion "but also the possibility of secular salvation in release from the fetters of the tradition which had for centuries burdened them with social disabilities and economic dependence". In focussing their attention on the hill peoples of north-eastern India,
however, Eaton and Downs were exploring Christian movements among those who, in
Eaton's words, had "never experienced pariah status either internally or in relation to
outsiders" and the emphasis of these historians has correspondingly been much less on
exploitation and social disabilities as a factor in conversion. They both stress the importance of changes and disruption in tribal societies and argue that these circumstances greatly
stimulated the search for a more satisfactory and meaningful world view including the
acceptance of Christianity.
While these somewhat different interpretations of conversion movements seem to reflect
differences in the social conditions which affected responses to Christianity, it would be a
mistake (as Eaton and Downs imply) to ignore the complex of other factors involved. The
existence of a hierarchical and oppressive social system may, for example, have created a predisposition among those at the bottom of the social scale to search for a way out, but it does not explain why some members of the same caste converted and others (apparently
equally affected by social disabilities) did not. Clearly other factors, such as family
considerations, the effects of travel or education, economic ties and the ability to respond (without encountering crippling opposition), the attractiveness or otherwise of the local Christian community, and local traditions and views of religion, have to be weighed up and considered in any satisfactory interpretation.
¹ Hardgrave, R.L., The Nadars of Tamilnad. The Political Culture of a Community in Change. Berkeley and Los Angeles, 1969, esp. ch. 2; Eaton, R.M., "Conversion to Christianity among the Nagas, 1867-71", Indian Economic and Social History Review, Vol. XXI, No. 1, January-March 1984, pp. 1-44; Downs, F.S., Christianity in North East India. Historical Perspectives. Delhi 1983.