Beverley Hong-Fincher
Ning Chunyan
On Selecting and Featurizing Chinese Compounds for Computer Database
Compounds are generally defined as syntactically unanalyzable constituents which consist of more than one lexical component. Roughly speaking, the most salient characteristic of a compound is the boundness among its components. In all languages compounds constitute an indispensable part of the lexicon, and compounding is an inevitable lexical or morphological device both in language acquisition and in linguistic theorizing, though compounds are not necessarily the main body of the lexicon. Modern Chinese, however, employs a lexicon containing an extremely high percentage of compounds, and the boundness between the components of its compounds appears to be much greater than in other languages: they behave like single items in syntactic processes and are deprived of their potential creativity, just like linguistic mono-entities established prior to acquisition and beyond creation. This idiosyncratic aspect of the language makes the study of Chinese compounds a project of both theoretical necessity and practical importance. But since Chinese is one of the least inflectional languages, a language without lexical morphology and with a rather impoverished syntacto-morphology, the study of its compounds comes up against problems unlikely to be encountered in the study of other languages such as English.
One of the problems is that the distinction between a compound and a phrase has always been controversial and subtly implicit, thus blurring the modularity of the lexicon as distinct from syntax and vice versa. It seems universally acknowledged that no satisfactory theory, whether descriptive or explanatory, can be obtained until the distinction between a compound and a phrase is made consistently. As for the much pursued enterprise of linguistic computerization, no feasible computer project related to the language can succeed until such a distinction is made, because otherwise what is to be stored as data and what is to be manipulated as operation will be indistinguishable.
The present study is a computer-aided project attempting to provide an overall characterization of Chinese compounds, including their recognition and all other desired descriptions such as their phonetic (tonal) patterns, their syntactic properties and distributions, and their intrinsic properties as far as our linguistic intuition can possibly reach. 20,000 compounds have been selected for observation according to the algorithms stipulated in Section 1. A tonal description and some observations on the phonetic properties of the compounds selected by the algorithms appear in Section 2. Section 3 gives an account of the intrinsic properties of those compounds with a full typology for their classification. Sections 4 through 12 provide a detailed description of their intrinsic properties in terms of semantics and their corresponding classifications. For each class or type, a statistical result is given as a percentage. Appendix 1 enumerates all the possible candidates for each class or type. Appendix 2 gives a descending frequency list of the commonly used han-zi (characters) occurring in compounds.
The database for the project includes a computer dictionary consisting of 30,000 featurized Chinese compounds. The programme was written in Fortran and built into the Coombs computer system (DEC-10) at the Australian National University. The programme used in the project is the Oxford Concordance Program, Version 4.
Since most Chinese compounds are bisyllabic, compounds with four syllables, that is, idioms, are not within the coverage of the present project and are left to be dealt with elsewhere. Trisyllabic compounds will be studied separately in the final section.
A. Wezler/E. Hammerschmidt (Hrsg.): Proceedings of the XXXII International Congress for Asian and North African Studies, Hamburg, 25th-30th August 1986 (ZDMG-Suppl. 9).
© 1992 Franz Steiner Verlag Stuttgart
The paper is divided into the following sections:
0. The idiosyncratic aspects of Chinese pose serious problems for the construction of its computer database. These problems can ultimately be attributed to the following characteristics:
(1) (i) implicitness in sentential segmentation
(ii) implicitness in lexical segmentation
(iii) irregular distribution of empty or deleted elements.
Among these linguistic idiosyncrasies, the first two are mingled together, thus blurring the modularity of syntax as distinct from the lexicon, i.e. the distinction between data and operation. No computer project competitive with those for English can be realized until such a distinction is explicitly made. To solve these problems we concentrate on the behaviour of Chinese compounds, which we find to be one of the major causes of (i) and (ii), with its arbitrary selections and featurizations. As preliminary work, we have built a dictionary of 20,000 compounds into the computer system (DEC-10) at the Australian National University. The programme is based on the selectional and featurizing principles which we shall address in this paper.
1. The Selection of compounds. The selection principle subsumes the exclusions in (2)
and the algorithms in (3):
(2) a is not a compound if it is (i) a proper noun or (ii) a technical term
(3) Algorithms:
Given a bisyllabic string ab, ab is a compound if
(i) the occurrence of ab is exhaustively restricted to the occurrence of a'b' in aba'b' or a'b'ab, where a'b' is another bisyllabic string;
and
(ii) the candidates for a/b in the occurrence of ab are analogically enumerable.
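The two conditions can be sketched roughly as follows. This is a hypothetical Python rendering, not the authors' Fortran implementation, and it makes one simplifying assumption: condition (i)'s "another bisyllabic string a'b'" is approximated by membership in a supplied set of bisyllabic items.

```python
def satisfies_condition_i(ab, corpus, bisyllabic_items):
    """Condition (i), roughly: every occurrence of the bisyllabic
    string ab must sit inside a four-syllable frame ab+a'b' or
    a'b'+ab, where a'b' is some *other* bisyllabic string (here
    assumed to come from a given set of bisyllabic items)."""
    for text in corpus:
        start = 0
        while (i := text.find(ab, start)) != -1:
            after = text[i + 2:i + 4]
            before = text[i - 2:i] if i >= 2 else ""
            licensed = ((after in bisyllabic_items and after != ab) or
                        (before in bisyllabic_items and before != ab))
            if not licensed:
                return False  # ab also occurs outside the required frame
            start = i + 1
    return True

# Condition (ii), analogical enumerability of the candidates for a/b,
# is a matter of linguistic judgment and is not mechanized here.
```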
2. Each selected compound is featurized in terms of its syntactic properties and its intrinsic properties. We assume that syntax is a modular process consisting of two interactive components: one, called the inherent component, is responsible for deriving the syntactic representation, and the other, the derivational component, is derived from the former. The syntactic property of a compound is then determined by its distribution in the inherent component, while its intrinsic features are specified, as far as human intuition can reach and natural language understanding requires, in terms of syntax, semantics and, where possible, pragmatics.
3. In case a compound exhibits more than one syntactic property, we choose one and only one as the effective syntactic property at any one time and leave the rest to be specified as features.
4. The featurization is general-purpose. Since the compounding elements in a compound contribute to the meaning of the whole sentence in which they are embedded, featurizations are represented as semantic relations among the elements themselves and in relation to other potential constituents.
5. Ambiguous compounds are treated as distinct mono-referential compounds.
Reinhard Wonneberger
TEX for Philological Typesetting
TEX is a document processing program providing professional typesetting quality as well as source and result exchangeability on a variety of computers from personal to mainframe [1,4].
LaTEX is a macro package allowing logical markup of documents and document support
functions [2,3,14]. All relevant information is covered by a Newsletter of the worldwide TEX
Users Group (TUG) [5]; for free initial information apply to: TEX Users Group, P.O. Box
9506, Providence, RI 02940, USA.
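The "logical markup" that LaTEX adds on top of TEX can be illustrated by a minimal sketch (shown here in today's \documentclass syntax; the 1986 LaTeX described in [2] used \documentstyle):

```latex
\documentclass{article}
\begin{document}
\section{Promise and Covenant} % logical: "this is a section" --
                               % its layout is decided elsewhere
Philological terms are \emph{emphasized}, % a logical role, not a
                                          % hand-chosen italic
so a single style change reflows the whole document consistently.
\end{document}
```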
Though TEX was mainly developed to typeset mathematics, there is a growing interest in philological applications [9; see also other articles in TUGboat [5]]. The typesetting process of a theological book also containing Greek and Hebrew [6] is described in [7,8,10], and TEX's use for a new edition of the Biblia Hebraica is discussed in [13]. In [11], the method of Normaltext [12], developed specifically for philological research on ancient texts, is implemented using LaTEX and TEX.
[1] Donald E. Knuth: The TEXbook. A Document Preparation System. Computers &
Typesetting / A. Reading, Massachusetts etc.: Addison Wesley 1986. ISBN 0-201-13447-0
[2] Leslie Lamport: LATEX. A Document Preparation System. Reading, Massachusetts
usw.: Addison-Wesley 1986, ISBN 0-201-15790-X
[3] LATEX. Local Guide.
[4] Norbert Schwarz: Einführung in TEX. Bonn usw.: Addison-Wesley 1987. ISBN 3-
925118-25-X
[5] TUGboat. The TEX Users Group Newsletter. Providence, Rhode Island. Bd. 1ff., 1980ff.
[6] Reinhard Wonneberger / Hans Peter Hecht: Verheißung und Versprechen. Eine
theologische und sprachanalytische Klärung. Göttingen: Vandenhoeck & Ruprecht 1986.
[7] R.W.: "Verheißung und Versprechen" — A third generation approach to theological
typesetting. Jacques Désarménien (ed.): TEX for Scientific Documentation. Second
European Conference, Strasbourg, France, June 1986. Lecture Notes in Computer Science
236. Berlin/Heidelberg/London etc.: Springer 1986. ISBN 3-540-16807-9 / ISBN 0-387-16807-9. p. 180-198.
[8] R.W.: Stream lists and related list types for LATEX. TUGboat 6 (1985/3) 156-67.
[9] R.W.: Towards a TEX Philology Group. TUGboat 7 (1986/3) 132-133.
[10] R.W.: Chapter Mottos and Optional Semi-Parameters in General and for LATEX.
TUGboat 7 (1986/3) 177-185.
[11] R.W.: Typesetting 'Normaltext' TUGboat 8 (1987/1) 63-72.
[12] R.W.: Normaltext und Normalsynopse — Neue Wege bei der Darstellung alttestamentlicher Texte. Zeitschrift für Sprachwissenschaft (ZS) 3 (1984/2) 203-233. Forsch.ber. (siehe 1984) 01.010.07 Referate: ZAW 97 (1985/2) S. 273 (G.W.).
[13] R.W.: Überlegungen zu einer maschinenlesbaren Neuausgabe der Biblia Hebraica
Stuttgartensia. Association Internationale Bible et Informatique (ed.): Actes du Premier
Colloque International "Bible et Informatique: le texte", Louvain-la-Neuve (Belgique) 2-3-4
septembre 1985. (Travaux de linguistique quantitative 37 = Debora 3) Paris / Genève:
Champion-Slatkine 1986, 363-379. ISBN 2-05-100769-1 ISSN 0773-3968
[14] R.W.: Kompaktführer LATEX. Bonn etc.: Addison-Wesley 1987. ISBN 3-925118-46-2.
Kai-Uwe Günther
Wolfram W. Latsch
Transcription Program for Writing and Printing Ethiopic Script
This program was devised on a Sinclair QL computer (385 kBytes RAM, 68008 proc.) for
text processing of Amharic and Ethiopic texts.
The main idea was to keep costs as low as possible and to find the easiest way of programming. This was achieved by using BASIC, the best-known language, which also makes it possible for the user to modify the program for individual use.
The hardware required (Sinclair QL, Pinwriter, monitor/TV-set) does not exceed $ 600.
At the present stage the program already incorporates the main text processing features
needed for laying out Ethiopic texts, either in roman or Ethiopic letters.
Originally, the program was developed for a blind person who had hitherto been able to write nothing but Braille script, and the program was equipped with various sound features in order to make orientation on the screen possible. The input is controlled, and mistakes in key combination are acoustically indicated.
Ethiopic, being a syllabic script, is put in by a combination of consonants and vowels. Exceptions from this dual system were made necessary for labiovelars (e.g. hwe or hwa) and for specifying different letters with the same phonetic value, which is done by prefixing a number.
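The dual consonant+vowel input might be sketched like this. The fidel table below is a tiny hypothetical fragment, not the program's actual BASIC tables, which cover the full syllabary plus the labiovelar and digit-prefix exceptions:

```python
# Tiny hypothetical fragment of the fidel table mapping
# (consonant, vowel) keystroke pairs to Ethiopic syllables.
FIDEL = {
    ("h", "a"): "ሀ", ("h", "u"): "ሁ", ("h", "i"): "ሂ",
    ("l", "a"): "ለ", ("l", "u"): "ሉ",
}

def compose(consonant, vowel):
    """Compose one Ethiopic syllable from a consonant + vowel pair."""
    syllable = FIDEL.get((consonant, vowel))
    if syllable is None:
        # the original program signals such mistakes acoustically
        raise KeyError(f"no syllable for {consonant}+{vowel}")
    return syllable
```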
The intricate Ethiopic number system has been incorporated in the program.
A speech synthesizer is in preparation as an additional means for the blind user to control the input. The Ethiopic language is made up of phonetically more or less isolated syllables, a feature which makes it possible to agglutinate these entities acoustically without completely obscuring the meaning to the listener, as would be the problem with English. It is planned to make this addition available at a cost of about $70.
Compatibility with a VERSA-BRAILLE computer is desirable and is near to completion.
The basic structure of our program makes it possible to transcribe roman script into any syllabic script after minor alterations.
Donald A. Becker
The Use of Microcomputers and Dot Matrix Printers in Printing South Asian Texts
Two approaches to the production of camera-ready manuscripts in South Asian writing
systems were outlined in this presentation. The first approach utilizes an IBM PC to drive a Toshiba dot matrix printer. Specially constructed printer driver software transforms ASCII files of Romanized Hindi, Sanskrit, Telugu, Urdu etc. into printouts in the appropriate script.
Using a Romanization for input purposes has several advantages, the most important of which are increased typing ease and the possibility of printing the same text file in totally different
writing systems. The second approach employs an Apple Macintosh computer and Apple
ImageWriter printer. The chief advantages of the Macintosh approach are its emphasis on the WYSIWYG principle ("What you see is what you get") and its ability to manipulate the size and appearance of individual fonts.
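The advantage of romanized input — one ASCII file printable in totally different writing systems — can be sketched as follows. The syllable tables here are tiny and hypothetical; the actual printer driver software is not reproduced:

```python
# Hypothetical mini-tables: the same romanized ASCII input can be
# routed through different script tables, as the abstract describes.
DEVANAGARI = {"ka": "क", "ma": "म", "la": "ल"}
TELUGU     = {"ka": "క", "ma": "మ", "la": "ల"}

def render(roman_syllables, script_table):
    """Render a romanized syllable sequence in one target script."""
    return "".join(script_table[s] for s in roman_syllables)

# The same input ["ka", "ma", "la"] renders as Devanagari "कमल"
# or Telugu "కమల", depending only on the table chosen.
```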
Print specimens of several different South Asian scripts produced with each of the two
hardware configurations were presented. The construction of printer driver programs for the
IBM PC was summarized and the use of Macintosh font editors was outlined. A simple
strategy for the construction of alphabetizing utilities for non-Roman scripts was presented
and some glossary-constructing programs based upon this strategy were illustrated. The
presentation concluded with a discussion of keyboard-customizing software for the Macintosh and of a means of simplifying the process of data entry in Macintosh fonts for the complex writing systems of South Asia.
Klaus Boekels
Using a Personal Computer for Wordprocessing and Database Management of
Oriental Languages
Working with a personal computer opens up new horizons for scientific work. Every area of research in Oriental languages using a personal computer faces its own specific problems. One of the main problems is the editing of texts in Oriental languages and their analysis in a database. There are three basic problems to be mentioned:
1. An enhanced set of characters, which is not accessible in the normal mode of editing.
2. Different writing direction in the area of Semitic languages and the problem of many
ligatures in Sanskrit and other languages.
3. Different order of the alphabet.
There exist different solutions to these problems.
One way to overcome these problems is to work with a text programme which provides
many different character sets by working in graphic mode.
Another way to overcome them is to transliterate the texts.
Yet another way is to use a board which allows the downloading of user-defined fonts.
What is essential is the interchangeability of text and database. For linguistic research the text is a database and the database is a text. It must be possible to analyze texts by importing them
word by word into a database, and then to read the formatted results back into a text programme, where a glossary or a list may be prepared from this material for a final printout. When interchangeability is not assured, it is impossible to retrieve and analyze texts after their input.
Accordingly, working with a text programme which allows the use of characters in graphic mode is no solution for linguistic work with Oriental languages, because the data cannot be exported into a database without great problems.
The method of transliterating texts using the characters provided by the character ROM
results in text that can be read only with difficulty.
The best way of dealing with the multiple characters of Oriental languages is to use a graphic board which allows the modification of characters through a RAM font. In this RAM font one can define the characters used in the international transcription of the Oriental language in question. This input can then be read into a database and, for the final output, converted into the original script. In this way the difficult problems of different writing directions and of characters that take different shapes in different positions are easily solved. The data are entered and retrieved in the transcription of the specific language, and for the final output the original script is generated from the transcription. One thus has a transcription that is easy to handle and can optionally be transformed into the original script. I wrote a programme which converts Arabic and Persian in transcription into the original script, with the combined letters, vowels and shadda that are not accessible in the normal mode of typing. In the same way it will be possible to provide solutions of this kind for other Oriental languages.
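The transcription-to-script conversion might look, in outline, like the following sketch. The letter table is a tiny hypothetical fragment; the author's actual programme, with its handling of vowel signs, shadda and position-dependent letter shapes, is not reproduced:

```python
# Hypothetical mini-table from transliteration to Arabic letters.
TRANSLIT_TO_ARABIC = {
    "b": "ب", "t": "ت", "k": "ك", "ā": "ا", "sh": "ش",
}

def to_arabic(translit):
    """Convert a transliterated word to Arabic script, trying the
    longest match first so a digraph like "sh" wins over "s"+"h"."""
    out, i = [], 0
    while i < len(translit):
        if translit[i:i + 2] in TRANSLIT_TO_ARABIC:
            out.append(TRANSLIT_TO_ARABIC[translit[i:i + 2]])
            i += 2
        else:
            out.append(TRANSLIT_TO_ARABIC[translit[i]])
            i += 1
    return "".join(out)
```

The data stay searchable in transcription; the script form is generated only for final output, exactly the division of labour the abstract argues for.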
Urs App
Scholarly Use of a Japanese Microcomputer System
Multilingual word- and dataprocessing which involves European languages as well as Japanese and Chinese is needed by scholars and institutions, but no comprehensive and satisfactory solution has been found so far. The system presented by the speaker is not ideal either, but it functions well within certain limits, and it necessitates only moderate investment in hard- and software.
Hardware:
— NEC 9801 VM2 or similar model with inbuilt kanji chips, 384K memory, V30
processor at 8/10 MHz, two 1.2 MB drives, 640x400 display, and a 2 MB RAM board
to facilitate multitask work
— EPSON VP 80K or similar model with kanji chips (JIS levels 1+2), italics, and the
possibility of loading user-defined characters

Software:
— Japanese MS-DOS v. 2.11 (includes utilities to create new characters) which accepts
Chinese/Japanese characters on system level
— VJE device driver (VACS Corp. Tokyo) for convenient kanji input
— MXI-Plus RAM-disk driver (Megasoft) for faster work
— Wordstar 2000 J (Twinstar) v. 2 (MicroPro Japan) for multilingual wordprocessing
(windows, footnotes, etc.)
— dBase II or III (Ashton-Tate), whose Japanese versions also accept Chinese and
Japanese characters
— ConCur 98 (VACS Corp., Tokyo) for multitask operation (for instance running
Twinstar and dBase II concurrently)

Advantages:
— With this basic setup, it is possible to write multilingual articles and create databases which consist of or include Chinese and Japanese characters. Top text search speed
on RAM is well over 10000 Chinese characters per second. Multitask capability
allows database or text search work while writing.
— The flexible VJE system assures easy Japanese text input (from roman letters).
Enlargement of the basic dictionary (partly by the speaker) allows speedy input of Chinese or pre-war Japanese characters using the four-comer system, pinyin with and without tone, and radical number with stroke count.
— Specialized dictionaries can be merged with or subtracted from existing ones.
— The use of internal standardized character codes (JIS; Japanese Industrial Standard)
assures easy transfer of data between different systems (incl. PCs and mainframes) and stability in the future.
— The NEC 9801 series dominates the Japanese PC market; thus continuity in the future
can be expected.
— In the present absence of scanners that handle this kind of text, a mass of book
manuscripts, catalogues, bibliographies, Chinese texts etc. which are today input on wordprocessors in Japan can be made available as data for databases.
— Easy creation of up to 188 additional characters is possible on the system level. The
impact of the limited number of characters can be softened considerably if much used but lacking characters are thus created.
Disadvantages:
— The gravest disadvantage is the limited number of available characters: only 6350
Chinese characters. For Sinological work this is never sufficient, and in some cases Japanese post-war simplifications must be used. In this way, ordinary classical Chinese
texts can be written without much trouble; if the most frequent 188 additional
characters for any one text are created, a great number of texts can be input
completely.
— While Russian and Greek letters can be used, it is still somewhat awkward (though
possible) to write French, German etc. Sorting which includes such characters may
produce incorrect alphabetical order.
— Machine memory is limited to 640 KB; with the size of these programs, it is for
instance not possible to use Twinstar and dBase III concurrently, but Twinstar and dBase II work well.
Hardware service for NEC products, though available in the West, may be no stronger than sales figures would suggest.
Uwe Glessmer
Greek, Hebrew and Other Fonts in Theological Texts
The lecture is divided into three main sections (followed by the demonstration of a
programme):
1. A short description of the well-known programme WordStar and its text processing possibilities: editing and correction of manuscripts, planning of layout with different Latin character sets and pitches, underlining, bolding and italicisation, as well as indenting, headers and footers, etc.
2. For scientific text processing with WordStar, page-oriented footnotes and different non-Latin character sets are a desideratum. An enlargement integrating both into this system is possible by using a dot-matrix printer with a special driver programme (Drucke). Such a programme, in combination with a dot-matrix printer with graphics capabilities, is able to print any desired character in graphics mode. A special pattern in an 8*8 grid must be designed for each new foreign letter, and conventions for its application in WordStar must be defined. In this way e.g. Hebrew (with automatic retroversion from right to left) and Greek, as well as other fonts of scientific symbols, can be integrated into this "normal" text processing system.
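Designing a letter in an 8*8 grid and handing it to a printer in graphics mode can be sketched as follows. The glyph is hypothetical, and the actual printer escape sequences, which vary by model, are omitted:

```python
# A hypothetical 8*8 pattern (an "A"-like shape), one integer per
# row; bit 7 is the leftmost dot of the row.
GLYPH = [
    0b00111100,
    0b01000010,
    0b01000010,
    0b01111110,
    0b01000010,
    0b01000010,
    0b01000010,
    0b00000000,
]

def columns(glyph):
    """Transpose the row pattern into the eight column bytes that a
    dot-matrix print head expects in graphics mode."""
    cols = []
    for c in range(8):
        byte = 0
        for r in range(8):
            if glyph[r] & (0x80 >> c):
                byte |= 0x80 >> r
        cols.append(byte)
    return cols
```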
3. The third step concerns the preparation of similar character sets for a high-quality printer (such as a laser printer) when they are not available on the market for Oriental studies. Here, too, a single letter is defined as a small picture in a grid of points, but the resolution is extremely high: 300 dots per inch. The definition of an appropriate bitmap can therefore hardly be done by hand, because 1500 single pixels must be defined for each character in a matrix of 50*30 dots. To avoid this laborious manual work, a much easier procedure of scanning (i.e. digitization, an electronic photograph) of printed prototypes can be chosen. Printed pages can serve as models from which to develop laser-printer fonts for "exotic" characters. A sample programme shows how to manipulate such scanned data and to isolate single character bitmaps from a whole scanned page. The source code (written in Turbo Pascal) has been put into the public domain and is available on the "Hamburg-Souvenir-Diskette" which we prepared as a gift for each participant of the section "personal computers and Oriental studies" (available through the author or R.E. Emmerick).
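The isolation of single-character bitmaps from a scanned line can be sketched in outline. This is our Python re-sketch of the idea, not the Turbo Pascal source distributed on the diskette:

```python
def glyph_boxes(bitmap):
    """Isolate single-character spans in one scanned line of text:
    a glyph is a maximal run of columns containing at least one
    black (1) pixel.  bitmap is a list of equal-length 0/1 rows."""
    width = len(bitmap[0])
    ink = [any(row[c] for row in bitmap) for c in range(width)]
    boxes, start = [], None
    for c, black in enumerate(ink + [False]):  # sentinel closes a final run
        if black and start is None:
            start = c
        elif not black and start is not None:
            boxes.append((start, c))           # half-open column span
            start = None
    return boxes
```

Each column span, cropped out of the page, yields one candidate character bitmap for the font editor.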
Hans-Peter Vietze
Ein Beispiel des Nutzens rechnergestützter Lexikographie (An Example of the Benefits of Computer-Assisted Lexicography)
The "Wörterbuch Deutsch-Mongolisch" (H.-P. VIETZE, Z. DAMDINSÜREN, G. LUWSAN, G. NAGY, Wörterbuch Deutsch-Mongolisch, Verlag Enzyklopädie Leipzig, 1981), with about 80,000 entries, was "turned around" with the help of a computer, so that the German and the Mongolian entries exchanged places and the Mongolian headwords, together with all subordinate derivations, phraseologisms and usage examples, moved, sorted according to the Mongolian alphabet, to the left-hand side of the word lists.
Of course, the printed lists did not constitute a complete new Mongolian-German dictionary. That ideal case would not occur even when reversing a dictionary between closely related languages. Because of the unavoidable non-idiomatic translation of German idioms, the fact that usage examples for German headwords usually do not fit the Mongolian equivalents equally well, the frequent accumulation of citations, and the thousands of new and first-time translations, some with elaborate paraphrases (e.g. balildach 'to reinforce the threefold hobbling of a horse by an additional strap between the forelegs' and the like), only about 50% of the printed material was directly usable. The remaining 50% of the lists required extensive post-editing.
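The "reversal" step itself can be sketched as follows. The entries are illustrative, not material from the dictionary, and plain lexicographic sorting stands in for sorting by the Mongolian alphabet:

```python
def invert(entries):
    """Reverse a dictionary given as {German headword: [equivalents]}
    into {equivalent: [German headwords]}, sorted for the word lists.
    (Real entries carry sub-entries and usage examples along too.)"""
    reversed_dict = {}
    for headword, equivalents in entries.items():
        for eq in equivalents:
            reversed_dict.setdefault(eq, []).append(headword)
    return {k: sorted(v) for k, v in sorted(reversed_dict.items())}
```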
The usefulness of such computer work comes first of all from the power of word-processing and relational database programs themselves. For lexicographic work these programs have to be combined in a suitable way that keeps the load on RAM as low as possible, since database programs as a rule do not have a good editor.

All the advantages of computer-assisted text processing, such as on-screen error correction; deleting, replacing and inserting letters, words, linear text segments and columns; layout design; changing typefaces; replacing shorter passages with various options; and much more, already come fully into their own in work on the raw manuscript. With powerful commands such as "replace all ... with ... for ...", database programs offer possibilities that are practically tailor-made for lexicographic work. The reason lies in the permanent need for revision, demonstrated in practice again and again and unavoidable even with the best conceptual preparation, which means that every dictionary must be worked through from the beginning up to dozens of times.
We see the main advantage of the computer work, however, not in the editorial solutions, but in the effortless reuse of the research invested in the "Wörterbuch Deutsch-Mongolisch" for the "Wörterbuch Mongolisch-Deutsch". This research essentially concerned the valency information and the clarification of special grammatical problems. Without the help of a computer, the newly compiled valency information for about 6,000 words would have had to be listed or carded separately by hand and then inserted at various places in the card file for the new dictionary.

Like the grammatical information, the subject-field labels (Lit, Zool, Phys etc.) were also preserved in the "reversal" of the "Wörterbuch Deutsch-Mongolisch". This opens up the possibility of producing smaller specialized dictionaries for any desired subject field automatically.
Finally, it should be pointed out that dictionary material, once stored on a data medium, can be further processed by computer, i.e. selected, counted, collated and sorted, from many other points of view and with other programs. Immediately possible are distributional studies of graphemes, morphemes, words and syntagms; the extraction of sets of examples for many morphological studies; language-statistical investigations; the automated production of reverse dictionaries; and the determination of grammatical, semantic or phonetic regularities wherever clusters of the corresponding phenomena can be detected, among other things.

Conclusion: with the powerful data-processing technology and user-friendly relational database programs now available, the use of a computer should at least be considered for every lexicographic project.
Boris Oguibenine
First Results of Vedic Grammar Processing by Computer
The incentive for the computer treatment of Vedic grammar has been the compilation of an
index of mythological motifs in the Rgveda hymns. An utterance with two slots filled by a
noun and a verb respectively is considered as the linguistic implementation of a mythological
motif; this bipolar structure comprehensively reproduces the content of the minimal
mythological information provided in the hymns.
In terms of computer processing, the recognition of the data can be achieved by distinguishing two language categories: verb forms and noun forms.
Distinguishing the categories signalled in the occurring words is tantamount to discovering those changing features which carry grammatical meaning, while leaving aside features which also change but carry only lexical meaning. In the language of the hymns the categories are signalled by cumulative morphs, i.e. by sets of language signs which carry several grammatical meanings simultaneously. In Vedic the cumulative morphs occur most frequently at the end of words. This means that, using the graphic representation of the Aufrecht edition of the hymns, the task is to analyze the words proceeding from right to left and to extract the minimal adequate information about the category status of each Vedic word.
The main question is thus: how long must the final graphemic/phonemic sequences be to
enable the definition of the categorical attribution of the respective words? Examples are shown in the paper.
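The right-to-left, longest-match analysis can be sketched as follows. The suffix table is a small hypothetical fragment; the real cumulative morphs, and sandhi, complicate matters considerably:

```python
# Hypothetical suffix table: final graphemic sequences mapped to the
# bundle of grammatical meanings their cumulative morph carries.
SUFFIXES = {
    "ebhis": {"case": "instrumental", "number": "plural"},
    "asya":  {"case": "genitive", "number": "singular"},
    "anti":  {"person": 3, "number": "plural", "tense": "present"},
}

def categorize(word):
    """Scan the word from the right, trying the longest final
    sequence first; return the match and its category bundle."""
    for length in range(len(word), 0, -1):
        tail = word[-length:]
        if tail in SUFFIXES:
            return tail, SUFFIXES[tail]
    return None  # no categorical attribution from the table
```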
Some of the specific problems of Vedic and of general linguistics are worth mentioning:
(1) There are striking analogies between the proposed technique and the technique of the
Ancient Indian grammarians (Vedic Prätisäkhyas and Panini's system). They need of course further investigation.
(2) The sequences enabling identification belong to the so-called sub-morphic level, although linguists in fact place little faith in interrelations between the smallest fragments of the grammatical signifiant and their signifié. In a full-fledged description of a language, the analyst must take into consideration the cases in which a part of a morph, an individual phoneme (and correspondingly an individual grapheme), becomes a meaning carrier. In my analysis the sub-grammatical sequences are parts of morphs which do not belong wholly to either morph at the juncture of two morphs; they comprise morphs preceded by a unit belonging to another morph, so that the morph boundaries overlap.
(3) The study of the subgrammatical sequences is of prime importance for the analysis of the Vedic poetic language.
(4) The problem of what is traditionally called the transfer stems in Vedic needs serious reconsideration. Solutions for individual cases are suggested.
The contribution is not meant to enrich computer science, but to show that the use of
computers furthers new approaches to the traditional linguistic and philological problems.
G.A. Oddie
Christian Conversion Movements in India: Some Recent Interpretations
Conversion movements involving changes in religious affiliation, whereby individuals or
groups opt out of one community and join another, have long been evident in India. In recent years a number of scholars, including R.L. Hardgrave and, more recently still, R.M. Eaton and F.S. Downs, have attempted to explore and explain these movements.¹
One possibility which is beginning to emerge from this and other research is that there may be important differences between what has happened among Hindus converting to Christianity within the caste system and movements which have taken place in the more loosely structured and less oppressive tribal societies. It is well known, for example, that a high proportion of
Hindu converts to Christianity in the nineteenth and twentieth centuries were drawn from
among the depressed classes, and Hardgrave, referring to the Nadars, one of the more
marginal groups in Hindu society, argues that the missionaries offered them not only the
gospel of a new religion "but also the possibility of secular salvation in release from the fetters of the tradition which had for centuries burdened them with social disabilities and economic dependence". In focussing their attention on the hill peoples of north-eastern India,
however, Eaton and Downs were exploring Christian movements among those who, in
Eaton's words, had "never experienced pariah status either internally or in relation to
outsiders" and the emphasis of these historians has correspondingly been much less on
exploitation and social disabilities as a factor in conversion. They both stress the importance of changes and disruption in tribal societies and argue that these circumstances greatly
stimulated the search for a more satisfactory and meaningful world view including the
acceptance of Christianity.
While these somewhat different interpretations of conversion movements seem to reflect
differences in the social conditions which affected responses to Christianity, it would be a
mistake (as Eaton and Downs imply) to ignore the complex of other factors involved. The
existence of a hierarchical and oppressive social system may, for example, have created a predisposition among those at the bottom of the social scale to search for a way out, but it does not explain why some members of the same caste converted and others (apparently
equally affected by social disabilities) did not. Clearly other factors, such as family
considerations, the effects of travel or education, economic ties and the ability to respond (without encountering crippling opposition), the attractiveness or otherwise of the local Christian community, and local traditions and views of religion, have to be weighed up and considered in any satisfactory interpretation.
¹ Hardgrave, R.L., The Nadars of Tamilnad. The Political Culture of a Community in Change. Berkeley and Los Angeles, 1969, esp. ch. 2; Eaton, R.M., "Conversion to Christianity among the Nagas, 1867-71", Indian Economic and Social History Review, Vol. XXI, No. 1, January-March 1984, pp. 1-44; Downs, F.S., Christianity in North East India. Historical Perspectives. Delhi 1983.