
published as: CLAUS Report number 1, Saarland University, July 1990

GRAMMAR ENGINEERING: PROBLEMS AND PROSPECTS

Report on the Saarbrücken Grammar Engineering Workshop

Gregor Erbach and Hans Uszkoreit

University of the Saarland and German Research Center for Artificial Intelligence

Abstract

The "Saarbrücken Workshop on Grammar Engineering" took place from June 21st to 23rd, 1990. The aim of the workshop was to bring together for 3 days of intensive discussion a number of people with practical experience in the development of large- coverage grammars and researchers who have investigated concepts and tools for grammar development. The workshop focused on the methodology of grammar engineering, testing and evaluation of grammars, the problem of distributed development, the formalisms and tools needed, and grammar maintenance and reusability. A variety of approaches to grammar writing were presented. Prerequisites for effective grammar engineering were identified.

Introduction

Purpose and Scope of the Workshop

At Coling 1988 in Budapest, M. Nagao organized a panel discussion on "Language Engineering: The Real Bottleneck of Natural Language Processing." The main question was "How can grammar writers use linguistic theory?" Indeed, linguistic engineering constitutes a serious bottleneck in the development of useful NL systems. On the other hand, recent developments in theoretical linguistics have reduced the distance between linguistic theory and linguistic engineering.

A major problem of linguistic engineering is the lack of appropriate concepts, skills, methods and tools for this special type of knowledge engineering. In order to get a clearer understanding of the problems involved, the workshop on Grammar Engineering was organized by the authors of this report.

Grammar Engineering is the development of linguistic knowledge bases with broad coverage, to be employed in natural language systems. Our workshop focused on the development of syntactic grammars.

The Grammar Engineering bottleneck seriously hinders the commercial exploitation of NL research for product development. It also limits the value of research systems as simulation devices for human linguistic competence and performance that could be used for developing, testing and improving linguistic theories.

There are four observations that add evidence to this claim:

1. The grammars and the linguistic technology in NL products that are on the market today are usually 10 to 15 years old.

2. There are no NL products on the market yet that exhibit sufficient coverage, i.e., something close to the linguistic competence of the human language user. Extending existing large grammars constitutes a real problem.

3. For every new product, the grammar is written from scratch.

4. There are no means for specifying the coverage of a grammar or for comparing systems according to coverage.

These observations indicate serious problems that need to be solved before essential progress can be made. Thus the workshop dealt with methods, tools and formalisms needed for Grammar Engineering, not with the development of specific applications and products.

The participants contributed expertise in three relevant research areas: development of large grammars, theoretical concepts for grammar development, and tools for grammar development. We are very pleased that the developers of some of the largest computational grammars ever written participated in the workshop.

List of Participants

Tania Avgustinova, Bulgarian Academy of Sciences, Sofia
Igor Boguslavski, USSR Academy of Sciences, Moscow
Stephan Busemann, German Research Center for AI (DFKI), Saarbrücken
Dagmar Dwehus, IPSI GMD, Darmstadt
Gregor Erbach, University of the Saarland, Saarbrücken
Karin Harbusch, German Research Center for AI (DFKI), Saarbrücken
Robert Ingria, BBN, Cambridge, MA
Mark Johnson, MIT, Cambridge, MA
Martin Kay, Stanford University and XEROX Palo Alto Research Center, CA
Esther König, University of Stuttgart
John Nerbonne, German Research Center for AI (DFKI), Saarbrücken
Klaus Netter, University of Stuttgart
Karel Oliva, University of the Saarland, Saarbrücken
Stanley Peters, Center for the Study of Language and Information, Stanford, CA
Bettina Rehse, University of the Saarland, Saarbrücken
Jane Robinson, Palo Alto, CA
Stefanie Schachtl, Siemens, München
Paul Schmidt, IAI EUROTRA-D, Saarbrücken
Petra Steffens, IBM Germany, Institute for Knowledge-Based Systems, Stuttgart
Harald Trost, German Research Center for AI (DFKI), Saarbrücken
Hans Uszkoreit, DFKI and University of the Saarland, Saarbrücken
Wolfgang Wahlster, DFKI and University of the Saarland, Saarbrücken
Susan Warwick, ISSCO, Genève
Annie Zaenen, XEROX Palo Alto Research Center, CA
Magdalena Zoeppritz, IBM Germany, Institute for Knowledge-Based Systems, Heidelberg

Karen Jensen, IBM Hawthorne and Bethesda, who was unable to attend the workshop, sent a summary of her opinions on the topic.

Report on the Workshop

How does Grammar Engineering relate to Theoretical Linguistics?

If linguistic principles are sensible, you will rediscover them as practical necessities.

(Annie Zaenen)

The participants of the workshop reported general dissatisfaction with using analyses proposed in the linguistic literature for grammar writing. The major problems encountered were:

• Incorrectness of the analyses. This is a serious problem for languages like Bulgarian, which have not been studied as extensively as English or German.

• Lack of explicitness, especially in traditional grammars, where some necessary distinctions are not made.

• Not enough attention to "messy details" like dates, names etc. Linguistic theory concentrates too much on the core grammar and neglects the periphery.

• Problems with implementation, as exemplified by "movement" accounts.

• Insufficient coverage. Linguistic theory does not provide coherent descriptions of large fragments.

A serious problem is the shortage of well-trained computational linguists with expertise and experience in the area of grammar writing. Theoretical linguists learn to develop theories of grammar. Very few have learned to design grammars for larger fragments of the language that would really work.

The main reason is the lack of a methodology that could be taught. Another reason is the relative recency of the confluence of computational and theoretical linguistics. Until very recently, the analyses and languages of the computational linguist were quite different from the ones used in theoretical linguistics. Only very few researchers were able to transfer results from linguistics to computational linguistics.

The flow of information should not only be from theoretical linguistics to grammar development; grammar development should also produce linguistic descriptions of high quality and thus have an impact on linguistic theory.

Methodology of Grammar Engineering

As the grammar gets larger, the number of rules written per week decreases.

(Wolfgang Wahlster)

Ideally, developing a grammar should start with defining the functionality and coverage of the grammar. In practice, however, there are no established methods for determining the coverage that is needed for a specific application, and for the specification of coverage. It was suggested that the coverage should be specified semantically rather than syntactically because a user of a natural-language processing system cannot be expected to use only certain syntactic constructions, but can be expected to use only a specific semantic domain.

The goals of grammar development should be clearly specified: the coverage, the domain of application, and the output of the grammar. The grammars presented at the workshop provided as output phrase structure trees, f-structures, dependency structures, and semantic representations. There was general agreement that some semantic representation as output is important.

Stanley Peters stressed the need for a generic semantic interface language which would serve three purposes:

1. It would make it possible to specify exactly what the output of the grammar for a set of sentences should be, and would thus facilitate testing, because a test suite could contain pairs of sentences and semantic representations (see the section on testing and evaluation below, and the sketch after this list).

2. The performance of different grammars would be comparable, because they would produce similar output.

3. Different grammars could be used for an application, because they would produce the same output.
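
To make the idea concrete, the following is a minimal sketch of such a semantically specified test suite: pairs of sentences and expressions in a (hypothetical) generic semantic interface language, checked against the output of the grammar under test. The function name `parse_to_semantics` and the logical notation are illustrative assumptions, not proposals from the workshop.

```python
# Sketch of a test suite pairing sentences with expected semantic
# representations. `parse_to_semantics` stands in for whatever
# grammar/parser combination is under test.

TEST_SUITE = [
    # (sentence, expected representation in the interface language)
    ("John sleeps", "sleep(john)"),
    ("every student reads a book",
     "forall(x, student(x) -> exists(y, book(y) & read(x, y)))"),
]

def run_suite(parse_to_semantics):
    """Return the test items on which the grammar's output diverges."""
    failures = []
    for sentence, expected in TEST_SUITE:
        actual = parse_to_semantics(sentence)
        if actual != expected:
            failures.append((sentence, expected, actual))
    return failures
```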

There is no systematic method for grammar engineering. There was agreement that some planning ahead is necessary for the development of solid grammars, but that there is no foolproof method for working one's way from the specification of the coverage to the final grammar — linguistic intuition is important. However, "legislation", i.e., carefully documented design decisions, was considered beneficial.

Jane Robinson advocated an empirical approach, not tied inflexibly to one currently available formalism or theoretical school. The availability of linguistic data, for example representative texts and concordances, is very important for grammar development.

Some methods can be taken over from programming; this holds for structured programming, regular testing, and good documentation.

Jane Robinson suggested the principle that there should be no syntactic ambiguities that do not correspond to semantic ambiguities.

While Klaus Netter insisted that grammars should be developed without interruption, Stefanie Schachtl reported that she could easily resume work on her grammar after an intermission of several weeks.

Another problem for grammar engineering is to determine exactly which linguistic data should be described. Stanley Peters suggested statistical analysis of large corpora in order to find co-occurrences of phenomena that would otherwise remain unnoticed.

Another question is whether a large-coverage grammar developed by a linguist should be re-implemented more efficiently for use in a natural-language processing system.

Good documentation is a prerequisite for continuous work, reusability of the grammar, and collaborative work. There are two kinds of documentation: linguistic documentation, which discusses the ideas and principles behind the design of the grammar, and technical documentation pertaining to implementation issues, in which all the details and hacks are documented. An example of the first kind of documentation is the description of the ETAP-2 machine translation system, which is published as a book [Apresyan et al. 1989].

As a tool for documentation, Wolfgang Wahlster suggested using one of the available truth maintenance systems to control interdependencies between rules and to tell which rules must be modified together. Jane Robinson believed this to be too intricate a problem, one that was dropped in other projects because of the complexity of the task.

Modularization and Distributed Development

Linguists view programmers as slaves on the plantations of their excellent ideas.

(Karel Oliva)

The development of large grammars is extremely slow. Existing large grammars have usually been developed by a single person (at any given time), sometimes with very limited assistance from a few coworkers.

In computer programming, modularization has proved to be a useful concept for the distributed development of large programs. A requirement for a module is that its adequacy and correctness can be specified independently from other modules.

No methods exist for efficient distributed grammar engineering, since no methods exist for the modularization of grammars. The organization of grammars in rules does not accommodate a useful modularization because the rules are highly interdependent (noun phrases contain verb phrases, verb phrases contain noun phrases, etc.).

More recent linguistic approaches organize the grammatical knowledge in principles instead of rules. Since the principles interact even more closely than the rules do, modularization as it is needed for distributed development becomes even harder, at least at first glance.

However, the organization of knowledge elements in lattice-based type hierarchies, as it is employed in feature unification formalisms, offers a very promising scheme for modularization. A modularization concept would not only further efficient development; it would also boost reusability and grammar evaluation.
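
To illustrate the scheme (without committing to any particular formalism discussed at the workshop), the sketch below assembles the description of a type by inheriting the constraints of its supertypes in a small hand-written hierarchy. The type and feature names are invented, and plain dictionary merging stands in for genuine unification.

```python
# Sketch of a lattice-based type hierarchy: each type contributes local
# feature constraints, and a type's full description is obtained by
# merging the inherited constraints of all its supertypes.

HIERARCHY = {
    # type: (supertypes, local constraints)
    "sign":             ([], {"PHON": "list"}),
    "head-complement":  (["sign"], {"HEAD": "shared-with-head-daughter"}),
    "finite":           (["sign"], {"VFORM": "fin"}),
    "finite-hc-phrase": (["head-complement", "finite"], {}),
}

def constraints(type_name):
    """Collect constraints along all inheritance paths; more specific
    constraints override inherited ones on conflict (a simplification
    of unification)."""
    supertypes, local = HIERARCHY[type_name]
    merged = {}
    for supertype in supertypes:
        merged.update(constraints(supertype))
    merged.update(local)
    return merged

print(constraints("finite-hc-phrase"))
# {'PHON': 'list', 'HEAD': 'shared-with-head-daughter', 'VFORM': 'fin'}
```

A module could then be a coherent subpart of the hierarchy whose constraints are developed and checked separately before the lattices are joined.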

There is no general agreement about the division of labor in a natural-language group, but the experiences of the workshop participants can be summarized as follows.

• The linguistic work can be subdivided along the traditional linguistic levels of description: phonology, morphology, syntax, semantics, discourse and pragmatics would qualify as modules. An exception is compositional semantics, which should be handled in parallel with syntax. Nonetheless, regular integration on a variety of levels is needed (as practiced in the LILOG project, where "milestones" were set at which the entire system was integrated).

• There is a division of labor between programmers and grammar developers. They should know enough about each other's fields to communicate effectively. In particular, the linguist should have an idea about the parsing problem. If the linguists don't know what they can or cannot expect of the programmers, they tend to use the programmers as "slaves on the plantations of their excellent ideas".

• There was also agreement that work on the lexicon can be given to people other than the grammar developers. The use of abbreviatory devices like templates (or macros) allows the lexicon worker to classify a lexeme as belonging to a certain class, say [transitive-verb, present, 3rd, singular], without having to spell out what these abbreviations mean in the linguistic analysis adopted by the grammar writer.

This method has the advantage that the lexicon is only viewed as a database, and the linguistic analyses can be changed without having to modify every single lexical entry. Only the definitions of templates like "transitive-verb" must be changed.
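
A minimal sketch of this template mechanism (with invented template and feature names) shows how the lexicon can stay a simple database while the linguistic analysis lives in the template definitions:

```python
# Sketch of template-based lexical entries: the lexicon worker only names
# templates; their definitions expand into full feature bundles and can be
# revised without touching any individual entry.

TEMPLATES = {
    "transitive-verb": {"cat": "V", "subcat": ["NP", "NP"]},
    "present":         {"tense": "pres"},
    "3rd":             {"person": 3},
    "singular":        {"number": "sg"},
}

LEXICON = {
    "loves": ["transitive-verb", "present", "3rd", "singular"],
}

def expand(word):
    """Expand the template names of a lexical entry into one feature bundle."""
    features = {}
    for template in LEXICON[word]:
        features.update(TEMPLATES[template])
    return features

print(expand("loves"))
# {'cat': 'V', 'subcat': ['NP', 'NP'], 'tense': 'pres', 'person': 3, 'number': 'sg'}
```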

• Cooperative development of a grammar may help its extensibility and reusability, because problems like "legislation", interfaces and documentation must be taken care of at an early stage in the development.

• Everything in syntax is interdependent so that it is not easy to split syntax into modules. There was no consensus about how to divide the work of syntactic grammar development. In most groups, one person was responsible for the entire grammar.

Several groups reported that they had one person responsible for the noun phrases and one person responsible for the verb phrases. However, this is only possible if the two people work together very closely. NP syntax and VP syntax are too closely related to be good candidates for modules.

Igor Boguslavski reported that their work within a dependency-grammar framework was subdivided according to syntactic relations. At Siemens, Stefanie Schachtl reported, coordination was treated as a separate module.

Within stratificational frameworks like Meaning-Text-Theory or the EUROTRA levels of description there have been attempts to have one person responsible for each stratum, but it turns out that these strata are too closely interrelated to be viewed as independent modules.

Jane Robinson suggested modularization according to semantic field rather than syntactic phenomenon. Semantic fields like time or comparison would then have their syntactic effects and manifestations. Grammars should be extended by adding new semantic fields to the coverage.

There is a need for thorough design and legislation. This means that there should be explicit agreement about which categories and features are used and about the spelling of category and feature names. Likewise, all design decisions and all hacks must be well documented to make it possible for another person to work on the grammar.

There must be clear ownership of files: each file should have only one owner.

Magdalena Zoeppritz said "I think it is not a good idea to share files. You can share ideas and worries, but one person must be responsible for integration." It was suggested that responsibility for a component be separated from control.

Petra Steffens reported synchronization problems with grammar development in a large team: the tools evolved at the same time as the linguistic descriptions.

John Nerbonne sketched two models for the development of natural language systems: one in which everything is tightly integrated from the beginning (the way large Lisp machine environments were developed), and a more chaotic one in which separate components are developed independently and then integrated (the way UNIX evolved). The latter approach has the advantage that there is more room for creativity within each module, but the risk is higher that the modules will not fit together very well.

Testing and Evaluation

Grammars are like Swiss cheeses when it comes to coverage.

(Petra Steffens)

There is no agreed-upon measure for the size or the coverage of a grammar.

Participants of the workshop reported the sizes of their grammars in terms of bytes, lines of code, the number of rules and/or template definitions, the number of unifications, distinct node descriptions, and a list of phenomena covered. GPSG illustrates that the number of rules per se is not a good measure because some highly schematic rules are equivalent to a very large number of context-free rules, and the latter number can be used to compare GPSG grammars.
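
The following sketch illustrates the point for an ID/LP-style schematic rule of the kind used in GPSG: a single immediate-dominance rule with linear-precedence constraints corresponds to as many context-free rules as there are admissible orderings of its daughters. The example rule and constraints are invented.

```python
# Sketch: counting the context-free rules that one ID/LP-style schematic
# rule stands for, as a basis for comparing grammar sizes.

from itertools import permutations

def cf_expansions(daughters, lp_constraints):
    """Count the orderings of the daughters that satisfy every LP
    constraint (a, b), read as 'a must precede b'."""
    count = 0
    for order in permutations(daughters):
        if all(order.index(a) < order.index(b) for a, b in lp_constraints):
            count += 1
    return count

# ID rule: VP --> {V, NP, PP, ADV}; LP: V < NP, NP < PP (ADV unordered).
print(cf_expansions(["V", "NP", "PP", "ADV"], [("V", "NP"), ("NP", "PP")]))
# prints 4: this one schematic rule stands for four context-free rules
```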

As far as coverage is concerned, there are no generally accepted standards for determining the coverage of a given grammar. The participants agreed that test suites are needed that cover a wide range of grammatical phenomena. Corpora are not considered adequate for grammar testing because they do not contain a systematic sample of phenomena.

In order to control overgeneration, the test suite should contain negative examples of ungrammatical strings. Since not all possible negative examples can be included in the test suite, a generator is needed that produces a representative sample of the sentences licensed by the grammar. This is particularly important if the grammar is to be used not only for analysis, but also for generation. A good test suite should contain at least 500 to 1000 sentences, judging from the numbers that the workshop participants gave.
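
A minimal sketch of such a test run, with `parses` standing in for the parser under test and a deliberately tiny suite, might look as follows:

```python
# Sketch of a test-suite run that checks coverage and overgeneration at
# once: grammatical items must parse, ungrammatical items must be rejected.

SUITE = [
    # (string, is_grammatical)
    ("the cat sleeps", True),
    ("cat the sleeps", False),
]

def evaluate(parses, suite=SUITE):
    """Return the coverage failures and the overgeneration failures."""
    not_covered   = [s for s, ok in suite if ok and not parses(s)]
    overgenerated = [s for s, ok in suite if not ok and parses(s)]
    return not_covered, overgenerated
```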

A problem with the use of test suites is to define what counts as successful processing of the sentences contained in the test suite. Five different criteria were given:

1. Can the sentence be parsed (or rejected in the case of negative examples)?

2. Does it get the right number of parses?

3. Does it get the correct analysis?

4. Is it assigned the right logical form?

5. Does an application based on the grammar give correct answers or translations?

In our opinion, the first two criteria are too weak. They would be adequate only if the sole purpose of the grammar were to characterize what is a sentence of the language and what is not (observational adequacy). The fifth criterion may blur the distinction between grammar testing and system testing, unless the grammar is hooked into an application whose behavior is thoroughly understood so that any changes in behavior can be attributed to the grammar. This is the approach taken by Hewlett-Packard, who have a standard test database.

Criteria 3 and 4 raise the question of how a test suite should be organized. It should not be a list of sentences, but rather a list of pairs <sentence, syntactic analysis> (the approach of the Treebank project) or a list of pairs <sentence, logical form>. The latter would again presuppose a generic semantic interface language, as suggested by Stanley Peters. For machine translation, a list of pairs of sentences was suggested as a test suite.

Stanley Peters suggested using the "mean time between failures" as a measure for the performance of the grammar, because grammar engineering is interested in developing grammars that show adequate performance for a particular application. The advantage of this approach is that one may use a corpus for evaluation, so that frequently occurring phenomena are given higher weight than exceptions. What counts as success and failure may depend on the five criteria given above. The use of semi-automatic statistical methods was suggested. A related issue is the robustness of the grammar (or parser), i.e., its ability to process ungrammatical input.
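
Interpreted over a corpus, the measure might be computed as in the following sketch, where `succeeds` stands in for whichever of the five success criteria above is adopted:

```python
# Sketch of a "mean time between failures" measure over a corpus: the
# average number of sentences processed per observed failure.

def mean_sentences_between_failures(corpus, succeeds):
    failures = sum(1 for sentence in corpus if not succeeds(sentence))
    if failures == 0:
        return float("inf")  # no failure observed on this corpus
    return len(corpus) / failures
```

Because the corpus reflects actual usage, a construction that fails often costs proportionally more than a rare exception, which is exactly the weighting the proposal aims at.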

John Nerbonne proposed and the participants of the workshop agreed that testing should be taken into the hands of the natural-language community, and not be left to the funders. The reasons are that regular testing is needed for the development of accurate grammars, and that the NL community can do the testing more intelligently.

Maintenance and Reusability

The term reusability was used in two senses: reusability of grammars for applications other than the one for which they were originally developed, and reusability of ideas and analyses for writing new grammars. Jane Robinson remarked that grammar writing can be shortened by looking at other grammars in other formalisms. No one had a neutral language for writing down syntactic analyses, but a theory-neutral lexicon was considered feasible. Martin Kay reported on a morphological dictionary for English in which each word is associated with a code that can be interpreted in different ways, depending on the application.

It is not always easy to decide whether an existing grammar should be extended or whether the grammar should be redesigned (the life cycle problem). Klaus Netter reported that their German grammar was rewritten every two years.

There was general agreement that it is useful to have a core grammar, which can be extended in different ways for different applications. Igor Boguslavski reported that they selected and expanded a subset of an existing large grammar for application building. Bob Ingria claimed that the semantics can easily be adapted to a new domain.

There was no doubt that good documentation of the grammar is an essential prerequisite for maintenance and reusability.

In summary, the following points were made about the methodology of grammar engineering:

• Collection of linguistic data is a prerequisite.

• A non-formal description of linguistic phenomena should exist, according to which the coverage of a formal grammar can be specified.

• A generic semantic interface language is needed in order to have a uniform output for different grammars and a criterion for evaluating the correctness of the output.

• Grammar engineering is similar to programming in that structuring, regular testing, and extensive documentation are needed.

• Linguistic intuition and solid linguistic training are essential.

• Grammar writing should be introduced into (computational) linguistics curricula.

• "Legislation", i.e., careful documentation of design decisions, is useful for collaborative work.

Formalisms

No one knows what to do with an f-structure.

(Annie Zaenen)

The participants of the workshop reported experiences with special-purpose formalisms (Tree Adjoining Grammar, Trace Unification Grammar, Lexical Functional Grammar, Meaning-Text Model) and with general-purpose formalisms (PATR-II, STUF, Prolog), in which grammars based on theories like HPSG or CUG were implemented.

Formalisms have been shaped by practical needs. Jane Robinson talked of the evolution rather than the development of a formalism. For example, parametrized templates were introduced because they were needed by the grammar developers.

General formalisms are preferred to those that are constrained by a particular linguistic theory. There was agreement that formalisms cannot and should not be idiot-proof (as a formalism would be in which one can express only what is within the bounds of universal grammar).

Theoretical linguists are interested in finding the most constrained formalism that embodies universal grammar, while grammar developers need a general formalism, because they also have to describe phenomena that do not belong to the core grammar.

The necessity to handle exceptions can lead to a proliferation of features in a grammar that would otherwise be simple and elegant. General purpose formalisms are also preferred over specialized formalisms because the linguistic theory is modified and developed during grammar writing.

As to expressiveness, M. Zoeppritz proposed that new ideas must be expressible in one place, and not be scattered all over the grammar. Klaus Netter demanded the possibility of rule-independent declarations and global specifications. In general, dissatisfaction with stratificational approaches (like EUROTRA) was expressed.

Esther König argued that the current practice puts too much load into feature structures. She suggested that movement phenomena should not be handled within feature structures by means of slash features, but rather by the deductive component of the grammar, as exemplified by extended categorial grammar.

In summary, the following are the requirements of grammar formalisms:

• Formalisms must be declarative.

• Formalisms must be expressive and provide convenient notation; templates and macros are needed.

• Grammatical principles and generalizations must be expressible in one place.

• Formalisms must allow efficient, incremental and bidirectional processing.

• Formalisms cannot and should not be idiot-proof.

• General-purpose formalisms are preferred for an explorative style of grammar development. They may be replaced by a special-purpose formalism after certain parameters of the grammar have been fixed.

• The handling of exceptions must be supported.

Tools

There are no types of knowledge that exhibit more complexity and interdependence than grammatical competence. Clearly, the development of large grammars cannot be done with pen and paper alone; tools are urgently needed for maintaining the consistency of the grammar and for checking its correctness and completeness with respect to an intended fragment. The role of development tools in software engineering cannot be overestimated, but appropriate development tools for grammar engineering are still missing. Although there has been noticeable progress in the design of grammar development environments, existing tools do not support distributed development. Neither do they offer sufficient facilities for the organization and presentation of the grammar that could help the computational linguist cope with the complexity of the subject matter. The most advanced technologies for working with highly associative knowledge need to be exploited, among them visually supported knowledge navigation techniques.

Most of the tools in use today consist of a grammar formalism, a parser, and means for inspecting the parse results. Grammar engineering today is based on the edit-parse-inspect cycle: a grammar is written or modified, then sentences are parsed and the results are analyzed in order to evaluate and debug the grammar.

Tools are not always used as intended: the LFG workbench was designed as an educational tool, but it is used as a grammar engineering tool today.

Wolfgang Wahlster has conducted a survey of existing grammar engineering environments (in particular GDE, Wednesday2, TagDevEnv, D-PATR and the LFG Workbench) and compared their functionalities. The results of this survey and the discussion during the workshop led to the following list of requirements of grammar engineering tools:

• Tools must be convenient to use, and include navigation, browsing and help facilities.

• Tools should be unobtrusive and not interrupt the creative process. For example, it should be possible to turn off syntax checking and consistency checking while writing a grammar.

• The tools must support grammar administration and be able to keep different versions of a grammar.

• Tools for constant testing of each extension and revision of the grammar are needed.

• Speed is an important factor because the tools are constantly used. A grammar engineering tool should include a fast parser, and allow for incremental compilation of the grammar or for mixing of interpreted and compiled code.

• Reasonable error messages must be provided.

• Structure-oriented editors, especially graphic editors for trees or feature structures, are useful because they allow the same graphic format to be used for data input (the grammar) and output (the parse results).

• Macro processors are needed to facilitate grammar writing, and type and template hierarchies to capture generalizations.

• The tools should support documentation of the grammar and provide visually supported knowledge navigation techniques that give access to the different knowledge sources, explanatory texts, and to the parser, generator etc.

• Tools for debugging should include presentation of parse results and partial analyses, the possibility to discover where unifications failed, and the possibility to see which knowledge elements (rules, templates, lexical entries) are responsible for errors and to locate the definitions of these knowledge elements in the source files (a minimal sketch of such failure reporting follows this list).

• A tracer, stepper and backtracer were suggested for debugging.

• "Instrumentation of the parser" is recommended to obtain measurements and statistics of parse times, the structures that were built, and the behavior of individual rules.

• Facilities for display and inspection of type and template hierarchies are needed.

• Tools for lexicon development are needed.

• Facilities for comparing files and parse results are needed.

• A generator of representative language samples is needed to control overgeneration.

• Consistency checking of the grammar was considered desirable, but there were doubts about its feasibility. A grammar engineering tool should support "legislation", i.e., agreements about the names of categories, features etc.

• If structures have been converted to some normal form for processing, a correspondence mapper is needed to display structures in the original notation.
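
As a minimal sketch of the kind of failure reporting mentioned above (nested dictionaries stand in for real feature structures, and the agreement clash is an invented example), a unifier can thread the current feature path and report it when unification fails:

```python
# Sketch of a debugging aid that reports *where* unification failed.

def unify(fs1, fs2, path=()):
    """Unify two feature structures (nested dicts); on a clash, raise an
    error naming the feature path at which the values conflict."""
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for feature, value in fs2.items():
            if feature in result:
                result[feature] = unify(result[feature], value, path + (feature,))
            else:
                result[feature] = value
        return result
    if fs1 == fs2:
        return fs1
    raise ValueError("unification failed at %s: %r vs %r"
                     % ("|".join(path), fs1, fs2))

subject = {"agr": {"number": "sg", "person": 3}}
verb    = {"agr": {"number": "pl", "person": 3}}
try:
    unify(subject, verb)
except ValueError as error:
    print(error)  # unification failed at agr|number: 'sg' vs 'pl'
```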

Currently, linguists build their own tools, especially in the United States, but there was general agreement that more effort must be put into the implementation of grammar engineering tools by professional software developers. A grammar engineering tool must meet all the requirements of professionally developed software, e.g., recovery from a system crash, backup copies of files etc. However, the question was raised whether the market for such tools is large enough.

Wahlster proposed two types of systems. The first is a polytheoretical workbench that can process different grammatical formalisms but provides a uniform user interface for all of them. For cooperative grammar development, a number of workbenches and a workbench server should be connected by a fast FDDI network (100 Mbit/s). The tools should support conflict resolution, take minutes of team discussions, keep different versions of the grammar etc., by making use of Computer Supported Cooperative Work (CSCW) technology. The second system proposed by Wahlster is the Linguistic Toolbox, a laptop-based natural language system for linguistic field work. The toolbox should support communication with a workbench located in a different place.

While Wahlster advocated one integrated powerful workbench, John Nerbonne considered a collection of simple tools more useful.

Conclusion

Until very recently, all large grammar development was performed with representation languages and tools that did not permit the direct utilization of progress in theoretical linguistics for natural language processing. With the emergence of declarative grammar formalisms in linguistics, this situation was remedied. Several contemporary feature unification formalisms are shared by theoretical linguistics and computational systems building. They have opened promising new directions in the abstract specification and modular organization of linguistic knowledge.

However, presentations and discussion at the workshop have shown that prerequisites are still missing for four tasks:

1. efficient large scale development (distributed grammar engineering),

2. extending the coverage of large grammars (engineering of very large grammars),

3. recycling of existing grammars (grammar reusability),

4. specifying, evaluating and comparing grammars (grammar specification).

Despite the progress in the area of representation formalisms and development tools, the new means have not yet enabled grammar developers to overcome the language engineering bottleneck. At this stage, large-scale collective grammar engineering efforts are highly unlikely to yield short- or medium-term systems.

The missing prerequisites fall into three classes: concepts, tools and training.

First, new concepts are needed for the specification, organization, modularization, and implementation of grammatical competence. The logic-based declarative grammar formalisms offer powerful means for developing these new concepts, but those will not come about without extensive efforts dedicated to this task. Existing engineering tools such as comfortable development environments and toolboxes support state-of-the-art formalisms, but they do not offer means for diagnosis, consistency control and, most importantly, distributed collective development. Engineering tools with such extended functionality can obviously only be built on the basis of the envisaged new formal engineering concepts. Finally, computational linguists need to be trained for the development of grammars according to the new concepts and tools.

Grammar formalisms have changed rapidly and drastically, and so have the linguistic processing systems that depend on them. The fast evolution in this area has had the undesirable side effect that existing large grammars are no longer used in new systems. Often the development stops when the project ends or when the main developer leaves the project. Since the development of grammars is a very costly endeavor, valuable resources are wasted.

A prerequisite for achieving the reusability of grammatical resources is a set of mathematical concepts and a representation language for the abstract specification of grammatical knowledge. An abstract declarative specification language with a clean semantics is needed for the specification of grammatical competence. Current developments in the area of typed feature unification formalisms already move in the right direction. The observed convergence of formalisms could lead to specification standards that may serve as the basis for grammar reusability. Again, this development will not take place without considerable additional efforts.

As a result of the workshop, we feel strongly that the following practical steps need to be taken in order to overcome the deficiencies of current grammar engineering:

• research projects on the modularization of grammars in close connection with the development of grammar engineering methods,

• research projects on the theory-neutral abstract specification of linguistic analyses and observations,

• development of a generic semantic interface language, which would make it possible to specify the input/output behavior of a grammar,

• professional implementation of comfortable and powerful engineering tools,

• the large-scale collection, annotation and classification of linguistic data as the basis for evaluation and faster processing models (statistical methods),

• linguistic test suites as the basis for tools for diagnosis and consistency maintenance,

• mandatory courses on grammar development in computational linguistics curricula.

The existence of the desired concepts and tools will undoubtedly boost the productivity of grammar development, with obvious implications for the commercial success of computational linguistics research. We are well aware of the enormous efforts in linguistics research still required before computational grammars will approximate the linguistic competence of human speakers. But the current situation, in which it takes a decade to develop a grammar that covers a language fragment that has been covered before, hinders progress in language technology. The way out of this deplorable state of affairs is a new kind of grammar engineering, with methods, tools and practitioners as effective as their counterparts in today's software engineering.

Acknowledgments

We want to thank all those who contributed to the success of the workshop. The workshop was financially supported by IBM Germany through the project LILOG-SB conducted at the University of the Saarland at Saarbrücken and by the German Federal Ministry of Research and Technology through the project DISCO carried out at the German Research Center for Artificial Intelligence (DFKI) in Saarbrücken. During the workshop, all talks and discussions were recorded by Petra Schwenderling, who is a stenographer in the State Parliament of Baden-Württemberg, and also recorded on tape.

Special thanks to Bobbye Pernice, who proofread the 90 pages produced by the stenographer and transcribed some of the tapes.

We are especially grateful to those participants who provided helpful comments and corrections on earlier drafts.

References

[Apresyan et al. 1989]
Yu. D. Apresyan, I. M. Boguslavski, L. L. Iomdin, A. V. Lazurski, I. V. Pertsov, V. Z. Sannikov, L. L. Tsinman. Lingvisticheskoe obespechenie sistemy ETAP-2. Moscow: Nauka, 1989.

[Nagao 1988]
Makoto Nagao (Panel Organizer). Panel: Language Engineering: The Real Bottleneck in Natural Language Processing. Proceedings of COLING '88, Budapest, pp. 448-453.
