Gregor Erbach
Abstract: The purpose of this paper is to survey the requirements
that natural language processing (NLP) places on a programming
language, and to evaluate to what extent they are satisfied by
various logic programming languages, in particular by the Oz
language. It turns out that Oz appears to be a promising candidate
for NLP implementations.
Keywords: Natural Language Processing, Grammar Formalisms, Oz
1. Introduction
For a long time, NLP has been in a love-hate relationship with
logic programming in general, and with Prolog in particular. Let's
consider the positive sides first:
- Prolog is a declarative language
- Prolog provides useful data structures such as trees (terms),
  lists, and logical variables
- Unification comes for free
- Search comes for free
It's nice to remember the enjoyment of writing one's first DCG and
having it parse and generate sentences after just some minutes of
development time. And indeed many useful natural language
applications have been implemented in Prolog.
However, after using Prolog for a while, one comes to the
realization that it is not as beautiful as it appears at first
sight. The following drawbacks are encountered:
- Prolog's top-down depth-first backtracking search strategy is not
  ideal for NLP applications. In order to deal with the large degree
  of ambiguity, it is necessary to use tabulation (generally
  referred to as charts in NLP) for storing partial results [21],
  [9]. However, in Prolog this leads to inefficiency.
- Typed feature structures [4], the favoured representation
  formalism in NLP, are not supported. Implementing feature
  structure unification algorithms on top of Prolog leads to
  efficiency problems that are not tolerable for realistic NLP
  applications.
- Only first-order unification is available in Prolog. This is
  unfortunate since standard approaches for handling ellipsis in NLP
  make crucial use of higher-order unification [6].
- Higher-order programming is not supported by Prolog.
- The control facilities of Prolog (essentially the cut) are not
  sufficient.
- Finite domain constraints can be hacked up with the `brute-force'
  encoding [18], but no really efficient implementation is
  available.
- It is not always easy to add new kinds of constraints without
  losing efficiency. This has been our experience in adding set
  constraints [15], linear precedence constraints [16] and guarded
  constraints by making use of the coroutining mechanism of Sicstus
  Prolog in the project Reusability of Grammatical Resources [8].
- Prolog has no standard built-in support for developing graphical
  user interfaces, which are indispensable for the development and
  debugging of large natural-language grammars.
- There is no good support for disjunction, such as distributed
  disjunctions (cf. the papers in [23]).

Gregor Erbach is a researcher in the Computational Linguistics Lab
of the German Research Center for Artificial Intelligence in
Saarbrücken. E-mail: erbach@dfki.uni-sb.de, WWW:
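To make the role of tabulation in the first drawback concrete, the following is a minimal sketch of a chart recogniser (the CKY algorithm) written in Python purely for illustration. It is not taken from the systems cited here; the toy grammar, the lexicon and the function names are hypothetical. The point is only that every sub-span is analysed once and then looked up in the chart, instead of being re-derived on backtracking.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (an assumption, not from the paper).
LEXICON = {
    "she": {"NP"},
    "saw": {"V"},
    "stars": {"NP"},
    "with": {"P"},
    "telescopes": {"NP"},
}
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("NP", "PP"): {"NP"},
    ("P", "NP"): {"PP"},
}


def build_chart(words):
    """Fill a CKY chart: chart[(i, j)] holds every category spanning words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, word in enumerate(words):
        chart[(i, i + 1)] |= LEXICON.get(word, set())
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                # Sub-spans are read from the chart, never parsed again.
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        chart[(start, end)] |= BINARY_RULES.get((left, right), set())
    return chart


def recognise(words, goal="S"):
    """True if the whole input can be analysed as the goal category."""
    return goal in build_chart(words)[(0, len(words))]
```

With this grammar the ambiguous attachment of the prepositional phrase in "she saw stars with telescopes" is covered by two derivations of the same chart entry, which is stored only once.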
All these disadvantages have led to the development of specialised
grammar formalisms and processing models (PATR [22], STUF [3],
CUF [7], ALE [5], TDL [14], etc.). While these specialised
formalisms are useful for developing and debugging grammars, they
are often too slow to be used in realistic NLP applications.
Moreover, it is hard for grammar formalisms to keep up with the
progress that is being made in the field of programming languages.
Most of the above aspects have been addressed in isolation in logic
programming (e.g. tabulation (memoing) in XSB Prolog [26],
constraint handling in CHIP, feature structures in the ψ-terms of
LIFE [1], higher-order unification in λProlog, etc.), but there has
been no single programming language that fulfils all the needs of
NLP.
Recent years have seen the development of "lean formalisms":
grammar formalisms with a relatively high-level notation in typed
feature structures, which are then compiled into Prolog terms, so
that Prolog's built-in unification can be used for the unification
of feature terms at runtime. These formalisms (e.g. SRI's Core
Language Engine [2], Siemens' LKP, ALEP [19], ProFIT [11], etc.)
have been used with some success in NLP systems, but the experience
of developing the CL-ONE system, which extends ALEP and ProFIT with
additional constraints [8], has shown that this approach is a dead
end.
However, the introduction of attributed variables [13] has made it
possible to add new kinds of constraint solvers in systems such as
Sicstus 3. This mechanism is very useful for NLP because it permits
experimentation with and application of new kinds of constraint
solvers (e.g. the handling of grammatical principles as type
constraints [17], or morphology as a constraint between surface and
lexical forms [24]).
2. Requirements of NLP for a programming language
In this section, we outline the requirements for a declarative
logic programming language for the purposes of Natural Language
Processing [12]. Within NLP, there are two separate kinds of
activities:
1. The development of formal models of NL syntax and [...]
2. [...] systems (Language Engineering)
Both kinds of activities are closely related, and make use of the
same declarative core, but need different kinds of support services
from a programming language:
For Computational Linguistics, it is important to have good support
for developing, testing, and debugging grammars and for formalising
and evaluating linguistic theories. The emphasis on developing
linguistic descriptions, expressing generalisations about
linguistic phenomena, and experimenting with new linguistic
theories and new descriptive devices imposes the following
requirements on a logic programming language that is to support the
building of tools for these activities:
- declarativeness
- an expressive constraint language with a convenient notation
- the possibility to add new kinds of constraint solvers
- tools for debugging and visualisation (e.g. a Tcl/Tk interface,
  as provided in Sicstus and Oz).
For Language Engineering, on the other hand, computational
efficiency is a very important consideration because systems must
either respond with a minimum of waiting time (dialog systems) or
be able to handle large amounts of data (text understanding,
message extraction).
A second important consideration for Language Engineering is the
need to make a selection among the large number of possible
solutions of a particular parsing or generation task. The chosen
solution should be the preferred (most probable) reading in the
case of parsing, or the sentence that is optimal for achieving the
desired communicative effect in the case of generation.
While it is not yet clear what the optimal model for ranking the
possible solutions is, there is currently a strong trend towards
statistical language models (which often still use very primitive
models such as bigram or trigram statistics). In Computational
Linguistics research, this new trend is also reflected in the shift
from competence models to performance models [25]. A programming
language for natural language must support research into the
integration of declarative logical models and statistical language
models.
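As an illustration of the kind of primitive statistical model mentioned above, the following Python sketch ranks candidate solutions by an add-one smoothed bigram probability. It is a sketch under stated assumptions, not a method proposed in this paper: the toy corpus, the smoothing scheme and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real model would be trained on far more data.
CORPUS = [
    "the dog barks",
    "the dog sleeps",
    "the cat sleeps",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams, with <s> and </s> as boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rank(candidates, unigrams, bigrams):
    """Order candidate solutions from most to least probable."""
    return sorted(candidates, key=lambda s: log_prob(s, unigrams, bigrams), reverse=True)
```

In a combined system, the candidates passed to `rank` would be the outputs licensed by the declarative grammar, so that the statistical model only chooses among well-formed solutions.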
In summary, a programming language which supports Computational
Linguistics research and the development of applied NLP systems
must satisfy the following requirements:
Declarative Core
1. feature constraints (in addition to terms)
2. finite domains (not only over integers)
3. higher-order unification and higher-order programming
4. support for different type systems
Resolution Strategies
1. support for preference-driven constraint satisfaction
2. efficient disjunction handling
3. the possibility to add new types of constraints (such [...]
4. [...] head-driven, best-first etc.)
5. tabulation support
Support Services
1. support for `programming in the large' (e.g., modules,
   integration with imperative, object-oriented and functional
   programming)
2. embeddability into other systems (e.g., database support for
   large lexicons, compilation to machine code which can be linked
   with other system components)
As a whole, the programming language should provide efficient
implementations of the basic services, such as feature constraints,
tabulation, disjunction, or search engines, so that a grammar
formalism can be built on top of it without a lot of computational
overhead. On the other hand, it should be flexible enough to offer
the possibility to pursue interesting research directions, e.g., in
the combination of declarative with statistical language models, or
the addition of new types of constraints.
3. How suitable is Oz for NLP?
Since we conclude that the time has come for NLP to divorce Prolog,
we take a closer look at Oz to see if it really fulfils the needs
of NLP. On the fundamental issues (declarativity, data structures,
logical variables, unification), Oz shares or surpasses all the
advantages of Prolog. We now go on to consider the details.
Data Structures
Oz provides open and closed feature structures. In this sense, it
is very close to the requirements of NLP. However, it does not
provide sorts or types as a built-in datatype. While this may
appear as a drawback, it is really an advantage since sort
inheritance can be implemented efficiently in Oz, and there is
still disagreement in NLP whether single inheritance is enough, or
whether multiple or multi-dimensional [10] inheritance is needed.
So, Oz provides the flexibility to add different sort systems as
required.
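One way to picture a user-defined sort system with multiple inheritance is the following Python sketch. It is an illustrative assumption rather than a description of how sorts would be encoded in Oz: the hierarchy, the sort names and the functions are all hypothetical. Unifying two sorts returns the most general sorts that both of them subsume; an empty result means the sorts are incompatible.

```python
# Hypothetical sort hierarchy: each sort maps to its immediate supersorts.
SUPERSORTS = {
    "top": set(),
    "sign": {"top"},
    "verbal": {"sign"},
    "finite": {"sign"},
    "finite_verb": {"verbal", "finite"},  # multiple inheritance
    "noun": {"sign"},
}


def ancestors(sort):
    """All sorts that subsume `sort`, including the sort itself."""
    seen = {sort}
    stack = [sort]
    while stack:
        for parent in SUPERSORTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general, specific):
    """True if `general` is `specific` or one of its (transitive) supersorts."""
    return general in ancestors(specific)


def unify_sorts(a, b):
    """Return the most general sorts subsumed by both a and b (empty set = clash)."""
    common = {s for s in SUPERSORTS if subsumes(a, s) and subsumes(b, s)}
    # Keep only maximal elements: drop any sort with a proper ancestor in `common`.
    return {s for s in common if not (ancestors(s) - {s}) & common}
```

Because the hierarchy is plain data, a single-inheritance or multi-dimensional variant could be swapped in without touching the unification code, which is the kind of flexibility the paragraph above argues for.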
Constraint Handling
The handling of feature constraints in Oz is ideally suited for the
needs of NLP. For the addition of constraint solvers (e.g., for set
constraints or linear precedence constraints), as well as different
type systems, it would be desirable to have a mechanism such as
attributed variables.
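For readers less familiar with feature constraints, the following Python sketch shows the basic operation they support, the unification of two feature structures. It is a deliberately simplified illustration and not Oz's constraint system: feature structures are modelled as nested dictionaries with atomic leaves, structure sharing (reentrancy) is not modelled, and the names `unify` and `UnificationFailure` are hypothetical.

```python
class UnificationFailure(Exception):
    """Raised when two feature structures carry incompatible information."""


def unify(fs1, fs2):
    """Unify two feature structures given as nested dicts with atomic leaves.

    Simplification (assumption): structure sharing is not modelled.
    """
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for feature, value in fs2.items():
            # Features present on both sides are unified recursively;
            # features present on one side only are simply added.
            result[feature] = unify(result[feature], value) if feature in result else value
        return result
    if fs1 == fs2:
        return fs1
    raise UnificationFailure(f"{fs1!r} does not unify with {fs2!r}")
```

For example, unifying `{"cat": "np", "agr": {"num": "sg"}}` with `{"agr": {"num": "sg", "per": "3"}}` accumulates the information from both structures, while a clash between `"sg"` and `"pl"` under the same feature path raises `UnificationFailure`.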
Search
A number of search strategies are supported by Oz, and Oz is a good
platform for adding new kinds of search strategies. Oz provides
good support for computing with partial information, which is
useful for incremental NLP where input must be processed even if it
is only partially known. It remains to be seen to what extent Oz
supports best-first search based on statistical models of natural
language.
Oz provides no built-in support for memoing and re-use of partial
results (tabulation), which is an important tech[...]
Oz provides finite domain constraints, which are useful for
handling many kinds of simple disjunction which arise in NLP
systems. Oz provides finite domains for sets of integers, while NLP
needs finite domains over sets of possible atomic feature values
(such as the values of the agreement features number and person).
However, finite domain constraints over sets of feature values can
easily be implemented by defining a bijective mapping between a
finite set of feature values and a finite set of integers.
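The mapping described above can be sketched as follows. This is an illustrative Python sketch, not Oz code: the class `FeatureDomain`, the function `constrain` and the agreement values in the usage note are hypothetical, and the feature values passed in are assumed to be distinct so that the mapping is indeed bijective.

```python
class FeatureDomain:
    """Finite domain over atomic feature values, stored as a set of integers."""

    def __init__(self, values):
        # Bijective mapping between feature values and the integers 0..n-1
        # (assumes the values are distinct).
        self.value_to_int = {value: index for index, value in enumerate(values)}
        self.int_to_value = {index: value for value, index in self.value_to_int.items()}

    def encode(self, values):
        """Translate a set of feature values into a set of integers."""
        return frozenset(self.value_to_int[v] for v in values)

    def decode(self, ints):
        """Translate a set of integers back into feature values."""
        return {self.int_to_value[i] for i in ints}


def constrain(domain_a, domain_b):
    """Intersect two integer domains; an empty result signals inconsistency."""
    return domain_a & domain_b
```

As a usage example, with `agr = FeatureDomain(["1sg", "2sg", "3sg", "1pl", "2pl", "3pl"])`, an English verb form such as "walk" could carry the domain of all values except `"3sg"`; intersecting it with the domain of a third person plural subject leaves exactly `"3pl"`, while a third person singular subject yields the empty domain.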
Oz provides no built-in support for more complex forms of
disjunction, but the concurrent constraint programming approach
offers good possibilities for implementing disjunction handling.
Support Services
Oz is very suitable for `programming in the large' since it
provides for the integration of different programming paradigms
(logic, functional, and concurrent object-oriented programming)
[20]. Object-oriented programming in Oz provides for modular
programs.
Graphical User Interface
Graphical user interfaces are important for increasing the
productivity of grammar development. The support that Oz provides
for Tcl/Tk makes it a suitable candidate for implementing a
flexible grammar development environment.
All of these factors taken together make Oz appear as a suitable
platform for implementing NLP systems. The interface for combining
Oz with procedures written in other programming languages such as C
enables the integration of modules such as speech recognition or
morphological analysis.
4. Projects
The following projects involving Oz are currently under way in
Saarbrücken.
Grammar Formalism
The first task is to define a grammar formalism on top of Oz. This
should be done in the spirit of "lean formalisms", i.e., providing
a nice notation for the grammar developer, and compiling it in such
a way that it makes use of built-in operations such as unification
and does not incur a large processing overhead. It is planned to
make some of the constraint solvers for sets and trees developed in
a previous project available for Oz. At the time of writing, this
project is still in the specification phase.
Performance Modelling
In two projects, Oz will be used to perform research on models of
human linguistic performance. One of the projects will use
declarative state-of-the-art NL grammars (competence models) and
try to model performance by [...] competence/performance
distinction is viewed in a new way. The concurrent constraint
computation model provides support for this new view. An
environment for experimenting with different performance models
will be implemented in Oz. These projects will be funded by the
German Science Foundation (DFG) and run from January 1996 for three
years.
5. Conclusion
In the preceding discussion some aspects of Oz (e.g. object
orientation, soft real-time control, deep guards) have not been
mentioned, but it is not unlikely that the availability of these
tools together with the other aspects of Oz will open up new
possibilities in NLP. We expect a quite fruitful relationship
between Oz and NLP.
References
[1] Hassan Aït-Kaci and Patrick Lincoln. LIFE, a natural language
    for natural language. T.A. Informations, 30(1-2):37-67, 1989.
[2] Hiyan Alshawi, editor. The Core Language Engine. MIT Press,
    1991.
[3] Gosse Bouma, Esther König, and Hans Uszkoreit. A flexible
    graph-unification formalism and its application to
    natural-language processing. IBM Journal of Research and
    Development, 32(2):170-184, 1988.
[4] Bob Carpenter. The logic of typed feature structures. Cambridge
    Tracts in Theoretical Computer Science. Cambridge University
    Press, Cambridge, 1992.
[5] Bob Carpenter. ALE Version : User Manual. University of
    Pittsburgh, 1993.
[6] Mary Dalrymple, Stuart M. Shieber, and Fernando C. N. Pereira.
    Ellipsis and higher-order unification. Linguistics and
    Philosophy, 14(4):399-452, 1991.
[7] Jochen Dörre and Michael Dorna. CUF - A formalism for
    linguistic knowledge representation. In Jochen Dörre, editor,
    Computational Aspects of Constraint-Based Linguistic
    Description. Deliverable R1.2.A. DYANA-2 - ESPRIT Basic
    Research Project 6852, 1993.
[8] G. Erbach, M. van der Kraan, S. Manandhar, H. Ruessink,
    W. Skut, and C. Thiersch. Extending unification formalisms. In
    Proceedings of the 2nd Language Engineering Convention, London,
    1995.
[9] Gregor Erbach. Bottom-up Earley deduction. In Proceedings of
    COLING, pages 796-802, Kyoto, 1994.
[10] Gregor Erbach. Multi-dimensional inheritance. In H. Trost,
    editor, Proceedings of KONVENS '94, pages 102-111, Vienna,
    1994. Springer.
[11] Gregor Erbach. ProFIT: Prolog with features, inheritance and
    templates. In Seventh Conference of the European Chapter of
    the Association for Computational Linguistics (EACL), Dublin,
    1995.
[12] Gregor Erbach and Suresh Manandhar. Visions for logic-based
    Natural Language Processing. To appear in: Proceedings of the
    ILPS '95 workshop "Visions for the future of logic
    programming", Portland, 1995.
[13] C. Holzbaur. DMCAI CLP reference manual. Technical Report
    TR-92-24, Österreichisches Forschungsinstitut für Artificial
    Intelligence, Vienna, 1992.
[14] Hans-Ulrich Krieger and Ulrich Schäfer. TDL - a type
    description language for constraint-based grammars. In
    Proceedings of the 15th International Conference on
    Computational Linguistics, COLING-94, Kyoto, Japan, 1994.
[15] Suresh Manandhar. An attributive logic of set descriptions
    and set operations. In 32nd Annual Meeting of the Association
    for Computational Linguistics (ACL), pages 255-262, Las
    Cruces, NM, 1994.
[16] Suresh Manandhar. Deterministic consistency checking of LP
    constraints. In Seventh Conference of the European Chapter of
    the Association for Computational Linguistics (EACL), Dublin,
    [...]
[17] [...] Language using CLP Techniques. PhD thesis, Technische
    Universität, Vienna, October 1994.
[18] Christopher S. Mellish. Implementing systemic classification
    by unification. Computational Linguistics, 14(1):40-51, 1988.
[19] Paul Meylemans. ALEP - arriving at the next platform.
    ELSNews, 3(2):4-5, 1994.
[20] Martin Müller, Tobias Müller and Peter van Roy. Multiparadigm
    Programming in Oz. To appear in: Proceedings of the ILPS '95
    workshop "Visions for the future of logic programming",
    Portland, 1995.
[21] Günter Neumann. A Uniform Computational Model for Natural
    Language Parsing and Generation. PhD thesis, Universität des
    Saarlandes, Saarbrücken, 1994.
[22] Stuart M. Shieber, Hans Uszkoreit, Fernando C. N. Pereira,
    Jane J. Robinson, and Mabry Tyson. The formalism and
    implementation of PATR-II. In Barbara J. Grosz and Mark E.
    Stickel, editors, Research on Interactive Acquisition and Use
    of Knowledge, pages 39-79. SRI International, Menlo Park, CA,
    1983.
[23] Harald Trost, editor. Feature Formalisms and Linguistic
    Ambiguity. Ellis Horwood, Chichester, 1993.
[24] Harald Trost and Johannes Matiasek. Morphology with a
    null-interface. In Proceedings of the 15th International
    Conference on Computational Linguistics, Kyoto, Japan, 1994.
[25] Hans Uszkoreit. Strategies for adding control information to
    declarative grammars. In Proceedings of the 29th Annual
    Meeting of the Association for Computational Linguistics,
    pages 237-245, Berkeley, CA, 1991.
[26] David S. Warren. Memoing for logic programs. Communications