Jennifer DECAMP

MITRE Corporation, McLean VA, USA

Abstract: Problems in multilingual technology and technology transfer relevant to African languages include lack of (i) information about available technologies, (ii) means of easily sharing available technologies, and (iii) input to developers, professional organisations, and standards committees on technology needs. This paper discusses these problems and provides suggestions for circumventing and/or addressing them.

1. Information about available technologies

Many language technologies, particularly for word processing, have been developed by or for particular university or government offices (e.g. Hausa for the United States International Broadcasting Bureau, also known as Voice of America) but have not been extensively marketed. There are also workarounds that are not always known to the representatives of the company offering the product, some of which are discussed in the last section of this paper. Because information about capabilities is scarce and access to the software is limited, research groups and software companies often duplicate one another's efforts, at a time when extensive new development is needed.

There are many services and products that can be of help. One is the Foreign Language Resource Center, a website at http://flrc.mitre.org designed for information exchange about language technology. The site is sponsored by the U.S. government, and extensive global participation is encouraged. The site includes a survey form, which can be filled in and updated by the developer or sales organisation. The survey form includes questions about the functionality, quality, and availability of the software. Input is solicited on current and planned products, prototypes, and research.

The site also includes electronic reports, where the user can request all products in the database for a particular language, all products for a particular technology (e.g. optical character recognition), all products for a particular language and technology (e.g. all Hausa optical character recognition), and/or all products available in certain time frames (e.g. new in the coming year). We are working on a capability for the products to be displayed by functional capabilities (e.g. to provide optical character recognition of handwriting rather than just of print), as indicated by the vendor or developer providing the information. A second stage will be to validate, assess, and rate the information, with the information then marked as having been validated. Feedback will be provided to the vendors and developers.
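The report queries described above (by language, by technology, by both, and by time frame) can be illustrated with a minimal Python sketch. It is not the site's actual implementation: the `ProductEntry` fields, the `select_products` function, and the sample entries (including the product names and the choice of Zulu as a second language) are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ProductEntry:
    """One survey response; the field names are illustrative assumptions."""
    name: str
    language: str
    technology: str
    available_from: date


def select_products(
    entries: Iterable[ProductEntry],
    language: Optional[str] = None,
    technology: Optional[str] = None,
    available_between: Optional[Tuple[date, date]] = None,
) -> List[ProductEntry]:
    """Return the entries that match every criterion that was supplied."""
    selected = []
    for entry in entries:
        if language is not None and entry.language.casefold() != language.casefold():
            continue
        if technology is not None and entry.technology.casefold() != technology.casefold():
            continue
        if available_between is not None:
            start, end = available_between
            if not (start <= entry.available_from <= end):
                continue
        selected.append(entry)
    return selected


# Invented sample data, not real products or real survey responses.
SAMPLE_ENTRIES = [
    ProductEntry("Example OCR A", "Hausa", "optical character recognition", date(2002, 3, 1)),
    ProductEntry("Example Editor B", "Hausa", "word processing", date(2001, 6, 15)),
    ProductEntry("Example OCR C", "Zulu", "optical character recognition", date(2003, 1, 10)),
]

# Language and technology combined, as in "all Hausa optical character recognition".
hausa_ocr = select_products(
    SAMPLE_ENTRIES, language="Hausa", technology="optical character recognition"
)
print([entry.name for entry in hausa_ocr])
```

Leaving a criterion unset means "do not filter on it", so the same function covers each of the report types listed above.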

The survey currently includes a capability for the respondent to provide information on assessments, specifically on the metric, measure, score, date of assessment, and who conducted the assessment. This information is provided in the electronic reports described above, and where possible, the original assessments are provided as links. There is, of course, a cautionary note in the report that information cannot be easily compared across assessments, particularly when assessments are conducted with different methodologies. For selected products that meet a wide range of functional requirements, my company, MITRE, plans to conduct or help structure assessments that can be compared across products, thus enabling better relative ratings.
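The caution about comparing assessments can be made concrete with a small Python sketch that stores the five survey fields named above and only groups scores that share a metric, a measure, and an assessor. Treating those three fields as a stand-in for "same methodology" is an assumption made here for illustration; the `Assessment` class, the `group_comparable` function, and all sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Assessment:
    """The assessment fields named in the survey; the types are assumptions."""
    product: str
    metric: str
    measure: str
    score: float
    assessed_on: date
    assessor: str


def group_comparable(
    assessments: Iterable[Assessment],
) -> Dict[Tuple[str, str, str], List[Assessment]]:
    """Group assessments by (metric, measure, assessor).

    Scores are only placed side by side within a group; scores in
    different groups are assumed not to be directly comparable.
    """
    groups = defaultdict(list)
    for assessment in assessments:
        key = (assessment.metric, assessment.measure, assessment.assessor)
        groups[key].append(assessment)
    return dict(groups)


# Invented sample data, not real assessment results.
SAMPLE_ASSESSMENTS = [
    Assessment("Example OCR A", "character accuracy", "percent correct", 91.0,
               date(2002, 4, 1), "Lab X"),
    Assessment("Example OCR C", "character accuracy", "percent correct", 88.5,
               date(2002, 4, 2), "Lab X"),
    Assessment("Example OCR A", "character accuracy", "percent correct", 95.0,
               date(2001, 11, 20), "Vendor self-report"),
]

for key, group in group_comparable(SAMPLE_ASSESSMENTS).items():
    print(key, [(a.product, a.score) for a in group])
```

In this sketch the vendor's self-reported score ends up in its own group, so it is never ranked against the two scores produced by the same independent assessor.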

We welcome any input on the types of data we collect, the languages for which data is collected, and the products. We also welcome any input of data and any use of this resource.

2. Means of easily sharing available technologies

Lack of information about technologies has, of course, been one of the key barriers to sharing them. However, there is often also no support structure for making computer code more robust, adding documentation, and providing assistance. Thus people may give their code to friends or colleagues but are reluctant to make it more broadly available because of the demands on their time and resources. Conversely, users are often reluctant to try programs available on the Internet, particularly if they are untested and/or undocumented.

There is a further problem that products developed for English and Western European languages do not always work well with, and/or cannot be easily adapted to, substantially different languages. Many companies (e.g. TRADOS, Microsoft) are increasingly adopting standards from the International Organization for Standardization (ISO) and the World Wide Web Consortium (W3C). However, until 2002 there was little African input to such products or standards, either in the design or testing stages.

Substantial time and work are needed to review standards in light of the requirements of the many African languages, to ensure that these standards, particularly those used in software development, are conducive to meeting the needs of applications in these languages.

Where software is available without licensing costs and would be helpful to this global language community, we would like to make it available on the Foreign Language Resource Center site through linking to relevant sites, or through providing the software to be downloaded. Information and/or software can be sent to Jennifer DeCamp at jdecamp@mitre.org. In some cases, with the permission of the developers, we would like to make the freeware more robust by providing documentation and instructions for installation and use.

For the U.S. government, we also provide a help desk, where members of the U.S. government can e-mail or telephone with requests for information or for help with software problems (e.g. incompatibility with other products, inability to perform as advertised, etc.). A similar help desk is provided by the Austrian government to make multilingual products more accessible to businesses. Such a help desk may also be useful for South Africa, particularly in this current stage of rapid emergence of technological capabilities in African languages.

3. Input to developers, professional organisations, and standards committees on technology needs

In order to have commercial products that support word processing, optical character recognition, searches, terminology management, and other functions in African languages, it is often necessary to provide input to developers, professional organisations, and standards committees on technology needs relevant to these languages. One approach is to participate in organisations that develop standards for industry use. The Unicode Consortium and the Localisation Industry Standards Association (LISA) were both formed by industry to develop standards for its own use. Meetings of these organisations include product managers and marketing managers from a wide range of companies. Both organisations welcome new members, although membership fees are steep. There are also open-source software groups that welcome participation of all interested parties, particularly people willing to take on new development projects. One example is the LINUX I18N (Internationalisation) newsgroup, which develops new software available free to all parties. Another is Netscape, which has made the source code for its basic browser, Mozilla, available on the Internet for developers to expand upon. A particularly productive area for expansion has been international extensions for word processing and localisation. These organisations are particularly relevant for word processing, browsing, and localisation applications.

In the terminology area, there are more traditional standards organisations such as ISO Technical Committee 37 and W3C Internationalisation (I18N). Such committees have extensive participation by industry and by government organisations.

Organisations such as InfoTerm and TermNet include terminology publishers and interact with software developers. Professional organisations such as TAMA and Terminology Knowledge Engineering (TKE) provide opportunities for exchange of information between users, publishers, software developers, and standards representatives. LISA is also a good forum.

For machine translation, there are numerous organisations, including the International Association for Machine Translation (IAMT). For many other language technologies, a good forum is ELSNET (the European Network of Excellence in Human Language Technologies).

Direct approaches are also effective. Many software companies welcome input on user needs, although such companies frequently ask how much business could be expected if the requested capability is added. Another approach is to enlist broad support for capabilities in certain languages, thus enlarging the market for industry. Many universities, government diplomatic services, government health systems, and libraries throughout the world need to create, edit, display, print, and search text in at least some African languages. By pooling these needs into a joint business case, it is often possible to show enough demand to interest software developers. If the market for software in a certain language is small, it helps to reduce the cost to industry by prototyping or developing certain capabilities, or by paying the company to develop and integrate them.

Many companies also welcome beta testers. If a requested capability is developed, it is a good idea for the requesting organisation to at least review it, or preferably beta test it, before the software is released in order to ensure that the capability functions as envisioned. Considerable input to future software releases can be provided in the beta testing process.

4. Conclusion

Africa and the global community interested in language technologies face a significant endeavour in identifying language requirements (e.g. character sets, keyboard layouts, linguistic constraints, etc.); identifying existing technology that can meet those requirements or can be easily adapted to meet those requirements; providing information and support to users through websites, help desks, and other sources; ensuring that existing and planned technologies provide maximum functionality and easy use with commercially-available products; ensuring that international standards meet the linguistic and cultural requirements of African languages; and addressing gaps. This is not the scope of work for a single person; it is not the scope for a few busy volunteers with other commitments. This work will require commitments of funding, time, and dedication, but should be highly instrumental in advancing technologies and technology use throughout Africa.

Recommended websites

http://www.dictionary.com & http://www.yourdictionary.com

Sites with useful electronic dictionaries.

http://www.multilingualcomputing.com

A site with extensive information on multilingual technology.

http://www.systransoft.com & http://www.systranet.com

Sites with ability to use free rough machine translation, mainly in European languages and English. The second site includes a capability for the users to add their own terms.

http://www.elsnet.org/list.html

Site with extensive information on resources (predominantly European) for language technology expertise.

http://www.unicode.org

The Unicode site with extensive information on this standard.

http://www.eamt.org/compendium.html

Machine translation resources, available through the website of the European Association for Machine Translation (EAMT).

http://www.lisa.org

Site of the Localisation Industry Standards Association (LISA), with information on localisation and on terminology standards.

http://www.sign-lang.uni-hamburg.de/bibweb/Keywords/African-sl.html

Information on African sign languages.

http://www.sas.upenn.edu/African_Studies/K-12/menu_EduLANG.html

Information on African languages.

http://www.ethnologue.com

Information on languages, searchable by language name and/or country.

http://polyglot.lss.wisc.edu/lss/lang/african.html

Information on African languages.

http://www.balancingact-africa.com/news/back/balancing-act_69.html

Article on technologies for African languages.
