
Computation Services

In the document Digital Technical Journal (pages 60-65)

To build systems that process multilingual data, such as the one shown in Figure 1, a rich variety of text operations is necessary. This section categorizes such operations, but a complete specification of their interfaces would consume too much space in this paper. Text operations require parsing, value mapping, and operational functions, as described earlier.

Text Manipulation Services

Text manipulation services, such as those specified in the C programming language standard ISO/IEC 9899:1990, System V Release 4 Multi-National Language Supplement (MNLS), or XPG4 run-time libraries (including character and text element classification functions, string and substring operations, and compression and encryption services), need to be extended to multilingual strings such as Strings(Unicode) and other DIFs, and to various text object class libraries.13

Vol. 5 No. 3 Summer 1993 Digital Technical Journal

International Distributed Systems-Architectural and Practical Issues

[Figure 3: Layout and Rendering Services. Labels recoverable from the diagram: parameter (twice), document interchange format, layout, page description language, render, font server, font database.]

Data Type Transformations

Data type transformations (e.g., speech to text, image-to-text optical character recognition [OCR], and handwriting to text) are operations where the data is transformed from a representation of one abstract data type to a representation of another abstract data type. The presentation form transformations T<->T_Presentation_Form and the fundamental input and output services are data type transformations. Care needs to be taken when parameterizing these operations with user preferences to keep the transformation thread-safe. Again, this is best accomplished by keeping the presentation form preferences attached to the data.
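As an illustration of keeping presentation form preferences attached to the data, the following Python sketch formats a number according to preferences carried with the value itself. The names PresentationPrefs, TaggedNumber, and to_presentation_form are hypothetical and do not come from the paper; the point is only that the transformation reads no global or per-process state, so concurrent callers cannot interfere with one another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresentationPrefs:
    """Hypothetical user preferences that travel with the data."""
    decimal_separator: str = "."
    group_separator: str = ","


@dataclass(frozen=True)
class TaggedNumber:
    """A value together with the preferences used to present it."""
    value: float
    prefs: PresentationPrefs


def to_presentation_form(item: TaggedNumber) -> str:
    """Pure function: the result depends only on the argument."""
    # Format with fixed ASCII separators, then swap in the preferred ones.
    plain = f"{item.value:,.2f}"
    table = str.maketrans({
        ",": item.prefs.group_separator,
        ".": item.prefs.decimal_separator,
    })
    return plain.translate(table)


german = PresentationPrefs(decimal_separator=",", group_separator=".")
print(to_presentation_form(TaggedNumber(1234567.5, german)))
```

Because both dataclasses are immutable and the function takes everything it needs as an argument, no locking is required when several threads format values for users with different preferences.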

Encoding Conversions

Encoding conversions (between encoded character sets, DIFs, etc.) are operations where only the representation of a single data type changes. For example, to support Unicode, a system must have for each other encoded character set a function to_uni:Strings(E)->Strings(Unicode), which converts the code points in E to code points in Unicode.11 The conversion function to_uni has a partial inverse from_uni:Strings(Unicode)->Strings(E), which is only defined on those encoded text elements in Unicode that can be expressed as encoded text elements in E. If s is in Strings(E), then from_uni(to_uni(s)) is equal to s. Other encoding conversions Strings(E)->Strings(E') can be defined as a to_uni operation followed by a from_uni operation, for E and E' respectively. Another class of encoding conversions arises when the character set encoding remains fixed, but the conversion of a document in one DIF to a document in another DIF is required. A third class originates when Unicode or ISO 10646 strings sent over asynchronous communication channels must be converted to a Universal Transmission Format (UTF), thus requiring Strings(Unicode)<->UTF encoding conversions.
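The to_uni and from_uni functions described above can be sketched in Python using the standard codec machinery. This is only an illustration under stated assumptions: the encoded character set E is identified by a Python codec name, Strings(E) is modeled as bytes, Strings(Unicode) as str, and the helper name convert is hypothetical. The partial inverse is modeled by returning None where from_uni is undefined.

```python
from typing import Optional


def to_uni(data: bytes, encoding: str) -> str:
    """Strings(E) -> Strings(Unicode): total for valid input in E."""
    return data.decode(encoding)


def from_uni(text: str, encoding: str) -> Optional[bytes]:
    """Strings(Unicode) -> Strings(E): partial inverse of to_uni.

    Returns None when some text element has no representation in E.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


def convert(data: bytes, source: str, target: str) -> Optional[bytes]:
    """Strings(E) -> Strings(E'): to_uni for E followed by from_uni for E'."""
    return from_uni(to_uni(data, source), target)


latin1_text = b"caf\xe9"  # "cafe" with an acute accent, in ISO 8859-1

# Round-trip property: from_uni(to_uni(s)) is equal to s.
assert from_uni(to_uni(latin1_text, "latin-1"), "latin-1") == latin1_text

# Greek letters cannot be expressed in ISO 8859-1, so from_uni is undefined.
assert from_uni("\u03b1\u03b2\u03b3", "latin-1") is None

# A transmission format is handled like any other target encoding.
print(convert(latin1_text, "latin-1", "utf-8"))
```

Routing every conversion through Unicode means that n character sets need 2n conversion functions rather than one for each ordered pair.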

Collation or Sorting Services

Another group of computation services, collation or sorting services, sorts lists of strings according to application-specific requirements. These services were discussed earlier in the paper.
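To make "application-specific requirements" concrete, the following Python sketch builds a sort key from an application-supplied alphabet. It is a deliberately simplified, single-level illustration and not the collation method discussed earlier in the paper; the function name make_collation_key and the sample alphabet are assumptions made for this example.

```python
from typing import Callable, Tuple


def make_collation_key(alphabet: str) -> Callable[[str], Tuple[int, ...]]:
    """Build a sort key from an application-supplied character order."""
    weights = {ch: index for index, ch in enumerate(alphabet)}
    base = len(alphabet)

    def key(text: str) -> Tuple[int, ...]:
        # Characters outside the alphabet sort after it, in code point order.
        return tuple(weights.get(ch, base + ord(ch)) for ch in text.lower())

    return key


words = ["zebra", "ärger", "apfel", "bach"]

# Hypothetical dictionary-style order in which each umlaut follows its base letter.
dictionary_key = make_collation_key("aäbcdefghijklmnoöpqrstuüvwxyz")

by_code_point = sorted(words)
by_dictionary = sorted(words, key=dictionary_key)
print(by_code_point)
print(by_dictionary)
```

Sorting by code point places "ärger" last, while the application-supplied order places it directly after "apfel". A production service would need multi-level weights (for case and accents) rather than this single level.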

Linguistic Services

Linguistic services such as spell checking, grammar checking, word and line breaking, content-based retrieval, translation (when existent), and style checking need standard APIs. Although the implementation of these linguistic services is natural language-specific, most can be implemented with the structure shown in Figure 2.

Also, large character sets such as Unicode and other multilingual structures require a uniform exception-handling and fallback mechanism because of the large number of unassigned code points. For example, a system should be able to uniformly handle exceptions such as "glyph not found for text element." Mechanisms such as global variables for error codes inhibit concurrent programming and therefore should be discouraged. Returning an error code as the return value of the procedure call is preferred, and when supported, raising and handling exceptions is even better.

Product Internationalization
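The error-reporting guidance above can be sketched in Python, where exceptions are supported. The font is modeled as a plain dictionary, and the names GlyphNotFound, lookup_glyph, render, and FALLBACK_GLYPH are hypothetical; the sketch shows one uniform exception type plus a fallback, with no global error variable involved.

```python
from typing import Dict, List


class GlyphNotFound(Exception):
    """Uniform exception for 'glyph not found for text element'."""

    def __init__(self, text_element: str) -> None:
        code_points = " ".join(f"U+{ord(ch):04X}" for ch in text_element)
        super().__init__(f"glyph not found for text element {code_points}")
        self.text_element = text_element


FALLBACK_GLYPH = "<missing>"  # hypothetical placeholder glyph


def lookup_glyph(font: Dict[str, str], text_element: str) -> str:
    """Report failure by raising, never through a shared global variable."""
    try:
        return font[text_element]
    except KeyError:
        raise GlyphNotFound(text_element) from None


def render(font: Dict[str, str], text: str) -> List[str]:
    """Apply the same fallback policy to every text element."""
    glyphs = []
    for element in text:
        try:
            glyphs.append(lookup_glyph(font, element))
        except GlyphNotFound:
            glyphs.append(FALLBACK_GLYPH)
    return glyphs


print(render({"a": "glyph-a"}, "ab"))
```

Because the failure travels with the call that caused it, two threads rendering different strings cannot overwrite each other's error state, which is the problem with a global error code.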

System Naming, Synonyms, and Security

The multilingual aspect of Unicode can simplify system naming of objects and their attributes, e.g., in name services and repositories. Using encoded strings tagged with their encoding type for names is too rigid, because of the high degree of overlap in [...] canonical form in Unicode according to the following definitions. [...] characters followed by their assorted marking characters in some prescribed order. The recommended order is the Unicode "priority value."11,21 The canonical form should have the following property: When c(u) is equal to c(u), the plain text rep- [...] strings used for names are desirable, e.g., the absence of special characters and trailing blanks. In a multivendor environment, both the canonical form and the name restrictions should be standardized. The X.500 working groups currently studying this problem plan to achieve comparable standardization.
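The idea of comparing names through a canonical form c can be illustrated with Python's unicodedata module. This is an assumption-laden sketch: the paper's own definition of c is incomplete in this text, so Normalization Form D from later versions of the Unicode Standard is used here as a stand-in, since it also places each base character before its combining marks in a fixed order. The function names are hypothetical.

```python
import unicodedata


def canonical_form(name: str) -> str:
    """Stand-in for c: base characters followed by their marks in a fixed order.

    Assumption: NFD is used in place of the paper's canonical form.
    """
    return unicodedata.normalize("NFD", name)


def same_name(u: str, v: str) -> bool:
    """Two names match when their canonical forms are equal."""
    return canonical_form(u) == canonical_form(v)


# A precomposed letter and its decomposed spelling name the same object.
print(same_name("\u00e9", "e\u0301"))

# Marks typed in a different order are put into one prescribed order.
print(same_name("a\u0301\u0323", "a\u0323\u0301"))
```

A name service would store and index the canonical form, so that lookups succeed regardless of which spelling a client's input method produced.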

Since well-chosen names convey useful information, and since such names are entered and displayed in the end user's writing system of choice, it is often desirable for the system to store various translations or "synonyms" for a name. Synonyms, for whatever purpose, should have attributes such as long_name, short_name, language, etc., so that directory functions can provide easy-to-use interfaces. Access to objects or attribute values through synonyms should be as efficient as access by means of the primary name.
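One way to picture synonyms with the attributes named above is the following Python sketch. Only long_name, short_name, and language come from the text; the NameDirectory class, its methods, and the sample names are hypothetical. A single index maps every name, primary or synonym, to the primary name, so a lookup through a synonym costs the same as a lookup through the primary name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Synonym:
    long_name: str
    short_name: str
    language: str


@dataclass
class NameDirectory:
    # Every known name (primary or synonym) maps to its primary name.
    index: Dict[str, str] = field(default_factory=dict)
    synonyms: Dict[str, List[Synonym]] = field(default_factory=dict)

    def register(self, primary: str) -> None:
        self.index[primary] = primary
        self.synonyms.setdefault(primary, [])

    def add_synonym(self, primary: str, synonym: Synonym) -> None:
        if primary not in self.synonyms:
            raise KeyError(f"unknown primary name: {primary}")
        self.synonyms[primary].append(synonym)
        self.index[synonym.long_name] = primary
        self.index[synonym.short_name] = primary

    def resolve(self, name: str) -> Optional[str]:
        """One hash lookup, whether name is primary or a synonym."""
        return self.index.get(name)

    def display_name(self, primary: str, language: str) -> str:
        """Prefer a synonym in the user's language, else the primary name."""
        for synonym in self.synonyms.get(primary, []):
            if synonym.language == language:
                return synonym.long_name
        return primary


directory = NameDirectory()
directory.register("PRN0042")
directory.add_synonym(
    "PRN0042", Synonym("Drucker im zweiten Stock", "Drucker2", "de")
)
print(directory.resolve("Drucker2"))
print(directory.display_name("PRN0042", "de"))
```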

In a global network, public key authentication using a replicated name service is recommended.22 One principal can look up another in the name service by initially using a (possibly meaningless) name for the object in some common character set, e.g., {A-Z,0-9}. Subsequently, the principals can define their own synonyms in their respective languages. Attributes for the principals, such as net- [...] distributed system is somewhat more complicated than for a monolingual system. The following is a partial list of the services that must be provided:

Services for various monolingual subsystems

Registration services for user preferences, locales, user-defined text elements, formats, etc.

Both multilingual and multiple monolingual run-time libraries, simultaneously (see Figure 2)

Multilingual database servers, font servers, logging and queuing mechanisms, and directory services

Multilingual synonym services

Multilingual diagnostic services

Since a system cannot provide all the services for every possible situation, registering the end users' needs and the system's capabilities in a global name service is essential. The name service must be configured so that a multilingual server can identify the language preferences of the clients that request services. This configuration allows the servers to tag or convert data from the client without the monolingual client's active participation. Therefore, the name service database must be updated with the necessary preference data at client installation time.
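The registration scheme described above might look like the following Python sketch. The NameService class is an in-memory stand-in for a real name service, and every identifier here (ClientPreferences, register_client, receive_from_client, send_to_client, the client id "pc-17") is hypothetical. The monolingual client only sends and receives bytes in its own encoding; the server does the tagging and conversion.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ClientPreferences:
    encoding: str
    language: str


class NameService:
    """In-memory stand-in for the global name service."""

    def __init__(self) -> None:
        self._preferences: Dict[str, ClientPreferences] = {}

    def register_client(self, client_id: str, encoding: str, language: str) -> None:
        # Performed once, at client installation time.
        self._preferences[client_id] = ClientPreferences(encoding, language)

    def preferences_for(self, client_id: str) -> ClientPreferences:
        return self._preferences[client_id]


def receive_from_client(names: NameService, client_id: str, raw: bytes) -> dict:
    """Server side: convert to Unicode and tag with the client's language."""
    prefs = names.preferences_for(client_id)
    return {
        "text": raw.decode(prefs.encoding),
        "language": prefs.language,
        "source_encoding": prefs.encoding,
    }


def send_to_client(names: NameService, client_id: str, text: str) -> bytes:
    """Server side: convert back, substituting what the client cannot show."""
    prefs = names.preferences_for(client_id)
    return text.encode(prefs.encoding, errors="replace")


names = NameService()
names.register_client("pc-17", "iso8859_7", "el")
message = receive_from_client(names, "pc-17", "\u03b1\u03b2\u03b3".encode("iso8859_7"))
print(message)
```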

Typically, system managers for different parts of the system are monolingual end users (see Figure 1) who need to do their job from a standard PC. Thus, both the normal and the diagnostic management interfaces to the system must behave as multilingual servers, sending error codes back to the PC to be interpreted in the local language. Although the quality of the translation of an error message is not an architectural issue, translations at the system management level are generally poor, and the system design should account for this. Systems developers should consider giving both an English and a local-language error message as well as giving easy-to-use pointers into local-language reference manuals.
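A minimal Python sketch of this recommendation follows: the server sends only an error code, and the PC looks up an English message, a local-language message, and a manual pointer. The error code 1001, the catalog contents, the German wording, and the manual reference are all invented for illustration.

```python
from typing import Dict, List

# Hypothetical catalogs installed on the system manager's PC.
ENGLISH_MESSAGES: Dict[int, str] = {
    1001: "Glyph not found for text element.",
}
LOCAL_MESSAGES: Dict[str, Dict[int, str]] = {
    "de": {1001: "Glyphe für Textelement nicht gefunden."},
}
MANUAL_POINTERS: Dict[str, Dict[int, str]] = {
    "de": {1001: "Referenzhandbuch, Kapitel 4"},
}


def describe_error(code: int, language: str) -> List[str]:
    """Interpret a server error code on the client, in two languages."""
    lines = [f"[{code}] {ENGLISH_MESSAGES.get(code, 'Unknown error.')}"]
    local = LOCAL_MESSAGES.get(language, {}).get(code)
    if local is not None:
        lines.append(f"[{code}] {local}")
    pointer = MANUAL_POINTERS.get(language, {}).get(code)
    if pointer is not None:
        lines.append(f"See: {pointer}")
    return lines


for line in describe_error(1001, "de"):
    print(line)
```

Showing the English text alongside the local one gives the system manager something to search for or quote to support staff when the translation is poor or missing.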

Data errors will occur more frequently because of the mixtures of character sets in the system, and attention to the identification of the location and error type is important. Logging to capture offending text and the operations that generated it is desirable.
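As a sketch of such logging, the Python function below records the operation, the location, the error type, and the offending bytes when a decoding step fails, and then continues with a replacement character. The function name decode_logged, the logger name, and the operation label are assumptions; the attributes start, end, and reason are those of Python's standard UnicodeDecodeError.

```python
import logging

logger = logging.getLogger("text_errors")


def decode_logged(raw: bytes, encoding: str, operation: str) -> str:
    """Decode raw bytes, logging where and why a data error occurred."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as err:
        logger.error(
            "operation=%s encoding=%s offset=%d reason=%s offending=%r",
            operation,
            encoding,
            err.start,                 # location of the first bad byte
            err.reason,                # error type reported by the codec
            raw[err.start:err.end],    # the offending text itself
        )
        # Fall back to a lossy decode so that processing can continue.
        return raw.decode(encoding, errors="replace")


logging.basicConfig(level=logging.ERROR)
print(decode_logged(b"abc\xff", "utf-8", "import-customer-file"))
```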

Incremental Internationalization

Multilingual systems and international components can be built incrementally. Probably the most powerful approach is to provide the services to support multiple monolingual subsystems. Even new operating systems, such as the Windows NT system, that use Unicode internally need mechanisms for such support.23 Multidimensional improvements in a system's ability to support an increasing number of variations are possible. Some such improvements are making more servers multilingual, supporting more multilingual data and end-user preferences, supporting more sophisticated text elements (the first release of the Windows NT operating system will not support Unicode's joiners), as well as adding more character set support, locales, and user-defined text elements. The key point is that, like safe programming practices, multilingual support in a distributed system is not an "all-or-nothing" endeavor.

Summary

Customer demand for multilingual distributed systems is increasing. Suppliers must provide systems without incurring the costs of expensive reengineering. This paper gives an overview of the architectural issues and programming practices associated with implementing these systems.

Modularity both in systems and in run-time libraries allows greater reuse of components and incremental improvements with regard to internationalization. Using the suggested safe software practices can lower reengineering and maintenance costs and help avoid costly redesign problems. Providing multilingual services to monolingual subsystems permits incremental improvements while at the same time lowering costs through increased reuse. Finally, the registration of synonyms, user preferences, locales, and services in a global name service makes the system cohesive.

Acknowledgments

I wish to thank Bob Ayers (Adobe), Joseph Bosurgi (Univel), Asmus Freytag (Microsoft), Jim Gray (Digital), and Jan te Kiefte (Digital) for their helpful comments on earlier drafts. A special thanks to Digital's internationalization team, whose contributions are always understated. In addition, I would like to acknowledge the Unicode Technical Committee, whose impact on the industry is profound and growing; I have learned a great deal from following the work of this committee.

References

1. D. Carter, Writing Localizable Software for the Macintosh (Reading, MA: Addison-Wesley, 1991).

2. Producing International Products (Maynard, MA: Digital Equipment Corporation, 1989). This internal document is unavailable to external readers.

3. Digital Guide to Developing International Software (Burlington, MA: Digital Press, 1991).

4. S. Martin, "Internationalization Made Easy," OSF White Paper (Cambridge, MA: Open Software Foundation, Inc., 1991).

5. S. Snyder et al., "Internationalization in the OSF DCE-A Framework," May 1991. This document was an electronic mail message transmitted on the Internet.

6. X/Open Portability Guide, Issue 3 (Reading, U.K.: X/Open Company Ltd., 1989).

7. X/Open Internationalization Guide, Draft 4.3 (Reading, U.K.: X/Open Company Ltd., October 1990).

8. UNIX System V Release 4, Multi-National Language Supplement (MNLS) Product Overview (Japan: American Telephone and Telegraph, 1990).


9. Information Technology-Universal Coded Character Set (UCS), Draft International Standard, ISO/IEC 10646 (Geneva: International Organization for Standardization/International Electrotechnical Commission, 1990).

10. A. Nakanishi, Writing Systems of the World, third printing (Rutland, Vermont, and Tokyo, Japan: Charles E. Tuttle Company, 1988).

11. The Unicode Consortium, The Unicode Standard-Worldwide Character Encoding, Version 1.0, Volume 1 (Reading, MA: Addison-Wesley, 1991).

12. R. Haentjens, "The Ordering of Universal Character Strings," Digital Technical Journal, vol. 5, no. 3 (Summer 1993, this issue): 43-52.

13. Programming Languages-C, ISO/IEC 9899:1990(E) (Geneva: International Organization for Standardization/International Electrotechnical Commission, 1990).

14. S. Martin and M. Mori, Internationalization in OSF/1 Release 1.1 (Cambridge, MA: Open Software Foundation, Inc., 1992).

15. J. Becker, "Multilingual Word Processing," Scientific American, vol. 251, no. 1 (July 1984): 96-107.

16. Coded Character Sets for Text Communication, Parts 1 and 2, ISO/IEC 6937 (Geneva: International Organization for Standardization/International Electrotechnical Commission, 1983).

17. J. Bettels and F. Bishop, "Unicode: A Universal Character Code," Digital Technical Journal, vol. 5, no. 3 (Summer 1993, this issue): 21-31.

18. Go Computer Corporation, "Compaction Techniques," Second Unicode Implementors' Conference (1992).

19. J. Becker, "Re: Updated [Problems with] Unbound (Open) Repertoire Paper" (January 18, 1991). This electronic mail message was sent to the Unicode mailing list.

20. V. Joloboff and W. McMahon, X Window System, Version 11, Input Method Specification, Public Review Draft (Cambridge, MA: Massachusetts Institute of Technology, 1990).

21. M. Davis (Taligent), correspondence to the Unicode Technical Committee, 1992.

22. M. Gasser et al., "Digital Distributed Security Architecture" (Maynard, MA: Digital Equipment Corporation, 1988). This internal document is unavailable to external readers.

23. H. Custer, Inside Windows NT (Redmond, WA: Microsoft Press, 1992).
