
Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).

(19) European Patent Office / Office européen des brevets

(11) EP 1 364 343 B1

(12) EUROPEAN PATENT SPECIFICATION

(45) Date of publication and mention of the grant of the patent: 19.07.2006 Bulletin 2006/29

(21) Application number: 02744894.3

(22) Date of filing: 15.02.2002

(51) Int Cl.: G06T 9/00 (2006.01)

(86) International application number: PCT/EP2002/001601

(87) International publication number: WO 2002/069269 (06.09.2002 Gazette 2002/36)

(54) FONT COMPRESSION AND RETRIEVAL
     KOMPRIMIERUNG UND EXTRAKTION VON SCHRIFTTYPEN
     COMPRESSION ET LOCALISATION DE POLICE DE CARACTERE

(84) Designated Contracting States: AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

(30) Priority: 27.02.2001 US 794706

(43) Date of publication of application: 26.11.2003 Bulletin 2003/48

(73) Proprietor: TELEFONAKTIEBOLAGET LM ERICSSON (publ), 164 83 Stockholm (SE)

(72) Inventors:
• SMEETS, Bernard, S-240 10 Dalby (SE)
• ÄBERG, Jan, S-222 20 Lund (SE)

(74) Representative: Petri, Stellan, Ström & Gulliksson AB, P O Box 793, 220 07 Lund (SE)

(56) References cited:

EP-A-0 687 995    WO-A-99/21075    US-A-5 058 187

• VINES P ET AL: "COMPRESSION TECHNIQUES FOR CHINESE TEXT", SOFTWARE PRACTICE & EXPERIENCE, GB, JOHN WILEY & SONS LTD., CHICHESTER, vol. 28, no. 12, 1 October 1998 (1998-10-01), pages 1299-1314, XP000778145, ISSN: 0038-0644

• CHANG K-Y ET AL: "CCIGS: A DATA COMPRESSION SYSTEM FOR CHINESE FONTS AND BINARY IMAGES USING CLASSIFICATION TECHNIQUES", SOFTWARE PRACTICE & EXPERIENCE, GB, JOHN WILEY & SONS LTD., CHICHESTER, vol. 22, no. 12, 1 December 1992 (1992-12-01), pages 1027-1047, XP000655815, ISSN: 0038-0644

• MAKOTO NAGAO: "DATA COMPRESSION OF CHINESE CHARACTER PATTERNS", PROCEEDINGS OF THE IEEE, US, IEEE, NEW YORK, vol. 68, no. 7, 1 July 1980 (1980-07-01), pages 818-829, XP000601742, ISSN: 0018-9219

• TSAY M-K ET AL: "DATA COMPRESSION ON MULTIFONT CHINESE CHARACTER PATTERNS", IEEE TRANSACTIONS ON IMAGE PROCESSING, US, IEEE INC., NEW YORK, vol. 3, no. 2, 1 March 1994 (1994-03-01), pages 139-146, XP000440622, ISSN: 1057-7149

• JU R H ET AL: "GLOBAL STUDY ON DATA COMPRESSION TECHNIQUES FOR DIGITAL CHINESE CHARACTER PATTERNS", IEE PROCEEDINGS E. COMPUTERS & DIGITAL TECHNIQUES, GB, INSTITUTION OF ELECTRICAL ENGINEERS, STEVENAGE, vol. 139, no. 1, Part E, 1992, pages 1-8, XP000288867, ISSN: 1350-2387


Description

BACKGROUND OF THE INVENTION

Field of the Invention

[0001] The present invention relates generally to the compression and retrieval of data representing a font or other set of symbols; and, more particularly, to a method and apparatus for storing a large font, such as a Chinese or Japanese character set, in a compressed form while retaining access to individual symbols of the font.

Description of the Prior Art

[0002] In order to display messages in languages such as Chinese or Japanese on a CRT or an LCD display, a large set of symbols, or glyphs, is required. For example, the Chinese Unicode standard character set contains about 21,000 different Chinese symbols. Furthermore, each symbol is the size of at least some hundreds of pixels; and, as a result, to store a complete Chinese font requires a large amount of memory. Being able to store the glyphs in a more compact format than pure bitmaps will substantially reduce memory requirements.
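
The scale of the problem can be made concrete with a short back-of-the-envelope computation; the glyph dimensions below are illustrative assumptions (the text only states that each symbol occupies at least some hundreds of pixels):

```python
# Rough storage estimate for an uncompressed 1-bit-per-pixel bitmap font.
# The bitmap sizes are assumed for illustration, not taken from the patent.
num_glyphs = 21000                      # approximate size of the Chinese Unicode set
for width, height in [(16, 16), (24, 24), (32, 32)]:
    bytes_per_glyph = width * height // 8
    total_kib = num_glyphs * bytes_per_glyph / 1024
    print(f"{width}x{height}: {bytes_per_glyph} B/glyph, ~{total_kib:.0f} KiB in total")
```

At 24 x 24 pixels, for example, this already amounts to roughly 1.5 MiB of raw bitmap data, which motivates the compressed representation described below.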

[0003] For laser printers or high resolution displays, a font is usually stored as a series of points that are joined by curves. This brings the additional advantage of making font scaling possible, although for fonts stored in this way, some processing is needed to render the image itself. For lower resolution displays, font scaling is not of interest, and it would be more efficient to store the font as a bitmap.

[0004] The majority of lossless data compression methods known in the art work in a sequential manner by referring back to data that has already been encoded. Such methods are inappropriate for font compression, however, where, ideally, only a single glyph should be decompressed at a time. If sequential methods of this type are employed, some blocking of the glyphs is required, and a trade-off must be made between the two extremes of compressing the entire font as one block, thus losing random access capability, and compressing each symbol separately, in which case overall performance becomes quite poor.

[0005] Instead of the above-mentioned sequential codes, a two-part code can also be used to compress and retrieve a font. Typically, the first part of such a code describes the statistical properties, or a model, of the data; and the second part encodes the data by a code derived from the model.

[0006] Exemplary of font compression and retrieval methods known in the prior art are those described in U.S. Patent Nos. 5,488,365; 5,058,187; 5,587,725; 5,473,704; and 5,020,121; and PCT Publication No. WO 98/16902. In general, however, none of these prior art methods describes a font compression and retrieval technique that provides complete random access of individual symbols of the font, which is important to permit high-speed access of the symbols by modern high-speed equipment.

[0007] EP-A-0 687 995 discloses a system which operates with reference to a dictionary for registering a registered data string as correlated to a registration number and for performing a data compressing process by replacing a combination consisting of two or more character strings with a registration number.

[0008] WO 99/21075 discloses a system and method for implementing a user interface for use with Japanese characters, which includes an electronic device that contains a standard font and a font manager.

SUMMARY OF THE INVENTION

[0009] The present invention is directed to methods according to claims 1 and 14 and apparatuses according to claims 19 and 24 for the compression and retrieval of data representing a set of symbols; and, particularly, to a method and apparatus for compressing a font which comprises a large number of glyphs, such as a Chinese or a Japanese character set, in such a manner that each individual glyph of the character set can be separately accessed and decompressed.

[0010] A method for compressing data representing a set of symbols according to the present invention includes encoding each symbol of the set of symbols in the form of a two-part code wherein a first part of the two-part code is common for all encoded symbols of the set and a second part of the two-part code comprises encoded data representing a symbol of the set, wherein each encoded symbol of the set can be separately accessed and decompressed.

[0011] The present invention provides a data compression method that is based on the use of a two-part code; however, the first part of the code is common for all symbols of the set of symbols, and this allows each encoded symbol to be separately accessed and decompressed. The present invention, accordingly, provides the property of random access to the individual symbols of the set which, as indicated above, is a particularly important capability in modern high-speed equipment.

[0012] According to a presently preferred embodiment of the invention, the set of symbols comprises a font of individual symbols or glyphs, and the data representing the glyphs includes a bitmap for each glyph. To encode a font, a statistical model of the set of glyphs is created (the first part of the two-part code or the "model"), and each glyph is then separately compressed with a code derived from the model (the second part of the two-part code or the "codeword"). The compressed glyphs are partitioned by codeword length, and one indexing table, sorted by an identifier for each glyph, is created for each partition. An encoded font will thus comprise a statistical model, a set of codewords, a set of indexing tables, a table of lengths for the indexing tables and, perhaps, auxiliary tables used for decoding.
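
A minimal sketch of how such an encoded font could be held together as one data structure is given below; the field names and types are illustrative assumptions rather than terminology from the patent:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class EncodedFont:
    """Container for the components listed above (names are illustrative)."""
    model: bytes                                   # first code part, shared by all glyphs
    codewords: bytes                               # second code part: all codewords, concatenated
    # One indexing table per codeword length: glyph identifiers in ascending order.
    index_tables: Dict[int, List[int]] = field(default_factory=dict)
    # Length table: codeword length -> number of codewords of that length.
    length_table: Dict[int, int] = field(default_factory=dict)
    aux_tables: bytes = b""                        # optional decoding aids, e.g. a Huffman code description
```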


[0013] To decode a particular glyph, given the identifier for the glyph, the indexing tables are first searched for a matching entry. From the table lengths and the position in the table, the position or location of the particular glyph in the code set can be computed, and this permits the desired codeword for that glyph to then be extracted and decoded. Because, in the present invention, a two-part code is used where the first part of the code is common for all the encoded glyphs, indexing is greatly simplified inasmuch as for each glyph it is only necessary to locate the codeword for that particular glyph.

[0014] In accordance with one presently preferred embodiment of the invention, font compression is achieved utilizing an arithmetic encoder with a fixed probability table. This procedure provides near optimal compression of the glyphs, given the probability table, without the need of additional tables. According to an alternative embodiment of the invention, font compression is by means of a predictive encoding scheme with a fixed prediction table followed by a fixed Huffman coding. This procedure makes it possible to have a very fast decompression while retaining reasonable compression speeds. This embodiment is also particularly suitable for hardware implementation.

[0015] In general, with the present invention, complete random access to individual symbols of a set of symbols is gained at the cost only of using a two-part code with separate codewords instead of an adaptive one-part code as is known in the prior art. The additional cost in memory requirements is estimated to be no more than about 10 to 15 percent for a font of 10,000 symbols.

[0016] Yet further advantages and specific features of the invention will become apparent hereinafter in conjunction with the following detailed description of presently preferred embodiments thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

[0017]

FIGURE 1 schematically illustrates an encoder for a font compression scheme according to a first presently preferred embodiment of the invention;

FIGURE 2 schematically illustrates one example of a conditioning context of a pixel to assist in explaining the operation of the encoder of FIGURE 1;

FIGURE 3 schematically illustrates a decoder for retrieving data encoded by the encoder of FIGURE 1;

FIGURE 4 schematically illustrates an encoder for a font compression scheme according to a second presently preferred embodiment of the invention; and

FIGURES 5 and 6 are flow charts illustrating the basic steps of the encoding and decoding procedures, respectively, for a font compression and retrieval method according to the present invention.

DETAILED DESCRIPTION OF PRESENTLY PREFERRED EMBODIMENTS

[0018] FIGURE 1 is a block diagram schematically illustrating the encoder structure of a compression scheme according to a first embodiment of the present invention for compressing data representing a set of symbols such as a font of Chinese or Japanese symbols or glyphs. Initially, it should be appreciated that the encoding procedures described can be carried out without time limitations so as to permit optimization of the size of the compressed data.

[0019] The encoder of FIGURE 1 is generally designated by reference number 10 and is composed of a plurality of modules. Initially, a two-dimensional bitmap 12 representing a symbol or glyph of a font is converted to a sequence x^n = x_1, x_2, ..., x_n of bits by a serialize module 14 by scanning the bitmap according to some specified rule. Possible scan orders, for example, include row-wise, column-wise, diagonal or a more involved scan order, as are well-known to those skilled in the art.
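
A row-wise serializer along these lines might look like the following sketch (row-wise being just one of the scan orders mentioned):

```python
from typing import List

def serialize_row_wise(bitmap: List[List[int]]) -> List[int]:
    """Flatten a two-dimensional 0/1 glyph bitmap into a bit sequence, row by row."""
    return [pixel for row in bitmap for pixel in row]

# Example on a 3x3 fragment.
bits = serialize_row_wise([[0, 1, 0],
                           [1, 1, 1],
                           [0, 1, 0]])
assert bits == [0, 1, 0, 1, 1, 1, 0, 1, 0]
```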

[0020] The sequence of bits output from the serialize module 14 is directed to an arithmetic encoder module 16 which may be a standard binary arithmetic encoding unit (see, for example, C.B. Jones, "An Efficient Coding System for Long Source Sequences", IEEE Transactions on Information Theory, vol. 27, no. 3, pp. 280-291, May 1981). For efficiency, the arithmetic precision of the encoder should be matched with the precision of the probability table that will be discussed hereinafter. The bits are encoded sequentially as the bitmap is scanned.

[0021] The model that provides the coding probabilities for the arithmetic coding is illustrated by dashed block 18 and is designated in FIGURE 1 as a source model. The source model is context based, i.e., the probability distribution of each pixel of the bitmap is determined by a conditioning context of surrounding pixel values. Thus, the model includes a context forming unit or module 22 which selects bits from previously encoded ones in the same bitmap to determine the context, which is represented as an integer. FIGURE 2 schematically illustrates one example of the correspondence between context pixels and bit positions. The conditioning context of any pixel must contain only pixels that appear earlier in the scan order. Its shape may vary depending on the pixel. Any context pixel outside the bitmap is set to zero.
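
One way such a context former could be implemented is sketched below; the four-pixel causal template is an assumption made for illustration, since the actual template of FIGURE 2 is not reproduced here:

```python
from typing import List

# Assumed causal template: left, upper-left, upper and upper-right neighbours,
# given as (row offset, column offset) relative to the current pixel.
TEMPLATE = [(0, -1), (-1, -1), (-1, 0), (-1, 1)]

def form_context(bitmap: List[List[int]], r: int, c: int) -> int:
    """Pack the template pixels into an integer; pixels outside the bitmap count as 0."""
    ctx = 0
    for dr, dc in TEMPLATE:
        rr, cc = r + dr, c + dc
        bit = 0
        if 0 <= rr < len(bitmap) and 0 <= cc < len(bitmap[0]):
            bit = bitmap[rr][cc]
        ctx = (ctx << 1) | bit
    return ctx
```

Every template position lies above or to the left of the current pixel, so in a row-wise scan the context depends only on previously encoded pixels, and positions outside the bitmap contribute zeros, as required above.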

[0022] The source model 18 also includes a probability table 24 which has one entry per possible context, containing the pixel probability conditioned on that context, stored with fixed precision.

[0023] Given the size and shape of the context, the probability table 24 is constructed by counting the occurrences of ones in each context, and normalizing by the number of occurrences of the context.
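
A sketch of this counting step follows; the fixed-point scaling and the clamping away from zero probability are assumptions added for illustration, since the paragraph above only specifies counting and normalizing:

```python
from typing import Callable, Dict, List

def build_probability_table(glyphs: List[List[List[int]]],
                            context_of: Callable[[List[List[int]], int, int], int],
                            precision_bits: int = 8) -> Dict[int, int]:
    """For every observed context, estimate P(pixel = 1 | context) and store it
    as a fixed-precision integer in the range [1, 2**precision_bits - 2]."""
    ones: Dict[int, int] = {}
    totals: Dict[int, int] = {}
    for bitmap in glyphs:
        for r, row in enumerate(bitmap):
            for c, pixel in enumerate(row):
                ctx = context_of(bitmap, r, c)
                totals[ctx] = totals.get(ctx, 0) + 1
                ones[ctx] = ones.get(ctx, 0) + pixel
    scale = (1 << precision_bits) - 1
    # Clamp so that neither bit value is ever assigned probability zero.
    return {ctx: min(max(ones[ctx] * scale // totals[ctx], 1), scale - 1)
            for ctx in totals}
```

Such a table could be built, for instance, as build_probability_table(glyphs, form_context) with the context function from the earlier sketch.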

[0024] If only certain codeword lengths are allowed, for instance integer bytes, zeros are appended to the end of the output of the arithmetic encoder by byte alignment module 26. The output of the byte alignment module is the codeword representing a symbol or glyph of the font.

[0025] Each glyph of the font is encoded utilizing the encoder of FIGURE 1 until the entire font has been encoded and stored in memory. Before a font is encoded, the scan order and the context forming function are chosen. Different sizes of contexts, scan orders, context forming functions, and precisions of the probability table can be tried in order to find the one yielding the best compression. The quantity that is minimized here to yield the best compression is the size of the codeword set plus the size of the probability table.

[0026] The codewords produced by the above procedure are sorted first by length and then by identifier, which is given for each glyph of the font, and an index table for each length is constructed as a list of identifiers sorted in ascending order. For each index table, the codeword length it corresponds to and the table length are stored in a length table.
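
A sketch of this table construction, assuming the codewords are available as byte strings keyed by glyph identifier (the container layout is illustrative, not prescribed by the text):

```python
from typing import Dict, List, Tuple

def build_tables(codewords: Dict[int, bytes]) -> Tuple[Dict[int, List[int]], Dict[int, int], bytes]:
    """Group codewords by length; per length, keep identifiers sorted ascending.
    Returns (index tables, length table, packed codeword area)."""
    by_length: Dict[int, List[int]] = {}
    for ident, cw in codewords.items():
        by_length.setdefault(len(cw), []).append(ident)

    index_tables = {length: sorted(idents) for length, idents in by_length.items()}
    length_table = {length: len(idents) for length, idents in index_tables.items()}

    # Codewords stored back to back, ordered by length and then identifier,
    # without any separators (their location is implied by the tables alone).
    packed = b"".join(codewords[ident]
                      for length in sorted(index_tables)
                      for ident in index_tables[length])
    return index_tables, length_table, packed
```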

[0027] The codewords are stored together with the index table and the length table. It should be noted that the information about the location and length of each codeword in memory is present only in the index and length tables, i.e., the codewords are stored sorted by length and identifier but without any separators.

[0028] When a glyph with given identifier I is to be decoded, the index tables are first searched one by one, using a binary search. If the identifier is found, the address of the corresponding codeword is found by summing the product of the codeword length and the number of codewords of that length over all codeword lengths smaller than the one of the searched table (counting of the codewords should begin at zero), and adding the codeword length of the searched table times the position of the identifier in the table. Other search orders of the tables are also possible. For instance, one could search the tables in order of descending size, if desired; and it is not intended to limit the invention to any particular search order. It should also be understood that other searching methods can be used as well without departing from the scope of the present invention, and it is also not intended to limit the invention to any particular searching method.
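
The address computation just described might be sketched as follows; byte-aligned codewords are assumed (see paragraph [0024]) so that the returned offset is in bytes, and searching the tables in order of ascending codeword length is only one of the admissible search orders:

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

def locate_codeword(ident: int,
                    index_tables: Dict[int, List[int]],
                    length_table: Dict[int, int]) -> Optional[Tuple[int, int]]:
    """Return (byte offset, codeword length) of the codeword for `ident` within
    the packed codeword area, or None if the identifier is not found."""
    for length in sorted(index_tables):          # one possible search order
        table = index_tables[length]
        pos = bisect_left(table, ident)          # binary search within this table
        if pos < len(table) and table[pos] == ident:
            # Skip all codewords of strictly smaller length ...
            offset = sum(l * n for l, n in length_table.items() if l < length)
            # ... plus `pos` codewords of the searched length (counting from zero).
            return offset + length * pos, length
    return None
```

Together with the tables from the previous sketch, the codeword bytes would then be packed[offset:offset + length].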

[0029] Once the codeword has been located in memory, it is decoded by the decoding structure 30 illustrated in FIGURE 3. In FIGURE 3, the source model 18 is identical to the source model 18 in the encoder 10 of FIGURE 1 (and thus has been given the same reference number), the arithmetic decoder 36 parallels the arithmetic encoder 16 in the usual way, and the image former 34 is simply the inverse of the serializer 14 in the encoder of FIGURE 1. The decoder 30 in FIGURE 3 provides as the output thereof the bitmap 32 of the desired glyph from the compressed font.

[0030] FIGURE 4 is a block diagram schematically illustrating the structure of an encoder 40 according to a second embodiment of the present invention. In FIGURE 4, the probability table of the source model 18 of the encoder 10 of FIGURE 1 is replaced by a prediction table 42 in a source model 48 in which each entry is one bit, indicating the most probable bit value in each context. The predicted value for each bit is exclusive-ORed with the actual bit by unit 44, producing a bit stream that is encoded by a Huffman code in Huffman encoder module 46 (see D.A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes", Proc. IRE, vol. 40, pp. 1098-1101, 1952).
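
A sketch of the prediction and exclusive-OR stage is given below; deriving the prediction table from a probability table and defaulting unseen contexts to zero are assumptions made for illustration, and the fixed Huffman coding stage itself is not shown:

```python
from typing import Callable, Dict, List

def build_prediction_table(probability_table: Dict[int, int],
                           precision_bits: int = 8) -> Dict[int, int]:
    """One bit per context: 1 if a set pixel is the more probable value in that context."""
    half = 1 << (precision_bits - 1)
    return {ctx: 1 if p >= half else 0 for ctx, p in probability_table.items()}

def prediction_residual(bitmap: List[List[int]],
                        prediction_table: Dict[int, int],
                        context_of: Callable[[List[List[int]], int, int], int]) -> List[int]:
    """XOR each pixel with its predicted value (unit 44 in FIGURE 4); the mostly-zero
    result is the bit stream handed to the fixed Huffman encoder (module 46)."""
    residual = []
    for r, row in enumerate(bitmap):
        for c, pixel in enumerate(row):
            predicted = prediction_table.get(context_of(bitmap, r, c), 0)
            residual.append(pixel ^ predicted)
    return residual
```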

[0031] In this embodiment, in addition to the codewords, the index table and the length table, a description of the Huffman code must also be made available to the decoder. The optimization with respect to context size, etc. as described above with respect to the first embodiment, can be applied to this embodiment as well. The decoder structure for use with the encoder of this embodiment is also analogous to the decoder used with the encoder of the first embodiment (illustrated in FIGURE 3), and need not be further described herein.

[0032] FIGURES 5 and 6 are flowcharts which generally illustrate the steps of the encoding and decoding procedures, respectively, of the compression and retrieval methods of the present invention. Initially, with respect to FIGURE 5, to encode a font, two-dimensional bitmaps of the individual symbols or glyphs of the font are serialized at step 61 using the serializer 14 shown in FIGURES 1 or 4. A source or statistical model of the serialized data is created at step 62 using the context forming module 22 and either the probability table 24 of FIGURE 1 or the prediction table of FIGURE 4. The sequence of bits output from the serializer is then encoded in step 63 where each symbol or glyph of the font is independently encoded with a code derived from the statistical model by either the arithmetic encoder 16 of FIGURE 1 or the Huffman encoder 46 of FIGURE 4 to provide the encoded codeword set representing the font. The encoded font is then stored in memory at step 64 for later retrieval, for example. As indicated above, the codewords are stored together with the index table and the length table.

[0033] To decode the encoded symbols stored in memory, as illustrated in FIGURE 6, the index tables are first searched in step 71 until the identifier for the encoded symbol is found. The address of the stored encoded symbol is then found using the identifier, step 72; and, finally, the codeword is retrieved and decoded, step 73, using the decoder of, for example, FIGURE 3, to provide the decompressed bitmap of the selected symbol or glyph.

[0034] An important aspect of the present invention is that a two-part code is used wherein the first part of the code, i.e., the model, is common for all the encoded glyphs; and the second part of the code, i.e., the codeword, comprises the encoded data representing a glyph. This simplifies indexing as for each glyph it is only necessary to locate the codeword. In the first embodiment described above, an arithmetic coder with a fixed probability table is used, which ensures near optimal compression of the glyphs, given the probability table, without the need for additional tables, as distinguished from Lempel-Ziv and Huffman coding schemes which perform poorly on short data blocks and require extensive code tables, respectively.

[0035] By the use of predictive coding, as provided in the second embodiment described above, with a fixed prediction table followed by a fixed Huffman code, it becomes possible to have a very fast decompression while retaining reasonable compression. This method is particularly suitable for hardware implementation.

[0036] In general, in the present invention, by using an indexing method with a length table and indexing tables in which at least one search is performed, each table is reduced to a list of identifiers. With the present invention, the total size of the addressing tables is only marginally larger than the space originally occupied by the identifiers; and, thus, we have gained the property of random access to the glyphs with only a slight increase in index table size.

[0037] It should be emphasized that the term "comprises/comprising" when used in this specification is taken to specify the presence of stated features, integers, steps or components, but does not preclude the presence or addition of one or more other features, integers, steps, components or groups thereof.

[0038] It should also be emphasized that while what has been described herein constitutes presently preferred embodiments of the invention, it should be recognized that the invention could take numerous other forms.

Accordingly, it should be understood that the invention should be limited only insofar as is required by the scope of the following claims.

Claims

1. A method for compressing data representing a set of symbols, comprising:

encoding each symbol of said set of symbols in the form of a two-part code by creating a first part of said two-part code, which includes a statistical model of said set of symbols, deriving a code from said statistical model, and encoding data representing each symbol of said set of symbols with the derived code to provide a second part of said two-part code for each symbol of said set,

wherein the first part of said two-part code is common for all encoded symbols of said set and the second part of said two-part code comprises encoded data representing a symbol of said set, and

wherein each encoded symbol of said set can be separately accessed and decompressed.

2. The method according to claim 1, wherein said data representing each symbol of said set comprises a two-dimensional bitmap of each said symbol.

3. The method according to claim 1 or 2, wherein said encoding step includes the step of determining the context of each pixel of a bitmap representing a symbol, and constructing a probability table of pixel values having one entry per possible context.

4. The method according to claim 3, wherein said en- coding step comprises encoding each symbol using an arithmetic encoder with said probability table.

5. The method according to claim 4, wherein said encoding step includes the steps of determining the context of each pixel of a bitmap representing a symbol, and constructing a prediction table indicating the most probable bit value in each context.

6. The method according to claim 5, wherein the most probable bit value for each bit is exclusive-ORed with the actual bit producing a bit stream that is encoded by a Huffman code.

7. The method according to any one of the previous claims, further including the step of storing said encoded data representing each symbol of said set in a memory.

8. The method according to claim 7, further including the step of including an identifier for each symbol of said set of symbols for identifying a location at which the encoded data representing each symbol is stored in the memory.

9. The method according to claim 8, further including the steps of sorting the encoded data representing each symbol of said set by length, and creating a set of indexing tables for each length, each indexing table including identifiers for encoded data of its respective length.

10. The method according to claim 9, further including the step of arranging the identifiers included in each indexing table in an ascending order.

11. The method according to claim 10, further including the step of creating a length table, said length table including information about each index table, including the length it corresponds to and the length of the index table.

12. The method according to any one of the previous claims, wherein said set of symbols comprises a font, and wherein each symbol comprises a symbol of said font.

13. The method according to claim 12, wherein said font comprises a font selected from the group consisting of a Chinese character set and a Japanese character set.


14. A method for retrieving encoded data representing a symbol of a set of symbols, which are stored in a memory, given an identifier for each symbol of said set of symbols, comprising:

providing an index table containing a list of identifiers for different symbols of said set of symbols;

searching said index table to locate an identifier for the symbol;

identifying a location at which said data representing said symbol of said set of symbols is stored in said memory using the identifier for said symbol; and

retrieving said data representing said symbol from said memory.

15. The method according to claim 14, wherein encoded data representing said symbols of said set of symbols are sorted first by length and then by identifier, in ascending order, and wherein said step of providing an index table includes providing an index table for each length, each index table containing a list of identifiers sorted in ascending order for symbols represented by encoded data of a particular length.

16. The method according to claim 15, wherein said searching step comprises searching said index tables for said identifier using a binary search.

17. The method according to any one of the claims 14-16, wherein said method further includes the step of decompressing the retrieved data.

18. The method according to any one of the claims 14-17, wherein said encoded data comprises a two-dimensional bitmap of a symbol.

19. Apparatus for compressing data representing a set of symbols, comprising:

an encoder (10, 40) arranged to encode each symbol of said set of symbols in the form of a two-part code, and

source model means (18, 48) arranged to generate a statistical model of said set of symbols, which a first part of said two-part code comprises,

wherein said encoder (10, 40) is arranged to encode data representing each symbol of said set of symbols with a code derived from said statistical model to provide a second part of said two-part code for each symbol of said set, and

wherein the first part of said two-part code is common for all encoded symbols of said set, and the second part of said two-part code comprises encoded data representing a symbol of said set, and wherein each encoded symbol of said set can be separately accessed and decompressed.

20. The apparatus according to claim 19, wherein said data representing each said symbol comprises a bitmap of each said symbol, and wherein said encoder (10) includes:

an arithmetic encoder (16) for sequentially encoding pixels of said bitmap; and

the source model means (18) for providing coding probabilities for the arithmetic encoder, said source model means including a context forming unit (22) for forming a context for each pixel of the bitmap, and a probability table (24) containing the pixel probability of each pixel conditioned on the context formed by the context forming unit (22).

21. The apparatus according to claim 19, wherein said data representing each said symbol comprises a bitmap of each said symbol, and wherein said encoder (40) includes:

the source model means (48) for providing a predicted value for each pixel of the bitmap, a means (44) for exclusive-ORing the predicted value of each pixel with the actual bit to produce a bit stream, and

a Huffman coding unit (48) for encoding the bit stream.

22. The apparatus according to claim 21, wherein said source model means (48) includes:

a context forming unit (42) for forming a context for each pixel of the bitmap, and a prediction table (42) for providing the predicted value for each pixel in each context.

23. The apparatus according to any of the claims 19-22, wherein said set of symbols comprises a font.

24. Apparatus (30) for retrieving data representing a symbol of a set of symbols, which are stored in a memory, each of said symbols having an identifier, comprising:

a locator arranged to identify a location in said memory, at which said data representing a symbol of said set of symbols is stored, by searching an index table, which contains a list of identifiers for different symbols of said set of symbols, using said identifier for said symbol; and

a decoder (36) which is adapted to decode said symbol, wherein data representing each encoded symbol can be separately accessed and decoded.


25. The apparatus according to claim 24, wherein data representing said set of symbols are sorted by length, and wherein an index table is provided for each length, each index table is arranged to contain a list of identifiers for symbols represented by data of a particular length.

Patentansprüche

1. Verfahren zur Komprimierung von Daten, die einen Zeichensatz repräsentieren, mit:

Kodierung jedes Symbols des Zeichensatzes in der Form eines zweiteiligen Codes durch Erzeu- gung eines ersten Teiles des zweiteiligen Co- des, der ein statistisches Modell des Zeichen- satzes enthält,

Ableitung eines Codes aus dem statistischen Modell, und Kodierung von Daten, die jedes Zei- chen des Zeichensatzes repräsentieren, mit dem abgeleiteten Code, um einen zweiten Teil des zweiteiligen Codes für jedes Zeichen in dem Satz zu liefern,

wobei der erste Teil des zweiteiligen Codes für alle kodierten Zeichen des Satzes gleich ist, und der zweite Teil des zweiteiligen Codes kodierte Daten, die ein Zeichen des Satzes repräsentieren, umfas- sen, und

wobei jedes kodierte Zeichen des Satzes separat zugreifbar und dekomprimierbar ist.

2. Verfahren nach Anspruch 1, wobei die Daten, die jedes Symbol des Satzes repräsentieren, eine zwei- dimensionale Bitmap jedes Zeichens umfassen.

3. Verfahren nach Anspruch 1 oder 2, wobei der Ko- dierungsschritt den Schritt der Bestimmung des Kontextes jedes Pixels einer Bitmap, die ein Zeichen repräsentiert, und die Erstellung einer Wahrschein- lichkeitstabelle von Pixelwerten, die einen Eintrag pro möglichen Kontext aufweisen, enthält.

4. Verfahren nach Anspruch 3, wobei der Kodierungs- schritt eine Kodierung jedes Zeichens umfasst, wo- bei ein arithmetischer Kodierer zusammen mit der Wahrscheinlichkeitstabelle genutzt wird.

5. Verfahren nach Anspruch 4, wobei der Kodierungs- schritt den Schritt der Bestimmung des Kontextes jedes Pixels der Bitmap, die ein Zeichen repräsen- tiert, und eine Erstellung einer Vorhersagetabelle enthält, die auf den wahrscheinlichsten Bitwert in je- dem Kontext hinweist.

6. Verfahren nach Anspruch 5, wobei der wahrschein- lichste Bitwert für jedes Bit Exclusiv-Oder mit dem

aktuellen Bit verknüpft wird, wodurch ein Bitstrom erzeugt wird, der durch einen Huffman-Code kodiert wird.

7. Verfahren nach einem der vorangegangenen An- sprüche, weiterhin den Schritt des Speicherns der kodierten Daten, die jedes Zeichen von dem Satz repräsentieren, im Speicher enthaltend.

8. Verfahren nach Anspruch 7, weiterhin den Schritt des Einbindens eines Bezeichners für jedes Zeichen des Zeichensatzes zur Identifikation eines Ortes an welchem die kodierten Daten, die ein Zeichen reprä- sentieren, im Speicher gespeichert sind.

9. Verfahren nach Anspruch 8, weiterhin die Schritte eines Sortierens der kodierten Daten, die jedes Sym- bol in dem Satz repräsentieren, nach Länge und Er- zeugen eines Satzes von Indextabellen für jede Län- ge enthaltend, wobei jede Indextabelle Bezeichner für kodierte Daten ihrer entsprechenden Länge ent- hält.

10. Verfahren nach Anspruch 9, weiterhin den Schritt des Anordnens nach aufsteigender Ordnung der Be- zeichner, die in jeder Indextabelle enthalten sind.

11. Verfahren nach Anspruch 11, weiterhin den Schritt des Erzeugens einer Längentabelle enthaltend, wo- bei die Längentabelle Informationen über jede In- dextabelle inklusive der Länge der es entspricht und der Länge der Indextabelle enthält.

12. Verfahren nach einem der vorangehenden Ansprü- che, wobei der Zeichensatz einen Font umfasst, und wobei jedes Zeichen ein Zeichen des Font umfasst.

13. Verfahren nach Anspruch 12, wobei der Font einen Font umfasst, der aus der Gruppe ausgewählt ist, die aus einen Chinesisch Zeichensatz und einen ja- panischen Zeichensatz besteht.

14. Verfahren zur Wiederherstellung kodierter Daten, die ein Zeichen eines Zeichensatzes repräsentieren, die in einem Speicher gespeichert sind, wobei ein Bezeichner für jedes Zeichen in dem Zeichensatz angegeben ist, mit:

Verfügbarmachen einer Indextabelle, die eine Liste von Bezeichnern für unterschiedliche Zei- chen des Zeichensatzes enthält;

Durchsuchen der Indextabelle, um einen Be- zeichner für ein Zeichen zu lokalisieren;

Identifizierung eines Ortes unter Nutzung des Bezeichners für das Zeichen an welchem die Daten, die das Zeichen in dem Zeichensatz re- präsentieren, in dem Speicher gespeichert sind;

und


Wiederherstellung der Daten, die ein Zeichen repräsentieren, aus dem Speicher.

15. Verfahren nach Anspruch 14, wobei kodierte Daten, die Zeichen eines Zeichensatzes repräsentieren, zu- erst nach Länge und dann nach Bezeichnern in auf- steigender Reihenfolge sortiert werden, und wobei der Schritt der Bereitstellung einer Indextabelle das Bereitstellen eine Indextabelle für jede Länge ent- hält, wobei jede Indextabelle, die eine Liste von Be- zeichnern in aufsteigender Folge für Zeichen enthält, die durch kodierte Daten einer bestimmten Länge repräsentiert werden..

16. Verfahren nach Anspruch 15, wobei der Suchschritt ein Durchsuchen der Indextabelle für den Bezeich- ner unter Nutzung einer binären Suche umfasst.

17. Verfahren nach einem der Schritte 14 bis 16, wobei das Verfahren weiterhin den Schritt eine Dekompri- mierung der wiederhergestellten Daten enthält.

18. Verfahren nach einem der Ansprüche 14 bis 17, wo- bei die kodierten Daten eine zweidimensionale Bit- map für ein Symbol umfassen.

19. Vorrichtung zur Komprimierung von Daten, die einen Zeichensatz repräsentieren, mit:

einen Kodierer (10, 40), der angepasst ist, jedes Zeichen eines Zeichensatzes in der Form eines zweiteiligen Codes zu kodieren und Quellenmodellmittel (18, 48), die angepasst sind, um ein statistisches Modell des Zeichensatzes zu erzeugen, welches einen ersten Teil des zweiteiligen Codes umfasst,

wobei der Kodierer (10, 40) angepasst ist, Daten, die jedes Zeichen in dem Zeichensatz repräsentie- ren, mit einem Code, der von dem statistischen Mo- dell abgeleitet ist, zu kodieren, um einen zweiten Teil des zweiteiligen Codes für jedes Zeichen des Zei- chensatzes bereitzustellen, und

wobei der erste Teil des zweiteiligen Codes für alle kodierten Zeichen des Satzes gleich ist, und der zweite Teil des zweiteiligen Codes kodierte Daten umfasst, die ein Zeichen des Satzes repräsentieren, und wobei jedes kodierte Zeichen des Satzes sepa- rat zugreifbar und dekomprimierbar ist.

20. Vorrichtung nach Anspruch 19, wobei die Daten, die jedes Zeichen repräsentieren, eine Bitmap jedes Zeichens umfassen, und wobei der Kodierer (10) Folgendes enthält:

einen arithmetischen Kodierer (16) zur sequen- ziellen Kodierung von Pixeln der Bitmap; und Quellenmodellmittel (18) zur Bereitstellung von

Kodierungswahrscheinlichkeiten für den arith- metischen Kodierer, wobei die Quellenmodell- mittel eine Kontext bildende Einheit (22) zur Bil- dung eines Kontextes für jeden Pixel der Bitmap und eine Wahrscheinlichkeitstabelle (24) ent- hält, die eine Pixelwahrscheinlichkeit für jeden Pixel enthält, die vom Kontext abhängt, der von der Kontext bildenden Einheit (22) gebildet wur- de.

21. Vorrichtung nach Anspruch 19, wobei die Daten, die jedes Zeichen repräsentieren, eine Bitmap jedes Zeichens umfassen, und wobei der Kodierer (40) Folgendes umfasst:

die Quellenmodellmittel (48) zur Bereitstellung eines vorhergesagten Wertes für jedes Pixel der Bitmap,

ein Mittel (44) zum Exklusiv-Oder-Verbinden des vorhergesagten Wertes jedes Pixel mit dem tatsächlichen Bit, um einen Bitstrom zu erzeu- gen, und

eine Huffmann-Kodiereinheit (48) zur Kodie- rung des Bitstroms.

22. Vorrichtung nach Anspruch 21, wobei die Quellen- modellmittel (48) enthalten:

eine Kontext bildende Einheit (48) zu Bildung eines Kontextes für jedes Pixel der Bitmap und eine Vorhersagetabelle (42), um einen vorher- gesagten Wert für jedes Pixel in jedem Kontext vorherzusagen.

23. Vorrichtung nach einem der Ansprüche 19 bis 22, wobei der Zeichensatz einen Font umfasst.

24. Vorrichtung (30) zu Wiederherstellung von Daten, die ein Zeichen eines Zeichensatzes repräsentieren, der in einem Speicher gespeichert ist, wobei jeder der Zeichen einen Bezeichner aufweist, mit:

einem Lokalisierer, der angepasst ist, um einen Ort in dem Speicher durch eine Durchsuchung einer Indextabelle zu identifizieren, an dem Da- ten gespeichert sind, die ein Zeichen des Zei- chensatzes repräsentieren, wobei die Indexta- belle eine Liste von Bezeichnern an für unter- schiedlich Zeichen des Zeichensatzes enthält, und wobei der Bezeichner für das Zeichen ge- nutzt wird; und

einen Decoder (36), der angepasst ist, um das Zeichen zu dekodieren, wobei Daten, die das kodierte Zeichen repräsentieren, separat zu- greifbar und dekodierbar sind.

25. Vorrichtung nach Anspruch 24, wobei Daten, die den Zeichensatz repräsentieren, nach Länge sortiert sind, und wobei eine Indextabelle für jede Länge verfügbar ist, wobei jede Indextabelle angepasst ist, eine Liste von Bezeichnern für Zeichen, die durch Daten einer bestimmten Länge repräsentiert sind, zu enthalten.

Revendications

1. Procédé destiné à compresser des données repré- sentant un ensemble de symboles, comprenant :

l’encodage de chaque symbole dudit ensemble de symboles sous la forme d’un code à deux parties en créant une première partie dudit code à deux parties qui comprend un modèle statis- tique dudit ensemble de symboles, la dérivation d’un code depuis ledit modèle statistique et l’en- codage de données représentant chaque sym- bole dudit ensemble de symboles grâce au code dérivé pour fournir une seconde partie dudit co- de à deux parties pour chaque symbole dudit ensemble,

dans lequel la première partie dudit code à deux par- ties est commune à tous les symboles encodés dudit ensemble et la seconde partie dudit code à deux parties comprend des données encodées représen- tant un symbole dudit ensemble, et

dans lequel chaque symbole encodé dudit ensemble est accessible et peut être décompressé séparé- ment.

2. Procédé selon la revendication 1, dans lequel lesdi- tes données représentant chaque symbole dudit en- semble comprennent un bitmap bidimensionnel de chacun desdits symboles.

3. Procédé selon la revendication 1 ou 2, dans lequel ladite étape d’encodage comprend l’étape consis- tant à déterminer le contexte de chaque pixel d’un bitmap représentant un symbole et la construction d’une table de probabilité de valeurs de pixel possé- dant une entrée par contexte possible.

4. Procédé selon la revendication 3, dans lequel ladite étape d’encodage comprend l’encodage de chaque symbole en utilisant un encodeur arithmétique avec ladite table de probabilité.

5. Procédé selon la revendication 4, dans lequel ladite étape d’encodage comprend les étapes consistant à déterminer le contexte de chaque pixel d’un bitmap représentant un symbole et à construire une table de prévision indiquant la valeur de bit la plus proba- ble dans chaque contexte.

6. Procédé selon la revendication 5, dans lequel la va-

leur de bit la plus probable pour chaque bit est af- fectée d’un OU-exclusif avec le bit réel, produisant un flux de bits qui est encodé par un code de Huff- man.

7. Procédé selon l’une quelconque des revendications précédentes, comprenant de plus l’étape consistant à stocker lesdites données encodées représentant chaque symbole dudit ensemble dans une mémoire.

8. Procédé selon la revendication 7, comprenant de plus l’étape consistant à inclure un identifiant pour chaque symbole dudit ensemble de symboles pour identifier un emplacement dans lequel les données encodées représentant chaque symbole sont stoc- kées dans la mémoire.

9. Procédé selon la revendication 8, comprenant de plus les étapes consistant à trier les données enco- dées représentant chaque symbole dudit ensemble par longueur et à créer un ensemble de tables d’in- dexation pour chaque longueur, chaque table d’in- dexation comprenant des identifiants pour des don- nées encodées de sa longueur respective.

10. Procédé selon la revendication 9, comprenant de plus l’étape consistant à agencer les identifiants compris dans chaque table d’indexation dans un or- dre ascendant.

11. Procédé selon la revendication 11, comprenant de plus l’étape consistant à créer une table de lon- gueurs, ladite table de longueurs comprenant une information sur chaque table d’index, comprenant la longueur à laquelle elle correspond et la longueur de la table d’index.

12. Procédé selon l’une quelconque des revendications précédentes, dans lequel ledit ensemble de symbo- les comprend une police, et dans lequel chaque sym- bole comprend un symbole de ladite police.

13. Procédé selon la revendication 12, dans lequel ladite police comprend une police choisie à partir du grou- pe composé d’un ensemble de caractères chinois et d’un ensemble de caractères japonais.

14. Procédé destiné à récupérer des données encodées représentant un symbole parmi un ensemble de symboles, qui sont stockés dans une mémoire, étant donné un identifiant pour chaque symbole dudit en- semble de symboles, comprenant :

la fourniture d’une table d’index contenant une liste d’identifiants pour différents symboles dudit ensemble de symboles ;

la recherche dans ladite table d’index pour situer un identifiant pour le symbole ;


l’identification d’un emplacement dans lequel lesdites données représentant ledit symbole du- dit ensemble de symboles sont stockées dans ladite mémoire en utilisant l’identifiant pour ledit symbole ; et

la récupération desdites données représentant ledit symbole depuis ladite mémoire.

15. Procédé selon la revendication 14, dans lequel les données encodées représentant lesdits symboles dudit ensemble de symboles sont triées d’abord par longueur puis par identifiant, dans un ordre ascen- dant, et dans lequel ladite étape consistant à fournir une table d’index comprend la fourniture d’une table d’index pour chaque longueur, chaque table d’index contenant une liste d’identifiants triés dans un ordre ascendant pour des symboles représentés par des données encodées d’une longueur particulière.

16. Procédé selon la revendication 15, dans lequel ladite étape de recherche comprend la recherche dans les- dites tables d’index dudit identifiant en utilisant une recherche binaire.

17. Procédé selon l’une quelconque des revendications 14 à 16, dans lequel ledit procédé comprend de plus l’étape consistant à décompresser les données ré- cupérées.

18. Procédé selon l’une quelconque des revendications 14 à 17, dans lequel lesdites données encodées comprennent un bitmap bidimensionnel d’un sym- bole.

19. Dispositif destiné à compresser des données repré- sentant un ensemble de symboles, comprenant :

un encodeur (10, 40) configuré pour encoder chaque symbole dudit ensemble de symboles sous la forme d’un code en deux parties, et un moyen de modèle source (18, 48) configuré pour générer un modèle statistique dudit en- semble de symboles qu’une première partie du- dit code à deux parties comprend,

dans lequel ledit encodeur (10, 40) est configuré pour encoder des données représentant chaque symbole dudit ensemble de symboles, grâce à un code dérivé à partir dudit modèle statistique, pour fournir une seconde partie dudit code à deux parties pour chaque symbole dudit ensemble, et

dans lequel la première partie dudit code à deux par- ties est commune à tous les symboles encodés dudit ensemble et la seconde partie dudit code à deux parties comprend des données encodées représen- tant un symbole dudit ensemble, et dans lequel cha- que symbole encodé dudit ensemble est accessible et peut être décompressé séparément.

20. Dispositif selon la revendication 19, dans lequel les- dites données représentant chacun desdits symbo- les comprennent un bitmap de chacun desdits sym- boles et dans lequel ledit encodeur (10) comprend : un encodeur arithmétique (16) pour encoder sé- quentiellement des pixels dudit bitmap ; et le moyen de modèle source (18) pour fournir des probabilités de codage pour l’encodeur arithmétique, ledit moyen de modèle source comprenant une unité de formation de contexte (22) destinée à constituer un contexte pour cha- que pixel du bitmap et une table de probabilité (24) contenant la probabilité de pixel de chaque pixel conditionnée sur le contexte constitué par l’unité de formation de contexte (22).

21. Dispositif selon la revendication 19, dans lequel les- dites données représentant chacun desdits symbo- les comprennent un bitmap de chacun desdits sym- boles, et dans lequel ledit encodeur (40) comprend : le moyen de modèle source (48) destiné à four- nir une valeur prévue pour chaque pixel du bit- map,

un moyen (44) pour affecter d’un OU-exclusif la valeur prévue de chaque pixel avec le bit réel pour produire un flux de bits, et

une unité de codage de Huffman (48) pour en- coder le flux de bits.

22. Dispositif selon la revendication 21, dans lequel ledit moyen de modèle source (48) comprend :

une unité de formation de contexte (42) destinée à constituer un contexte pour chaque pixel du bitmap et une table de prévision (42) destinée à fournir la valeur prévue pour chaque pixel dans chaque contexte.

23. Dispositif selon l’une quelconque des revendications 19 à 22, dans lequel ledit ensemble de symboles comprend une police.

24. Dispositif (30), destiné à récupérer des données re- présentant un symbole parmi un ensemble de sym- boles, qui sont stockées dans une mémoire, chacun desdits symboles possédant un identifiant, comprenant :

un dispositif de repérage configuré pour identi- fier un emplacement dans ladite mémoire, dans lequel ladite donnée représentant un symbole dudit ensemble de symboles est stockée, en re- cherchant dans une table d’index qui contient une liste d’identifiants pour différents symboles dudit ensemble de symboles, en utilisant ledit identifiant pour ledit symbole ; et


un décodeur (36) qui est adapté pour décoder ledit symbole, où les données représentant cha- que symbole encodé sont accessibles et peu- vent être décodées séparément.

25. Dispositif selon la revendication 24, dans lequel les données représentant ledit ensemble de symboles sont triées par longueur, et dans lequel une table d’index est fournie pour chaque longueur, chaque table d’index étant configurée pour contenir une liste d’identifiants pour des symboles représentés par des données d’une longueur particulière.
