Coded data generation or conversion – Digital code to digital code converters – To or from number of pulses
Reexamination Certificate
2002-05-13
2004-05-18
Jeanglaude, Jean (Department: 2819)
C341S050000, C341S066000, C341S090000
active
06737994
ABSTRACT:
FIELD OF THE INVENTION
The invention disclosed broadly relates to the field of information handling systems, and more particularly to the field of data compression.
BACKGROUND OF THE INVENTION
Expressing the same digital information with fewer bits is a continuing challenge in the field of information technology. This is particularly the case with the various standard character sets that have been adopted for expressing alphanumeric characters in digital form. One case is Unicode, a superset of the ASCII (American Standard Code for Information Interchange) character set that uses two or more bytes for each character so that it can house the alphabets of most of the world's languages. Under the Unicode scheme, as in others, a unique number represents each character. Other character sets use a different set of digital numbers to represent characters. Unicode generally requires more data to specify alphanumeric characters than ASCII because it can express characters in various alphabets. There is thus a need for a compression method for Unicode that is well suited for certain classes of applications, such as large databases. There is a further need for a compression method that compresses small strings well, such as individual fields in a database. These are situations where compression mechanisms such as LZW (Lempel-Ziv-Welch) do not work well because they are better suited to large bodies of text. In addition, there is a need for one very important characteristic: binary comparison. For many applications, it is very important that compressed Unicode fields in a database sort in the same binary order as the corresponding uncompressed fields. Other encoding schemes such as SCSU (Standard Compression Scheme for Unicode) use essentially random binary order, which makes them unsuitable in many applications.
SUMMARY OF THE INVENTION
Briefly, according to the invention, a system and method for encoding an input sequence of code points to produce an output sequence of bytes include the steps of:
receiving a plurality of values, each value representing a code point (character) in the input sequence;
calculating a signed delta value for each code point in the input sequence, wherein each delta value is determined by subtracting the value of a base code point from the value of the current code point to produce the delta value for the current code point;
encoding each delta value into a set of bytes wherein small deltas are encoded in a small number of bytes and larger delta values are encoded in successively larger numbers of bytes;
selecting a lead byte value for the output sequence so that the binary order of the output sequence is the same as the binary order of the input sequence; and
writing to the output sequence each delta value for each code point in the input sequence.
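The steps listed above can be illustrated with a minimal Python sketch. It is not the encoding defined in the patent: the byte ranges, the starting base value (INITIAL_BASE), the choice of the previous code point as the base, and the names encode and encode_delta are all assumptions made for illustration. It shows only the general idea that small deltas take one byte, larger deltas take more, and lead bytes are assigned in increasing order of delta so that comparing the output bytes gives the same result as comparing the input code points.

```python
# Illustrative sketch only; the ranges and names below are assumptions,
# not the byte layout specified by the patent.

INITIAL_BASE = 0x40  # assumed starting base code point

# Each entry: (lowest delta, highest delta, first lead byte, encoded length).
# Lead bytes rise with the delta, so byte order follows delta order.
RANGES = [
    (-0x10FFFF, -1056833, 0x00, 4),  # large negative deltas
    (-1056832, -8257, 0x10, 3),
    (-8256, -65, 0x20, 2),
    (-64, 63, 0x40, 1),              # small deltas: a single byte
    (64, 8255, 0xC0, 2),
    (8256, 1056831, 0xE0, 3),
    (1056832, 0x10FFFF, 0xF0, 4),    # large positive deltas
]


def encode_delta(delta: int) -> bytes:
    """Encode one signed delta as 1 to 4 bytes, big-endian within its range."""
    for lo, hi, lead, length in RANGES:
        if lo <= delta <= hi:
            value = (lead << (8 * (length - 1))) + (delta - lo)
            return value.to_bytes(length, "big")
    raise ValueError(f"delta out of range: {delta}")


def encode(text: str) -> bytes:
    """Encode a string as a sequence of byte-encoded deltas."""
    out = bytearray()
    base = INITIAL_BASE
    for ch in text:
        code_point = ord(ch)
        out += encode_delta(code_point - base)  # signed delta from the base
        base = code_point  # assumption: the previous code point is the next base
    return bytes(out)


if __name__ == "__main__":
    words = ["abd", "abc", "ab", "Zebra", "日本語", "日本"]
    print(sorted(words, key=encode) == sorted(words))
```

In this sketch, text that stays within one small alphabet costs about one byte per character after the first, and sorting by the encoded bytes matches sorting by code point. Unlike a production encoding, the sketch lets trail bytes take any value from 0x00 to 0xFF and makes no attempt to avoid control codes.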
REFERENCES:
patent: 5784071 (1998-07-01), Tang et al.
patent: 5793381 (1998-08-01), Edberg et al.
patent: 6204782 (2001-03-01), Gonzalez et al.
Davis Mark Edward
Scherer Markus Walter
Buchenhorner Michael J.
International Business Machines - Corporation
Jeanglaude Jean
Strimaitis Romualdas