Method and computer program product for implementing text...

Coded data generation or conversion – Digital code to digital code converters – To or from packed format

Reexamination Certificate

Details

C341S011000, C341S050000, C341S055000, C341S061000, C341S063000

Reexamination Certificate

active

06373409

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to the data processing field, and more particularly, relates to a method and computer program product for implementing text conversion table compression.
DESCRIPTION OF THE RELATED ART
Typical data compression techniques in use today employ standard algorithms for analyzing and compacting information. In many applications, these techniques work well. However, it is far more efficient if the compression technique being used possesses foreknowledge of the format of the data. In this case, the technique can be tailored to suit the patterns present in the data.
Many applications developed by International Business Machines Corp. are real time national language support (NLS) enabled. That is, the applications need to be able to convert character data between different character sets and encoding schemes on-the-fly. A common way of achieving this conversion is to maintain a conversion table that maps characters in one codepage to those in another. Often, many different character sets and encoding schemes are supported by any given application, so it follows that the application needs to possess conversion tables for each of the sets it supports.
With performance in mind, for any given pair of codepages X and Y, it is best to have a corresponding pair of mappings where:
$M_1 : X \to Y$
$M_2 : Y \to X$
That is, $M_1$ maps a character in X to its corresponding character in Y, and $M_2$ maps a character in Y to its corresponding character in X.
In an object-oriented programming environment, these mappings can be realized as array-type data structures. If the characters in each character set are treated as or assigned unique integers, then the mapping becomes trivial. For $M_1$, each character $C_X$ of type X becomes an index into the array of $M_1$, and the content at that index is the corresponding Y value $C_Y$; the same holds in the reverse direction for $M_2$:
$M_1(C_X) = C_Y$
$M_2(C_Y) = C_X$
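As an illustration of the two array-based mappings, the following is a minimal sketch, not the implementation described in this patent. The class name `MappingTables`, the method `invert`, the four-entry table, and the `unmapped` filler value are all hypothetical, and the sketch assumes the forward mapping is one-to-one.

```java
import java.util.Arrays;

class MappingTables {
    // Builds the reverse table M2 from a forward table M1.
    // Assumes M1 is one-to-one and every value in m1 is below targetSize.
    static char[] invert(char[] m1, int targetSize, char unmapped) {
        char[] m2 = new char[targetSize];
        Arrays.fill(m2, unmapped); // entries that no character in X maps to
        for (int cx = 0; cx < m1.length; cx++) {
            m2[m1[cx]] = (char) cx; // M2(M1(cx)) = cx
        }
        return m2;
    }

    public static void main(String[] args) {
        // Hypothetical 4-character "codepages": index = character in X,
        // content = corresponding character in Y.
        char[] m1 = {3, 0, 2, 1};
        char[] m2 = invert(m1, 4, (char) 0xFFFF);
        for (int cx = 0; cx < m1.length; cx++) {
            char cy = m1[cx];
            System.out.println(cx + " -> " + (int) cy + " -> " + (int) m2[cy]);
        }
    }
}
```

Each lookup in either direction is a single array access, which is why both tables are kept rather than searching one table in reverse.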
For example, character data in the EBCDIC format must be converted into Unicode for a Java program to manipulate the character data usefully. The reverse is true when Unicode characters from the Java program must be converted into EBCDIC format for other programs.
With performance in mind, the fastest way to convert a string of EBCDIC characters into a string of Unicode characters is to use the technique described above, that is, a direct table lookup for each character. For a given string of n characters, this provides an O(n) solution.
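The direct table lookup can be sketched as follows. This is an illustrative single-byte example only; the class `TableConverter` and its methods are hypothetical names, and the table is deliberately partial, with only a few well-known EBCDIC code points (space, A, B, C) filled in and every other entry mapped to '?'.

```java
import java.util.Arrays;

class TableConverter {
    // Converts single-byte input with one array lookup per character,
    // so a string of n characters costs O(n).
    static String convert(byte[] input, char[] table) {
        char[] out = new char[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = table[input[i] & 0xFF]; // mask so the byte is an unsigned index
        }
        return new String(out);
    }

    // Partial, illustrative EBCDIC -> Unicode table (assumption: only four
    // entries are populated; a real table would define all 256).
    static char[] sampleEbcdicToUnicode() {
        char[] table = new char[256];
        Arrays.fill(table, '?');
        table[0x40] = ' ';
        table[0xC1] = 'A';
        table[0xC2] = 'B';
        table[0xC3] = 'C';
        return table;
    }

    public static void main(String[] args) {
        byte[] ebcdic = {(byte) 0xC1, (byte) 0xC2, (byte) 0x40, (byte) 0xC3};
        System.out.println(convert(ebcdic, sampleEbcdicToUnicode())); // prints "AB C"
    }
}
```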
To achieve a set level of performance, the conversion tables themselves must be created so that there is very little overhead in the lookup. That is, they must be fully expanded to provide a one-to-one mapping of any character indices used. There are some exceptions to this. Depending on the codepage being converted, some character sets cannot always provide a one-to-one mapping for all characters due to linguistic differences, but in general, a one-to-one mapping is accepted.
In a product, such as IBM AS/400 Toolbox for Java, there are many supported codepages, each requiring its own two conversion tables: Unicode->codepage and codepage->Unicode. For double-byte character sets, sometimes known as graphic character sets, such as Japanese and Korean, these tables can become rather large. Among the double-byte character sets, as well as Unicode, each character is assigned a 16-bit integer value. As a result, the entire character set will comprise $2^{16}$, or 65536, distinct values. With each value taking up 2 bytes itself, one conversion table will comprise 128 KB of memory, not including the overhead associated with creating such an array in an object-oriented environment. At two tables per codepage, the total is now up to 256 KB. If the Toolbox included 10 double-byte languages, for example, this would mean an increase in the size of the product by more than 2 MB.
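The size figures above can be checked step by step, taking 1 KB as 1024 bytes and ignoring array object overhead:

```latex
\begin{align*}
2^{16}\ \text{entries} \times 2\ \text{bytes} &= 131072\ \text{bytes} = 128\ \text{KB per table} \\
2\ \text{tables} \times 128\ \text{KB} &= 256\ \text{KB per codepage} \\
10\ \text{codepages} \times 256\ \text{KB} &= 2560\ \text{KB} = 2.5\ \text{MB}
\end{align*}
```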
It is desirable to reduce the size of conversion tables, for example, for viable and timely transmission of the application over the Internet, and for reduced memory requirements for local storage. A need exists for an effective technique for compressing the text conversion tables at build time.
SUMMARY OF THE INVENTION
A principal object of the present invention is to provide a method and computer program product for implementing text conversion table compression. Other important objects of the present invention are to provide such method and computer program product for implementing text conversion table compression substantially without negative effect; and that overcome many of the disadvantages of prior art arrangements.
In brief, a method and computer program product are provided for implementing text conversion table compression. A character sequence is loaded from a full-size conversion table. The character sequence is checked for one of a plurality of character patterns. Responsive to identifying one of the plurality of character patterns, the character sequence is compressed into a compressed conversion table for the identified one character pattern. Responsive to failing to identify one of the plurality of character patterns, the character sequence is copied into the compressed conversion table.
In accordance with features of the invention, the character sequence from the full-size conversion table is checked for one of the plurality of character patterns including a repeating character sequence, a ramping character sequence, and a repeating high byte character sequence.
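The check-then-compress-or-copy flow described above might be sketched as follows. This is a simplified illustration under stated assumptions, not the encoding of the invention: the markers `REPEAT` and `RAMP`, the threshold `MIN_RUN`, and the use of an integer token list (chosen so that markers cannot collide with 16-bit character values) are all hypothetical, and the repeating high byte pattern is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

class TableCompressor {
    static final int REPEAT = -1; // hypothetical token layout: REPEAT, count, value
    static final int RAMP = -2;   // hypothetical token layout: RAMP, count, start
    static final int MIN_RUN = 4; // a run costs 3 tokens, so shorter runs are copied

    static List<Integer> compress(char[] table) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < table.length) {
            int repeat = runLength(table, i, 0); // same value repeated
            int ramp = runLength(table, i, 1);   // values increasing by one
            if (repeat >= MIN_RUN && repeat >= ramp) {
                out.add(REPEAT); out.add(repeat); out.add((int) table[i]);
                i += repeat;
            } else if (ramp >= MIN_RUN) {
                out.add(RAMP); out.add(ramp); out.add((int) table[i]);
                i += ramp;
            } else {
                out.add((int) table[i]); // no pattern identified: copy as-is
                i++;
            }
        }
        return out;
    }

    // Length of the run starting at 'start' in which each entry equals
    // the previous entry plus 'step'.
    static int runLength(char[] table, int start, int step) {
        int n = 1;
        while (start + n < table.length
                && table[start + n] == table[start + n - 1] + step) {
            n++;
        }
        return n;
    }

    // Expands the token list back into a full-size table of 'size' entries.
    static char[] decompress(List<Integer> tokens, int size) {
        char[] table = new char[size];
        int pos = 0;
        for (int t = 0; t < tokens.size(); t++) {
            int token = tokens.get(t);
            if (token == REPEAT || token == RAMP) {
                int count = tokens.get(++t);
                int value = tokens.get(++t);
                int step = (token == RAMP) ? 1 : 0;
                for (int k = 0; k < count; k++) {
                    table[pos++] = (char) (value + k * step);
                }
            } else {
                table[pos++] = (char) token;
            }
        }
        return table;
    }
}
```

For example, a 14-entry table holding six copies of 0x1A, the ramp 0x41 through 0x45, and the unpatterned values 7, 9, 7 compresses to nine tokens in this scheme and expands back to the original table.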


