Method and apparatus using a list of <minimum, size, gap,...

Computer graphics processing and selective visual display system – Computer graphics processing – Character generating

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C345S472300, C345S556000, C345S601000, C345S636000

Reexamination Certificate

active

06717580

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The invention relates to character mapping. More particularly, the invention relates to a method and apparatus using a list of <minimum, size, gap, offset> quadruple to encode Unicode characters in an upper/lower case character mapping.
2. Description of the Prior Art
The Unicode standard, version 2.0 (UCS2) specifies character coding. The majority of the Unicode characters map to one Unicode for each character case mapping. Therefore, a solid implementation of a basic 1-to-1 character case mapping is an important foundation for a general locale sensitive, contextual based Unicode string case mapping. There are 65,536 (63,488 if one ignores surrogates) possible bit combinations of UCS2. The Unicode 3.0 standard allocates 57,709 characters. There are 10,617 entries in the Unicode character database (based on the Unicode 3.0 standard). There are 1,398 UCS2 characters which have a different case (see, for example, FIGS.
1
-
3
).
It is highly desirable to provide basic 1-to-1 character case mapping information while using only small amount of memory and at a reasonable speed. Character mapping in Unicode is, at best, an onerous task. One problem with mapping Unicode characters is that of mapping between upper and lower case characters.
The problem may be stated as follows:
Given one Unicode string as input, output the upper (or lower) case string:
“unicode”->“UNICODE”
Various attempts have been made to solve this problem. These approaches include: a conditional offset; a flat array; an indirect array; a list of a mapping pair; and a compact array.
Conditional Offset
This approach is used widely in the early 7-bit US-ASCII implementation.
Data Structure & Algorithm
To upper case:
output=((
‘a
’<=input)&&(input<=‘
z
’))?(input−‘
a’+‘A
’):input;
To lower case:
output=((
‘A’<=
input)&&(input<=‘
Z
’))?(input−‘
A’+‘a’
):input;
Result
This algorithm is very simple and compact. It basically compresses the case mapping information into three bytes for 7-bits US-ASCII. This algorithm is based on the assumption of the following important characteristics of the 7-bits US-ASCII definition:
1. All the lower case characters are encoded in a continuous range;
2. All the upper case characters are encoded in a continuous range; and
3. All the lower case characters have an equal offset to corresponding upper case characters.
The assumptions above do not fit for most other character sets (charset). The same approach applied to ISO-8859-1 requires three if statements. This approach is not general enough for other charsets. For Unicode, the required number of if statements is too big for any practical implementation of this approach.
Flat Array
The conditional offset approach lacks flexibility and is not general enough for other charsets. In the early 1980's 8-bit charsets, such as ISO-8859-1, came into use. With the increased use of these charsets came a more flexible approach to solving the problem of character mapping for upper/lower case characters,. i.e. the flat array. The flat array requires considerable memory, but provides improved flexibility and performance. In fact, the flat array is still the preferred implementation for most single byte charsets.
Data Structure & Algorithm Use a flat array to contain the information. Use the input byte as the array index. Return the value of the array element as the output.
output=ToUpper[input];
Result
This approach uses more memory than the conditional offset approach. However, the performance is faster. It is a good choice when the total possible number of inputs is limited to less than or equal to 256 inputs. This approach is not practical for any multi-byte charset, such as Shift_JIS or Big 5. It therefore is not practical for use with Unicode. If this approach is used for multibyte charsets, then a significant amount of memory is required for the array.
For a single byte, the required memory for both an upper case and a lower case mapping is:
2[upper and lower]×1[sizeof(char)=1 bytes]×256[28=256]=512 bytes.
For Unicode or other two byte charset, the required memory becomes:
2[upper and lower]×2[sizeof(UCS2)=2 bytes]×65536[216=65535]=262,144 bytes.
While this approach is preferred for most single byte charset implementations, it is not a practical approach to implement Unicode or other multi-byte charsets.
Indirect Array
After studying the distribution of the case characters in Unicode, it can be noted that all of the characters are encoded in several localized regions of the 16-bits space. If the 16-bits space is divided equally into 256 blocks, each block has 256 Unicodes, and all of the case characters are encoded in eleven of these blocks (see, for example, FIGS.
1
-
3
).
The remaining 245 blocks do not encode any 1-to-1 case characters. Thus, it is possible to reduce the size of the memory required by the flat array approach by applying one level of indirection. The indirect array approach uses one flat array for each block and saves 245 flat arrays for those blocks which do not have 1-to-1 case characters.
The distribution of case characters in these blocks is set forth in Table 1 below.
TABLE 1
Distribution of Case Characters
U+_xx
00
01
02
03
04
05
1E
1F
21
24
FF
Total
Number of
114
239
69
97
230
76
241
193
35
52
52
1398
case
sensitive
characters
Data Structure & Algorithm
if(array[input >>8])
output=array[input>>8][input & 0x0FF]
else
output=input;
Result
This approach requires:
2[2=lower and upper]×11[“cut−
d; −f
1,13,14 UnicodeData-Latest.txt|egrep−
v
“;;$”|cut−
c
1−2|uniq|
wc−I”=>
11]×2[sizeof(UCS2)=2]×256[block size is 256]=11,264 bytes
for all the eleven blocks. Additional memory may be needed for the first index array. However, this step could be replaced by several if statements. The performance of this algorithm is slower than both algorithms above, but the amount of memory that is required is greatly reduced.
List of Mapping Pair
Because it is known that there are only 1,389 1-to-1 case characters, one can also compress this information into an array of <input, output> pairs and use a binary search to find the desired information.
Data Structure & Algorithm
if( entry=BinarySearch(input, list))
output=entry->out;
else
output=input;
Result
Because a search process is required, the algorithm is slower, but the amount of memory required is only:
 2[2=in and out]×2[sizeof(UCS2)=2]×1398[Total number of entries in UnicodeData-Latest.txt which contains case information is 1,398 (see above)]×2=5,592 bytes.
Because there are 1,398 entries, the depth of the binary search is 10 or 11 [2
10
<1,398<2
11
].
Compact Array
Java and ICU use a compact array to encode Unicode character properties in general. They also apply the same approach for case mapping.
Data Structure & Algorithm
output=array2 [array 1 [input>>WINDOW_SIZE_BITS]][input & WINDOW_MASK];
See FIG.
4
. In ICU 1.3.1 (icu/source/common/uchar.c), this approach first checks the case of the input character itself. This check is done by one compact array (array indices and values in uchar.c ). For ToUpper operation, if the case of the input character is lower case, then this approach uses a second compact array (caseIndex and caseValue in uchar.c) to determine the value of the other case.
Result
In ICU-icu/source/common/uchar.c [based on ICU-1.3.1], this approach uses 64 as the window size. Table 2 shows the size of the table.
TABLE 2
Table Size
To decide
To map to the
Array Name
upper/lower case
“other” case
in uchar.c

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus using a list of &lt;minimum, size, gap,... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus using a list of &lt;minimum, size, gap,..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus using a list of &lt;minimum, size, gap,... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3274437

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.