Reexamination Certificate
1998-06-29
2001-06-05
Coleman, Eric (Department: 2783)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
active
06243701
ABSTRACT:
TECHNICAL FIELD
This invention relates to systems and methods for sorting character strings (e.g., words and names), and particularly, to sorting character strings that contain combinations of accented and unaccented characters.
BACKGROUND
Conventional sorting algorithms are designed to sort character strings (e.g., words, phrases, names, etc.) alphabetically according to the characters within the strings. However, in some languages, non-character symbols or marks are often added to characters to modify the pronunciation of the characters or the string as a whole. One common type of pronunciation modifier is an accent. Accents are common in many non-English languages, such as Danish, Latin, German, and Japanese.
Computerized sorting routines have a drawback in that they may mishandle character strings that contain a combination of accented and unaccented characters. Consider the Japanese case. The Japanese language includes three character sets: Kanji, Hiragana, and Katakana. The latter two character sets—Hiragana and Katakana—are collectively known as Kana characters. Kana characters include special accented characters known as “dakuten” and “handakuten” characters.
In each of the Hiragana and Katakana character sets, there are twenty dakuten characters and five handakuten characters. Dakuten characters appear identical to a companion set of Kana characters except for a small double-slash accent that appears in the upper right-hand corner of the character. Handakuten characters appear identical to five of the dakuten characters except that a small circle accent replaces the small double-slash accent.
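The relationship between an accented Kana character and its unaccented companion can be illustrated with Unicode canonical decomposition. This is a minimal sketch, not part of the patent: it assumes the text is encoded in Unicode, where precomposed dakuten and handakuten Kana decompose into a base character followed by a combining mark (U+3099 or U+309A). The helper name `split_kana` is hypothetical.

```python
import unicodedata

DAKUTEN = "\u3099"     # combining voiced sound mark (double-slash accent)
HANDAKUTEN = "\u309A"  # combining semi-voiced sound mark (circle accent)


def split_kana(ch):
    """Return (base, mark_name) for one Kana character.

    mark_name is "dakuten", "handakuten", or None for an unaccented character.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    mark = decomposed[1] if len(decomposed) > 1 else ""
    if mark == DAKUTEN:
        return base, "dakuten"
    if mark == HANDAKUTEN:
        return base, "handakuten"
    return base, None


print(split_kana("ば"))  # ('は', 'dakuten')
print(split_kana("ぱ"))  # ('は', 'handakuten')
print(split_kana("は"))  # ('は', None)
```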
Conventional sorting routines are effective at sorting Kanji-only character strings and Kana-only character strings. However, problems arise when Kanji and Kana characters are mixed in the same string. The sorting routines give more weight to differences between Kanji characters in two character strings than to differences involving dakuten and handakuten characters. As a result, the sorting routines often yield strings that are ordered incorrectly and do not reflect how such character strings would appear in a Japanese dictionary or telephone book.
Accordingly, there is a need to improve processes for sorting accented characters. In the Japanese case, the goal is to sort the strings identically to how they would be listed in a Japanese dictionary or telephone book.
SUMMARY
This invention concerns a technique for sorting character strings containing characters that are either unmodified or modified by one or more pronunciation modifiers (e.g., accents). The technique involves creating an expanded character string containing the characters in their base form (without the pronunciation modifiers) and ordinal values indicating whether the base characters are unmodified or modified with one of the one or more pronunciation modifiers. The process forms the base characters by removing the pronunciation modifiers from the character string. Ordinal values are then assigned to corresponding ones of the base characters, whereby the ordinal values differentiate among the base characters that are unmodified and those that are modified. The ordinal values also differentiate among the base characters that are modified by different pronunciation modifiers. The process concatenates the base characters and their corresponding ordinal values to form the expanded character string.
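The expansion step described above might be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the function name `expand`, the specific ordinal values (0 for unmodified, 1 for dakuten, 2 for handakuten), and the use of Unicode NFD decomposition to separate modifiers from base characters are all choices made here, and only the two Kana marks are treated as pronunciation modifiers. The expanded string is represented as a pair (base characters, ordinal values) standing in for the concatenation.

```python
import unicodedata

# Assumed ordinal values; the source only requires that they distinguish
# unmodified characters from modified ones, and one modifier from another.
UNMODIFIED = 0
MODIFIER_ORDINALS = {
    "\u3099": 1,  # dakuten (combining voiced sound mark)
    "\u309A": 2,  # handakuten (combining semi-voiced sound mark)
}


def expand(text):
    """Return (base_characters, ordinal_values) for a character string."""
    bases = []
    ordinals = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in MODIFIER_ORDINALS and bases:
            # A modifier: drop it from the base string and record its
            # ordinal value against the base character it follows.
            ordinals[-1] = MODIFIER_ORDINALS[ch]
        else:
            bases.append(ch)
            ordinals.append(UNMODIFIED)
    return "".join(bases), tuple(ordinals)


print(expand("ぱんだ"))  # ('はんた', (2, 0, 1))
```

Keeping the ordinal values in a separate trailing component, rather than interleaving them with the base characters, is what lets a later comparison consider all base characters before any modifier.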
Once the character strings are expanded, the process sorts the expanded character strings, first according to the base characters and second according to the ordinal values.
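A two-level sort of this kind can be sketched with a sort key that places the base characters ahead of the ordinal values. The names `sort_key` and `sort_strings` and the ordinal values are assumptions made for illustration. Base characters are compared here by Unicode code point, which is only a stand-in for a true dictionary ordering (it does not order Kanji as a Japanese dictionary would).

```python
import unicodedata

# Assumed ordinal values: 0 = unmodified, 1 = dakuten, 2 = handakuten.
MODIFIER_ORDINALS = {"\u3099": 1, "\u309A": 2}


def sort_key(text):
    """Build a key of (base characters, ordinal values) for one string."""
    bases, ordinals = [], []
    for ch in unicodedata.normalize("NFD", text):
        if ch in MODIFIER_ORDINALS and bases:
            ordinals[-1] = MODIFIER_ORDINALS[ch]
        else:
            bases.append(ch)
            ordinals.append(0)
    return "".join(bases), tuple(ordinals)


def sort_strings(strings):
    """Sort by base characters first, then by ordinal values."""
    return sorted(strings, key=sort_key)


words = ["かく", "がき", "かき"]
print(sorted(words))        # ['かき', 'かく', 'がき'] (plain code-point order)
print(sort_strings(words))  # ['かき', 'がき', 'かく']
```

In the plain code-point sort, the dakuten on the first character of がき outweighs the difference between き and く in the second position. With the two-level key, the accent is consulted only after the base characters compare equal.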
REFERENCES:
patent: 4587628 (1986-05-01), Archer
patent: 4873625 (1989-10-01), Archer
patent: 4939639 (1990-07-01), Lee
patent: 5615366 (1997-03-01), Hansen
patent: 5926787 (1999-07-01), Bennett
Barker Guy
Boone Daniel
Shields Kevin Timothy
Shih Yung-Ho
Coleman Eric
Lee & Hayes PLLC
Microsoft Corporation