System and method for sorting character strings containing...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06243701

ABSTRACT:

TECHNICAL FIELD
This invention relates to systems and methods for sorting character strings (e.g., words and names), and particularly, to sorting character strings that contain combinations of accented and unaccented characters.
BACKGROUND
Conventional sorting algorithms are designed to sort character strings (e.g., words, phrases, names, etc.) alphabetically according to the characters within the strings. However, in some languages, non-character symbols or marks are often added to characters to modify the pronunciation of the characters or the string as a whole. One common type of pronunciation modifier is an accent. Accents are common in many non-English languages, such as Danish, Latin, German, and Japanese.
Computerized sorting routines may mishandle character strings that contain a combination of accented and unaccented characters. Consider the Japanese case. The Japanese language includes three character sets: Kanji, Hiragana, and Katakana. The latter two character sets, Hiragana and Katakana, are collectively known as Kana characters. Kana characters include special accented characters known as “dakuten” and “handakuten” characters.
In each of the Hiragana and Katakana character sets, there are twenty dakuten characters and five handakuten characters. A dakuten character appears identical to its companion unaccented Kana character except for a small double-slash accent in the upper right-hand corner of the character. A handakuten character appears identical to one of five of the dakuten characters except that a small circle accent replaces the double-slash accent.
Conventional sorting routines are effective at sorting Kanji-only character strings and Kana-only character strings. However, problems arise when Kanji and Kana characters are mixed in a string. The sorting routines give more weight to differences between Kanji characters in two character strings than to differences involving dakuten and handakuten characters. As a result, the routines often order strings incorrectly, in a way that does not reflect how such character strings would appear in a Japanese dictionary or telephone book.
Accordingly, there is a need to improve processes for sorting accented characters. In the Japanese case, the goal is to sort the strings identically to how they would be listed in a Japanese dictionary or telephone book.
SUMMARY
This invention concerns a technique for sorting character strings containing characters that are either unmodified or modified by one or more pronunciation modifiers (e.g., accents). The technique involves creating an expanded character string containing the characters in their base form (without the pronunciation modifiers) and ordinal values indicating whether the base characters are unmodified or modified with one of the one or more pronunciation modifiers. The process forms the base characters by removing the pronunciation modifiers from the character string. Ordinal values are then assigned to corresponding ones of the base characters, whereby the ordinal values differentiate among the base characters that are unmodified and those that are modified. The ordinal values also differentiate among the base characters that are modified by different pronunciation modifiers. The process concatenates the base characters and their corresponding ordinal values to form the expanded character string.
Once the character strings are expanded, the process sorts the expanded character strings. The process first sorts the strings according to the base characters and secondly according to the ordinal values.
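The two-level ordering described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `expand` and `sort_strings`, the ordinal values (0, 1, 2), and the use of Unicode NFD decomposition to separate a Kana character from its dakuten or handakuten mark are all assumptions made for illustration. The sketch also keeps the base characters and ordinal values as a two-part tuple rather than a single concatenated string, which compares the same way (base characters first, ordinal values second).

```python
import unicodedata

# Unicode combining marks for the two Kana accents.
DAKUTEN = "\u3099"     # combining voiced sound mark (double slash)
HANDAKUTEN = "\u309A"  # combining semi-voiced sound mark (small circle)

# Hypothetical ordinal values: 0 = unmodified, 1 = dakuten, 2 = handakuten.
ORDINALS = {DAKUTEN: 1, HANDAKUTEN: 2}


def expand(s):
    """Return (base characters, ordinal values) for one string.

    NFD decomposition splits an accented Kana character into its base
    character followed by a combining mark. Characters outside Kana are
    passed through in whatever decomposed form NFD produces.
    """
    bases = []
    ordinals = []
    for ch in unicodedata.normalize("NFD", s):
        if ch in ORDINALS and bases:
            # The mark modifies the base character that precedes it.
            ordinals[-1] = ORDINALS[ch]
        else:
            bases.append(ch)
            ordinals.append(0)
    return ("".join(bases), tuple(ordinals))


def sort_strings(strings):
    """Sort by base characters first, then by ordinal values."""
    return sorted(strings, key=expand)


if __name__ == "__main__":
    print(sort_strings(["かく", "がき"]))
    print(sort_strings(["ぱぱ", "ばば", "はば", "はは"]))
```

In the first call, a plain code-point comparison would place かく before がき because が has a higher code point than か. With the expanded key, the base forms かき and かく are compared first, so がき sorts ahead of かく, and the accent only breaks ties between strings whose base characters are identical.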


REFERENCES:
patent: 4587628 (1986-05-01), Archer
patent: 4873625 (1989-10-01), Archer
patent: 4939639 (1990-07-01), Lee
patent: 5615366 (1997-03-01), Hansen
patent: 5926787 (1999-07-01), Bennett
