Reexamination Certificate
1998-10-07
2002-03-12
Edouard, Patrick N. (Department: 2644)
Data processing: speech signal processing, linguistics, language
Linguistics
Natural language
C707S793000
active
06356866
ABSTRACT:
FIELD OF THE INVENTION
This invention relates in general to inputting text of an Asian language for operation by a program module, such as a word processing program, and in particular to using an Input Method Editor (IME) to convert an input string representing text of the Asian language into the proper characters for that language.
BACKGROUND OF THE INVENTION
Providing text to a program module, such as a word processing program, is straightforward when a written language has one small character set. For example, the English language system uses twenty-six alphabet characters. Typical keyboards for conventional desktop computers have approximately 101 keys, so each English language alphabet character is assigned to a different key. To enter a word into an electronic document, an author depresses the keys that correspond to the letters of the words. The keystrokes are sent from the keyboard to the word processing program running on the computer.
In contrast to the English language system, some language systems, including East Asian languages, such as Japanese, Chinese, and Korean, have significantly more characters than there are keys on a keyboard. For example, the Japanese language system uses thousands of pictographic, Chinese-derived Kanji characters. The large number of Kanji characters precludes assigning each Kanji character to a different key. The process is further complicated because Japanese text also incorporates three other character sets. The most common is Hiragana, a character set of 46 phonetic syllable characters. Katakana (46 phonetic syllable characters) and Romaji (the 26-character Latin alphabet) are used for words whose origins are neither Japanese nor Chinese. Thus, Japanese computer users require front-end input processing to select the desired character from the appropriate character set for entry into an electronic document. Similarly, other East Asian language computer users, such as a Chinese user, also require front-end input processing to support the entry of characters into an electronic document.
Focusing on electronic document processing issues for Japanese users, typists can work modally, switching from character set to character set and specifying characters by a series of one or more keystrokes. However, the sheer size of the Kanji character set makes this approach impractical for typists to master. Instead, typists use a front-end processor, commonly known as an Input Method Editor (IME), to produce Japanese text from phonetic input. Typically, these front-end input processors convert Romaji alphabet strings into their sound-alike kana (Hiragana and/or Katakana) characters, or accept text directly entered in a kana character set, and then process the kana into Japanese text in a separate step.
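The first stage described above, converting a Romaji string into sound-alike kana, can be illustrated with a minimal sketch. The mapping table below is a small hypothetical subset chosen for illustration, and the greedy longest-match strategy is an assumption, not a description of any particular IME.

```python
# Hypothetical, deliberately small Romaji-to-Hiragana table.
# A real front-end processor would cover the full syllabary.
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "n": "ん",
}
MAX_KEY_LEN = max(len(key) for key in ROMAJI_TO_HIRAGANA)


def romaji_to_hiragana(text):
    """Convert Romaji to Hiragana by greedy longest match.

    Letters that do not yet form a known syllable are passed through
    unchanged, so a lone consonant stays as typed.
    """
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible syllable first, then shorter ones.
        for size in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                i += size
                break
        else:
            # No syllable starts here; keep the raw letter.
            out.append(text[i])
            i += 1
    return "".join(out)
```

The resulting kana string is only the intermediate form; turning it into Kanji-bearing Japanese text is the separate, error-prone step discussed next.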
Japanese IME conversion is error-prone for two main reasons: homophones and ambiguous word breaks. First, Japanese, like English, contains words that sound alike and might even be appropriate in the same context; for an English example, “I want these two” and “I want these too.” Second, Japanese typists typically do not delimit words; the IME must decide how to group the kana characters into words. Because of this possibility for conversion error, the IME must allow the user to choose among alternate conversions after she has proofread the IME's conversion.
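The word-break ambiguity can be made concrete with a short sketch that enumerates every way an undelimited kana string can be split into words from a lexicon. The lexicon and the example string are illustrative assumptions; a production converter would rank the candidates rather than merely list them.

```python
def segmentations(kana, lexicon):
    """Return every split of `kana` into words drawn from `lexicon`."""
    if not kana:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(kana) + 1):
        word = kana[:end]
        if word in lexicon:
            # Extend each segmentation of the remainder with this word.
            for rest in segmentations(kana[end:], lexicon):
                results.append([word] + rest)
    return results


# Hypothetical lexicon for illustration only.
LEXICON = {"ここで", "ここでは", "はきもの", "きもの", "を", "ぬぐ"}

for words in segmentations("ここではきものをぬぐ", LEXICON):
    print(" / ".join(words))
```

With this lexicon the same ten kana characters split two ways, one reading containing はきもの (footwear) and the other containing きもの (kimono), which is why the converter cannot rely on the typist's input alone to place word breaks.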
From a user's perspective, the traditional method for Japanese IME operation involves three basic steps. First, the user types a phonetic phrase, in kana or Romaji. This phrase is typically very short because the typist knows that shorter phrases are more successfully converted. Second, the user stops typing and hits the “convert” key. Third, the user proofreads the conversion.
If the conversion is inaccurate, the user can depress the convert key again. The IME reconverts to the next most likely character set. If this is still not the desired character set, the user hits the convert key a third time. On the third conversion attempt, the IME presents a prioritized list of possible conversions. If the desired conversion is absent from the list, the user might manually select desired Japanese pictographs using another conversion mechanism. Once satisfied, the user approves the conversion and returns to typing. The converted text is then given “determined” status, i.e., the input string is discarded and the converted text is maintained.
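The convert-key cycle described above can be sketched as a small state machine. The class name, method names, and the ranked candidate list are hypothetical; the sketch only mirrors the sequence in the text: best guess, next best guess, then the full prioritized list, followed by approval that discards the input string.

```python
class ConvertKeySession:
    """Models repeated presses of the convert key for one phonetic phrase."""

    def __init__(self, reading, ranked_candidates):
        if not ranked_candidates:
            raise ValueError("at least one candidate conversion is required")
        self.reading = reading                    # phonetic input string
        self.candidates = list(ranked_candidates) # most likely first
        self.presses = 0
        self.determined = None

    def press_convert(self):
        """First press: best candidate. Second: next best. Third and later: full list."""
        self.presses += 1
        if self.presses <= 2:
            index = min(self.presses, len(self.candidates)) - 1
            return self.candidates[index]
        return list(self.candidates)

    def approve(self, choice):
        """Give the text 'determined' status: keep the conversion, drop the input."""
        self.determined = choice
        self.reading = None  # the input string is discarded
        return self.determined


# Illustrative homophones that share the reading きしゃ.
session = ConvertKeySession("きしゃ", ["記者", "汽車", "貴社"])
print(session.press_convert())  # first guess
print(session.press_convert())  # next most likely
print(session.press_convert())  # prioritized list
```

Note that once `approve` runs, the reading is gone, so no alternates can be generated for that text afterward. This is the limitation of "determined" text in the traditional model.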
This IME model has two main drawbacks: reduced typing speed and increased learning time. Speed is compromised because the typist must use extra keystrokes to convert text. Additionally, the rhythm of inputting characters is broken because the typist must proofread at each conversion, or lose the opportunity to choose among alternate conversions. Learning time is increased because prior IME systems typically require user training and experience to gain optimum performance from the IME.
The “IME '97” front-end input processor marketed by Microsoft Corporation of Redmond, Washington offers an improved solution. With this option, text is automatically converted when the IME detects a viable phrase, and automatically determined if the user continues typing for several lines without converting. However, as in the traditional IME model described above, alternate conversions are unavailable for determined text.
Accordingly, there is a need in the art for an IME method that operates as an automated background process and avoids the editing difficulties of “determined” text. There is a further need for a background input processor for converting kana to Japanese text and for generating alternate conversions for converted text positions to support efficient error correction.
SUMMARY OF THE INVENTION
Generally described, the present invention meets the needs of Asian computer users for both background text processing and convenient and flexible error corrections. An Input Method Editor (IME) can convert an input string representing text of an East Asian language, such as Japanese, Chinese or Korean, into the proper characters for that language. The present invention is equally applicable to other large-character-set languages comprising nonphonetic characters.
The present invention provides a computer-implemented method for converting phonetically-coded input into the proper characters of a selected language for use by a program module, such as a word processor, running on a computer system. The input string is converted into a language text string automatically, i.e., without explicit conversion events prompted by the user.
The present invention also can support a reconversion operation to address inaccurate text conversions. For example, when two or more distinct phrases contain the same phonetic syllables, the automatic conversion may produce an incorrect section of text. The user may correct these conversion mistakes by accessing alternate conversions of any section of text at any time. When text is selected for reconversion, a corresponding all-phonetic string is identified. This phonetic string is used to generate the list of alternate conversions for the selected text. To produce a corrected conversion, the user may select among the alternate conversions provided, or perform a manual conversion by explicitly selecting characters.
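One way to picture the reconversion operation is to keep the phonetic string alongside each converted segment, so that any later selection can be mapped back to an all-phonetic string. The sketch below is a minimal illustration under that assumption; the `Segment` structure, the `ALTERNATES` dictionary, and the function names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    reading: str  # phonetic (kana) string retained after conversion
    surface: str  # converted text as it appears in the document


# Hypothetical alternate-conversion dictionary, keyed by phonetic string.
ALTERNATES = {
    "きしゃ": ["記者", "汽車", "貴社", "帰社"],
    "はし": ["橋", "箸", "端"],
}


def alternates_for_selection(segments, start, end):
    """Map the selected characters [start, end) back to their phonetic
    string and look up alternate conversions for it."""
    readings = []
    pos = 0
    for seg in segments:
        seg_end = pos + len(seg.surface)
        if pos < end and seg_end > start:  # segment overlaps the selection
            readings.append(seg.reading)
        pos = seg_end
    reading = "".join(readings)
    return reading, ALTERNATES.get(reading, [])


def apply_alternate(segments, start, end, choice):
    """Replace the selected segments with `choice`, keeping the phonetic
    string so the text can be reconverted again later.
    Assumes the selection overlaps at least one segment."""
    before, after, readings = [], [], []
    pos = 0
    for seg in segments:
        seg_end = pos + len(seg.surface)
        if seg_end <= start:
            before.append(seg)
        elif pos >= end:
            after.append(seg)
        else:
            readings.append(seg.reading)
        pos = seg_end
    return before + [Segment("".join(readings), choice)] + after


document = [Segment("きしゃ", "汽車"), Segment("の", "の"), Segment("はし", "橋")]
reading, options = alternates_for_selection(document, 3, 4)
document = apply_alternate(document, 3, 4, options[1])
print("".join(seg.surface for seg in document))
```

Because each corrected segment still carries its phonetic string, the same selection can be reconverted at any later time, in contrast to text whose input string has been discarded.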
For an IME system compatible with the Japanese language, phonetically-coded Japanese character strings are typically entered in Romaji (the same character set used by English) and immediately converted to kana, usually Hiragana. For example, a user typing the letter “k” will see “k” displayed on the screen of a display device. When she follows “k” with “a”, forming the syllable “ka”, the corresponding kana character replaces the “k” on the user's display device. The user sees a constant shift of Romaji to kana characters on the display device. The phonetic input in its intermediate, pre-conversion state will be referred to as a phonetically-coded string, such as a kana string or kana characters. The invention, however, is also applicable to non-Romaji phonetic input methods. For example, voice recognition software an
Oliver David C.
Pratley Christopher H.
Rucker Erik J.
Urata Kentaro
Edouard Patrick N.
Merchant & Gould
Microsoft Corporation