Chinese character conversion apparatus using syntax information

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


C704S258000, C704S231000, C704S270000, C704S008000, C704S009000, C704S003000, C707S793000

active

06587819

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the art of Chinese character conversion, and more particularly to a Chinese character conversion apparatus using syntax information, which converts a phonogram string into Chinese characters by utilizing attribute information related to the part of speech of each word.
2. Description of the Related Art
Ten thousand or more Chinese characters are used in documents written in Chinese. In computer processing of the Chinese language, including Chinese word processors, the most important problem is how a document creator or other user can input or convert Chinese characters accurately and at high speed. Conventional means for inputting intended Chinese characters into a conversion apparatus include speech recognition, character recognition, a keyboard and the like. Since input by means of the keyboard is the most reliable, the keyboard has been widely put into practical use.
Methods for inputting Chinese characters using the keyboard fall into two groups. One uses the reading (pronunciation) of Chinese characters, and the other uses the shape of Chinese characters. In the input method using the shape, the input rules must be registered in advance, and registering them takes a considerable amount of time. Furthermore, users need a long time to become accustomed to the operation. On the other hand, the input method using the reading of Chinese characters has also been widely employed in Japanese word processors. This method is natural, and its operation is easy to learn. It is therefore expected that the reading input method will become the mainstream Chinese character input method in the future. The present invention relates to a Chinese character conversion apparatus which employs the reading input method.
For example, Taiwanese Patent Publication No. 089476 discloses a Chinese character conversion apparatus using a reading input method according to the prior art. FIG. 6 is a diagram showing the structure of this Chinese character conversion apparatus.
In FIG. 6, an input section 100 inputs phonograms such as pinyin, zhuyin, Roman letters and the like, which the creator of a Chinese document intends to convert into Chinese characters. The input section 100 can accept input of any length (any number of phonograms). A word dictionary 180 stores phonogram strings and the words into which those phonogram strings are to be converted. An NCHAR register 140 stores the number of syllables of the input phonogram string.
A PTR register 120 and an NP register 130 are used when the phonogram strings are converted into words. The PTR register 120 stores the position in the input phonogram string from which the conversion into Chinese characters starts. The NP register 130 stores the conversion word length used when the input phonogram string is converted into a word, that is, the number of Chinese characters or syllables which constitute the word (in Chinese, one Chinese character has one syllable in principle).
A comparator 150 controls a conversion controller such that the value of the NP register 130 is decreased by one after conversion processing has been completed for words having a certain length (a certain number of Chinese characters), and conversion into Chinese characters is then performed for words whose length is shorter by one character. Longer words are thus converted with priority.
The conversion controller 160 sequentially shifts the position set in the PTR register 120 backward from the initial position of the input phonogram string and verifies, over the number of Chinese characters or syllables set in the NP register 130 as the length of the word to be converted, whether any of those syllables has already been converted into a Chinese character. If the conversion has not yet been carried out and a corresponding word is registered in the dictionary 180, the controller 160 converts the syllable string into the corresponding word in the dictionary 180.
A dictionary searching section 170 searches the dictionary 180 by using, as a key, a syllable string sent from the conversion controller 160. An output section 190 outputs the result of the conversion carried out by the conversion controller 160.
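The prior-art procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited publication: the function name `convert_longest_first`, the `max_word_len` parameter, and the toy dictionary are hypothetical, "shifting backward" is read as moving from the head of the input toward its tail, and each word is assumed to have exactly one character per syllable.

```python
# Hypothetical toy dictionary: tuple of syllables -> word (one character per syllable).
TOY_DICTIONARY = {
    ("zhong", "wen"): "中文",
    ("shu", "ru"): "輸入",
    ("zhong",): "中",
    ("ru",): "入",
}


def convert_longest_first(syllables, dictionary, max_word_len=4):
    """Convert syllables to characters, trying longer words before shorter ones."""
    nchar = len(syllables)        # plays the role of the NCHAR register
    converted = [None] * nchar    # one slot per syllable; None means "not yet converted"

    # np_len plays the role of the NP register: it is decreased by one per pass.
    for np_len in range(min(max_word_len, nchar), 0, -1):
        # ptr plays the role of the PTR register: the start position of the candidate word.
        for ptr in range(nchar - np_len + 1):
            span = range(ptr, ptr + np_len)
            if any(converted[i] is not None for i in span):
                continue          # part of this span was converted in an earlier step
            word = dictionary.get(tuple(syllables[ptr:ptr + np_len]))
            if word is not None:
                for i, ch in zip(span, word):
                    converted[i] = ch

    # Syllables with no dictionary match are left as typed.
    return "".join(ch if ch is not None else syl
                   for ch, syl in zip(converted, syllables))


print(convert_longest_first(["zhong", "wen", "shu", "ru"], TOY_DICTIONARY))
```

In this sketch the one-character entries for "zhong" and "ru" are never used for the sample input, because the two-character words claim those syllables in an earlier pass.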
In the Chinese character conversion apparatus described above, however, the correct conversion rate is about 96%. The remaining 4% of erroneous conversions is caused by unregistered words (40.2%), mistakes in word boundary detection (8.0%), erroneous selection of homonymic characters and words (33.9%), broken sound characters and tone conversion, and the like. The problems of word boundary detection and of selecting homonymic characters and words are the most difficult to solve.
For this reason, it is desired to implement a Chinese character conversion apparatus using syntax information which can prevent the erroneous conversion caused by mistakes in word boundary detection and by the erroneous selection of homonymic characters and words as described above. The present invention is provided to solve these problems.
The results of a survey of the frequency in use of words in Taiwan in 1985 (various fields, 1,800,000 characters in total) are shown below.
TABLE 1
                               Word of 1   Word of 2   Word of 3   Word of 4   Word of 5 or
                               character   chars.      chars.      chars.      more chars.    Total
Number of word uses  Quantity  845356      451048      12274       5506        220            1314404
                     %         64.3        34.3        0.9         0.4         0.0            100.0
Quantity of words    Quantity  3751        22941       2374        2010        83             31159
                     %         12.0        73.6        7.6         6.4         0.2            100.0
In terms of the number of distinct words, words having two or more characters account for 88%, and words having one character account for 12%. In terms of the number of word uses (frequency in use), however, words having two or more characters account for only 35.7%, and words having one character account for 64.3%. That is, words having two or more characters outnumber one-character words in the vocabulary, whereas one-character words are used more often. In fact, most of the dummy words of the Chinese language which have a high frequency in use (the stem of a word, the tail of a word, a postpositional particle, a constant particle, a pronoun, an ordinal number particle, an adverb, a continuation particle, a prepositional particle, an interjection) are composed of one character. Since one-character words are absorbed into longer words under the longest match rule of the Chinese character conversion apparatus described above, they cannot be converted correctly.
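The percentages quoted from Table 1 can be rechecked from the raw counts. The short sketch below uses the counts as printed in the table; the function names are illustrative only. Integer truncation to one decimal place is used because it reproduces the printed per-column percentages, which suggests (as an observation, not a statement from the source) that the table values were truncated rather than rounded.

```python
# Counts copied from Table 1, ordered: 1, 2, 3, 4, and 5-or-more characters.
WORD_USES = [845356, 451048, 12274, 5506, 220]
DISTINCT_WORDS = [3751, 22941, 2374, 2010, 83]


def truncated_percentages(counts):
    """Share of each column in percent, truncated to one decimal place."""
    total = sum(counts)
    return [(c * 1000 // total) / 10 for c in counts]


def multi_character_share(counts):
    """Share (in percent) of words having two or more characters, rounded to one decimal."""
    total = sum(counts)
    return round(100 * (total - counts[0]) / total, 1)


print(sum(WORD_USES), truncated_percentages(WORD_USES))
print(sum(DISTINCT_WORDS), truncated_percentages(DISTINCT_WORDS))
print(multi_character_share(DISTINCT_WORDS), multi_character_share(WORD_USES))
```

The totals and the 88% and 35.7% figures in the text follow directly from these counts.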
For this reason, word boundary detection frequently yields erroneous results. Moreover, homonymic characters are also frequently selected incorrectly under the rule which selects homonymic characters based on frequency in use, or under the rule which converts a preceding word with priority (there are words having the same reading which can be converted either before or after).
In consideration of the above-mentioned problems, it is an object of the present invention to provide a Chinese character conversion apparatus using syntax information in which a speech part attribute (a noun, a verb and the like) is given to each word stored in a dictionary, and which verifies and corrects the erroneous selection of homonymic characters and words in accordance with the retrieval of compound characters.
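To illustrate why a speech part attribute can help with homonym selection, the sketch below contrasts a frequency-only choice with a choice that also checks the part of speech of the preceding word. Everything in it is an assumption made for illustration: the candidate table, the frequency values, the part-of-speech labels, and the single rule "after a verb, prefer the complement particle" are hypothetical and are not taken from the apparatus described here, whose actual verification rules are not given in this portion of the text.

```python
# Hypothetical candidates: reading -> list of (character, part of speech, relative frequency).
CANDIDATES = {
    "pao": [("跑", "verb", 50)],
    "kuai": [("快", "adjective", 60)],
    "de": [("的", "modifier particle", 100), ("得", "complement particle", 20)],
}


def pick_by_frequency(reading):
    """Return the candidate with the highest (hypothetical) frequency."""
    return max(CANDIDATES[reading], key=lambda cand: cand[2])


def convert(readings, use_pos_check):
    """Convert readings one by one, optionally overriding the choice after a verb."""
    output = []
    previous_pos = None
    for reading in readings:
        char, pos, _ = pick_by_frequency(reading)
        if use_pos_check and previous_pos == "verb":
            # Illustrative rule only: after a verb, prefer a complement particle if one exists.
            for cand_char, cand_pos, _ in CANDIDATES[reading]:
                if cand_pos == "complement particle":
                    char, pos = cand_char, cand_pos
                    break
        output.append(char)
        previous_pos = pos
    return "".join(output)


print(convert(["pao", "de", "kuai"], use_pos_check=False))
print(convert(["pao", "de", "kuai"], use_pos_check=True))
```

With frequency alone, the reading "de" always resolves to the most frequent candidate; with the part-of-speech check, the character after the verb changes to the complement particle.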
SUMMARY OF THE INVENTION
In order to achieve the above object, the present invention provides a Chinese character conversion apparatus using syntax information comprising a compound character dictionary, a word dictionary, a syllable cut out section, a dictionary searching section, a compound character detecting section, a speech part attribute processing section, and a conversion controller.
The compound character dictionary stores phonetic symbols of Chinese compound charact
