Patent
1996-06-28
1999-10-05
Thomas, Joseph
Data processing: speech signal processing, linguistics, language
Linguistics
Natural language
707/531, G06F 17/27
active
059638931
ABSTRACT:
A word breaking facility operates to identify words within a Japanese text string. The word breaking facility performs morphological processing to identify postfix bound morphemes and prefix bound morphemes. The word breaking facility also performs opheme matching to identify likely stem characters. A scoring heuristic is applied to determine an optimal analysis that includes a postfix analysis, a stem analysis, and a prefix analysis. The morphological analyses are stored in an efficient compressed format to minimize the amount of memory they occupy and maximize the analysis speed. The morphological analyses of postfixes, stems, and prefixes are performed in a right-to-left fashion. The word breaking facility may be used in applications that demand identity of selection granularity, autosummarization applications, content indexing applications, and natural language processing applications.
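The right-to-left analysis described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the lexicons, the weights, the scoring rule (stem length plus small bonuses), and the names `POSTFIXES`, `PREFIXES`, `STEMS`, `Analysis`, and `analyze` are all assumptions made for illustration. Opheme matching and the compressed storage format are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicons; the empty string means "no bound morpheme".
POSTFIXES = {"": 0.0, "を": 1.0, "は": 1.0, "した": 1.5}
PREFIXES = {"": 0.0, "お": 0.5, "ご": 0.5}
STEMS = {"茶", "勉強", "学生"}


@dataclass(frozen=True)
class Analysis:
    prefix: str
    stem: str
    postfix: str
    score: float


def analyze(phrase: str) -> Optional[Analysis]:
    """Return the highest-scoring prefix/stem/postfix split, or None."""
    best = None
    # Right-to-left: pick the postfix boundary first, scanning from the end.
    for j in range(len(phrase), 0, -1):
        postfix = phrase[j:]
        if postfix not in POSTFIXES:
            continue
        # Then pick the prefix boundary; the span in between is the stem.
        for i in range(j - 1, -1, -1):
            prefix, stem = phrase[:i], phrase[i:j]
            if prefix not in PREFIXES or stem not in STEMS:
                continue
            # Assumed heuristic: longer stems and known affixes score higher.
            score = len(stem) + POSTFIXES[postfix] + PREFIXES[prefix]
            if best is None or score > best.score:
                best = Analysis(prefix, stem, postfix, score)
    return best


if __name__ == "__main__":
    print(analyze("お茶を"))
    print(analyze("勉強した"))
```

With these toy tables, `analyze("お茶を")` splits the phrase into the prefix `お`, the stem `茶`, and the postfix `を`. A phrase with no known stem returns `None`.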
REFERENCES:
patent: 5268840 (1993-12-01), Chang et al.
patent: 5477448 (1995-12-01), Golding et al.
patent: 5485372 (1996-01-01), Golding et al.
patent: 5497319 (1996-03-01), Chong et al.
patent: 5521816 (1996-05-01), Roche et al.
patent: 5528491 (1996-06-01), Kuno et al.
patent: 5535121 (1996-07-01), Roche et al.
patent: 5537317 (1996-07-01), Schabes et al.
patent: 5778361 (1998-07-01), Nanjo et al.
patent: 5799269 (1998-08-01), Schabes et al.
Abe, Masahiro et al., "A Kana-Kanji Translation system for Non-Segmented Input Sentences Based on Syntactic and Semantic Analysis," Zeitschrift fuer Werkstofftechnik--Journal of Materials Technology. Materials Technology and Testing, Aug. 25, 1986, pp. 280-285.
Nobuyasu Itoh, Japanese Language Model Based on Bigrams and its Application to On-Line Character Recognition, Pattern Recognition, vol. 28, No. 2, Feb. 1, 1995, pp. 135-140.
Takeda, Koichi et al. "CRITAC-An Experimental System for Japanese Text Proofreading," IBM Journal of Research and Development, vol. 2, No. 2, Mar. 1988, pp. 201-216.
Teller, Virginia et al. "A Probabilistic Algorithm for Segmenting Non-Kanji Japanese Strings," Proceedings Tenth National Conference on Artificial Intelligence, Jul. 12-16, 1992, vol. 1, Jul. 31, 1994, pp. 742-747.
Itoh et al., "Sub-Phonemic Optimal Path Search for Concatenative Speech Synthesis," 4th European Conference on Speech Communication and Technology, Madrid, Sep., 1995, pp. 577-580.
Kurohashi et al., "Improvements of Japanese Morphological Analyzer JUMAN," Proceedings of SNLR, 1994, pp. 22-28.
Hisamitsu, Toru, and Yoshihiko Nitta, "An Efficient Treatment of Japanese Verb Inflection for Morphological Analysis," Coling 1994 Proceedings, vol. 1, pp. 194-200.
Halstead, Jr. Patrick H.
Suzuki Hisami
Microsoft Corporation
Thomas Joseph
Identification of words in Japanese text by a computer system