Method for segmenting non-segmented text using syntactic parse

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Reexamination Certificate

Details

active

06968308

ABSTRACT:
Embodiments of the present invention provide a method and apparatus for segmenting text by providing orthographic and inflectional variations to a syntactic parser. Under the present invention, possible segments are first identified in the sequence of characters. At least two of the identified segments overlap each other. For at least one of the segments, an alternative sequence of characters is identified. In some cases, this alternative sequence is formed through inflectional morphology, which identifies a different lexical form for a word identified by the segment. In some cases, the alternative sequence represents an orthographic variant of a word identified by the segment. The identified segments and the alternative segments are then passed to a syntactic analyzer, which produces one or more syntactic parses. The segments found in the resulting parses represent the segmentation of the input sequence of characters.
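The pipeline described in the abstract can be illustrated with a small sketch. This is not the patented implementation: the lexicon, the table of alternative forms, and the set of accepted tag sequences are all hypothetical stand-ins, and the "syntactic analyzer" here is reduced to a lookup of part-of-speech sequences rather than a real parser. The sketch only shows the flow: find overlapping candidate segments, attach an alternative form (for example, an inflectional base form) to some of them, and keep the segmentations that the analyzer accepts.

```python
from typing import Iterator, List, NamedTuple


class Segment(NamedTuple):
    start: int    # index of the first character of the segment
    end: int      # index one past the last character
    surface: str  # characters as they appear in the input
    form: str     # alternative form if one exists, else the surface
    tag: str      # part-of-speech tag from the toy lexicon


# Hypothetical toy data, invented for illustration only.
LEXICON = {"the": "Det", "cat": "N", "cats": "N", "sat": "V", "at": "P"}
# Alternative character sequences; orthographic variants could be
# stored in the same table alongside these inflectional base forms.
ALTERNATIVES = {"cats": "cat", "sat": "sit"}
# Stand-in for a syntactic analyzer: tag sequences that count as a parse.
ACCEPTED_TAG_SEQUENCES = {("Det", "N", "V")}


def candidate_segments(text: str) -> List[Segment]:
    """Return every lexicon match in the text; matches may overlap."""
    segments = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            surface = text[start:end]
            if surface in LEXICON:
                form = ALTERNATIVES.get(surface, surface)
                segments.append(
                    Segment(start, end, surface, form, LEXICON[surface])
                )
    return segments


def covering_paths(
    segments: List[Segment], length: int, position: int = 0
) -> Iterator[List[Segment]]:
    """Yield each sequence of adjacent segments spanning the whole input."""
    if position == length:
        yield []
        return
    for seg in segments:
        if seg.start == position:
            for rest in covering_paths(segments, length, seg.end):
                yield [seg] + rest


def segment(text: str) -> List[List[Segment]]:
    """Keep only the covering paths that the toy analyzer accepts."""
    segments = candidate_segments(text)
    return [
        path
        for path in covering_paths(segments, len(text))
        if tuple(s.tag for s in path) in ACCEPTED_TAG_SEQUENCES
    ]
```

For the unsegmented input "thecatsat", the candidates include the overlapping pairs "cat"/"cats" and "cats"/"sat". Two sequences cover the whole input, "the cat sat" and "the cats at", and only the first matches an accepted tag sequence, so it is returned as the segmentation, with "sat" carrying the alternative form "sit".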

REFERENCES:
patent: 3969700 (1976-07-01), Bollinger et al.
patent: 4942526 (1990-07-01), Okajima et al.
patent: 5029084 (1991-07-01), Morohasi et al.
patent: 5168533 (1992-12-01), Kato et al.
patent: 5299125 (1994-03-01), Baker et al.
patent: 5305396 (1994-04-01), Betts et al.
patent: 5448474 (1995-09-01), Zamora
patent: 5469354 (1995-11-01), Hatakeyama et al.
patent: 5778361 (1998-07-01), Nanjo et al.
patent: 5806021 (1998-09-01), Chen et al.
patent: 5835888 (1998-11-01), Kanevsky et al.
patent: 5917941 (1999-06-01), Webb et al.
patent: 5946648 (1999-08-01), Halstead, Jr. et al.
patent: 5963893 (1999-10-01), Halstead, Jr. et al.
patent: 6101492 (2000-08-01), Jacquemin et al.
patent: 6175834 (2001-01-01), Cai et al.
patent: 6311152 (2001-10-01), Bai et al.
patent: 6760695 (2004-07-01), Kuno et al.
patent: WO 98/08169 (1998-02-01), None
patent: WO 99/62001 (1999-12-01), None
Sproat et al., “A Stochastic Finite-State Word-Segmentation Algorithm for Chinese”. Computational Linguistics, vol. 22:3, Sep. 1996.
Yeh et al., “Rule-Based Word Identification For Mandarin Chinese Sentences—A Unification Approach”, for Computer Processing of Chinese & Oriental Languages, vol. 5, No. 2 (Mar. 1991).
Nie et al., “Unknown Word Detection and Segmentation of Chinese Using Statistical and Heuristic Knowledge”, for Communication of COLIPS, vol. 5, Nos. 1 & 2, pp. 47-57 (Dec. 1995).
Teller et al., “A Probabilistic Algorithm for Segmenting Non-Kanji Japanese Strings”, for Natural Language Processing (Jul. 31, 1994).
“Method of Segmenting Texts into Words” for IBM Technical Disclosure Bulletin, vol. 39, No. 11, pp. 115-118 (Nov. 1996).
Chen et al., “Word Identification for Mandarin Chinese Sentences”, Proceedings of the 14th International Conference on Computational Linguistics, pp. 101-107, Nantes, France (Coling '92).
Wu et al., “Chinese Text Segmentation for Text Retrieval: Achievements and Problems”, Journal of the American Society for Information Science, 44(9) : 532-542 (1993).
Chang et al., “A Multiple-Corpus Approach to Recognition of Proper Names in Chinese Texts”, Computer Processing of Chinese and Oriental Languages, vol. 8, No. 1, pp. 75-85 (Jun. 1994).
Sproat et al., “A Stochastic Finite-State Word Segmentation Algorithm for Chinese”, Computational Linguistics, vol. 22, No. 3, pp. 377-404 (1996).
Gan et al., “A Statistically Emergent Approach for Language Processing: Application to Modeling Context Effects in Ambiguous Chinese Word Boundary Perception”, Computational Linguistics, vol. 22, No. 4, pp. 531-553 (1996).
Guo, J., “Critical Tokenization and its Properties”, Computational Linguistics, vol. 23, No. 4, pp. 569-596 (1997).
Yuan et al., “Splitting-Merging Model for Chinese Word Tokenization and Segmentation”, Department of Information Systems & Computer Sciences, National University of Singapore (No date).
Huang et al., “A Quick Method for Chinese Word Segmentation”, IEEE Conf. of Intelligent Processing Systems, pp. 1773-1776 (Oct. 28-31, 1997).
Fan et al., “Automatic Word Identification in Chinese Sentences by the Relaxation Technique”, Computer Processing of Chinese and Oriental Languages, vol. 4, No. 1, pp. 33-56 (Nov. 1988).
