Method and apparatus for improved tokenization of natural language…
Patent
1996-07-19
1999-03-30
Thomas, Joseph
Data processing: speech signal processing, linguistics, language
Linguistics
Natural language
704/8; 707/531; 707/536; G06F 17/27
active
058901035
ABSTRACT:
This invention improves information retrieval by providing a tokenizing apparatus and method that parses natural language text in a manner that increases the throughput of an information retrieval or natural language analysis system. The tokenizer includes a parser that extracts characters from the stream of text, an identifying element for identifying a token formed of characters in the stream of text that include lexical matter, and a filter for assigning tags to those tokens requiring further linguistic analysis. The tokenizer, in a single pass through the stream of text, determines the further linguistic processing suitable to each particular token contained in the stream of text.
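The single-pass idea in the abstract can be illustrated with a small sketch. It is not the patented method: the whitespace-delimited token rule, the edge-punctuation trimming, and the tag names `NUMERIC`, `INTERNAL_PUNCT` and `CAPITALIZED` are all hypothetical choices made for illustration. The scan reads each character of the stream once, keeps only tokens that contain lexical matter (at least one letter or digit), and a filter tags the tokens that would need further linguistic analysis.

```python
from dataclasses import dataclass, field
import string

# Assumption: ASCII punctuation at a token's edges is not part of the token.
EDGE_PUNCT = string.punctuation


@dataclass
class Token:
    text: str
    offset: int                                   # character offset in the stream
    tags: set[str] = field(default_factory=set)   # empty set: no further analysis


def classify(text: str) -> set[str]:
    """Filter step: tag tokens needing further analysis (hypothetical tag names)."""
    tags: set[str] = set()
    if any(ch.isdigit() for ch in text):
        tags.add("NUMERIC")           # e.g. quantities, dates
    if any(not ch.isalnum() for ch in text):
        tags.add("INTERNAL_PUNCT")    # e.g. hyphens, apostrophes, abbreviations
    if text[0].isupper():
        tags.add("CAPITALIZED")       # candidate proper noun or sentence start
    return tags


def tokenize(stream: str) -> list[Token]:
    """Scan the stream once, emitting tagged tokens that contain lexical matter."""
    tokens: list[Token] = []
    start = None  # offset where the current non-whitespace run began

    def emit(end: int) -> None:
        raw = stream[start:end]
        core = raw.strip(EDGE_PUNCT)
        if not any(ch.isalnum() for ch in core):
            return  # no lexical matter: pure punctuation or symbols
        leading = len(raw) - len(raw.lstrip(EDGE_PUNCT))
        tokens.append(Token(core, start + leading, classify(core)))

    for i, ch in enumerate(stream):
        if ch.isspace():
            if start is not None:
                emit(i)
                start = None
        elif start is None:
            start = i
    if start is not None:
        emit(len(stream))
    return tokens
```

For example, `tokenize('The U.S. e-mail cost $5, "really". --')` yields the tokens `The`, `U.S`, `e-mail`, `cost`, `5` and `really`, and drops `--` because it carries no lexical matter. Only `cost` and `really` come back with no tags, so in a sketch like this they could skip the later analysis stages, which is where a throughput gain of the kind the abstract describes would come from. Trimming the final period of `U.S.` shows the limits of the simple edge rule; a fuller tokenizer would consult an abbreviation list.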
REFERENCES:
patent: 4724523 (1988-02-01), Kucera
patent: 4730270 (1988-03-01), Okajima et al.
patent: 4771401 (1988-09-01), Kaufman et al.
patent: 4862408 (1989-08-01), Zamora
patent: 4864501 (1989-09-01), Kucera et al.
patent: 4864502 (1989-09-01), Kucera et al.
patent: 4868750 (1989-09-01), Kucera et al.
patent: 4914590 (1990-04-01), Loatman et al.
patent: 4964044 (1990-10-01), Kumano et al.
patent: 4991094 (1991-02-01), Fagan et al.
patent: 5111398 (1992-05-01), Nunberg et al.
patent: 5224038 (1993-06-01), Bespalko
patent: 5229936 (1993-07-01), Decker et al.
patent: 5243520 (1993-09-01), Jacobs et al.
patent: 5251129 (1993-10-01), Jacobs et al.
patent: 5278980 (1994-01-01), Pedersen et al.
patent: 5282265 (1994-01-01), Rohra Suda et al.
patent: 5331556 (1994-07-01), Black, Jr. et al.
patent: 5383120 (1995-01-01), Zernik
patent: 5423032 (1995-06-01), Byrd et al.
patent: 5475588 (1995-12-01), Schabes et al.
patent: 5523946 (1996-06-01), Kaplan et al.
patent: 5642522 (1997-06-01), Zaenen et al.
patent: 5680628 (1997-10-01), Carus et al.
patent: 5704060 (1997-12-01), Del Monte
patent: 5708825 (1998-01-01), Sotomayor
Brill, Eric, "A Simple Rule-Based Part of Speech Tagger", Third Conf. Applied Natural Lang. Processing, Proceedings of the Conference (1992).
Frakes, W. and Baeza-Yates, R. (eds), Information Retrieval: Data Structures and Algorithms, PTR Prentice-Hall, Inc., ch. 7, 102-130 (1992).
Frakes, W. and Baeza-Yates, R. (eds), Information Retrieval: Data Structures and Algorithms, PTR Prentice-Hall, Inc., ch. 8, 131-151 (1992).
Schwarz, C., "Automatic Syntactic Analysis of Free Text", J. Am. Soc. Info. Sci. 41(6):408-417 (1990).
Lernout & Hauspie Speech Products N.V.