Method and apparatus for improved tokenization of natural langua

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Patent

Details

704/8, 707/531, 707/536, G06F 17/27

active

058901035

ABSTRACT:
This invention improves information retrieval by providing a tokenizing apparatus and method that parses natural language text in a manner that increases the throughput of an information retrieval or natural language analysis system. The tokenizer includes a parser that extracts characters from the stream of text, an identifying element for identifying a token formed of characters in the stream of text that include lexical matter, and a filter for assigning tags to those tokens requiring further linguistic analysis. The tokenizer, in a single pass through the stream of text, determines the further linguistic processing suitable to each particular token contained in the stream of text.
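The single-pass idea described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Token` class, the `tokenize` function, the tag names (`HAS_DIGIT`, `HYPHENATED`, `APOSTROPHE`, `INNER_CAPS`), and the rule for what counts as lexical matter are all assumptions made for illustration. The loop reads each character once, builds tokens from alphanumeric runs, and attaches tags that a later stage could use to decide which tokens need further linguistic analysis.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str                                # characters that make up the token
    start: int                               # offset of the token in the input stream
    tags: set = field(default_factory=set)   # hypothetical hints for later analysis


def tokenize(text):
    """Scan the text once, emitting tokens and tagging them as they are built."""
    tokens = []
    start = None   # offset of the token in progress, or None between tokens
    tags = set()
    for i, ch in enumerate(text):
        # Assumption: letters and digits are lexical matter; a hyphen or
        # apostrophe counts only when it continues a token already in progress.
        if ch.isalnum() or (ch in "'-" and start is not None):
            if start is None:
                start = i
                tags = set()
            # Filter step: record features that may call for further processing.
            if ch.isdigit():
                tags.add("HAS_DIGIT")
            elif ch == "-":
                tags.add("HYPHENATED")
            elif ch == "'":
                tags.add("APOSTROPHE")
            elif ch.isupper() and i > start:
                tags.add("INNER_CAPS")
        elif start is not None:
            # A non-lexical character closes the current token.
            tokens.append(Token(text[start:i], start, tags))
            start = None
    if start is not None:
        # Flush a token that runs to the end of the input.
        tokens.append(Token(text[start:], start, tags))
    return tokens


for token in tokenize("Re-read the 2nd draft."):
    print(token.start, token.text, sorted(token.tags))
```

In this sketch an untagged token such as "the" could skip later stages entirely, while a tagged token such as "Re-read" would be routed to whatever handles hyphenated forms. A trailing hyphen or apostrophe stays attached to its token here; a fuller tokenizer would trim it.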

REFERENCES:
patent: 4724523 (1988-02-01), Kucera
patent: 4730270 (1988-03-01), Okajima et al.
patent: 4771401 (1988-09-01), Kaufman et al.
patent: 4862408 (1989-08-01), Zamora
patent: 4864501 (1989-09-01), Kucera et al.
patent: 4864502 (1989-09-01), Kucera et al.
patent: 4868750 (1989-09-01), Kucera et al.
patent: 4914590 (1990-04-01), Loatman et al.
patent: 4964044 (1990-10-01), Kumano et al.
patent: 4991094 (1991-02-01), Fagan et al.
patent: 5111398 (1992-05-01), Nunberg et al.
patent: 5224038 (1993-06-01), Bespalko
patent: 5229936 (1993-07-01), Decker et al.
patent: 5243520 (1993-09-01), Jacobs et al.
patent: 5251129 (1993-10-01), Jacobs et al.
patent: 5278980 (1994-01-01), Pedersen et al.
patent: 5282265 (1994-01-01), Rohra Suda et al.
patent: 5331556 (1994-07-01), Black, Jr. et al.
patent: 5383120 (1995-01-01), Zernik
patent: 5423032 (1995-06-01), Byrd et al.
patent: 5475588 (1995-12-01), Schabes et al.
patent: 5523946 (1996-06-01), Kaplan et al.
patent: 5642522 (1997-06-01), Zaenen et al.
patent: 5680628 (1997-10-01), Carus et al.
patent: 5704060 (1997-12-01), Del Monte
patent: 5708825 (1998-01-01), Sotomayor
Brill, Eric, "A Simple Rule-Based Part of Speech Tagger", Third Conf. Applied Natural Lang. Processing, Proceedings of the Conference (1992).
Frakes, W. and Baeza-Yates, R. (eds), Information Retrieval Data Structures and Algorithms, PTR Prentice-Hall, Inc., ch. 7, 102-130 (1992).
Frakes, W. and Baeza-Yates, R. (eds), Information Retrieval Data Structures and Algorithms, PTR Prentice-Hall, Inc., ch. 8, 131-151 (1992).
Schwarz, C., "Automatic Syntactic Analysis of Free Text", J. Am. Soc. Info. Sci. 41(6):408-417 (1990).
