Natural language processing system for semantic vector represent

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Patent

Details

707/1, 707/3, 707/101, 707/532, G06F 17/30, G06F 17/20, G06F 17/22

active

058730567

ABSTRACT:
A natural language processing system takes unformatted, naturally occurring text and generates a subject vector representation of that text, which may be an entire document or a part of it, such as its title, a paragraph, a clause, or a sentence. The subject codes are obtained from a lexical database: the subject code or codes for each word in the text are looked up in the database and assigned to the word. The database may be a dictionary or other word resource whose semantic classification scheme designates subject domains. The various meanings or senses of a word may be assigned multiple, different subject codes, and psycholinguistically justified sense disambiguation is used to select the most appropriate subject field code. Preferably, an ordered set of sentence-level heuristics is used, based on the statistical probability or likelihood that a given one of the candidate codes is the most appropriate. The subject codes produce a weighted, fixed-length vector (regardless of the length of the document) that represents the semantic content of the text and may be used for purposes such as information retrieval, categorization of texts, machine translation, document detection, question answering, and, more generally, extracting knowledge from the document. The system has particular utility in classifying documents by their general subject matter and retrieving documents relevant to a query.
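
The pipeline in the abstract (look up subject codes per word, disambiguate, build a weighted fixed-length vector, compare vectors for retrieval) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the four subject codes, the toy `LEXICON`, and the function names are invented for illustration and do not come from the patent. The disambiguation step is a deliberately simple stand-in (pick the candidate code best supported by unambiguous words in the same text), not the patent's ordered set of sentence-level heuristics, and cosine similarity is one common choice for comparing vectors rather than a method the abstract specifies.

```python
from collections import Counter
import math

# Hypothetical subject codes; the vector always has one slot per code,
# so its length does not depend on the length of the text.
SUBJECT_CODES = ["ECONOMICS", "LAW", "MEDICINE", "SPORTS"]

# Toy stand-in for a lexical database: word -> candidate subject codes.
LEXICON = {
    "court": ["LAW", "SPORTS"],  # ambiguous word with two senses
    "judge": ["LAW"],
    "bank": ["ECONOMICS"],
    "loan": ["ECONOMICS"],
    "tennis": ["SPORTS"],
    "doctor": ["MEDICINE"],
}


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    return [w.strip(".,;:!?\"'()").lower() for w in text.split()]


def disambiguate(candidates, context_counts):
    """Choose the candidate code with the most support in the context.

    Simplified assumption: ties fall back to the first listed code.
    """
    return max(candidates, key=lambda code: context_counts[code])


def subject_vector(text):
    """Return a weighted, fixed-length subject vector for the text."""
    words = [w for w in tokenize(text) if w in LEXICON]
    # Context evidence comes from words that carry exactly one subject code.
    context = Counter(LEXICON[w][0] for w in words if len(LEXICON[w]) == 1)
    chosen = Counter(disambiguate(LEXICON[w], context) for w in words)
    total = sum(chosen.values())
    # Weight each code by its share of the coded words (0.0 if none found).
    return [chosen[c] / total if total else 0.0 for c in SUBJECT_CODES]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


legal_doc = subject_vector("The judge entered the court.")
sports_doc = subject_vector("She booked a tennis court.")
query = subject_vector("judge")
```

In this sketch the ambiguous word "court" resolves to LAW next to "judge" and to SPORTS next to "tennis", so `cosine(query, legal_doc)` is higher than `cosine(query, sports_doc)`. Both document vectors have the same length even though the texts could differ arbitrarily in size.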

REFERENCES:
patent: 4358824 (1982-11-01), Glickman et al.
patent: 4495566 (1985-01-01), Dickinson et al.
patent: 4580218 (1986-04-01), Raye
patent: 4803642 (1989-02-01), Muranaga
patent: 4823306 (1989-04-01), Barbic et al.
patent: 4839853 (1989-06-01), Deerwester et al.
patent: 4849898 (1989-07-01), Adi
patent: 4868733 (1989-09-01), Fujisawa et al.
patent: 4972349 (1990-11-01), Kleinberger
patent: 4994967 (1991-02-01), Asakawa
patent: 5020019 (1991-05-01), Ogawa
patent: 5056021 (1991-10-01), Ausborn
patent: 5099426 (1992-03-01), Carlgren et al.
patent: 5122951 (1992-06-01), Kamiya
patent: 5128865 (1992-07-01), Sadler
patent: 5140692 (1992-08-01), Morita
patent: 5146405 (1992-09-01), Church
patent: 5151857 (1992-09-01), Matsui
patent: 5162992 (1992-11-01), Williams
patent: 5168565 (1992-12-01), Morita
patent: 5197005 (1993-03-01), Shwartz et al.
patent: 5237503 (1993-08-01), Bedecarrax et al.
patent: 5285386 (1994-02-01), Kuo
patent: 5297039 (1994-03-01), Kanaegami et al.
patent: 5325298 (1994-06-01), Gallant
patent: 5331556 (1994-07-01), Black, Jr. et al.
patent: 5371807 (1994-12-01), Register et al.
patent: 5418951 (1995-05-01), Damashek
patent: 5541836 (1996-07-01), Church et al.
patent: 5619709 (1997-04-01), Caid et al.
patent: 5675819 (1997-10-01), Schuetze
patent: 5694592 (1997-12-01), Driscoll
Meteer et al., "POST: Using Probabilities in Language Processing," Proc. 12th Intl. Conf. on A.I. vol. 12, Aug. 1991, pp. 960-964.
Liddy et al., "An Intelligent Semantic Relation Assignor: Preliminary Work," Proc. Workshop on Natural Language Learning, IJCAI, Sydney, Australia, 1991, pp. 50-57.
Stephen I. Gallant, "A Practical Approach for Representing Context and for Performing Word Sense Disambiguation Using Neural Network," Neural Computation 3, pp. 293-309, Massachusetts Institute of Technology, 1991.
Yorick Wilks et al., "Providing Machine Tractable Dictionary Tools," Machine Translation, pp. 98-154, Jun. 1990.
Gerard Salton et al., Introduction to Modern Information Retrieval, McGraw-Hill Book Company, pp. 118-155, Apr. 1983.
Ellen M. Voorhees et al., "Vector Expansion in a Large Collection," Siemens Corporate Research, Inc., Princeton, New Jersey, Unknown.
Scott Deerwester et al., "Indexing by Latent Semantic Analysis," Journal of the American Society for Information Science, 41(6), pp. 391-407, 1990.
Hinrich Schutze, "Dimensions of Meaning," Proceedings Supercomputer '92, IEEE, pp. 787-796, Nov. 1992.
Gregory Grefenstette, "Use of Syntactic Context to Produce Term Association Lists for Text Retrieval," 18th Ann Int'l SIGIR '92, ACM, pp. 89-97, Jun. 1992.
Susan T. Dumais, "LSI meets TREC: A Status Report," NIST Special Publication 500-207, The First Text REtrieval Conference (TREC-1), pp. 137-152, Mar. 1993.
Elizabeth D. Liddy et al., "Statistically Guided Word Sense Disambiguation," Proceedings of the AAAI Fall 1992 Symposium on Probabilistic Approach to Natural Language Processing, pp. 98-107, Oct. 1992.
Elizabeth D. Liddy et al., "Use of Subject Field Codes from a Machine-Readable Dictionary for Automatic Classification of Documents," Proceedings of the 3rd ASIS SIG/CR Classification Research Workshop, Pittsburgh, PA, pp. 83-100, Oct. 1992.
Elizabeth D. Liddy et al., "DR-Link's Linguistic Conceptual Approach to Document Detection," Proceedings of Text REtrieval Conference (TREC), 13 pages, Nov. 1992.
Elizabeth D. Liddy et al., "An Overview of DR-Link and its Approach to Document Filtering," Proceedings of the Human Language and Technology Workshop, 5 pages, Mar. 1993.
