Reexamination Certificate
1999-04-28
2003-08-19
Jones, Hugh (Department: 2123)
Data processing: structural design, modeling, simulation, and em
Simulating electronic device or electrical system
Software program
C703S002000, C707S793000, C707S793000, C704S001000, C704S009000
active
06609087
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to data processing systems and, more particularly, to an improved fact recognition system.
BACKGROUND OF THE INVENTION
Conventional fact recognition systems recognize facts contained in input data and populate a data store, like a database, with the recognized facts. As used herein, the term “fact” refers to a relationship between entities, such as people, places, or things. For example, upon receiving the input data “John Smith is the president of XYZ Corp.,” a fact recognition system identifies the fact that the president of XYZ Corp. is John Smith and stores this fact into a database. Thus, fact recognition systems automatically extract facts from input data so a user does not have to read the input data.
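As a rough illustration of what populating a data store with a recognized fact can look like, the sketch below stores the example fact in an in-memory SQLite table. The table name `facts`, its three columns, and the helpers `make_store` and `store_fact` are assumptions made for illustration only; the source does not describe a schema.

```python
import sqlite3

def make_store():
    """Create an in-memory database with a hypothetical facts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (person TEXT, job TEXT, company TEXT)")
    return conn

def store_fact(conn, person, job, company):
    """Insert one (person, job, company) relationship into the facts table."""
    conn.execute(
        "INSERT INTO facts (person, job, company) VALUES (?, ?, ?)",
        (person, job, company),
    )
    conn.commit()

conn = make_store()
# The fact recognized from "John Smith is the president of XYZ Corp."
store_fact(conn, "John Smith", "president", "XYZ Corp.")

# A user can now query the fact instead of reading the input data.
rows = conn.execute(
    "SELECT person FROM facts WHERE job = ? AND company = ?",
    ("president", "XYZ Corp."),
).fetchall()
```

Here `rows` holds the person recorded as president of XYZ Corp., retrieved without re-reading the original sentence.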
To recognize facts, conventional systems utilize rules. An example of one such rule follows:
<person-name> a|the <job-name> of <company-name>
This rule is used to extract the fact that a person holds a particular job at a particular company. These rules are created by knowledge engineers, experts in the field of fact recognition. The knowledge engineers generate a large number of rules, and the system then applies these rules to a stream of input data to recognize the facts contained therein. If any part of the input stream matches a rule, the system extracts the fact and stores it into the database. Although conventional systems provide beneficial functionality by storing facts retrieved from input data, these systems suffer from a number of drawbacks because (1) very few knowledge engineers exist who can create the rules, (2) the development of the systems takes a long time as rule creation is a very tedious and time-consuming task, and (3) the systems are not very accurate in recognizing facts. It is therefore desirable to improve fact recognition systems.
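The rule-matching step described above can be sketched with a regular expression. This is a minimal illustration, not the implementation of any conventional system: the tiny patterns standing in for the `<person-name>`, `<job-name>`, and `<company-name>` slots are hypothetical, and the optional connector ("is" or a comma) before the article is an assumption added so that the example sentence from the text matches.

```python
import re

# Hypothetical stand-ins for the rule's slots; a real system would rely on
# much larger name lists or dedicated recognizers.
PERSON = r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+)"
JOB = r"(?P<job>president|chairman|treasurer)"
COMPANY = r"(?P<company>[A-Z][A-Za-z]* (?:Corp\.|Inc\.))"

# <person-name> a|the <job-name> of <company-name>
# The optional ", " or " is" connector is an assumption for this sketch.
RULE = re.compile(rf"{PERSON}(?:,| is)? (?:a|the) {JOB} of {COMPANY}")

def extract_facts(text):
    """Return one dict per part of the input stream that matches the rule."""
    return [m.groupdict() for m in RULE.finditer(text)]

facts = extract_facts("John Smith is the president of XYZ Corp.")
missed = extract_facts("John Smith became the president of XYZ Corp.")
```

The second call returns no facts because "became" is not covered by the rule, which hints at why a large number of hand-written rules is needed and why accuracy suffers when the input is phrased in a way no rule anticipates.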
DISCLOSURE OF THE INVENTION
In accordance with methods and systems consistent with the present invention, an improved fact recognition system is provided that automatically learns from syntactic language examples and semantic language examples, thus facilitating development of the system. The language examples are rather simplistic and can be provided by a lay person with little training, thus relieving the need for knowledge engineers. Furthermore, the learning performed by the improved fact recognition system results in a collection of probabilities that is used by the system to recognize facts in a typically more accurate manner than conventional systems.
In accordance with methods consistent with the present invention, a method is provided in a data processing system. This method receives syntactic language examples and receives semantic language examples. Furthermore, this method creates a model from both the syntactic language examples and the semantic language examples and uses the model to determine the meaning of a sequence of words.
In accordance with methods consistent with the present invention, a method is provided in a data processing system. This method receives a collection of probabilities that facilitate fact recognition, receives an input sequence of words reflecting a fact, and identifies the fact reflected by the input sequence of words using the collection of probabilities.
In accordance with systems consistent with the present invention, a computer-readable memory device encoded with a data structure is provided. This data structure contains a collection of probabilities for use in recognizing facts in input data.
In accordance with systems consistent with the present invention, a data processing system is provided that comprises a memory and a processor. The memory includes a statistical model with probabilities reflecting likely syntactic structure for sequences of one or more words and likely semantic information for the sequences. The memory also includes a training program for generating the statistical model and a search program for receiving a sentence reflecting a fact and for using the statistical model to recognize the fact. The processor runs the training program and the search program.
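The division of labor between a training program and a search program can be illustrated with a deliberately simplified stand-in. The sketch below is not the statistical model this patent describes: it ignores syntactic structure entirely and only estimates P(label | word) by relative frequency from annotated examples, then labels each word of a new sentence with its most probable label. The label names (`PERSON`, `JOB`, `COMPANY`, `O`), the function names `train` and `recognize`, and the example annotations are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train(examples):
    """Estimate P(label | word) by relative frequency from annotated examples."""
    counts = defaultdict(Counter)
    for sentence in examples:
        for word, label in sentence:
            counts[word.lower()][label] += 1
    model = {}
    for word, label_counts in counts.items():
        total = sum(label_counts.values())
        model[word] = {label: n / total for label, n in label_counts.items()}
    return model

def recognize(model, words, default="O"):
    """Give each word its most probable label and collect the labeled words."""
    fact = defaultdict(list)
    for word in words:
        probs = model.get(word.lower())
        # Words never seen in training fall back to the default label.
        label = max(probs, key=probs.get) if probs else default
        if label != default:
            fact[label].append(word)
    return {label: " ".join(ws) for label, ws in fact.items()}

# Hypothetical annotated examples of the kind a lay person might provide.
EXAMPLES = [
    [("John", "PERSON"), ("Smith", "PERSON"), ("is", "O"), ("the", "O"),
     ("president", "JOB"), ("of", "O"), ("XYZ", "COMPANY"), ("Corp.", "COMPANY")],
    [("Mary", "PERSON"), ("Jones", "PERSON"), ("became", "O"), ("a", "O"),
     ("treasurer", "JOB"), ("of", "O"), ("Acme", "COMPANY"), ("Corp.", "COMPANY")],
]

model = train(EXAMPLES)
fact = recognize(model, "Mary Smith is the treasurer of XYZ Corp.".split())
```

Even this toy version recognizes a sentence that appears in neither training example, because the probabilities are attached to individual words rather than to whole hand-written patterns. A model that also captures likely syntactic structure, as the text describes, would additionally need probabilities over how words combine.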
REFERENCES:
patent: 5406480 (1995-04-01), Kanno
patent: 5418717 (1995-05-01), Su et al.
patent: 5752052 (1998-05-01), Richardson et al.
patent: 5839106 (1998-11-01), Bellegarda
patent: 5841895 (1998-11-01), Huffman
patent: 5926784 (1999-07-01), Richardson et al.
patent: 6006221 (1999-12-01), Liddy et al.
patent: 6029195 (2000-02-01), Herz
patent: 6112168 (2000-08-01), Corston et al.
patent: 6167369 (2000-12-01), Schultz
patent: 6243669 (2001-06-01), Horiguchi et al.
patent: 6278967 (2001-08-01), Akers et al.
patent: 6278968 (2001-08-01), Franz et al.
patent: 6304870 (2001-10-01), Kushmerick et al.
“Empirical Methods in Information Extraction” Claire Cardie, American Association for Artificial Intelligence (AAAI), Winter 1997, 0738-4602-1997.*
“Information Extraction” Jim Cowie, Communications of the ACM, Jan. 1996/vol. 39, No. 1.*
“A System for Discovering Relationships by Feature Extraction from Text Databases” Jack G. Conrad, Springer-Verlag, NY, NY, pp. 260-270, 1994.*
W. Lehnert et al., UMass/Hughes: Description of the Circus System Used for MUC-5, Proc. of Fifth Message Understanding Conference (MUC-5) 1993, pp. 277-291.
Michael Collins, Three Generative, Lexicalised Models for Statistical Parsing, Proc. of the 35th Annual Meeting of the Association for Computational Linguistics, 1997, pp. 16-23.
Eric Brill, Automatic Grammar Induction and Parsing Free Text: A Transformation-Based Approach, Proc. of the 31st Annual Meeting of the Association for Computational Linguistics, 1993, pp. 259-265.
Scott W. Bennett et al., Learning to Tag Multilingual Texts Through Observation, Proc. of the Second Conf. on Empirical Methods in Natural Language Processing, 1997, pp. 109-115.
Chinatsu Aone et al., SRA: Description of the IE2 System Used for MUC-7, available at http://www.muc.saic.com/proceedings/muc_7_toc.html.
Roman Yangarber et al., NYU: Description of the Proteus/PET System as Used for MUC-7 ST, available at http://www.muc.saic.com/proceedings/muc_7_toc.html.
K. Humphreys et al., University of Sheffield: Description of the LaSIE-II System as Used for MUC-7, available at http://www.muc.saic.com/proceedings/muc_7_toc.html.
Terry Patten et al., Description of the TASC System Used for MUC-7, available at http://www.muc.saic.com/proceedings/muc_7_toc.html.
Mitchell P. Marcus et al., Building a Large Annotated Corpus of English: The Penn Treebank, Computational Linguistics, vol. 19, No. 2, pp. 313-330 (1993).
Daniel H. Younger, Recognition and Parsing of Context-Free Languages in Time n³, Information and Control, 10, 189-208 (1967).
Fox Heidi Jennifer
Miller Scott
Ramshaw Lance Arthur
Weischedel Ralph Mark
Ferris Fred
Genuity Inc.
Jones Hugh
Suchyta Leonard Charles
Weixel James K.