Method and system for performing information extraction and...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Filed: 2001-11-09
Issued: 2004-05-25
Examiner: Homere, Jean R. (Department: 2177)
Classification: Data processing: database and file management or data structures – Database design – Data structure types
Class code: C707S793000
Type: Reexamination Certificate
Status: active
Patent number: 06741986

COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the xerographic reproduction by anyone of the patent document or the patent disclosure in exactly the form it appears in the U.S. Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
The present invention relates to the field of information extraction and storage and more specifically to techniques for managing a distributed information acquisition and information storage process.
There has been and will continue to be an explosion in the volume and complexity of information available to information consumers. However, due to the magnitude of disparate information available in the public domain, information consumers are typically able to access, comprehend, and meaningfully use only a very small percentage of the available information. This is primarily because the information is typically buried in articles which may be contained in magazines, journals, papers, newspapers, books, notebooks, etc. or is stored in digital format in information stores such as databases, digital libraries, etc. Unless otherwise stated, the term “article” as used in this application should be construed to include any transcribed or printed information, or information available in digital format, or combinations or portions thereof. The information in an article may include text, graphics, charts, audio information, video information, multimedia information, and other types of information in various formats. An article may be published or unpublished. Since these articles could number in the hundreds and thousands, they cannot all be accessed, read, and understood by an information consumer in a practical timeframe. While several data warehousing techniques have been used to integrate information from various articles, these techniques are not flexible enough to keep up with the proliferation of available information. They also rarely help with the information overload problem. In fact, by aggregating data, these data warehousing techniques often make the information overload problem worse.
One field that has seen a tremendous explosion of information in the past decade is the life sciences field, which has benefited from the exponential growth in the identification and functional characterization of genes in the biological sciences. A decade ago a laboratory notebook was often sufficient for “data warehousing.” A researcher could rely on his or her deep understanding of a handful of genes to make informed decisions regarding his or her research. Today, the influx of information and the blurring of traditional biological research boundaries have outstripped the ability of a researcher to fully assimilate, synthesize, and evaluate research data. The primary impediment for a researcher is not the lack of information; rather, it is the large quantity of information and the unstructured format in which it is stored. To evaluate results of large-scale experiments, researchers rely heavily on published research literature to identify the key information that is critical for them to make informed decisions. The vast number of articles, the unstructured format of the information, and the inability of researchers to query on specific experimental results mean that a review of the literature may take several days, weeks, or even more of a researcher's time. In addition to being very time intensive, the knowledge accumulated by the researcher is not easily transferable to other researchers because it is not in an easily accessible format.
Based on the above, there is a need for techniques which can extract information from the various sources and store it in a format which can be easily accessed or queried by an information consumer. It is also desirable that the techniques be flexible enough to keep pace with the proliferation of information. Further, it is also desirable that the techniques be adaptable to extract and store information related to various domains and fields.
SUMMARY OF THE INVENTION
The present invention discusses techniques for extracting information from a plurality of articles and for storing the extracted information in an information store. According to an embodiment, the present invention identifies a plurality of articles from which information is to be extracted. The present invention also identifies a plurality of information extractors for extracting information from the plurality of articles. A database is provided for storing information related to the plurality of articles and the plurality of information extractors. According to this embodiment, the present invention assigns the plurality of articles to the plurality of information extractors for information extraction. The present invention receives information extracted by an information extractor from an article assigned to the information extractor. The extracted information is then stored in the information store.
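The flow summarized above (assign articles to extractors, receive what an extractor submits for an assigned article, then store it) can be illustrated with a minimal Python sketch. The round-robin assignment policy, the in-memory `InformationStore`, and every function and identifier name here are assumptions made for illustration; the summary does not specify how assignment or storage is implemented.

```python
from collections import defaultdict
from itertools import cycle


def assign_articles(article_ids, extractor_ids):
    """Map each article to an extractor, cycling through extractors in order.

    Round-robin is an assumed policy; the source only says articles are assigned.
    """
    if not extractor_ids:
        raise ValueError("at least one information extractor is required")
    return dict(zip(article_ids, cycle(extractor_ids)))


class InformationStore:
    """Hypothetical in-memory stand-in for the information store."""

    def __init__(self):
        self._items = defaultdict(list)

    def store(self, article_id, extracted):
        self._items[article_id].extend(extracted)

    def items_for(self, article_id):
        return list(self._items[article_id])


def receive_extraction(assignments, store, article_id, extractor_id, extracted):
    """Accept extracted information only from the extractor assigned to the article."""
    if assignments.get(article_id) != extractor_id:
        raise PermissionError(f"{extractor_id} is not assigned to {article_id}")
    store.store(article_id, extracted)


assignments = assign_articles(["art-1", "art-2", "art-3"], ["ext-A", "ext-B"])
store = InformationStore()
receive_extraction(assignments, store, "art-2", "ext-B", ["example extracted item"])
```

In this sketch the `assignments` dictionary plays the role of the database that relates articles to extractors; a real deployment would presumably use persistent storage.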
According to an embodiment of the present invention, the information store is a knowledge base which is configured to store the extracted information according to an ontology. In this embodiment, information may be extracted from articles using a fact-based model.
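One way to picture a knowledge base that stores facts according to an ontology is sketched below in Python. The triple-shaped `Fact`, the representation of the ontology as two sets of known terms, and the example gene and protein names are all hypothetical; the summary does not define the structure of a fact or of the ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Assumed shape of an extracted fact: subject, relation, object."""
    subject: str
    relation: str
    obj: str


class KnowledgeBase:
    """Stores only facts whose terms are already known to the ontology."""

    def __init__(self, concepts, relations):
        self.concepts = set(concepts)    # known concepts in the ontology
        self.relations = set(relations)  # known relation types
        self.facts = []

    def unknown_terms(self, fact):
        """Return the terms of a fact that the ontology does not yet contain."""
        unknown = [t for t in (fact.subject, fact.obj) if t not in self.concepts]
        if fact.relation not in self.relations:
            unknown.append(fact.relation)
        return unknown

    def add(self, fact):
        """Store the fact if it fits the ontology; otherwise report new terms."""
        unknown = self.unknown_terms(fact)
        if not unknown:
            self.facts.append(fact)
        return unknown


kb = KnowledgeBase(concepts={"GeneA", "ProteinB"}, relations={"activates"})
accepted = kb.add(Fact("GeneA", "activates", "ProteinB"))
rejected = kb.add(Fact("GeneA", "inhibits", "GeneC"))
```

A fact that introduces unknown terms is not stored here; it is returned to the caller so that the new terms can be reviewed separately.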
According to another embodiment, the present invention enables quality control processing to be performed on the information extracted by the information extractor before the extracted information is stored in the information store. According to this embodiment, the present invention enables a content reviewer to review the extracted information received from the information extractor. The present invention may receive information from the content reviewer identifying errors associated with the extracted information.
According to an embodiment, the present invention determines, from the information received from the content reviewer, an error count indicating the number of errors in the extracted information received from the information extractor. If the error count is above a threshold error count level, the article may be reassigned to the information extractor for information extraction. If the error count is equal to or below the threshold error count level, the present invention may provide services enabling the content reviewer to change the extracted information received from the information extractor to correct the errors.
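The threshold rule in this embodiment can be written as a short Python sketch. The function name, the list-of-errors input, and the returned labels are assumptions for illustration; only the comparison itself (above the threshold versus equal to or below it) comes from the text.

```python
def route_after_review(reported_errors, threshold):
    """Decide what happens to an article after a content reviewer reports errors.

    reported_errors: errors identified by the content reviewer (assumed to be a list).
    threshold: the threshold error count level.
    """
    error_count = len(reported_errors)
    if error_count > threshold:
        # Too many errors: send the article back to the information extractor.
        return "reassign_to_extractor"
    # Equal to or below the threshold: the reviewer corrects the errors directly.
    return "reviewer_corrects"


decision_many = route_after_review(["e1", "e2", "e3"], threshold=2)
decision_equal = route_after_review(["e1", "e2"], threshold=2)
```

Note that an error count exactly equal to the threshold falls on the reviewer-correction side, matching the "equal to or below" wording.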
According to another embodiment, the present invention calculates the compensation due to information extractors for extracting information from the articles. The compensation amount for an information extractor may be calculated based on several criteria such as the number of errors in the information extracted by the information extractor, a quality score assigned to the article, and other metrics information captured during quality control processing.
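The summary names the inputs to the compensation calculation but gives no formula. The Python sketch below is therefore purely hypothetical: the linear form, the per-error penalty, the floor at zero, and all parameter names are assumptions chosen only to show how an error count and an article quality score might be combined.

```python
def compensation(base_rate, quality_score, error_count, penalty_per_error):
    """Hypothetical compensation for one article.

    Assumed formula (not from the source):
        base_rate * quality_score - penalty_per_error * error_count,
    floored at zero so that an extractor never owes money.
    """
    amount = base_rate * quality_score - penalty_per_error * error_count
    return max(0.0, amount)


clean_pay = compensation(base_rate=10.0, quality_score=1.5, error_count=2, penalty_per_error=2.5)
floored_pay = compensation(base_rate=10.0, quality_score=1.0, error_count=5, penalty_per_error=3.0)
```

Other metrics captured during quality control processing could be added as further terms; the source leaves their form open.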
According to yet another embodiment, the information store is configured to store the extracted information according to an information model. In this embodiment, the present invention allows reviewers to review the extracted information and make changes, if any, to the information model to accommodate the extracted information. In this embodiment, the present invention may allow a reviewer to review the extracted information and new concepts introduced by the extracted information and to provide information identifying changes, if any, to be made to the information model. According to a specific embodiment, the information provided by the reviewer may then be reviewed by a second reviewer. After the second reviewer has approved of the changes, the information model may be changed. In a specific embodiment, the information store is a knowledge base which is configured to store the extracted information according to an ontology. The present invention provides services enabling ontologists to review new concepts and to make changes to the ontology to accommodate the new concepts. Other information models may also be used in conjuncti

Affiliated with

Chen Richard O.

Inventor

Cho Raymond J.

Inventor

Felciano Ramon M.

Inventor

Norman Philippa

Inventor

Richards Daniel R.

Inventor

Also associated with

Homere Jean R.

Examiner

Ingenuity Systems, Inc.

Corporate Assignee

Wilson Sonsini Goodrich & Rosati

Law Firm
