System and method for piecemeal relevance evaluation

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06598045

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to information retrieval. In particular, the present invention relates to evaluating the relevance of document transmissions that potentially consist of a variety of topics.
The primary purpose of the invention is to help people deal with information overload. With the increasing development of communications technology, people may feel two opposing forces: on the one hand, they are highly dependent on critical information; on the other hand, they are overloaded with information to the extent that they need to reduce their exposure to it. As a result of this conflict, people may find themselves needing to examine large numbers of documents quickly, with a significant penalty for missing critical information contained in those documents.
Various established tools exist for measuring the importance of documents to an individual. This technology, often referred to as relevance technology, allows a computer to make judgments about the importance to an individual of news articles, technical articles, mail messages or the like. This technology has proven useful for categorization and prioritization of presentation, both of which are necessary to help a user deal with a flood of information. But because of the inherent uncertainty of the relevance measure, the user who needs information prioritized still must spend time perusing many documents. Documents that are rated as highly relevant must be perused to see what, if any, useful information they contain. Documents that are rated as mildly relevant or less must be perused to make sure that nothing important is missed. Thus, nearly every article needs to be examined in some depth.
Existing relevance technology assumes that documents are homogeneous in content and relevance, and so a single relevance value is calculated for an entire document. This is because the technology was developed initially for relatively short documents such as wire-service items. As documents become longer and more varied in content, a single relevance value may be affected by separate sections of the document that contain references to unrelated topics, including some that are highly relevant and others of little or no interest to the user. This variability in content means that a single relevance number may result in either false-negative or false-positive evaluations. The only safe strategy for a reader of larger documents is to read most of the document, regardless of relevance evaluations.
Currently, either an entire document or a selected subset is evaluated for relevance, which can cause the relevance to be misjudged in various ways. For example, the relevance evaluation can be diluted if two unrelated sections of the document are evaluated together, because one section may be highly relevant while the other contains material that results in a negative evaluation. In general, the user would want to be apprised of the relevant material, even when it is surrounded by irrelevant material. An example of this is the “What's News” section in the Wall Street Journal. This article typically contains several unrelated items that should, logically, be evaluated separately. For example, the first paragraph of the “What's News” section might focus on the topic “Endangered Species,” while the second paragraph might focus on the topic “Gulf War Syndrome,” and subsequent paragraphs might focus on topics entirely unrelated to any others. Therefore, while one might find the “Endangered Species” discussion highly relevant to one's needs, the entire document might not receive a high relevance value due to dilution from other topics.
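The dilution effect described above can be illustrated with a small sketch. The scoring function here is purely an assumption made for illustration (the fraction of words that match a set of topic terms); it is not the relevance measure of any particular system, and the sample paragraphs are invented.

```python
import re

def term_fraction(text, topic_terms):
    """Toy relevance measure (an assumption): the fraction of words
    in `text` that belong to the set of topic terms."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in topic_terms)
    return hits / len(words)

# Invented stand-in for a multi-topic article such as "What's News".
paragraphs = [
    "Endangered species protections expand as endangered owls recover.",
    "Veterans groups seek new studies of Gulf War syndrome.",
    "Bond prices fell while the dollar gained.",
]
topic = {"endangered", "species"}

# Scoring the whole document mixes the relevant paragraph with the
# unrelated ones, so the single value is lower than the value of the
# one paragraph that actually discusses the topic.
whole_score = term_fraction(" ".join(paragraphs), topic)
first_score = term_fraction(paragraphs[0], topic)

print(whole_score, first_score)
```

Under this toy measure, the paragraph on the topic scores three times higher than the document as a whole, which is the kind of dilution the passage describes.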
In many cases, rather than evaluating the entire document, known relevance algorithms may evaluate only the first paragraph of each document. This is justified by the general understanding that news material is usually written in a particular style that ensures that the relevant material is near the beginning of the document. Of course, this is not true of articles like “What's News.” Therefore, again using “What's News” as an example, if only the first paragraph of a document is examined for relevance to “Endangered Species,” then the document discussed above would receive a maximal relevance value even though only one paragraph discusses “Endangered Species.” If only the first paragraph of the article discussed above is examined for relevance to “Gulf War Syndrome,” the document will receive a minimal relevance value even though the article does, in fact, discuss this topic.
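A minimal sketch of the first-paragraph strategy shows both failure cases. The scorer (a count of topic-term occurrences), the function names, and the two-paragraph article are all assumptions introduced for illustration only.

```python
import re

def count_hits(text, topic_terms):
    """Toy scorer (an assumption): number of topic-term occurrences."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(1 for w in words if w in topic_terms)

def first_paragraph_relevance(paragraphs, topic_terms):
    """Score only the opening paragraph and ignore the rest."""
    return count_hits(paragraphs[0], topic_terms) if paragraphs else 0

# Invented two-item article with one topic per paragraph.
article = [
    "Endangered species protections were extended to two owl populations.",
    "Researchers opened a new study of Gulf War syndrome among veterans.",
]

# The first topic appears in the lead, so the whole article looks relevant.
species_score = first_paragraph_relevance(article, {"endangered", "species"})

# The second topic appears only later, so the article looks irrelevant
# even though one of its paragraphs is about that topic.
gulf_terms = {"gulf", "syndrome"}
gulf_score = first_paragraph_relevance(article, gulf_terms)
gulf_anywhere = any(count_hits(p, gulf_terms) > 0 for p in article)

print(species_score, gulf_score, gulf_anywhere)
```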
SUMMARY OF THE INVENTION
The present invention introduces systems and methods for evaluating the relevance of transmitted data. In one embodiment of the present invention, a topic and a document are received, and the document is divided into various pieces. The relevance of each piece is evaluated with respect to the received topic, and these individual evaluations are combined into a surrogate representation of the relevance.
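The embodiment summarized above (receive a topic and a document, divide the document into pieces, evaluate each piece, combine the evaluations) might be sketched as follows. This is only an illustrative sketch under stated assumptions: the text does not specify how pieces are delimited, how a piece is scored, or what form the surrogate takes. Here pieces are assumed to be paragraphs separated by blank lines, the score is a toy term-fraction measure, and the surrogate is a hypothetical `RelevanceSurrogate` record holding the per-piece scores and the best piece.

```python
import re
from dataclasses import dataclass

@dataclass
class RelevanceSurrogate:
    """Hypothetical surrogate: per-piece scores plus the best piece."""
    piece_scores: list   # one score per piece, in document order
    best_piece: int      # index of the highest-scoring piece, -1 if none
    best_score: float

def split_into_pieces(document):
    """Assumed piece boundary: blank lines between paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]

def score_piece(piece, topic_terms):
    """Toy relevance measure (an assumption): fraction of topic words."""
    words = re.findall(r"[a-z]+", piece.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in topic_terms) / len(words)

def evaluate_piecemeal(document, topic_terms):
    """Score each piece separately, then combine into one surrogate."""
    scores = [score_piece(p, topic_terms) for p in split_into_pieces(document)]
    if not scores:
        return RelevanceSurrogate([], -1, 0.0)
    best = max(range(len(scores)), key=scores.__getitem__)
    return RelevanceSurrogate(scores, best, scores[best])

document = """Bond prices fell while the dollar gained.

Researchers opened a new study of Gulf War syndrome.

Endangered species protections were extended."""

surrogate = evaluate_piecemeal(document, {"gulf", "war", "syndrome"})
print(surrogate)
```

Keeping the individual scores in the surrogate, rather than collapsing them to one number, is a design choice made here so that a reader could be pointed at the relevant piece even when it is not the first one; other combinations are equally compatible with the summary.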


REFERENCES:
patent: 4247906 (1981-01-01), Corwin et al.
patent: 5638543 (1997-06-01), Pedersen et al.
patent: 5642502 (1997-06-01), Driscoll
patent: 5724571 (1998-03-01), Woods
patent: 5778397 (1998-07-01), Kupiec et al.
patent: 5794178 (1998-08-01), Caid et al.
patent: 5799304 (1998-08-01), Miller
patent: 5870740 (1999-02-01), Rose et al.
patent: 5892842 (1999-04-01), Bloomberg
patent: 5907840 (1999-05-01), Evans
patent: 5963940 (1999-10-01), Liddy et al.
patent: 6026388 (2000-02-01), Liddy et al.
patent: 6182066 (2001-01-01), Marques
patent: 6185592 (2001-02-01), Boguraev et al.
patent: 6339437 (2002-01-01), Nielsen
patent: 6389436 (2002-05-01), Chakrabarti et al.
patent: 791883 (1996-02-01), None
Kwok, K. L., Experiments with a Component Theory of Probabilistic Information Retrieval Based on Single Terms as Document Components, ACM Transactions on Information Systems, pp. 363-385, Oct. 1990.*
John J. Light, Inventor, Patent Application for Method for Characterizing a Document Set Using Evaluation Surrogates, File No. 042390.P3825, SM 132697.
John J. Light, Inventor, Patent Application for Method for Recognizing Compound Terms in a Document, File No. 042390.P3824, SM132698.
John J. Light, Inventor, Patent Application for Method for Measuring Thresholded Relevance of a Document, File No. 042390.P3826, SM 132699.
Marti A. Hearst, TileBars: Visualization of Term Distribution Information in Full Text Information Access, CHI'95 Mosaic of Creativity, May 7-11, 1995.
