Method and apparatus for score normalization for information...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C707S793000

active

06651057

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to the field of information retrieval. More particularly, the invention relates to an apparatus and method for score normalization for information retrieval applications.
BACKGROUND OF THE INVENTION
Information retrieval (IR) systems have been developed that allow users to identify particular documents of interest from among a larger number of documents. IR systems are useful for finding an article in a digital library, a news document in a broadcast repository, or a particular web site on the worldwide web. To use such systems, the user specifies a query containing several words or phrases specifying areas of interest, and the system then retrieves documents it determines may satisfy the query.
An IR system typically ranks documents with some measure (e.g., score) by the likelihood of relevance to a query. The ranking order is useful in determining whether one document is more relevant than another. Most applications, however, have the selection of relevant documents as their final goal. A ranking order by itself does not indicate whether a document is actually relevant to the query. A large number of documents that are low in the ranking order are invariably returned as a result of the query, even though these documents are probably not very relevant.
In order to make a decision on the selection of documents that are relevant to the query, a threshold on the scores may be utilized. Scores above the threshold are designated as relevant, and scores below the threshold are designated as not relevant. Previous systems generally use an ad-hoc approach to picking the threshold, such as looking at the top few documents in the ranking order and then setting an arbitrary score to be the threshold.
This method of choosing thresholds, however, makes it difficult to arrive at a consistent decision threshold across queries, because the scores assigned to documents for one query do not generally relate to the scores assigned to documents for a different query. This degrades system performance on the selection task. The alternative is to set the threshold separately for each query, but this is impracticable. Accordingly, there is presently a need for a system that normalizes scores so that a decision threshold is consistent across different queries.
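To make the inconsistency concrete, the following minimal Python sketch applies one fixed threshold to raw scores from two queries. The score values, the document names, and the helper `select_relevant` are hypothetical and chosen only for illustration; none of them come from the patent text.

```python
def select_relevant(scores, threshold):
    """Return the documents whose raw score is above the threshold."""
    return [doc for doc, score in scores.items() if score > threshold]


# Hypothetical raw scores: each query happens to produce a different score range.
query_a_scores = {"a1": 0.92, "a2": 0.85, "a3": 0.30, "a4": 0.25}
query_b_scores = {"b1": 0.40, "b2": 0.35, "b3": 0.05, "b4": 0.02}

# A threshold picked by inspecting the top documents of query A ...
threshold = 0.5
selected_a = select_relevant(query_a_scores, threshold)

# ... selects nothing for query B, although b1 and b2 stand well apart
# from the rest of query B's documents.
selected_b = select_relevant(query_b_scores, threshold)

print(selected_a)
print(selected_b)
```

With these illustrative numbers, the threshold that works for the first query rejects every document of the second, which is the cross-query inconsistency described above.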
SUMMARY OF THE INVENTION
A method consistent with the present invention normalizes a score associated with a document. Statistics relating to scores assigned to a set of training documents not relevant to a topic are determined. Scores represent a measure of relevance to the topic. After the various statistics have been collected, a score assigned to a testing document is normalized based on those statistics. The normalized score is then compared to a threshold score. Subsequently, the testing document is designated as relevant or not relevant to the topic based on the comparison.
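The steps of this method can be sketched in Python. The summary does not say which statistics are collected or how they enter the normalization, so the sketch assumes the mean and standard deviation of the off-topic training scores and a z-score style formula. The function names, the score values, and the threshold of 3.0 are likewise hypothetical.

```python
from statistics import mean, pstdev


def off_topic_statistics(training_scores):
    """Collect statistics over scores of training documents NOT relevant to the topic.

    Assumption: the statistics are the mean and population standard deviation.
    """
    return mean(training_scores), pstdev(training_scores)


def normalize(score, stats):
    """Express a score in standard deviations above the off-topic mean (assumed form)."""
    mu, sigma = stats
    if sigma == 0:
        raise ValueError("off-topic scores have zero spread; cannot normalize")
    return (score - mu) / sigma


def is_relevant(score, stats, threshold):
    """Designate a testing document relevant if its normalized score is above the threshold."""
    return normalize(score, stats) > threshold


# Hypothetical off-topic training scores for two topics with different raw ranges.
stats_a = off_topic_statistics([0.20, 0.25, 0.30, 0.25])
stats_b = off_topic_statistics([0.02, 0.03, 0.04, 0.03])

THRESHOLD = 3.0  # one illustrative threshold shared by both topics

print(is_relevant(0.92, stats_a, THRESHOLD))  # testing document for topic A
print(is_relevant(0.40, stats_b, THRESHOLD))  # testing document for topic B
print(is_relevant(0.27, stats_a, THRESHOLD))  # close to topic A's off-topic mean
```

Because each score is measured against the off-topic score distribution of its own topic, a single threshold can be applied to both topics even though their raw score ranges differ.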
Another method consistent with the present invention normalizes a score associated with a document. A query that includes a topic is received. Next, statistics relating to scores assigned to a set of training documents not relevant to a topic are determined. Scores represent a measure of relevance to the topic. After the various statistics have been collected, a score assigned to a testing document is normalized based on those statistics.
Another method consistent with the present invention searches for documents relevant to a topic. A query including a topic is sent to a processor. The processor determines statistics relating to scores assigned to a set of training documents not relevant to a topic, normalizes a score assigned to a testing document based on the statistics, and designates the testing document as relevant or not relevant to the topic based on the normalized score. Results are then received from the processor indicating a document relevant to the topic.
An apparatus consistent with the present invention normalizes a score associated with a document. The apparatus includes a memory having program instructions and a processor responsive to the program instructions. The processor determines statistics relating to scores assigned to a set of training documents not relevant to a topic, the scores representing a measure of relevance to the topic; normalizes a score assigned to a testing document based on the statistics; compares the normalized score to a threshold score; and designates the testing document as relevant or not relevant to the topic based on the comparison.


REFERENCES:
patent: 6289353 (2001-09-01), Hazlehurst et al.
patent: 6345252 (2002-02-01), Beigi et al.
Yiming Yang, Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval, Aug. 1994, Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval.*
J. Allan, et al. “Topic Detection and Tracking Pilot Study Final Report.” Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop. Feb., 1998.
