Reexamination Certificate
1998-07-31
2002-06-11
Popovici, Dov (Department: 2175)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
active
06405188
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to data processing systems and, more particularly, to an improved information retrieval system.
BACKGROUND OF THE INVENTION
Information retrieval (IR) systems have been developed that allow users to identify particular documents of interest from among a larger number of documents. IR systems are useful for finding an article in a digital library, a news story in a broadcast repository, or a particular web site on the World Wide Web. To use such systems, the user specifies a query containing several words or phrases specifying areas of interest, and the system then retrieves documents it determines may satisfy the query.
Conventional IR systems use an ad hoc approach for performing information retrieval. Ad hoc approaches match queries to documents by identifying documents that contain the same words as those in the query. In one conventional IR system, an ad hoc weight is assigned to each matching word, the weight being computed from an ad hoc function of the number of times the word occurs in the document divided by the logarithm of the number of different documents in which the word appears. This ad hoc function was derived through an empirical process of attempting retrievals using the system and then modifying the weight computation to improve performance. Because conventional information retrieval systems use an ad hoc approach, accuracy suffers.
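As an illustration only, the conventional weighting described above can be sketched in Python. The function names `ad_hoc_weight` and `rank_ad_hoc`, the whitespace tokenization, and the summing of weights over matching query words are assumptions made for this sketch and are not taken from the patent. Adding 1 inside the logarithm is also an assumption, made so that a word appearing in only one document does not cause a division by zero.

```python
import math
from collections import Counter


def ad_hoc_weight(term_count, doc_frequency):
    """Occurrences of a word in a document divided by the logarithm of the
    number of documents containing the word.

    Assumption: 1 is added inside the logarithm so that a word found in a
    single document does not produce log(1) == 0 in the denominator.
    """
    return term_count / math.log(1 + doc_frequency)


def rank_ad_hoc(query, documents):
    """Rank document names by the summed weights of matching query words.

    `documents` maps a hypothetical document name to its text.
    """
    tokenized = {name: text.lower().split() for name, text in documents.items()}

    # Number of different documents in which each word appears.
    doc_frequency = Counter()
    for words in tokenized.values():
        doc_frequency.update(set(words))

    scores = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        scores[name] = sum(
            ad_hoc_weight(counts[w], doc_frequency[w])
            for w in query.lower().split()
            if counts[w] > 0  # only words that match the document contribute
        )
    return sorted(scores, key=scores.get, reverse=True)
```

The score here depends only on word overlap and a hand-tuned weight, which is the property the background section criticizes.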
SUMMARY OF THE INVENTION
Methods and systems consistent with the present invention provide an improved IR system that performs information retrieval by using probabilities. When performing information retrieval, the improved IR system utilizes both the prior probability that a document is relevant, independent of the query, and the probability that the query was generated by (that is, would be used to retrieve) a particular document, given that the particular document is relevant. By using these probabilities, the improved IR system retrieves documents more accurately than conventional systems, which are based on an ad hoc approach.
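A minimal Python sketch of ranking by these two probabilities follows. It scores each document by the product of its prior probability of relevance and the probability of the query given that the document is relevant, computed in log space. The query-generation model used here, a mixture of the document's word frequencies and the collection's word frequencies with a hypothetical `mix` parameter, is an assumption chosen for illustration; this summary does not specify the model. The names `rank_by_posterior` and `priors` are likewise hypothetical.

```python
import math
from collections import Counter


def rank_by_posterior(query, documents, priors, mix=0.5):
    """Rank document names by log P(relevant) + log P(query | relevant).

    `documents` maps a name to text and `priors` maps a name to a prior
    probability greater than zero. Assumption: each query word is generated
    from a mixture of the document's word distribution (weight `mix`) and
    the whole collection's word distribution (weight 1 - mix).
    """
    tokenized = {name: text.lower().split() for name, text in documents.items()}

    # Word counts over the whole collection, used as the background model.
    collection = Counter()
    for words in tokenized.values():
        collection.update(words)
    total = sum(collection.values()) or 1

    scores = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        length = len(words) or 1  # guard against an empty document
        log_score = math.log(priors[name])
        for w in query.lower().split():
            p = mix * counts[w] / length + (1 - mix) * collection[w] / total
            # A word unseen anywhere in the collection gets zero probability.
            log_score += math.log(p) if p > 0 else float("-inf")
        scores[name] = log_score
    return sorted(scores, key=scores.get, reverse=True)
```

With equal priors the ranking is driven by the query likelihood alone; raising one document's prior can move it ahead of documents that match the query words more closely, which is the role of the query-independent term.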
In accordance with methods consistent with the present invention, a method in a data processing system having information items is provided. This method receives a query containing a query word from a user, determines a likelihood that at least one of the information items is relevant given the query word, and provides an indication that the at least one information item is likely relevant to the query word.
In accordance with systems consistent with the present invention, a data processing system is provided containing a secondary storage device with documents, a memory with a query engine, and a processor configured to run the query engine. The query engine is configured to receive a query with query words indicating a relevant one of the documents and configured to utilize a formula to determine which among the documents is the relevant document. The formula is based on a model for how the query words were generated to express a need for the relevant document.
REFERENCES:
patent: 5488725 (1996-01-01), Turtle et al.
patent: 5594897 (1997-01-01), Goffman
patent: 5696964 (1997-12-01), Cox et al.
patent: 5822731 (1998-10-01), Schultz
patent: 5905980 (1999-05-01), Masuichi et al.
patent: 5930803 (1999-07-01), Becker et al.
patent: 5950189 (1999-09-01), Cohen et al.
patent: 6192360 (2001-02-01), Dumais et al.
patent: 6301571 (2001-10-01), Tatsuoka
Jiang et al., Sequential Bayesian Learning of CDHMM based on Finite Mixture Approximation of Its Prior/Posterior Density, 1997, IEEE, pp. 373-380.*
Pullen et al., “A New Approach to GPS Integrity Monitoring Using Prior Probability Models and Optimal Threshold Search”, 1994, IEEE, pp. 739-746.
Leek Timothy R.
Miller David R. H.
Schwartz Richard M.
Genuity Inc.
Leonard Charles Suchyta
Pardo Thuy
Popovici Dov
Weixal James K.