Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-06-04
2001-11-06
Amsbury, Wayne (Department: 2171)
C707S793000
active
06314419
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is directed toward the field of information retrieval systems, and more particularly towards generating feedback to a user of an information retrieval system to facilitate the user in re-formulating the query.
2. Art Background
An information retrieval system attempts to match a user query (i.e., the user's statement of information needs) to information available to the system. In general, the effectiveness of information retrieval systems may be evaluated in terms of many different criteria including execution efficiency, storage efficiency, retrieval effectiveness, etc. Retrieval effectiveness is typically based on document relevance judgments. These relevance judgments are problematic since they are subjective and unreliable. For example, different judgment criteria assign different relevance values to information retrieved in response to a given query.
There are many ways to measure retrieval effectiveness in information retrieval systems. The most common measures used are “recall” and “precision.” Recall is defined as the ratio of the number of relevant documents retrieved for a given query to the number of relevant documents for that query available in the repository of information. Precision is defined as the ratio of the number of relevant documents retrieved to the total number of documents retrieved. Both recall and precision take values ranging between zero and one. An ideal information retrieval system has both recall and precision values equal to one.
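The two definitions above can be written out as a short Python sketch. The function name `recall_precision` and the document identifiers are hypothetical, chosen only for illustration, and relevance is assumed to be a simple yes/no judgment per document.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    retrieved: document ids returned by the system for the query
    relevant:  document ids in the repository judged relevant to the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical query: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the repository.
recall, precision = recall_precision(
    retrieved=["d1", "d2", "d3", "d7"],
    relevant=["d1", "d2", "d3", "d4", "d5", "d6"],
)
print(recall, precision)  # 0.5 0.75
```

Here half of the relevant documents were found (recall 0.5), and three of the four retrieved documents were relevant (precision 0.75).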
One method of evaluating the effectiveness of information retrieval systems involves the use of recall-precision graphs. A recall-precision graph shows that recall and precision are inversely related. Thus, when precision goes up, recall typically goes down, and vice versa. Although the goal of information retrieval systems is to maximize precision and recall, most existing information retrieval systems offer a trade-off between these two goals. For certain users, high recall is critical. These users seldom have the means to retrieve more relevant information easily. Typically, as a first choice, a user seeking high recall may expand the search by broadening a narrow Boolean query or by looking further down a ranked list of retrieved documents. However, this technique typically results in wasted effort because a broad Boolean search retrieves too many unrelated documents, and the tail of a ranked list of documents contains the documents least likely to be relevant to the query.
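The trade-off described above can be seen by walking down a ranked list and recomputing both measures at each cutoff. The following Python sketch uses a made-up ranked list and made-up relevance judgments; the function name `precision_recall_at_cutoffs` is hypothetical.

```python
def precision_recall_at_cutoffs(ranked, relevant):
    """Return (k, precision, recall) after each of the top-k ranked documents."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("at least one relevant document is required")
    hits = 0
    curve = []
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        curve.append((k, hits / k, hits / len(relevant)))
    return curve


# Hypothetical ranked hit-list: relevant documents ("d*") cluster near the top,
# and the tail is mostly unrelated documents ("x*").
ranked = ["d1", "d2", "x1", "d3", "x2", "x3", "x4", "d4"]
relevant = ["d1", "d2", "d3", "d4"]
curve = precision_recall_at_cutoffs(ranked, relevant)
for k, precision, recall in curve:
    print(f"top {k}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy list, reading only the top two results gives precision 1.0 at recall 0.5, while reaching recall 1.0 requires reading all eight results, at which point precision has fallen to 0.5.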
Another method to increase recall is for users to modify the original query. However, this process results in a random operation because a user typically has made his/her best effort at the statement of the problem in the original query, and thus is uncertain as to what modifications may be useful to obtain a better result.
For a user seeking high precision and recall, the query process is typically a random iterative process. A user starts the process by issuing the initial query. If the number of documents associated with the information retrieval system is large (e.g., a few thousand), then the hit-list returned for the initial query does not represent the exact information the user intended to obtain. Thus, it is not just the non-ideal behavior of information retrieval systems that is responsible for poor initial hit-lists; the user also contributes to the degradation of the system by introducing error. User error manifests itself in several ways. One way is when the user does not know exactly what he/she is looking for, or has some idea what he/she is looking for but does not have all the information needed to specify a precise query. An example of this type of error is a user who is looking for information on a particular brand of computer but does not remember the brand name. For this example, the user may start by querying for “computers.”
A second way user error manifests itself is when the user is looking for some information generally interesting to the user but can only relate this interest via a high level concept. A world wide web surfer is an example of such a user. For example, the user may wish to conduct research on recent issues related to the “Middle East,” but does not know which recent issues to search for. For this example, if a user simply does a search on “Middle East,” then some documents relevant to the user, which deal with current issues in the “petroleum industry,” will not be retrieved. The query feedback of the present invention guides users to formulate the correct query in the fewest query iterations possible.
Another problem in obtaining high recall and precision is that users often input queries containing terms that do not match the terms used to index the majority of the relevant documents, and that almost always fail to match the terms used to index some of the unretrieved relevant documents (i.e., the unretrieved relevant documents are indexed by a different set of terms than those used in the input query). This problem has long been recognized as a major difficulty in information retrieval systems. See Lancaster, F. W. 1969. “MEDLARS: Reports on the Evaluation of its Operating Efficiency.” American Documentation, 20(1), 8-36. As is explained fully below, the query feedback of the present invention solves the problem of matching user input queries to identify the relevant documents by providing feedback of relevant terms that may be used to reformulate the input query.
SUMMARY OF THE INVENTION
An information retrieval system generates query feedback terminology based on an input query and a corpus of documents. Specifically, a set of query feedback terms is identified from a plurality of documents for potential use as query feedback in the information retrieval system. To process a query, which includes at least one query term, co-occurrence signatures for the query feedback terms of the set are generated. Each co-occurrence signature comprises a plurality of entries, such that each entry depicts a co-occurrence distance between two query feedback terms of the set as they appear in the corpus of documents. Thus, the co-occurrence signatures depict patterns of semantic distance and conceptual proximity among the different query feedback terms. The signatures are processed, via an updated singular value decomposition technique, to reduce the number of entries in each signature while preserving the semantic distance and conceptual proximity characteristics among the signatures. Query feedback terms whose co-occurrence signatures compare with the co-occurrence signature of the query term in a predetermined manner are selected, and thereafter displayed as query feedback terms as at least a partial response to the user query.
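The pipeline summarized above (build signatures, reduce them, compare against the query term) can be illustrated with a minimal Python sketch. Several choices here are assumptions made for illustration and are not taken from the patent: the co-occurrence entry is the reciprocal of the smallest token gap between two terms in a document, summed over documents; a plain truncated SVD from NumPy stands in for the "updated" singular value decomposition technique; and the "predetermined manner" of comparison is taken to be cosine similarity. All function names and the toy corpus are hypothetical.

```python
import numpy as np


def cooccurrence_signatures(docs, terms):
    """Build one signature row per term; entry (i, j) is a proximity score."""
    index = {t: i for i, t in enumerate(terms)}
    sig = np.zeros((len(terms), len(terms)))
    for tokens in docs:
        # Record the positions of each candidate term in this document.
        positions = {}
        for pos, tok in enumerate(tokens):
            if tok in index:
                positions.setdefault(tok, []).append(pos)
        for a in positions:
            for b in positions:
                if a == b:
                    continue
                # Assumed distance measure: smallest token gap in the document.
                gap = min(abs(p - q) for p in positions[a] for q in positions[b])
                sig[index[a], index[b]] += 1.0 / gap
    return sig


def reduce_signatures(sig, k):
    """Keep the k largest singular directions (plain truncated SVD)."""
    u, s, _ = np.linalg.svd(sig, full_matrices=False)
    return u[:, :k] * s[:k]


def feedback_terms(query_term, terms, reduced, top_n=3):
    """Rank the other terms by cosine similarity to the query term's signature."""
    norms = np.linalg.norm(reduced, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for isolated terms
    unit = reduced / norms[:, None]
    scores = unit @ unit[terms.index(query_term)]
    ranked = [terms[i] for i in np.argsort(-scores) if terms[i] != query_term]
    return ranked[:top_n]


# Hypothetical toy corpus, tokenized on whitespace.
docs = [
    "middle east oil exports rise".split(),
    "oil prices shake the petroleum industry".split(),
    "middle east petroleum industry outlook".split(),
    "new laptop computers ship this week".split(),
]
terms = sorted({tok for doc in docs for tok in doc})
sig = cooccurrence_signatures(docs, terms)
reduced = reduce_signatures(sig, k=3)
suggestions = feedback_terms("east", terms, reduced)
print(suggestions)
```

Each row of `sig` is the full signature of one term, and each row of `reduced` is the same term described by only `k` entries. A real system would work with multi-word terms and a far larger corpus, and would update the decomposition incrementally rather than recomputing it.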
Amsbury Wayne
Oracle Corporation
Pardo Thuy
Stattler Johansen & Adell, LLP