Ranking of query feedback terms in an information retrieval...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C707S793000, C707S793000, C706S045000, C704S009000

active

06363378

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is directed toward the field of information retrieval systems, and more particularly towards ordering or ranking query feedback presented to a user.
2. Art Background
An information retrieval system attempts to match user queries (i.e., the user's statement of information needs) to information available to the system. In general, the effectiveness of information retrieval systems may be evaluated in terms of many different criteria, including execution efficiency, storage efficiency, and retrieval effectiveness. Retrieval effectiveness is typically based on document relevance judgments. These relevance judgments are problematic since they are subjective and unreliable. For example, different judgment criteria assign different relevance values to information retrieved in response to a given query.
There are many ways to measure retrieval effectiveness in information retrieval systems. The most common measures used are “recall” and “precision.” Recall is defined as the ratio of the number of relevant documents retrieved for a given query to the number of relevant documents for that query available in the repository of information. Precision is defined as the ratio of the number of relevant documents retrieved to the total number of documents retrieved. Both recall and precision take values ranging between zero and one. An ideal information retrieval system has both recall and precision values equal to one.
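The two definitions can be illustrated with a minimal sketch. The function name `recall_precision` and the document identifiers are hypothetical, chosen only for this example; documents are assumed to be identified by unique strings.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    retrieved: identifiers of the documents the system returned.
    relevant:  identifiers of all documents in the repository that are
               relevant to the query (assumed known for evaluation).
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    # Guard against division by zero when either set is empty.
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical data: 4 documents retrieved, 5 relevant overall, 2 in common.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d5", "d6", "d7"]
recall, precision = recall_precision(retrieved_docs, relevant_docs)
print(recall, precision)  # 0.4 0.5
```

Here two of the five relevant documents were retrieved (recall 0.4), and two of the four retrieved documents are relevant (precision 0.5).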
One method of evaluating the effectiveness of information retrieval systems involves the use of recall-precision graphs. A recall-precision graph typically shows that recall and precision are inversely related. Thus, when precision goes up, recall typically goes down, and vice versa. Although the goal of information retrieval systems is to maximize precision and recall, most existing information retrieval systems offer a trade-off between these two goals. For certain users, high recall is critical. These users seldom have means to retrieve more relevant information easily. Typically, as a first choice, a user seeking high recall may expand the search by broadening a narrow Boolean query or by looking further down a ranked list of retrieved documents. However, this technique typically results in wasted effort because a broad Boolean search retrieves too many unrelated documents, and the tail of a ranked list of documents contains the documents least likely to be relevant to the query.
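The trade-off encountered when looking further down a ranked list can be sketched by computing precision and recall at every cutoff in the list. This is an illustrative sketch only; the function name `precision_recall_at_cutoffs` and the sample ranking are assumptions made for this example.

```python
def precision_recall_at_cutoffs(ranked, relevant):
    """Return a list of (k, precision, recall) for each cutoff k.

    ranked:   document identifiers in ranked order, best first.
    relevant: identifiers of all relevant documents (must be non-empty).
    """
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant must contain at least one document")
    points = []
    hits = 0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        # Precision uses the k documents seen so far;
        # recall uses the total number of relevant documents.
        points.append((k, hits / k, hits / len(relevant)))
    return points


# Hypothetical ranked list in which the last relevant document sits at rank 5.
ranked_list = ["d1", "d2", "d3", "d4", "d5", "d6"]
relevant_set = {"d1", "d2", "d5"}
curve = precision_recall_at_cutoffs(ranked_list, relevant_set)
for k, p, r in curve:
    print(k, round(p, 2), round(r, 2))
```

In this sample, reading down to rank 5 raises recall from about 0.67 to 1.0, while precision falls from 1.0 at rank 2 to 0.6 at rank 5.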
Another method to increase recall is for users to modify the original query. However, this process results in a random operation because a user typically has made his/her best effort at the statement of the problem in the original query, and thus is uncertain as to what modifications may be useful to obtain a better result.
For a user seeking high precision and recall, the query process is typically a random iterative process. A user starts the process by issuing the initial query. If the number of documents in the information retrieval system is large (e.g., a few thousand), the hit-list due to the initial query does not represent the exact information the user intended to obtain. Thus, the poor initial hit-lists are due not only to the non-ideal behavior of information retrieval systems; the user also contributes to degradation of the system by introducing error. User error manifests itself in several ways. One way is when the user does not know exactly what he/she is looking for, or has some idea what he/she is looking for but does not have all the information needed to specify a precise query. An example of this type of error is one where the user is looking for information on a particular brand of computer but does not remember the brand name. For this example, the user may start by querying for “computers.” A second way user error manifests itself is when the user is looking for some information of general interest but can only relate this interest via a high-level concept. An on-line World Wide Web surfer is an example of such a user. For example, the user may wish to conduct research on recent issues related to the “Middle East,” but does not know the recent issues to search. For this example, if a user simply does a search on “Middle East,” then some documents relevant to the user, which deal with current issues in the “petroleum industry,” will not be retrieved.
Another problem in obtaining high recall and precision is that users often input queries containing terms that do not match the terms used to index the majority of the relevant documents and, almost always, some of the unretrieved relevant documents (i.e., the unretrieved relevant documents are indexed by a different set of terms than those used in the input query). This problem has long been recognized as a major difficulty in information retrieval systems. See Lancaster, F. W. 1969. “MEDLARS: Reports on the Evaluation of its Operating Efficiency.” American Documentation, 20(1), 8-36.
Prior art query feedback systems, used to supplement or replace terms in the original query, are an attempt to improve recall and/or precision in information retrieval systems. In these prior art systems, the feedback terms are often generated through statistical means (i.e., co-occurrence). Typically, in co-occurrence techniques, a set of documents in a database is examined to identify terms that “co-occur.” For example, if in a particular document set, the term “x” is frequently found near the term “y,” then term “y” is provided as a feedback term for a query that contains term “x.” Thus, co-occurrence techniques identify those terms having a physical proximity in a document set. Unfortunately, physical proximity in the document set does not always indicate that the terms connote similar concepts (i.e., physical proximity is a poor indicator of conceptual proximity).
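A minimal sketch of a co-occurrence technique of this general kind follows. It is not taken from any particular prior art system: the function name `cooccurring_terms`, the whitespace tokenization, and the fixed word window used to decide what counts as "near" are all assumptions made for illustration.

```python
from collections import Counter


def cooccurring_terms(documents, query_term, window=2):
    """Count terms found within `window` words of `query_term`.

    documents:  iterable of plain-text strings.
    query_term: a single lowercase term from the user's query.
    window:     how many words on each side count as "near" (assumed).
    """
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()  # naive whitespace tokenization
        for i, token in enumerate(tokens):
            if token != query_term:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                # Skip the query term itself, including repeated occurrences.
                if j != i and tokens[j] != query_term:
                    counts[tokens[j]] += 1
    return counts


# Hypothetical two-document set.
docs = [
    "desktop computers and laptop computers",
    "laptop computers need batteries",
]
feedback = cooccurring_terms(docs, "computers", window=1)
print(feedback.most_common(1))  # [('laptop', 2)]
```

Note that the function word "and" is counted just like "desktop" or "need", which illustrates the limitation described above: adjacency alone says nothing about whether two terms are conceptually related.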
Once the feedback terms are identified through statistical means, the terms are displayed, on an output display, to help direct the user to reformulate a new query. Typically, the feedback terms are ranked (i.e., listed in an order) using the same measure of physical proximity originally used to identify the feedback terms. For example, if term “y” appears physically closer to term “x” than term “z,” then term “y” is ranked or listed before term “z.” Since physical proximity is often a poor indicator of conceptual proximity, this technique of ranking query feedback terms is poor. Therefore, it is desirable to implement a query feedback technique in an information retrieval system that does not utilize statistical or co-occurrence methods. In addition, to the extent that the statistical ranking methods generate a useful order, these methods are only suitable when statistical methods are used to identify query feedback terms. Accordingly, it is also desirable to utilize query feedback ranking techniques with a more general methodology applicable to all types of systems that generate query feedback.
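The proximity-based ranking described above can be sketched as follows. This is an illustrative assumption about how such a ranking might be computed, not a description of a specific system: the function name `rank_by_proximity` and the use of the smallest word distance as the proximity measure are hypothetical.

```python
def rank_by_proximity(tokens, query_term, candidates):
    """Order candidate feedback terms by their smallest word distance
    to any occurrence of `query_term` in `tokens` (closest first).

    Candidates that never appear, or any call where the query term is
    absent, yield no entry. Ties are broken alphabetically.
    """
    query_positions = [i for i, t in enumerate(tokens) if t == query_term]
    scored = []
    for term in candidates:
        term_positions = [i for i, t in enumerate(tokens) if t == term]
        if not query_positions or not term_positions:
            continue
        distance = min(abs(p - q) for p in query_positions for q in term_positions)
        scored.append((distance, term))
    scored.sort()  # sorts by distance, then by term for ties
    return [term for _, term in scored]


# Term "y" is one word away from "x"; term "z" is four words away.
tokens = "x y a b z".split()
order = rank_by_proximity(tokens, "x", ["z", "y"])
print(order)  # ['y', 'z']
```

The resulting order depends only on word positions, so it carries the same weakness as the identification step: a term can rank first simply because it happens to sit next to the query term.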
SUMMARY OF THE INVENTION
An information retrieval system processes user input queries, and identifies query feedback, including ranking the query feedback, to facilitate the user in re-formatting a new query. The information retrieval system includes a knowledge base that comprises a plurality of nodes, depicting terminological concepts, arranged to reflect conceptual proximity among the nodes. The information retrieval system processes the queries to identify a document hit list related to the query, and to generate query feedback terms. Each document includes a plurality of themes or topics that describes the overall thematic content of the document. The topics or themes are then mapped or linked to corresponding nodes of the knowledge base. At least one focal node is selected from the knowledge base, wherein a focal point node represents a concept, as defined by the relationships in the knowledge base, conceptually most representative of the topics or themes. The query feedback terms are also mapped or linked to nodes of the knowledge base. To identify a rank
