Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-06-02
2003-10-28
Metjahic, Safet (Department: 2171)
C707S793000
active
06640218
ABSTRACT:
TECHNICAL FIELD
The invention relates to estimating the usefulness of an item in a collection of information.
BACKGROUND
One context in which selection of items from a collection of information (e.g., a database) is useful is a “search engine.” A typical search engine takes an alphanumeric query from a user (a “search string”) and returns to the user a list of one or more items from the database that satisfy some or all of the criteria specified in the query.
Although search engines have been in use for many years, for example in connection with the Westlaw® or Lexis® legal databases, their use has risen dramatically with the development of the World Wide Web. Because the World Wide Web comprises a very large number of discrete items, which come from heterogeneous sources, and which are not necessarily known in advance to the user, search engines that can identify relevant Web-based information resources in response to a user query have become important tools for doing Web-based research.
With tens or hundreds of millions of individual items potentially accessible over the Web, it is not unusual for a single query to a search engine to result in the return of hundreds or thousands of items of varying quality from which the user must manually select those that may be truly useful. This manual evaluation can be a time-consuming and frustrating process.
One approach to managing the large number of potentially relevant items returned by a search engine is for the engine to rank the items for relevance before displaying them. Specifically, the items may be ranked according to some relevance metric reflecting how well the intrinsic features of a particular item (e.g., its textual content, its location, the language in which it is written, the date of its creation, etc.) match the search criteria for the particular search. A number of relevance metrics are described, e.g., in Manning and Schütze, “Foundations of Statistical Natural Language Processing”, MIT Press, Cambridge, Mass. (1999) pp. 529-574 and U.S. Pat. No. 6,012,053.
Ranking items based on a measure of relevance to a search query is, however, often an imperfect measure of the actual relative usefulness of those items to users. In particular, a relevance metric may not take into account certain factors that go into a user's ultimate evaluation of the usefulness of a particular item: e.g., how well the item is written or designed, the reliability or authority of the source of the information in the item, or the user's prior familiarity with the item. Thus, a search engine presented with a query for “History of the United States” might consider an encyclopedia article by a well-known historian and a term paper written by a high school student to be of equal relevance, even though the former is far more likely to be useful to most users than the latter.
Ranking items by relevance is also susceptible to “spoofing.” “Spoofing” refers to attempting to artificially improve the apparent relevance of a particular item with respect to particular search criteria by altering the item in a misleading way. For example, it is common for search engines to evaluate the relevance of a Web page based on the contents of meta-tags. Meta-tags are lists of keywords that are included in the HTML source of a Web page but which are not normally displayed to users by Web browsers. Web site operators who wish to increase the number of visits to their Web sites commonly attempt to spoof search engines by creating meta-tags that contain either keywords that are not truly indicative of the displayed contents of the page (but which are likely to match a large number of queries) or multiple instances of arguably appropriate keywords (thus inflating the relative importance of those keywords to the Web page).
Some search techniques have attempted to incorporate information about subjective user preferences within a relevance metric. One such method entails modifying the relevance score of an individual item (with respect to a search term or phrase) according to how often the item is selected when displayed in response to a query containing the search term or phrase. However, this technique may provide unsatisfactory results under conditions of sparse data (i.e., where the individual items were selected by users in response to queries containing the search term or phrase a relatively small number of times).
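The sparse-data problem can be illustrated with a minimal sketch. The function name and all numbers below are hypothetical and are not taken from the patent; the sketch only shows that a raw per-query selection rate swings widely when an item has been displayed a handful of times, and barely moves when it has been displayed often.

```python
def selection_rate(selections: int, displays: int) -> float:
    """Raw per-query selection rate (hypothetical helper for illustration)."""
    if displays == 0:
        return 0.0
    return selections / displays


# Sparse data: one additional display without a selection changes the rate a lot.
sparse_before = selection_rate(1, 2)
sparse_after = selection_rate(1, 3)

# Dense data: the same single extra display barely changes the rate.
dense_before = selection_rate(500, 1000)
dense_after = selection_rate(500, 1001)

sparse_shift = abs(sparse_before - sparse_after)
dense_shift = abs(dense_before - dense_after)
```

Under these invented counts, the shift in the sparse case is more than two orders of magnitude larger than in the dense case, which is why a score adjusted per query and per item can be unreliable for rarely selected items.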
SUMMARY
The present invention provides a system and method for estimating the usefulness of an item in a collection of information.
In general, in one aspect, a first measure of the usefulness of the item with respect to a first set of criteria is determined. A measure of the quality of the item is determined. A second measure of the usefulness of the item is determined based on the first measure of usefulness and the measure of quality.
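As a rough sketch of this aspect, the second measure might be formed by scaling the first measure by the quality measure. The text above says only that the second measure is "based on" the other two, so the multiplicative form, the function name, and the sample values are assumptions made for illustration.

```python
def second_usefulness(first_usefulness: float, quality: float) -> float:
    """Combine a query-specific usefulness score with a query-independent
    quality score.

    Assumption: a simple product. The source does not specify the formula.
    """
    return first_usefulness * quality


# A quality above 1.0 boosts the item; a quality below 1.0 demotes it.
boosted = second_usefulness(0.4, 1.5)
demoted = second_usefulness(0.4, 0.5)
```

With a product, a quality of 1.0 leaves the first measure unchanged, which gives a natural neutral value for items about which little is known.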
Embodiments of the invention may have one or more of the following features.
A measure of the relevance of the item to the first set of criteria is determined. A selection rate of the item is predicted based on the measure of relevance.
Opportunities for user selection of the item are provided. The actual overall popularity of the item is determined. The overall popularity of the item is predicted. The measure of quality of the item is determined based on the actual popularity of the item and the predicted overall popularity of the item.
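One way to read this feature is that quality reflects how much more (or less) often an item is actually selected than its relevance alone would predict. The sketch below assumes a simple ratio of actual to predicted popularity; the ratio form, the neutral fallback value, and the names are hypothetical, since the text states only that quality is based on the two quantities.

```python
def quality(actual_popularity: float, predicted_popularity: float) -> float:
    """Quality as the ratio of actual to predicted overall popularity.

    Assumption: ratio form, with 1.0 (neutral) returned when nothing was
    predicted, to avoid dividing by zero.
    """
    if predicted_popularity <= 0:
        return 1.0
    return actual_popularity / predicted_popularity


# Item selected more often than predicted: quality above 1.
over_performer = quality(120, 100)
# Item selected less often than predicted: quality below 1.
under_performer = quality(40, 100)
```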
A plurality of sets of items containing the item is displayed. A choice of the item from at least one of the sets of displayed items is received from a user.
At least one set of items containing the item is displayed ranked in accordance with a relevance metric.
At least one set of items containing the item is displayed ranked in accordance with a measure of the usefulness of the respective items.
Users are provided with opportunities to present sets of criteria. Respective measures of the relevance of the item to respective sets of criteria presented by users are determined. Respective selection rates of the item are predicted based on the respective measures of relevance. The overall popularity of the item is predicted based on the respective predicted selection rates.
Respective measures of the popularity of the respective sets of criteria among users are determined. The overall popularity of the item is predicted based on the respective predicted selection rates and the respective measures of the popularity of the respective sets of criteria.
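The two preceding features can be sketched as a weighted sum: for each set of criteria (query), the predicted selection rate of the item is weighted by how often that query is presented. The weighted-sum form, the function name, and the sample figures are assumptions for illustration; the text does not give a formula.

```python
from typing import Iterable, Tuple


def predicted_overall_popularity(per_query: Iterable[Tuple[int, float]]) -> float:
    """Sum, over queries, of (times the query was presented) multiplied by
    (predicted selection rate of the item for that query).

    Assumption: a plain weighted sum; both inputs are hypothetical.
    """
    return sum(query_count * rate for query_count, rate in per_query)


# Hypothetical data: (query popularity, predicted selection rate of the item).
observations = [
    (1000, 0.30),  # popular query where the item ranks well
    (200, 0.10),   # rarer query where the item ranks lower
]
expected_selections = predicted_overall_popularity(observations)
```

Pooling predictions across all queries in this way yields a single per-item figure, which can be compared with the item's actual overall popularity even when any one query has few recorded selections.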
The rank of the item in a list of items relevant to the set of criteria and ranked according to a relevance metric is determined.
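If the item's rank in the relevance-ordered list is used as its measure of relevance, a predicted selection rate could be looked up by rank position. The table values, the default rate, and the function name below are invented for illustration; in practice such rates would presumably be estimated from observed user behavior.

```python
# Hypothetical selection rates by 1-based rank position (invented numbers).
RATE_BY_RANK = {1: 0.30, 2: 0.15, 3: 0.10}

# Hypothetical fallback rate for ranks not listed in the table.
DEFAULT_RATE = 0.02


def predicted_selection_rate(rank: int) -> float:
    """Predict how often an item is selected when displayed at a given rank."""
    if rank < 1:
        raise ValueError("rank is 1-based and must be at least 1")
    return RATE_BY_RANK.get(rank, DEFAULT_RATE)


top_rate = predicted_selection_rate(1)
deep_rate = predicted_selection_rate(25)
```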
The number of times that the item was selected by a user during a pre-determined period of time is determined.
The collection of information comprises a catalog of information resources available on a public network.
The collection of information comprises a catalog of information available on the World Wide Web.
Data concerning selection of the item by users is collected. An anti-spoof criterion is applied to the data. The actual overall popularity of the item is decreased based on the results of applying the anti-spoof criterion to the data.
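The text does not say what the anti-spoof criterion is. As one hypothetical example, repeated selections of the same item by the same user could be counted only once, so that a single user cannot inflate an item's popularity by selecting it repeatedly. The criterion, the function name, and the log format are all assumptions.

```python
from typing import Dict, Iterable, Tuple


def filtered_popularity(selections: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count selections per item, ignoring repeats by the same user.

    `selections` is an iterable of (user_id, item_id) pairs.
    Assumption: "one selection per user per item" is the anti-spoof criterion.
    """
    seen = set()
    counts: Dict[str, int] = {}
    for user_id, item_id in selections:
        if (user_id, item_id) in seen:
            continue  # repeated selection: discounted as possible spoofing
        seen.add((user_id, item_id))
        counts[item_id] = counts.get(item_id, 0) + 1
    return counts


# Hypothetical selection log: user "u1" selects item "a" three times.
log = [("u1", "a"), ("u1", "a"), ("u1", "a"), ("u2", "a"), ("u3", "b")]
popularity = filtered_popularity(log)
```

Here the raw count for item "a" would be four, while the filtered count is two, so the actual overall popularity is decreased as the feature describes.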
Respective first measures of the usefulness of respective other items in the collection of information with respect to the first set of criteria are determined. Respective measures of the quality of the respective other items are determined. Respective second measures of usefulness of the respective other items are determined based on the respective first measures of usefulness and the respective measures of quality. The item and the other items are displayed ranked according to the respective second measures of usefulness.
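A minimal sketch of this ranking step follows, again assuming that the second measure is the product of the first measure and the quality measure (an assumption, not a formula given in the text). The item names and scores are hypothetical and echo the Background example of an encyclopedia article and a term paper that a relevance metric scores equally.

```python
from typing import List, Tuple


def rank_by_usefulness(
    items: List[Tuple[str, float, float]]
) -> List[Tuple[str, float]]:
    """Rank items by an assumed second measure of usefulness.

    Each input item is (name, first_usefulness, quality). The second measure
    is taken to be first_usefulness * quality (assumption).
    """
    scored = [(name, first * quality) for name, first, quality in items]
    # Highest second measure first; sorted() is stable, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Equal first measures, different quality (all values invented).
candidates = [
    ("term paper", 0.8, 0.5),
    ("encyclopedia article", 0.8, 1.5),
]
ranking = rank_by_usefulness(candidates)
ranked_names = [name for name, _ in ranking]
```

Although both items start with the same first measure, the higher-quality item is displayed first once quality is taken into account.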
Items from a collection of information are displayed ranked according to a relevance metric that is different from the second measures of usefulness.
The items displayed according to the relevance metric are from a different collection of information than the items displayed according to respective second measures of usefulness.
The first set of criteria is based on a search criterion received from a user.
The first
Beeferman Douglas H.
Golding Andrew R.
Fish & Richardson P.C.
Lycos, Inc.
Metjahic Safet
Nguyen Cam-Linh