Generalized term frequency scores in information retrieval...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C707S793000

active

06507839

ABSTRACT:

TECHNICAL FIELD
This invention relates to techniques for organizing material on computer networks for retrieval, and more particularly to methods of indexing material of interest to a user.
BACKGROUND OF THE INVENTION
Computer networks have become increasingly important for the storage and retrieval of documents and other material.
The Internet, of which the World Wide Web is a part, includes a series of interlinked computer networks and servers around the world. Users of one server or network connected to the Internet may send information to, or access information on, other networks or servers connected to the Internet by the use of various computer programs which allow such access, such as Web browsers. The information is sent to, or received from, a network or server in the form of packets of data.
The World Wide Web portion of the Internet comprises a subset of interconnected Internet sites which may be characterized as including information in a format suitable for graphical display on a computer screen. Each site may include one or more separate pages. Pages, in turn, may include links to other pages within the site, or to pages in other Web sites, facilitating the user's rapid movement from one page or site to another.
In view of the quantity of information and material available on computer networks such as the Web, and for other reasons as well, automated or semi-automated techniques may be employed for retrieving information thought to be relevant to a user at a given time. These techniques may be utilized in response to a specific user request, as when a user submits a search query seeking information. They may also be utilized when a user is accessing certain material, in order to make available other material thought likely to interest a user who has accessed the original material. They may also be utilized when a user, given access to particular material, requests other similar material. Other situations in which these information retrieval techniques may be employed will also be apparent to one of ordinary skill in the art.
Some information retrieval techniques employed in these circumstances choose documents for retrieval from among the documents in a collection based upon the occurrence of specified terms in those documents. (Hereinafter, for simplicity, “document” shall be used to refer to the items, such as Web pages or Web sites, in the collection being analyzed.) There are a variety of different techniques for specifying the terms to be used. (A “term” may be any word, number, acronym, abbreviation or other collection of letters, numbers and symbols which may be found in a fixed order in a document.) In some methods, a search may be made among the documents in the collection for some or all of the terms in a search query generated by the user. In other methods, a search may be made for some or all of the text of a given document. (In some methods, all terms except certain common words, referred to as stop words, such as “the” or “and”, may be included in the search.) In other methods, a search may be made for index terms which have been associated with that document by various means. Still other methods use a combination of the above techniques, and further approaches to selecting terms for which a search is to be made will be familiar to one of ordinary skill in the art.
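One of the term-selection approaches described above, taking every term in a query or document except stop words, can be sketched as follows. This is a minimal illustration only, not the method of any particular system: the function name `extract_terms`, the very short `STOP_WORDS` set, and the regular expression used to split text into terms are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "and", "a", "an", "of", "in", "to", "for"}

def extract_terms(text, stop_words=STOP_WORDS):
    """Return the lower-cased terms of `text`, in order, minus stop words."""
    # Assumption: a term is a run of letters, digits, or apostrophes.
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    return [t for t in tokens if t not in stop_words]

terms = extract_terms("The sales and service of automobiles")
print(terms)
```

The same function could be applied either to a user's search query or to the full text of a given document, which are two of the sources of terms mentioned above.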
After a list of terms for which a search is to be made has been compiled, many information retrieval techniques then proceed by calculating a score for each document in the collection over which the search is being made, based upon the occurrence of the terms on the list in that document. These scores may be referred to as term frequency scores, insofar as the score assigned to a document depends on the frequency of occurrence of terms in the document.
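In its simplest form, a score of this kind just counts how often the listed terms occur in each document. The sketch below assumes each document has already been split into a list of terms; the function name `occurrence_scores` and the sample collection are hypothetical, and practical systems generally use more refined formulae than a raw count.

```python
from collections import Counter

def occurrence_scores(documents, search_terms):
    """Score each document by how often the search terms occur in it."""
    scores = {}
    for doc_id, terms in documents.items():
        counts = Counter(terms)  # a missing term counts as 0
        scores[doc_id] = sum(counts[t] for t in search_terms)
    return scores

# Hypothetical collection: each document is already split into terms.
documents = {
    "doc1": ["automobiles", "sales", "automobiles", "service"],
    "doc2": ["boats", "sales"],
}
scores = occurrence_scores(documents, ["automobiles", "sales"])

# Rank documents from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```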
There are a variety of different formulae which may be used to calculate these term frequency scores, including for example Robertson's term frequency score (RTF). Term frequency score formulae may assign varying weights to terms found in a document, depending upon such factors as the relative rareness or commonness of the term. Other factors which may be used to vary the weight assigned to a term in calculating a term frequency score will also be apparent to one of ordinary skill in the art.
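This section names Robertson's term frequency score without stating its formula. As a hedged illustration, the sketch below uses one commonly cited form, in which the raw count n of a term is damped and normalized by document length: n / (n + k1 + k2 * (L / L_avg)). The constants `k1=0.5` and `k2=1.5`, the logarithmic rarity weight, and all function names are assumptions chosen for the example, not values taken from this document.

```python
import math
from collections import Counter

def robertson_tf(count, doc_len, avg_len, k1=0.5, k2=1.5):
    """Saturating, length-normalized term frequency (assumed form)."""
    return count / (count + k1 + k2 * (doc_len / avg_len))

def rarity_weight(num_docs, docs_with_term):
    """Assumed IDF-style weight: the rarer the term, the larger the weight."""
    return math.log(num_docs / docs_with_term)

def score(doc_terms, search_terms, collection):
    """Sum of weighted term frequency scores for one document."""
    avg_len = sum(len(d) for d in collection) / len(collection)
    counts = Counter(doc_terms)
    total = 0.0
    for term in search_terms:
        df = sum(1 for d in collection if term in d)  # documents containing term
        if df == 0 or counts[term] == 0:
            continue  # term absent from the collection or from this document
        tf = robertson_tf(counts[term], len(doc_terms), avg_len)
        total += tf * rarity_weight(len(collection), df)
    return total

# Hypothetical collection of three documents, each a list of terms.
collection = [
    ["automobiles", "sales", "automobiles"],
    ["boats", "sales"],
    ["boats", "service", "repair", "boats"],
]
query = ["automobiles", "sales"]
doc_scores = [score(d, query, collection) for d in collection]
print(doc_scores)
```

With this form, additional occurrences of a term raise the score by progressively smaller amounts, and a term found in every document of the collection contributes nothing, since its rarity weight is log(1) = 0.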
Documents in a collection which is being searched may be divided into different sections or segments, such as an introduction or summary, a main body, footnotes, captions, and the like. Other divisions of documents will be apparent to one of ordinary skill in the art.
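A document divided in this way might be represented as a set of named segments, with term occurrences counted separately in each. The following is a minimal sketch under that assumption; the class name `SegmentedDocument`, the segment names, and the sample terms are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class SegmentedDocument:
    """A document held as named segments, each a list of terms."""
    doc_id: str
    segments: dict = field(default_factory=dict)

    def term_counts_by_segment(self):
        """Count term occurrences separately within each segment."""
        return {name: Counter(terms) for name, terms in self.segments.items()}

# Hypothetical document with three segments.
doc = SegmentedDocument(
    doc_id="doc1",
    segments={
        "summary": ["automobiles", "sales"],
        "body": ["automobiles", "service", "automobiles"],
        "captions": ["sales"],
    },
)
counts = doc.term_counts_by_segment()
print(counts["body"]["automobiles"])
```

Keeping the counts per segment, rather than merging them, leaves a scoring function free to treat an occurrence in one segment differently from an occurrence in another.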
A Web site may permit a user to obtain lists of relevant items of interest, such as Web sites, other documents or names of merchants carrying merchandise in particular categories. The site may be organized so that an item of interest may be considered to be in more than one category. The site may be organized so that the categories presented to the user may vary, depending on a term or terms specified by the user. If this approach is utilized, the user may input terms that relate to the merchandise in which he is interested, such as “automobiles”, and in return he may be presented with several categories, such as “automobiles, manufacturers” or “automobiles, sales” or “automobiles, service.” The categories presented may be chosen by any one of a number of techniques that will be familiar to one of ordinary skill in the art.
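The category behavior described above can be illustrated with a small lookup table. This sketch is an assumption throughout: the `CATEGORIES` table, the merchant names, and the simple substring match used to pick categories for a term are invented for the example, and the document leaves the actual selection technique open.

```python
# Hypothetical category table; an item may appear in more than one category.
CATEGORIES = {
    "automobiles, manufacturers": ["Acme Motors"],
    "automobiles, sales": ["Acme Motors", "Main Street Auto"],
    "automobiles, service": ["Main Street Auto"],
    "boats, sales": ["Harbor Marine"],
}

def categories_for(term, categories=CATEGORIES):
    """Return the category names that contain `term` (assumed matching rule)."""
    term = term.lower()
    return sorted(name for name in categories if term in name.lower())

def items_in(category, categories=CATEGORIES):
    """Return the items of interest listed under one category."""
    return categories.get(category, [])

matches = categories_for("automobiles")
print(matches)
print(items_in("automobiles, sales"))
```

Here the categories offered depend on the term the user enters, and the hypothetical merchant "Acme Motors" is listed under two categories at once.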
It may be desirable to present additional material to a user who is searching for items of interest. For example, it may be desirable to present the user with banner advertisements which relate to the item of interest for which he is searching.


REFERENCES:
patent: 5206949 (1993-04-01), Cochran et al.
patent: 5781904 (1998-07-01), Oren et al.
patent: 5835087 (1998-11-01), Herz et al.
patent: 5845278 (1998-12-01), Kirsch et al.
patent: 5956722 (1999-09-01), Jacobson et al.
patent: 6026388 (2000-02-01), Liddy et al.
patent: 6070158 (2000-05-01), Kirsch et al.
patent: 6233575 (2001-05-01), Agrawal et al.
patent: 6269368 (2001-07-01), Diamond
Leistensnider et al., “A simple probabilistic approach to classification and routing”, IEEE, 1997, pp. 750-754.*
Application No. 09/596,583, “Automatic Index Term Augmentation in Document Retrieval”, filed Jun. 19, 2000, Non Pending.
Application No. 09/596,644, “Semi-Automatic Index Term Augmentation in Document Retrieval”, filed Jun. 19, 2000, Non Pending.
