Method and apparatus for retrieving documents based on...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

C707S793000, C709S202000, C382S190000

active

06360215

ABSTRACT:

FIELD OF THE INVENTION
The present invention generally relates to data processing. The invention relates more specifically to retrieving a document from among several electronic documents based on information not derived from the literal content of the document.
BACKGROUND OF THE INVENTION
Hypertext systems now enjoy wide use. One particular hypertext system, the World Wide Web (“Web”), provides global access over public packet-switched networks to a large number of hypertext documents. The Web has grown to contain a staggering number of documents, and that number continues to increase. The Web is now so large that using it in a practical way almost always requires a search engine or a similar search service.
Certain search engines, however, have limited utility because the results they produce include documents that are not genuinely related to the search query. One reason for such poor-quality results is that search engines are easy to deceive. Search engines use “spider” programs that “crawl” to Web servers around the world, locate documents, index the documents, and follow hyperlinks to other documents. The index may comprise a list of all words that the “spider” encounters in all the documents, in which each word in the list is associated with a reference to each document that contains that word. Unfortunately, a “spider” cannot discriminate between documents that genuinely concern a particular word and documents that merely contain the word but are really about something else.
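The word-to-document index described above is an inverted index. The following minimal Python sketch assumes documents are plain strings keyed by URL and uses a simple lowercase tokenizer; the function names and example URLs are illustrative, not from the source. It shows why a single decoy word is enough to place an unrelated page in the results.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase runs of letters and digits; a real spider would also parse HTML,
    # metatags, and hidden text, which is exactly where decoy words hide.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every word encountered to the set of documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every query word (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


docs = {
    "http://cars.example/corvette.html": "Corvette history and specifications",
    "http://spam.example/page.html": "CORVETTE (decoy in white-on-white text) unrelated material",
}
index = build_index(docs)
print(sorted(search(index, "Corvette")))
```

Both pages match the query, because the index records only that the word occurs, not what the page is actually about.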
For example, a Web document that contains sexually-oriented or pornographic material may also contain one or more words that are unrelated to that material but are intended to cause search engines to index the document under those words, thereby luring unsuspecting users to the document. A pornographic document that contains a decoy word intended to lure male viewers, such as “CORVETTE,” followed by sexual material, would be indexed by a search engine under the word “CORVETTE”. The decoy words may be embedded in invisible metatags or rendered in white characters on a white background, so that they are invisible when a browser displays the document. This practice is called “spamming” a search engine or an indexing system. Likewise, if such a document used the title of the motion picture “Bambi” as a decoy word, searchers who submit a query seeking information about that film would receive the pornographic page in their search results. This is undesirable and has led to criticism of the utility of search engines and indexing systems.
As a result, the search results returned by a search engine often contain references to documents that are entirely unrelated, in terms of genuine content, to the scope of the search query. In the World Wide Web context, search engines that suffer from this problem include the Yahoo!® Web site, the Excite® Web site, the Infoseek® Web site, and others.
Accordingly, in this field there is a need for a system or mechanism that can eliminate extraneous references from search engine search results.
There is a particular need for a system or mechanism that can combat “spamming” of an indexing system or search engine system.
There is also a need for a mechanism that can associate words, search terms, or editorial matter, other than words appearing in the content of a document, with the document in an index.
There is a particular need for such a system that can search for a document based on words, search terms, or editorial matter other than the literal content of the documents being searched.
SUMMARY OF THE INVENTION
The foregoing needs, and other needs and objects that will become apparent from the following disclosure, are fulfilled by the present invention, which comprises, in one aspect, a method of selecting electronic documents from among a plurality of electronic documents, the method comprising the steps of storing a tag word in an index in association with information identifying an electronic document, in which the tag word comprises data that is not derived from content of the electronic document; receiving a search query; modifying the search query to create a modified search query by adding to the search query a search term that references the tag word; and creating a set of search results by searching the index based on the modified search query.
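The basic method can be illustrated with a minimal Python sketch. It assumes, as a hypothetical convention not taken from the source, that tag words share the index with content words under a reserved `tag:` prefix, and that the search is a simple conjunctive (AND) search; the class and function names are illustrative.

```python
import re
from collections import defaultdict

# Hypothetical convention: tag words share the index with content words but carry
# a reserved prefix. Content tokens are letters and digits only, so no document's
# text can ever produce a key containing ':'.
TAG_PREFIX = "tag:"


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class TaggedIndex:
    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def index_text(self, doc_id: str, text: str) -> None:
        """Index words derived from the document's content."""
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def add_tag(self, doc_id: str, tag: str) -> None:
        """Store a tag word with the document's identifier; not derived from content."""
        self.postings[TAG_PREFIX + tag.lower()].add(doc_id)

    def search(self, terms: list[str]) -> set[str]:
        """Return documents matching every term (conjunctive search)."""
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


def modify_query(query: str, tag: str) -> list[str]:
    """Create the modified query by adding a term that references the tag word."""
    return tokenize(query) + [TAG_PREFIX + tag.lower()]


idx = TaggedIndex()
idx.index_text("http://cars.example/vette.html", "Corvette history")
idx.index_text("http://spam.example/x.html", "CORVETTE decoy text")
idx.add_tag("http://cars.example/vette.html", "automotive")  # editorial judgment

print(idx.search(modify_query("corvette", "automotive")))
```

Because the spam page was never assigned the `automotive` tag word, the added term drops it from the results even though its content contains the decoy word.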
One feature of this aspect is that the step of storing includes the steps of receiving data that indicates one or more tag words and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more tag words; and storing, in the index, information associating each of the one or more tag words with the documents in the index that satisfy the criteria associated with the tag words. Another feature is that the step of storing includes the steps of receiving data that indicates one or more tag words and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more tag words, and in which at least a portion of the data is expressed in a wildcard format; retrieving a location identifier of each of the documents that are indexed in the index; matching each location identifier to each of the criteria; and when one location identifier matches one of the criteria, storing, in the index, information associating such location identifier with one or more of the tag words.
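Matching location identifiers against wildcard criteria might be sketched as follows. The source does not define the wildcard rules, so this minimal sketch assumes shell-style `*` and `?` patterns as implemented by Python's standard `fnmatch.fnmatchcase`; the tag words, URLs, and the `assign_tags` name are hypothetical.

```python
from fnmatch import fnmatchcase


def assign_tags(
    indexed_urls: list[str],
    criteria: dict[str, list[str]],
) -> dict[str, set[str]]:
    """Return tag word -> set of URLs whose location identifier matches a criterion.

    `criteria` maps each tag word to one or more wildcard patterns
    (shell-style '*' and '?', an assumed rule set).
    """
    tags: dict[str, set[str]] = {tag: set() for tag in criteria}
    for url in indexed_urls:  # location identifier of each indexed document
        for tag, patterns in criteria.items():
            if any(fnmatchcase(url, pattern) for pattern in patterns):
                tags[tag].add(url)  # store the association in the index
    return tags


urls = [
    "http://www.example.com/cars/corvette.html",
    "http://www.example.com/movies/bambi.html",
    "http://adult.example.net/index.html",
]
criteria = {
    "automotive": ["http://www.example.com/cars/*"],
    "restricted": ["http://adult.example.net/*", "http://*.adult.example.net/*"],
}
for tag, matched in sorted(assign_tags(urls, criteria).items()):
    print(tag, sorted(matched))
```

An editor can tag an entire site or directory with one pattern instead of listing every document, which is the practical benefit of the wildcard format.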
In another feature, the step of storing includes the steps of receiving specifications of one or more of the documents that are indexed in the index, in which each of the specifications is associated with one or more tag words, and in which one of the specifications is expressed in a wildcard format; retrieving a location identifier of each of the documents that are indexed in the index; matching each location identifier to each of the specifications by interpreting the one of the specifications that is in the wildcard format according to one or more wildcard format rules; and when one location identifier matches one of the specifications, storing, in the index, information associating such location identifier with one or more of the tag words. In another feature, storing includes the steps of storing a hash value representing the tag word in a record of the index; and storing an indirect reference to information identifying one or more of the documents that contain the tag word.
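An index record that holds a hash of the tag word plus an indirect reference to the document list might look like the following minimal sketch. The source names neither a hash function nor a storage layout; the 64-bit BLAKE2b hash, the separate list store addressed by position, and the class names are all assumptions, and the sketch ignores hash collisions, which a 64-bit hash makes unlikely but not impossible.

```python
import hashlib
from dataclasses import dataclass


def tag_hash(tag: str) -> int:
    """64-bit hash of the normalized tag word (hash choice is an assumption)."""
    digest = hashlib.blake2b(tag.lower().encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


@dataclass
class IndexRecord:
    hash_value: int    # hash value representing the tag word
    postings_ref: int  # indirect reference: position of the document list in the store


class HashedTagIndex:
    def __init__(self) -> None:
        self.records: dict[int, IndexRecord] = {}
        self.postings_store: list[list[str]] = []  # document lists, stored separately

    def add(self, tag: str, doc_id: str) -> None:
        h = tag_hash(tag)
        record = self.records.get(h)
        if record is None:
            # First occurrence of this tag word: allocate a new document list.
            self.postings_store.append([])
            record = IndexRecord(h, len(self.postings_store) - 1)
            self.records[h] = record
        postings = self.postings_store[record.postings_ref]
        if doc_id not in postings:
            postings.append(doc_id)

    def lookup(self, tag: str) -> list[str]:
        record = self.records.get(tag_hash(tag))
        if record is None:
            return []
        # Follow the indirect reference to the stored document list.
        return list(self.postings_store[record.postings_ref])


idx = HashedTagIndex()
idx.add("restricted", "http://adult.example.net/index.html")
idx.add("Restricted", "http://adult.example.net/gallery.html")
print(idx.lookup("restricted"))
```

Keeping fixed-size records separate from variable-length document lists keeps the record table compact; the list store can grow without rewriting records.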
Another aspect of the invention provides a method of restricting access to an electronic document that is stored among a plurality of documents, the method comprising the steps of storing a tag word in an index in association with information identifying the electronic document, in which the tag word indicates that access to the electronic document is restricted; receiving a search query that requests the electronic document; modifying the search query to create a modified search query by adding a search term that causes the modified search query to exclude all documents that contain the tag word; and creating a set of search results by searching the index based on the modified search query. One feature of this aspect is that the step of modifying comprises the step of modifying the search query automatically, using a software component of a browser, to create a modified search query by adding a search term that causes the modified search query to exclude all documents that contain the tag word.
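A browser component that restricts queries automatically could rewrite each outgoing query before it is submitted. This minimal sketch assumes a hypothetical query syntax (`NOT tag:<word>`) and a hypothetical tag word `restricted`; neither comes from the source, and each real search service defines its own syntax.

```python
RESTRICTED_TAG = "restricted"  # hypothetical tag word marking restricted documents


def restrict_query(query: str, tag: str = RESTRICTED_TAG) -> str:
    """Add a term that excludes every document associated with `tag`.

    Assumes a query syntax in which 'NOT tag:<word>' removes documents carrying
    the tag word.
    """
    exclusion = f"NOT tag:{tag}"
    query = query.strip()
    if not query:
        return exclusion
    if query == exclusion or query.endswith(" " + exclusion):
        return query  # already restricted; applying the rewrite twice changes nothing
    return f"({query}) {exclusion}"


print(restrict_query("bambi"))
```

Making the rewrite idempotent means a query that passes through the component twice is not altered again, and wrapping the original query in parentheses keeps the exclusion applied to the whole query rather than to its last term.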
Another feature of this aspect is that the modified search query selects only those electronic documents that satisfy the original search query and also contain the tag word. A related feature is that the modified search query selects only those electronic documents that satisfy the original search query and do not contain the tag word.
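In set terms, the two variants are an intersection and a difference between the documents matching the original query and the documents associated with the tag word. A minimal sketch, with hypothetical document names and an illustrative `filter_by_tag` function:

```python
def filter_by_tag(original_results: set[str], tagged_docs: set[str], include: bool) -> set[str]:
    """Apply the tag term added by the modified query.

    include=True : documents that satisfy the original query AND carry the tag word.
    include=False: documents that satisfy the original query and do NOT carry it.
    """
    if include:
        return original_results & tagged_docs
    return original_results - tagged_docs


original = {"a.html", "b.html", "c.html"}  # hits for the user's original query
restricted = {"b.html", "z.html"}          # documents associated with the tag word
print(sorted(filter_by_tag(original, restricted, include=True)))   # ['b.html']
print(sorted(filter_by_tag(original, restricted, include=False)))  # ['a.html', 'c.html']
```

Tagged documents that do not satisfy the original query (`z.html` here) never appear in either variant, because the tag term only narrows the original results.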
In another aspect, the invention provides a method of processing queries that select an electronic document from among a plural
