Logical division of files into multiple articles for search...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C707S793000

active

06256622

ABSTRACT:

FIELD OF THE INVENTION
The present invention is generally directed to the retrieval of information from electronic documents, and more particularly to a method and system for logically dividing and resolving a single document, or similar type of file, into multiple articles for analysis and presentation by a search engine.
BACKGROUND OF THE INVENTION
As the use of computers to create and manage all types of data continues to grow, the task of accessing and retrieving information on a particular topic rapidly becomes unmanageable. This phenomenon is evident within organizations of all sizes, and is particularly noticeable in environments where the sources of information are potentially infinite. With its vast array of computers that are interconnected around the world, the internet best exemplifies this problem. To permit users to easily locate sources of relevant information, therefore, the use of search engines has become almost ubiquitous.
In general, two types of search engines are employed to find relevant data that could be located in a variety of places. One type of search engine analyzes and indexes the contents of the various information sources before a search is conducted. When a user requests a search on a particular topic, the search engine only needs to refer to the index in order to quickly locate relevant documents and the like. The other type of search engine analyzes the contents of the information sources at the time that the search is being conducted. Although this type of search engine exhibits slower performance, because the available information is not preprocessed, it has the capability to return more current information in an environment where the data is being updated on a relatively frequent basis.
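The difference between the two types of search engine described above can be illustrated with a minimal Python sketch. The function names (`build_index`, `search_indexed`, `search_scanning`) and the sample documents are hypothetical, chosen only for illustration; they are not taken from the patent, and whitespace tokenization is a simplifying assumption.

```python
from collections import defaultdict


def tokenize(text):
    # Simplifying assumption: words are separated by whitespace.
    return text.lower().split()


def build_index(docs):
    # First type: analyze the sources once, before any search is run.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_indexed(index, term):
    # Only the prebuilt index is consulted, so lookups are fast,
    # but documents added after indexing are not seen.
    return sorted(index.get(term.lower(), set()))


def search_scanning(docs, term):
    # Second type: read every source at search time. Slower,
    # but always reflects the current contents.
    term = term.lower()
    return sorted(d for d, text in docs.items() if term in tokenize(text))


docs = {
    "a": "Search engines index documents",
    "b": "Browsers display documents",
}
index = build_index(docs)

# A document added after the index was built.
docs["c"] = "index update"

print(search_indexed(index, "index"))   # stale view of the collection
print(search_scanning(docs, "index"))   # current view of the collection
```

In this sketch the indexed search misses document `c` until the index is rebuilt, while the scanning search finds it at the cost of reading every document on each query.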
Typically, either type of search engine functions to retrieve all documents or files that match criteria specified by the user. Depending upon the capabilities of the search engine, it may return only those documents which exactly match the criteria that have been specified, or it may return a larger collection of documents which are equivalent to, or otherwise related to, the documents that exactly match the search criteria.
In many situations, the number of documents which meet the user's request can be quite voluminous. For instance, a typical search that is conducted on the internet might return hundreds, or even thousands, of “hits”, i.e., documents which match the user's search criteria. To assist the user in reviewing these documents, therefore, many search engines attempt to rank them according to their relevance.
One particular technique that is commonly used to rank documents relies upon the frequency of occurrence of criteria-matching information within a document. For instance, the number of times that a user-specified term appears in a given document can be compared to the total number of words in the document, to determine a relevance ratio. Using this approach, a single-page document in which the user-specified term appears several times will have a much higher ranking than a multi-page document in which the user-specified term appears only once or twice. After the relevance of each of the retrieved documents is determined, using such an approach, they are presented to the user in a manner indicative of their respective relevance rankings.
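The relevance ratio described above, occurrences of the term divided by the total number of words, might be computed along the following lines. This is a sketch under stated assumptions: the function name `relevance_ratio`, the regular expression used to find words, and the sample texts are all hypothetical.

```python
import re


def relevance_ratio(text, term):
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    # Occurrences of the term relative to the total word count.
    return words.count(term.lower()) / len(words)


# A short document that mentions the term several times.
short_doc = "cats chase cats and cats sleep"
# A long document that mentions the term only once.
long_doc = "cats " + "filler " * 99

print(relevance_ratio(short_doc, "cats"))  # 3 of 6 words
print(relevance_ratio(long_doc, "cats"))   # 1 of 100 words
```

The short document scores 0.5 and the long one 0.01, matching the observation that a brief document with several occurrences outranks a lengthy one with few.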
While this approach to the searching and presentation of documents assists the user in sorting through vast amounts of information, it does not always present the user with the information that is most relevant to his or her request. For example, a large document might contain an entire section that is devoted to the specific topic in which a user is interested. However, if that section forms only a small portion of the overall document, its relevance ranking could end up being relatively low. If the user operates on the assumption that only the most highly ranked documents presented by the search engine are likely to be of real interest, he or she may never get to the document which is, in fact, right on point.
Typically, users view the results of a search through some form of browser, which enables them to navigate among the documents that were located during the search. When the user selects a particular document from the list of those which have been retrieved, the browser displays the beginning of the document, normally the top half of the first page. Based upon the information contained in this displayed portion, the user may decide to look further into the document, or proceed to the next document that turned up in the search.
As is very often the case, the portion of a document which is relevant to a user's inquiry may not be evident from viewing the top half of the first page. This situation is particularly evident in an example of the type described above, in which a large document may contain an entire section devoted to the particular topic of interest. If that section is buried deep in the document, the user may never take the time to scan far enough into the document to discover this fact. Consequently, the user may end up missing the very document which is most relevant to his or her inquiry.
Accordingly, it is desirable to provide a system for searching and retrieving documents which is capable of identifying the portions of documents that are truly relevant to a user's request, regardless of the size of the document. Further along these lines, it is desirable to provide such a system in which the relevant portion of the document is immediately displayed to the user, thereby relieving the user of the need to take the time to scan through voluminous documents to determine their possible relevance.
SUMMARY OF THE INVENTION
In accordance with the present invention, these objectives are achieved by logically dividing and resolving a single file, such as a single document, into multiple articles that can be individually recognized and ranked by search engines. Using this approach, the search engine is able to separately evaluate each of several articles within a file, to determine whether they meet the criteria for a given search. Each article can be separately provided to the user with an indication of its particular relevance to the search criteria, as well as the relevance of the document as a whole.
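To make the benefit of per-article evaluation concrete, the sketch below scores each article in a file separately and also scores the file as a whole. It assumes the file has already been divided into (title, body) pairs and uses a simple term-frequency ratio as the relevance measure; the names `ratio` and `score_file` and the sample data are hypothetical and not drawn from the patent.

```python
def ratio(words, term):
    # Term occurrences relative to the number of words.
    return words.count(term) / len(words) if words else 0.0


def score_file(articles, term):
    # `articles` is assumed to be a list of (title, body) pairs that
    # some earlier step has already carved out of one file.
    term = term.lower()
    per_article = []
    all_words = []
    for title, body in articles:
        words = body.lower().split()
        all_words.extend(words)
        per_article.append((title, ratio(words, term)))
    # Relevance of the document as a whole, for comparison.
    return per_article, ratio(all_words, term)


articles = [
    ("Intro", "general notes " * 10),
    ("Widgets", "widget care widget repair"),
]
per_article, whole = score_file(articles, "widget")

for title, score in per_article:
    print(title, score)
print("whole file", whole)
```

Here the "Widgets" article scores 0.5 on its own, while the file as a whole scores only 2/24, which is the situation in which a whole-document ranking would bury the relevant section.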
In one embodiment of the invention, the segmentation of a document into separate articles is based upon standard tokens that are used in document mark-up languages. An example of such a token is a particular level of header tag that is employed in HTML documents. Whenever such a token is encountered, the portion of the document which follows the token is considered to be a new article. Using this approach, currently existing documents which use standard mark-up language techniques can be readily divided into articles, without any modification thereof.
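Token-based segmentation of this kind can be sketched in Python as follows. The choice of the `<h2>` tag as the dividing token, the function name `split_on_header`, and the decision to treat any text before the first token as its own article are assumptions made for illustration; a regular expression is used here for brevity where a production system would more likely use an HTML parser.

```python
import re


def split_on_header(html, level=2):
    # The dividing token: an opening header tag of the chosen level,
    # with or without attributes, in either letter case.
    token = re.compile(rf"<h{level}\b[^>]*>", re.IGNORECASE)
    starts = [m.start() for m in token.finditer(html)]
    # Assumption: text before the first token forms its own article.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(html)]
    pieces = [html[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    # Drop empty pieces, e.g. when the input is empty.
    return [p for p in pieces if p]


html = (
    "<h1>Manual</h1><p>Intro</p>"
    "<h2>Setup</h2><p>Plug in.</p>"
    "<H2 id='use'>Use</H2><p>Press go.</p>"
)
articles = split_on_header(html)

for article in articles:
    print(article)
```

Because the split relies only on tags the document already contains, the sample input is divided into three articles without any change to the document itself.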
In a further embodiment of the invention, a unique marker is inserted at the beginning and end of each article in the document. The search engine can employ these markers to delineate the various articles from one another. As a further feature, additional markers can be included within each article, to assist in the navigation of the document. If a particular article meets the criteria for a given search, and the user provides an indication that he or she wishes to view the article, the additional marker can cause the browser to immediately display the beginning of the article, rather than the first page of the document in which the article may be embedded. Using this approach, therefore, the author of the document can be explicit in the designation of the parts which represent separate articles, and the user can be instantly provided with the relevant information, thereby avoiding the need to scan through lengthy documents.
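A marker-based variant might look like the sketch below. The text above does not specify the marker syntax, so the HTML comments `<!-- article-begin -->` and `<!-- article-end -->` and the use of an `<a name="...">` anchor as the navigation marker are purely hypothetical choices, as are the function names and the example URL.

```python
import re

# Hypothetical begin/end markers that delimit one article.
ARTICLE = re.compile(
    r"<!--\s*article-begin\s*-->(.*?)<!--\s*article-end\s*-->",
    re.DOTALL,
)
# Hypothetical navigation marker: a named anchor inside the article.
ANCHOR = re.compile(r'<a\s+name="([^"]+)"', re.IGNORECASE)


def extract_articles(html):
    # Collect the text between each begin/end marker pair.
    results = []
    for match in ARTICLE.finditer(html):
        body = match.group(1).strip()
        anchor = ANCHOR.search(body)
        results.append({
            "anchor": anchor.group(1) if anchor else None,
            "body": body,
        })
    return results


def article_url(doc_url, article):
    # Point the browser at the article itself when an anchor exists,
    # otherwise fall back to the top of the document.
    if article["anchor"]:
        return f"{doc_url}#{article['anchor']}"
    return doc_url


doc = (
    "front matter"
    '<!-- article-begin --><a name="setup"></a>Setup text<!-- article-end -->'
    "filler"
    "<!-- article-begin -->No anchor here<!-- article-end -->"
)
found = extract_articles(doc)
print(article_url("http://example.com/manual.html", found[0]))
```

Text outside the marker pairs ("front matter" and "filler" in the sample) is not treated as an article, which reflects the author's explicit control over what counts as one.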
Further features of the invention, and the advantages provided thereby, are discussed in detail hereinafter with reference to particular embodiments illustrated in the accompanying drawings.


