Method of clustering electronic documents in response to a...

: 2000-09-28
: 2002-03-26
: Von Buhr, Maria N. (Department: 2171)
: Data processing: database and file management or data structures
: Database design
: Data structure types

: Reexamination Certificate
: active
: 06363379
: ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention is directed to a method for clustering electronic documents in response to a search query. More specifically, the invention is directed to a method in which a cluster of documents is provided as a search result when the search query does not completely match any single document but portions of the query are found to match a number of documents.
With the proliferation of electronic information sources, it has become necessary to provide searching capabilities that enable users to look for information of interest in large collections of documents. It is well known to provide search engines for searching for pages on the World Wide Web. These pages are commonly referred to as unstructured documents. Examples of such search engines include Yahoo, Infoseek and others. It is also known to conduct searches across structured documents, which may be found in databases. Several tools exist for searching in structured documents as well. Such searching often involves a forms-based interface for specifying attribute/value pairs (e.g., in a database such as the white pages, the attribute/value pair could be name/phone number).
In connection with searches of unstructured documents such as those on the World Wide Web, the search engines do an effective job of finding many possible matches. However, the number of matches is often quite large, and it is difficult to retrieve each of the documents to locate the few of particular interest. As an example, a search engine like Altavista or Lycos returns a ranked list of documents in response to a keyword-based query, and the score of each document is based on the “similarity” of the document to the query keywords. Consider an example query of “rosehips cancer” where the user wants to discover whether rosehips (the tiny fruits left after rose petals fall) can help in the cure for cancer. The term counts for rosehips and cancer each range in the tens of thousands. Given a result of this size, it is difficult for the user to search among the documents to find other information that might appear infrequently in them. For example, it is difficult to obtain from this set the documents that deal with using rosehips in cancer treatment unless both terms are found in the same document. It would be very useful if such sets of related documents could be automatically clustered and returned in response to the query, that is, if a split match of the query (multiple documents that together satisfy the query) could be provided.
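The notion of a split match can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `find_matches`, the whitespace tokenizer, and the restriction to pairs of documents are all assumptions made for illustration. It only checks whether two partially matching documents together cover the query; it does not yet test whether the two documents are related.

```python
from itertools import combinations


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return set(text.lower().split())


def find_matches(docs, query):
    """Return (full_matches, split_matches) for a keyword query.

    full_matches: ids of documents that contain every query term.
    split_matches: pairs of partially matching documents that
    together contain every query term.
    """
    terms = tokenize(query)
    # Query terms found in each document; drop documents with no hit.
    hits = {doc_id: terms & tokenize(text) for doc_id, text in docs.items()}
    hits = {doc_id: found for doc_id, found in hits.items() if found}

    full = sorted(d for d, found in hits.items() if found == terms)
    partial = sorted(d for d in hits if d not in full)
    split = [
        (a, b)
        for a, b in combinations(partial, 2)
        if hits[a] | hits[b] == terms
    ]
    return full, split
```

With a toy collection in which one document mentions only rosehips and another mentions only cancer, `find_matches` reports no full match and one split match made of those two documents.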
In a similar vein, in connection with sets of structured and unstructured documents, it is possible that information is present partially in a structured document and partially in an unstructured document. Presently, there are no search mechanisms to locate such information.
SUMMARY OF THE INVENTION
The present invention provides a method for clustering documents in answer to a query, joining those documents that share infrequently occurring terms. More specifically, in accordance with the present invention, the search engine provides a ranked list of document clusters rather than individual documents in response to a query. Each document returned by the search as part of the answer list is required to match some or all of the query words and hence would have been part of the list of documents returned by the traditional approach. However, the present invention further computes an inter-document similarity, beyond the computation of the similarity of the documents to the query keywords. This enables the creation of document clusters.
In accordance with the method of the present invention, a universe of documents is first searched using an inverted index to locate documents that match the query keywords. Second, the similarity of document pairs is computed based on the occurrence of infrequently occurring words in the vicinity of query keywords in documents. Documents are clustered and assigned scores based on the diversity of matches of documents in the cluster to the query keywords and the similarity between pairs of documents in the cluster.
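The steps described above can be sketched roughly as follows. This is an illustrative approximation, not the patented method: the summary does not give formulas, so the window size `window`, the document-frequency cutoff `max_df` used to decide what counts as "infrequently occurring", the Jaccard overlap used as similarity, the product used as the score, and the restriction to two-document clusters are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(docs):
    # Map each word to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def rare_context(text, query_terms, index, window, max_df):
    # Collect infrequent words within `window` positions of a query keyword.
    words = text.lower().split()
    context = set()
    for i, word in enumerate(words):
        if word in query_terms:
            for neighbor in words[max(0, i - window): i + window + 1]:
                if neighbor not in query_terms and len(index[neighbor]) <= max_df:
                    context.add(neighbor)
    return context


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pairs(docs, query, window=3, max_df=2):
    """Rank two-document clusters by (assumed) diversity * similarity."""
    query_terms = set(query.lower().split())
    index = build_inverted_index(docs)

    # Step 1: use the inverted index to find documents matching any keyword.
    candidates = set()
    for term in query_terms:
        candidates |= index.get(term, set())

    matched = {d: query_terms & set(docs[d].lower().split()) for d in candidates}
    contexts = {
        d: rare_context(docs[d], query_terms, index, window, max_df)
        for d in candidates
    }

    # Steps 2 and 3: score each pair by how much of the query it covers
    # (diversity) and how many rare context words the two documents share.
    ranked = []
    for a, b in combinations(sorted(candidates), 2):
        diversity = len(matched[a] | matched[b]) / len(query_terms)
        similarity = jaccard(contexts[a], contexts[b])
        ranked.append((diversity * similarity, a, b))
    ranked.sort(key=lambda item: (-item[0], item[1], item[2]))
    return ranked
```

In this sketch a pair scores highly only if its documents cover different query keywords and also share rare words near those keywords, which is the intuition the paragraph above describes. A production system would need a principled rarity measure and clusters larger than pairs.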
In a further embodiment of the present invention, the capability of finding split matches across structured and unstructured documents is also provided. In this embodiment, the clusters constitute pairings of unstructured documents and structured documents, which are compared to one another and scored in a manner similar to that described above. The paired documents are then ranked in order, again relying on the concept of the diversity of matches of documents in the cluster to the query keywords and the similarity between pairs of documents in the cluster.
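A pairing of structured and unstructured documents might be sketched as below. The representation of a structured document as a dictionary of attribute/value pairs, the function names, and the scoring formula are assumptions for illustration only; for brevity this sketch also omits any filter for infrequently occurring words.

```python
def record_tokens(record):
    # Flatten the values of a structured record into a set of lowercase tokens.
    tokens = set()
    for value in record.values():
        tokens |= set(str(value).lower().split())
    return tokens


def rank_mixed_pairs(records, texts, query):
    """Rank (structured, unstructured) pairs that each match part of the query."""
    query_terms = set(query.lower().split())
    ranked = []
    for rec_id, record in records.items():
        rec_tok = record_tokens(record)
        rec_hits = query_terms & rec_tok
        if not rec_hits:
            continue
        for text_id, text in texts.items():
            text_tok = set(text.lower().split())
            text_hits = query_terms & text_tok
            if not text_hits:
                continue
            # Diversity: fraction of the query covered by the pair together.
            diversity = len(rec_hits | text_hits) / len(query_terms)
            # Similarity: overlap of non-query tokens (Jaccard, an assumption).
            shared = (rec_tok & text_tok) - query_terms
            union = (rec_tok | text_tok) - query_terms
            similarity = len(shared) / len(union) if union else 0.0
            ranked.append((diversity * similarity, rec_id, text_id))
    ranked.sort(key=lambda item: (-item[0], item[1], item[2]))
    return ranked
```

For example, a white-pages style record for a person and a text document that mentions the same surname can be returned as one pair when the query names the person's first name and a topic that appears only in the text.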

Affiliated with

Jacobson Guy

Inventor

Krishnamurthy Balachander

Inventor

Srivastava Divesh

Inventor

Also associated with

AT&T Corp.

Corporate Assignee

Kenyon & Kenyon

Law Firm

Von Buhr Maria N.

Examiner
