Computer method and apparatus for clustering documents and autom

Data processing: database and file management or data structures – Database design – Data structure types

Patent

Details

395794, G06F 1730

Patent

active

058571795

ABSTRACT:
A computer method and apparatus determines keywords of documents. An initial document-by-term matrix is formed, each document being represented by a respective M-dimensional vector, where M represents the number of terms or words in a predetermined domain of documents. The dimensionality of the initial matrix is reduced to form resultant vectors of the documents. The resultant vectors are then clustered such that correlated documents are grouped into respective clusters. For each cluster, the terms having greatest impact on the documents in that cluster are identified. The identified terms represent keywords of each document in that cluster. Further, the identified terms form a cluster summary indicative of the documents in that cluster.
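
The four steps in the abstract (term matrix, dimensionality reduction, clustering, per-cluster keywords) can be illustrated with a minimal Python sketch. The abstract does not name specific algorithms, so everything concrete here is an assumption made for illustration: the toy documents, raw term counts as matrix entries, projection onto the top singular directions (found by power iteration) as the reduction step, k-means as the clustering step, and summed term counts as the measure of a term's "impact" on a cluster. All function names are hypothetical.

```python
import math

def term_matrix(docs):
    """Build a document-by-term count matrix; each row is an M-dimensional vector."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: j for j, w in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for w in d.split():
            row[index[w]] += 1.0
        rows.append(row)
    return vocab, rows

def top_directions(rows, k, iters=200):
    """Assumed reduction step: top-k right singular vectors via power iteration."""
    m = len(rows[0])
    basis = []
    for c in range(k):
        v = [1.0 + c * j for j in range(m)]  # deterministic start vector
        for _ in range(iters):
            for b in basis:  # deflate: remove directions already found
                dot = sum(x * y for x, y in zip(v, b))
                v = [x - dot * y for x, y in zip(v, b)]
            av = [sum(r[j] * v[j] for j in range(m)) for r in rows]   # A v
            v = [sum(r[j] * a for r, a in zip(rows, av)) for j in range(m)]  # A^T (A v)
            norm = math.sqrt(sum(x * x for x in v)) or 1.0
            v = [x / norm for x in v]
        basis.append(v)
    return basis

def kmeans(points, k, iters=20):
    """Assumed clustering step: k-means with farthest-point initialisation."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(d2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: d2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_keywords(vocab, rows, labels, k, n=3):
    """Assumed impact measure: terms with the largest summed counts in a cluster."""
    summaries = []
    for c in range(k):
        totals = [0.0] * len(vocab)
        for row, l in zip(rows, labels):
            if l == c:
                totals = [t + x for t, x in zip(totals, row)]
        ranked = sorted(range(len(vocab)), key=lambda j: (-totals[j], vocab[j]))
        summaries.append([vocab[j] for j in ranked[:n]])
    return summaries

# Toy corpus (hypothetical): three pet-related and three finance-related documents.
docs = [
    "cat dog pet animal", "dog pet food animal", "cat pet food",
    "stock market trade price", "market price stock bond", "bond trade market",
]
vocab, rows = term_matrix(docs)
basis = top_directions(rows, k=2)
reduced = [[sum(x * y for x, y in zip(row, b)) for b in basis] for row in rows]
labels = kmeans(reduced, k=2)
summaries = cluster_keywords(vocab, rows, labels, k=2)
for c, words in enumerate(summaries):
    print("cluster", c, "keywords:", words)
```

In this sketch the keyword list of a cluster doubles as its summary, mirroring the abstract's statement that the identified terms serve both purposes. A practical system would likely weight terms (for example by inverse document frequency) rather than use raw counts.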

REFERENCES:
patent: 4839853 (1989-06-01), Deerwester et al.
patent: 5263120 (1993-11-01), Bickel
patent: 5343554 (1994-08-01), Koza et al.
patent: 5481712 (1996-01-01), Silver et al.
patent: 5559940 (1996-09-01), Hutson
patent: 5619709 (1997-04-01), Caid et al.
Jain, A.K., et al., "Algorithms for Clustering Data," Michigan State University, Prentice Hall, Englewood Cliffs, New Jersey 07632, pp. 96-101 (1988).
Faloutsos, C., et al., "A Survey of Information Retrieval and Filtering," University of Maryland, College Park, MD 20742, pp. 1-22 (no date given).
Cutting, D.R., et al., "Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections," Proceedings of the Fifteenth Annual International ACM SIGIR Conference, pp. 318-329 (Jun. 1992).
Cutting, D.R., et al., "Constant Interaction-Time Scatter/Gather Browsing of Very Large Document Collections," Proceedings of the Sixteenth Annual International ACM SIGIR Conference, pp. 1-9 (Jun. 1993).
Faber, V., "Clustering and the Continuous k-Means Algorithm," (no date given).
Singhal, A., "Length Normalization in Degraded Text Collections," Department of Computer Science, Cornell University, Ithaca, NY 14853, pp. 1-19 (no date given).
