Patent
1996-09-09
1999-01-05
Black, Thomas G.
Data processing: database and file management or data structures
Database design
Data structure types
395794, G06F 1730
active
058571795
ABSTRACT:
A computer method and apparatus determines keywords of documents. An initial document-by-term matrix is formed, in which each document is represented by a respective M-dimensional vector, where M is the number of terms or words in a predetermined domain of documents. The dimensionality of the initial matrix is reduced to form resultant vectors of the documents. The resultant vectors are then clustered so that correlated documents are grouped into respective clusters. For each cluster, the terms having the greatest impact on the documents in that cluster are identified. The identified terms represent the keywords of each document in that cluster. Further, the identified terms form a cluster summary indicative of the documents in that cluster.
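The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how the dimensionality is reduced, how the vectors are clustered, or how term "impact" is measured, so the choices below are assumptions made only for illustration: a truncated singular value decomposition for the reduction, plain k-means with a farthest-point start for the clustering, and the largest term weights of each cluster centroid mapped back to term space as the keywords. The function name `cluster_keywords`, its parameters, and `example_docs` are hypothetical and do not come from the patent.

```python
import numpy as np


def cluster_keywords(docs, n_dims=2, n_clusters=2, n_keywords=2, n_iter=20):
    """Return (labels, keywords); keywords[c] lists the top terms of cluster c."""
    # 1. Document-by-term count matrix: one M-dimensional row per document,
    #    where M is the vocabulary size.
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    col = {w: j for j, w in enumerate(vocab)}
    A = np.zeros((len(docs), len(vocab)))
    for i, t in enumerate(tokens):
        for w in t:
            A[i, col[w]] += 1.0

    # 2. Dimensionality reduction (assumption: truncated SVD, keep k components).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(n_dims, len(s))
    reduced = U[:, :k] * s[:k]  # one k-dimensional vector per document

    # 3. Clustering (assumption: k-means with a deterministic farthest-point start).
    centroids = [reduced[0]]
    while len(centroids) < n_clusters:
        d = np.min([np.linalg.norm(reduced - c, axis=1) for c in centroids], axis=0)
        centroids.append(reduced[int(np.argmax(d))])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dist = np.linalg.norm(reduced[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):  # leave an empty cluster's centroid unchanged
                centroids[c] = reduced[labels == c].mean(axis=0)

    # 4. Keywords (assumption: map each centroid back to term space and take
    #    the terms with the largest weights).
    term_weights = centroids @ Vt[:k]
    keywords = [[vocab[j] for j in np.argsort(-w)[:n_keywords]] for w in term_weights]
    return labels.tolist(), keywords


# Toy input invented for this sketch: two topics with disjoint vocabularies.
example_docs = [
    "matrix matrix vector algebra",
    "matrix vector",
    "algebra matrix matrix matrix vector",
    "cluster cluster keyword document",
    "cluster keyword",
]

if __name__ == "__main__":
    labels, keywords = cluster_keywords(example_docs)
    for c, terms in enumerate(keywords):
        members = [i for i, lab in enumerate(labels) if lab == c]
        print(f"cluster {c}: documents {members}, keywords {terms}")
```

In this sketch the keyword list of a cluster also serves as its summary, matching the abstract's statement that the identified terms both label each document in the cluster and summarize the cluster as a whole.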
REFERENCES:
patent: 4839853 (1989-06-01), Deerwester et al.
patent: 5263120 (1993-11-01), Bickel
patent: 5343554 (1994-08-01), Koza et al.
patent: 5481712 (1996-01-01), Silver et al.
patent: 5559940 (1996-09-01), Hutson
patent: 5619709 (1997-04-01), Caid et al.
Jain, A.K., et al., "Algorithms for Clustering Data," Michigan State University, Prentice Hall, Englewood Cliffs, New Jersey 07632, pp. 96-101 (1988).
Faloutsos, C., et al., "A Survey of Information Retrieval and Filtering," University of Maryland, College Park, MD 20742, pp. 1-22 (no date given).
Cutting, D.R., et al., "Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections," Proceedings of the Fifteenth Annual International ACM SIGIR Conference, pp. 318-329 (Jun. 1992).
Cutting, D.R., et al., "Constant Interaction-Time Scatter/Gather Browsing of Very Large Document Collections," Proceedings of the Sixteenth Annual International ACM SIGIR Conference, pp. 1-9 (Jun. 1993).
Faber, V., "Clustering and the Continuous k-Means Algorithm," (no date given).
Singhal, A., "Length Normalization in Degraded Text Collections," Department of Computer Science, Cornell University, Ithaca, NY 14853, pp. 1-19 (no date given).
Adler Mark R.
Hill Christopher G.
Vaithyanathan Shivakumar
Black Thomas G.
Dagg David A.
Digital Equipment Corporation
Ho Buay Lian
Profile ID: LFUS-PAI-O-869596