Method for adapting a K-means text clustering to emerging data
Reexamination Certificate
2000-09-26
2008-09-30
Hong, Stephen (Department: 2178)
Data processing: presentation processing of document, operator i
Presentation processing of document
Layout
active
07430717
ABSTRACT:
A method and structure for clustering documents in datasets include clustering first documents in a first dataset to produce first document classes, creating centroid seeds based on the first document classes, and clustering second documents in a second dataset using the centroid seeds, wherein the first dataset and the second dataset are related. Clustering the first documents forms a first dictionary of the most common words in the first dataset, generates a first vector space model by counting, for each word in the first dictionary, the number of first documents in which the word occurs, and clusters the first documents based on the first vector space model. A second vector space model is then generated by counting, for each word in the first dictionary, the number of second documents in which the word occurs. Creating the centroid seeds includes classifying the second vector space model using the first document classes to produce a classified second vector space model, and determining the mean of the vectors in each class of the classified second vector space model; these means are the centroid seeds.
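The pipeline in the abstract can be sketched as follows. This is a minimal, stdlib-only Python illustration under stated assumptions, not the patented implementation: tokenization is a lowercase whitespace split; each document becomes a binary occurrence vector over the first dictionary (so summing a column gives the number of documents in which that word occurs); the first dataset's initial centroids are simply its first k vectors; and "classifying the second vector space model using the first document classes" is read as assigning each second-dataset vector to the nearest first-dataset centroid. All helper names (`build_dictionary`, `vectorize`, `centroid_seeds`, `adapt_clustering`, and so on) are hypothetical.

```python
from collections import Counter


def build_dictionary(docs, size):
    """First dictionary: the `size` most common words in a dataset.

    Tokenization (lowercase whitespace split) is an assumption of this sketch.
    """
    counts = Counter(word for doc in docs for word in doc.lower().split())
    return [word for word, _ in counts.most_common(size)]


def vectorize(docs, dictionary):
    """Vector space model: one binary occurrence vector per document.

    Summing a column gives the number of documents in which that word occurs.
    """
    vectors = []
    for doc in docs:
        words = set(doc.lower().split())
        vectors.append([1.0 if word in words else 0.0 for word in dictionary])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    """Index of the centroid closest to `vec`."""
    return min(range(len(centroids)), key=lambda k: sq_dist(vec, centroids[k]))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def class_means(vectors, labels, fallback):
    """Mean vector of each class; an empty class keeps its fallback centroid."""
    means = []
    for k, default in enumerate(fallback):
        members = [v for v, label in zip(vectors, labels) if label == k]
        means.append(mean(members) if members else list(default))
    return means


def kmeans(vectors, centroids, max_iter=20):
    """Lloyd-style K-means starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        labels = [nearest(v, centroids) for v in vectors]
        updated = class_means(vectors, labels, centroids)
        if updated == centroids:  # converged: assignments no longer move centroids
            break
        centroids = updated
    return [nearest(v, centroids) for v in vectors], centroids


def centroid_seeds(second_vectors, first_centroids):
    """Classify second-dataset vectors by the first document classes, then average each class.

    "Classify" here means "assign to the nearest first-dataset centroid" (an assumption).
    """
    labels = [nearest(v, first_centroids) for v in second_vectors]
    return class_means(second_vectors, labels, first_centroids)


def adapt_clustering(first_docs, second_docs, k, dict_size=50):
    """Cluster the first dataset, derive seeds, then cluster the related second dataset."""
    dictionary = build_dictionary(first_docs, dict_size)
    first_vsm = vectorize(first_docs, dictionary)
    # Hypothetical initialization for the first run: the first k document vectors.
    _, first_centroids = kmeans(first_vsm, first_vsm[:k])
    # The second model reuses the FIRST dictionary, so both models share coordinates.
    second_vsm = vectorize(second_docs, dictionary)
    seeds = centroid_seeds(second_vsm, first_centroids)
    return kmeans(second_vsm, seeds)


first_docs = ["stock market prices rise", "football match goal",
              "market stock trading", "goal football team"]
second_docs = ["stock prices fall market", "team wins football match",
               "trading stock market news", "football goal scored"]
labels, centroids = adapt_clustering(first_docs, second_docs, k=2)
print(labels)  # → [0, 1, 0, 1]
```

Reusing the first dictionary keeps both vector space models in the same coordinates, which is what allows class means computed on the second dataset to serve as starting centroids. In this sketch a class that receives no second-dataset documents falls back to its first-dataset centroid; that is a design choice made here, not something the abstract specifies.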
REFERENCES:
patent: 5317507 (1994-05-01), Gallant
patent: 5675819 (1997-10-01), Schuetze
patent: 5832182 (1998-11-01), Zhang et al.
patent: 5857179 (1999-01-01), Vaithyanathan et al.
patent: 5864855 (1999-01-01), Ruocco et al.
patent: 5999927 (1999-12-01), Tukey et al.
patent: 6012058 (2000-01-01), Fayyad et al.
patent: 6298174 (2001-10-01), Lantrip et al.
Meisel, “Computer Oriented Approaches to Pattern Recognition”, Academic Press, 1972, pp. 144-146.
Cutting et al., “Scatter/Gather: A Cluster-Based Approach to Browsing Large Document Collections”, Proc. of the Annual International ACM SIGIR Conference, vol. 15, No. 21, 1992, pp. 318-329.
Al-Daoud et al., “New Methods for the Initialisation of Clusters”, Pattern Recognition Letters, Elsevier, Netherlands, vol. 17, No. 5, 1996, pp. 451-454.
Steinbach et al., “A Comparison of Document Clustering Techniques”, Technical Report, 2000, pp. 1-20.
Pena et al., “An Empirical Comparison of Four Initialization Methods for the K-Means Algorithm”, Pattern Recognition Letters, vol. 20, No. 10, 1999, pp. 1027-1040.
Jain et al., “Data Clustering: A Review”, ACM Computing Surveys, vol. 31, No. 3, September 1999, pp. 264-323.
Meila, “An Experimental Comparison of Several Clustering and Initialization Methods”, Technical Report, 1998, pp. 1-22.
Bollacker et al., “A Scalable Method for Classifier Knowledge Reuse”, International Conference of Houston, TX, vol. 3, 1997, pp. 1474-1478.
Bradley et al., “Refining Initial Points for K-Means Clustering”, International Conference, 1998, pp. 91-99.
Gibb & Rahman, LLC
International Business Machines Corporation
Stork, Kyle R.