Data processing: database and file management or data structures – Database and file access
Reexamination Certificate
2011-04-19
Le, Debbie (Department: 2168)
C707S752000, C707S765000
active
07930282
ABSTRACT:
A method for document categorization, and a storage medium that includes instructions for causing a computer to implement the method, are presented. The method includes identifying terms occurring in a collection of documents and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores and creating categories based on those scores, wherein each of the categories includes only documents that (i) contain a selected one of the terms and (ii) have not already been assigned to a category. Finally, the method includes moving each of the documents to the category of the nearest centroid, thereby refining the categories.
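The steps in the abstract can be sketched in a few lines of Python. This is only an illustrative reading, not the patented implementation: the abstract does not give the exact cohesion formula, so the sketch assumes cohesion is the mean cosine similarity between each document containing a term and the centroid of those documents. The function names, the whitespace tokenizer, the raw term-frequency vectors, the `min_docs` threshold, and the single refinement pass are all assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def cohesion(term, vectors):
    """Assumed score: mean cosine similarity of the documents
    containing `term` to the centroid of those documents."""
    members = [v for v in vectors if term in v]
    c = centroid(members)
    return sum(cosine(v, c) for v in members) / len(members)


def cluster_by_cohesive_terms(docs, min_docs=2):
    """Return one category label (a term) per document, or None
    if no term qualifies as a category seed."""
    vectors = [vectorize(d) for d in docs]

    # Step 1: identify terms; skip terms in fewer than min_docs documents
    # (assumption: a single-document term is trivially cohesive).
    terms = {t for v in vectors for t in v}
    candidates = [t for t in terms
                  if sum(t in v for v in vectors) >= min_docs]

    # Steps 2-3: score each term, then sort by descending cohesion
    # (ties broken alphabetically for determinism).
    ranked = sorted(candidates, key=lambda t: (-cohesion(t, vectors), t))

    # Step 4: each term claims the documents that contain it
    # and have not already been assigned to a category.
    assigned = {}
    for term in ranked:
        for i, v in enumerate(vectors):
            if term in v and i not in assigned:
                assigned[i] = term

    # Step 5: one refinement pass that moves every document
    # to the category with the nearest centroid.
    groups = {}
    for i, term in assigned.items():
        groups.setdefault(term, []).append(vectors[i])
    centroids = {term: centroid(vs) for term, vs in groups.items()}
    if not centroids:
        return [None] * len(docs)
    return [max(centroids, key=lambda t: (cosine(v, centroids[t]), t))
            for v in vectors]
```

Calling `cluster_by_cohesive_terms` on a list of strings returns a list of category labels in the same order as the input. A fuller implementation would likely add stop-word removal, stemming, and term weighting, and might repeat the refinement pass until assignments stop changing.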
REFERENCES:
patent: 6424971 (2002-07-01), Kreulen et al.
patent: 6804670 (2004-10-01), Kreulen et al.
patent: 6986104 (2006-01-01), Green et al.
patent: 2002/0169783 (2002-11-01), Kreulen et al.
patent: 2005/0022106 (2005-01-01), Kawai et al.
patent: 2004288168 (2004-10-01), None
Berry, J. and Linoff, A; “Data Mining Techniques for Marketing, Sales, and Customer Support.” John Wiley & Sons, Inc., New York, 1996, pp. 187-215.
Fox, C.; “Lexical Analysis and Stoplists.” 1992, pp. 102-130.
Honrado, A.; Leon, R.; O'Donnel, R.; and Sinclair, D.; “A Word Stemming Algorithm for the Spanish Language.” Seventh International Symposium on String Processing Information Retrieval; SPIRE 2000; pp. 139-145.
Salton, G. and Buckley, C.; “Term-Weighting Approaches in Automatic Text Retrieval.” Information Processing & Management, vol. 24, No. 5, 1988, pp. 513-523.
Salton, G. and McGill, M. J.; “Introduction to Modern Information Retrieval.” McGraw-Hill Book Company, New York, 1983. pp. 52-73.
Spangler, S. and Kreulen, J.; “Interactive Methods for Taxonomy Editing and Validation.” Proceedings of the Conference on Information and Knowledge Management; CIKM 2002; 8 pages.
Spangler, S.; Kreulen, J.; and Lessler, J.; “Generating and Browsing Multiple Taxonomies Over a Documents Collection.” Journal of Management Information Systems, vol. 19, No. 4, Spring 2003, pp. 191-212.
Can, F. and Ozkarahan, E. A.; “Concepts of the Cover Coefficient-Based Clustering Methodology.” 1985, pp. 204-211.
Harabagiu, S. and Lacatusu, F.; “Topic Themes for Multi-Document Summarization.” SIGIR'05, Aug. 15-19, 2005, pp. 202-209.
Hardy, H., et al. “Cross-Document Summarization by Concept Classification.” SIGIR '02, Aug. 11-15, 2002, pp. 121-128.
Cantor & Colburn LLP
International Business Machines Corporation
Le Debbie
Mobin Hasanul