Document clustering based on cohesive terms

Data processing: database and file management or data structures – Database and file access

Reexamination Certificate

Details

C707S752000, C707S765000

Reexamination Certificate

active

07930282

ABSTRACT:
A method for document categorization is presented, along with a storage medium that includes instructions for causing a computer to implement the method. The method includes identifying terms occurring in a collection of documents and determining a cohesion score for each of the terms. The cohesion score is a function of a cosine difference between each of the documents containing the term and a centroid of all the documents containing the term. The method further includes sorting the terms based on the cohesion scores and creating categories based on those scores, wherein each category includes only documents that (i) contain a selected one of the terms and (ii) have not already been assigned to a category. Finally, the method includes moving each of the documents to the category of the nearest centroid, thereby refining the categories.
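
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes documents are sparse term-count dictionaries, and it takes the cohesion score to be the mean cosine similarity between each document containing a term and the centroid of those documents, which is only one possible "function of a cosine difference". The names centroid, cosine, cohesion, and cluster, and the parameters min_docs and max_categories, are hypothetical and do not come from the patent.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of sparse term-weight vectors (dicts)."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cohesion(term, docs):
    """ASSUMED scoring: mean cosine similarity between each document
    containing `term` and the centroid of all such documents."""
    members = [d for d in docs if term in d]
    if not members:
        return 0.0
    center = centroid(members)
    return sum(cosine(d, center) for d in members) / len(members)


def cluster(docs, min_docs=2, max_categories=10):
    """Return {term: [document indices]} after one refinement pass.

    min_docs and max_categories are hypothetical knobs: without a minimum
    support, a term found in a single document trivially scores 1.0.
    """
    support = defaultdict(int)
    for doc in docs:
        for term in doc:
            support[term] += 1
    candidates = [t for t, n in support.items() if n >= min_docs]

    # Sort terms by descending cohesion; ties are broken alphabetically.
    ranked = sorted(candidates, key=lambda t: (-cohesion(t, docs), t))

    # Create categories: each takes only documents that contain the term
    # and have not already been assigned to a category.
    assigned = set()
    categories = {}
    for term in ranked:
        if len(categories) >= max_categories:
            break
        members = [i for i, d in enumerate(docs)
                   if term in d and i not in assigned]
        if members:
            categories[term] = members
            assigned.update(members)
    if not categories:
        return {}

    # Refine: move every document to the category of the nearest centroid.
    # ASSUMPTION: documents left unassigned above are placed here as well.
    centers = {t: centroid([docs[i] for i in m])
               for t, m in categories.items()}
    refined = {t: [] for t in categories}
    for i, doc in enumerate(docs):
        best = max(sorted(centers), key=lambda t: cosine(doc, centers[t]))
        refined[best].append(i)
    return refined
```

For example, given four documents where two mention "printer" and "jam" and two mention "password" and "reset", cluster groups the first pair under one term and the second pair under another. A practical system would likely add stop-word removal, stemming, and term weighting before scoring, none of which is shown here.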

REFERENCES:
patent: 6424971 (2002-07-01), Kreulen et al.
patent: 6804670 (2004-10-01), Kreulen et al.
patent: 6986104 (2006-01-01), Green et al.
patent: 2002/0169783 (2002-11-01), Kreulen et al.
patent: 2005/0022106 (2005-01-01), Kawai et al.
patent: 2004288168 (2004-10-01), None
Berry, M. J. A. and Linoff, G.; “Data Mining Techniques for Marketing, Sales, and Customer Support.” John Wiley & Sons, Inc., New York, 1996, pp. 187-215.
Fox, C.; “Lexical Analysis and Stoplists.” 1992, pp. 102-130.
Honrado, A.; Leon, R.; O'Donnell, R.; and Sinclair, D.; “A Word Stemming Algorithm for the Spanish Language.” Seventh International Symposium on String Processing and Information Retrieval; SPIRE 2000; pp. 139-145.
Salton, G. and Buckley, C.; “Term-Weighting Approaches in Automatic Text Retrieval.” Information Processing & Management, vol. 24, No. 5, 1988, pp. 513-523.
Salton, G. and McGill, M. J.; “Introduction to Modern Information Retrieval.” McGraw-Hill Book Company, New York, 1983, pp. 52-73.
Spangler, S. and Kreulen, J.; “Interactive Methods for Taxonomy Editing and Validation.” Proceedings of the Conference on Information and Knowledge Management; CIKM 2002; 8 pages.
Spangler, S.; Kreulen, J.; and Lessler, J.; “Generating and Browsing Multiple Taxonomies Over a Document Collection.” Journal of Management Information Systems, vol. 19, No. 4, Spring 2003, pp. 191-212.
Can, F. and Ozkarahan, E. A.; “Concepts of the Cover Coefficient-Based Clustering Methodology.” 1985, pp. 204-211.
Harabagiu, S. and Lacatusu, F.; “Topic Themes for Multi-Document Summarization.” SIGIR'05, Aug. 15-19, 2005, pp. 202-209.
Hardy, H., et al. “Cross-Document Summarization by Concept Classification.” SIGIR '02, Aug. 11-15, 2002, pp. 121-128.
