Augmenting a training set for document categorization

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C706S046000

active

07457801

ABSTRACT:
A method and system for augmenting a training set used to train a classifier of documents is provided. The augmentation system augments a training set with training data derived from features of documents based on a document hierarchy. The training data of the initial training set may be derived from the root documents of the hierarchies of documents. The augmentation system generates additional training data that includes an aggregate feature that represents the overall characteristics of a hierarchy of documents, rather than just the root document. After the training data is generated, the augmentation system augments the initial training set with the newly generated training data.
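The augmentation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the aggregate feature is computed, so this example assumes it is the per-document average of word counts over every document in a hierarchy. The names `Doc`, `term_counts`, `aggregate_feature`, and `augment` are hypothetical and are not taken from the patent.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """A node in a document hierarchy (hypothetical structure)."""
    text: str
    children: list["Doc"] = field(default_factory=list)


def term_counts(text: str) -> Counter:
    """Bag-of-words feature: lowercase, whitespace-split word counts."""
    return Counter(text.lower().split())


def walk(doc: Doc):
    """Yield the document and all of its descendants, root first."""
    yield doc
    for child in doc.children:
        yield from walk(child)


def aggregate_feature(root: Doc) -> dict[str, float]:
    """Assumed aggregate: word counts summed over the whole hierarchy,
    divided by the number of documents in it."""
    docs = list(walk(root))
    total = Counter()
    for d in docs:
        total.update(term_counts(d.text))
    return {term: count / len(docs) for term, count in total.items()}


def augment(initial, hierarchies):
    """Return the initial training set plus one aggregate example per hierarchy."""
    extra = [(aggregate_feature(root), label) for root, label in hierarchies]
    return initial + extra


# Initial training data comes from the root document only.
root = Doc("sports news", [Doc("football scores"), Doc("sports scores")])
initial = [(dict(term_counts(root.text)), "sports")]

# The augmented set also carries a feature for the whole hierarchy.
augmented = augment(initial, [(root, "sports")])
```

In this sketch the root-only example never sees the word "scores", while the aggregate example does, which is the kind of hierarchy-level signal the abstract describes adding to the training set.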

REFERENCES:
patent: 5895470 (1999-04-01), Pirolli et al.
patent: 6792475 (2004-09-01), Arcuri et al.
patent: 6826576 (2004-11-01), Lulich et al.
patent: 7043468 (2006-05-01), Forman et al.
Lee, Hahn-Ming, Chih-Ming Chen and Chia-Chen Tan, “An Intelligent Web-Page Classifier with Fair Feature-Subset Selection,” IEEE, 2001.
Tseng, Yuen-Hsien and Da-Wei Juang, “Document-Self Expansion for Text Categorization,” SIGIR '03, Toronto, Canada, ACM, Jul. 28-Aug. 1, 2003, pp. 399-400.
Baker, L. Douglas and Andrew Kachites McCallum, “Distributional Clustering of Words for Text Classification,” SIGIR '98, Melbourne, Australia, ACM, 1998, pp. 96-103.
Calvo, Rafael A., Jae-Moon Lee and Xiaobo Li, “Managing content with automatic document classification,” Journal of Digital Information, vol. 5, No. 282, 2004.
Dumais, Susan and Hao Chen, “Hierarchical Classification of Web Content,” SIGIR 2000, 8 pages.
Dumais, Susan, John Platt, David Heckerman and Mehran Sahami, “Inductive Learning Algorithms and Representations for Text Categorization,” Proceedings of the Seventh International Conference on Information and Knowledge Management, 1998, 8 pages.
Feng, Guang, Tie-Yan Liu, Xu-Dong Zhang, Tao Qin, Bin Gao and Wei-Ying Ma, “Level-Based Link Analysis,” APWeb, 2005, 12 pages.
Huang, Chien-Chung, Shui-Lung Chuang and Lee-Feng Chien, “LiveClassifier: Creating Hierarchical Text Classifiers through Web Corpora,” WWW 2004, New York, ACM, May 17-22, 2004, pp. 184-192.
Iwayama, Makoto, Atsushi Fujii, Noriko Kando and Yuzo Marukawa, “An Empirical Study on Retrieval Models for Different Document Genres: Patents and Newspaper Articles,” SIGIR '03, Toronto, Canada, ACM, Jul. 28-Aug. 1, 2003, pp. 251-258.
Joachims, Thorsten, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features,” Proceedings of the European Conference on Machine Learning ECML, Springer 1998, 7 pages.
Joachims, Thorsten, “Transductive Inference for Text Classification using Support Vector Machines,” Proceedings of the International Conference on Machine Learning ICML, 1999, 10 pages.
Lam, Wai and Chao Yang Ho, “Using A Generalized Instance Set for Automatic Text Categorization,” SIGIR '98, Melbourne, Australia, ACM 1998, pp. 81-89.
Larkey, Leah S. and W. Bruce Croft, “Combining Classifiers in Text Categorization,” SIGIR'96, Zurich, Switzerland, ACM 1996, pp. 289-297.
Lewis, David D., Yiming Yang, Tony G. Rose and Fan Li, “RCV1: A New Benchmark Collection for Text Categorization Research,” Journal of Machine Learning Research 5 (2004), pp. 361-397.
Lewis, David D., “An Evaluation of Phrasal and Clustered Representations on a Text Categorization Task,” 15th Annual SIGIR'92, Denmark, ACM 1992, pp. 37-50.
Makoto, Iwayama and Tokunaga Takenobu, “Cluster-Based Text Categorization: A Comparison of Category Search Strategies,” ISSN 0918-2802, Technical Report 95-TR0016, Aug. 1995, 15 pages.
Masand, Brij, Gordon Linoff and David Waltz, “Classifying News Stories using Memory Based Reasoning,” 15th Annual International SIGIR'92, Denmark, ACM 1992, pp. 59-65.
McCallum, Andrew and Kamal Nigam, “A Comparison of Event Models for Naive Bayes Text Classification,” AAAI '98 Workshop on “Learning for Text Categorization,” 1998, 8 pages.
McCallum, Andrew, Ronald Rosenfeld, Tom Mitchell and Andrew Y. Ng, “Improving Text Classification by Shrinkage in a Hierarchy of Classes,” Proceedings of ICML-98, 15th International Conference on Machine Learning, 1998, 9 pages.
Wibowo, Wahyu and Hugh E. Williams, “Strategies for Minimising Errors in Hierarchical Web Categorisation,” CIKM'02, Virginia, Nov. 4-9, 2002 ACM, pp. 525-531.
Yang, Yiming and Jan O. Pedersen, “A Comparative Study on Feature Selection in Text Categorization,” Proceedings of ICML-97, 14th International Conference on Machine Learning, 1997, 9 pages.
Yang, Yiming and Xin Liu, “A re-examination of text categorization methods,” 22nd Annual International SIGIR, ACM 1999, 8 pages.
Yang, Yiming, Jian Zhang and Bryan Kisiel, “A Scalability Analysis of Classifiers in Text Categorization,” SIGIR'03, Toronto, Canada, ACM, Jul. 28-Aug. 1, 2003, pp. 96-103.
Yang, Yiming, “A Study on Thresholding Strategies for Text Categorization,” SIGIR'01, New Orleans, Louisiana, ACM 2001, 9 pages.
