Reexamination Certificate
2007-01-16
Šmits, Talivaldis Ivars (Department: 2626)
Data processing: speech signal processing, linguistics, language
Linguistics
Natural language
active
10209594
ABSTRACT:
A method automatically determines groups of words or phrases that are descriptive names of a small set of documents, and infers concepts in that set that are more general and more specific than the descriptive names, without any prior knowledge of the hierarchy or the concepts and in a language-independent manner. The descriptive names and the concepts need not be explicitly contained in the documents. The primary application of the invention is searching the World Wide Web, but the invention is not limited to the World Wide Web and may be applied to any set of documents. Classes of features are identified in order to promote understanding of a set of documents. Preferably, there are three classes of features. “Self” features or terms describe the cluster as a whole. “Parent” features or terms describe more general concepts. “Child” features or terms describe specializations of the cluster. The self features can be used as a recommended name for a cluster, while parents and children can be used to place the cluster in the space of a larger collection. Parent features suggest a more general concept, while child features suggest concepts that describe a specialization of the self feature(s). Automatic discovery of parent, self and child features is useful for several purposes, including automatic labeling of web directories and improving information retrieval.
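The three feature classes can be illustrated with a small Python sketch. This is not the claimed procedure: the abstract does not say how features are scored, so the sketch assumes a simple comparison of how often a term occurs in the cluster's documents versus a larger collection. The function names (`doc_fraction`, `classify_features`), the three thresholds, and the sample documents are all hypothetical.

```python
from collections import Counter


def doc_fraction(docs):
    """Return, for each term, the fraction of documents that contain it."""
    counts = Counter()
    for doc in docs:
        # Count each term at most once per document.
        counts.update(set(doc.lower().split()))
    return {term: c / len(docs) for term, c in counts.items()}


def classify_features(cluster_docs, collection_docs,
                      common_in_cluster=0.6,      # assumed threshold
                      common_in_collection=0.3,   # assumed threshold
                      min_in_cluster=0.1):        # assumed threshold
    """Label terms as parent, self, or child features of a cluster.

    Assumed heuristic (not taken from the patent text):
      parent: common in the cluster and common in the collection
      self:   common in the cluster, uncommon in the collection
      child:  moderately common in the cluster, uncommon in the collection
    """
    in_cluster = doc_fraction(cluster_docs)
    in_collection = doc_fraction(collection_docs)
    labels = {"parent": [], "self": [], "child": []}
    for term, p in sorted(in_cluster.items()):
        c = in_collection.get(term, 0.0)
        if p >= common_in_cluster and c >= common_in_collection:
            labels["parent"].append(term)
        elif p >= common_in_cluster:
            labels["self"].append(term)
        elif p >= min_in_cluster and c < common_in_collection:
            labels["child"].append(term)
    return labels


# Toy data: a "biology" cluster inside a broader collection.
cluster_docs = [
    "science biology genetics",
    "science biology botany",
    "science biology genetics",
    "science biology zoology",
    "biology botany",
]
collection_docs = (cluster_docs
                   + ["science physics"] * 8
                   + ["sports football"] * 7)

result = classify_features(cluster_docs, collection_docs)
print(result)
# {'parent': ['science'], 'self': ['biology'], 'child': ['botany', 'genetics', 'zoology']}
```

On this toy data "biology" would serve as the recommended cluster name, "science" as the more general concept, and the remaining terms as specializations. Real text would also need stop-word handling and phrase extraction, which the sketch leaves out.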
REFERENCES:
patent: 5819258 (1998-10-01), Vaithyanathan et al.
patent: 5857179 (1999-01-01), Vaithyanathan et al.
patent: 5995095 (1999-11-01), Ratakonda
patent: 6055540 (2000-04-01), Snow et al.
patent: 6078913 (2000-06-01), Aoki et al.
patent: 6100901 (2000-08-01), Mohda et al.
patent: 6430558 (2002-08-01), Delano
patent: 6473095 (2002-10-01), Martino et al.
patent: 6480843 (2002-11-01), Li
patent: 6598043 (2003-07-01), Baclawski
patent: 6799176 (2004-09-01), Page
patent: 6925460 (2005-08-01), Kummamuru et al.
patent: 6931595 (2005-08-01), Pan et al.
patent: 2002/0065857 (2002-05-01), Michalewicz et al.
patent: 2002/0099702 (2002-07-01), Oddo
patent: 2002/0165860 (2002-11-01), Glover et al.
patent: 2002/0178136 (2002-11-01), Sundaresan et al.
patent: 2003/0221163 (2003-11-01), Glover et al.
patent: 2004/0111438 (2004-06-01), Chitrapura et al.
patent: 2005/0114130 (2005-05-01), Java et al.
patent: 2006/0110063 (2006-05-01), Weiss
Modha et al. “Clustering hypertext with applications to web searching,” 2000, Proceedings of the 11th ACM on Hypertext and hypermedia, pp. 143-152.
Gelbukh, et al. “A Method of Describing Document Contents through Topic Selection,” Sep. 22-24, 1999, String Processing and Information Retrieval Symposium, International Workshop on Groupware, pp. 73-80.
Zhu et al. “PageCluster: Mining conceptual link hierarchies from Web log files for adaptive Web site navigation,” May 2004, ACM Transactions on Internet Technology, ACM Press, vol. 4, Issue 2, pp. 185-208.
Chuang et al. “A practical web-based approach to generating topic hierarchy for text segments,” 2004, Proceedings of the thirteenth ACM International conference on information and knowledge management, ACM Press, pp. 127-136.
Gaussier et al. “A hierarchical model for clustering and categorising documents,” 2002, Advances in Information Retrieval, Proceedings of the 24th BCS-IRSG European Colloquium on IR Research, pp. 229-247.
Lee et al. “Hierarchical video indexing and retrieval for subband-coded video,” Aug. 2000, IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, Issue 5, pp. 824-829.
Weiss et al. “HyPursuit: a hierarchical network search engine that exploits content-link hypertext clustering,” 1996, Proceedings of the 17th ACM conference on Hypertext, pp. 180-193.
Wan et al. “A new approach to image retrieval with hierarchical color clustering,” Sep. 1998, IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, Issue 5, pp. 628-643.
Mighlani et al. “Intelligent hierarchical layout segmentation of document images on the basis of colour content,” Dec. 2-4, 1997, Proceedings of IEEE Speech and Image Technologies for Computing and Telecommunications, vol. 1, pp. 191-194.
Rosario et al. “The Descent of Hierarchy, and Selection in Relational Semantics”, Proceedings of the 40th Annual Meeting of the ACL, published 2001, pp. 247-254.
Botafogo, et al. “Structural Analysis of Hypertexts: Identifying Hierarchies and Useful Metrics”, ACM Transactions on Information Systems (TOIS), ACM Press, vol. 10, Issue 2, Apr. 1992, pp. 142-180.
Caraballo, S.A., “Automatic Construction of a Hypernym-Labeled Noun Hierarchy from Text,” In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (1999).
Fasulo, D., “An Analysis of Recent Work on Clustering Algorithms,” Technical Report, University of Washington (1999).
Glover, E.J. et al., “Using Web Structure for Classifying and Describing Web Pages,” In Proceedings of the 11th WWW Conference, Hawaii (2002).
Hearst, M.A., “Automatic Acquisition of Hyponyms from Large Text Corpora,” In Proceedings of the Fourteenth International Conference on Computational Linguistics, Nantes, France (Jul. 1992).
Hearst, M.A., “Automated Discovery of WordNet Relations,” In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, MIT Press (1998).
Hofmann, T. et al., “Statistical Models for Co-Occurrence Data,” Technical Report, Massachusetts Institute of Technology, Artificial Intelligence Laboratory (Feb. 1998).
Kumar, R. et al., “Trawling the Web for Emerging Cyber-Communities,” WWW8/Computer Networks, 31 (1999).
Popescul, A. et al., “Automatic Labeling of Document Clusters,” Unpublished manuscript, available at: http://citeseer.nj.nec.com/popescul00automatic.html.
Radev, D.R. et al., “Automatic Summarization of Search Engine Hit Lists,” In Proceedings of ACL'2000 Workshop on Recent Advances in Natural Language Processing and Information Retrieval, Hong Kong, PRC (2000).
Sanderson, M. et al., “Deriving Concept Hierarchies from Text,” In Research and Development in Information Retrieval (1999).
Glover Eric J.
Lawrence Stephen R.
Pennock David M.
NEC Laboratories America, Inc.
Ng Eunice
Šmits Talivaldis Ivars
Inferring hierarchical descriptions of a set of documents