Hybrid tensor-based cluster analysis

Data processing: database and file management or data structures – Database and file access – Preparing data for information retrieval

Reexamination Certificate

Details

C706S048000

Reexamination Certificate

active

08060512

ABSTRACT:
What is disclosed is a novel system and method for analyzing multi-dimensional cluster data sets to identify clusters of related documents in an electronic document storage system. Digital documents, for which multi-dimensional probabilistic relationships are to be determined, are received and then parsed to identify multi-dimensional count data with at least three dimensions. Multi-dimensional tensors representing the count data and estimated cluster membership probabilities are created. The tensors are then iteratively processed using a first and a complementary second tensor factorization model to refine the cluster definition matrices until a convergence criterion has been satisfied. Likely cluster memberships for the count data are determined based upon the refinements made to the cluster definition matrices by the alternating tensor factorization models. The present method advantageously extends to the field of tensor analysis a combination of Non-negative Matrix Factorization and Probabilistic Latent Semantic Analysis to decompose non-negative data.
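The alternating scheme in the abstract can be illustrated with a minimal sketch. The abstract does not give update rules, so everything concrete here is an assumption: the count tensor is stored as a sparse dictionary, the "first" model is taken to be one EM iteration of a three-way PLSA model, the "complementary" model is taken to be a pass of KL-divergence multiplicative updates for a non-negative PARAFAC-style factorization, and convergence is a small change in log-likelihood. The names `plsa_sweep`, `nmf_sweep`, `hybrid_cluster`, and `memberships` are hypothetical and are not taken from the patent.

```python
import math
import random

EPS = 1e-12  # assumed guard against division by zero


def model_value(weights, factors, idx):
    """Model value of one cell: sum_r w_r * prod_m F_m[idx_m][r]."""
    return sum(w * math.prod(F[i][r] for F, i in zip(factors, idx))
               for r, w in enumerate(weights))


def plsa_sweep(probs, weights, factors):
    """One EM iteration of a tensor PLSA model (all modes updated together)."""
    R = len(weights)
    new_w = [0.0] * R
    new_f = [[[0.0] * R for _ in F] for F in factors]
    for idx, p in probs.items():
        joint = [weights[r] * math.prod(F[i][r] for F, i in zip(factors, idx))
                 for r in range(R)]
        z = sum(joint) + EPS
        for r in range(R):
            resp = p * joint[r] / z      # E-step: mass of this cell owned by cluster r
            new_w[r] += resp
            for m, i in enumerate(idx):
                new_f[m][i][r] += resp   # M-step accumulators
    for F in new_f:                      # each column becomes p(index | cluster)
        for row in F:
            for r in range(R):
                row[r] /= new_w[r] + EPS
    total = sum(new_w)
    return [w / total for w in new_w], new_f


def nmf_sweep(probs, weights, factors):
    """One pass of KL multiplicative updates, one mode at a time."""
    R = len(weights)
    G = [[row[:] for row in F] for F in factors]
    for row in G[0]:                     # fold cluster weights into mode 0
        for r in range(R):
            row[r] *= weights[r]
    ones = [1.0] * R
    for m in range(len(G)):
        num = [[0.0] * R for _ in G[m]]
        for idx, p in probs.items():
            ratio = p / (model_value(ones, G, idx) + EPS)
            for r in range(R):
                rest = math.prod(G[n][i][r] for n, i in enumerate(idx) if n != m)
                num[idx[m]][r] += ratio * rest
        den = [math.prod(sum(row[r] for row in G[n])
                         for n in range(len(G)) if n != m) for r in range(R)]
        for i, row in enumerate(G[m]):
            for r in range(R):
                row[r] *= num[i][r] / (den[r] + EPS)
    new_w = [1.0] * R
    for F in G:                          # pull column scales back into the weights
        for r in range(R):
            s = sum(row[r] for row in F) + EPS
            new_w[r] *= s
            for row in F:
                row[r] /= s
    total = sum(new_w)
    return [w / total for w in new_w], G


def hybrid_cluster(counts, shape, n_clusters, max_iter=200, tol=1e-8, seed=0):
    """Alternate the two sweeps until the log-likelihood stops changing."""
    rng = random.Random(seed)
    total = sum(counts.values())
    probs = {idx: x / total for idx, x in counts.items()}
    weights = [1.0 / n_clusters] * n_clusters
    factors = [[[rng.random() + 0.1 for _ in range(n_clusters)]
                for _ in range(n)] for n in shape]
    prev = -math.inf
    for it in range(max_iter):
        sweep = plsa_sweep if it % 2 == 0 else nmf_sweep
        weights, factors = sweep(probs, weights, factors)
        ll = sum(p * math.log(model_value(weights, factors, idx) + EPS)
                 for idx, p in probs.items())
        if abs(ll - prev) < tol:         # assumed convergence criterion
            break
        prev = ll
    return weights, factors


def memberships(weights, factor):
    """Most likely cluster for each index along one mode."""
    return [max(range(len(weights)), key=lambda r: weights[r] * row[r])
            for row in factor]


# Toy (document, term, time-slot) counts; the data are invented for illustration.
counts = {(0, 0, 0): 5, (0, 1, 0): 4, (1, 0, 0): 3, (1, 1, 0): 6,
          (2, 2, 1): 5, (2, 3, 1): 4, (3, 2, 1): 6, (3, 3, 1): 3}
weights, factors = hybrid_cluster(counts, shape=(4, 4, 2), n_clusters=2)
print(memberships(weights, factors[0]))
```

Both sweeps share one parametrization (cluster weights plus column-normalized factor matrices), which is what lets them alternate on the same cluster definition matrices. The sparse dictionary keeps the cost proportional to the number of non-zero counts, a design choice made for brevity rather than one stated in the source.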

REFERENCES:
patent: 5857179 (1999-01-01), Vaithyanathan et al.
patent: 6389436 (2002-05-01), Chakrabarti et al.
patent: 6397166 (2002-05-01), Leung et al.
patent: 6505184 (2003-01-01), Reed et al.
patent: 6533882 (2003-03-01), Woodside
patent: 7720848 (2010-05-01), Guerraz et al.
patent: 2004/0068697 (2004-04-01), Harik et al.
patent: 2006/0041590 (2006-02-01), King et al.
patent: 2006/0190241 (2006-08-01), Goutte et al.
patent: 2008/0010038 (2008-01-01), Smaragdis et al.
patent: 2009/0132901 (2009-05-01), Zhu et al.
patent: 2009/0299705 (2009-12-01), Chi et al.
D. Cohn and T. Hofmann, “The Missing Link—A Probabilistic Model of Document Content and Hypertext Connectivity”, 2001, Advances in Neural Information Processing Systems 13, MIT Press, pp. 430-436.
Wei Xu, Xin Liu, Yihong Gong, “Document Clustering Based on Non-negative Matrix Factorization,” SIGIR '03, Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
David Cohn, Huan Chang “Learning to Probabilistically Identify Authoritative Documents” 2000 Learning-International Workshop www.psu.edu.
Ben Taskar, Eran Segal, Daphne Koller, “Probabilistic Classification and Clustering in Relational Data,” 2001, www.psu.edu.
Acar, et al., “Unsupervised Multiway Data Analysis: A Literature Survey,” 15 pages.
Kolda, et al., “The TOPHITS Model for Higher-Order Web Link Analysis,” 12 pages.
Wang, et al., “Rank-R Approximation of Tensors Using Image-as-Matrix Representation,” 8 pages.
Shashanka, et al., “Probabilistic Latent Variable Models as Non-Negative Factorizations,” 7 pages.
Ding, et al., “Orthogonal Nonnegative Matrix Tri-Factorizations for Clustering,” 10 pages.
Ding, et al., “On the Equivalence Between Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing,” pp. 1-19.
Acar, et al., “Modeling and Multiway Analysis of Chatroom Tensors,” 13 pages.
Farahat, et al., “Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis,” pp. 105-112.
Wang, et al., “Compact Representation of Multidimensional Data Using Tensor Rank-One Decomposition,” 4 pages.
Lee, et al., “Algorithms for Non-negative Matrix Factorization,” 7 pages.
Sun, et al., “CubeSVD: A Novel Approach to Personalized Web Search,” © International World Wide Web Conference Committee (IW3C2), WWW 2005, May 10-14, 2005, pp. 382-390, Chiba, Japan, ACM 1-59593-046-9/09/05/0005.
Vichi, J., “Clustering and data reduction models for three-way preference data,” University of Rome “La Sapienza”, Dep. Statistics, Probability and Applied Statistics, P.le A. Moro 5, I-00185, Rome, Italy (Session 3 (invited lecture): S3-1), 2 pages.
Li, T., “A Unified View on Clustering Binary Data,” Florida International University, School of Computer Science, Sep. 30, 2005, pp. 1-25.
Hofmann, T., “Unsupervised Learning by Probabilistic Latent Semantic Analysis,” Machine Learning, 42, pp. 177-196, 2001, © 2001 Kluwer Academic Publishers, Manufactured in The Netherlands.
Huang, et al., “Simultaneous Tensor Subspace Selection and Clustering: The Equivalence of High Order SVD and K-Means Clustering,” KDD'08, Aug. 24-27, 2008, Las Vegas, Nevada, USA, © 2008 ACM 978-1-60558-193-4/08/08, pp. 327-335.
Bader, et al., “Temporal analysis of semantic graphs using ASALSAN,” Seventh IEEE International Conference on Data Mining, 1550-4786/07, © 2007 IEEE, DOI 10.1109/ICDM.2007.54, pp. 33-42.
Chi, et al., “Probabilistic Polyadic Factorization and Its Application to Personalized Recommendation,” CIKM'08, Oct. 26-30, 2008, Napa Valley, California, USA, © 2008 ACM 978-1-59593-991-3/08/10, pp. 941-950.
Hofmann, T., “Probabilistic Latent Semantic Indexing,” Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval, 1999, 8 pages.
Hofmann, T., “Probabilistic Latent Semantic Indexing (Powerpoint Presentation),” Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval, 1999, 12 pages.
Kolda, T., “Orthogonal Tensor Decompositions,” SIAM J. Matrix Anal. Appl., vol. 23, No. 1, pp. 243-255, © 2001 Society for Industrial and Applied Mathematics.
Ding, et al., “Posterior Probabilistic Clustering using NMF,” SIGIR'08, Jul. 20-24, 2008, Singapore, ACM 978-1-60558-164-4/08/07 pp. 831-832.
Shashua, et al., “Non-Negative Tensor Factorization with Applications to Statistics and Computer Vision,” Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 2005, 8 pages.
Vasilescu, et al., “Multilinear Analysis of Image Ensembles: TensorFaces,” Proc. of the European Conf. on Computer Vision (ECCV '02), Copenhagen, Denmark, May 2002, pp. 447-460.
Ding, et al., “Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence, Chi-square Statistic, and a Hybrid Method,” © 2006, American Association for Artificial Intelligence (www.aaai.org), 6 pages.
Vasilescu, et al., “Multilinear Subspace Analysis of Image Ensembles,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '03), Madison, WI, Jun. 2003, vol. 2, pp. 93-99.
Lee, et al., “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, Oct. 21, 1999, www.nature.com, pp. 788-791, © 1999 Macmillan Magazines Ltd.
Harshman, R., “Foundations of the Parafac Procedure: Models and Conditions for an “Explanatory” Multimodal Factor Analysis,” UCLA Working Papers in Phonetics, 16, pp. 1-84, University Microfilms, Ann Arbor, Michigan No. 10,085.
Schwarz, G., “Estimating the Dimension of a Model,” The Annals of Statistics, vol. 6, No. 2 (Mar. 1978), pp. 461-464.
Strehl, et al., “Cluster Ensembles—A Knowledge Reuse Framework for Combining Multiple Partitions,” Journal of Machine Learning Research 3 (2002), pp. 583-617.
Martin, et al., “A Jacobi-Type Method for Computing Orthogonal Tensor Decompositions,” SIAM J. Matrix Anal. Appl., vol. 30, No. 3, pp. 1219-1232.
