Latent semantic clustering

Data processing: artificial intelligence – Knowledge processing system – Knowledge representation and reasoning technique

Reexamination Certificate

Details

C706S045000

active

07844566

ABSTRACT:
An embodiment of the present invention provides a computer-based method for automatically identifying clusters of conceptually-related documents in a collection of documents, including the following steps: generating a document-representation of each document in an abstract mathematical space; identifying a plurality of document clusters in the collection of documents based on a conceptual similarity between respective pairs of the document-representations, wherein each document cluster is associated with an exemplary document and a plurality of other documents; and identifying a non-intersecting document cluster from among the plurality of document clusters based on (i) a conceptual similarity between the document-representation of the exemplary document and the document-representation of each document in the non-intersecting cluster and (ii) a conceptual dissimilarity between a cluster-representation of the non-intersecting document cluster and a cluster-representation of each other document cluster. Variants of the method enable creating a hierarchy of clusters and conducting incremental updates of preexisting hierarchical structures.
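The abstract leaves the "abstract mathematical space" and the similarity measure unspecified. The Python sketch below is one minimal reading of the three steps, under assumptions that are ours rather than the patent's. Documents are embedded with an LSI-style truncated SVD of a term-document count matrix, in keeping with the title. Conceptual similarity is taken to be cosine similarity, and every document is tried as an exemplar. A cluster's representation is its normalized centroid. Candidates are accepted greedily, largest first, only if they share no documents with, and are not too similar to, clusters already accepted. The function names, the `threshold`, `max_centroid_sim` and `min_size` parameters, and the toy matrix are hypothetical, and the sketch requires NumPy.

```python
import numpy as np

# Hypothetical helpers. The SVD embedding, cosine similarity, centroid
# cluster-representation and greedy selection are illustrative assumptions.


def lsi_document_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Map each document (column of a terms x documents matrix) into a
    k-dimensional latent space using a truncated SVD (an LSI-style choice)."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows of the result are documents; columns are scaled by singular values.
    return vt[:k].T * s[:k]


def cosine_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows (zero rows stay at 0)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def candidate_clusters(sim: np.ndarray, threshold: float) -> dict[int, set[int]]:
    """Treat every document as a potential exemplar; its candidate cluster is
    every document whose similarity to the exemplar meets the threshold."""
    return {
        i: set(np.flatnonzero(sim[i] >= threshold).tolist())
        for i in range(sim.shape[0])
    }


def select_non_intersecting(
    vectors: np.ndarray,
    candidates: dict[int, set[int]],
    max_centroid_sim: float,
    min_size: int = 2,
) -> list[tuple[int, list[int]]]:
    """Greedily keep the largest candidates that share no documents with, and
    whose unit-length centroid is dissimilar enough from, those already kept."""
    kept = []  # (exemplar, members, unit centroid)
    ordered = sorted(candidates.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for exemplar, members in ordered:
        if len(members) < min_size:
            continue
        if any(members & other for _, other, _ in kept):
            continue  # would overlap an already selected cluster
        centroid = vectors[sorted(members)].mean(axis=0)
        centroid = centroid / (np.linalg.norm(centroid) or 1.0)
        if any(float(centroid @ c) > max_centroid_sim for _, _, c in kept):
            continue  # conceptually too close to an existing cluster
        kept.append((exemplar, members, centroid))
    return [(exemplar, sorted(members)) for exemplar, members, _ in kept]


if __name__ == "__main__":
    # Toy corpus: terms 0-2 occur only in documents 0-2, terms 3-5 only in 3-5.
    X = np.array([
        [2, 1, 0, 0, 0, 0],
        [1, 2, 1, 0, 0, 0],
        [0, 1, 2, 0, 0, 0],
        [0, 0, 0, 2, 1, 0],
        [0, 0, 0, 1, 2, 1],
        [0, 0, 0, 0, 1, 2],
    ], dtype=float)
    docs = lsi_document_vectors(X, k=2)
    clusters = select_non_intersecting(
        docs, candidate_clusters(cosine_matrix(docs), 0.5), max_centroid_sim=0.3
    )
    print(clusters)  # [(0, [0, 1, 2]), (3, [3, 4, 5])]
```

Largest-first greedy acceptance is simply an easy way to guarantee that the selected clusters do not intersect. The hierarchical and incremental-update variants mentioned in the abstract are not sketched here.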
