Method, apparatus, and system for clustering and classification

Data processing: artificial intelligence – Machine learning

Reexamination Certificate


Details

Reexamination Certificate

active

07574409

ABSTRACT:
The invention provides a method, apparatus, and system for classifying and clustering electronic data streams, such as email, images, and sound files, for identification, sorting, and efficient storage. The inventive systems label a document as belonging to a predefined class through computer methods that comprise the steps of identifying an electronic data stream using one or more learning machines and comparing the outputs from the machines to determine the label to associate with the data. The method further uses learning machines in combination with hashing schemes to cluster and classify documents. In one embodiment, hash apparatuses and methods taxonomize clusters. In yet another embodiment, clusters of documents use a geometric hash to contain the documents in a data corpus without the overhead of search and storage.
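
The abstract does not specify which learning machines or hashing schemes the invention uses, so the following is only a minimal sketch of the two ideas it names: comparing the outputs of several learning machines to pick a label, and using a hashing scheme to group similar documents. It assumes a simple majority vote for the comparison step and swaps in a MinHash-with-banding scheme (in the spirit of the resemblance work listed in the references) for the hashing step. All function names and parameters (vote, shingles, minhash_signature, cluster_by_bands, num_hashes, rows_per_band) are hypothetical and not taken from the patent.

```python
import hashlib
from collections import Counter, defaultdict


def vote(labels):
    """Compare the machines' outputs: return the majority label, or None on a tie."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no clear winner among the learning machines
    return counts[0][0]


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (assumed document representation)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_hashes=16):
    """For each seeded hash function, keep the minimum hash value over the set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.sha1(f"{seed}:{item}".encode()).digest()[:8], "big")
            for item in items))
    return tuple(signature)


def cluster_by_bands(docs, num_hashes=16, rows_per_band=4):
    """Group documents whose signatures agree on at least one band of rows."""
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for start in range(0, num_hashes, rows_per_band):
            buckets[(start, sig[start:start + rows_per_band])].add(doc_id)
    clusters = {frozenset(ids) for ids in buckets.values() if len(ids) > 1}
    return sorted(sorted(cluster) for cluster in clusters)


if __name__ == "__main__":
    # Hypothetical outputs of three learning machines for one message.
    print(vote(["spam", "spam", "ham"]))
    docs = {
        "a": "buy cheap pills now online",
        "b": "buy cheap pills now online",
        "c": "meeting moved to three pm on friday",
    }
    print(cluster_by_bands(docs))
```

In this sketch, documents land in the same cluster as soon as one band of their signatures matches, so no pairwise search over the corpus is needed; the band width trades recall against false groupings and would have to be tuned for real data.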

REFERENCES:
patent: 1261167 (1918-04-01), Russell
patent: 5032987 (1991-07-01), Broder et al.
patent: 5909677 (1999-06-01), Broder et al.
patent: 5953503 (1999-09-01), Mitzenmacher et al.
patent: 5974481 (1999-10-01), Broder
patent: 5991808 (1999-11-01), Broder et al.
patent: 6073135 (2000-06-01), Broder et al.
patent: 6088039 (2000-07-01), Broder et al.
patent: 6119124 (2000-09-01), Broder et al.
patent: 6195698 (2001-02-01), Lillibridge et al.
patent: 6230155 (2001-05-01), Broder et al.
patent: 6269362 (2001-07-01), Broder et al.
patent: 6286006 (2001-09-01), Bharat et al.
patent: 6292762 (2001-09-01), Moll et al.
patent: 6349296 (2002-02-01), Broder et al.
patent: 6385609 (2002-05-01), Barshefsky et al.
patent: 6389436 (2002-05-01), Chakrabarti et al.
patent: 6438740 (2002-08-01), Broder et al.
patent: 6445834 (2002-09-01), Rising, III
patent: 6487555 (2002-11-01), Bharat et al.
patent: 6560600 (2003-05-01), Broder
patent: 6658423 (2003-12-01), Pugh et al.
patent: 6665837 (2003-12-01), Dean et al.
patent: 6687416 (2004-02-01), Wang
patent: 6711568 (2004-03-01), Bharat et al.
patent: 6732149 (2004-05-01), Kephart
patent: 7281664 (2007-10-01), Thaeler et al.
patent: 7295966 (2007-11-01), Barklund et al.
patent: 7333966 (2008-02-01), Dozier
patent: 7349386 (2008-03-01), Gou
patent: 7353215 (2008-04-01), Bartlett et al.
patent: 7370314 (2008-05-01), Minami et al.
patent: 7389178 (2008-06-01), Raz et al.
patent: 7406603 (2008-07-01), MacKay et al.
patent: 7451458 (2008-11-01), Tuchow
patent: 7463774 (2008-12-01), Wang et al.
patent: 7464026 (2008-12-01), Calcagno et al.
patent: 7477166 (2009-01-01), McCanne et al.
patent: 7487321 (2009-02-01), Muthiah et al.
patent: 2004/0049678 (2004-03-01), Walmsley et al.
patent: 2007/0112701 (2007-05-01), Chellapilla et al.
Adapted One-versus-All Decision Trees for Data Stream Classification Hashemi, Sattar; Yang, Ying; Mirzamomen, Zahra; Kangavari, Mohammadreza; Knowledge and Data Engineering, IEEE Transactions on vol. 21, Issue 5, May 2009 pp. 624-637 Digital Object Identifier 10.1109/TKDE.2008.181.
Arya, Sunil, et al., “An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions,” J ACM 45 (1998), 1-33.
Arya, Sunil et al., “Approximate nearest neighbor queries in fixed dimensions,” 1-11.
Aubert, Gilles et al., “A Variational Method in Image Recovery,” vol. 34, No. 5 (Oct. 1997) 1948-1979.
Baytin, Alexander et al., “Threshold Properties of Uniform Spam Filters,” Cutter LLC., (Apr. 16, 2004). 1-8.
Bekkerman, Ron et al., “Distributional Word Clusters vs. Words for Text Categorization,” Journal of Machine Learning and Research (2002), 1-27.
Brin, Sergey et al., “Copy Detection Mechanisms for Digital Documents,” Stanford University, (Oct. 31, 1994), 1-21.
Broder, Andrei Z., “On the Resemblance and Containment of Documents,” Digital Systems Research Center, (1998), 21-29.
Buhler, Jeremy, “Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing,” Univ. of Washington, (2001), 1-10.
Candès, Emmanuel J. et al, “Ridgelets: A Key to Higher-Dimensional Intermittency?,” Stanford University, (1999) 1-15.
Candès, Emmanuel J. et al, “Curvelets—A Surprisingly Effective Nonadaptive Representation For Objects with Edges,” (2000), 1-16.
Chambolle, Antonin, et al., “Image recovery via total variation minimization and related problems,” Numerische Mathematik, (1997) 167-188.
Charikar, Moses S., “Similarity Estimation Techniques from Rounding Algorithms,” Princeton Univ., (2002).
Chowdhury, Abdur, “Duplicate Data Detection,” Search & Navigation Group, America Online.
Cooper, James W. et al., “A Novel Method for Detecting Similar Documents,” IBM T J Watson, (2002), 167-185.
Conrad, Jack G. et al., “Online Duplicate Document Detection: Signature Reliability in a Dynamic Retrieval Environment,” Thomson Legal & Regulatory and Thomson-West, 443-452.
Christopoulos, C.A. et al., “JPEG2000: The New Still Picture Compression Standard,” 45-49.
Cuttr, “Our Technology Brochure Finding Similar Files in a Large File System.”
Dasgupta, Sanjoy et al., “An elementary proof of the Johnson-Lindenstrauss Lemma,” Int'l Computer Science Institute. 1-5.
Damerau, Fred J., “A Technique for Computer Detection and Correction of Spelling Errors,” IBM Corporation, (Mar. 1964) 171-176.
Do, M.N. et al., “The Contourlet Transform: An Efficient Directional Multiresolution Image Representation,” IEEE Transactions on Image Processing, (Dec. 27, 2004).
Donoho, David L. et al., “Beamlets and Multiscale Image Analysis.” 1-196.
Gabor, D., “Information Theory in Electron Microscopy,” Int'l Academy of Pathology, (Jun. 1965), 1-9.
Garcia-Molina, Hector et al., “dSCAM: Finding Document Copies Across Multiple Databases,” Stanford University.
Geman, Stuart, “Statistical Methods for Tomographic Image Reconstruction,” Brown University, 5-21.
Gionis, Aristides, et al., “Similarity Search in High Dimensions via Hashing,” In VLDB'99, Proceedings of the 25th International Conference on Very Large Data Bases, (Sep. 7-10, 1999) 518-529.
Gunopulos, Dimitrios et al., “Time Series Similarity Measures,” Univ. of California and Microsoft Research, 1-63.
Haveliwala, Taher H. et al., “Evaluating Strategies for Similarity Search on the Web,” (May 2002), 1-23.
Indyk, Piotr et al., “Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality,” Stanford University, 604-613.
Indyk, Piotr et al., “Fast Image Retrieval via Embeddings,” 1-15.
Indyk, Piotr, “Nearest Neighbors in High-Dimensional Spaces,” 1-16.
Indyk, Piotr, et al., “Locality-Preserving Hashing In Multidimensional Spaces,” Stanford University, 1-2.
Indyk, Piotr et al., “Low Distortion Embeddings of Finite Metric Spaces,” 1-20.
Jaccard, Paul, “The Distribution of the Flora in the Alpine Zone,” The New Phytologist, Federal Polytechnic, (Feb. 1912) 1-14.
Joachims, Thorsten, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features,” Univ. of Dortmund, (Nov. 1997, rev. Apr. 1998), 1-14.
Johnson, William B., “Extensions of Lipschitz Mappings Into a Hilbert Space,” (1984) 189-206.
Keller, Mikaela et al., “Theme Topic Mixture Model: A Graphical Model for Document Representation,” 1-8.
Kokar, Mieczyslaw M., “On Similarity Methods in Machine Learning,” Northeastern University, 1-11.
Kovásznay, L.S.G. et al., “Image Processing,” IRE, 560-570.
Kramer, Henry P. et al., “Iterations of a Non-Linear Transformation For Enhancement of Digital Images,” Pattern Analysis Corp., (1974), 53-58.
Manasse, Mark, “Finding Similar Things Quickly in Large Collections,” MSR Silicon Valley.
Manber, Udi, “Finding Similar Files in a Large File System,” Univ. of Arizona, (Oct. 1993), 1-10.
Mey
