Method and apparatus for document clustering and document...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

07433869

ABSTRACT:
A first embodiment of the invention provides a system that automatically classifies documents in a collection into clusters based on the similarities between documents, that automatically classifies new documents into the appropriate clusters, and that may change the number or parameters of clusters under various circumstances. A second embodiment of the invention provides a technique for comparing two documents, in which a fingerprint or sketch of each document is computed. In particular, this embodiment of the invention uses a specific algorithm to compute the document's fingerprint. One embodiment uses a sentence in the document as a logical delimiter or window from which significant words are extracted; thereafter, a hash is computed of all pair-wise permutations of those words. Words are extracted based on their weight in the document, which can be computed using measures such as term frequency and inverse document frequency.
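The fingerprinting idea in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the patented algorithm: the sentence splitter, the smoothed IDF formula, the `top_k` cutoff, the use of SHA-1, and the Jaccard comparison in `similarity` are all assumptions introduced here for illustration, and every function name is hypothetical.

```python
import hashlib
import math
import re
from collections import Counter
from itertools import permutations


def split_sentences(text):
    # Assumption: a sentence ends at '.', '!' or '?'.
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(corpus):
    """Smoothed inverse document frequency over a list of document strings."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def fingerprint(text, idf, top_k=3):
    """Set of 64-bit hashes, one per ordered pair of significant words
    found within the same sentence (the sentence acts as the window)."""
    tf = Counter(tokenize(text))
    default_idf = max(idf.values(), default=1.0)  # unseen words count as rare

    def weight(word):
        return tf[word] * idf.get(word, default_idf)

    hashes = set()
    for sentence in split_sentences(text):
        # Keep the top_k heaviest distinct words; ties break alphabetically.
        words = sorted(set(tokenize(sentence)), key=lambda w: (-weight(w), w))
        for a, b in permutations(words[:top_k], 2):
            digest = hashlib.sha1(f"{a} {b}".encode("utf-8")).digest()
            hashes.add(int.from_bytes(digest[:8], "big"))
    return hashes


def similarity(fp_a, fp_b):
    """Jaccard overlap of two fingerprints (assumed comparison measure)."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0
```

Two documents would then be compared by building `idf_table` over the collection, calling `fingerprint` on each document, and passing the two hash sets to `similarity`. A clustering system along the lines of the first embodiment could, under the same assumptions, assign a new document to the cluster whose members score highest.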

REFERENCES:
patent: 5237157 (1993-08-01), Kaplan
patent: 5247575 (1993-09-01), Sprague et al.
patent: 5532920 (1996-07-01), Hatrick et al.
patent: 5546528 (1996-08-01), Johnson
patent: 5619247 (1997-04-01), Russo
patent: 5625711 (1997-04-01), Nicholson
patent: 5643064 (1997-07-01), Grinderslev
patent: 5680479 (1997-10-01), Wang et al.
patent: 5729637 (1998-03-01), Nicholson
patent: 5737599 (1998-04-01), Rowe
patent: 5781785 (1998-07-01), Rowe
patent: 5819301 (1998-10-01), Rowe
patent: 5832530 (1998-11-01), Paknad
patent: 5848184 (1998-12-01), Taylor et al.
patent: 5860074 (1999-01-01), Rowe
patent: 5930813 (1999-07-01), Padgett
patent: 5991780 (1999-11-01), Rivette et al.
patent: 5999649 (1999-12-01), Nicholson
patent: 6041316 (2000-03-01), Allen
patent: 6049339 (2000-04-01), Schiller
patent: 6119124 (2000-09-01), Broder et al.
patent: 6185684 (2001-02-01), Pravetz
patent: 6282653 (2001-08-01), Berstis et al.
patent: 6327600 (2001-12-01), Satoh et al.
patent: 6345279 (2002-02-01), Li
patent: 6356936 (2002-03-01), Donoho
patent: 6385350 (2002-05-01), Nicholson
patent: 6389541 (2002-05-01), Patterson
patent: 6446068 (2002-09-01), Kortge
patent: 6516337 (2003-02-01), Tripp
patent: 6606613 (2003-08-01), Altschuler et al.
patent: 6629097 (2003-09-01), Keith
patent: 6732090 (2004-05-01), Shanahan
patent: 6920610 (2005-07-01), Lawton et al.
patent: 6988124 (2006-01-01), Douceur et al.
patent: 2002/0138528 (2002-09-01), Gong et al.
patent: 2003/0033288 (2003-02-01), Shanahan
patent: 2003/0037094 (2003-02-01), Douceur et al.
patent: 2003/0037181 (2003-02-01), Freed
patent: 2003/0061200 (2003-03-01), Hubert
patent: 2003/0185448 (2003-10-01), Seeger et al.
patent: 2004/0030680 (2004-02-01), Veit
patent: 2004/0133544 (2004-07-01), Klessig
patent: 2004/0133545 (2004-07-01), Klessig
patent: 2004/0133588 (2004-07-01), Klessig
patent: 2004/0133589 (2004-07-01), Klessig
patent: 2004/0205448 (2004-10-01), Grefenstette
patent: 2005/0022114 (2005-01-01), Shanahan
patent: 0881591 (1998-12-01), None
patent: 0881592 (1998-12-01), None
patent: 1284461 (2003-02-01), None
patent: 0881591(B1) (2003-09-01), None
patent: 2001175807 (2001-06-01), None
patent: WO 96/27155 (1996-09-01), None
patent: WO 98/42098 (1998-09-01), None
patent: WO 99/05618 (1999-04-01), None
patent: WO 99/39286 (1999-05-01), None
patent: WO 01/20596 (2001-03-01), None
patent: WO 01/57711 (2001-09-01), None
patent: WO 02/41170 (2002-05-01), None
patent: WO 2005/062192 (2005-07-01), None
Figa, E., et al., “Lexical Inference Mechanisms for Text Understanding and Classification,” 2003, Proceedings of the 66th ASIST Annual Meeting, Humanizing Information Technology: From Ideas to Bits and Back, ASIST 2003, Information Today, Inc., pp. 165-173, Medford, NJ, USA.
Embley, D.W., et al., “Conceptual-Model-Based Data Extraction from Multiple-Record Web Pages,” Nov. 1999, Data & Knowledge Engineering, vol. 31, No. 3, pp. 227-251, Elsevier, Netherlands.
Embley, D.W., et al., “A Conceptual-Modeling Approach to Extracting Data from the Web,” 1998, Conceptual Modeling—ER'98, 17th International Conference on Conceptual Modeling, Proceedings, pp. 78-91, Springer-Verlag, Berlin, Germany.
Bartal, “Probabilistic Approximation of Metric Spaces and Its Algorithmic Applications,” 1996, In: FOCS Proceedings of the 37th Annual Symposium on Foundations of Computer Science. Washington DC, IEEE, Abstract, pp. 2-3, ISSN 0272-5428.
Zhang, et al., “Birch: An Efficient Data Clustering Method for Very Large Databases,” 1996, In: ACM Sigmod Record, Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, New York: ACM Press, vol. 25, Issue 2, pp. 103-114, ISSN 0163-5808.

Profile ID: LFUS-PAI-O-4006916
