Reexamination Certificate
2006-06-29
2008-10-07
Alam, Shahid Al (Department: 2167)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
active
07433869
ABSTRACT:
A first embodiment of the invention provides a system that automatically classifies documents in a collection into clusters based on the similarities between documents, that automatically classifies new documents into the right clusters, and that may change the number or parameters of clusters under various circumstances. A second embodiment of the invention provides a technique for comparing two documents, in which a fingerprint or sketch of each document is computed. In particular, this embodiment of the invention uses a specific algorithm to compute the document's fingerprint. One embodiment uses a sentence in the document as a logical delimiter or window from which significant words are extracted and, thereafter, a hash is computed of all pair-wise permutations. Words are extracted based on their weight in the document, which can be computed using measures such as term frequency and the inverse document frequency.
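The fingerprinting step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the `top_k` cutoff, the naive sentence splitter, the smoothed IDF formula, the truncated SHA-1 hash, and the Jaccard-style `resemblance` measure are all assumptions made for illustration, since the abstract does not specify any of them. The sketch only follows the outline given above: treat each sentence as a window, keep the highest-weighted words by term frequency times inverse document frequency, and hash every ordered pair of those words.

```python
import hashlib
import math
import re
from collections import Counter
from itertools import permutations


def sentences(text):
    """Split text on ., ! or ? (a deliberately naive splitter; an assumption)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def words(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(corpus):
    """Smoothed inverse document frequency for every word in the corpus."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(words(doc)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def fingerprint(doc, idf, top_k=3):
    """Return a set of hashes, one per ordered pair of significant words
    found within each sentence window of the document."""
    tf = Counter(words(doc))
    # Words unseen in the corpus get the largest known IDF (an assumption).
    default_idf = max(idf.values(), default=1.0)
    prints = set()
    for sent in sentences(doc):
        uniq = set(words(sent))
        # Rank by TF-IDF weight, breaking ties alphabetically for determinism.
        ranked = sorted(uniq, key=lambda w: (-tf[w] * idf.get(w, default_idf), w))
        # Hash all pair-wise permutations (ordered pairs) of the top words.
        for a, b in permutations(ranked[:top_k], 2):
            digest = hashlib.sha1(f"{a} {b}".encode("utf-8")).hexdigest()
            prints.add(digest[:16])
    return prints


def resemblance(fp_a, fp_b):
    """Jaccard overlap of two fingerprints (a hypothetical comparison measure)."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)
```

Two documents would then be compared by building `idf_table` over the collection, calling `fingerprint` on each document, and passing the two resulting sets to `resemblance`; a value near 1 suggests near-duplicate content under these assumptions.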
REFERENCES:
patent: 5237157 (1993-08-01), Kaplan
patent: 5247575 (1993-09-01), Sprague et al.
patent: 5532920 (1996-07-01), Hatrick et al.
patent: 5546528 (1996-08-01), Johnson
patent: 5619247 (1997-04-01), Russo
patent: 5625711 (1997-04-01), Nicholson
patent: 5643064 (1997-07-01), Grinderslev
patent: 5680479 (1997-10-01), Wang et al.
patent: 5729637 (1998-03-01), Nicholson
patent: 5737599 (1998-04-01), Rowe
patent: 5781785 (1998-07-01), Rowe
patent: 5819301 (1998-10-01), Rowe
patent: 5832530 (1998-11-01), Paknad
patent: 5848184 (1998-12-01), Taylor et al.
patent: 5860074 (1999-01-01), Rowe
patent: 5930813 (1999-07-01), Padgett
patent: 5991780 (1999-11-01), Rivette et al.
patent: 5999649 (1999-12-01), Nicholson
patent: 6041316 (2000-03-01), Allen
patent: 6049339 (2000-04-01), Schiller
patent: 6119124 (2000-09-01), Broder et al.
patent: 6185684 (2001-02-01), Pravetz
patent: 6282653 (2001-08-01), Berstis et al.
patent: 6327600 (2001-12-01), Satoh et al.
patent: 6345279 (2002-02-01), Li
patent: 6356936 (2002-03-01), Donoho
patent: 6385350 (2002-05-01), Nicholson
patent: 6389541 (2002-05-01), Patterson
patent: 6446068 (2002-09-01), Kortge
patent: 6516337 (2003-02-01), Tripp
patent: 6606613 (2003-08-01), Altschuler et al.
patent: 6629097 (2003-09-01), Keith
patent: 6732090 (2004-05-01), Shanahan
patent: 6920610 (2005-07-01), Lawton et al.
patent: 6988124 (2006-01-01), Douceur et al.
patent: 2002/0138528 (2002-09-01), Gong et al.
patent: 2003/0033288 (2003-02-01), Shanahan
patent: 2003/0037094 (2003-02-01), Douceur et al.
patent: 2003/0037181 (2003-02-01), Freed
patent: 2003/0061200 (2003-03-01), Hubert
patent: 2003/0185448 (2003-10-01), Seeger et al.
patent: 2004/0030680 (2004-02-01), Veit
patent: 2004/0133544 (2004-07-01), Klessig
patent: 2004/0133545 (2004-07-01), Klessig
patent: 2004/0133588 (2004-07-01), Klessig
patent: 2004/0133589 (2004-07-01), Klessig
patent: 2004/0205448 (2004-10-01), Grefenstette
patent: 2005/0022114 (2005-01-01), Shanahan
patent: 0881591 (1998-12-01), None
patent: 0881592 (1998-12-01), None
patent: 1284461 (2003-02-01), None
patent: 0881591(B1) (2003-09-01), None
patent: 2001175807 (2001-06-01), None
patent: WO 96/27155 (1996-09-01), None
patent: WO 98/42098 (1998-09-01), None
patent: WO 99/05618 (1999-04-01), None
patent: WO 99/39286 (1999-05-01), None
patent: WO 01/20596 (2001-03-01), None
patent: WO 01/57711 (2001-09-01), None
patent: WO 02/41170 (2002-05-01), None
patent: WO 2005/062192 (2005-07-01), None
Figa, E., et al., “Lexical Inference Mechanisms for Text Understanding and Classification,” 2003, Proceedings of the 66th ASIST Annual Meeting, Humanizing Information Technology: From Ideas to Bits and Back, ASIST 2003, Information Today, Inc., pp. 165-173, Medford, NJ, USA.
Embley, D.W., et al., “Conceptual-Model-Based Data Extraction from Multiple-Record Web Pages,” Nov. 1999, Data & Knowledge Engineering, vol. 31, No. 3, pp. 227-251, Elsevier, Netherlands.
Embley, D.W., et al., “A Conceptual-Modeling Approach to Extracting Data from the Web,” 1998, Conceptual Modeling—ER'98, 17th International Conference on Conceptual Modeling, Proceedings, pp. 78-91, Springer-Verlag, Berlin, Germany.
Bartal, “Probabilistic Approximation of Metric Spaces and Its Algorithmic Applications,” 1996, In: FOCS Proceedings of the 37th Annual Symposium on Foundations of Computer Science. Washington DC, IEEE, Abstract, pp. 2-3, ISSN 0272-5428.
Zhang, et al., “Birch: An Efficient Data Clustering Method for Very Large Databases,” 1996, In: ACM Sigmod Record, Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, New York: ACM Press, vol. 25, Issue 2, pp. 103-114, ISSN 0163-5808.
Al Alam Shahid
Alvesteffer Jason L
Ebrary, Inc.
Glenn Michael A.
Glenn Patent Group
Method and apparatus for document clustering and document...