System for similar document detection

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

active

07660819

ABSTRACT:
A document is compared to the documents in a document collection using a hash algorithm and collection statistics to detect if the document is similar to any of the documents in the document collection.
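The abstract names only "a hash algorithm and collection statistics," so the following is a minimal sketch of one way such a comparison could work, not the patented method itself. It assumes inverse document frequency (idf) as the collection statistic and SHA-1 as the hash (the Secure Hash Standard appears among the references below). The function names, the tokenizer, and the `min_idf` threshold are all hypothetical choices made for illustration.

```python
import hashlib
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(collection):
    """Collection statistic (assumed): idf = log(N / document frequency)."""
    n = len(collection)
    df = Counter()
    for doc in collection:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def fingerprint(text, idf, min_idf):
    """Hash the sorted set of terms whose idf meets the threshold.

    Terms that never occur in the collection have no statistics and are
    dropped here; that is an assumption of this sketch.
    """
    kept = sorted({t for t in tokenize(text) if idf.get(t, 0.0) >= min_idf})
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


def find_similar(text, collection, min_idf=0.5):
    """Return indices of collection documents sharing the text's fingerprint."""
    idf = idf_table(collection)
    target = fingerprint(text, idf, min_idf)
    return [
        i
        for i, doc in enumerate(collection)
        if fingerprint(doc, idf, min_idf) == target
    ]


if __name__ == "__main__":
    docs = [
        "The quick brown fox jumps over the lazy dog",
        "The stock market fell sharply on Monday",
        "The weather on Monday was sunny and warm",
    ]
    print(find_similar("The quick brown fox jumps over the lazy dog on Monday", docs))
```

Because common terms fall below the threshold before hashing, two documents that differ only in frequent words produce the same digest, while a document missing any of the rarer terms does not. In practice the fingerprints of the collection would be computed once and stored in a lookup table rather than recomputed per query.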

REFERENCES:
patent: 6240409 (2001-05-01), Aiken
patent: 6349296 (2002-02-01), Broder et al.
patent: 6493709 (2002-12-01), Aiken
patent: 6547829 (2003-04-01), Meyerzon et al.
patent: 6594665 (2003-07-01), Sowa et al.
S. Lawrence and C. L. Giles, “Accessibility of Information on the Web,” Nature, vol. 400, Jul. 8, 1999.
http://nccam.nih.gov, The National Institutes of Health (NIH), National Center for Complementary and Alternative Medicine (NCCAM), Apr. 12, 2000.
A.Z. Broder, S.C. Glassman, M. S. Manasse, and G. Zweig, “Syntactic Clustering of the Web,” Sixth International World Wide Web Conference, Apr. 1997.
A.Z. Broder, S.C. Glassman, M. S. Manasse, and G. Zweig, “Syntactic Clustering of the Web,” Sixth International World Wide Web Conference, Jul. 1997.
S. Brin, J. Davis, and H. Garcia-Molina, “Copy Detection Mechanisms for Digital Documents,” Proceedings of the ACM SIGMOD Annual Conference, May 1995.
N. Shivakumar and H. Garcia-Molina, “Finding Near-Replicas of Documents on the Web,” Proceedings of Workshop on Web Databases (WebDB'98), Mar. 1998.
N. Shivakumar and H. Garcia-Molina, “SCAM: A Copy Detection Mechanism for Digital Documents,” Proceedings of the Second International Conference in Theory and Practice of Digital Libraries, Jun. 1995.
N. Shivakumar and H. Garcia-Molina, “Building a Scalable and Accurate Copy Detection Mechanism,” Proceedings of Third International Conference on Theory and Practice of Digital Libraries, Mar. 1996.
N. Heintze, “Scalable Document Fingerprinting,” Proceedings of the Second USENIX Workshop on Electronic Commerce, 1996.
C. Buckley, C. Cardie, S. Mardis, M. Mitra, D. Pierce, K. Wagstaff, and J. Walz, “The Smart/Empire TIPSTER IR System,” TIPSTER Phase III Proceedings, Morgan Kaufmann, 2000.
V. Chalana, A. Bruce, and T. Nguyen, “Duplicate Document Detection in DocBrowse,” www.statsci.com/docbrowse/paper/spie98/node—1.htm, Jul. 31, 1999.
G. Salton, A. Wong and C.S. Yang, “A Vector Space Model for Automatic Indexing,” Comm. of the ACM, vol. 18, No. 11, pp. 613-620, Nov. 1975.
M. F. Porter, “An Algorithm for Suffix Stripping,” Program, vol. 14, No. 3, pp. 130-137, Jul. 1980.
B. Kjell, W.A. Woods, and O. Frieder, “Discrimination of Authorship Using Visualization,” Information Processing and Management, Pergamon Press, vol. 30, No. 1, pp. 141-150, Jan. 1994.
R. S. Scotti and C. Lilly, “Analysis and Design of Test Corpora for Zero-Tolerance Government Document Review Process,” Symposium for Document Image Understanding Technology, Annapolis, Maryland, Apr. 1999.
D. Grossman, D. Holmes, and O. Frieder, “A Parallel DBMS Approach to IR in TREC-3,” Overview of the Third Text Retrieval Conference (TREC-3), Nov. 1994.
A. F. Smeaton, F. Kelledy, and G. Quinn, “Ad Hoc Retrieval Using Thresholds, WSTs for French Monolingual Retrieval, Document-at-a-Glance for High Precision and Triphone Windows for Spoken Documents,” Proceedings of the Sixth Text Retrieval Conference (TREC-6), p. 461, 1997.
U.S. Department of Commerce, National Institute of Standards and Technology, “Secure Hash Standard,” Federal Information Processing Standards Publication FIPS PUB 180-1, Apr. 17, 1995.
U.S. Department of Commerce, National Institute of Standards and Technology, “Secure Hash Standard,” Federal Information Processing Standards Publication FIPS PUB 180, May 11, 1993.
Ronald L. Rivest, “The MD4 Message Digest Algorithm,” Proceedings of Advances in Cryptology - CRYPTO '90, Springer-Verlag, pp. 303-311, 1991.
D. A. Grossman, D. O. Holmes, O. Frieder, M. D. Nguyen and C. E. Kingsbury, “Improving Accuracy and Run-Time Performance for TREC-4,” Overview of the Fourth Text Retrieval Conference (TREC-4), Nov. 1995.
Broder A. Z.: “On the Resemblance and Containment of Documents,” Compression and Complexity of Sequences 1997. Jun. 11-13, 1997, pp. 21-29.
Broder A. Z.: “Syntactic Clustering of the Web” Computer Networks and ISDN Systems, vol. 29, No. 8-13, Sep. 1, 1997, pp. 1157-1166.
Brin et al.: “Copy Detection Mechanisms For Digital Documents,” SIGMOD Record, Association for Computing Machinery, vol. 24, No. 2, Jun. 1, 1995, pp. 398-409.
Manber U: “Finding Similar Files in a Large File System,” Proceedings of the Winter USENIX Conference, pp. 1-10, Jan. 17, 1994.
Shivakumar et al.: “Finding Near-Replicas of Documents on the Web,” International Workshop WEBDB, 1997, pp. 204-212.
