Reliability of duplicate document detection algorithms
Data processing: database and file management or data structures – Data integrity – Using checksum
Reexamination Certificate
2011-07-19
2011-07-19
Mofiz, Apu M (Department: 2161)
C707S692000, C707S741000, C707S742000, C707S747000, C707S727000, C707S728000, C707S729000, C707S730000
active
07984029
ABSTRACT:
In a single-signature duplicate document system, a secondary set of attributes is used in addition to a primary set of attributes to improve the precision of the system. When the projection of a document onto the primary set of attributes falls below a threshold, the secondary set of attributes is used to supplement the primary lexicon so that the projection is above the threshold.
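The abstract describes the mechanism only at a high level. What follows is a minimal sketch of one possible reading, not the patented method. Several details are assumptions made here for illustration: a document is reduced to its set of unique lowercase word tokens; its "projection" onto a lexicon is the intersection of that set with the lexicon; the threshold is a minimum count of unique projected terms; all matching secondary-lexicon terms are added at once; the signature is a SHA-1 digest of the sorted projected terms; and a document that still falls short after supplementation gets no signature. The function names `tokenize`, `project`, and `signature` are hypothetical.

```python
import hashlib
import re
from typing import Optional, Set

# Hypothetical sketch: the tokenizer, threshold semantics, "add all secondary
# terms" policy, and SHA-1 hashing are illustrative assumptions only.


def tokenize(text: str) -> Set[str]:
    """Unique lowercase alphanumeric tokens (a simplistic stand-in tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def project(tokens: Set[str], lexicon: Set[str]) -> Set[str]:
    """Projection of a document onto a lexicon: the unique terms they share."""
    return tokens & lexicon


def signature(text: str, primary: Set[str], secondary: Set[str],
              threshold: int) -> Optional[str]:
    """Return one signature for the whole document, or None if even the
    supplemented projection has fewer than `threshold` unique terms."""
    tokens = tokenize(text)
    terms = project(tokens, primary)
    if len(terms) < threshold:
        # Primary projection is too small: supplement it with secondary terms.
        terms = terms | project(tokens, secondary)
    if len(terms) < threshold:
        return None  # assumed policy: too little evidence for a reliable signature
    # Sorting makes the signature independent of term order in the document.
    joined = "\n".join(sorted(terms))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    primary = {"duplicate", "detection", "signature", "lexicon"}
    secondary = {"reliability", "threshold", "projection"}
    d1 = "Signature reliability: duplicate detection above a projection threshold."
    d2 = "Threshold projection; detection, duplicate signature reliability!"
    print(signature(d1, primary, secondary, 4) == signature(d2, primary, secondary, 4))  # True
    print(signature("Duplicate detection.", primary, secondary, 4))  # None
```

In this sketch, each of the two example documents has only three primary-lexicon terms, so the secondary terms are pulled in. After supplementation both reduce to the same six-term projection and therefore collide as duplicates. The precision argument, as read here, is that a signature built from very few terms would make unrelated short documents collide too easily. Supplementing the projection, or declining to sign, reduces such false matches.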
REFERENCES:
patent: 5619709 (1997-04-01), Caid
patent: 6621930 (2003-09-01), Smadja
patent: 6658423 (2003-12-01), Pugh et al.
patent: 2003/0221166 (2003-11-01), Farahat
patent: 2005/0060643 (2005-03-01), Glass et al.
patent: 2006/0294077 (2006-12-01), Bluhm et al.
Conrad et al., Online Duplicate Document Detection: Signature Reliability in a Dynamic Retrieval Environment, ACM, 2003.
Application filed Dec. 21, 2004 (U.S. Appl. No. 11/016,928).
Application filed Dec. 21, 2004 (U.S. Appl. No. 11/016,930).
Office Action dated May 24, 2007 (U.S. Appl. No. 11/016,930).
Office Action dated Sep. 2, 2008 (U.S. Appl. No. 11/016,930).
Androutsopoulos et al., An Evaluation of Naive Bayesian Anti-Spam Filtering, Proceedings of the Workshop on Machine Learning in the New Information Age: 11th European Conference on Machine Learning (ECML 2000), G. Potamias, V. Moustakis, and M. van Someren, eds., 2000, pp. 9-17.
Bilenko et al., Learning to Combine Trained Distance Metrics for Duplicate Detection in Databases, Tech. Rep. AI 02-296, Artificial Intelligence Lab, University of Texas at Austin, 2002, pp. 1-19.
Breiman, Bagging Predictors, Machine Learning, 24 (1996), pp. 123-140.
Brin et al., Detection Mechanisms for Digital Documents, Proceedings of SIGMOD, 1995, pp. 398-409.
Broder, On the Resemblance and Containment of Documents, SEQS: Sequences '97, 1998, pp. 21-29.
Broder et al., Syntactic Clustering of the Web, Computer Networks and ISDN Systems 29, 1997, pp. 1157-1166.
Buckley et al., The Smart/Empire Tipster IR System, Proceedings, Tipster Text Program Phase III, 2000, pp. 107-121.
Chowdhury et al., Collection Statistics for Fast Duplicate Document Detection, ACM Transactions on Information Systems, 20 (2002), pp. 171-191.
Cooper et al., A Novel Method for Detecting Similar Documents, Proceedings of the 35th Hawaii International Conference on System Sciences, 2002.
Graham-Cumming, The Spammers' Compendium, Proceedings of the Spam Conference, Jan. 17, 2003, pp. 1-17.
Gionis et al., Similarity Search in High Dimensions Via Hashing, Proceedings of the 25th International Conference on Very Large Databases (VLDB), 1999, pp. 518-529.
Fetterly et al., On the Evolution of Clusters of Near-Duplicate Web Pages, Proceedings of the First Latin American Web Congress, 2003, pp. 37-45.
Fawcett, “In Vivo” Spam Filtering: A Challenge Problem for KDD, SIGKDD Explorations, vol. 5, Issue 2, (2003), pp. 140-148.
Drucker et al., Support Vector Machines for Spam Categorization, IEEE Transactions on Neural Networks, vol. 10, No. 5, Sep. 1999, pp. 1048-1054.
Hall, A Countermeasure to Duplicate-Detecting Anti-Spam Techniques, AT&T Labs Technical Report 99.9.1, AT&T Corp., 1999, pp. 1-26.
Haveliwala et al., Scalable Techniques for Clustering the Web, Proceedings of WebDB 2000, 2000.
Heintze, Scalable Document Fingerprinting, The USENIX Association, Proceedings of the Second USENIX Workshop on Electronic Commerce, Nov. 1996, pp. 191-200.
Hernandez et al., The Merge/Purge Problem for Large Databases, Proceedings of the SIGMOD Conference, 1995, pp. 127-138.
Hoad et al., Methods for Identifying Versioned and Plagiarised Documents, Journal of the American Society for Information Science and Technology, 2002, pp. 203-215.
Ilyinsky et al., An Efficient Method to Detect Duplicates of Web Documents With the Use of Inverted Index, Proceedings of the Eleventh International World Wide Web Conference, 2002.
Kleinberg, Bursty and Hierarchical Structure in Streams, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002), 2002, pp. 1-25.
Kolcz et al., Data Duplication: An Imbalance Problem?, Proceedings of the ICML '2003 Workshop on Learning from Imbalanced Datasets (II), 2003.
Kolcz et al., SVM-Based Filtering of E-Mail Spam With Content-Specific Misclassification Costs, Proceedings of the Workshop on Text Mining (TextDM'2001), 2001, pp. 1-14.
Kwok, A New Method of Weighting Query Terms for Ad-Hoc Retrieval, Computer Science Department, Queens College, City University of New York, Flushing, NY.
McCallum et al., Efficient Clustering of High-Dimensional Data Sets With Application to Reference Matching, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000), 2000.
Robertson et al., Okapi at TREC-7: Automatic Ad Hoc, Filtering, VLC and Interactive, Proceedings of the 7th Text Retrieval Conference, 1998, pp. 253-264.
Sahami et al., A Bayesian Approach to Filtering Junk E-Mail, Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, 1998.
Salton et al., A Vector-Space Model for Information Retrieval, Communications of the ACM, vol. 18, No. 11, Nov. 1975, pp. 613-620.
Sanderson et al., Duplicate Detection in the Reuters Collection, Tech. Rep. TR-1997-5, Department of Computing Science, University of Glasgow, 1997, pp. 11.
Shivakumar et al., Finding Near-Replicas of Documents on the Web, WEBDB: International Workshop on the World Wide Web and Databases, WebDB, LNCS, 1999.
Singhal et al., Pivoted Document Length Normalization, Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1996.
Winkler et al., The State of Record Linkage and Current Research Problems, Tech. Rep., Statistical Research Division, U.S. Bureau of Census, Washington, DC, 1999.
Androutsopoulos et al., Learning to Filter Unsolicited Commercial E-Mail, Technical Report 2004/2, NCSR Demokritos, 2004, pp. 1-52.
Baker et al., Distributional Clustering of Words for Text Classification, Proceedings of SIGIR-98, 21st ACM International Conference on Research and Development in Information Retrieval, 1998, pp. 96-103.
Carreras et al., Boosting Trees for Anti-Spam Email Filtering, Proceedings of RANLP-01, 4th International Conference on Recent Advances in Natural Language Processing, Tzigov Chark, BG, 2001.
Slonim et al., The Power of Word Clusters for Text Classification, 23rd European Colloquium on Information Retrieval Research, 2001, pp. 1-12.
Yerazunis, Sparse Binary Polynomial Hashing and the CRM114 Discriminator, MIT Spam Conference, 2003.
Zhou et al., Approximate Object Location and Spam Filtering on Peer-To-Peer Systems, Proceedings of ACM/IFIP/USENIX International Middleware Conference (Middleware 2003), 2003, pp. 1-20.
Office Action dated Sep. 2, 2008 (U.S. Appl. No. 11/016,928).
Alspector Joshua
Chowdhury Abdur R.
Kolcz Aleksander
AOL Inc.
Finnegan Henderson Farabow Garrett & Dunner L.L.P.
Mofiz Apu M
Nguyen Cindy