Duplicate entry detection system and method
Reexamination Certificate
2007-05-25
2011-10-25
Wu, Yicun (Department: 2159)
Data processing: database and file management or data structures
Database and file access
Preparing data for information retrieval
C707S750000
active
08046372
ABSTRACT:
A computer system and method for determining whether the subject matter described in a received document is substantially similar to that of other documents in a document corpus, such that the received document can be considered a duplicate. After a first document is received, a set of tokens is generated for it. A non-fielded relevance search is then executed on a token index, returning a set of candidate duplicate documents, each with a corresponding score. Each candidate document whose score exceeds a threshold is filtered to determine whether it is a true duplicate of the first document. Finally, the set of candidates that scored above the threshold and were not disqualified by the filtering is provided.
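The abstract names the pipeline steps (tokenize, non-fielded relevance search over a token index, score threshold, duplicate filtering) but not how any step is done. The Python sketch below is one possible reading under stated assumptions, not the patented method: tokens are lowercase alphanumeric runs, the relevance score is TF-IDF cosine similarity (a swapped-in choice), and the "true duplicate" filter is a token-set Jaccard check. The names `TokenIndex`, `find_duplicates`, `threshold`, and `min_jaccard` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Hypothetical token generator: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TokenIndex:
    """Inverted index over each document's full token set (no per-field search)."""

    def __init__(self, corpus: dict[str, str]) -> None:
        self.docs = {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}
        self.postings: dict[str, set[str]] = defaultdict(set)
        for doc_id, counts in self.docs.items():
            for token in counts:
                self.postings[token].add(doc_id)

    def idf(self, token: str) -> float:
        # Smoothed IDF; tokens absent from the corpus get the highest weight.
        n = len(self.docs)
        df = len(self.postings.get(token, ()))
        return math.log((1 + n) / (1 + df)) + 1.0

    def weights(self, counts: Counter) -> dict[str, float]:
        # TF-IDF weight per token: raw term count times IDF.
        return {t: c * self.idf(t) for t, c in counts.items()}

    def search(self, tokens: list[str]) -> dict[str, float]:
        """Relevance search: TF-IDF cosine score for every document sharing a token."""
        query = self.weights(Counter(tokens))
        q_norm = math.sqrt(sum(w * w for w in query.values()))
        hits = set().union(*(self.postings.get(t, set()) for t in query))
        scores = {}
        for doc_id in hits:
            doc = self.weights(self.docs[doc_id])
            d_norm = math.sqrt(sum(w * w for w in doc.values()))
            dot = sum(w * doc.get(t, 0.0) for t, w in query.items())
            scores[doc_id] = dot / (q_norm * d_norm)
        return scores


def jaccard(a, b) -> float:
    """Overlap of two token collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def find_duplicates(index: TokenIndex, text: str,
                    threshold: float = 0.5, min_jaccard: float = 0.8) -> dict[str, float]:
    """Return {doc_id: score} for candidates above threshold that survive filtering."""
    tokens = tokenize(text)
    survivors = {}
    for doc_id, score in index.search(tokens).items():
        if score <= threshold:
            continue  # only candidates scoring above the threshold are filtered
        # Stand-in "true duplicate" filter: near-identical token sets.
        if jaccard(tokens, index.docs[doc_id]) >= min_jaccard:
            survivors[doc_id] = score
    return dict(sorted(survivors.items(), key=lambda kv: kv[1], reverse=True))
```

For example, with a corpus of `{"a": "Red cotton T-shirt, size M", "b": "Red cotton t shirt size M", "c": "Blue denim jeans, size 32"}`, calling `find_duplicates(TokenIndex(corpus), "red cotton t-shirt size m")` keeps `a` and `b`, while `c` shares only the token `size` and falls below the threshold. Splitting a cheap ranked search from a stricter per-candidate filter keeps the expensive comparison limited to the few documents the index already ranks highly.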
REFERENCES:
patent: 4849898 (1989-07-01), Adi
patent: 5062074 (1991-10-01), Kleinberger
patent: 5261112 (1993-11-01), Futatsugi
patent: 5666442 (1997-09-01), Wheeler
patent: 5835892 (1998-11-01), Kanno
patent: 5960383 (1999-09-01), Fleischer
patent: 6038561 (2000-03-01), Snyder et al.
patent: 6075896 (2000-06-01), Tanaka
patent: 6076086 (2000-06-01), Masuichi
patent: 6167398 (2000-12-01), Wyard et al.
patent: 6173251 (2001-01-01), Ito
patent: 6263121 (2001-07-01), Melen et al.
patent: 6484168 (2002-11-01), Pennock et al.
patent: 6606744 (2003-08-01), Mikurak
patent: 6810376 (2004-10-01), Guan
patent: 6961721 (2005-11-01), Chaudhuri
patent: 7113943 (2006-09-01), Bradford et al.
patent: 7155427 (2006-12-01), Prothia et al.
patent: 7346839 (2008-03-01), Acharya
patent: 7386441 (2008-06-01), Kempe
patent: 7426507 (2008-09-01), Patterson
patent: 7529756 (2009-05-01), Haschart
patent: 7562088 (2009-07-01), Daga
patent: 7567959 (2009-07-01), Patterson
patent: 7599914 (2009-10-01), Patterson
patent: 7599930 (2009-10-01), Burns et al.
patent: 7603345 (2009-10-01), Patterson
patent: 7668887 (2010-02-01), Vella
patent: 2002/0016787 (2002-02-01), Kanno
patent: 2003/0065658 (2003-04-01), Matsubayashi et al.
patent: 2003/0101177 (2003-05-01), Matsubayashi et al.
patent: 2005/0276479 (2005-12-01), Goldberg et al.
patent: 2006/0112128 (2006-05-01), Brants
patent: 2006/0282415 (2006-12-01), Shibata
patent: 1 380 966 (2004-01-01), None
Ghahramani, Z., and K.A. Heller, “Bayesian Sets,” Advances in Neural Information Processing Systems 18 (2006), 8 pages.
“Google Sets,” ©2007 Google, labs.google.com/sets [retrieved Feb. 13, 2008].
Bilenko, M., et al., “Adaptive Name Matching in Information Integration,” IEEE Intelligent Systems 18(5):16-23, Sep./Oct. 2003.
Kilgarriff, A., “Using Word Frequency Lists to Measure Corpus Homogeneity and Similarity Between Corpora,” Information Technology Research Institute Technical Report Series, ITRI-97-07, University of Brighton, U.K., Aug. 1997, 16 pages.
Ramos, J., “Using TF-IDF to Determine Word Relevance in Document Queries,” Proceedings of the First Instructional Conference on Machine Learning (iCML-2003), Piscataway, N.J., Dec. 3-8, 2003, 4 pages.
Emery Grant M.
Manoharan Aswath
Mohan Vijai
Terra Egidio
Thirumalai Srikanth
Amazon Technologies Inc.
Kowert Robert C.
Mamillapalli Pavan
Meyertons Hood Kivlin Kowert & Goetzel P.C.
Wu Yicun