Duplicate entry detection system and method

Data processing: database and file management or data structures – Database and file access – Preparing data for information retrieval

Reexamination Certificate

Details

C707S750000

Reexamination Certificate

active

08046372

ABSTRACT:
A computer system and method for determining whether the subject matter described in a received document is substantially similar to the subject matter of other documents in a document corpus, such that the received document can be considered a duplicate document. After a first document is received, a set of tokens is generated for it. A non-fielded relevance search is executed on a token index. The relevance search returns a set of candidate duplicate documents, each with a corresponding score. Each candidate document with a score above a threshold is then filtered to determine whether it is a true duplicate of the first document. The candidate documents that scored above the threshold and were not disqualified by filtering are then provided.
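
ILLUSTRATION:
The steps named in the abstract (tokenize, search a token index, apply a score threshold, filter the survivors) can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the patented method: the abstract does not say how scores are computed or how filtering works, so the TF-IDF scoring, the Jaccard-overlap filter, and every name here (TokenIndex, is_true_duplicate, find_duplicates, min_overlap) are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TokenIndex:
    """Inverted index over every token of a document, with no per-field split."""

    def __init__(self):
        self.postings = defaultdict(dict)  # token -> {doc_id: term count}
        self.doc_tokens = {}               # doc_id -> set of tokens

    def add(self, doc_id, text):
        counts = Counter(tokenize(text))
        self.doc_tokens[doc_id] = set(counts)
        for token, count in counts.items():
            self.postings[token][doc_id] = count

    def search(self, tokens):
        """Return {doc_id: score}. Scoring is an assumed TF-IDF sum."""
        n_docs = len(self.doc_tokens)
        scores = defaultdict(float)
        for token in set(tokens):
            docs = self.postings.get(token, {})
            if not docs:
                continue
            idf = math.log(1 + n_docs / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return dict(scores)


def is_true_duplicate(query_tokens, candidate_tokens, min_overlap=0.8):
    """Hypothetical filter: keep a candidate only if the token sets overlap enough."""
    q, c = set(query_tokens), set(candidate_tokens)
    if not q or not c:
        return False
    return len(q & c) / len(q | c) >= min_overlap


def find_duplicates(index, text, threshold, min_overlap=0.8):
    """Tokenize, search, apply the score threshold, then filter the survivors."""
    tokens = tokenize(text)
    kept = {}
    for doc_id, score in index.search(tokens).items():
        if score <= threshold:
            continue  # below the threshold: not a candidate duplicate
        if is_true_duplicate(tokens, index.doc_tokens[doc_id], min_overlap):
            kept[doc_id] = score
    return kept


index = TokenIndex()
index.add("a", "Red mountain bike, 21 speed, barely used")
index.add("b", "Blue sofa, three seats, good condition")
index.add("c", "Mountain bike helmet, red")
result = find_duplicates(index, "red mountain bike 21 speed barely used", threshold=2.0)
print(sorted(result))
```

In this toy corpus, documents "a" and "c" both score above the threshold because they share tokens with the query, but only "a" survives the filter. The split into a cheap, broad search followed by a stricter per-candidate check mirrors the two stages the abstract describes.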

REFERENCES:
patent: 4849898 (1989-07-01), Adi
patent: 5062074 (1991-10-01), Kleinberger
patent: 5261112 (1993-11-01), Futatsugi
patent: 5666442 (1997-09-01), Wheeler
patent: 5835892 (1998-11-01), Kanno
patent: 5960383 (1999-09-01), Fleischer
patent: 6038561 (2000-03-01), Snyder et al.
patent: 6075896 (2000-06-01), Tanaka
patent: 6076086 (2000-06-01), Masuichi
patent: 6167398 (2000-12-01), Wyard et al.
patent: 6173251 (2001-01-01), Ito
patent: 6263121 (2001-07-01), Melen et al.
patent: 6484168 (2002-11-01), Pennock et al.
patent: 6606744 (2003-08-01), Mikurak
patent: 6810376 (2004-10-01), Guan
patent: 6961721 (2005-11-01), Chaudhuri
patent: 7113943 (2006-09-01), Bradford et al.
patent: 7155427 (2006-12-01), Prothia et al.
patent: 7346839 (2008-03-01), Acharya
patent: 7386441 (2008-06-01), Kempe
patent: 7426507 (2008-09-01), Patterson
patent: 7529756 (2009-05-01), Haschart
patent: 7562088 (2009-07-01), Daga
patent: 7567959 (2009-07-01), Patterson
patent: 7599914 (2009-10-01), Patterson
patent: 7599930 (2009-10-01), Burns et al.
patent: 7603345 (2009-10-01), Patterson
patent: 7668887 (2010-02-01), Vella
patent: 2002/0016787 (2002-02-01), Kanno
patent: 2003/0065658 (2003-04-01), Matsubayashi et al.
patent: 2003/0101177 (2003-05-01), Matsubayashi et al.
patent: 2005/0276479 (2005-12-01), Goldberg et al.
patent: 2006/0112128 (2006-05-01), Brants
patent: 2006/0282415 (2006-12-01), Shibata
patent: 1 380 966 (2004-01-01), None
Ghahramani, Z., and K.A. Heller, “Bayesian Sets,” Advances in Neural Information Processing Systems 18 (2006), 8 pages.
“Google Sets,” ©2007 Google, labs.google.com/sets [retrieved Feb. 13, 2008].
Bilenko, M., et al., “Adaptive Name Matching in Information Integration,” IEEE Intelligent Systems 18(5):16-23, Sep./Oct. 2003.
Kilgarriff, A., “Using Word Frequency Lists to Measure Corpus Homogeneity and Similarity Between Corpora,” Information Technology Research Institute Technical Report Series, ITRI-97-07, University of Brighton, U.K., Aug. 1997, 16 pages.
Ramos, J., “Using TF-IDF to Determine Word Relevance in Document Queries,” Proceedings of the First Instructional Conference on Machine Learning (iCML-2003), Piscataway, N.J., Dec. 3-8, 2003, 4 pages.
