Reexamination Certificate
2009-01-21
2011-11-08
Vital, Pierre (Department: 2156)
Data processing: database and file management or data structures
Data integrity
Data cleansing, data scrubbing, and deleting duplicates
active
08055633
ABSTRACT:
A method of duplicate detection for data items in a stream of data items, the method comprising the steps of: receiving a data item from the stream of data items; applying at least two different hashing algorithms to the data item to generate hash keys that identify elements in a first Bloom filter data structure having a plurality of elements; checking a state of each of the identified elements to determine if the data item is a potential duplicate, the determination depending on whether the identified elements are indicated as having been also identified for a previous data item received from the stream; and in response to the determination that the data item is a potential duplicate, checking an index of hash keys to determine if at least one of the generated hash keys exists in the index to identify the data item as an actual duplicate.
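The two-stage check described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: the class name `DuplicateDetector`, the choice of SHA-256 and BLAKE2b as the two hashing algorithms, the modulo reduction from hash key to filter element, and the use of a Python set as the index of hash keys are all assumptions made for the example.

```python
import hashlib


class DuplicateDetector:
    """Illustrative two-stage duplicate check: Bloom filter, then key index."""

    def __init__(self, num_elements=1024):
        self.num_elements = num_elements
        # One byte per filter element for readability (a real filter would pack bits).
        self.elements = bytearray(num_elements)
        # Assumed index structure: a set of the full hash keys seen so far.
        self.index = set()

    def _hash_keys(self, item):
        # Two different hashing algorithms (the specific pair is an assumption).
        return (hashlib.sha256(item).digest(), hashlib.blake2b(item).digest())

    def is_duplicate(self, item):
        keys = self._hash_keys(item)
        # Assumption: each key selects a filter element by modulo reduction.
        positions = [int.from_bytes(k, "big") % self.num_elements for k in keys]
        # Stage 1: potential duplicate only if every identified element is already set.
        potential = all(self.elements[p] for p in positions)
        # Stage 2: consult the index only when stage 1 flags the item.
        actual = potential and any(k in self.index for k in keys)
        if not actual:
            # First sighting: mark the filter elements and record the keys.
            for p in positions:
                self.elements[p] = 1
            self.index.update(keys)
        return actual


detector = DuplicateDetector()
results = [detector.is_duplicate(x) for x in (b"a", b"b", b"a")]
print(results)  # [False, False, True]
```

In this sketch the filter answers most lookups for new items without touching the index, and the index lookup only runs for items the filter flags, which is where a Bloom filter's false positives are resolved.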
REFERENCES:
patent: 6804667 (2004-10-01), Martin
patent: 6988124 (2006-01-01), Douceur et al.
patent: 2003/0037022 (2003-02-01), Adya et al.
patent: 2008/0154852 (2008-06-01), Beyer et al.
Bloom, Space/Time Trade-offs in Hash Coding with Allowable Errors, Jul. 1970, ACM, vol. 13 No. 7, pp. 422-427.
Deng, Approximately Detecting Duplicates for Streaming Data using Stable Bloom Filters, Jun. 29, 2006, SIGMOD 2006, pp. 25-36.
International Business Machines Corporation
Liao Jason
Mims Jr. David A.
Rodriguez Sylvia
Vital Pierre