Method for duplicate detection and suppression

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C707S793000

active

07603370

ABSTRACT:
A method detects similar objects in a collection of such objects by modification of a previous method in such a way that per-object memory requirements are reduced while false detections are avoided approximately as well as in the previous method. The modification includes (i) combining k samples of features into s supersamples, the value of k being reduced from the corresponding value used in the previous method; (ii) recording each supersample to b bits of precision, the value of b being reduced from the corresponding value used in the previous method; and (iii) requiring l matching supersamples in order to conclude that the two objects are sufficiently similar, the value of l being greater than the corresponding value required in the previous method. One application of the invention is in association with a web search engine query service to determine clusters of query results that are near-duplicate documents.
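The three parameters in the abstract can be illustrated with a minimal min-hash sketch. This is not the patented implementation: the word-shingle features, the BLAKE2b hashing, the function names (`shingles`, `sketch`, `similar`), and the default values of k, s, b, and l are all assumptions chosen for illustration, since the abstract gives no concrete values.

```python
import hashlib


def _hash64(data: bytes, salt: int) -> int:
    """64-bit hash of data; each salt value acts as a separate hash function."""
    h = hashlib.blake2b(data, digest_size=8, salt=salt.to_bytes(8, "little"))
    return int.from_bytes(h.digest(), "little")


def shingles(text: str, w: int = 4) -> set:
    """Features of a document: the set of w-word windows (an assumed feature type)."""
    words = text.lower().split()
    if len(words) < w:
        return {" ".join(words)}
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}


def sketch(text: str, k: int = 4, s: int = 6, b: int = 16, w: int = 4) -> list:
    """Return s supersamples, each built from k min-hash samples and kept to b bits."""
    feats = [sh.encode("utf-8") for sh in shingles(text, w)]
    supersamples = []
    for j in range(s):
        # k independent samples: the minimum hash of the feature set under k salts.
        mins = [min(_hash64(f, j * k + i) for f in feats) for i in range(k)]
        # Combine the k samples into one supersample, then truncate to b bits.
        combined = b"".join(m.to_bytes(8, "little") for m in mins)
        supersamples.append(_hash64(combined, s * k + j) & ((1 << b) - 1))
    return supersamples


def similar(sk_a: list, sk_b: list, l: int = 2) -> bool:
    """Declare two objects similar when at least l supersamples agree position-wise."""
    return sum(x == y for x, y in zip(sk_a, sk_b)) >= l
```

Per-object storage in this sketch is s × b bits. Lowering k makes each supersample more likely to match for a given resemblance, and lowering b raises the chance of accidental matches; requiring a larger l compensates for both, which is the trade-off the abstract describes.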

REFERENCES:
patent: 5721788 (1998-02-01), Powell et al.
patent: 5909677 (1999-06-01), Broder et al.
patent: 5974481 (1999-10-01), Broder
patent: 6058410 (2000-05-01), Sharangpani
patent: 6119124 (2000-09-01), Broder et al.
patent: 6269362 (2001-07-01), Broder et al.
patent: 6349296 (2002-02-01), Broder et al.
patent: 6658423 (2003-12-01), Pugh et al.
Manasse, Mark et al., “On the Evolution of Clusters of Near-Duplicate Web Pages” Nov. 1, 2003, IEEE Computer Society, p. 1-9.
Broder. “On the Resemblance and Containment of Documents,” in Proc. Compression and Complexity of Sequences, 1997, pp. 21-29 (Los Alamitos, Calif.: IEEE Computer Society, 1998).
Broder et al. “Syntactic Clustering of the Web,” in Proc. 6th Intl. World Wide Web Conf., 1997, pp. 391-404. Available: http://decweb.ethz.ch/WWW6/Technical/Paper205/Paper205.html (Mar. 22, 2004).
Manasse. “Finding Similar Things Quickly in Large Collections.” Available: http://research.microsoft.com/research/sv/PageTurner/similarity.htm (Mar. 22, 2004). Includes hypertext link to the following URL at which shingleprinting source code for estimating document similarity (version 1.0) may be downloaded upon acceptance of Microsoft Research End User License Agreement: http://research.microsoft.com/research/downloads/download.aspx?FUID={2EECAA76-45DO-494C-B712-A7A30CCE89E9}.
U.S. Appl. No. 09/960,583, filed Sep. 21, 2001, Manasse et al.
U.S. Appl. No. 10/055,586, filed Jan. 22, 2002, Bar-Yossef et al.
