Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2004-03-22
2009-10-13
LeRoux, Etienne P (Department: 2161)
C707S793000
active
07603370
ABSTRACT:
A method detects similar objects in a collection of such objects by modification of a previous method in such a way that per-object memory requirements are reduced while false detections are avoided approximately as well as in the previous method. The modification includes (i) combining k samples of features into s supersamples, the value of k being reduced from the corresponding value used in the previous method; (ii) recording each supersample to b bits of precision, the value of b being reduced from the corresponding value used in the previous method; and (iii) requiring l matching supersamples in order to conclude that the two objects are sufficiently similar, the value of l being greater than the corresponding value required in the previous method. One application of the invention is in association with a web search engine query service to determine clusters of query results that are near-duplicate documents.
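The abstract describes the scheme only in terms of its parameters k, s, b and l. A minimal Python sketch of how those parameters interact follows. It assumes min-hash sampling over word shingles, in the style of the cited Broder references. The shingle width, the hash family (BLAKE2b plus random affine maps modulo a Mersenne prime), and every function name here are illustrative assumptions, not the patent's own code. `match_probability` is an idealized model that treats hashes as perfectly random. Under that model, a supersample of k samples agrees with probability r^k at resemblance r, plus a 2^-b chance of a spurious collision when only b bits are kept.

```python
import hashlib
import random
import re
from math import comb

P = (1 << 61) - 1  # Mersenne prime modulus for the hash family (assumption)


def features(text: str, w: int = 4) -> set[str]:
    """Word w-shingles of a document (illustrative feature choice)."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + w]) for i in range(max(1, len(words) - w + 1))}


def _h64(data: bytes) -> int:
    """Stable 64-bit hash of a byte string."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def make_hash_params(n: int, seed: int = 0) -> list[tuple[int, int]]:
    """n random (a, c) pairs, each defining h(x) = (a*x + c) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(P)) for _ in range(n)]


def samples(feats: set[str], params: list[tuple[int, int]]) -> list[int]:
    """One min-hash sample per hash function: the minimum over all features."""
    xs = [_h64(f.encode()) % P for f in feats]
    return [min((a * x + c) % P for x in xs) for a, c in params]


def supersamples(smp: list[int], k: int, s: int, b: int) -> list[int]:
    """Hash each run of k samples into one supersample, keeping only b bits (b <= 64)."""
    mask = (1 << b) - 1
    return [_h64(repr(smp[i * k:(i + 1) * k]).encode()) & mask for i in range(s)]


def sufficiently_similar(sa: list[int], sb: list[int], l: int) -> bool:
    """Declare two objects similar when at least l aligned supersamples agree."""
    return sum(x == y for x, y in zip(sa, sb)) >= l


def match_probability(r: float, k: int, s: int, l: int, b: int) -> float:
    """Idealized P(at least l of s supersamples agree) at resemblance r."""
    q = r ** k + (1 - r ** k) * 2.0 ** -b  # true agreement or b-bit collision
    return sum(comb(s, i) * q ** i * (1 - q) ** (s - i) for i in range(l, s + 1))


if __name__ == "__main__":
    K, S, B, L = 4, 8, 16, 3  # hypothetical parameters, not taken from the patent
    params = make_hash_params(K * S)
    doc_a = "the quick brown fox jumps over the lazy dog near the river bank today"
    doc_b = doc_a.replace("today", "tonight")
    sa = supersamples(samples(features(doc_a), params), K, S, B)
    sb = supersamples(samples(features(doc_b), params), K, S, B)
    print("similar:", sufficiently_similar(sa, sb, L))
    print("bits stored per object:", S * B)
```

In this sketch, per-object storage is s·b bits. Lowering k and b reduces that storage. Raising l offsets the extra chance agreements that shorter supersamples and fewer bits would otherwise cause. `match_probability` lets you compare candidate (k, s, b, l) settings against one another under the idealized model.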
REFERENCES:
patent: 5721788 (1998-02-01), Powell et al.
patent: 5909677 (1999-06-01), Broder et al.
patent: 5974481 (1999-10-01), Broder
patent: 6058410 (2000-05-01), Sharangpani
patent: 6119124 (2000-09-01), Broder et al.
patent: 6269362 (2001-07-01), Broder et al.
patent: 6349296 (2002-02-01), Broder et al.
patent: 6658423 (2003-12-01), Pugh et al.
Manasse, Mark et al. “On the Evolution of Clusters of Near-Duplicate Web Pages,” Nov. 1, 2003, IEEE Computer Society, pp. 1-9.
Broder. “On the Resemblance and Containment of Documents,” in Proc. Compression and Complexity of Sequences, 1997, pp. 21-29 (Los Alamitos, Calif.: IEEE Computer Society, 1998).
Broder et al. “Syntactic Clustering of the Web,” in Proc. 6th Intl. World Wide Web Conf., 1997, pp. 391-404. Available: http://decweb.ethz.ch/WWW6/Technical/Paper205/Paper205.html (Mar. 22, 2004).
Manasse. “Finding Similar Things Quickly in Large Collections.” Available: http://research.microsoft.com/research/sv/PageTurner/similarity.htm (Mar. 22, 2004). Includes hypertext link to the following URL at which shingleprinting source code for estimating document similarity (version 1.0) may be downloaded upon acceptance of Microsoft Research End User License Agreement: http://research.microsoft.com/research/downloads/download.aspx?FUID={2EECAA76-45DO-494C-B712-A7A30CCE89E9}.
U.S. Appl. No. 09/960,583, filed Sep. 21, 2001, Manasse et al.
U.S. Appl. No. 10/055,586, filed Jan. 22, 2002, Bar-Yossef et al.
Microsoft Corporation
Stace Brent
Woodcock & Washburn LLP
Method for duplicate detection and suppression