System and method for detecting duplicate and similar documents

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

C707S793000

active

07139756

ABSTRACT:
A system and a method are described for rapidly determining document similarity among a set of documents, such as a set of documents obtained from an information retrieval (IR) system. A phrase recognizer system produces a ranked list of the most important terms in each document. The list is stored in a database and is used to compute document similarity with a simple database query. If the number of terms not found in both documents, relative to the total number of terms in the document, is below a predetermined threshold, the documents are judged to be very similar. It is shown that these techniques can accurately recognize that a document revised to include parts of other documents is still closely related to the original document. These teachings also provide for computing a document signature, which can be used to rapidly compare documents that are likely to be identical.
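The similarity test and signature described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the ranked term lists are supplied directly (the phrase recognizer is not modeled), an in-memory SQLite table stands in for "a database", the 0.2 threshold is a hypothetical value, and the SHA-256 hash over the sorted term set is an assumed stand-in for the document signature, whose construction the abstract does not specify. The function names and the sample documents are likewise hypothetical.

```python
import hashlib
import sqlite3

# Hypothetical threshold: the fraction of a document's terms that may be
# missing from the other document while the pair still counts as "very similar".
THRESHOLD = 0.2


def build_db(term_lists):
    """Store each document's ranked term list in an in-memory SQLite table.

    term_lists maps a document id to its ranked list of important terms
    (assumed to come from a phrase recognizer, which is not shown here).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE terms (doc_id TEXT, term TEXT, term_rank INTEGER)")
    for doc_id, terms in term_lists.items():
        conn.executemany(
            "INSERT INTO terms VALUES (?, ?, ?)",
            [(doc_id, t, r) for r, t in enumerate(terms, start=1)],
        )
    return conn


def missing_fraction(conn, doc_a, doc_b):
    """Fraction of doc_a's distinct terms that do not appear in doc_b's list."""
    (total,) = conn.execute(
        "SELECT COUNT(DISTINCT term) FROM terms WHERE doc_id = ?", (doc_a,)
    ).fetchone()
    (missing,) = conn.execute(
        """
        SELECT COUNT(DISTINCT a.term) FROM terms AS a
        WHERE a.doc_id = ?
          AND a.term NOT IN (SELECT term FROM terms WHERE doc_id = ?)
        """,
        (doc_a, doc_b),
    ).fetchone()
    return missing / total if total else 1.0


def very_similar(conn, doc_a, doc_b, threshold=THRESHOLD):
    """True when the missing-term fraction falls below the threshold."""
    return missing_fraction(conn, doc_a, doc_b) < threshold


def signature(terms):
    """Order-independent signature: SHA-256 of the sorted, de-duplicated terms."""
    joined = "\n".join(sorted(set(terms)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical sample data: "revised" adds one term to "orig".
docs = {
    "orig": ["duplicate detection", "document similarity", "phrase recognizer",
             "database query", "signature"],
    "revised": ["duplicate detection", "document similarity", "phrase recognizer",
                "database query", "signature", "ranked list"],
    "other": ["query expansion", "lexical navigation", "document similarity"],
}
conn = build_db(docs)
print(very_similar(conn, "orig", "revised"))  # True: 0 of 5 terms missing
print(very_similar(conn, "orig", "other"))    # False: 4 of 5 terms missing
```

As written, the test is asymmetric: it measures terms of the first document that are missing from the second, so a caller that wants a symmetric answer could check both directions. Comparing signatures first is cheap and only flags pairs whose term sets match exactly; the threshold query handles the near-duplicate case.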

REFERENCES:
patent: 4993068 (1991-02-01), Piosenka et al.
patent: 5913208 (1999-06-01), Brown et al.
patent: 6263348 (2001-07-01), Kathrow et al.
patent: 6615209 (2003-09-01), Gomes et al.
patent: 6658423 (2003-12-01), Pugh et al.
Cooper, J.W., “The Technology of Lexical Navigation,” Workshop on Browsing Technology, First Joint Conference on Digital Libraries, Roanoke, VA, 2001.
Cooper, J.W. et al., “OBIWAN—A Visual Interface for Prompted Query Refinement,” HICSS-31, 1998.
Ravin, Y. et al., “Extracting Names from Natural-Language Text,” IBM Research Report 20338, Jan. 16, 1996.
Cooper, J.W. et al., “Lexical Navigation: Visually Prompted Query Expansion and Refinement,” IBM Research Report 20874, May 1, 1997.
Neff, M.S. et al., “A Document Summarization for Active Markup,” Proceedings of the 32nd Hawaii International Conference on System Sciences, Jan. 1999.