Information retrieval systems with duplicate document...

Data processing: database and file management or data structures – Data integrity – Data cleansing, data scrubbing, and deleting duplicates

Reexamination Certificate


Details

C707S729000, C707S730000

active

07809695

ABSTRACT:
Many companies provide online search facilities that enable users to conduct computerized searches for documents. Unfortunately, these searches frequently provide results that include duplicate documents—that is, documents that are completely or substantially identical to each other. This problem is particularly vexing when searching news stories, for example. Moreover, the duplicate documents are intermixed in the search results, leaving users to manually manage the complexities of identifying and/or filtering them. Accordingly, the present inventors devised systems, methods, and software that facilitate the identification and/or grouping of duplicate documents in search results. One exemplary system includes a signature generation module which generates document signatures based on length, temporal, and/or content components; a real-time duplicate detection module which uses the document signatures to identify “exact” or “fuzzy” duplicate documents; and a user-interface or presentation module which controls how duplicate documents are presented or suppressed in search results.
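The abstract names three parts: signature generation, duplicate detection, and presentation. The following is a minimal Python sketch of how such parts could fit together, not the patented method. The abstract does not specify how signatures are computed or compared, so everything here is an assumption made for illustration: the `Signature` fields, the use of SHA-1 and word shingles as content components, the function names, and all thresholds.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Signature:
    """Hypothetical document signature; field choices are assumptions."""
    length: int          # length component: number of tokens
    published: date      # temporal component: publication date
    digest: str          # content component used for exact matching
    shingles: frozenset  # content component used for fuzzy matching


def make_signature(text: str, published: date, k: int = 3) -> Signature:
    """Build a signature from lowercased, whitespace-normalized text."""
    tokens = text.lower().split()
    digest = hashlib.sha1(" ".join(tokens).encode("utf-8")).hexdigest()
    # Overlapping k-word shingles; very short texts yield a single shingle.
    n = max(len(tokens) - k + 1, 1)
    shingles = frozenset(" ".join(tokens[i:i + k]) for i in range(n))
    return Signature(len(tokens), published, digest, shingles)


def classify(a: Signature, b: Signature, max_len_ratio: float = 0.2,
             max_days: int = 7, min_overlap: float = 0.6):
    """Return "exact", "fuzzy", or None. Thresholds are illustrative only."""
    if a.digest == b.digest:
        return "exact"
    # Cheap length and date checks run before the set comparison.
    longer = max(a.length, b.length, 1)
    if abs(a.length - b.length) / longer > max_len_ratio:
        return None
    if abs((a.published - b.published).days) > max_days:
        return None
    # Jaccard overlap of the shingle sets.
    overlap = len(a.shingles & b.shingles) / len(a.shingles | b.shingles)
    return "fuzzy" if overlap >= min_overlap else None


def group_duplicates(results):
    """Greedily group (doc_id, Signature) pairs, preserving result order.

    The first document of each group acts as its representative, so a
    presentation layer could show it and collapse the remaining members.
    """
    groups = []  # list of (representative signature, [doc_ids])
    for doc_id, sig in results:
        for rep, members in groups:
            if classify(rep, sig) is not None:
                members.append(doc_id)
                break
        else:
            groups.append((sig, [doc_id]))
    return [members for _, members in groups]


day = date(2004, 8, 20)
story = "Storm hits the coast as thousands flee their homes overnight officials said"
a = make_signature(story, day)
b = make_signature(story.upper(), day)            # same text, different case
c = make_signature(story + " today", day)         # one word appended
d = make_signature("Markets rally after the central bank holds interest "
                   "rates steady for another month", day)
groups = group_duplicates([("a", a), ("b", b), ("c", c), ("d", d)])
```

In this sketch the length and date comparisons act as inexpensive filters ahead of the shingle comparison, which is one plausible way to keep detection fast enough to run at query time. Grouping against a single representative per group is a simplification; a real system might compare against every member or use an index.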

