Data processing: database and file management or data structures – Database and file access – Preparing data for information retrieval
Reexamination Certificate
2007-05-25
2010-10-12
Trujillo, James (Department: 2159)
C707S750000
active
07814107
ABSTRACT:
A system and method for determining the likelihood of two documents describing substantially similar subject matter is presented. A set of tokens for each of two documents is obtained, each set representing strings of characters found in the corresponding document. A matrix of token pairs is determined, each token pair comprising a token from each set of tokens. For each token pair in the matrix, a similarity score is determined. Those token pairs in the matrix with a similarity score above a threshold score are selected and added to a set of matched tokens. A similarity score for the two documents is determined according to the scores of the token pairs added to the set of matched tokens. The determined similarity score is provided as the likelihood that the first and second documents describe substantially similar subject matter.
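The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how tokens are produced, how a token pair is scored, what the threshold is, or how the matched scores are combined. The whitespace tokenizer, the use of `difflib.SequenceMatcher` as the pair score, the 0.8 threshold, the best-match-per-token aggregation, and the names `tokenize`, `token_similarity`, and `document_similarity` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def tokenize(text):
    """Assumed tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


def token_similarity(a, b):
    """Assumed pair score in [0, 1]; the abstract does not name a measure."""
    return SequenceMatcher(None, a, b).ratio()


def document_similarity(doc_a, doc_b, threshold=0.8):
    """Score two documents from their above-threshold token pairs."""
    tokens_a = tokenize(doc_a)
    tokens_b = tokenize(doc_b)
    if not tokens_a or not tokens_b:
        return 0.0

    # Matrix of token pairs: every token of A paired with every token of B,
    # keyed by the token positions (i, j).
    matrix = {
        (i, j): token_similarity(a, b)
        for (i, a), (j, b) in product(enumerate(tokens_a), enumerate(tokens_b))
    }

    # Set of matched tokens: pairs whose score is above the threshold.
    matched = {pair: score for pair, score in matrix.items() if score > threshold}

    # Assumed aggregation: keep the best matched score for each token of A,
    # then normalize by the longer token list so the result stays in [0, 1].
    best_per_token = {}
    for (i, _j), score in matched.items():
        best_per_token[i] = max(score, best_per_token.get(i, 0.0))
    return sum(best_per_token.values()) / max(len(tokens_a), len(tokens_b))
```

Under these assumptions, two identical documents score 1.0, documents with no token pairs above the threshold score 0.0, and near-duplicates such as "red shirt" and "red shirts large" fall in between.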
REFERENCES:
patent: 4849898 (1989-07-01), Adi
patent: 5062074 (1991-10-01), Kleinberger
patent: 5261112 (1993-11-01), Futatsugi et al.
patent: 5835892 (1998-11-01), Kanno
patent: 5960383 (1999-09-01), Fleischer
patent: 6038561 (2000-03-01), Snyder
patent: 6075896 (2000-06-01), Tanaka
patent: 6076086 (2000-06-01), Masuichi et al.
patent: 6167398 (2000-12-01), Wyard
patent: 6173251 (2001-01-01), Ito et al.
patent: 6263121 (2001-07-01), Melen
patent: 6606744 (2003-08-01), Mikurak
patent: 6810376 (2004-10-01), Guan et al.
patent: 6961721 (2005-11-01), Chaudhuri
patent: 7113943 (2006-09-01), Bradford
patent: 7346839 (2008-03-01), Acharya
patent: 7386441 (2008-06-01), Kempe et al.
patent: 7426507 (2008-09-01), Patterson
patent: 7529756 (2009-05-01), Haschart
patent: 7562088 (2009-07-01), Daga
patent: 7567959 (2009-07-01), Patterson
patent: 7599914 (2009-10-01), Patterson
patent: 7603345 (2009-10-01), Patterson
patent: 2002/0016787 (2002-02-01), Kanno
patent: 2003/0065658 (2003-04-01), Matsubayashi
patent: 2003/0101177 (2003-05-01), Matsubayashi et al.
patent: 2006/0112128 (2006-05-01), Brants et al.
patent: 2006/0282415 (2006-12-01), Shibata et al.
patent: 1380966 (2004-01-01), None
Ghahramani, Z., and K.A. Heller, “Bayesian Sets,” in Y. Weiss et al. (eds.), “Advances in Neural Information Processing Systems 18 (Proceedings of the 2005 Conference),” MIT Press, May 2006, 8 pages.
“Google™ Sets,” © 2007 Google, <://labs.google.com/sets> [retrieved Feb. 13, 2008].
Bilenko, M., et al., “Adaptive Name Matching in Information Integration,” IEEE Intelligent Systems 18(5):16-23, Sep./Oct. 2003.
Kilgarriff, A., “Using Word Frequency Lists to Measure Corpus Homogeneity and Similarity Between Corpora,” Information Technology Research Institute Technical Report Series, ITRI-97-07, University of Brighton, U.K., Aug. 1997, 16 pages.
Ramos, J., “Using TF-IDF to Determine Word Relevance in Document Queries,” Proceedings of the First Instructional Conference on Machine Learning (iCML-2003), Piscataway, N.J. Dec. 3-8, 2003, 4 pages.
Emery Grant M.
Manoharan Aswath
Mohan Vijai
Terra Egidio
Thirumalai Srikanth
Amazon Technologies Inc.
Christensen O'Connor Johnson & Kindness PLLC
Mamillapalli Pavan
Trujillo James