Generating similarity scores for matching non-identical data...

Data processing: database and file management or data structures – Database and file access – Preparing data for information retrieval

Reexamination Certificate

Details

C707S750000

Reexamination Certificate

active

07814107

ABSTRACT:
A system and method for determining the likelihood that two documents describe substantially similar subject matter is presented. A set of tokens is obtained for each of the two documents, each set representing strings of characters found in the corresponding document. A matrix of token pairs is determined, each token pair comprising one token from each set. A similarity score is determined for each token pair in the matrix. Token pairs with a similarity score above a threshold score are selected and added to a set of matched tokens. A similarity score for the two documents is then determined according to the scores of the token pairs in the set of matched tokens. The determined similarity score is provided as the likelihood that the first and second documents describe substantially similar subject matter.
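The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not name a tokenizer, a token-pair metric, a threshold value, or an aggregation rule, so every one of those choices here is an assumption. The sketch assumes whitespace tokenization, the character-level ratio from Python's standard `difflib.SequenceMatcher`, a hypothetical threshold of 0.8, greedy one-to-one selection of matched pairs, and a document score equal to the sum of matched pair scores divided by the larger token count. The function names are illustrative only.

```python
from difflib import SequenceMatcher
from itertools import product


def tokenize(text):
    # Assumption: tokens are lowercased, whitespace-separated strings.
    return text.lower().split()


def token_similarity(a, b):
    # Assumption: character-level ratio in [0, 1]; the abstract names no metric.
    return SequenceMatcher(None, a, b).ratio()


def document_similarity(doc1, doc2, threshold=0.8):
    tokens1, tokens2 = tokenize(doc1), tokenize(doc2)
    if not tokens1 or not tokens2:
        return 0.0

    # Matrix of token pairs: every token of doc1 against every token of doc2.
    scored = [
        (token_similarity(a, b), i, j)
        for (i, a), (j, b) in product(enumerate(tokens1), enumerate(tokens2))
    ]

    # Keep only pairs whose score is above the threshold.
    candidates = sorted((s for s in scored if s[0] > threshold), reverse=True)

    # Assumption: each token may appear in at most one matched pair,
    # chosen greedily from the highest score down.
    used1, used2, matched = set(), set(), []
    for score, i, j in candidates:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            matched.append(score)

    # Assumption: normalize by the larger token count so the result is in [0, 1].
    return sum(matched) / max(len(tokens1), len(tokens2))
```

Under these assumptions, two identical documents score 1.0, documents with no token pair above the threshold score 0.0, and near-identical spellings such as "colour" and "color" fall in between.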

REFERENCES:
patent: 4849898 (1989-07-01), Adi
patent: 5062074 (1991-10-01), Kleinberger
patent: 5261112 (1993-11-01), Futatsugi et al.
patent: 5835892 (1998-11-01), Kanno
patent: 5960383 (1999-09-01), Fleischer
patent: 6038561 (2000-03-01), Snyder
patent: 6075896 (2000-06-01), Tanaka
patent: 6076086 (2000-06-01), Masuichi et al.
patent: 6167398 (2000-12-01), Wyard
patent: 6173251 (2001-01-01), Ito et al.
patent: 6263121 (2001-07-01), Melen
patent: 6606744 (2003-08-01), Mikurak
patent: 6810376 (2004-10-01), Guan et al.
patent: 6961721 (2005-11-01), Chaudhuri
patent: 7113943 (2006-09-01), Bradford
patent: 7346839 (2008-03-01), Acharya
patent: 7386441 (2008-06-01), Kempe et al.
patent: 7426507 (2008-09-01), Patterson
patent: 7529756 (2009-05-01), Haschart
patent: 7562088 (2009-07-01), Daga
patent: 7567959 (2009-07-01), Patterson
patent: 7599914 (2009-10-01), Patterson
patent: 7603345 (2009-10-01), Patterson
patent: 2002/0016787 (2002-02-01), Kanno
patent: 2003/0065658 (2003-04-01), Matsubayashi
patent: 2003/0101177 (2003-05-01), Matsubayashi et al.
patent: 2006/0112128 (2006-05-01), Brants et al.
patent: 2006/0282415 (2006-12-01), Shibata et al.
patent: 1380966 (2004-01-01), None
Ghahramani, Z., and K.A. Heller, “Bayesian Sets,” in Y. Weiss et al. (eds.), “Advances in Neural Information Processing Systems 18 (Proceedings of the 2005 Conference),” MIT Press, May 2006, 8 pages.
“Google™ Sets,” © 2007 Google, <://labs.google.com/sets> [retrieved Feb. 13, 2008].
Bilenko, M., et al., “Adaptive Name Matching in Information Integration,” IEEE Intelligent Systems 18(5):16-23, Sep./Oct. 2003.
Kilgarriff, A., “Using Word Frequency Lists to Measure Corpus Homogeneity and Similarity Between Corpora,” Information Technology Research Institute Technical Report Series, ITRI-97-07, University of Brighton, U.K., Aug. 1997, 16 pages.
Ramos, J., “Using TF-IDF to Determine Word Relevance in Document Queries,” Proceedings of the First Instructional Conference on Machine Learning (iCML-2003), Piscataway, N.J. Dec. 3-8, 2003, 4 pages.
