Transformation-based framework for record matching

Data processing: database and file management or data structures – Database and file access – Record, file, and data search and comparisons

Reexamination Certificate

Details

Reexamination Certificate

active

08032546

ABSTRACT:
A transformation-based record matching technique. The technique provides a flexible way to account for synonyms and more general forms of string equivalence when performing record matching by taking user-defined transformation rules as explicit input (for example, a rule stating that “Robert” and “Bob” are synonymous). The input string and the user-defined transformation rules are used to generate a larger set of strings, which is then used when performing record matching. Both the input string and the data elements in a database can be transformed using the user-defined transformation rules in order to generate a larger set of potential record matches. These potential record matches can then be subjected to a threshold test in order to determine one or more best matches. Additionally, signature-based similarity functions are used to improve the computational efficiency of the technique.
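The flow described in the abstract (expand strings with user-defined rules, compare the expanded sets, keep the candidates that pass a threshold) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the rule table `RULES`, the helper names `expand`, `jaccard`, and `match`, the token-level Jaccard similarity, and the 0.8 threshold are all hypothetical choices made for the example, and the signature-based speedup mentioned in the abstract is not shown.

```python
from itertools import product

# Hypothetical user-defined transformation rules: token -> equivalent tokens.
RULES = {
    "bob": {"robert"},
    "robert": {"bob"},
    "st": {"street"},
    "street": {"st"},
}

def expand(record, rules):
    """Return every string obtained by applying at most one rule to each token."""
    options = [{tok} | rules.get(tok, set()) for tok in record.lower().split()]
    return {" ".join(choice) for choice in product(*options)}

def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed stand-in similarity function)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def match(query, records, rules, threshold=0.8):
    """Return (record, score) pairs whose best score over all expansions
    of both the query and the record meets the threshold, best first."""
    query_variants = expand(query, rules)
    results = []
    for rec in records:
        score = max(jaccard(q, r)
                    for q in query_variants
                    for r in expand(rec, rules))
        if score >= threshold:
            results.append((rec, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

records = ["Robert Smith 12 Main Street", "Alice Jones 4 Oak St"]
matches = match("Bob Smith 12 Main St", records, RULES)
```

With the rules applied, the first record matches the query with a score of 1.0. With an empty rule table, the same pair shares only three of seven distinct tokens and falls below the threshold, which is the gap the transformation rules are meant to close.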

REFERENCES:
patent: 6101492 (2000-08-01), Jacquemin et al.
patent: 6374241 (2002-04-01), Lamburt et al.
patent: 6654717 (2003-11-01), Loofbourrow et al.
patent: 6938053 (2005-08-01), Jaro
patent: 6961721 (2005-11-01), Chaudhuri et al.
patent: 7155427 (2006-12-01), Prothia et al.
patent: 7287019 (2007-10-01), Kapoor et al.
patent: 2004/0172393 (2004-09-01), Kazi et al.
patent: 2004/0181527 (2004-09-01), Burdick et al.
patent: 2005/0027717 (2005-02-01), Koudas et al.
patent: 2007/0192342 (2007-08-01), Shriraghav et al.
patent: 2010/0107055 (2010-04-01), Orelind et al.
Winkler, “Matching and Record Linkage”, US Bureau of Census, pp. 1-38, 1995.
Minton, et al., “A Heterogeneous Field Matching Method for Record Linkage”, Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM '05), 2005, pp. 1-8.
Martins, et al., “Semantic Similarity Match for Data Quality”, Faculty of Sciences, University of Lisbon, Portugal, Nov. 2007, pp. 2-16.
Bilenko, et al., “Adaptive Duplicate Detection Using Learnable String Similarity Measures”, Proceedings of Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), WA, Aug. 2003, pp. 39-48.
Chaudhuri, et al., “Robust and efficient fuzzy match for online data cleaning”, Proc. of the 2003 ACM SIGMOD Intl. Conf. on Management of Data, Jun. 2003, pp. 313-324.
Elmagarmid, et al., “Duplicate record detection: A survey”, IEEE Trans. on Knowledge and Data Engg., vol. 19, No. 1, Jan. 2007, pp. 1-16.
United States Postal Service, http://www.usps.com/, Apr. 22, 2008.
“Wikipedia” http://en.wikipedia.org/, Apr. 22, 2008.
“DBLP”, http://www.informatik.uni-trier.de/~ley/db/index.html, Apr. 22, 2008.
“RIDDLE: Repository of Information on Duplicate Detection, Record Linkage, and Identity Uncertainty”, http://www.cs.utexas.edu/users/ml/riddle, Apr. 24, 2008.
Winkler, “The state of record linkage and current research problems”, US Bureau of Census, 1999. 15 pages.
“Trillium Software”, www.trilliumsoft.com/trilliumsoft.nsf, Apr. 22, 2008.
Needleman, et al., “A general method applicable to the search for similarities in the amino acid sequences of two proteins”, Journal of Molecular Biology, vol. 48, Mar. 1970, pp. 443-453.
Salton, et al., “Term-weighting approaches in automatic text retrieval”, Information Processing and Management, vol. 24, Jan. 1988, pp. 513-523.
Miller, et al., “A hidden markov model information retrieval system”, Proc. of the 22nd ACM SIGIR Conf. on Research and Development in Information Retrieval, Aug. 1999, pp. 214-221.
Gionis, et al., “Similarity search in high dimensions via hashing”, Proc. of the 25th Intl. Conf. on Very Large Data Bases, Sep. 1999, pp. 518-529.
Chaudhuri, et al., “A primitive operator for similarity joins in data cleaning”, Proc. of the 22nd Intl. Conf. on Data Engineering, Apr. 2006, pp. 1-12.
Tejada, et al., “Learning domain-independent string transformation weights for high accuracy object identification”, Proc. of the 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, Jul. 2002, pp. 350-359.
Sarawagi, et al., “Interactive deduplication using active learning”, Proc. of the 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, Jul. 2002, pp. 269-278.
Ananthakrishna, et al., “Eliminating fuzzy duplicates in data warehouses”, Proc. of the 28th Intl. Conf. on Very Large Data Bases, Aug. 2002, pp. 586-597.
Dong, et al., “Reference reconciliation in complex information spaces”, Proc. of the 2005 ACM SIGMOD Intl. Conf. on Management of Data, Jun. 2005, pp. 85-96.
Singla, et al., “Multi-relational record linkage”, MRDM, 2004, pp. 1-18.
Bhattacharya, et al., “Collective entity resolution in relational data”, IEEE Data Engineering Bulletin, vol. 29, No. 2, 2006, pp. 4-12.
Hopcroft, et al., “Introduction to Automata Theory, Languages and Computation”, Addison Wesley, 1979, pp. 60-65.
Gravano, et al., “Approximate string joins in a database (almost) for free”, Proc. of the 27th Intl. Conf. on Very Large Data Bases, Sep. 2001, pp. 491-500.
Sarawagi, et al., “Efficient set joins on similarity predicates”, Proc. of the 2004 ACM SIGMOD Intl. Conf. on Management of Data, Jun. 2004, pp. 743-754.
Arasu, et al., “Efficient exact set-similarity joins”, Proc. of the 32nd Intl. Conf. on Very Large Data Bases, Sep. 2006, pp. 918-929.
Koudas et al., “Record Linkage: similarity measures and algorithms”, in Proc. of the 2006 ACM SIGMOD Intl. Conf. on Management of Data, Jun. 2006, pp. 802-803.
Chaudhuri et al., “Example-driven design of efficient record matching queries”, in Proc. of the 33rd Intl. Conf. on Very Large Data Bases, Sep. 23-28, 2007, pp. 1-12.
Bansal et al., “Correlation clustering”, Mach. Learn., vol. 56, No. 1-3, 2004, pp. 89-113.
Chandel et al., “Benchmarking declarative approximate selection predicates”, in Proc. of the 2007 ACM SIGMOD Intl. Conf. on Management of Data, Jun. 2007, pp. 353-364.
