Designing record matching queries utilizing examples

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C707S793000

active

07634464

ABSTRACT:
The subject disclosure pertains to a powerful and flexible framework for record matching. The framework facilitates design of a record matching query or package composed of a set of well-defined primitive operators (e.g., relational, data cleaning, ...), which can ultimately be executed to match records. To assist design of such packages, a learning technique based on examples is provided. More specifically, a set of matching and non-matching record pairs can be input and employed to facilitate automatic package generation. A generated package can subsequently be transformed manually and/or automatically into a semantically equivalent form optimized for execution.
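
As a rough illustration of learning a matching rule from example pairs, the sketch below picks a single attribute and similarity threshold that best separate a handful of matching and non-matching record pairs. It is not the method of the patent: the token-set Jaccard similarity, the single-threshold rule, the function names (`jaccard`, `learn_rule`, `make_matcher`), and the sample records are all assumptions chosen for brevity, whereas the abstract describes packages built from several primitive operators.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Pair = Tuple[Record, Record]


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two strings (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def learn_rule(matches: List[Pair], non_matches: List[Pair],
               columns: List[str]) -> Tuple[str, float]:
    """Return the (column, threshold) pair that labels the most examples correctly."""
    labeled = [(p, True) for p in matches] + [(p, False) for p in non_matches]
    best, best_correct = (columns[0], 1.0), -1
    for col in columns:
        scores = [(jaccard(r1[col], r2[col]), label) for (r1, r2), label in labeled]
        # Only observed similarity values need to be tried as thresholds.
        for threshold in sorted({s for s, _ in scores}):
            correct = sum((s >= threshold) == label for s, label in scores)
            if correct > best_correct:
                best_correct, best = correct, (col, threshold)
    return best


def make_matcher(rule: Tuple[str, float]) -> Callable[[Record, Record], bool]:
    """Turn a learned rule into an executable matching predicate."""
    col, threshold = rule
    return lambda r1, r2: jaccard(r1[col], r2[col]) >= threshold


# Hypothetical training examples, invented for this sketch.
matches = [
    ({"name": "Microsoft Corp", "city": "Redmond"},
     {"name": "Microsoft Corporation", "city": "Redmond"}),
    ({"name": "Intl Business Machines", "city": "Armonk"},
     {"name": "International Business Machines", "city": "Armonk"}),
]
non_matches = [
    ({"name": "Microsoft Corp", "city": "Redmond"},
     {"name": "Nintendo of America", "city": "Redmond"}),
    ({"name": "Acme Tools", "city": "Cupertino"},
     {"name": "Apple Inc", "city": "Cupertino"}),
]

rule = learn_rule(matches, non_matches, ["name", "city"])
is_match = make_matcher(rule)
```

With these examples the `city` column cannot separate the two groups, because every pair shares a city, so the learned rule falls back on name similarity. A generated rule of this kind could then be rewritten into an equivalent but cheaper form, which is the optimization step the abstract mentions and which this sketch does not attempt.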

REFERENCES:
patent: 6047284 (2000-04-01), Owens et al.
patent: 6449609 (2002-09-01), Witkowski
patent: 6618727 (2003-09-01), Wheeler et al.
patent: 6721754 (2004-04-01), Hurst et al.
patent: 6792414 (2004-09-01), Chaudhuri et al.
patent: 6795819 (2004-09-01), Wheeler et al.
patent: 6912549 (2005-06-01), Rotter et al.
patent: 6961721 (2005-11-01), Chaudhuri et al.
patent: 6965888 (2005-11-01), Cesare et al.
patent: 7007017 (2006-02-01), Bergholz et al.
patent: 7296011 (2007-11-01), Chaudhuri et al.
patent: 7370057 (2008-05-01), Burdick et al.
patent: 2004/0019593 (2004-01-01), Borthwick et al.
patent: 2004/0148287 (2004-07-01), Manion et al.
patent: 2004/0181526 (2004-09-01), Burdick et al.
patent: 2004/0249789 (2004-12-01), Kapoor et al.
patent: 2004/0260694 (2004-12-01), Chaudhuri et al.
patent: 2005/0027717 (2005-02-01), Koudas et al.
patent: 2005/0097150 (2005-05-01), McKeon et al.
patent: 2005/0144163 (2005-06-01), Tang et al.
patent: 2005/0154615 (2005-07-01), Rotter et al.
patent: 2005/0256740 (2005-11-01), Kohan et al.
patent: 2005/0278357 (2005-12-01), Brown et al.
patent: 2006/0031189 (2006-02-01), Muras et al.
patent: 2304387 (2000-04-01), None
Muralidhar Krishnaprasad, et al. Query Rewrite for XML in Oracle XML DB. Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004. 12 pages.
Raghu Ramakrishnan, et al. SRQL: Sorted Relational Query Language. 1998 IEEE. Published in the Proceedings of SSDBM'98, Jul. 1-3, 1998 in Capri, Italy. 12 pages.
Ingolf Geist, et al. Combining a Formal with an Example-driven Approach for Data Integration. http://mordor.prakinf.tu-ilmenau.de/papers/sattler/fdbs01.pdf. Last accessed Apr. 11, 2006. 19 pages.
Soumen Chakrabarti, et al. Distributed Hypertext Resource Discovery Through Examples. Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999. 12 pages.
Agichtein, et al. “Mining Reference Tables for Automatic Text Segmentation”, In Proceedings of ACM SIGKDD, 2004. 10 pages.
S. Argamon-Engelson, et al. “Committee-Based Sample Selection for Probabilistic Classifiers”, Journal of Artificial Intelligence Research, 1999. 26 pages.
M. Bilenko, et al. “Riddle: Repository of Information on Duplicate Detection, Record Linkage, and Identity Uncertainty”. http://www.cs.utexas.edu/users/ml/riddle. 1 pg. Last Accessed: Jun. 19, 2006.
M. Bilenko, et al. “Adaptive Duplicate Detection Using Learnable String Similarity Measures”, In Proceedings of ACM SIGKDD, 2003, 10 pages.
V. Borkar, et al. “Automatic Segmentation of Text Into Structured Records”, In Proceedings of ACM SIGMOD, 2001. 12 pages.
L. Breiman, et al. “Classification and Regression Trees”, Wadsworth, 1984. pp. 1-50.
L. Breiman, et al. “Classification and Regression Trees”, Wadsworth, 1984. pp. 51-111.
L. Breiman, et al. “Classification and Regression Trees”, Wadsworth, 1984. pp. 112-172.
L. Breiman, et al. “Classification and Regression Trees”, Wadsworth, 1984. pp. 173-233.
L. Breiman, et al. “Classification and Regression Trees”, Wadsworth, 1984. pp. 234-294.
L. Breiman, et al. “Classification and Regression Trees”, Wadsworth, 1984. pp. 295-358.
S. Chaudhuri, et al. “Robust and Efficient Fuzzy Match for Online Data Cleaning”, In Proceedings of ACM SIGMOD, 2003. 12 pages.
S. Chaudhuri, et al. “A primitive operator for similarity joins in data cleaning”, In Proceedings of ICDE, 2006. 12 pages.
W. Cohen, “Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity”, In Proceedings of ACM SIGMOD, 1998. 12 pages.
W. Cohen, et al. “Learning to Match and Cluster Large High-Dimensional Data Sets for Data Integration”, In Proceedings of ACM SIGKDD, 2002. 6 pages.
W. Cohen, “Data Integration Using Similarity Joins and a Word-Based Information Representation Language”, ACM Transactions on Information Systems, 2000. 34 pages.
I. P. Fellegi, et al. “A Theory for Record Linkage”, Journal of the American Statistical Association, 1969. 29 pages.
H. Galhardas, et al. “Declarative Data Cleaning: Language, Model, and Algorithms”, In Proceedings of VLDB, 2001. 10 pages.
L. Gravano, et al. “Approximate String Joins in a Database (almost) for Free”, In Proceedings of VLDB, 2001. 10 pages.
D. Haussler, “Quantifying Inductive Bias: AI Learning Algorithms and Valiant's Learning Framework”, Artificial Intelligence, 1988. 45 pages.
M. Hernandez, et al. “The Merge/Purge Problem for Large Databases”, In Proceedings of ACM SIGMOD, 1995. 12 pages.
J. Quinlan, “C4.5: Programs for Machine Learning”, Morgan Kaufmann Publishers Inc., 1993. pp. 1-55.
J. Quinlan, “C4.5: Programs for Machine Learning”, Morgan Kaufmann Publishers Inc., 1993. pp. 56-121.
J. Quinlan, “C4.5: Programs for Machine Learning”, Morgan Kaufmann Publishers Inc., 1993. pp. 122-184.
J. Quinlan, “C4.5: Programs for Machine Learning”, Morgan Kaufmann Publishers Inc., 1993. pp. 185-247.
J. Quinlan, “C4.5: Programs for Machine Learning”, Morgan Kaufmann Publishers Inc., 1993. pp. 248-302.
V. Raman, “Potter's Wheel: An Interactive Data Cleaning System”, In Proceedings of VLDB, 2001. 10 pages.
S. Sarawagi, et al. “Interactive Deduplication Using Active Learning”, In Proceedings of ACM SIGKDD, 2002. 10 pages.
S. Sarawagi, et al. “Efficient Set Joins on Similarity Predicates”, In Proceedings of ACM SIGMOD, 2004. 12 pages.
A. Simitsis, et al. “Optimizing ETL Processes in Data Warehouse”, In Proceedings of ICDE, 2005. 12 pages.
S. Tejada, et al. “Learning Domain-Independent String Transformation Weights for High Accuracy Object Identification”, In Proceedings of ACM SIGKDD, 2002. 10 pages.
S. Tejada, et al. “Learning Object Identification Rules for Information Integration”, Information Systems, 2001. 29 pages.
T. Software. www.trilliumsoft.com/trilliumsoft.nsf. 1 pg. Last Accessed: Jun. 19, 2006.
