Automated database blocking and record matching

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

07152060

ABSTRACT:
An automated blocking technique is used as a first step to find approximate matches in a database. The technique builds a blocking set to be as liberal as possible in retrieving records that match on individual fields or sets of fields while avoiding selection criteria that are predicted to return more than the maximum number of records defining a particular special requirement. The ability to do blocking without extensive manual setup at low cost is highly advantageous, especially when using a machine learning based second-stage matching algorithm.
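
The abstract does not say how the predicted record counts are obtained or how candidate criteria are enumerated, so the following is only a hypothetical sketch of the idea, not the patented method. It assumes that predicted counts come from per-field value frequencies combined under an attribute-value-independence assumption (a common selectivity-estimation simplification that can misjudge correlated fields such as surname and ZIP code), that `max_records` is the per-criterion limit, and that only combinations of up to `max_size` fields are tried. All function and parameter names are illustrative. "As liberal as possible" is read here as keeping the most general field combinations whose predicted count stays within the limit.

```python
from collections import Counter
from itertools import combinations


def value_frequencies(records, fields):
    """Count how often each value occurs in each field of the database."""
    freqs = {f: Counter() for f in fields}
    for rec in records:
        for f in fields:
            if rec.get(f) is not None:
                freqs[f][rec[f]] += 1
    return freqs


def predicted_count(query, subset, freqs, n_records):
    """Estimate how many records agree with `query` on every field in `subset`.

    Assumes independent fields: N * prod(count_i / N), computed as
    prod(count_i) / N**(k - 1). Returns None if the query has no value
    for one of the fields, since it cannot block on a missing value.
    """
    numerator = 1
    for f in subset:
        value = query.get(f)
        if value is None:
            return None
        numerator *= freqs[f][value]
    return numerator / n_records ** (len(subset) - 1)


def build_blocking_set(query, fields, freqs, n_records, max_records, max_size=3):
    """Collect the most general field combinations predicted to return
    at most `max_records` records (hypothetical reading of the abstract)."""
    accepted = []
    for size in range(1, max_size + 1):
        for subset in combinations(fields, size):
            # A superset of an accepted combination only retrieves records
            # that the accepted combination already returns, so skip it.
            if any(set(a) <= set(subset) for a in accepted):
                continue
            estimate = predicted_count(query, subset, freqs, n_records)
            if estimate is not None and estimate <= max_records:
                accepted.append(subset)
    return accepted


def block(query, records, blocking_set):
    """Return records that agree with `query` on at least one blocking criterion."""
    return [
        rec
        for rec in records
        if any(all(rec.get(f) == query[f] for f in subset) for subset in blocking_set)
    ]


if __name__ == "__main__":
    records = [
        {"last": "Smith", "first": "John", "zip": "10001"},
        {"last": "Smith", "first": "Jane", "zip": "10001"},
        {"last": "Smith", "first": "John", "zip": "94105"},
        {"last": "Smith", "first": "Mary", "zip": "60601"},
        {"last": "Jones", "first": "John", "zip": "10001"},
        {"last": "Brown", "first": "Alan", "zip": "30301"},
    ]
    fields = ["last", "first", "zip"]
    query = {"last": "Smith", "first": "John", "zip": "10001"}
    freqs = value_frequencies(records, fields)
    blocking_set = build_blocking_set(query, fields, freqs, len(records), max_records=2)
    print(blocking_set)  # [('last', 'first'), ('last', 'zip'), ('first', 'zip')]
    print(len(block(query, records, blocking_set)))  # 4
```

In the usage example, no single field is selective enough for `max_records=2` (four Smiths, three Johns, three records in 10001), so the sketch falls back to pairs of fields. The union of the pair criteria still retrieves records that differ from the query in one field, which is the kind of liberal candidate set a second-stage matcher would then score.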

REFERENCES:
patent: 5497486 (1996-03-01), Stolfo et al.
patent: 5515534 (1996-05-01), Chuah et al.
patent: 5668987 (1997-09-01), Schneider
patent: 5717915 (1998-02-01), Stolfo et al.
patent: 5802518 (1998-09-01), Karaev et al.
patent: 5819291 (1998-10-01), Haimowitz et al.
patent: 5960430 (1999-09-01), Haimowitz et al.
patent: 5970482 (1999-10-01), Pham et al.
patent: 6336117 (2002-01-01), Massarani
patent: 6438741 (2002-08-01), Al-omari et al.
patent: 6523019 (2003-02-01), Borthwick
patent: 6598042 (2003-07-01), Kienan
patent: 6804662 (2004-10-01), Annau et al.
Article entitled “Record Linkage Software” by System Resources Cooperation et al., published Nov. 18, 1999 (pp. 1-61).
Aboulnaga, et al., “Self-tuning Histograms: Building Histograms Without Looking at Data” (1998).
Baxter, et al., “A Comparison of Fast Blocking Methods for Record Linkage,” First Workshop on Data Cleansing, Record Linkage, and Object Consolidation (Aug. 2003).
Bruno, et al., “Exploiting Statistics on Query Expressions for Optimization,” ACM SIGMOD 2002, pp. 263-274 (Madison, Wisconsin, Jun. 2002).
Bruno, et al., “STHoles: A Multidimensional Workload-Aware Histogram,” ACM SIGMOD 2001 (Santa Barbara, CA, May 2001).
Cheng, et al., “Learning Belief Networks from Data: An Information Theory Based Approach” (1998).
Cohen, et al., “Hardening Soft Information Sources,” KDD '00, pp. 255-259 (Boston, MA, 2000).
Deshpande, et al., “Independence is Good: Dependency-Based Histogram Synopses for High-Dimensional Data,” ACM SIGMOD 2001 (Santa Barbara, CA, May 21-24, 2001).
Galindo-Legaria, et al., “Statistics on Views,” Proceedings of the 29th VLDB Conference (Berlin, Germany, 2003).
Gu, et al., “Record Linkage: Current Practice and Future Directions,” CMIS Technical Report No. 03/83 (Apr. 2003).
Haas, et al., “Selectivity and Cost Estimation for Joins Based on Random Sampling,” Journal of Computer and System Sciences 52, 550-569 (Academic Press, 1996).
Haas, et al., “Sequential Sampling Procedures for Query Size Estimation,” Proc. of 1992 ACM SIGMOD Intl. Conf. on Management of Data, pp. 341-350 (1992).
Ioannidis, et al., “On the Propagation of Errors in the Size of Join Results,” Proc. ACM SIGMOD Intl. Conf., pp. 268-277 (May 1991).
Jagadish, et al., “Optimal Histograms with Quality Guarantees,” Proceedings of the 24th VLDB Conference, New York, USA (1998).
König, et al., “Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-Size Estimation,” Proceedings of the 25th VLDB Conference, Edinburgh, Scotland (1999).
Lipton, et al., “Practical Selectivity Estimation through Adaptive Sampling,” Proc. 1990 ACM SIGMOD Intl. Conf. Management of Data, pp. 1-11 (May 7, 1992).
Mannino, et al., “Statistical Profile Estimation in Database Systems,” ACM Computing Surveys, vol. 20 (3), pp. 192-221 (Sep. 1988).
Neiling, et al., “The good into the Pot, the bad into the Crop. Preselection of Record Pairs for Database Fusion,” Proceedings of the 1st Workshop on Database Fusion held in Magdeburg, Germany (May 3-4, 2001).
Piatetsky-Shapiro, et al., “Accurate Estimation of the Number of Tuples Satisfying a Condition,” ACM SIGMOD Intl. Conf. on the Management of Data, Boston, MA (1984).
Poosala, et al., “Improved Histograms for Selectivity Estimation of Range Predicates,” pp. 294-305 (1995).
Poosala, et al., “Selectivity Estimation Without the Attribute Value Independence Assumption,” Proc. of the 23rd Int. Conf. on Very Large Databases (Aug. 1997).
Stillger, et al., “LEO - DB2's Learning Optimizer,” Proceedings of the 27th VLDB Conference, Roma, Italy (2001).
Winkler, “Quality of Very Large Databases,” Bureau of the Census Statistical Research Division, Statistical Research Report Series No. RR2001/04 (Jul. 25, 2001).
IBM Technical Disclosure Bulletin, “Inverting Noun-de-Noun Constructions in French-to-English Translation,” U.S., 3 pages (Oct. 1, 1994).
Verykios, V.S. and Elmagarmid, A.K., “Automating the Approximate Record Matching Process,” Computer Sciences Dept., Purdue University, West Lafayette, IN (Jun. 15, 1999).
Chen, Ming-Syan et al., “Data Mining: An Overview from a Database Perspective,” IEEE Transactions on Knowledge and Data Engineering, vol. 8, No. 6 (Dec. 1996).
Jaynes, E.T., “Information Theory and Statistical Mechanics,” The Physical Review, vol. 106, No. 4, 620-630 (May 15, 1957).
Getoor, L., et al., “Selectivity Estimation using Probabilistic Models,” ACM SIGMOD May 21-24, 2001 Santa Barbara, California.
Winkler, W.E., “Record Linkage and Machine Learning,” Virginia Tech (Nov. 1, 2001).
Belin, T.R. and Rubin, D.B., “A Method for Calibrating False-Match Rates in Record Linkage,” Journal of the American Statistical Association, vol. 90, Issue 430, 697-707 (Jun. 1995).
Copas, J.B., and Hilton, F.J., “Record Linkage: Statistical Models for Matching Computer Records,” Journal of the Royal Statistical Society, Series A (Statistics in Society), vol. 153, Issue 3, 287-320 (1990).
Lait, A.J. and Randell, B., “An Assessment of Names Matching Algorithms,” Dept. of Computing Science, University of Newcastle upon Tyne, unpublished (Sep. 1995).
Porter, E.H. and Winkler, W.E., “Approximate String Comparison and its Effect on an Advanced Record Linkage System.”
Winkler, W.E., “On Dykstra's Iterative Fitting Procedure,” Annals of Probability, vol. 18, issue 3, 1410-1415 (Jul. 1990).
“Record Linkage Techniques—1997,” Proceedings of an International Workshop and Exposition Arlington, VA (Mar. 20-21, 1997).
Record Linkage Workshop: Bibliography of Background Papers, U.S. Census Bureau, http://www.census.gov.srd/www/reclink/biblio.html (Jul. 18, 2002).
Recommendations, The Matching Group, Administrative Records Subcommittee, Federal Committee on Statistical Methodology (May 1985).
RLESearch: Probabilistic record linking engine, http://edinfo.nih.gov/docs/RLESearch.htm (Jul. 18, 2002).
Caruso, F., et al., “Telcordia's Database Reconciliation and Data Quality Analysis Tool,” Proceedings of the 26th International Conference on Very Large Databases, Cairo, Egypt (2000).
Cochinwala, M., et al., “Efficient Data Reconciliation,” Information Sciences 137 (2001) 1-15, Telcordia Technologies Inc., published by Elsevier Science Inc.
Citations: Efficient Data Reconciliation, Cochinwala, Kurien, Lalk, Shasha, http://citeseer.nj.nec.com/context/2036079/0, printed Nov. 17, 2003.
McCallum, A., Nigam, K., Ungar, L., “Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching,” KDD-00 (2000).
Cohen, W., Richman, J., “Learning to Match and Cluster Large High-Dimensional Data Sets for Data Integration,” SIGKDD '02 (2002).
Cochinwala, M., Dalal, S., Elmagarmid, A., and Verykios, V., “Record Matching: Past, Present and Future,” submitted to ACM Computing Surveys (2003).
Acharya, S., Gibbons, P., Poosala, V., Ramaswamy, S., “Join Synopses for Approximate Query Answering,” Proceedings of the ACM SIGMOD Conference, pp. 275-286, ACM Press (1999).
Aoki, P., “Algorithms for Index-Assisted Selectivity Estimation,” ICDE, p. 258 (1999).
“The New Jersey Data Reduction Report,” Barbara, D., DuMouchel, W., Falout
