Batch automated blocking and record matching

Data processing: database and file management or data structures – Data integrity – Data cleansing – data scrubbing – and deleting duplicates

Reexamination Certificate

active

07899796

ABSTRACT:
Batch, or “offline,” blocking takes a set of records and generates, for the entire set, sets of potentially matching records (called blocks, hence the name “blocking”). The blocks of potential matches are then passed to a matching process, which evaluates which records actually match. Applications include, but are not limited to, individual matching (such as student identification), householding, business matching, supply chain matching, financial matching, news or text matching, and other applications.
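The abstract does not say how blocks are formed, so the following is only a minimal sketch of the general blocking-then-matching pipeline it describes, not the patented method. It assumes simple standard (key-based) blocking with hand-chosen keys. The names `surname_prefix_key`, `first_name_zip_key`, `batch_block`, `candidate_pairs`, `field_agreement`, and `match`, the record fields, the toy scorer, and the 0.75 threshold are all hypothetical illustrations.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Set, Tuple

Record = Dict[str, str]
KeyFunc = Callable[[Record], Hashable]


def surname_prefix_key(rec: Record) -> Hashable:
    # Hypothetical key: first three letters of the surname plus birth year.
    return ("sn3", rec.get("surname", "")[:3].lower(), rec.get("dob", "")[:4])


def first_name_zip_key(rec: Record) -> Hashable:
    # Hypothetical second key, so one record can land in several blocks.
    return ("fz", rec.get("first", "").lower(), rec.get("zip", ""))


def batch_block(records: Dict[int, Record],
                key_funcs: List[KeyFunc]) -> List[Tuple[int, ...]]:
    """Offline pass over the whole set: group ids that share a blocking key."""
    buckets: Dict[Hashable, Set[int]] = defaultdict(set)
    for rid, rec in records.items():
        for key_func in key_funcs:
            buckets[key_func(rec)].add(rid)
    # Singletons have nothing to compare against; identical blocks are merged.
    unique = {frozenset(ids) for ids in buckets.values() if len(ids) > 1}
    return sorted(tuple(sorted(block)) for block in unique)


def candidate_pairs(blocks: List[Tuple[int, ...]]) -> Set[Tuple[int, int]]:
    """Expand blocks into unique (lower id, higher id) pairs for the matcher."""
    pairs: Set[Tuple[int, int]] = set()
    for block in blocks:
        pairs.update(combinations(block, 2))
    return pairs


def field_agreement(a: Record, b: Record,
                    fields: Tuple[str, ...] = ("first", "surname", "dob", "zip")) -> float:
    # Toy scorer (hypothetical): fraction of fields that agree exactly, ignoring case.
    agree = sum(1 for f in fields if a.get(f, "").lower() == b.get(f, "").lower())
    return agree / len(fields)


def match(records: Dict[int, Record], pairs: Set[Tuple[int, int]],
          scorer: Callable[[Record, Record], float] = field_agreement,
          threshold: float = 0.75) -> List[Tuple[int, int]]:
    """Score only the candidate pairs; keep those at or above the threshold."""
    return sorted(p for p in pairs if scorer(records[p[0]], records[p[1]]) >= threshold)


EXAMPLE_RECORDS: Dict[int, Record] = {
    1: {"first": "Ann", "surname": "Smith", "dob": "1990-04-02", "zip": "10001"},
    2: {"first": "Ann", "surname": "Smyth", "dob": "1990-04-02", "zip": "10001"},
    3: {"first": "Bob", "surname": "Smith", "dob": "1990-07-19", "zip": "60601"},
    4: {"first": "Carl", "surname": "Jones", "dob": "1985-01-01", "zip": "94105"},
}

if __name__ == "__main__":
    blocks = batch_block(EXAMPLE_RECORDS, [surname_prefix_key, first_name_zip_key])
    pairs = candidate_pairs(blocks)
    print(blocks)                         # [(1, 2), (1, 3)]
    print(match(EXAMPLE_RECORDS, pairs))  # [(1, 2)]
```

Here record 4 shares no key with any other record, so it is never scored, and only 2 of the 6 possible pairs reach the matcher. A production system would also need to handle oversized blocks produced by very common key values (for example, frequent surnames), which this sketch does not attempt.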

REFERENCES:
patent: 5497486 (1996-03-01), Stolfo et al.
patent: 5514534 (1996-05-01), Chuah et al.
patent: 5668987 (1997-09-01), Schneider
patent: 5717915 (1998-02-01), Stolfo et al.
patent: 5802518 (1998-09-01), Karaev et al.
patent: 5819291 (1998-10-01), Haimowitz et al.
patent: 5960430 (1999-09-01), Haimowitz et al.
patent: 5970482 (1999-10-01), Pham et al.
patent: 5991758 (1999-11-01), Ellard
patent: 6092034 (2000-07-01), McCarley et al.
patent: 6336117 (2002-01-01), Massarani
patent: 6438741 (2002-08-01), Al-omari et al.
patent: 6578056 (2003-06-01), Lamburt
patent: 6598042 (2003-07-01), Kienan
patent: 6804662 (2004-10-01), Annau et al.
patent: 7290019 (2007-10-01), Bjorner et al.
patent: 2004/0172393 (2004-09-01), Kazi et al.
patent: 2008/0065630 (2008-03-01), Luo et al.
Lait, A.J. and Randell, B., “An assessment of name matching algorithms,” unpublished (Sep. 1995). Available from the author at Brian.Randell@newcastle.ac.uk.
Acharya, S., Gibbons, P., Poosala, V., Ramaswamy, S., “Join Synopses for Approximate Query Answering,” in Proceedings of the ACM SIGMOD Conference, pp. 275-286, ACM Press, 1999. # 287.
Aoki, P., “Algorithms for Index-Assisted Selectivity Estimation,” in ICDE, p. 258, 1999. # 288.
Arellano, M.G. et al., “A probabilistic approach to the patient identification problem,” Proceedings of the Fifth Annual Symposium on Computer Applications in Medical Care (Nov. 1981).
Arellano, M.G., “An implementation of a two-population Fellegi-Sunter probability model,” U.S. Dept. of the Treasury, Publication 1200 (Dec. 1985).
Arellano, M.G., “Assessing the significance of a match,” unpublished. Advanced Linkage Technologies of America, Inc. (Jan. 17, 1995).
Arellano, M.G., “Issues in identification and linkage of patient records across an integrated delivery system,” Journal of Healthcare Information Management, 12 (3) 43-52 (Fall 1998).
Arellano, M.G., “An implementation of a two-population Fellegi-Sunter probability model,” U.S. Department of the Treasury, Publication 1299 (Dec. 1985).
Poosala, V., Ioannidis, Y., “Selectivity Estimation Without the Attribute Value Independence,” Proceedings of VLDB, Athens Greece, pp. 486-495, Aug. 1997. # 308.
Barbara, D., DuMouchel, W., Faloutsos, C., Haas, P., Hellerstein, J., Ioannidis, Y., Jagadish, H., Johnson, T., Ng, R., Poosala, V., Ross, K., Sevcik, K., “The New Jersey Data Reduction Report,” Data Engineering Bulletin, 20(4), 1997. # 289.
Belin, T.R. and Rubin, D.B., “A Method for Calibrating False-Match Rates in Record Linkage,” Journal of The American Statistical Association, vol. 90, Issue 430, 697-707 (Jun. 1995).
Berger, A.L., et al., “A comparison of criteria for maximum entropy/minimum divergence feature selection,” in Proceedings of the Third Conference on Empirical Methods in Natural Language Processing, N. Ide and A. Voutilainen, Eds., The Association for Computational Linguistics, pp. 97-106 (Jun. 1998).
Berger, A.L., et al., “A maximum entropy approach to natural language processing,” Computational Linguistics 22(1):39-71 (1996).
Berger, A.L., Della Pietra, S.A., Della Pietra, V.J., “A maximum entropy approach to natural language processing,” Computational Linguistics 22(1):39-71 (1996).
Bertolazzi, P. et al., “Supporting Trusted Data Exchanges in Cooperative Information Systems,” pp. 1-37, Rome, Italy.
Bitton, D., DeWitt, D., “Duplicate Record Elimination in Large Data Files,” ACM Transactions on Database Systems 8(2), 255-265 (1983). # 312.
Borthwick, “The ChoiceMaker 2 Record Matching System,” ChoiceMaker Technologies, Inc., pp. 1-7, (Nov. 1994).
Borthwick, A., “A Maximum Entropy Approach to Named Entity Recognition,” PhD thesis, New York University, Available from the NYU Computer Science Dept. Library or at http://cs.nyu.edu/cs/projects/proteus/publication/index.html (1999).
Caruso, F., et al., “Telcordia's Database Reconciliation and Data Quality Analysis Tool,” Proceedings of the 26th International Conference on Very Large Databases, Cairo, Egypt (2000).
Chakrabarti, K., Garofalakis, M., Rastogi, R., Shim, K., “Approximate Query Processing Using Wavelets,” In Proceedings of 26th International Conference on Very Large Data Bases, Cairo, Egypt, pp. 111-122, 2000. # 290.
Chen, C., Roussopoulos, N., “Adaptive Selectivity Estimation Using Query Feedback,” in Proceedings of the 1994 ACM SIGMOD International Conference on the Management of Data, pp. 161-172, ACM Press, 1994. # 291.
Chen, Ming-Syan et al., “Data Mining: An Overview from a Database Perspective,” IEEE Transactions on Knowledge and Data Engineering, vol. 8, No. 6 1041-4347/96; 866-883, (Dec. 1996).
Christen, P., Churches, T., “Febrl - Freely extensible biomedical record linkage,” Release 0.2 edition, Apr. 2003. # 283.
Christen, P., “Probabilistic Name and Address Cleaning and Standardisation.”
Christodoulakis, S., “Estimating Block Transfers and Join Sizes,” In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 40-54, San Jose, California, May 1983. # 313.
Christodoulakis, S., “Estimating Record Selectivities,” Information Systems, 8(2): 105-115, 1983. # 322.
Cochinwala, Kurien, Lalk, Shasha, citations for “Efficient Data Reconciliation,” http://citeseer.nj.nec.com/context/2036079/0 (printed Nov. 17, 2003).
Cochinwala, M., Dalal S., Elmagarmid A., and Verykios, V., “Record Matching: Past, Present and Future,” Submitted to ACM Computing Surveys (2003) # 286.
Cochinwala, M., et al., “Efficient data reconciliation” Information Sciences 137 (2001) 1-15, Telcordia Technologies Inc. Published by Elsevier Science Inc.
Cochinwala, M., et al., “Arroyo: An Efficient Data Cleaning Tool,” pp. 1-21 (Nov. 4, 1997).
Cohen, W., Richman, J., “Learning to Match and Cluster Large High-Dimensional Data Sets for Data Integration,” in SIGKDD'02, 6 pages (2002) # 278.
Cohen, W., “Whirl: A Word-based Information Representation Language,” Artificial Intelligence, 118:163-196, (2000). # 282.
Cohen, W.W., “Integration of heterogeneous databases without common domains using queries based on textual similarity,” Proceedings of the ACM SIGMOD Conference on Data Management (1998).
Cohen, William W., “Some practical observations on integration of Web information,” WebDB '99. 1999.
Copas, J.B., and Hilton, F.J., “Record Linkage: Statistical Models for Matching Computer Records,” Journal of the Royal Statistical Society, Series A (Statistics in Society), vol. 153, Issue 3, 287-320 (1990).
Crystal, M.R. et al., “Studies in Data Annotation Effectiveness,” Proceedings of the DARPA Broadcast News Workshop (HUB-4) (Feb. 1999).
Della Pietra, S., et al., “Inducing features of random fields,” Technical Report CMU-CS-95-144, Carnegie Mellon University (1995).
Desjardins, M., Getoor, L., Koller, D., “Using Feature Hierarchies in Bayesian Network Learning” (Extended Abstract), Lecture Notes in Artificial Intelligence, 1864, (2000). # 292.
Dey, D., Mookerjee, V., “A Sequential Technique for Efficient Record Linkage,” submitted to Operations Research Journal (2000). Not included; no longer accessible.
Dey, D., Sarkar, S., De, P., “Entity Matching in Heterogeneous Databases: A Distance Based Decision Model,” P