Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2008-07-29
Al-Hashemi, Sana (Department: 2164)
C707S793000
active
07406479
ABSTRACT:
A set similarity join system and method are provided. The system can be employed to facilitate data cleaning based on similarities through the identification of “close” tuples (e.g., records and/or rows). “Closeness” is evaluated using one or more similarity functions chosen to suit the domain and/or application. Thus, the system facilitates generic, domain-independent data cleaning. The system can be employed with a foundational primitive, the set similarity join (SSJoin) operator, which can be used as a building block to implement a broad variety of notions of similarity (e.g., edit similarity, Jaccard similarity, generalized edit similarity, Hamming distance, Soundex, etc.) as well as similarity based on co-occurrences. The SSJoin operator exploits the observation that set overlap can be used effectively to support a variety of similarity functions. The SSJoin operator compares values based on “sets” associated with (or explicitly constructed for) each one of them.
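The idea of comparing values through the overlap of sets constructed for them can be illustrated with a minimal Python sketch. This is not the patented SSJoin operator or its implementation; it is a brute-force illustration under stated assumptions. The function names (`qgrams`, `jaccard`, `naive_set_similarity_join`), the choice of 3-grams as the constructed sets, and the 0.5 threshold are all hypothetical and chosen only for the example.

```python
from itertools import product


def qgrams(value, q=3):
    """Build the set of q-grams for a string (assumed set construction)."""
    text = value.lower()
    if len(text) < q:
        return {text}
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def naive_set_similarity_join(left, right, threshold=0.5, q=3):
    """Return pairs (l, r) whose q-gram sets have Jaccard >= threshold.

    Compares every pair, so it is quadratic; it only illustrates the
    set-overlap predicate, not an efficient join strategy.
    """
    left_sets = [(v, qgrams(v, q)) for v in left]
    right_sets = [(v, qgrams(v, q)) for v in right]
    return [
        (lv, rv)
        for (lv, ls), (rv, rs) in product(left_sets, right_sets)
        if jaccard(ls, rs) >= threshold
    ]


if __name__ == "__main__":
    customers = ["Microsoft Corp", "Oracle"]
    vendors = ["Microsoft Corporation", "Orcale", "IBM"]
    print(naive_set_similarity_join(customers, vendors))
```

In this sketch, swapping `jaccard` for another overlap-based predicate changes the notion of "closeness" without changing the join loop, which is the sense in which a set-overlap primitive can serve several similarity functions. Note that a transposition typo such as "Orcale" shares no 3-grams with "Oracle", so a different set construction or threshold would be needed to catch it.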
REFERENCES:
patent: 2004/0260694 (2004-12-01), Chaudhuri et al.
patent: 2005/0055321 (2005-03-01), Fratkina et al.
patent: 2005/0262044 (2005-11-01), Chaudhuri et al.
patent: 2006/0179052 (2006-08-01), Pauws et al.
Ananthakrishna, et al. “Eliminating Fuzzy Duplicates in Data Warehouses” Proceedings of the 28th VLDB Conference, Hong Kong, China (2002) 12 pages.
Chatziantoniou, et al. “Querying Multiple Features of Groups in Relational Databases” Proceedings of the 22nd VLDB Conference, Mumbai (Bombay), India (1996) pp. 295-306.
Chatziantoniou, et al. “Groupwise Processing of Relational Queries” Proceedings of the 23rd VLDB Conference Athens, Greece (1997) pp. 476-485.
Chaudhuri, et al. “Robust and Efficient Fuzzy Match for Online Data Cleaning” SIGMOD San Diego, California (Jun. 9-12, 2003) 12 pages.
Cohen, William W. “Data Integration Using Similarity Joins and a Word-Based Information Representation Language” ACM Transactions on Information Systems, vol. 18 No. 3 (Jul. 2000) 34 pages.
Gravano, et al. “Text Joins in an RDBMS for Web Data Integration” WWW2003 Budapest, Hungary (May 20-24, 2003) 12 pages.
Gravano, et al. “Approximate String Joins in a Database (Almost) for Free” Proceedings of the 27th VLDB Conference, Rome, Italy (2001) 10 pages.
Guha, et al. “Merging the Results of Approximate Match Operations” Proceedings of the 30th VLDB Conference, Toronto, Canada (2004) pp. 636-647.
Hernandez, et al. “The Merge/Purge Problem for Large Databases” SIGMOD San Jose, California (1995) pp. 127-138.
Ramasamy, et al. “Set Containment Joins: The Good, The Bad and The Ugly” Proceedings of the 26th VLDB Conference, Cairo, Egypt (2000) pp. 351-362.
Sarawagi, et al. “Efficient Set Joins on Similarity Predicates” SIGMOD Paris, France (Jun. 13-18, 2004) 12 pages.
Chaudhuri, et al. “Robust Identification of Fuzzy Duplicates” (2004) Proceedings of the 1st ACM Workshop on Hardcopy Document Proceedings, 12 pages.
Fellegi, et al. “A Theory for Record Linkage” (1969) American Statistical Association vol. 64, 29 pages.
Chaudhuri Surajit
Ganti Venkatesh
Shriraghav Kaushik
Al-Hashemi Sana
Amin Turocy & Calvin LLP
Microsoft Corporation
Primitive operator for similarity joins in data cleaning