Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-10-30
2001-10-30
Vu, Kim (Department: 2172)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000, C707S793000, C707S793000
Reexamination Certificate
active
06311179
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to the field of data mining, and more particularly, a novel data mining system and search methodology for generating associations among items in a large database.
2. Discussion of the Prior Art
The problem of finding association rules was introduced in a reference entitled “Mining Association Rules Between Sets of Items in Very Large Databases,” Proceedings of the ACM SIGMOD Conference on Management of Data, pages 207-216, 1993 authored by Agrawal R., Imielinski T., and Swami A. The problem identified in the reference was directed to finding the relationships between different items in a large database, e.g., a database containing customer transactions. Such information can be used for many sales purposes such as target marketing, because the buying patterns of consumers can be inferred from one another.
As described in the above-mentioned reference, there is first identified a set {I} comprising all items in the database of transactions. A transaction {T} which is a subset of {I} is defined to be a set of items which are bought together in one operation. An association rule between a set of items {X} which is a subset of {I} and another set {Y} which is also a subset of {I} is expressed as {X}=>{Y}, and indicates that the presence of the items X in the transaction also indicates a strong possibility of the presence of the set of items Y. The measures used to indicate the strength of an association rule are support and confidence. The support of the rule X=>Y is the fraction of the transactions containing both X and Y. The confidence of the rule X=>Y is the fraction of the transactions containing X which also contain Y. In the association rule problem, it is desired to find all rules above a minimum level of support and confidence. The primary concept behind most association rule algorithms is a two phase procedure: In the first phase, all frequent itemsets (or large itemsets) are found. An itemset is “frequent” or large if it satisfies a user-defined minimum support requirement. The second phase uses these frequent itemsets in order to generate all the rules which satisfy the user specified minimum confidence.
Since its initial formulation, considerable research effort has been devoted to the association rule problem. A number of algorithms for large itemset generation have been proposed, such as those found in Agrawal R., Mannila H., Srikant R., Toivonen H., and Verkamo A. I. “Fast Discovery of Association Rules.” Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, Chapter 12, pages 307-328. Proceedings of the 20th International Conference on Very Large Data Bases, pages 478-499, 1994. and Brin S., Motwani R. Ullman J. D. and Tsur S., “Dynamic Itemset Counting and Implication Rules for Market Basket Data.” Proceedings of the ACM SIGMOD, 1997. pages 255-264. Variations of association rules such as generalized association rules, quantitative association rules and multilevel association rules have been studied in Srikant R., and Agrawal R., “Mining Generalized Association Rules.” Proceedings of the 21st International Conference on Very Large Data Bases, 1995, pages 407-419, and, Srikant R., and Agrawal R. “Mining Quantitative Association Rules in Large Relational Tables,” Proceedings of the ACM SIGMOD Conference on Management of Data, 1996, pages 1-12.
Although there are many previously proposed methods and systems, there is no efficient method which can generate large itemsets for very large scale problems. For these problems, current techniques require too much time to be of any practical use. The importance of solving such large scale problems is quite great, given the fact that most databases containing customer transaction data are quite large.
SUMMARY OF THE INVENTION
The present invention is directed to a system and method for generating large database itemsets for very large scale problems. The system particularly employs the use of a lexicographic tree structural representation of large itemsets in the database that is particularly adapted for handling large scale problems.
According to the principles of the invention there is provided a system and method for automatically generating associations of items included in a database. A user first specifies a support criteria indicating a strength of desired associations of items contained in the said database. Then, a recursive or non-recursive program is executed for generating a hierarchical tree structure comprising one or more levels of database itemsets, with each itemset representing item associations determined to have satisfied the specified support criteria. The recursive program includes steps of characterizing nodes of the tree structure as being either active and enabling generation of new nodes at a new level of the tree, or inactive, at any given time; enabling traversal of the tree structure in a predetermined manner and projecting each of the transactions included in the database onto currently active nodes of the tree structure to generate projected transaction sets; and counting the support of the candidate extensions of nodes to determine whether the further itemsets satisfy the specified support criteria. All itemsets meeting the specified support criteria are added to the tree structure at a new level.
Advantageously, by projecting transactions upon the lexicographic tree structure, the CPU time for counting large itemsets is substantially reduced.
REFERENCES:
patent: 4554631 (1985-11-01), Reddington
patent: 5615341 (1997-03-01), Agrawal et al.
patent: 5758147 (1998-05-01), Chen et al.
patent: 5758353 (1998-05-01), Marquis
patent: 5842200 (1998-11-01), Agrawal et al.
patent: 5920855 (1999-07-01), Aggarwal et al.
patent: 5930805 (1999-07-01), Marquis
patent: 5987468 (1999-11-01), Singh et al.
patent: 6006225 (1999-12-01), Bowman et al.
patent: 6092064 (2000-07-01), Aggarwal et al.
patent: 6092065 (2000-07-01), Floratos et al.
patent: 6108004 (2000-08-01), Medl
patent: 6108686 (2000-08-01), Floratos et al.
patent: 6138117 (2000-10-01), Bayardo
“Efficient Algorithms for Discovering Association Rules” by Heikki Mannila, et al., pp. 181-192, University of Helsinki, Department of Computer Science, Helsinki, Finland, 1994.
“Clustering Association Rules” by Brian Lent, et al., pp. 220-231, Department of Computer Science, Stanford University, Stanford, CA, Apr. 1997.
“Mining Association Rules between Sets of Items in Large Databases” by Rakesh Agrawal, et al., pp. 207-216, IBM Almaden Research Center, San Jose, CA, May 1993.
“Fast Algorithms for Mining Association Rules”, by Rakesh Agrawal, et al., pp. 487-499, IBM Almaden Research Center, San Jose, CA, 1994.
“An Efficient Algorithm for Mining Association Rules in Large Databases” by Ashok Savasere, et al., pp. 432-444, College of Computing, Georgia Institute of Technology, Atlanta, GA, 1995.
“Mining Generalized Association Rules” by Ramakrishnan Srikant, et al., pp. 406-419, Proceedings of the 21st International Conference on Very Large Data Bases, Zurich, Switerland, Sep. 1995.
“An Effective Hash-Based Algorithm for Mining Association Rules” by John Soo Park, et al., pp. 175-186, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, 1995.
“Sampling Large Databases for Association Rules:” by Hannu Toivonen, pp. 134-145, University of Helsinki, Department of Computer Science, Helsinki, Finland, 1996.
“Mining Quantitative Association Rules in Large Relational Tables” by Ramakrishnan Srikant, et al.,, pp. 1-12, IBM Almaden Research Center, San Jose, CA, 1996.
“Dynamic Itemset Counting and Implication Rules for Market Basket Data” by Sergey Brin, et al., pp. 255-264, Department of Computer Science, Stanford University, Stanford, CA, 1997.
“Efficiently Mining Long Patterns from Databases” by Roberto J. Bayardo, Jr., pp. 85-93, IBM Almaden Research Center, San Jose, CA, 1998.
Agarwal Ramesh C.
Aggarwal Charu C.
Prasad V. V. V.
International Business Machines - Corporation
Ly Anh
Scully Scott Murphy & Presser
Vu Kim
Zarick, Esq. Gail H.
LandOfFree
System and method of generating associations does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with System and method of generating associations, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method of generating associations will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2616459