Depth first method for generating itemsets

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

[ 0.00 ] – not rated yet Voters 0 Comments 0

Details Depth first method for generating itemsets Depth first method for generating itemsets

: 1999-02-19
: 2002-05-14
: Breene, John (Department: 2177)
: Data processing: database and file management or data structures
: Database design
: Data structure types

: C707S793000, C707S793000, C707S793000, C706S018000, C706S048000
: Reexamination Certificate
: active
: 06389416
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to the field of data mining, and more particularly, a novel data mining system and depth first search methodology for generating associations among items in a large database.
2. Discussion of the Prior Art
The problem of finding association rules was introduced in a reference entitled “Mining Association Rules Between Sets of Items in Very Large Databases,” Proceedings of the ACM SIGMOD Conference on Management of Data, pages 207-216, 1993 authored by Agrawal R., Imielinski T., and Swami A. The problem identified in the reference was directed to finding the relationships between different items in a large database, e.g., a database containing customer transactions. Such information can be used for many sales purposes such as target marketing, because the buying patterns of consumers can be inferred from one another.
As described in the above-mentioned reference, there is first identified a set {I} comprising all items in the database of transactions. A transaction {T} which is a subset of {I} is defined to be a set of items which are bought together in one operation. An association rule between a set of items {X} which is a subset of {I} and another set {Y} which is also a subset of {I} is expressed as {X}→{Y}, and indicates that the presence of the items X in the transaction also indicates a strong possibility of the presence of the set of items Y. The measures used to indicate the strength of an association rule are support and confidence. The support of the rule X→Y is the fraction of the transactions containing both X and Y. The confidence of the rule X→Y is the fraction of the transactions containing X which also contain Y. In the association rule problem, it is desired to find all rules above a minimum level of support and confidence. The primary concept behind most association rule algorithms is a two phase procedure: In the first phase, all frequent itemsets (or large itemsets) are found. An itemset is “frequent” or large if it satisfies a user-defined minimum support requirement. The second phase uses these frequent itemsets in order to generate all the rules which satisfy the user specified minimum confidence.
Since its initial formulation, considerable research effort has been devoted to the association rule problem. A number of algorithms for large itemset generation have been proposed, such as those found in Agrawal R., Mannila H., Srikant R., Toivonen H., and Verkamo A. I. “Fast Discovery of Association Rules.” Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, Chapter 12, pages 307-328. Proceedings of the 20th International Conference on Very Large Data Bases, pages 478-499, 1994. and Brin S., Motwani R. Ullman J. D. and Tsur S., “Dynamic Itemset Counting and Implication Rules for Market Basket Data.” Proceedings of the ACM SIGMOD, 1997. pages 255-264. Variations of association rules such as generalized association rules, quantitative association rules and multilevel association rules have been studied in Srikant R., and Agrawal R., “Mining Generalized Association Rules.” Proceedings of the 21st International Conference on Very Large Data Bases, 1995, pages 407-419, and, Srikant R., and Agrawal R. “Mining Quantitative Association Rules in Large Relational Tables,” Proceedings of the ACM SIGMOD Conference on Management of Data, 1996, pages 1-12.
Although there are many previously proposed methods and systems, there is no efficient method which can generate large itemsets for very large scale problems. For these problems, current techniques require too much time to be of any practical use. The importance of solving such large scale problems is quite great, given the fact that most databases containing customer transaction data are quite large.
SUMMARY OF THE INVENTION
The present invention is directed to a computer-implemented system and method for mining long patterns in databases, which relies on a depth-first search technique in order to construct a lexicographic tree of itemsets. The depth first traversal of a tree is defined to be the process of traversing the nodes in the tree in such a way that after examining a node, all of that nodes descendants are examined before examining any other node. In depth-first strategy, the nodes of the lexicographic tree are examined in depth-first order in order to count the supports of their candidate extensions. In other words, all descendant itemsets of a node are examined before examining other nodes of the lexicographic tree. By projecting transactions upon the lexicographic tree structure in a depth-first manner, the CPU time for counting large itemsets is substantially reduced.
Novel and innovative counting methods are implemented including byte-wise counting and postprocessing for higher level nodes in the lexicographic tree structure; and, a two phase bucketing procedure, for counting support at lower level nodes. Use of these counting methods efficiently optimizes processing time and database storage utilization.

REFERENCES:
patent: 5615341 (1997-03-01), Agrawal et al.
patent: 5724573 (1998-03-01), Agrawal et al.
patent: 5842200 (1998-11-01), Agrawal et al.
patent: 5920855 (1999-07-01), Aggarwal et al.
patent: 6061682 (2000-05-01), Agrawal et al.
patent: 6138117 (2000-10-01), Bayardo
patent: 6185559 (2001-02-01), Brin et al.
patent: 6272478 (2001-08-01), Obata et al.
patent: 6278997 (2001-08-01), Agrawal et al.
patent: 6278998 (2001-08-01), Ozden et al.
patent: 6301575 (2001-10-01), Chadha et al.
patent: 6311179 (2001-10-01), Agarwal et al.
patent: 6324533 (2001-11-01), Agrawal et al.
Agrawal et al. “Mining Associatin Rules between Sets of Items in Large Databases”, IBM Almaden Research Center, San Jose, CA May 1993, pp 207-216.*
Yen et al. “An Efficient Data Mining Technique for Discovering Interesting Association Rules”, Database and Expert System Applications, 1997. pp: 664-669.*
Cheung et al. “A Fast Distributed Algorithm for Mining Association Rules”, Parallel and Distributed Information System, 1996 pp. 31-42.*
Bayardo R “Efficiently mining long patterns from databases” ACM SIGMOD Record, Jun. 1998, pp. 85-93.

Affiliated with

Agarwal Ramesh C.

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Aggarwal Charu C.

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Prasad V. V. V.

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Also associated with

Breene John

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

International Business Machines - Corporation

Corporate Assignee

[ 0.00 ] – not rated yet Voters 0 Comments 0

Pham Khanh

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

Scully Scott Murphy & Presser

Law Firm

[ 0.00 ] – not rated yet Voters 0 Comments 0

Zarick, Esq. Gail H.

Attorney

[ 0.00 ] – not rated yet Voters 0 Comments 0

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Depth first method for generating itemsets does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Depth first method for generating itemsets, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Depth first method for generating itemsets will most certainly appreciate the feedback.

Rate now

Comments { 0 }

Profile ID: LFUS-PAI-O-2899209

All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.

Canada

Charities
Companies
MP Candidates
Patents
Employee Salary Disclosure

World

Places of the World
Scientific Papers

United States

Banks
Companies
Counties
Patents
Employee Salary Disclosure