Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-12-21
2004-12-14
Mizrahi, Diane D. (Department: 2175)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
Reexamination Certificate
active
06832216
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a method and a system for data analysis of a database and a data warehouse. In particular, the invention relates to data mining, i.e. analysis of any hidden association of data in database.
In data mining for mining useful information buried in an enormous amount of data by analyzing the data, an association rule is discovered, which indicates association of data. For instance, by mining an association rule in the analysis of receipt data, i.e. a record of purchase of customer, in retail business, it is possible to know the purchase pattern of each customer. When a rule is mined, which means: “a customer who purchased commodity A and commodity B at the same time is very likely to buy commodity C and commodity D at the same time”, it is found that commodities A and B are related to the sale of commodities C and D, which is useful in determining the sales policy, e.g. in display and arrangement of commodities or selection of special price commodity.
According to the conventional type association rule, number of appearances of a specific item is counted in a group of data, which comprises a plurality of items, and at least one rule of association between the items is obtained. Description is given below on examples. When a database comprising 5 items of A
1
, A
2
, A
3
, A
4
and A
5
is given as shown in
FIG. 2
, an association rule between the items as shown in the equation 5 is obtained:
(
A
1=yes)∩(
A
3=yes)
(
A
5=yes) (5)
This is an association rule, which means: “In the data where items A
1
and A
3
appear, item A
5
is also likely to appear very often.” This is called an “association rule”. The association rule is expressed as:
(Assumption)
Conclusion) (6)
Each of the assumption and the conclusion comprises at least one items respectively. In case of (equation 5), the assumption is: (A
1
=yes)∩(A
3
=yes), and the conclusion is (A
5
=yes).
The association rule has two indices: support and confidence. Support is an amount of data, to which an association rule is applicable, i.e. a ratio of data where a combination of items in the association rule appears. Also, the number of times of appearances of the combination of items is called a “support count”. Confidence is an accuracy index of the association rule, i.e. the ratio, at which the conclusion is satisfied when the assumption of association rule is satisfied.
For instance, in case of the association rule given in the equation 5 mined from the database of
FIG. 2
, it is evident that the total number of records is 5, and the number of records supporting the association rule of the equation 5 is 2. Thus, the support count is 2 and the support is 0.4(40%). With regard to confidence, it is evident that number of records where all items in the association rule of the equation 5 appear is 2 and that number of records where the items of the assumption appear is 3. Thus, confidence is about 0.67(67%).
The studies on mining of the association rule have been performed in the field of data mining. For instance, the studies are described in the following references:
(1) R. Agrawal, T. Imielinski, and A. Swami: “Mining association rules between sets of items in large databases”; Proceedings of the ACM SIGMOD Conference on Management of Data, May 1993. (Reference 1)
(2) R. Agrawal and R. Srikant: “Fast algorithms for mining association rules; Proceedings of the 20th VLDB Conference, 1994. (Reference 2)
According to these methods, from the database comprising a plurality of items, all association rules are mined, which satisfy minimum support and minimum confidence, i.e. minimum values of support and confidence set by the user. These methods are based on the property that support of a product set of items (hereinafter referred as “itemset”) cannot be smaller than the support of an itemset (Y
⊃
X) contained in X, and pruning candidates of an association rule is performed. This method can be achieved according to the block diagram of FIG.
3
. First, a database is retrieved, and the number of appearances of each of the items is counted (support count), and any item which satisfies minimum support is picked up. These are called “frequent items”. Next, 2 items in the frequent items are combined, and an itemset is prepared, for which both support count and support are not known. This itemset is called “candidate itemset”. The database is retrieved again, and support count of each of the candidate itemsets is counted, and an itemset is obtained, which satisfies minimum support. The itemset satisfying the minimum support is called “frequent itemset”. Then, from the frequent itemset containing (k-
1
) items, itemsets are combined, which have the leading (k-
2
) items equal to each other, and a candidate itemset comprising “k” items is prepared. In this case, based on the property of the above-mentioned itemset, pruning of the candidate itemset is performed. Specifically, when a combination of items not present in the frequent itemset comprising (k-
1
) items is contained in the candidate itemset, which comprises “k” items prepared above, the candidate itemset is deleted. Next, the support count of each of the candidate itemsets is counted by retrieving the data from the database. The itemset to satisfy minimum support is picked up, and a frequent itemset is obtained. This process is repeated for each number of items, and this is continued until the newly picked frequent itemset turns to be empty. When the process to obtain the frequent itemset has been completed, all frequent itemsets to satisfy minimum support has been picked up. Next, using the frequent itemset thus obtained, all association rules are derived. From the frequent itemset when the number of items “k” is 2 or more, at least one association rule is derived, which has “n” items in an assumption and (k-n) items in a conclusion. Here, “n” is an integer of less than “k”. For instance, from a frequent itemset ABC having 3 items, 6 association rules can be derived: {A, B}
{C}, {A, C}
{B}, {B, C}
{A}, {A}
{B, C}, {B}
{A,C}, and {C}
{A, B}. As soon as the association rules are derived, the confidence of the association rule is calculated from the supports of the itemset in the association rules and the itemset in the assumptions, and only the association rules are mined, which satisfy minimum confidence set by the user. For instance, in case the support of the itemset {A, B, C} is 0.3 and support of the itemset {A, C} is 0.4, confidence of the association rule {A, C}
{B} is calculated as 0.75 (75%).
The method for efficiently mining the frequent itemset is described, for instance in: J. Han, J. Pei, and Y. Yin: “Mining frequent patterns without candidate generation”; Proceedings of ACM SIGMOD International Conference on Management of Data, 2000(Reference
3
). In the reference
3
, database is retrieved at first, and a frequent item satisfying a minimum support value is searched. Again, the database is retrieved, and a tree structure containing only the frequent items in the records of the database is constructed, and this is stored in the main memory. By analyzing only the tree structure in the main memory, all frequent itemsets containing 2 or more items are mined.
The association rules mined by the methods of the references
1
,
2
and
3
are only the rules, i.e. affirmation. For instance, from the database shown in
FIG. 2
, the association rule {A
1
,A
3
}
{A
5
} is mined by the above method. This means: (A
1
=yes)∩(A
3
=yes)
(A
5
=yes). That is, when A
1
and A
3
are contained in the data, it is a relation where A
5
is also contained. Here, the fact that a certain item A is contained in the data, i.e. a item expressed as (A=yes) is called “affirmative item”. Also, an expression that the item A is not contained in the data, i.e. a i
Nishizawa Itaru
Shintani Takahiko
Ushijima Kazutomo
A. Marquez, Esq. Juan Carlos
Fisher Esq. Stanley P.
Mizrahi Diane D.
Mofiz Apu M
Reed Smith LLP
LandOfFree
Method and system for mining association rules with negative... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and system for mining association rules with negative..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and system for mining association rules with negative... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3310128