Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-01-20
2002-07-02
Corrielus, Jean (Department: 2772)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000, C707S793000, C707S793000
Reexamination Certificate
active
06415287
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to data mining and, more particularly, to mining association rules with numerical weighted attribute values.
2. Background Description
Data mining is the extraction of implicit knowledge and discovery of interesting characteristics and patterns that are not explicitly presented in the data. These techniques can play an important role in presenting data in a concise manner and accommodating data semantics. During recent years, one active topic in this field is association rule discovery which is introduced in R. Agrawal, T. Imielinski and A. Swami, “Mining association rules between set of items in large databases”,
Proceedings of the
1993
ACM SIGMOD Conference on Management of Data,
1993. In general, the problem can be modeled as follows: let I be a set of items and T be a set of transactions, each of which consists of a subset of items in I. An association rule is an implication in the form of “X→Y”, where X and Y are sets of disjoint item sets. “X→Y” holds in T with support s and confidence c if s % transactions contain all items in the set “X∪Y” and c % transactions which contain all items in X also contain all items in Y. The goal of association rule mining process is to find all these rules with respect to some minimum s and c. An item (or itemset) is called a “frequent” item (or itemset) if its support is at least s.
These conventional association rules have been widely used in many application domains. In the market basket problem, each piece of merchandise in a supermarket can be viewed as an item, and the set of items that a customer purchases at one time period can be viewed as a transaction. The association rules represent the likelihood of items being purchased together by a customer. Thus, this knowledge can be used to guide sale promotions. For example, if there is a high confidence between soda and snacks, then the supermarket may reduce the price of soda to increase the sale of snacks.
However, the traditional association rules focus on binary attributes. In other words, this approach only considers whether an item is present in a transaction, but does not take into account the weight/intensity of an item within a transaction. For example, a customer may purchase ten bottles of soda and five bags of snacks and another may purchase four bottles of soda and one bag of snacks at a time. However, these two transactions will be the same in the conventional association rule approach. This could lead to loss of some vital information; i.e., intensity or weight of each item in a transaction. For example, if a customer buys more than seven bottles of soda, he or she is likely to purchase three or more bags of snacks; otherwise, the purchase tendency of soda is not strong. The traditional association rule can not express this type of relationship. With this knowledge, the supermarket manager may set a promotion such as “if a customer buys six bottles of soda, he or she can get two free bags of snacks.”
A new variation of association rules has been proposed in R. Srikant, and R. Agrawal, “Mining quantitative association rules in large relational tables”,
Proceedings of the
1996
ACM SIGMOD Conference on Management of Data Montreal
, Canada, June, 1996. The domain of each quantitative attribute is divided into a set of intervals (i.e., buckets, ranges) via equi-depth partitioning; i.e., each interval is with a similar number of attribute values. Intervals are combined as long as their support is less than the user specified max-support. Each of original intervals and the combined intervals is treated as a different item. Thus, the attribute value of the newly “generated” items is binary: “1” stands for that the original attribute value is within the specified interval; “0” stands for that the original attribute value is not within the specified interval. The algorithms for mining traditional association rules can then be applied on these newly generated binary items.
During the mapping process, one numerical attribute will be mapped into O(k
2
) binary attributes where k is the number of intervals for one attribute domain. When the number of numerical attributes is large, the number of mapped binary attributes could be unmanageable.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide an efficient method for mining a specific type association rule, the weighted association rule.
According to the invention, The traditional association rule problem is extended by allowing a weight to be associated with each item in a transaction to reflect interest/intensity of each item within the transaction. In turn, this provides an opportunity to associate a weight parameter with each item in a resulting association rule, called a weighted association rule (WAR). For example, “soda [4, 6]→snack [3, 5]” is a weighted association rule indicating that if a customer purchases soda in the quantity between four and six bottles, he or she is likely to purchase three to five bags of snacks. Thus, WARs can not only improve the confidence of the rules, but also provide a mechanism to do more effective target marketing by identifying or segmenting customers based on their potential degree of loyalty or volume of purchases.
There can be a very large number of items in any given problem, and every item has a numerical attribute, although only a small fraction of items are present in a transaction. Thus, according to the preferred method of the present invention, the frequent itemsets are first generated (without considering weights). This method differs from the prior art. The weight range of each item in each frequent itemset is divided into a sequence of intervals, referred to as base intervals”. Then the weight domain space of each frequent itemset is partitioned into fine grids. A grid is a multi-dimensional cube whose span on each dimension is exactly one base interval. The density of a grid is defined as the ratio of the actual number of transactions that belong to it over the expected number of transaction in it, assuming random uniform distribution. A density threshold is used to separate the transaction concentrating regions from the rest. A grid is called a “dense grid” if its density is above the pre-specified threshold. Otherwise, it is called a “sparse grid”. A “dense region” is the union of a set of adjacent dense grids. WARs can be identified based on these “dense” regions.
In reality, the number of valid weighted association rules (WARs) could be very large, and a user may not be interested in all WARs, but rather a small subset of them. For example, if the inventory of soda becomes too large in a supermarket, the manager may be only interested in the WARs involving soda. It would be desirable if the user can interact with the mining system to obtain a set of interesting WARs by varying the parameter values. Inspired by this, the present method supports both batch mode and interactive mode. In the batch mode, all qualified WARs are returned. On the other hand, in the interactive mode, a user not only can choose a set of interesting frequent itemsets to further investigate, but also can vary the support, interest, and density thresholds, so that only those qualified WARs for the interesting itemsets, are returned.
The present invention can be summarized as follows.
1) A new class of association rule problem, WAR, is proposed.
2) Due to the nature of this problem, the mining process is accomplished by a threefold approach:
a) First generating frequent itemsets and then deriving WARs from each frequent itemset.
b) During the frequent itemset generation process, the associated weight is ignored. An itemset is a frequent itemset if this itemset appears in at least a certain percentage of the transactions.
c) During the WAR derivation process, the weight domain space is partitioned in such a manner that the total number of grids is limited by some parameter N so that the available memory/storage space can be fully utili
Wang Wei
Yang Jiong
Yu Philip Shi-Lung
Corrielus Jean
McGuireWoods LLP
Zarick, Esq. Gail H.
LandOfFree
Method and system for mining weighted association rule does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and system for mining weighted association rule, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and system for mining weighted association rule will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2858378