Interactive mining of most interesting rules

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000, C707S793000, C707S793000, C706S001000

Reexamination Certificate

active

06651049

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to mining database association rules, and more particularly to an interactive process of mining the most important database association rules.
2. Description of the Related Art
A data-set is a finite set of records. In the present invention, a record is simply an element on which boolean predicates (e.g., conditions) are applied. A rule consists of two conditions (e.g., antecedent and consequent) and is denoted as A→C where A is the antecedent and C is the consequent. A rule constraint is a boolean predicate on a rule. Given a set of constraints N, it is said that a rule r satisfies the constraints in N if every constraint in N evaluates to true given r. Some common examples of constraints are item constraints and minimums on support and confidence (e.g., see Srikant et al., 1997, Mining Association Rules With Item Constraints, In
Proceedings of the Third International Conference On Knowledge Discovery in Databases and Data Mining,
67-73; Agrawal et al. 1993, Mining Associations between Sets of Items in Massive Databases. In
Proceedings of the
1993
ACM
-
SIGMOD International Conference on Management of Data,
207-216, all incorporated herein by reference). The input to the problem of mining optimized rules is a 5-tuple <U, D, ≦, C, N>, where U is a finite set of conditions, D is a data-set, ≦ is a total order on rules, C is a condition specifying the rule consequent, and N is a set of constraints on rules.
For example, as discussed in Agrawal el al. 1993, Supra, using a supermarket with a large collection of items as an example, typical business decisions that the management of the supermarket has to make include what to put on sale, how to design coupons, how to place merchandise on shelves in order to maximize the profit, etc. Analysis of past transaction data is a commonly used approach in order to improve the quality of such decisions. Until recently, however, only global data about the cumulative sales during some time period (a day, a week, a month, etc.) was available on the computer. Progress in bar-code technology has made it possible to store the so called basket data that stores items purchased on a per-transaction basis. Basket data type transactions do not necessarily consist of items bought together at the same point of time. It may consist of items bought by a customer over a period of time. Examples include monthly purchases by members of a book club or a music club.
Several organizations have collected massive amounts of such data. These data sets are usually stored on tertiary storage and are very slowly migrating to database systems. One of the main reasons for the limited success of database systems in this area is that conventional database systems do not provide the necessary functionality for a user interested in taking advantage of this information.
Thus, the large collection of basket data type transactions are mined for association rules between sets of items with some minimum specified confidence. An example of such an association rule is the statement that 90% of transactions that purchase bread and butter also purchase milk. The antecedent of this rule consists of bread and butter and the consequent consists of milk alone. The number 90% is the confidence factor of the rule.
Rule mining enhances databases with functionalities to process queries such as finding all rules that have a specific product as consequent (these rules may help plan what the store should do to boost the sale of the specific product), or may help shelf planning by determining if the sale of items on shelf A is related to the sale of items on shelf B), and finding the best rule that has “bagels” in the consequent (“best” can be formulated in terms of the confidence factors of the rules, or in terms of their support, i.e., the fraction of transactions satisfying the rule).
For, example, let I=I
1
, I
2
, . . . ,I
m
be a set of binary attributes, called items. Let T be a database of transactions. Each transaction t is represented as a binary vector, with t[k]=1 if t bought the item I
k
, and t[k]=0 otherwise. There is one tuple in the database for each transaction. Let X be a set of some items in I. We say that a transaction t satisfies X if for all items I
k
in X, t[k]=1.
An association rule has an implication of the form X→I
j
, where X is a set of some items in I, and T
j
is a single item in I that is not present in X. The rule X→I
j
is satisfied in the set of transactions T with the confidence factor 0≦c≦I if at least c % of transactions in T that satisfy X also satisfy I
j
. The notation X→y I
j
|c is used to specify that the rule X=I
j
has a confidence factor of c.
Given the set of transactions T, the conventional process is interested in generating all rules that satisfy certain additional constraints of two different forms. The first form is syntactic constraints. These constraints involve restrictions oil items that can appear in a rule. For example, the only interesting rules may be the rules that have a specific item I
x
appearing in the consequent. Combinations of the above constraints are also possible, and all rules that have items from some predefined itemset X appearing in the consequent, and items from some other itemset Y appearing in the antecedent may be requested.
The second form is support constraints. These constraints concern the number of transactions in T that support a rule. The support for a rule is defined to be the fraction of transactions in T that satisfy the union of items in the consequent and antecedent of the rule.
Support should not be confused with confidence. The confidence of a rule is the probability with which the consequent evaluates to true given that the antecedent evaluates to true in the input data-set, which is computed as follows:
conf

(
A

C
)
=
sup

(
A

C
)
sup

(
A
)
(
1
)
While confidence is a measure of the rule's strength, support corresponds to statistical significance. Besides statistical significance, another motivation for support constraints comes from the fact that usually the only interesting rules are the ones with support above some minimum threshold for business reasons. If the support is not large enough, it means that the rule is not worth consideration or that it is simply less preferred (may be considered later).
When mining an optimal disjunction, a set of conditions A

U is treated as a condition itself that evaluates to true if and only if one or more of the conditions within A evaluates to true on the given record. For both cases, if A is empty then the set of conditions A

U always evaluates to true. Algorithms for mining optimal conjunctions and disjunctions differ significantly in their details, but the problem can be formally stated in an identical manner. More specifically, the manner in which the optimized rule mining may be formally stated is finding a set A
1

U such that A
1
satisfies the input constraints, and there exists no set A
2

U such that A
2
satisfies the input constraints and A
1
<A
2
. Algorithms for mining optimal disjunctions typically allow a single fixed conjunctive condition without complications (see Rastogi et al. 1998, Mining Optimized Association Rules with Categorical and Numeric Attributes, In
Proceedings of the
14
th International Conference on Data Engineering,
503-512, incorporated herein by reference).
Any rule A→C whose antecedent is a solution to an instance I of the optimized rule mining problem is said to be I-optimal (or just optimal if the instance is clear from the context). For simplicity, a rules antecedents (denoted with A and possibly some subscript) and rules (denoted with and possibly some subscript) are sometimes treated interchangeably since the consequent is always fixed and clear from the context.
The support and confidence values of rules are often used to define rule

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Interactive mining of most interesting rules does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Interactive mining of most interesting rules, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Interactive mining of most interesting rules will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3182942

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.