Method and apparatus for discovering association rules

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C706S047000

Reexamination Certificate

active

06385608

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a data processing system, especially to a hash-tree operation performed in mining data for discovering unknown rules in databases and in calculating algorithm for the data mining.
2. Description of the Related Art
It is known well that there are association rules in knowledge obtained by data mining for a large database. The association rules usually relate to sets of items often appearing in the same record. The rules are typically utilized for marketing strategies in retail sales. For example, an association rule is discovered by analyzing customer purchase records or accumulated sales receipt data in order to find a purchase tendency. Based on the purchase tendency, things or services bought at the same time can be known, which helps to develop sales and implement marketing strategies. The “record” in this specification means a list of items bought by a customer.
As a method of discovering an association rule in database relating to itemsets in a record, an algorithm called “Apriori” by R. Agrawal et al. is disclosed in the article, “Fast Algorithms for Mining Association Rules” (Proc. of 20th VLDB, 1994) and in Unexamined Japanese Patent Publication 8-287106 (Priority is claimed in U.S. Priority Number; 415006, Priority Date: Mar. 31, 1995).
In “Apriori”, two indexes “support” and “confidence” are used as a criterion for discovering an association rule. However, we herein use another criterion “frequency” instead of the “support” to explain “Apriori” in view of our present invention.
An example of association rule will be described. When there are two itemsets [A, B, . . . , X] and [Y], the association rule between the itemsets is expressed by the form
A, B, . . . , X→Y
where the number of records including all of A, B, . . . , X, Y is called a frequency of the rule and
a ratio of records including A, B, . . . , X, Y to the records including A, B, . . . , X is called a confidence.
In the “Apriori”, an association rule having a frequency and a confidence respectively greater than a predefined lowest value (minimum frequency and minimum confidence) is selected.
By implementing an association rule discovering system shown in
FIGS. 30 and 31
, the “Apriori” can be realized. Now, the procedure of this method is explained with reference to
FIG. 31
, using database of
FIG. 33
to simply explain the association rule discovering. Each record in the database of
FIG. 33
has a record ID and includes items expressed by integers equal to or more than 1.
An itemset having k items is called a k-itemset. (k is an integer equal to or more than 2.) A set of k-itemsets whose frequencies are equal to or more than a minimum frequency is called a large-itemset Lk having length k. A set of k-itemsets potentially to be elements of the large-itemset Lk is called a candidate-itemset Ck having length k. Namely, k-itemsets, whose frequencies are equal to or more than the minimum frequency, in Ck are selected to be elements of Lk.
At a user input step
100
in
FIG. 31
, a minimum frequency and a minimum confidence are obtained from the user through a user input unit
10
. At an L
1
generating step
110
, a candidate itemset verifying unit
21
selects a record from a database
1
one by one, counts appearing times of each item in the record, and increases a counting number (frequency) of the item. If a new item appears, a counting area for the new item is newly provided. After all the items in all the records have been counted, only items each of which has a total counted frequency more than the minimum frequency are registered in a hash-tree.
FIG. 34
shows the case that each frequency of five items, 1, 2, 3, 4, and 5 is more than the minimum frequency and the five items have been registered in the hash-tree. Both ends of each branch of the hash-tree are called nodes and generally item numbers are correspondingly assigned to the nodes. At the beginning end of the hash-tree, no item number is assigned to the node, which is called a root. The number of branches, from the root to the last node, is called a branch length. Therefore, each branch length in
FIG. 34
is 1.
At a Ck generating step
120
, a candidate-itemset Ck is generated from a large-itemset Lk−1 having length k−1 by a candidate-itemset generating unit
22
. In the initial state shown in
FIG. 34
, k equals 2 and C
2
is generated from L
1
.
Now, the case of C
3
being generated from L
2
will be explained.
FIG. 35
illustrates a hash-tree made up to the state of L
2
. The detail of the Ck generating step of
FIG. 31
is shown in a block diagram of
FIG. 32
having two-step procedures: a join step
121
and a prune step
122
.
Referring to
FIGS. 35 and 36A
, the join step
121
is described with reference to a node (called an original node here) at a branch end of k−1 long. New branches are extended in order to make child nodes for the original node. The child nodes should have item numbers larger than the item number assigned to the original node out of item numbers assigned to other nodes having the same parent node as the original node. For instance, the node shown as root→1→3 in
FIG. 35
is expressed by [1, 3]. (This expression is used hereinafter for representing a node in the hash-tree). With respect to the node [1,3], nodes [1,4] and [1,5], which have the same parent node as the node [1,3] and whose item numbers 4 and 5 are larger than 3, are joined to the node [1,3] to be [1,3,4] and [1,3,5]. With respect to the node [1,4], the node [1,5] is joined to be [1,4,5]. No new branch is extended for the node [1,5] because there is no larger item number than 5 for [1]. The above procedure is illustrated in
FIG. 36A
as a state before pruning.
The prune step
122
will now be explained. A branch indicating an itemset, whose length is extended to k, has been made at the join step
121
. Then, every (k−1)-itemset, made by deleting one item from the k-itemset, is checked whether the (k−1)-itemset is included in Lk−1 or not. Only when all the (k−1)-itemsets are included in Lk−1, the k-itemset is left to be utilized. If there is at least one (k−1)-itemset which is not included in Lk−1, the k-itemset is deleted.
For instance, in the case of checking [1,3,4], three
2
-itemsets [1,3], [1,4], and [3,4] are checked to be in L
2
or not. Referring to
FIG. 36A
, as all of these three itemsets are included in L
2
, [1,3,4] is left. In the case of checking [1,3,5], three
2
-itemsets [1,3], [1,4], and [3,4] and [3,5] are checked to be in L
2
or not. Since [3,5] does not exist in L
2
, [1,3,5] is deleted.
All the k-itemsets made in the join step
121
are checked at the prune step
122
. A hash-tree after the pruning is shown in FIG.
36
B.
After the Ck generating step
120
, an Lk generating step
130
is performed by the candidate-itemset verifying unit
21
. In the Lk generating step
130
, a record is selected from the database one by one to count the number of k-itemsets in Ck. Then, only k-itemsets whose frequencies are more than the minimum frequency are left to be elements of Lk. Matching is performed between the record and the hash-tree in counting the number of k-itemsets.
Now, the matching will be explained. First, a record is selected from the database one by one, and checking is performed for the selected record along the hash-tree from the root. Then, it is checked whether an item corresponding to a child node of the root exists in the record or not. If such an item does not exist in the record, the matching for the record is finished to check the next record. When such an item exists in the record, it is checked whether an item corresponding to a next child node (grandchild node of the root) exists in t

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for discovering association rules does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for discovering association rules, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for discovering association rules will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2835452

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.