Association rule generation and group-by processing system

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06226634

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to processing of a large amount of data normally stored in a database. Specifically, it relates first to a group-by process, and second to a database mining process.
The present invention relates to a process of classifying a large amount of records depending on a key value of each record and performing a specified operation on a group of records having the same key value, such as, for example, obtaining an average value. Such a process is referred to as a group-by process. The present invention relates more specifically to a group-by processing system based on a hash process, that is, a system of hashing a large volume of records according to a hash function value obtained by applying an appropriate hash function to a key value, generating a list of the hashed records, sorting the hashed records in the list according to the key value, and performing a group-by process on the sorted records of the resultant list.
The present invention also relates to a database mining process for obtaining a rule describing the relationship among data stored in a database, and more specifically to a system of counting occurrences of a combination of related data in a large volume of data in the database. Based on the count result of the system, an association rule is generated, as one of the data mining methods, from each combination that meets a given condition and the number of occurrences of that combination. Association analysis based on such association rules has received close attention in the U.S. and many other countries.
2. Description of the Related Art
Normally, an operation in the group-by process, that is, an operation on a group of records having the same key value, for example, the same item number, can be a count process for counting records such as item sets, an operation for computing a total or average of specific field values of the group of records, etc. These group-by processes are frequently performed in a relational database process, a statistical process, etc.
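As a minimal sketch of the operations named above, the following Python fragment computes a count, a total, and an average for each key value. The record layout (item number, price), the sample values, and the function name group_by_stats are hypothetical and serve only as illustration; the in-memory dictionary is used for brevity and is not one of the systems described in this section.

```python
from collections import defaultdict

# Hypothetical records: (item number, price). Neither the field names nor
# the values come from the source; they only illustrate the operations.
records = [(1, 10.0), (2, 4.0), (1, 30.0), (3, 7.0), (2, 6.0)]

def group_by_stats(records):
    """Return {key: (count, total, average)} for (key, value) records."""
    acc = defaultdict(lambda: [0, 0.0])  # key -> [count, running total]
    for key, value in records:
        acc[key][0] += 1
        acc[key][1] += value
    return {k: (n, total, total / n) for k, (n, total) in acc.items()}

stats = group_by_stats(records)
```

Each value of stats holds the three results for one group, so stats[1] describes all records whose item number is 1.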
A group-by processing system can be based on a sort process or a hash process. A system based on the sort process sorts the group of records according to the key values so that records having the same key value can be accessed consecutively. That is, the group of records is first sorted according to the key values, and the resulting list of sorted records is scanned from the beginning. A specified operation is repeated for the records having the same key value. When the key value changes, the operation is re-initialized.
In a hash process, a group of records is read into an input buffer, and a hash function value is calculated for each record by applying a hash function to the key value of the record. Then, each of the records in the input buffer is stored in one of the record buffers according to the hash function value of the record. When the record buffer corresponding to one of the hash function values is filled with records, the records in that record buffer are output to the corresponding hashed list in a secondary storage, one hashed list being provided for each record buffer.
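The buffering scheme described above might be sketched as follows. This is an illustration under stated assumptions, not the implementation of any particular system: the function name hash_partition, the buffer count, the buffer capacity, and the modulo hash are hypothetical, keys are assumed to be non-negative integers, plain Python lists stand in for the hashed lists in secondary storage, and the final flush of partly filled buffers is an assumption the text does not spell out.

```python
def hash_partition(records, num_buffers=3, buffer_capacity=2, key=lambda r: r):
    """Distribute records into per-hash lists, flushing each full buffer.

    All parameters are illustrative assumptions. The returned lists stand in
    for the hashed lists that would live in secondary storage.
    """
    buffers = [[] for _ in range(num_buffers)]       # in-memory record buffers
    hashed_lists = [[] for _ in range(num_buffers)]  # stand-in for secondary storage
    for record in records:                           # records from the input buffer
        h = key(record) % num_buffers                # hash function value of the key
        buffers[h].append(record)
        if len(buffers[h]) == buffer_capacity:       # buffer full: output its records
            hashed_lists[h].extend(buffers[h])
            buffers[h].clear()
    # Assumption: flush partly filled buffers once the input is exhausted.
    for h, buf in enumerate(buffers):
        hashed_lists[h].extend(buf)
    return hashed_lists

lists = hash_partition([1, 4, 2, 3, 5, 1])
```

Records with equal keys always reach the same hashed list, which is what allows each list to be processed independently afterwards.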
FIG. 164 is a flowchart showing an example of a conventional group-by processing system based on the sort process. When the process starts as shown in FIG. 164, a group of records to be processed in a group-by process is sorted according to key values in step S201. In step S202, the leading record in the group of records is read. In step S203, a function is initialized. In step S204, an operation of the function is performed on the read record. In step S205, it is determined whether or not any record still exists in the group of the sorted records.
When a record exists, the next record in the group of the sorted records is read in step S206. In step S207, it is determined whether or not the key value of the record is equal to the key value of the previously read record. When the key values are equal to each other, the processes in and after step S204 are repeated.
When the key value of a record is not equal to that of the previous record, a termination process is performed on the function in step S208. In step S209, the result of the function process and the previous record are output as a resultant record, and the processes in and after step S203 are repeated.
If it is determined in step S205 that all of the sorted records have been read, the termination process is performed on the function in step S210, the result of the function process and the last record read are output as a resultant record in step S211, and the process terminates.
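The flow of FIG. 164 can be sketched in Python as below, with comments mapping each statement to the step numbers given in the text. The function name sort_group_by and its init, step, and finalize parameters are hypothetical names introduced for this sketch, the handling of an empty input is an assumption, and the sample input is made up.

```python
def sort_group_by(records, init, step, finalize, key=lambda r: r):
    """Sort-based group-by; comments refer to steps S201 to S211 of FIG. 164."""
    ordered = sorted(records, key=key)             # S201: sort by key value
    results = []
    if not ordered:                                # assumption: empty input, no output
        return results
    previous = ordered[0]                          # S202: read the leading record
    state = step(init(), previous)                 # S203 initialize, S204 operate
    for current in ordered[1:]:                    # S205 any record left? S206 read it
        if key(current) != key(previous):          # S207: key value changed?
            results.append((key(previous), finalize(state)))  # S208, S209
            state = init()                         # back to S203
        state = step(state, current)               # S204
        previous = current
    results.append((key(previous), finalize(state)))          # S210, S211
    return results

# Count process on a made-up input: the function state is simply a counter.
counts = sort_group_by([3, 1, 2, 3, 1],
                       init=lambda: 0,
                       step=lambda n, _record: n + 1,
                       finalize=lambda n: n)
```

Passing the operation as three callables keeps the scan itself unchanged when the count is replaced by, for example, a total or an average.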
FIGS. 165A through 165F are practical examples showing how the group-by process proceeds according to the flowchart shown in FIG. 164. As shown in FIG. 165A, a group of records to be processed in a group-by process contains 10 records, and, for simple representation, each record comprises only a key value, for example, an item number. When the process in step S201 is completed, the state shown in FIG. 165B is realized.
In this example, the operation in the group-by process is a count process for obtaining the number of records having the same key value. When the processes in steps S202 and S203 terminate, the state shown in FIG. 165C is entered. That is, in the initialization of the function shown in step S203, the count value is set to 0.
In step S204, the count value is incremented by 1 to enter the state shown in FIG. 165D. The determination in step S205 is YES, and in step S206 the next record ‘1’ is read and becomes the current record, while the former current record ‘1’ becomes the previous record. Because the two key values are equal, the determination in step S207 is YES, the count value is incremented in step S204, and the state shown in FIG. 165E is entered.
The determination in step S205 is YES again, and a new record is read in step S206. The key value of the record is 2, which differs from the key value of the previous record, which is 1. Therefore, the termination process is performed on the function in step S208. When a counting operation is performed, the termination process only fixes the current count value; the fixed value is the result of the function process and is appended to the previous record, that is, ‘1’ in this example, to be output as a resultant record in step S209. That is, the output resultant record is ‘1, 2’. FIG. 165F shows the result.
By repeatedly performing the processes, the following group of records can be finally obtained as a process result.
1, 2
2, 2
3, 3
4, 1
5, 2
The result indicates that the group of records to be processed in the group-by process contains two records having the key value of 1, two records having the key value of 2, three records having the key value of 3, one record having the key value of 4, and two records having the key value of 5.
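The result above can be reproduced with a short Python check. The input ordering below is an assumption: FIG. 165A is not reproduced in this text, so only the per-key totals are taken from the listed result, and the function name count_by_key is hypothetical.

```python
from itertools import groupby

# Assumed input: this ordering is hypothetical; only the number of records
# per key value (2, 2, 3, 1, 2) matches the result listed in the text.
records = [3, 1, 5, 2, 3, 4, 1, 2, 5, 3]

def count_by_key(records):
    """Sort the key values, then count each run of equal keys."""
    return [(k, sum(1 for _ in run)) for k, run in groupby(sorted(records))]

result = count_by_key(records)
for key_value, count in result:
    print(f"{key_value}, {count}")
```

Here itertools.groupby plays the role of the key-change test in step S207: it only groups adjacent equal keys, which is why the records are sorted first.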
FIG. 166 shows an example of a conventional group-by processing system based on the hash process. In contrast to the flowchart based on the sort process shown in FIG. 164, each record in the group of records to be processed in the group-by process is first stored in one of the record buffers according to its hash function value in step S221. Then, the records in each record buffer are sorted according to key values in step S222, all sorted contents of the record buffers are connected to form a string, and processes almost the same as those in and after step S202 shown in FIG. 164 are performed in steps S223 through S232.
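A rough Python sketch of the hash-based flow of FIG. 166, specialized to the count process, is given below. The function name hash_group_by_count and the sample input are hypothetical, integer keys are assumed, and the modulo hash with three buffers is chosen only because the example of FIGS. 167A through 167F mentions mod 3.

```python
def hash_group_by_count(records, num_buffers=3):
    """Hash-based group-by count; comments refer to the steps of FIG. 166."""
    # S221: store each record in the record buffer selected by its hash value.
    buffers = [[] for _ in range(num_buffers)]
    for key in records:
        buffers[key % num_buffers].append(key)
    # S222: sort each buffer by key value, then connect the buffers into one string.
    string = [key for buf in buffers for key in sorted(buf)]
    # S223 onward: the same sequential scan as in the sort-based flow.
    results = []
    count = 0
    previous = None
    for key in string:
        if count and key != previous:      # key value changed: output the group
            results.append((previous, count))
            count = 0
        count += 1
        previous = key
    if count:                              # output the final group
        results.append((previous, count))
    return results

# Hypothetical input of 10 key-only records.
result = hash_group_by_count([3, 1, 5, 2, 3, 4, 1, 2, 5, 3])
```

Because equal keys always hash to the same buffer, each group still forms one contiguous run in the connected string, although the groups appear in buffer order rather than in overall key order.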
FIGS. 167A through 167F show a practical example of how the process proceeds according to the flowchart shown in FIG. 166. The group of records to be processed in the group-by process is the same as that shown in FIGS. 165A through 165F. When the process is based on the hash process, the group of records to be processed in the group-by process is hashed using an appropriate hash function. In this example, mod 3 is
