Database operation processor

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000

Reexamination Certificate

active

06760718

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a (relational) database system. Firstly, this invention relates to self-join operation processing for extracting a record satisfying a condition specified in a query from a plurality of records in a same table. Secondly, this invention relates to data mining for finding a relation among data stored in a database by the self-join operation.
2. Description of the Related Art
As a processing method of a join operation in the relational database system, “Nest Loop Processing Method,” “Sort Merge Processing Method,” “Hash Processing Method,” etc. are known.
In the “Nest Loop Processing Method,” while each part of records in a table is stored in a buffer in a memory, all records in another table are read. Then, an output record is produced by combining records which satisfy a join condition.
Since one of the tables is read repeatedly, the “Nest Loop Processing Method” is not efficient. If it is possible to read only records satisfying the join condition by using an index, efficiency of the “Nest Loop Processing Method” can be improved.
The “Nest Loop Processing Method” can be applied to any join condition. However, most of the join operations are equi-join operations with a join condition of equal key values. For the equi-join operations, the processing can become efficient by restricting a retrieval range of the join records by using the “Sort Merge Processing method” and “Hash Processing Method.”
In the “Sort Merge Processing Method,” record groups in both of the tables are sorted by a join key at first. Accordingly, it becomes possible to access records with an equal key value continuously. Further, a changing point of the key value is found and identified as a retrieval range boundary.
In the “Hash Processing Method,” the record groups in both of the tables are classified into a plurality of groups according to a value obtained by applying a hash function to the join key, and the processing are divided into join operations between corresponding groups. There is a possibility that there are records with various join key values in each of the groups, however the possibility can be reduced by increasing a number of the groups. Operations of the hash function is simpler than sorting. Therefore, the “Hash Processing Method” has an advantage when the performance of the CPU (Central Processing Unit) is limited.
In the “Sort Merge Processing Method” and “Hash Processing Method,” an input table is preprocessed, and a processing result is written in a storing unit as intermediate data. Therefore, in self-join processing where two input tables are same and join keys are same, two intermediate data with a same content are produced physically.
Accordingly, a load of processing in the self-join operation in a same table is as heavy as a load of processing in the join operation between different tables.
An association rule mining (basket analysis) is a kind of data mining, i.e., analyzing mass data statistically and finding useful rules and knowledge. The association rule mining is utilized for increasing sales through a merchandise display in a store, set sales, etc. by finding a trend in combinations of purchased items from a consumer purchase behavior history.
The association rule mining includes a phase of extracting all combinations of items (frequent itemset), which appear a determined number of times or more, from purchase history data stored in the database system and a phase of finding an association rule by considering inclusion relations among the extracted combinations. Particularly, as known, a load of processing in the former phase is heavy.
As a processing method in the phase of extracting, Apriori Algorithms proposed by R. Agrawal, et. al. are well known.
The processing method described in “Fast Algorithms for Mining Association Rules” in proceedings of the 20
th
VLDB (Very Large Data Bases) Conference, pages 487-499, 1994 is as follows.
An appearance number of each purchased item is counted in the purchase history data, and items of which appearance number reaches a certain value are extracted as frequent items.
Two different frequent items are combined as a candidate 2-itemset (“2” in the “candidate 2-itemset” shows a number of combined items).
An appearance number of the candidate 2-itemset in the purchase history data is counted, and the candidate 2-itemset of which appearance number reaches a certain value is extracted as a frequent 2-itemset.
If k>=3, following steps are repeated.
Two itemsets including one different item are chosen from frequent (k−1)-itemsets, and an itemset of k items including each item from the both sets, i.e., candidate k-itemset, is generated.
However, an itemset including a combination of items, which is not in the frequent (k−1)-itemset, is excluded from the candidate k-itemset.
When the candidate k-itemset becomes empty, the processing is ended.
An appearance number of the candidate k-itemset in the purchase history data is counted, and a set of which appearance number reaches a certain value is extracted as a frequent k-itemset.
For counting the appearance number of the combination of the items, only same time purchase should be counted. Since a number of items purchased at a same time is variable, it is not appropriate to represent all the items purchased at the same time by a record in the purchase history data. Therefore, as shown in (
1
) of
FIG. 29
, it is general that the record is configured for each of the items, and a combination of items purchased at a same time is represented by the records with a same transaction ID (Identification Data).
Therefore, for finding a combination of k items, it is necessary to produce a record including k items with a same transaction ID by joining the purchase history data with itself k−1 times (self-join).
However, in an existing database system, a load of the self-join operation is heavy, and the performance is not enough. Therefore, in most of association rule mining systems such as a data mining processing method described in Japanese Unexamined Published Patent Application Hei 11-3342 (published on Jan. 6, 1999) “A Group-By Processing Method,” the purchase history data are extracted from the database system in advance, converted to a file in a unique form including items in a variable number as shown in (
2
) of
FIG. 29
, and processing is performed by using a special software.
SUMMARY OF THE INVENTION
In a relational database system according to the related art, since a join operation within a same table is not considered, only a processing method for different tables can be used. Therefore, it is impossible to offer a practically sufficient performance for a processing such as the association rule mining which includes many self-join operations.
However, the mining system according to the related art such as Japanese Unexamined Published Patent Application Hei 11-3342 for extracting the frequent itemset outside the database system has problems as follows.
An overhead exists in extracting the mass data.
Since data are copied, extra disk area and managing operations are necessary.
It is one of objects of this invention to solve the above-stated problems in the related art. Particularly, some of the aims of this invention are as follows.
No overhead exists in extracting the mass data.
Since data are not copied, extra disk area and managing operations are not necessary.
By improving the performance of the database system by parallelization, etc., the performance can be improved without changing the mining system.
For achieving the above aims, a content of the query is analyzed, and it is also judged if the database operation processor uses a self-join operation. If the self-join operation is used, a processing method, which is effective and efficient only for the self-join operation, is used instead of an ordinary processing method of the join operation. Accordingly, the query including the self-join operation can be processed efficiently.
According to an aspect of this invention, a databas

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Database operation processor does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Database operation processor, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Database operation processor will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3221561

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.