Distributed shared memory system and method of controlling...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C709S213000, C711S147000, C711S148000, C711S153000

Reexamination Certificate

active

06219672

ABSTRACT:

BACKGROUND OF THE INVENTION
This application is based on Japanese Patent Application No. 9-341384, filed Dec. 11, 1997, and Japanese Patent Application No. 10-102383, filed Apr. 14, 1998, the contents of which are incorporated herein by reference.
The present invention relates to a distributed shared memory system suitably applied to a multiprocessor system of a shared memory type that executes large-scale data mining, for example, on the TB (terabyte) order, and a method of controlling the distributed shared memory.
With recent advances in bar code techniques and the like, retailers such as supermarkets store a large volume of sales data. Advanced retailers analyze such sales data stored in large volume, and reflect the analysis result in store layouts, thereby increasing sales. Such a technique is generally called data mining.
Of various information obtained by data mining, most typical information is an association rule. For example, an association rule includes the information “50% of the customers who buy packs of paper diaper also buy cans of beer”. This is an example associated with supermarkets in the U.S.A. This association rule indicates that in the U.S.A., young fathers often buy packs of paper diaper, and hence buy cans of beer together. In accordance with this information, therefore, for example, packs of paper diaper and cans of beer are placed near to increase the sales of cans of beer. A method of obtaining such an association rule is disclosed in R. Agrawal et al., “Mining Association Rules between Sets of Items in Large Databases”, Proceedings of ACM SIGMOD, May 1993. This method will be briefly described below.
Let I={i1, i2, . . . , im} be a set (item) of attributes, and D={t1, t2, . . . , tn} be a transaction database. In this case, ti is a set of items. An association rule is defined as X≧Y. In this case, X and Y are subsets of I, and the common set of X and Y is an empty set. Two evaluation values referred to as support and confidence values will be defined. A support value indicates the ratio of X to D, and a confidence value indicates the ratio of transactions, which include both X and Y, to the transactions including X in D. An association rule is extracted by the following procedure.
(1) An item set that satisfies the minimum support value is detected (this item is called a frequent item set).
(2) An association rule that satisfies the minimum confidence value is detected from the frequent item set obtained in (1).
An example of how an association rule is extracted will be described below. Assume that T
1
={1, 3, 4}, T
2
={1, 2, 3, 5}, T
3
={2, 4}, T
4
={1, 2}, and T
5
={1, 3, 5} are set as transactions. An association rule that satisfies a minimum support value of 60% and a minimum confidence value of 60% is detected from these transactions. A frequent item set is {1}, {2}, {3}, and {1, 3}, and 1≧3 is obtained as an association rule.
Apriori algorithm is known as a technique of efficiently extracting this frequent item set. Apriori algorithm is described in R. Agrawal et al., “Fast Algorithms for Mining Association Rules”, Proceedings of 20th VLDB, 1994. This technique will be briefly described below.
(1) A transaction database is read, and the appearance frequency of each item is counted up, thereby obtaining support values. In this case, to count up the appearance frequency of each item is to count the number of times each item appears in the transaction database. Subsequently, “count up” indicates this.
(2) Items that satisfy the minimum support value are extracted as a frequent item set having length 1.
(3) Combinations of pairs of items are formed from the frequent item set having length 1. These combinations will be referred to as candidate item sets having length 2.
(4) Support values are obtained by searching the transaction database.
(5) Items that satisfy the minimum support value are extracted to form a frequent item set having length 2.
(6) The following is the processing to be performed in the case of length k(≧2).
(a) A candidate item set having the length k is formed from a frequent item set having a length k−1.
(b) Support values are obtained by searching the transaction database.
(c) Items that satisfy the minimum support value are extracted to form a frequent item set having the length k.
(7) The above processing is repeated until the frequent item set becomes empty. As described above, in conventional data mining, this Apriori algorithm is basically used to find association rules.
Although this Apriori algorithm is efficient, since transaction data to be processed in data mining is on the TB order, large-volume transaction data cannot be processed. Even if such data can be processed, it takes an enormous processing time. For example, 1-TB transaction data corresponds to 500 2-GB (gigabyte) disk units. Even if an SMP computer is used, it is difficult to connect all the 500 disk units to one computer. Even if 500 disk units can be connected, problems arise in terms of I/O performance. For this reason, disk units storing transaction data on the TB order are preferably distributed to a plurality of nodes to be processed by using a cluster system. However, since Apriori algorithm is an algorithm for sequential processing, this algorithm does not operate on the cluster system. Even if this Apriori algorithm is improved to operate on a cluster system of a distributed memory type, the resultant system inevitably becomes a programming model of a distributed memory type accompanying communications. This makes it difficult to develop a data mining program. More specifically, a programming model of a shared memory type allows exclusive control using a lock mechanism. In the case of a programming model of a distributed memory type, however, since each processor cannot see an identical storage area in each distributed memory, the algorithm must be basically changed, and the program must be modified.
BRIEF SUMMARY OF THE INVENTION
It is an object of the present invention to provide a distributed shared memory system which can efficiently process large-scale data mining, for example, on the TB (terabyte) order.
According to the present invention, there is provided a distributed shared memory system applied to a multiprocessor system of a distributed memory type in which a plurality of computers are loosely-coupled, comprising shared memory space providing means for providing a shared memory space which is commonly accessible with an identical address with respect to all the processes operating on the plurality of computers, data structure preparing means for preparing, in the shared memory space, a data structure storing appearance frequency in units of sets of specific items extracted from input data by the processes, count up history obtaining means for obtaining a history of count up of the process with respect to the appearance frequency stored in the data structure, count up history transfer means for transferring, to other computers, the count up history obtained by the count up history transfer means, count up history receiving means for receiving the count up history transferred from each of other computers, and count up history reflecting means for reflecting the count up history received by the count up history receiving means in the appearance frequency stored in the data structure.
In addition, according to the present invention, there is provided a distributed shared memory control method applied to a multiprocessor system of a distributed memory type in which a plurality of computers are loosely-coupled, comprising the steps of providing a shared memory space which is commonly accessible with an identical address with respect to all the processes operating on the plurality of computers, preparing, in the shared memory space, a data structure storing appearance frequency in units of sets of specific items extracted from input data by the processes, obtaining a h

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Distributed shared memory system and method of controlling... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Distributed shared memory system and method of controlling..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Distributed shared memory system and method of controlling... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2510827

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.