Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
1999-02-25
2002-03-26
Verbrugge, Kevin (Department: 2185)
C711S147000, C709S213000
active
06363458
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to a Distributed Shared Memory (DSM) system and, more particularly, to a communication protocol for transferring and receiving data between nodes.
Furthermore, the present invention relates to a new Adaptive Granularity communication protocol that integrates fine- and coarse-grain communication in a distributed shared memory, so that communication is handled according to the size of the data exchanged between nodes. According to the present invention, standard load-store communication performance is obtained by employing cache-line transfer for fine-grain sharing, and bulk-transfer performance is obtained by supporting a spectrum of granularities for bulk transfer.
Generally speaking, the Distributed Shared Memory system has attracted attention as a recent multiprocessor architecture owing to its scalability and programmability. In addition, most hardware distributed shared memory (DSM) machines achieve high performance by allowing the caching of shared writable data as well as read-only data.
The general concept of the Distributed Shared Memory system is described below with reference to the accompanying FIG. 1.
A Distributed Shared Memory system is a multiprocessor computer system in which each node can reference the memory of other nodes (i.e., distributed memory) as if it were its own memory. The architecture considered here is a DSM system based on cache coherence management, in which one node can reference the memory of other nodes. Such a DSM system obtains good performance by storing a block fetched from the memory of a remote node in the local cache and, when that block is needed again, reading the data from the cache instead of referencing the remote memory directly.
However, when several nodes share a memory block of the home node and one of them modifies its cached copy of the corresponding cache line, the other nodes continue to read the old, unmodified data from their own caches. Caching in each node therefore introduces a coherence problem.
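The stale-data situation described above can be illustrated with a minimal sketch. It is not taken from the specification: the names `HomeMemory` and `CachingNode` are hypothetical, and the model deliberately omits any coherence protocol so that the problem becomes visible.

```python
class HomeMemory:
    """Memory of the home node, indexed by block address (hypothetical model)."""

    def __init__(self):
        self.blocks = {}


class CachingNode:
    """A node that caches remote blocks with no coherence protocol (assumption)."""

    def __init__(self, name, home):
        self.name = name
        self.home = home
        self.cache = {}

    def read(self, addr):
        # On a miss, fetch the block from the home node; afterwards hit locally.
        if addr not in self.cache:
            self.cache[addr] = self.home.blocks[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Only the local copy is updated; no other node is informed.
        self.cache[addr] = value


home = HomeMemory()
home.blocks[0x10] = 1
na = CachingNode("NA", home)
nb = CachingNode("NB", home)

na.read(0x10)
nb.read(0x10)            # both nodes now hold a cached copy
na.write(0x10, 2)        # NA modifies its copy
stale = nb.read(0x10)    # NB hits in its own cache and sees the old value
```

In this sketch node NB keeps reading the value it cached before NA's modification, which is the inconsistency a cache coherence protocol is meant to prevent.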
To solve the above problem, many cache coherence protocols have been implemented. With reference to FIG. 1, the respective nodes (NA, NB, NC, . . . , NK) are tightly connected to each other by an interconnection network (IC Net), which is designed to speed up message transfer. Preferred examples of the IC Net include the k-ary n-cube network and the wormhole-routed network.
The internal structure of each node is similar to that of a conventional uniprocessor (e.g., a personal computer, a Sparc20, and so on), as shown for NK in FIG. 1, except that each node has a node controller and a directory to support DSM.
When a read or write miss occurs in the cache of a node, the node controller sends a request to the home node for cache coherence management. When the reply to the request arrives from the home node, the controller handles it according to the corresponding protocol.
In addition, the directory must hold the sharing state of the node's own memory in order to maintain cache coherence. The directory needs one entry for each memory block, and each entry stores the node numbers of the nodes sharing that memory block. If another node tries to modify a memory block of the home node, it must first acquire write approval from the home node, and the home node sends invalidation requests to all nodes sharing the corresponding block before granting the approval. Once the home node has received all of the acknowledgment messages, it sends the write approval to the requesting node.
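The write-approval sequence described above can be sketched as follows. This is only an illustrative model under stated assumptions, not the hardware protocol itself: the class names `Node` and `HomeNode`, the message labels `INV` and `WRITE_GRANT`, and the choice to record sharers as a set of node names are all hypothetical.

```python
class Node:
    """A sharing node that may hold cached copies of home-node blocks."""

    def __init__(self, name):
        self.name = name
        self.cached = set()

    def invalidate(self, addr):
        # Drop the cached copy and acknowledge the invalidation request.
        self.cached.discard(addr)
        return True


class HomeNode:
    """Home node with a directory: block address -> names of sharing nodes."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.directory = {}
        self.log = []  # messages sent by the home node, in order

    def handle_read(self, addr, requester):
        # Record the requester as a sharer of the block.
        self.directory.setdefault(addr, set()).add(requester)
        self.nodes[requester].cached.add(addr)

    def handle_write(self, addr, requester):
        # Invalidate every other sharer before granting write approval.
        targets = sorted(self.directory.get(addr, set()) - {requester})
        acks = 0
        for name in targets:
            self.log.append(("INV", name))
            if self.nodes[name].invalidate(addr):
                acks += 1
        if acks != len(targets):
            raise RuntimeError("missing acknowledgment")
        # All acknowledgments received: the requester becomes the only holder.
        self.directory[addr] = {requester}
        self.nodes[requester].cached.add(addr)
        self.log.append(("WRITE_GRANT", requester))


na, nb, nc = Node("NA"), Node("NB"), Node("NC")
home = HomeNode([na, nb, nc])
home.handle_read(7, "NB")
home.handle_read(7, "NC")
home.handle_write(7, "NA")
```

In this sketch the invalidations are delivered synchronously, so the acknowledgment count is checked immediately; a real interconnect would deliver them as asynchronous messages.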
The protocol briefly described above is a Hardware DSM Protocol (HDP), a transfer/receive protocol for fine-grain data.
One disadvantage of this kind of cache-coherent machine is that it uses a fixed-size block (i.e., a cache line for loads and stores) as the unit of communication. While this works well for fine-grain data, in some applications a communication scheme that permits bulk data transfer, and that can exploit parallelism, is sometimes more effective than caching.
Hence, to solve these problems, methods that support both types simultaneously, i.e., that exploit the advantages of both fine- and coarse-grain communication, have recently been proposed; they are briefly described as follows.
More recent shared memory machines have begun to integrate both models within a single architecture and to implement coherence protocols in software rather than in hardware. In order to use the bulk transfer facility on these machines, several approaches have been proposed, such as explicit messages and new programming models. In explicit message passing, communication primitives such as send-receive or memory-copy are used selectively to communicate coarse-grain data, while load-store communication is used for fine-grain data. In other words, two communication paradigms coexist in the program, and it is the user's responsibility to select the appropriate model.
Though these two approaches support an arbitrarily variable granularity and thus may potentially lead to large performance gains, they suffer from decreased programmability and increased hardware complexity. In other words, there is a tradeoff between the support of arbitrary size granularities and programmability.
SUMMARY OF THE INVENTION
The primary object of the present invention, which addresses the problems of the prior art, is to provide a new Adaptive Granularity communication protocol that integrates fine- and coarse-grain communication in a distributed shared memory, so that the communication protocol is selected according to the size of the data communicated between nodes. According to the present invention, standard load-store communication performance is obtained by employing cache-line transfer for fine-grain sharing, and bulk-transfer performance is obtained by supporting a spectrum of granularities for bulk transfer. In addition, by efficiently supporting transparent bulk data communication, the invention reduces the programmer's burden in using variable-size granularity.
The present invention is characterized as a data communication method for reading/writing data between memories in a distributed shared memory system, wherein the protocol selectively performs either bulk transfer, by supporting a spectrum of granularities for coarse-grain sharing, or standard load-store transfer, by employing cache-line transfer for fine-grain sharing.
In accordance with an aspect of the present invention, there is provided a bulk data communication method for reading/writing data between memories in a distributed shared memory system, comprising the steps of: a) when the node controller receives a request from the local cache, determining only the communication type according to the data type, without designating the requested data size, and transferring the request to the home node; b) when the home node receives the bulk request, determining the granularity depending on the sharing pattern and transferring the bulk data to the requesting node; c) when two adjacent blocks are owned by the same node, merging the two blocks into one buddy; and d) when the data arrive, writing, by the node controller, only the requested data into the cache line and the rest into local memory for use on future cache misses.
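Steps a) through d) can be sketched as follows. The sketch is illustrative only and rests on several assumptions that are not stated in the specification: addresses are counted in cache lines, the home node tracks blocks in a map from start address to (size, owner), blocks are power-of-two sized and aligned so that a buddy address can be computed as `start ^ size`, and the granularity chosen in step b) is simply the current size of the block containing the requested address. All function names are hypothetical.

```python
def make_request(addr, bulk):
    # Step a: only the communication type is chosen; no size is attached.
    return {"type": "BULK" if bulk else "LINE", "addr": addr}


def block_containing(blocks, addr):
    # blocks maps start address -> (size in cache lines, owner name).
    for start, (size, owner) in blocks.items():
        if start <= addr < start + size:
            return start, size, owner
    raise KeyError(addr)


def serve(blocks, memory, request):
    # Step b: the home node decides how much data to send.
    addr = request["addr"]
    if request["type"] == "LINE":
        return {addr: memory[addr]}
    start, size, _ = block_containing(blocks, addr)
    return {a: memory[a] for a in range(start, start + size)}


def merge_buddies(blocks, start):
    # Step c: merge two adjacent equal-size blocks owned by the same node.
    size, owner = blocks[start]
    buddy = start ^ size  # valid only for power-of-two, size-aligned blocks
    if buddy in blocks and blocks[buddy] == (size, owner):
        low = min(start, buddy)
        del blocks[start]
        del blocks[buddy]
        blocks[low] = (2 * size, owner)
        return low
    return start


def on_data_arrival(cache, local_memory, data, requested_addr):
    # Step d: requested line goes to the cache, the rest to local memory.
    for addr, value in data.items():
        if addr == requested_addr:
            cache[addr] = value
        else:
            local_memory[addr] = value


memory = {a: a * 10 for a in range(8)}
blocks = {0: (4, "NA"), 4: (4, "NA")}
request = make_request(5, bulk=True)
data = serve(blocks, memory, request)
cache, local_memory = {}, {}
on_data_arrival(cache, local_memory, data, requested_addr=5)
merged_start = merge_buddies(blocks, 4)
```

Here the request for line 5 returns the whole four-line block starting at line 4; line 5 lands in the cache while lines 4, 6, and 7 are kept in local memory, and the two adjacent blocks owned by NA are merged into one eight-line block.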
Another object of the present invention is to provide a bulk data communication method which, after step b), further comprises the step of dividing, by the home node, the block into two parts when the ownership of the block changes, in order to reduce false sharing.
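The splitting step can be sketched in the same spirit. The specification states only that the home node divides the block into two parts when ownership changes; which half goes to which node is an assumption of this sketch, as are the function name `split_on_ownership_change` and the block map from start address to (size, owner).

```python
def split_on_ownership_change(blocks, addr, new_owner):
    """Halve the block containing addr and give that half to new_owner.

    blocks maps start address -> (size in cache lines, owner name).
    Assumption: the half not containing addr stays with the old owner.
    """
    for start, (size, owner) in list(blocks.items()):
        if start <= addr < start + size:
            break
    else:
        raise KeyError(addr)

    if owner == new_owner:
        return  # ownership does not change, so nothing is split
    if size == 1:
        blocks[start] = (1, new_owner)  # a single line cannot be split
        return

    half = size // 2
    mid = start + half
    if addr < mid:
        blocks[start] = (half, new_owner)
        blocks[mid] = (half, owner)
    else:
        blocks[start] = (half, owner)
        blocks[mid] = (half, new_owner)


blocks = {0: (8, "NA")}
split_on_ownership_change(blocks, 5, "NB")
```

After the call, NA keeps lines 0 to 3 and NB owns lines 4 to 7, so later writes by the two nodes no longer contend for one large block.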
Another object of the present invention is to provide a data communication method from local or remote nodes to home node for reading/writing data between memories in a distributed shared memory system in which a plurality of nodes are connected on interconnection network, and said respec
Bachman & LaPointe P.C.
Korea Advanced Institute of Science and Technology