Distributed shared memory multiprocessor system based on a...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

Classification codes: C711S148000, C709S218000
Type: Reexamination Certificate
Status: active
Patent number: 06253292

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to a distributed shared memory multiprocessor system; and, more particularly, to a distributed shared memory multiprocessor system based on a unidirectional ring bus using a snooping scheme.
DESCRIPTION OF THE PRIOR ART
A shared memory multiprocessor system having a single address space and coherent caches provides a flexible and powerful operating environment. The single address space and the coherent caches conveniently support data partitioning and dynamic load balancing, and provide a better environment for a parallel compiler, a standard operating system and multiprogramming, thereby making it possible to use the machine with a higher degree of flexibility and efficiency.
Such a shared memory multiprocessor system may be classified into two groups, i.e., a uniform memory access (UMA) multiprocessor, e.g., the multiprocessor 100 shown in FIG. 1; and a non-uniform memory access (NUMA) multiprocessor, or distributed shared memory multiprocessor, e.g., the multiprocessor 200 shown in FIG. 2. The UMA multiprocessor 100 of FIG. 1 comprises processor modules, shared memories, an input/output (I/O) system 150 and a system bus 160, wherein only two processor modules 110 and 120 and two shared memories 130 and 140 are depicted for the sake of simplicity. The processor module 110 includes a central processing unit (CPU) 112 and a cache 114, and the processor module 120 includes a CPU 122 and a cache 124.
The shared memories 130 and 140 are commonly accessed by both CPU's 112 and 122, thereby increasing the traffic on the system bus 160 connected to the shared memories 130 and 140. The increased traffic on the system bus 160 will in turn increase the access delay times to the system bus 160 and to the shared memories 130 and 140.
In order to overcome such defects, the distributed shared memory multiprocessor 200 of FIG. 2 was developed, which comprises processor modules and a system bus 230, wherein only two processor modules 210 and 220 are depicted for the sake of simplicity. The processor module 210 includes a CPU 212, a cache 214, a shared memory 216 and an I/O system 218, and the processor module 220 includes a CPU 222, a cache 224, a shared memory 226 and an I/O system 228.
The distributed shared memory multiprocessor 200 distributes the shared memories 216 and 226 to the respective processor modules 210 and 220, so that the CPU in each processor module has a shorter access time to the shared memory in its own processor module, i.e., the local shared memory, than to the shared memory in the other processor module, i.e., the remote shared memory. Thus, the memory access times of a CPU differ according to the location of the shared memory being accessed. The distributed shared memory multiprocessor 200 induces more accesses to the local shared memory than to the remote shared memory, thereby alleviating the traffic on the system bus 230 and reducing the memory access delays.
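To make the local versus remote access-time distinction concrete, the following is a minimal C sketch, not taken from the patent, of a node selecting its access cost by checking whether an address falls in its own slice of the distributed shared memory; the node count, memory size per node and latency constants are illustrative assumptions.

/* Hypothetical sketch of NUMA access-latency selection, not the patent's
 * implementation: each node owns a slice of the shared address space and
 * pays a higher cost when the address belongs to another node's slice.
 * The latency constants and node count are illustrative assumptions. */
#include <stdio.h>

#define NUM_NODES      2
#define MEM_PER_NODE   1024          /* words of shared memory per node  */
#define LOCAL_LATENCY  10            /* assumed local access cost        */
#define REMOTE_LATENCY 100           /* assumed remote (bus) access cost */

/* Node that owns a given global address in the distributed shared memory. */
static int home_node(unsigned addr) { return (addr / MEM_PER_NODE) % NUM_NODES; }

/* Access cost seen by 'node' for 'addr': cheap if local, expensive if remote. */
static int access_cost(int node, unsigned addr)
{
    return home_node(addr) == node ? LOCAL_LATENCY : REMOTE_LATENCY;
}

int main(void)
{
    printf("node 0 -> addr 100  : %d cycles (local)\n",  access_cost(0, 100));
    printf("node 0 -> addr 1500 : %d cycles (remote)\n", access_cost(0, 1500));
    return 0;
}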
Although each of the caches shown in FIGS. 1 and 2 has a much smaller capacity than the shared memories, the caches provide much shorter access times and reduce the number of access requests to the system buses and the shared memories, since the caches store the data blocks of the shared memories that are likely to be used frequently by the CPU's.
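As an illustration of how a cache cuts the number of bus and memory requests, here is a small C sketch, not part of the patent, of a direct-mapped tag check in which repeated accesses to recently used blocks are answered locally and only misses reach the system bus; the cache size and access pattern are arbitrary assumptions.

/* Illustrative sketch of why a cache reduces bus traffic: a small direct-mapped
 * tag array answers repeated accesses locally, and only misses turn into
 * requests on the system bus. Sizes and the access pattern are assumptions. */
#include <stdio.h>
#include <stdbool.h>

#define CACHE_LINES 8                    /* assumed number of cache lines */

static unsigned tags[CACHE_LINES];
static bool     valid[CACHE_LINES];
static unsigned bus_requests;            /* accesses that reach the bus   */

/* Access one block address; return true on a cache hit. */
static bool access_block(unsigned block)
{
    unsigned idx = block % CACHE_LINES;
    if (valid[idx] && tags[idx] == block)
        return true;                     /* hit: no bus traffic            */
    bus_requests++;                      /* miss: fetch block over the bus */
    valid[idx] = true;
    tags[idx]  = block;
    return false;
}

int main(void)
{
    unsigned pattern[] = { 3, 3, 3, 7, 3, 7 };   /* frequently reused blocks */
    for (unsigned i = 0; i < sizeof pattern / sizeof pattern[0]; i++)
        access_block(pattern[i]);
    printf("6 accesses generated only %u bus requests\n", bus_requests);
    return 0;
}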
However, the shared memory multiprocessor system based on a bus is subject to a very strict restriction on system scalability, owing to the limited bandwidth of the bus.
In order to alleviate this restriction, an interconnection network comprising a multiplicity of high speed point-to-point links is recommended. Many structures, e.g., a mesh, a torus, a hypercube, an N-cube, a MIN (multi-stage interconnection network), an omega network and a ring, can be adopted as the interconnection network structure. Among these structures, the ring is easy to design and implement. Moreover, while the bus transmits transactions one at a time, the ring can carry a plurality of transactions simultaneously, thereby increasing the bandwidth.
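The bandwidth advantage can be seen with a back-of-the-envelope comparison; the following C sketch simply contrasts the single transaction a shared bus carries per cycle with the one transfer per link an N-node unidirectional ring can sustain, where the node count and per-link bandwidth are illustrative assumptions rather than figures from the patent.

/* Back-of-the-envelope sketch (illustrative assumptions only): a shared bus
 * carries one transaction at a time, while an N-node unidirectional ring has
 * N independent links that can each carry a transfer in the same cycle. */
#include <stdio.h>

int main(void)
{
    const int nodes          = 8;     /* assumed number of processor nodes   */
    const int link_bandwidth = 1;     /* transfers per cycle on one link/bus */

    int bus_peak  = link_bandwidth;           /* everything serialized         */
    int ring_peak = nodes * link_bandwidth;   /* one transfer per link at once */

    printf("shared bus peak  : %d transfer(s)/cycle\n", bus_peak);
    printf("%d-node ring peak : %d transfer(s)/cycle\n", nodes, ring_peak);
    return 0;
}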
Meanwhile, when a CPU in a processor module, for example, the CPU 112 in the processor module 110 shown in FIG. 1, performs a write operation on a data block stored in the cache 114 and thereby changes the data block, the change must be reflected in the corresponding data block in the other cache 124 in the other processor module 120. This is the so-called cache coherence problem.
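A minimal C sketch of the problem just described follows; it is an illustration rather than the patent's protocol, using a write-invalidate step as one conventional remedy, and the values and structure names are assumptions.

/* Minimal sketch of the coherence problem: two caches hold copies of the same
 * block, one CPU writes it, and unless the change is propagated the other cache
 * would read a stale value. A simple write-invalidate step is shown here. */
#include <stdio.h>
#include <stdbool.h>

struct cache_line { int value; bool valid; };

static struct cache_line cache_114 = { 42, true };   /* copy in module 110 */
static struct cache_line cache_124 = { 42, true };   /* copy in module 120 */

/* CPU 112 writes its cached copy and broadcasts an invalidation. */
static void cpu_112_write(int new_value)
{
    cache_114.value = new_value;
    cache_124.valid = false;       /* snooped invalidate keeps cache 124 coherent */
}

int main(void)
{
    cpu_112_write(99);
    if (!cache_124.valid)
        printf("cache 124 must refetch the block before using it\n");
    return 0;
}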
In general, a snooping scheme for cache coherence is not adopted in a multiprocessor based on point-to-point links, since applying the snooping scheme to such a multiprocessor incurs considerable overhead. In a ring based multiprocessor using unidirectional point-to-point links, however, the snooping scheme for cache coherence is more efficient than a directory scheme or an SCI (scalable coherent interface) scheme using a doubly linked list.
Specifically, while a unicast traverses half the ring on average in the ring based multiprocessor, a broadcast traverses the whole ring and thereby incurs about twice the cost of a unicast. Since, however, the packets to be broadcast are usually request packets carrying no data, the broadcasts have little effect on the utilization of the ring. Moreover, unlike the directory or SCI schemes, the snooping scheme does not generate extra transactions for cache coherence, thereby reducing the utilization of the ring and the memory access times.
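The two-to-one cost ratio can be checked with simple hop counting; the C sketch below assumes an N-node unidirectional ring with uniformly distributed destinations, where the ring size is an illustrative assumption.

/* Worked hop-count arithmetic: on an N-node unidirectional ring with uniformly
 * distributed destinations, a unicast travels about N/2 links on average while
 * a broadcast must visit every node, i.e., roughly N links, about twice the cost. */
#include <stdio.h>

int main(void)
{
    const double n = 8.0;                 /* assumed number of ring nodes      */
    double unicast_hops   = n / 2.0;      /* average distance to one node      */
    double broadcast_hops = n;            /* packet circulates the whole ring  */

    printf("average unicast hops  : %.1f\n", unicast_hops);
    printf("broadcast hops        : %.1f\n", broadcast_hops);
    printf("broadcast/unicast cost: %.1f\n", broadcast_hops / unicast_hops);
    return 0;
}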
Referring to FIG. 3, there is illustrated an exemplary operation of a distributed shared memory multiprocessor 300 based on a unidirectional ring bus which maintains cache coherence by using the directory or the SCI scheme. If a processor node PN1 310 fails to read a data block DB 322 from a local cache therein, the PN1 310 unicasts a request packet for the DB 322, i.e., RQ1 312, to a processor node Home 320, wherein a local shared memory in the Home 320 stores the DB 322. The Home 320 unicasts a request packet for an updated data block DB′ 332, i.e., RQ2 324, to a processor node PN2 330, wherein the DB′ 332 is an updated version of the DB 322 and a local cache in the PN2 330 stores the DB′ 332 therein. The PN2 330 unicasts the DB′ 332 to the Home 320, and the Home 320 updates the DB 322 to the DB′ 332 and unicasts the DB′ 332 to the PN1 310.
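To show where the indirection costs arise, the following C sketch replays the FIG. 3 sequence as link traversals on a small unidirectional ring; the ring size and the positions assigned to PN1, Home and PN2 are illustrative assumptions, not figures from the patent.

/* Sketch of the FIG. 3 transaction sequence, counting links traversed on a
 * unidirectional ring. The point is that the reply data is routed through the
 * Home node instead of going from PN2 straight to PN1. */
#include <stdio.h>

#define RING_NODES 4

/* Links traversed from node 'from' to node 'to' on a unidirectional ring. */
static int hops(int from, int to) { return (to - from + RING_NODES) % RING_NODES; }

int main(void)
{
    int pn1 = 0, home = 1, pn2 = 2;   /* assumed placement of the three nodes */

    int total = 0;
    total += hops(pn1,  home);  /* RQ1: PN1 asks Home for DB            */
    total += hops(home, pn2);   /* RQ2: Home asks PN2 for updated DB'   */
    total += hops(pn2,  home);  /* DB': PN2 returns the data to Home    */
    total += hops(home, pn1);   /* DB': Home forwards the data to PN1   */

    printf("directory/SCI-style transfer used %d link traversals\n", total);
    printf("a direct PN2 -> PN1 data transfer would use %d\n", hops(pn2, pn1));
    return 0;
}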
As illustrated above, in the multiprocessor system based on the unidirectional ring bus using the directory or the SCI scheme, a data block is transmitted via the processor node whose local shared memory stores the original data block. This indirection, imposed by the cache coherence scheme, causes heavy traffic on the unidirectional ring bus and increases the memory access times. Accordingly, it is desirable to alleviate this problem by means of the snooping scheme.
SUMMARY OF THE INVENTION
It is, therefore, a primary object of the invention to provide a distributed shared memory multiprocessor based on a unidirectional ring bus using a snooping scheme.
In accordance with the present invention, there is provided a distributed shared memory multiprocessor system comprising: a group of processor nodes, wherein the processor nodes are arranged in the form of a ring, one of the processor nodes generates a request signal for a data block, the remaining processor nodes snoop their own internal parts, and one of the remaining processor nodes provides the data block; and a ring bus for connecting the processor nodes in the form of the ring and providing a path through which the request signal is broadcast to each of the other processor nodes and the data block is unicast to the processor node which has generated the request signal.
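A rough C sketch of this snooping flow follows; it is a simplification of the summary above rather than the claimed implementation, with the node count, cache contents and requested block number chosen purely for illustration: the request circulates the ring, every other node snoops its own cache, and the node holding the block supplies it directly to the requester.

/* Rough sketch of the snooping flow summarized above: the request is broadcast
 * around the ring, every node snoops its own cache, and the node that holds the
 * block unicasts it directly to the requester. Values are illustrative only. */
#include <stdio.h>

#define RING_NODES 4

/* Block address cached at each node, or -1 if the node caches nothing relevant. */
static int cached_block[RING_NODES] = { -1, -1, 77, -1 };

/* Broadcast a request for 'block' from 'requester'; return the responding node. */
static int snoop_request(int requester, int block)
{
    for (int hop = 1; hop < RING_NODES; hop++) {          /* packet circles the ring */
        int node = (requester + hop) % RING_NODES;
        if (cached_block[node] == block)
            return node;             /* this node will unicast the data block   */
    }
    return -1;                       /* no cached copy; memory must supply it   */
}

int main(void)
{
    int responder = snoop_request(0, 77);
    if (responder >= 0)
        printf("node %d supplies block 77 directly to node 0\n", responder);
    return 0;
}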
