Electrical computers and digital data processing systems: input/output – Access arbitrating
Reexamination Certificate
2000-04-28
2003-11-18
Myers, Paul R. (Department: 2181)
C710S200000, C710S107000, C710S108000, C711S163000
active
06651124
TECHNICAL FIELD
The invention relates to computer processors and memory systems. More particularly, the invention relates to optimizing coherent memory access operations within multiprocessor computer systems having distributed shared memory architectures.
BACKGROUND ART
Multiprocessor, or parallel processing, computer systems rely on a plurality of microprocessors to handle computing tasks in parallel and thereby reduce overall execution time. One common implementation of a multiprocessor system is the “single bus” architecture, in which a plurality of processors are interconnected through a single bus. However, because the limited bandwidth of the single bus also limits the number of processors that can be interconnected thereto, networked multiprocessor systems have more recently been developed, which utilize processors or groups of processors connected to one another across an interconnection fabric, e.g., a network, and communicating via “packets” or messages.
Typically, a networked multiprocessor system includes a plurality of nodes or clusters interconnected via a network. For example, FIG. 1 shows an exemplary networked multiprocessor system 100, in which a plurality of nodes 102 are interconnected to each other via the interconnection fabric 101, e.g., a network. By way of example, only two nodes are shown; however, the networked multiprocessor system 100 may have any number of nodes. Moreover, although, in FIG. 1, the interconnection fabric 101 is shown to provide interconnections only between the nodes 102, all system entities, including the cells 103, the processors 105 and the memories 104, are interconnected, and communicate, with the rest of the system through the interconnection fabric 101.
Each of the nodes 102 of the networked multiprocessor system 100 may be further divided into smaller hierarchical units, referred to herein as “cells” 103, each of which comprises a plurality of processors 105 and a shared memory 104. Each processor 105 may comprise any processing element that may share data within the distributed shared memory in the system, e.g., a microprocessor, an I/O device or the like. The grouping of the system entities into nodes and/or cells may be made physically and/or logically.
Each shared memory 104 may comprise a portion of the shared memory for the system 100, and may include a memory controller and/or a coherency controller (not shown) to control memory accesses thereto from various processors in the system, and to monitor the status of local copies of the memory stored in the caches of various processors using a coherency directory maintained in each node or within each cell.
As shown in FIG. 2, a typical shared memory 104 receives memory request packets through the blocking (BL) queue 204, receives response packets from processors 105 through the processor return (PR) queue 203, and sends memory return packets in response to the memory requests through the MR queue 202. The PR/BL queue 201 is a buffer that holds incoming memory requests, where a BL transaction is a transaction involving a memory access request to the shared memory 104, e.g., a memory read transaction, and where a PR transaction is a transaction involving a response from a processor 105 to a coherency check request from a shared memory 104 and/or a write back of a copy of a cache line in its cache to the memory 104.
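The packet flow just described can be modeled with two bounded queues. The queue depths and the packet format below are illustrative assumptions, not figures from the patent:

```python
from collections import deque

# Toy model of the queues around shared memory 104: BL and PR packets
# arrive through the PR/BL queue 201; MR packets leave through queue 202.
prbl_queue = deque(maxlen=8)   # buffers incoming BL requests and PR responses
mr_queue = deque(maxlen=4)     # buffers outgoing memory-return packets

def receive(packet_class, payload):
    """Accept a BL (memory request) or PR (processor response) packet."""
    assert packet_class in ("BL", "PR")
    if len(prbl_queue) == prbl_queue.maxlen:
        return False               # backpressure: the sender must stall
    prbl_queue.append((packet_class, payload))
    return True

def issue_return(payload):
    """Queue an MR packet (a data return or a coherency-check request)."""
    if len(mr_queue) == mr_queue.maxlen:
        return False               # MR queue full: cannot issue
    mr_queue.append(payload)
    return True

print(receive("BL", "read 0x1000"))   # -> True
```

Note that both admission checks can fail independently; the interaction between a full MR queue and a full PR/BL queue is exactly what produces the deadlock discussed next.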
One significant problem that exists with many networked computer systems is that of deadlock, where nodes may in effect “lock up” due to an inability to pass packets or messages to other nodes. In particular, in some networked computer systems, sending a primary request to another node may result in the receiving or destination node sending out one or more secondary requests, e.g., to notify other nodes having local copies of a memory block that the copies of the memory block in their respective caches must be marked invalid (often referred to as “invalidation” requests). However, if one or more secondary requests cannot be sent by the destination node, the node may block receipt of new requests from other nodes. This may result in two nodes each waiting for responses from the other.
For example, if a shared memory 104 receives a BL transaction, e.g., a memory read request from a processor, the memory 104 may send an MR request, e.g., a coherency check request, to other processor(s) to ensure that no copies of the requested data reside in the caches of other processor(s). The shared memory 104 must then wait for processor responses (PR) from the other processor(s) before it can issue an MR transaction that satisfies the memory read request. However, if the MR transaction to the other processors cannot be issued, e.g., because the MR queue is full at the time, then no other transaction can be received by the shared memory 104, and a deadlock occurs.
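This stall condition can be demonstrated with a toy progress check. The rules below (in-order processing of the PR/BL queue head, a BL needing a free MR slot, a PR draining one MR slot) are a simplified reading of the scenario above, not the patent's exact protocol:

```python
from collections import deque

def can_make_progress(prbl, mr, mr_capacity):
    """True if the memory controller can advance this cycle.

    Assumed toy rules:
      * the head of the PR/BL queue is processed in order;
      * a BL request needs a free MR slot (to send its coherency check);
      * a PR response always completes an outstanding check.
    """
    if not prbl:
        return False
    head = prbl[0]
    if head == "BL":
        return len(mr) < mr_capacity    # BL is stuck while MR is full
    return True                          # PR can always be consumed

# PR/BL queue filled entirely with BL requests, MR queue already full:
prbl = deque(["BL"] * 4, maxlen=4)
mr = ["chk"] * 2
print(can_make_progress(prbl, mr, mr_capacity=2))   # -> False
# The PR responses that would drain the MR queue cannot enter the full
# PR/BL queue, so neither queue ever moves: a deadlock.
```

A single PR response at the head would break the cycle, which is why the remedy described in the summary reserves space for PR-class traffic.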
For optimal performance of the shared memory system, it is crucial that packets continuously flow among the processors and shared memories.
Prior attempts to address the deadlock problem include providing MR queues that can hold as many memory return transactions as the BL transactions in the PR/BL queue may generate. Unfortunately, this approach requires a large buffer area. The control logic necessary to manage this larger buffer area also becomes very large and complex, and is thus more difficult to design and slower to operate.
Moreover, because the number of PR transactions and the number of BL transactions that can be processed at any given time are each fixed and independent of each other, a conventional shared memory system cannot dynamically adapt to process more BL transactions or more PR transactions as real-time needs may require, and is thus inflexible.
Another prior attempt to address the deadlock problem is to provide a special entry type dedicated to handling write-back PR transactions. This approach requires additional design effort, is difficult and often “bug prone”, and the resulting design is often very inflexible.
Moreover, conventional memory access transaction queues allow only one MR transaction to be added per clock cycle, and thus are inefficient.
Thus, there is a need for a more efficient method and device for memory access in a distributed shared memory system, which prevents occurrences of deadlocks, does not require a large MR queue, and does not require the design of special dedicated entries for write-back transactions.
There is also a need for a more flexible method and device for memory access in a distributed shared memory system that can dynamically adapt to the current requirements for processing various memory access transactions.
There is also a need for a distributed queuing mechanism that allows multiple additions of entries in a single clock cycle.
SUMMARY OF INVENTION
In accordance with the principles of the present invention, a method of preventing a deadlock in a distributed shared memory system having a memory access request transaction queue having a plurality of queue slots comprises preventing a blocking flow control class transaction from being processed in at least one of the plurality of queue slots.
In addition, in accordance with the principles of the present invention, an apparatus for preventing a deadlock in a distributed shared memory system having a memory access request transaction queue having a plurality of queue slots comprises a coherency controller configured to prevent a blocking flow control class transaction from being processed in at least one of the plurality of queue slots.
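The core idea of the claimed method and apparatus, keeping blocking-class (BL) transactions out of at least one queue slot, can be sketched as an admission check. The `reserved_for_pr` knob and the queue size are illustrative assumptions:

```python
from collections import deque

def accept(prbl, capacity, packet, reserved_for_pr=1):
    """Admission check, in miniature, for the claimed deadlock prevention:
    BL (blocking flow control class) packets may not occupy the last
    `reserved_for_pr` slots, so a PR response can always get in and make
    forward progress. The specific policy knob is an assumption."""
    used = len(prbl)
    if packet == "BL" and used >= capacity - reserved_for_pr:
        return False                 # BL refused: slot held back for PR
    if used >= capacity:
        return False                 # queue truly full
    prbl.append(packet)
    return True

q = deque()
for _ in range(4):
    accept(q, capacity=4, packet="BL")
print(list(q))                       # -> ['BL', 'BL', 'BL'] (one slot kept free)
print(accept(q, capacity=4, packet="PR"))   # -> True: PR still admitted
```

Because a PR response can always enter, the cyclic wait from the background example (BL stuck behind a full MR queue, PR unable to enter behind the BLs) cannot form.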
In accordance with another aspect of the principles of the present invention, a method of providing a distributed memory return transaction queue in a distributed shared memory system having a memory access request transaction queue processing one or more entries comprises providing each of the one or more entries a plurality of addition lines, and providing a global pointer that indicates a last occupied position.
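The distributed-queue aspect, multiple addition lines per entry plus a global tail pointer so several packets can be enqueued in one clock, can be sketched as follows. The slot count, the number of addition lines, and the single `add_cycle` call per clock are illustrative assumptions:

```python
class DistributedQueue:
    """Sketch of a queue that accepts multiple additions per cycle.

    Each cycle, up to `add_lines` packets may be written, and the
    global pointer (`tail`) marks the last occupied position, advancing
    once per accepted packet. Sizes are assumptions for illustration."""

    def __init__(self, slots=8, add_lines=2):
        self.slots = [None] * slots
        self.add_lines = add_lines       # max insertions per clock cycle
        self.tail = 0                    # global pointer: next free slot

    def add_cycle(self, packets):
        """Enqueue up to `add_lines` packets in a single cycle;
        return how many were accepted."""
        accepted = 0
        for p in packets[: self.add_lines]:
            if self.tail >= len(self.slots):
                break                    # queue full
            self.slots[self.tail] = p
            self.tail += 1
            accepted += 1
        return accepted

dq = DistributedQueue()
print(dq.add_cycle(["mr0", "mr1", "mr2"]))   # -> 2 (only two addition lines)
print(dq.tail)                               # -> 2
```

A conventional queue corresponds to `add_lines=1`; raising it is what removes the one-MR-per-clock bottleneck criticized in the background.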
Hewlett-Packard Development Company, L.P.
Myers Paul R.
Phan Raymond N.