Non-uniform memory access (NUMA) data processing system that...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

U.S. Classifications: C711S146000, C711S120000
Type: Reexamination Certificate
Status: active
Patent number: 06711652

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to data processing systems and, in particular, to non-uniform memory access (NUMA) and other multiprocessor data processing systems having improved queuing, communication and/or storage efficiency.
2. Description of the Related Art
It is well-known in the computer arts that greater computer system performance can be achieved by harnessing the processing power of multiple individual processors in tandem. Multi-processor (MP) computer systems can be designed with a number of different topologies, of which various ones may be better suited for particular applications depending upon the performance requirements and software environment of each application. One common MP computer topology is a symmetric multi-processor (SMP) configuration in which each of multiple processors shares a common pool of resources, such as a system memory and input/output (I/O) subsystem, which are typically coupled to a shared system interconnect. Such computer systems are said to be symmetric because all processors in an SMP computer system ideally have the same access latency with respect to data stored in the shared system memory.
Although SMP computer systems permit the use of relatively simple inter-processor communication and data sharing methodologies, SMP computer systems have limited scalability. In other words, while performance of a typical SMP computer system can generally be expected to improve with scale (i.e., with the addition of more processors), inherent bus, memory, and input/output (I/O) bandwidth limitations prevent significant advantage from being obtained by scaling an SMP beyond an implementation-dependent size at which the utilization of these shared resources is optimized. Thus, the SMP topology itself suffers to a certain extent from bandwidth limitations, especially at the system memory, as the system scale increases. SMP computer systems are also not easily expandable. For example, a user typically cannot purchase an SMP computer system having two or four processors, and later, when processing demands increase, expand the system to eight or sixteen processors.
As a result, an MP computer system topology known as non-uniform memory access (NUMA) has emerged to address the limitations on the scalability and expandability of SMP computer systems. As illustrated in FIG. 1, a conventional NUMA computer system 8 includes a number of nodes 10 connected by a switch 12. Each node 10, which can be implemented as an SMP system, includes a local interconnect 11 to which a number of processing units 14 are coupled. Processing units 14 each contain a central processing unit (CPU) 16 and associated cache hierarchy 18. At the lowest level of the volatile memory hierarchy, nodes 10 further contain a system memory 22, which may be centralized within each node 10 or distributed among processing units 14 as shown. CPUs 16 access memory 22 through a memory controller 20.
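As an informal illustration only, the organization described above can be modeled as plain C data structures; the type names, field names, and sizes below are hypothetical and simply mirror the reference numerals of FIG. 1 rather than anything specified in the patent.

```c
/* Hypothetical sketch of the NUMA organization of FIG. 1.
 * All names and sizes are illustrative; they are not taken from the patent. */

#define NODES_PER_SYSTEM 4   /* assumed number of nodes 10 */
#define UNITS_PER_NODE   4   /* assumed processing units 14 per node */

typedef struct cache_hierarchy {        /* cache hierarchy 18 */
    int levels;                         /* e.g., L1 + L2 */
} cache_hierarchy_t;

typedef struct processing_unit {        /* processing unit 14 */
    int               cpu_id;           /* CPU 16 */
    cache_hierarchy_t caches;           /* cache hierarchy 18 */
    void             *memory_ctrl;      /* memory controller 20 (distributed case) */
    void             *local_memory;     /* portion of system memory 22 */
} processing_unit_t;

typedef struct node {                   /* node 10, implementable as an SMP */
    processing_unit_t units[UNITS_PER_NODE];  /* coupled to local interconnect 11 */
    void             *node_controller;        /* node controller 24 (see below) */
} node_t;

typedef struct numa_system {            /* NUMA computer system 8 */
    node_t nodes[NODES_PER_SYSTEM];     /* connected by switch 12 */
} numa_system_t;
```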
Each node 10 further includes a respective node controller 24, which maintains data coherency and facilitates the communication of requests and responses between nodes 10 via switch 12. Each node controller 24 has an associated local memory directory (LMD) 26 that identifies the data from local system memory 22 that are cached in other nodes 10, a remote memory cache (RMC) 28 that temporarily caches data retrieved from remote system memories, and a remote memory directory (RMD) 30 providing a directory of the contents of RMC 28.
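These per-node structures can likewise be sketched as data structures. The C sketch below is a simplified, hypothetical model that assumes a bit-vector of sharing nodes per line in the LMD and one tag entry per RMC slot in the RMD; none of the field names, widths, or sizes come from the patent.

```c
/* Hypothetical sketch of the per-node directory state (LMD 26, RMC 28, RMD 30). */
#include <stdint.h>
#include <stdbool.h>

#define MAX_NODES  8      /* assumed system size */
#define RMC_LINES  1024   /* assumed (SRAM-limited) remote memory cache size */
#define LINE_BYTES 128    /* assumed cache line length */

/* Local memory directory (LMD 26): for each line of local system memory 22,
 * which remote nodes have checked the line out. */
typedef struct lmd_entry {
    uint8_t sharer_nodes;     /* bit i set => node i may hold a copy (imprecise) */
    bool    modified_remotely;
} lmd_entry_t;

/* Remote memory cache (RMC 28): temporarily holds lines fetched from
 * remote system memories. */
typedef struct rmc_line {
    uint8_t data[LINE_BYTES];
} rmc_line_t;

/* Remote memory directory (RMD 30): directory of the contents of RMC 28. */
typedef struct rmd_entry {
    uint64_t tag;             /* which remote line occupies this RMC slot */
    uint8_t  home_node;
    bool     valid;
    bool     dirty;
} rmd_entry_t;

typedef struct node_controller {   /* node controller 24 */
    lmd_entry_t *lmd;              /* one entry per line of local system memory 22 */
    rmc_line_t   rmc[RMC_LINES];
    rmd_entry_t  rmd[RMC_LINES];
} node_controller_t;
```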
The present invention recognizes that, while the conventional NUMA architecture illustrated in FIG. 1 can provide improved scalability and expandability over conventional SMP architectures, the conventional NUMA architecture is subject to a number of drawbacks. First, communication between nodes is subject to much higher latency (e.g., five to ten times higher latency) than communication over local interconnects 11, meaning that any reduction in inter-node communication will tend to improve performance. Consequently, it is desirable to implement a large remote memory cache 28 to limit the number of data access requests that must be communicated between nodes 10. However, the conventional implementation of RMC 28 in static random access memory (SRAM) is expensive and limits the size of RMC 28 for practical implementations. As a result, each node is capable of caching only a limited amount of data from other nodes, thus necessitating frequent high latency inter-node data requests.
A second drawback of conventional NUMA computer systems related to inter-node communication latency is the delay in servicing requests caused by unnecessary inter-node coherency communication. For example, prior art NUMA computer systems such as that illustrated in FIG. 1 typically allow remote nodes to silently deallocate unmodified cache lines. In other words, caches in the remote nodes can deallocate shared or invalid cache lines retrieved from another node without notifying the local memory directory at the home node from which the cache line was “checked out.” Thus, the home node's local memory directory maintains only an imprecise indication of which remote nodes hold cache lines from the associated system memory. As a result, when a store request is received at a node, the node must broadcast a Flush (i.e., invalidate) operation to all other nodes indicated in the home node's local memory directory as holding the target cache line, regardless of whether or not the other nodes still cache a copy of the target cache line. In some operating scenarios, unnecessary flush operations can delay servicing store requests, which adversely impacts system performance.
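The effect of this imprecision can be illustrated with a small, hypothetical C sketch: because a silently deallocated copy leaves its bit set in the home node's local memory directory, the store path must send a Flush to every listed node, including nodes that no longer hold the line. All names and message forms here are invented for illustration only.

```c
/* Hypothetical sketch: with silent deallocation, the home node's LMD may list
 * nodes that no longer cache the line, so every listed node receives a Flush. */
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 8

static void send_flush(int node, uint64_t line)
{
    /* Stands in for an inter-node invalidate message over switch 12. */
    printf("Flush line %#llx -> node %d\n", (unsigned long long)line, node);
}

/* Called at the home node when a store requires exclusive ownership. */
static void service_store(uint64_t line, uint8_t lmd_sharers)
{
    for (int n = 0; n < MAX_NODES; n++) {
        if (lmd_sharers & (1u << n)) {
            /* The directory cannot tell whether node n still holds the line,
             * so a Flush must be sent regardless; stale entries cause
             * unnecessary high-latency inter-node operations. */
            send_flush(n, line);
        }
    }
}

int main(void)
{
    /* Nodes 2 and 5 once checked the line out; node 5 silently deallocated
     * its copy, but the LMD bit for node 5 is still set. */
    service_store(0x8000, (uint8_t)((1u << 2) | (1u << 5)));
    return 0;
}
```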
Third, conventional NUMA computer systems, such as NUMA computer system 8, tend to implement deep queues within the various node controllers, memory controllers, and cache controllers distributed throughout the system to allow for the long latencies to which inter-node communication is subject. Although the implementation of each individual queue is inexpensive, the deep queues implemented throughout conventional NUMA computer systems represent a significant component of overall system cost. The present invention therefore recognizes that it would be advantageous to reduce the pendency of operations in the queues of NUMA computer systems and otherwise improve queue utilization so that queue depth, and thus system cost, can be reduced.
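One way to see why long inter-node latencies translate into deep queues is the queueing identity that the number of outstanding operations is roughly the issue rate multiplied by how long each operation stays pending (Little's law). The sketch below uses assumed, illustrative numbers only; it is not derived from the patent.

```c
/* Little's law sketch: outstanding operations ~ issue rate x pendency.
 * All numbers are illustrative, not from the patent. */
#include <stdio.h>

int main(void)
{
    const double issue_rate_per_ns  = 0.02;   /* assumed: one request every 50 ns */
    const double local_pendency_ns  = 100.0;  /* assumed intra-node operation */
    const double remote_pendency_ns = 700.0;  /* assumed inter-node operation */

    printf("entries needed for local traffic:  %.1f\n",
           issue_rate_per_ns * local_pendency_ns);
    printf("entries needed for remote traffic: %.1f\n",
           issue_rate_per_ns * remote_pendency_ns);

    /* Reducing how long each operation stays pending (or retiring queue
     * entries earlier) directly reduces the depth, and hence the cost,
     * that each controller must provision. */
    return 0;
}
```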
In view of the foregoing and additional drawbacks to conventional NUMA computer systems, the present invention recognizes that it would be useful and desirable to provide a NUMA architecture having improved queuing, storage and/or communication efficiency.
SUMMARY OF THE INVENTION
The present invention overcomes the foregoing and additional shortcomings in the prior art by providing a non-uniform memory access (NUMA) computer system and associated method of operation that provide precise notification of remote deallocation of a modified cache line.
In accordance with a preferred embodiment of the present invention, a NUMA computer system includes a remote node coupled by a node interconnect to a home node including a home system memory. The remote node includes a plurality of snoopers coupled to a local interconnect. The plurality of snoopers includes a cache that caches a cache line corresponding to but modified with respect to data resident in the home system memory. The cache has a cache controller that issues a deallocate operation on the local interconnect in response to deallocating the modified cache line. The remote node further includes a node controller, coupled between the local interconnect and the node interconnect, that transmits the deallocate operation to the home node with an indication of whether or not a copy of the cache line remains in the remote node following the deallocation. In this manner, the local memory directory associated with the home system memory can be updated to precisely reflect which nodes hold a copy of the cache line.
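As a rough, hypothetical rendering of this embodiment, the sketch below models the deallocate notification as a message carrying a copy-remains flag, which the home node uses to clear the corresponding bit in its local memory directory. The message layout and all identifiers are invented for this sketch and are not taken from the patent.

```c
/* Hypothetical sketch of the precise-deallocation notification described above. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct lmd_entry {
    uint8_t sharer_nodes;          /* bit i set => node i may hold the line */
} lmd_entry_t;

/* Message the remote node controller sends to the home node over the node
 * interconnect when a snooper in the remote node deallocates a cached line. */
typedef struct deallocate_msg {
    uint64_t line_addr;
    uint8_t  src_node;
    bool     copy_remains;         /* true if another snooper in the remote
                                      node still holds a copy of the line */
} deallocate_msg_t;

/* Home node: update the LMD so it precisely reflects which nodes hold the line. */
static void home_handle_deallocate(lmd_entry_t *lmd, const deallocate_msg_t *msg)
{
    if (!msg->copy_remains)
        lmd->sharer_nodes &= (uint8_t)~(1u << msg->src_node);
}

int main(void)
{
    lmd_entry_t entry = { .sharer_nodes = (1u << 2) | (1u << 5) };

    /* Node 5 deallocates its modified copy and no other copy remains there. */
    deallocate_msg_t msg = { .line_addr = 0x8000, .src_node = 5,
                             .copy_remains = false };
    home_handle_deallocate(&entry, &msg);

    printf("LMD sharers after deallocate: 0x%02x\n", (unsigned)entry.sharer_nodes);
    /* Later stores need not flush node 5, avoiding an unnecessary
     * inter-node flush operation. */
    return 0;
}
```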
The above
