Hybrid NUMA/S-COMA system and method

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

C711S124000, C711S141000, C711S148000, C711S209000

active

06275900

ABSTRACT:

TECHNICAL FIELD
The present invention relates to the field of distributed shared memory systems and caches. More particularly, the invention relates to a hybrid architecture wherein a first type of memory (simple COMA) is built atop and integral with another type of memory (NUMA).
DEFINITIONS
The following terms are used in this document:
Global Memory: Refers to memory objects which are addressable by processes on different nodes. Global memory objects are created and attached in a manner similar to UNIX System V shared memory, and are mapped into the effective address space of each process which wants to address the global memory object.
DSM: Distributed Shared Memory. A class of architectures which provide the function of shared memory, even though the physical memory is distributed among nodes in the system.
S-COMA: Simple Cache Only Memory Architecture. A DSM scheme in which each node reserves a portion of its local memory to be used as a cache for global memory. This cache is managed through a combination of S-COMA software and hardware. Processes reference the data through process specific virtual addresses, node memory hardware references the data through local real addresses, and S-COMA hardware passes global addresses between nodes. The S-COMA subsystem takes care of translating between local real addresses and global addresses.
NUMA: Non Uniform Memory Access. A DSM scheme in which each of the n nodes in a system holds 1/n of the real memory (and real address space) of the system. Processes reference data through the virtual address, and node memory hardware references data through the real address. The NUMA infrastructure passes real addresses between nodes.
UMA: Uniform Memory Access. A shared memory organization whereby any processor can reference any memory location in equal (uniform) time.
Boundary Function (BF): A layer or logical function which performs a set of actions at the boundary of a node. In this invention, the BF performs address translation for addresses entering or leaving a node through the DSM subsystem.
Client: A node which references (caches) data, but is not the home for the data.
Home: A node which is the owner of the data or the owner of the directory which manages the data coherence.
Latency: The delay associated with a particular action or operation, such as fetching data from memory.
Snooping logic: Logic which monitors (snoops) a line or bus, looking for particular addresses, tags, or other key information.
Network logic: Logic which interfaces to a network or communication fabric.
Real address space: The range of real addresses generated by address translation. The addresses of the physical memory.
Local real address: A real address which applies to a local node.
Global real address: A real address which applies to all nodes.
Physical address: A real address. The address of physical memory.
Input address: The address provided as input to a component.
Associated address: In a data structure which consists of pairs of addresses, the second address of the pair, the first address being the input address.
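The Boundary Function and the paired-address structure defined above can be illustrated with a small sketch. This is a minimal, hypothetical model (the class and method names are assumptions, not taken from the patent): a table of (input address, associated address) pairs is consulted for addresses entering or leaving a node through the DSM subsystem, translating between local real addresses and global addresses as the S-COMA definition describes.

```python
# Hypothetical sketch of a Boundary Function (BF): a data structure of
# (input address, associated address) pairs, consulted for addresses
# entering or leaving a node through the DSM subsystem.

class BoundaryFunction:
    def __init__(self):
        self.local_to_global = {}   # local real address -> global address
        self.global_to_local = {}   # global address -> local real address

    def map_pair(self, local_real, global_addr):
        """Record an (input address, associated address) pair in both directions."""
        self.local_to_global[local_real] = global_addr
        self.global_to_local[global_addr] = local_real

    def outbound(self, local_real):
        """Translate an address leaving the node (local real -> global)."""
        return self.local_to_global[local_real]

    def inbound(self, global_addr):
        """Translate an address entering the node (global -> local real)."""
        return self.global_to_local[global_addr]

bf = BoundaryFunction()
bf.map_pair(local_real=0x0002_0000, global_addr=0x9_0002_0000)
assert bf.outbound(0x0002_0000) == 0x9_0002_0000
assert bf.inbound(0x9_0002_0000) == 0x0002_0000
```

A real BF would of course be implemented in hardware at page granularity; the dictionary here only conveys the bidirectional pairing of input and associated addresses.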
BACKGROUND OF THE INVENTION
Shared memory multiprocessor systems allow each of multiple processors to reference any storage location (memory) in the system through read and write (load and store) operations. The underlying structure of the shared memory is hidden from the processors, or programs, except insofar as performance is concerned.
A single memory location may be updated by multiple processors. The result is a single sequence of updates, and all processors see the updates to that memory location in the same order. This property is known as “coherence”. On a coherent system, no processor can see a different order of updates than another processor.
Cache coherent, shared memory multiprocessor systems provide caches to the memory structure in order to improve performance (reduce latency) of memory accesses. Because the caches are kept coherent, the characteristic of a single sequence of updates for a given memory location, as seen by all processors in the system, is maintained.
The system architectures discussed in this patent are cache coherent, shared memory multiprocessor systems. Three specific variations of these systems are described below, namely, UMA, NUMA, and S-COMA.
“UMA” refers to Uniform Memory Access, and describes a system architecture wherein multiple processors in a computer system share a real address space, and the memory latency from any processor to any memory location is the same or uniform. That is, a given processor can reference any memory location in uniform time. Most modern symmetric multi-processors (SMP) are UMA systems.
FIG. 1 shows a typical UMA system 10 configuration. A number of processors 12 are connected to a common system bus 14, as is a memory 16. Because the path from any processor 12 to any location in memory 16 is the same (i.e., across the system bus), the latency from any processor to any memory location is the same.
FIG. 1 also shows caches 18. There must be a cache coherence protocol which manages caches 18 and ensures that updates to a single memory location are ordered, so that all processors will see the same sequence of updates. In UMA systems, such as the one depicted, this is frequently accomplished by having each cache controller “snoop” on the system bus. This involves observing all transactions on the bus, and taking action (i.e., participating in the coherence protocol) when an operation on the bus refers to a memory location which is being held in the snooper's cache.
The benefit in this type of organization is that parallel programming is simplified, in that processes can be less sensitive to data placement; i.e., data can be accessed in a particular amount of time, regardless of the memory location used to hold the data.
The drawback of this type of organization is that UMA systems do not scale well. As larger and larger systems are designed (with more and more processors and memory), it becomes increasingly difficult and costly to maintain the uniformity of memory access times. Furthermore, schemes which require cache controllers to snoop require a common communications medium, such as a common system bus, for data addresses. However, the system bus is a serial resource which becomes overloaded as more processors and more memory operations are placed on it. When the system bus is saturated, the addition of more or faster processors does not improve system performance.
A further system variation is “NUMA”, which refers to Non-Uniform Memory Access, and describes a system architecture wherein multiple processors in a computer system share a real address space where memory latency varies depending on the memory location being accessed. That is, some memory locations are “closer” to some processors than to others. Unlike in an UMA system, not all memory locations are accessible from a given processor in equal time; i.e., some memory locations take longer to access than others, hence memory access times are non-uniform.
As shown in FIG. 2, a NUMA system implements distributed shared memory; i.e., the total system memory is the sum of memories M1, M2, M3 in nodes 22. There is a single real address space which is shared by all the nodes 22 in the system 20 and, in FIG. 2, each node contains one third of the system memory. Each node 22 includes an UMA system 10. A number of nodes are connected to a common communications fabric or network 24, each through a Network Interface (NI) 26.
A processor in one node may access a memory location in another node via a load or store instruction. The NUMA Memory Controller (NMC) 28 function is responsible for capturing the memory request on the local node's system bus and forwarding it to the node which contains the target memory location (i.e., the home node). Because the path from one processor to a remote memory location is further than the path from the same processor to a local memory location, the memory access times are non-uniform.
As with the UMA system, caches are kept coherent through some protocol. All processors on all nodes will see the same ordered sequence of updates to a given memory location.
