Scaleable shared-memory multi-processor computer system...

Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Type: Reexamination Certificate
Filed: 1999-09-15
Issued: 2002-09-24
Examiner: Kim, Matthew (Department: 2186)
US Classes: C711S122000, C711S129000, C711S130000
Status: active
Patent Number: 06457100
ABSTRACT:
This invention provides a novel non-hierarchical nodal structure for a highly scaleable, high-performance shared-memory computer system with simplified manufacturability. The invention supports a large range of system scaleability using a small number of types of hardware chip components. The system may include a large number of replicated processor chips of each of these types, in which a large system memory is shareable by all processors in the system. The large shared memory generally comprises subsets of DRAM chips respectively connected to the processor chips (though other memory technologies, such as SRAM, can be substituted). Data in any DRAM subset is accessible to any processor in the system using the same address in an instruction being executed by that processor; thus, the same memory addresses may be used in the executable instructions of all processors in the system. A unique type of memory busing connects each processor chip to a respective subset of DRAMs in the shared memory, enabling faster memory access by the processor directly connected to that DRAM subset. Bus conflicts, which commonly occur in shared memories with prior-art memory bus designs, are minimized by this invention, even though all of the DRAMs in the same shared system memory are addressable by all processors. The subsets of DRAMs need not have equal sizes. A group of the DRAM subsets with their directly connected processors comprises a node of the shared-memory system, in which each node may have a nodal cache with a nodal directory and nodal electronic switches. Multiple nodes may be connected together by internodal buses between their nodal caches, combining all nodes into a single distributed shared-memory system in which the nodal directories manage processor accesses to and from, and the coherence of data in, all nodes comprising the system shared memory.
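For illustration only, and not part of the patent disclosure: the following C sketch shows one way the flat shared address described above might decode onto a node, a processor's DRAM subset, and an offset within it. The field widths and names are assumptions made for this sketch, and it assumes equal-size subsets for simplicity even though the abstract notes that subsets need not be equal.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical decode of a flat shared-memory address onto the nodal
 * structure described above.  Field widths are assumptions made for this
 * sketch; the patent allows unequal DRAM-subset sizes, which a real
 * implementation would handle with a configuration table instead. */
enum { NODE_BITS = 2, PROC_BITS = 3, OFFSET_BITS = 30 };

typedef struct {
    unsigned node;    /* node whose nodal cache/directory mediates the access   */
    unsigned proc;    /* processor chip whose attached DRAM subset holds the data */
    uint64_t offset;  /* byte offset within that DRAM subset                    */
} DecodedAddr;

static DecodedAddr decode(uint64_t addr) {
    DecodedAddr d;
    d.offset = addr & ((1ULL << OFFSET_BITS) - 1);
    d.proc   = (unsigned)(addr >> OFFSET_BITS) & ((1U << PROC_BITS) - 1);
    d.node   = (unsigned)(addr >> (OFFSET_BITS + PROC_BITS)) & ((1U << NODE_BITS) - 1);
    return d;
}

int main(void) {
    /* The same address decodes identically no matter which processor issues it. */
    DecodedAddr d = decode(0x1C0001234ULL);
    printf("node %u, proc %u, offset 0x%llx\n",
           d.node, d.proc, (unsigned long long)d.offset);
    return 0;
}
```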
BACKGROUND OF THE INVENTION
Prior Memory System Limitations
This invention does not use communication links or a “message protocol” to communicate among its nodes, as is often found in prior-art nodal systems. Prior systems often provide a memory in each node that operates independently of the memory in any other node and therefore cannot be an internodal shared memory. Such prior systems may include an intra-nodal shared memory within a node, limited to being shared only among the processors within that single node. Such prior systems do not, and cannot, allow access to their so-called shared memories by a processor in a different node without violating the coherence requirements essential to preserving the integrity of the data in the memories.
On the other hand, the subject invention allows internodal access to all of its nodal DRAMs by a processor in any node of a system while assuring system coherence of all data in all of the DRAMs in all nodes of the system. Further, the subject invention combines multiple, separately connected DRAMs into a single shared memory, whether the DRAMs are in a single-node system or a multiple-node system, that is usable by all processors in all nodes of the entire system. Thus, a processor in any node of this invention can address and access data located in any other node by a direct memory access, which may occur during processor execution of an instruction requiring an operand stored in a DRAM in a different node. No messaging or packet processing is used by this invention to access data within a node or between different nodes of a system.
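To make the contrast with message-based systems concrete, here is a hedged, self-contained C model (all names and sizes invented for illustration) of a single load: the global address alone selects either the local fast path or a hardware-routed internodal path, with no packets built or parsed by software.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative model only: 2 nodes x 2 processors, each processor owning a
 * tiny DRAM subset.  Together they form one flat shared memory. */
enum { NODES = 2, PROCS = 2, WORDS = 4 };

static uint64_t dram[NODES][PROCS][WORDS];  /* stand-in for the DRAM subsets    */
static unsigned internodal_hops;            /* traversals of internodal buses   */

/* A load issued by a processor in `my_node`.  The address is global: the
 * high fields name the owning node and processor, exactly as any other
 * processor in the system would address the same word. */
static uint64_t load(unsigned my_node, unsigned node, unsigned proc, unsigned word) {
    if (node != my_node)
        internodal_hops++;  /* routed node-to-node by the nodal caches and
                               directories, not by any software message protocol */
    return dram[node][proc][word];
}

int main(void) {
    dram[1][0][2] = 42;             /* data physically resides in node 1         */
    uint64_t v = load(0, 1, 0, 2);  /* ...but a node-0 processor loads it directly */
    printf("value %llu after %u internodal hop(s)\n",
           (unsigned long long)v, internodal_hops);
    return 0;
}
```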
Without internodal cache-coherence controls, accessing data from another node could destroy system data integrity. When data is copied between independent nodal memories for execution without adequate coherence controls, there is no assurance that the value of a copied data item will not be changed in a way uncoordinated with its other copies in the system, which could adversely affect the integrity of the system's data. Coherence controls prevent stale or unknown copies of a data item from being used, which could otherwise produce false processing results. The majority of prior art on coherency controls deals with intra-nodal shared memories, where a single centralized mechanism is used to maintain coherency.
The prior art dealing with internodal shared memories and distributed coherency mechanisms generally deals with one of three topics: 1) interconnect topologies scaling to a large number of nodes, with little attention to the details of maintaining cache coherency across nodes; 2) interface components for connecting the nodes to an interconnect network, again with little attention to the methodology of maintaining cache coherency across nodes; or 3) maintaining internodal cache coherency through the use of special coherency directories, coherency information stored with the memory arrays, or other special interface and switch components, which add extra cost and complexity to the system design and packaging.
In the prior art, shared-memory computer systems use hardware coherence controls to check all operand accesses, detecting and controlling all changes to data anywhere in the shared memory in order to maintain the integrity of the data. Coherence checking assures that a data item stored anywhere in the shared memory provides the same value at a given time to all processes using that data item, regardless of which process or processor in the system changes or uses it, and regardless of which part of the shared memory stores it.
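The passage above states the invariant without specifying a mechanism; the following generic directory sketch (a common textbook approach, with all names invented for illustration) shows one way the invariant is typically enforced: before any processor changes a line, every other cached copy is invalidated, so no stale version can be observed.

```c
#include <stdint.h>
#include <stdio.h>

/* One directory entry per memory line: which processors hold a copy,
 * and whether one of them holds it exclusively for writing. */
typedef struct {
    uint8_t sharers;  /* bitmask of processors caching the line */
    int     owner;    /* processor with write ownership, or -1  */
} DirEntry;

static void dir_read(DirEntry *e, int cpu) {
    e->owner = -1;                       /* any exclusive copy is demoted to shared */
    e->sharers |= (uint8_t)(1u << cpu);
}

static void dir_write(DirEntry *e, int cpu) {
    /* Invalidate every other cached copy before granting ownership, so a
     * stale value can never be observed elsewhere in the system. */
    e->sharers = (uint8_t)(1u << cpu);
    e->owner   = cpu;
}

int main(void) {
    DirEntry line = { 0, -1 };
    dir_read(&line, 0);
    dir_read(&line, 2);
    printf("sharers before write: 0x%x\n", line.sharers);  /* 0x5 */
    dir_write(&line, 3);
    printf("sharers after  write: 0x%x, owner %d\n",
           line.sharers, line.owner);                      /* 0x8, owner 3 */
    return 0;
}
```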
However, the design of the conventional shared-memory controller in prior shared-memory systems limits the scaleability of a system, because conventional controllers are generally designed for the maximum number of processors and the maximum memory size, so that they may be scaled up to that maximum system size even when the controller is installed in a configuration having a smaller number of processors and a smaller memory. As a consequence, the initial cost of such a conventional controller does not decrease for system sizes below the maximum, which restricts such conventional systems to a very narrow range of processor and memory scaleability.
Conventional shared-memory controllers often have a common bus provided between the memory controller and the shared memory. The common bus is shared by a large number of processors, and sometimes all processors, in the system. This bus sharing causes contention among all memory addresses concurrently competing for the bus, and only the winning address gets the next access to the shared memory. This all-address conflicting-bus controller design suffers from bandwidth limitations, decreasing the speed of concurrent access requests by multiple processors to the shared memory. Latency penalties are also suffered while processors wait for their access requests to use the conventional controller's shared bus. Such prior common storage-controller bus designs must therefore be built initially to handle the maximum traffic on the bus by the maximum number of processors in a system, which increases the cost of smaller systems using the same memory controller and its busing. Continued increases in semiconductor processor speed have increased the bandwidth and latency mismatch between the processors, their storage controller, and their common busing in prior-art shared-memory systems.
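As a back-of-the-envelope illustration of this contention (the model and its parameters are assumptions, not figures from the patent), the following C simulation of a round-robin arbiter shows the average wait per access growing with the number of processors sharing one bus; dedicated per-processor busing would incur no such arbitration wait.

```c
#include <stdio.h>

/* Toy model of the all-address conflicting-bus design described above:
 * N processors, each always ready with its next memory request, share one
 * bus that a round-robin arbiter grants to a single winner per cycle.
 * All parameters are illustrative assumptions, not figures from the patent. */
int main(void) {
    enum { N = 8, CYCLES = 10000 };
    long issued_at[N];  /* cycle at which each processor's pending request was issued */
    long total_wait = 0, served = 0;

    for (int p = 0; p < N; p++) issued_at[p] = 0;

    for (long cycle = 0; cycle < CYCLES; cycle++) {
        int winner = (int)(cycle % N);       /* only the winner accesses memory     */
        total_wait += cycle - issued_at[winner];
        served++;
        issued_at[winner] = cycle + 1;       /* winner immediately issues its next  */
    }

    printf("shared bus, %d processors: average wait %.1f cycles per access\n",
           N, (double)total_wait / (double)served);
    return 0;
}
```

With every processor always pending, the steady-state wait settles at N − 1 cycles per access (7.0 here), so the arbitration penalty scales directly with the number of processors sharing the bus.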
An example of a common bus provided between a memory and multiple processors within the same node is disclosed in U.S. Pat. No. 5,524,212 to Somani et al., which provides a centralized arbiter of a shared memory bus within its shared-memory bus controller for controlling a common memory bus internal to a node. That patent does not disclose an internodal shared memory.
Recent trends in semiconductor technology and software design are making the above-described bus-conflict problems more severe. The speed of on-chip CMOS circuits is increasing faster than the speed of off-chip drivers and their associated buses. Many prior-art designs already have internal processor speeds many times those of their off-chip buses, and the disparity continues to grow.
Inventors: Goldiran, Gottfried Andreas; Heller, Thomas James, Jr.; Ignatowski, Michael
Attorneys/Agents: Augspurger, Lynn L.; Goldman, Bernard M.
Examiners: Kim, Matthew; Peugh, B. R.