Electrical computers and digital processing systems: memory – Storage accessing and control – Shared memory area
Reexamination Certificate
2000-12-22
2004-05-18
Padmanabhan, Mano (Department: 2188)
C711S124000, C711S130000, C711S147000, C711S148000
active
06738870
ABSTRACT:
FIELD OF THE INVENTION
This invention is related to computer systems and particularly to a high speed remote storage controller.
BACKGROUND OF THE INVENTION
Today's e-business environment places great demands on the computer systems that drive its infrastructure. This is especially true in the areas of system performance and availability, due in large part to the increasing amount of data sharing and transaction processing inherent in large system applications. Another aspect of the e-business infrastructure is the unpredictability of the workloads, which requires the underlying computer systems to be highly scalable. However, the importance of additional performance and scalability must always be tempered by the cost of the systems.
Historically, system architects have used various means to achieve high performance in large tightly coupled symmetrical multiprocessor (SMP) computer systems. These range from coupling individual processors or processor clusters via a single shared system bus, to coupling processors together in clusters that communicate using a cluster-to-cluster interface, to a centrally interconnected network in which parallel systems built around a large number (e.g., 32 to 1024) of processors are interconnected via a central switch (e.g., a crossbar switch).
The shared bus method usually provides the most cost-efficient system design, since a single bus protocol can service multiple types of resources. Furthermore, additional processors, clusters or peripheral devices can be attached economically to the bus to grow the system. However, in large systems, the congestion on the system bus, coupled with the arbitration overhead, tends to degrade overall system performance and yield low SMP efficiency. These problems can be formidable for symmetric multiprocessor systems employing numerous processors, especially if they are running at frequencies that are two to four times faster than the supporting memory subsystem.
The centrally interconnected system usually offers the advantage of equal latency to shared resources for all processors in the system. In an ideal system, equal latency allows multiple applications, or parallel threads within an application, to be distributed among the available processors without any foreknowledge of the system structure or memory hierarchy. These types of systems are generally implemented using one or more large crossbar switches to route data between the processors and memory. The underlying design often translates into large pin packaging requirements and the need for expensive component packaging. In addition, it can be difficult to implement an effective shared cache structure.
The tightly coupled clustering method serves as the compromise solution. In this application, the term cluster refers to a collection of processors sharing a single main memory, in which any processor in the system can access any portion of the main memory, regardless of its affinity to a particular cluster. Unlike Non-Uniform Memory Access (NUMA) architectures, the clusters referred to in our examples utilize dedicated hardware to maintain data coherency between the memory and the hierarchical caches located within each cluster, thus presenting a unified single image to the software, devoid of any memory hierarchy or physical partitions such as memory bank interleaves. One advantage of these systems is that the tightly coupled nature of the processors within a cluster provides excellent performance when the data remains in close proximity to the processors that need it, as is the case when data resides in a cluster's shared cache or the memory bank interleaves attached to that cluster. In addition, it usually leads to more cost-efficient packaging when compared to the large N-way crossbar switches found in the central interconnection systems. However, the clustering method can lead to poor performance if processors frequently require data from other clusters and the ensuing latency is significant or the bandwidth is inadequate.
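The trade-off described above can be illustrated with a simple weighted-average model. The following is a minimal sketch, not part of the patent: the function name `average_access_latency` and the latency values of 50 ns (local) and 200 ns (remote) are hypothetical figures chosen only to show how the average access time grows as more requests must cross clusters.

```python
def average_access_latency(local_fraction, local_latency_ns, remote_latency_ns):
    """Return the expected access latency for a two-level (local/remote) model.

    local_fraction is the share of accesses served within the requesting
    cluster; the remainder is assumed to be served by a remote cluster.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    remote_fraction = 1.0 - local_fraction
    return local_fraction * local_latency_ns + remote_fraction * remote_latency_ns


if __name__ == "__main__":
    # Hypothetical latencies, for illustration only (not taken from the patent).
    LOCAL_NS = 50.0
    REMOTE_NS = 200.0
    for local_fraction in (1.0, 0.9, 0.5):
        latency = average_access_latency(local_fraction, LOCAL_NS, REMOTE_NS)
        print(f"local fraction {local_fraction:.1f}: {latency:.1f} ns average")
```

Under these assumed numbers, even a modest share of cross-cluster accesses raises the average noticeably, which is why the latency and bandwidth of the cluster-to-cluster interface matter so much in this class of system.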
One of the ways to combat the performance problem is the use of large shared caches within each cluster. Shared caches are inherently more efficient in large data sharing applications such as those typical of the e-business environment. But even in the most efficient system, the need eventually arises to transfer data across clusters. Therefore, system performance in these types of computer structures can be influenced by the latency involved with cross cluster data transfers. Historically, system performance issues tended to focus on processor fetch operations and minimizing the associated latency of data fetches from the hierarchical caches and main memory.
However, in complex systems like the IBM e-server Z-Series, the fetch is typically just one piece contributing to the system performance. For example, a fetch may necessitate casting aged data out of a clustered cache to make room for the desired fetch data. In addition, one processor's fetch may be competing for the inter-nodal data busses with work from the other processors and/or I/O adapters. These operations involve not only fetches for other processors, but also cast outs of aged data from a cache on one cluster to main memory on the remote cluster, or fetches and stores from the I/O adapters. The need to accommodate all these types of inter-nodal operations demands a multitude of large data busses between the clusters. Unfortunately, packaging restrictions typically limit the amount of available bandwidth on the inter-nodal data bus. Therefore, to truly maximize overall system throughput, performance improvements must be made to all types of inter-nodal data transfers, not just processor fetches.
With the disparate rate of advance between next-generation processors and memory, components such as the system memory controller become increasingly valuable to overall system throughput. The inventions cited herein provide many improvements in the area of memory and the corresponding controllers; however, they fail, both independently and in conjunction with each other, to address all aspects found in the present invention.
U.S. Pat. No. 5,664,162, entitled Graphics Accelerator with Dual Memory Controller, focuses on performing memory accesses with respect to a graphics processor. This invention teaches improvements pertaining to address format translations, frame buffer remapping, object drawing and other tasks related to rendering graphical images using a computer system. U.S. Pat. No. 5,239,639, entitled Efficient Memory Controller with an Independent Clock, provides a means to synchronize the timing of a memory controller with a CPU, without requiring the memory controller and CPU to share the same operating frequency. U.S. Pat. No. 5,896,492, entitled Maintaining Data Coherency Between a Primary Memory Controller and a Backup Memory Controller, describes a fault tolerant memory controller to ensure data availability in the event of a memory controller failure.
U.S. Pat. No. 5,835,947, entitled Central Processing Unit and Method for Improving Instruction Cache Miss Latencies Using an Instruction Buffer Which Conditionally Stores Additional Addresses, U.S. Pat. No. 3,611,315, entitled Memory Controller System for Controlling a Buffer Memory, and U.S. Pat. No. 5,778,422, entitled Data Processing System Memory Controller that Selectively Caches Data Associated with Write Requests, all concentrate on prefetching instructions or caching data accesses into memory buffers to reduce latency on subsequent CPU fetches. Although the aforementioned inventions teach various improvements in memory controllers, they all fail to address performance issues associated with accessing a shared memory in a symmetric multiprocessing (SMP) computer system.
U.S. Pat. No. 5,752,066, entitled Data Processing System Utilizing Programmable Microprogram Memory Controller, describes a single system-level interface to be presented to the operating system and application programs by allowing a plurality of memory configurations to be reprogrammed via micro code. Unlike
Blake Michael A.
Mak Pak-Kin
Van Huben Gary A.
Augspurger Lynn L.
International Business Machines - Corporation
Padmanabhan Mano
Ross John M