Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
2000-07-25
2002-03-12
Kim, Matthew (Department: 2186)
active
06356983
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates, in general, to microprocessor systems, and, more particularly, to systems and methods for providing cache coherency and atomic transactions in a multiprocessor computer system.
2. Relevant Background
Microprocessors manipulate data according to instructions specified by a computer program. The instructions and data in a conventional system are stored in memory that is coupled to the processor by a memory bus. Computer programs are increasingly compiled to take advantage of parallelism. Parallelism enables a complex program to be executed as a plurality of less complex routines run at the same time to improve performance.
Traditionally, microprocessors were designed to handle a single stream of instructions in an environment where the microprocessor had full control over the memory address space. Multiprocessor computer systems were developed to decrease execution time by providing a plurality of data processors operating in parallel. Early multiprocessor systems used special-purpose processors that included features specifically designed to coordinate the activities of the plurality of processors. Moreover, software was often specifically compiled to a particular multiprocessor platform. These factors made multiprocessing expensive to obtain and maintain.
The increasing availability of low-cost high performance microprocessors makes general purpose multiprocessing computers feasible. As used herein the terms “microprocessor” and “processor” include complex instruction set computers (CISC), reduced instruction set computers (RISC) and hybrids. However, general purpose microprocessors are not designed specifically for large scale multiprocessing. Some microprocessors support configurations of up to four processors in a system. To go beyond these limits, special purpose hardware, firmware, and software must be employed to coordinate the activities of the various microprocessors in a system.
Memory management is one of the more difficult coordination problems faced by multiprocessor system designers. Essentially, the problem is to ensure that each processor has a consistent view of data stored in memory at any given time while all of the processors are simultaneously accessing and modifying the memory contents. This problem becomes quite complex as the number of processors increases.
Two basic architectures have evolved, sometimes referred to as “shared memory” and “distributed memory”. Distributed memory assigns a unique range of memory to each processor or to a small group of processors. In a distributed memory system only the small number of processors assigned to a given memory space need to coordinate their memory access activities. This assignment greatly simplifies the tasks associated with accessing and manipulating the memory contents. However, because the memory is physically partitioned there is less synergism and therefore less performance gain achieved by adding processors.
It is advantageous if all of the processors share a common memory address space. In a shared memory system hardware and software mechanisms are used to maintain a consistent view of the memory from each processor's perspective. Shared memory enables the processors to work on related processes and share data between processes. Shared memory systems offer a potential for greater synergism between processors, but at the cost of greater complexity in memory management.
One of the key advances that has resulted in microprocessor performance improvements is the integration of cache memory on chip with other microprocessor functional units. Cache memory enables a processor to store copies of frequently used portions of data and instructions from the main memory subsystem in a closely coupled, low latency cache memory. When data is supplied by cache memory, the latency associated with main memory access is eliminated. Inherent in a cache memory system is a need to guarantee coherency between the data copy in one or more processor cache(s) and the main memory itself.
In a single processor system coherency is a relatively straightforward matter because only one cache system hierarchy exists for a given memory address space. Likewise in partitioned memory systems one, or at most a few, cache subsystems correspond to a single memory partition. However, in shared memory systems a given memory location may be stored in any cache of any processor in the system. In order for one processor to manipulate the data in its own cache or main memory it must ensure that no other processor can manipulate the data at the same time. In typical systems this requires high bandwidth communication between the cache subsystems and/or processors. Further, conventional microprocessors require specialized hardware support to scale the operating system and application software to operate on processor counts greater than four to eight processors.
Cache systems are organized as a plurality of cache lines that are typically accessed as an atomic unit. Each cache line is associated with a set of state information that indicates, for example, whether the cache line is valid (i.e., coherent with main memory), whether it is “dirty” (i.e., changed) and the like. Shared memory systems often use a (multi-state) protocol where each cache line includes state information. For example, a “MESI” protocol includes state information indicating whether the cache line is modified, exclusive, shared or invalid (MESI). Alternative coherency protocols include “update” protocols that send a new data value to each processor holding a cache copy of the value to update each cache copy. Ownership protocols pass an owner token among caches to indicate which cache has write permission and which cache holds the most recent version of the data.
In a MESI system, when an unshared cache line is accessed it is marked exclusive (E). A subsequent read does not change the state, but a subsequent write to the cache line changes the state to modified (M). If another processor is seen to load the data into that processor's cache, the line is marked shared (S). In order to write data to a shared cache line, an invalidate command must be sent to all processors, or at least to all processors having a copy of the shared data. Before a processor can load data from a modified line, the processor having the modified cache line must write the data back to memory and re-mark the line as shared. Any read or write to a cache line marked invalid (I) results in a cache miss.
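The MESI transitions described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the mechanism of the invention: it tracks a single cache line per processor, the class name `LineTracker` and its methods are hypothetical, and it assumes (as in conventional MESI) that a writer ends in the modified state after invalidating other copies.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class LineTracker:
    """Hypothetical model: the MESI state of ONE cache line in each processor."""

    def __init__(self, n_procs):
        self.states = [State.INVALID] * n_procs
        self.writebacks = 0  # number of modified lines written back to memory

    def _other_holders(self, p):
        return [q for q, s in enumerate(self.states)
                if q != p and s is not State.INVALID]

    def read(self, p):
        """Processor p loads the line; returns 'hit' or 'miss'."""
        if self.states[p] is not State.INVALID:
            return "hit"  # a read of a valid line does not change its state
        holders = self._other_holders(p)
        for q in holders:
            if self.states[q] is State.MODIFIED:
                self.writebacks += 1  # owner writes data back before sharing
            self.states[q] = State.SHARED
        # Unshared line is marked exclusive; otherwise it is shared.
        self.states[p] = State.SHARED if holders else State.EXCLUSIVE
        return "miss"

    def write(self, p):
        """Processor p stores to the line; returns 'hit' or 'miss'."""
        result = "miss" if self.states[p] is State.INVALID else "hit"
        for q in self._other_holders(p):
            if self.states[q] is State.MODIFIED:
                self.writebacks += 1
            self.states[q] = State.INVALID  # invalidate command to other copies
        self.states[p] = State.MODIFIED  # assumption: writer becomes modified
        return result
```

For example, with two processors, a first read by processor 0 misses and leaves the line exclusive, a write by processor 0 makes it modified, and a read by processor 1 then forces one write-back and leaves both copies shared.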
Similar issues exist for any atomic memory operation. An atomic memory operation is one in which a read or write operation is made to a shared memory location. Even when the shared memory location is uncached, the atomic memory operation must be completed in a manner that ensures that any processors that are accessing the shared memory location are prevented from reading the location until the atomic operation is completed.
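As a software analogy only, the requirement that other accessors be held off until an atomic operation completes can be sketched with a lock. The names `SharedLocation`, `atomic_update`, and `run_increment_workers` are hypothetical, and a hardware implementation would use bus or cache mechanisms rather than a Python lock.

```python
import threading


class SharedLocation:
    """Hypothetical shared memory location guarded by a single lock."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        # Readers wait here while an atomic update is in progress.
        with self._lock:
            return self._value

    def atomic_update(self, fn):
        """Apply fn to the value as one indivisible read-modify-write."""
        with self._lock:
            old = self._value
            self._value = fn(old)
            return old


def run_increment_workers(n_threads, n_increments):
    """Have several threads increment one location; return the final value."""
    loc = SharedLocation(0)

    def worker():
        for _ in range(n_increments):
            loc.atomic_update(lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return loc.read()
```

Because every read-modify-write holds the lock for its full duration, no increment is lost: four threads performing 1000 increments each leave the location at 4000.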
In shared memory multiprocessor systems in which all processors and main memory are physically connected using a common bus, a processor can query or “snoop” this state information of the other processors. Moreover, the requesting processor can manipulate this state information to obtain desired access to a given memory location by, for example, causing a cache line to be invalidated. However, accessing the cache of each processor by snooping is a time consuming process. Moreover, it is disruptive because the snoop request must arbitrate for cache access with the multiple ongoing cache requests generated by the processor's efforts to execute its own instructions. As the number of processors grows the overhead associated with this type of coherency protocol becomes impractical.
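The growth in snoop overhead can be estimated with simple arithmetic. The sketch below assumes a broadcast scheme in which every coherency transaction from one processor must be looked up in the cache of every other processor; the function name `snoop_lookups` and the uniform transaction count per processor are illustrative assumptions, not figures from the source.

```python
def snoop_lookups(n_procs, transactions_per_proc):
    """Total cache lookups caused by snooping, under the broadcast assumption.

    Each of the n_procs processors issues transactions_per_proc transactions,
    and each transaction is checked against the other (n_procs - 1) caches.
    """
    return n_procs * (n_procs - 1) * transactions_per_proc


# Doubling the processor count from 4 to 8 more than quadruples the lookups.
four_way = snoop_lookups(4, 10)   # 4 * 3 * 10 = 120
eight_way = snoop_lookups(8, 10)  # 8 * 7 * 10 = 560
```

Under this assumption the total snoop traffic grows roughly with the square of the processor count, which is consistent with the scaling concern stated above.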
Other coherency systems require that the processors provide replacement hints to the memory management system. These hints provide a mechanism by which the processors cooperate in the indication of the current state of a cache line (e.g., whether the cache line remains exclusive to a particular processor). Alth
Elmore Stephen
Hogan & Hartson LLP
Kim Matthew
Kubida William J.
Langley Stuart T.