Symmetric multiprocessor coherence mechanism

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

C711S122000, C711S119000

active

06760819

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to data processing systems in general and, in particular, to improved cache operations within a data-processing system. Still more particularly, the present invention relates to an improved method, system, and processor cache topology that more efficiently supports cache coherency operations within a data-processing system.
2. Description of the Prior Art
A data-processing system typically includes a processor coupled to a variety of storage devices arranged in a hierarchical manner. In addition to main memory, a commonly employed storage device in the hierarchy is a high-speed memory known as a cache memory (or cache). A cache speeds up the apparent access time of the relatively slow main memory by retaining the data or instructions that the processor is most likely to access again and making them available to the processor at much lower latency. As such, caches enable relatively fast access to a subset of data and/or instructions that were recently transferred from main memory to the processor, and thus improve the overall speed of the data-processing system.
Most contemporary high-performance data processing system architectures include multiple levels of cache memory within the memory hierarchy, with successive levels typically having progressively longer access latencies. Smaller, faster caches are employed at levels of the storage hierarchy closer to the processor (or processors), while larger, slower caches are employed at levels closer to system memory.
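To make the latency trade-off of a multi-level hierarchy concrete, the sketch below computes the average memory access time (AMAT) of a cache hierarchy. The function name `amat` and all latencies and hit rates are hypothetical illustration values, not figures from this patent; hit rates are treated as local (per-level) rates.

```python
def amat(levels, memory_latency):
    """Average memory access time for a cache hierarchy.

    levels: list of (hit_latency_cycles, local_hit_rate) tuples, ordered from
    the level closest to the processor outward. Every access that reaches a
    level pays that level's latency; misses fall through to the next level,
    and whatever misses in all levels pays the main-memory latency.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that reach the current level
    for latency, hit_rate in levels:
        total += reach * latency
        reach *= (1.0 - hit_rate)
    total += reach * memory_latency
    return total


# Hypothetical hierarchy: 2-cycle L1 (90% hits), 12-cycle L2 (80% local hits),
# 200-cycle main memory.
print(round(amat([(2, 0.90), (12, 0.80)], 200), 3))  # prints 7.2
```

Under these assumed numbers, the small fast L1 and the larger slower L2 together bring the average access time down from 200 cycles (memory only) to about 7.2 cycles.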
In a conventional symmetric multiprocessor (SMP) data processing system, all of the processors are generally identical, insofar as the processors all utilize common instruction sets and communication protocols, have similar hardware architectures, and are generally provided with similar memory hierarchies. For example, a conventional SMP data processing system may comprise a system memory, a plurality of processing elements that each include a processor and one or more levels of cache memory, and a system bus coupling the processing elements to each other and to the system memory. Many such systems include at least one level of cache memory shared between two or more processors. To obtain valid execution results in an SMP data processing system, it is important to maintain a coherent memory hierarchy, that is, to provide a single view of the contents of memory to all of the processors.
A coherent memory hierarchy is maintained through the use of a selected memory coherency protocol, such as the MESI protocol. In the MESI protocol, an indication of a coherency state is stored in association with each coherency granule (i.e., cache line) of at least all upper level (cache) memories. Each coherency granule can have one of four states, modified (M), exclusive (E), shared (S), or invalid (I), which can be encoded by two bits in the cache directory. Those skilled in the art are familiar with the MESI protocol and its use to ensure coherency among memory structures.
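As an illustration of how the four MESI states fit in two directory bits and how a cache reacts to snooped traffic, the sketch below uses a hypothetical bit assignment (the text only says two bits suffice) and one common textbook variant of the snoop transitions, in which a snooped read demotes a Modified line to Shared after a write-back. The names `Mesi`, `snoop_read`, and `snoop_write` are illustrative, not taken from the patent.

```python
from enum import IntEnum


class Mesi(IntEnum):
    # Hypothetical 2-bit encoding of the four MESI states.
    INVALID = 0b00
    SHARED = 0b01
    EXCLUSIVE = 0b10
    MODIFIED = 0b11


def snoop_read(state):
    """Another processor's read was snooped.

    Returns (next_state, needs_writeback).
    """
    if state == Mesi.MODIFIED:
        # Dirty data must be written back; a clean shared copy is kept.
        return Mesi.SHARED, True
    if state == Mesi.EXCLUSIVE:
        return Mesi.SHARED, False
    # SHARED stays SHARED and INVALID stays INVALID.
    return state, False


def snoop_write(state):
    """Another processor's write (or read-with-intent-to-modify) was snooped.

    The local copy is invalidated; only a MODIFIED line needs a write-back.
    """
    return Mesi.INVALID, state == Mesi.MODIFIED
```

For example, `snoop_read(Mesi.MODIFIED)` returns `(Mesi.SHARED, True)`, while `snoop_write(Mesi.SHARED)` returns `(Mesi.INVALID, False)`.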
Each cache line (block) of data in an SMP system typically includes an address tag field, a state bit field, an inclusivity bit field, and a value/data field for storing the actual instruction or data. In current processing systems, both the address tag field and the state bit field are contained in a cache directory. This cache directory may be organized under any available caching scheme, such as fully associative, direct mapped, or set-associative, as is well known in the art. A match between an incoming address and one of the tags within the address tag field indicates a cache "hit."
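The tag-compare step can be sketched for a set-associative directory as follows. The geometry (64-byte lines, 64 sets, 4 ways), the class and method names, and the fixed-way `install` (no replacement policy) are hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64   # hypothetical line size
NUM_SETS = 64     # hypothetical number of sets
WAYS = 4          # hypothetical associativity

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 6 bits of byte offset
INDEX_BITS = NUM_SETS.bit_length() - 1      # 6 bits of set index


@dataclass
class DirEntry:
    tag: int = 0
    state: str = "I"   # MESI state letter; "I" means the entry is not valid


@dataclass
class Directory:
    sets: list = field(default_factory=lambda: [[DirEntry() for _ in range(WAYS)]
                                                 for _ in range(NUM_SETS)])

    @staticmethod
    def split(addr):
        """Split an address into (tag, set index); the offset bits are dropped."""
        index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        return tag, index

    def lookup(self, addr):
        """Compare the incoming tag against every way of the selected set.

        Returns the matching valid entry on a hit, or None on a miss.
        """
        tag, index = self.split(addr)
        for entry in self.sets[index]:
            if entry.state != "I" and entry.tag == tag:
                return entry
        return None

    def install(self, addr, state, way=0):
        """Record a line in a given way (replacement policy omitted)."""
        tag, index = self.split(addr)
        self.sets[index][way] = DirEntry(tag, state)


d = Directory()
d.install(0x12345, "E")
print(d.lookup(0x12340) is not None)        # same line, different byte: hit
print(d.lookup(0x12345 + 4096) is None)     # same set, different tag: miss
```

Only the ways of one set are compared, which is what makes a set-associative lookup cheaper than searching a fully associative directory.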
Current implementations of a coherent symmetric multiprocessor (SMP) require a coherence bus on which all memory transactions that change the state of any line in the caches can be "snooped" (i.e., observed) by all processors. In response to a snoop operation by a processor, every processor must interrogate its cache directory to determine whether the line involved in the transaction is cached in that processor's cache. This process may also involve broadcasting the snoop onto the coherency buses. If a matching directory entry is found, indicating that the cache line is present, the cache line may have to be written back to the next level of cache, written back to main memory, or invalidated, depending on the transaction observed. This coherency scheme has the disadvantage that either the processors must arbitrate for a coherence bus, and thus incur delays, or a separate coherence bus must be provided for each processor, as illustrated in FIG. 1A, requiring a large number of external connections (pins) on the limited real estate of the processor.
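The conventional snoop process described above, in which every other processor interrogates its own L1 directory for each observed transaction, can be sketched as follows for a snooped write. Each L1 directory is modeled as a hypothetical dict from line address to MESI state letter, and `broadcast_snoop_write` is an illustrative name.

```python
def broadcast_snoop_write(l1_directories, line, requester):
    """Conventional bus snooping of a write to `line` by `requester`.

    Every other processor looks the line up in its own L1 directory. A
    MODIFIED copy must be written back; any valid copy is invalidated.
    Returns (cpus_that_must_write_back, number_of_directory_lookups).
    """
    writebacks = []
    lookups = 0
    for cpu, directory in enumerate(l1_directories):
        if cpu == requester:
            continue
        lookups += 1                      # each snooper interrogates its directory
        state = directory.get(line, "I")
        if state == "M":
            writebacks.append(cpu)        # dirty data goes back before invalidation
        if state != "I":
            directory[line] = "I"         # invalidate the stale copy
    return writebacks, lookups


dirs = [{0x40: "S"}, {0x40: "M"}, {}, {0x40: "S"}]
print(broadcast_snoop_write(dirs, 0x40, requester=0))  # prints ([1], 3)
```

Note that the lookup count grows with the number of processors whether or not any of them actually holds the line, and each lookup requires the snoop to reach that processor over a coherency bus.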
As shown in FIG. 1A, the SMP comprises four processing modules 101A-101D, each having a respective central processing unit (CPU) 103A-103D and level 1 (L1) cache 105A-105D. L1 caches 105A-105D each have an associated L1 directory 107A-107D, and these directories are interconnected via a series of cache coherency buses 111. Cache coherency buses 111 extend from pins (connectors) of each of processing modules 101A-101D to pins of the other processing modules and to L2 cache 109. The number of pins required for these connections, and the real estate required for the coherency buses, depend on the number of processors within the multiprocessing system that support coherency operations. Thus, with current 32-way, 64-way, and larger SMPs, the number of required pins and the complexity of the coherency buses may be prohibitive to further development of large SMPs on progressively smaller real estate.
An alternative coherency scheme currently in use provides "directory-based coherence," in which the state information of the L1 directories is included in the L2 directory. FIG. 1B illustrates this coherency scheme. As shown, L2 directory 156 contains the directory entries of L1 directories 155A-155D. However, since there are typically many more lines in L2 cache 159 than in the combined L1 caches of processors 151A-151D, the directory-based scheme uses more chip area, requires a large amount of storage devoted to coherence, and takes longer to interrogate (snoop) the L1 directory entries, because the entire L2 directory 156 has to be searched.
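The storage cost of directory-based coherence can be estimated with simple arithmetic. The sketch assumes, hypothetically, that every L2 directory entry carries a 2-bit L1 state field for each processor; the cache sizes are illustrative values, not taken from FIG. 1B.

```python
def directory_coherence_bits(l2_lines, n_processors, state_bits=2):
    """Bits spent tracking L1 state when every L2 directory entry carries
    one state field per processor's L1 (hypothetical layout)."""
    return l2_lines * n_processors * state_bits


def l1_state_bits(l1_lines_per_cpu, n_processors, state_bits=2):
    """Bits of state actually held in the L1 directories themselves."""
    return l1_lines_per_cpu * n_processors * state_bits


# Hypothetical sizes: 16K-line L2, four CPUs with 512-line L1 caches.
print(directory_coherence_bits(16384, 4), l1_state_bits(512, 4))  # prints 131072 4096
```

Under these assumptions the L2-resident tracking is 32 times larger than the L1 state it describes (the ratio of L2 lines to L1 lines per processor). Because the four L1 caches together hold at most 2048 lines, at least 14336 of the 16384 tracked L2 entries describe lines that no L1 cache holds, which illustrates why the text says so much storage is devoted to coherence.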
In light of the foregoing, the present invention recognizes that it would be desirable to provide a processor-cache configuration that supports more efficient coherency operations without requiring additional hardware. A processor-cache configuration that reduces the number of cache coherency buses and the associated coherency bus transactions required to support coherency would be a welcome improvement. These and other benefits are provided by the invention described herein.
SUMMARY OF THE INVENTION
Disclosed is a processor-cache configuration and operational scheme, within a multiprocessor data processing system having a shared lower level cache (or memory), that reduces the number of coherency buses and provides more efficient snoop resolution and coherency operations for the processor caches. A copy of the processor's internal (L1) cache directory is provided within the lower level (L2) cache or memory. Lower level snoop operations and coherency operations directed to the L1 cache are evaluated and completed using the copy of the L1 directory in the L2 cache. Updates to the coherency states in the copy of the L1 directory are mirrored in the L1 directory and L1 cache. The configuration and operational scheme eliminates the need for individual coherency buses interconnecting each processor that is coupled to the L2 cache, and speeds up coherency operations because snoops do not have to be transmitted to the L1 caches for initial resolution.
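The summarized scheme can be sketched as follows: the L2 holds a copy of each processor's L1 directory, resolves a snooped write against those copies, and forwards a state change to an L1 only when the copy shows that the line is actually present there. This is a minimal sketch under stated assumptions; the class name, the dict-based directories, and the `messages_to_l1` counter are hypothetical, and the data movement for write-backs is only recorded, not modeled.

```python
class L2WithL1DirectoryCopies:
    """Hypothetical model: the L2 keeps a copy of every L1 directory and
    resolves snoops against the copies, mirroring changes into the L1s."""

    def __init__(self, n_processors):
        # The real L1 directories (line address -> MESI state letter).
        self.l1_dirs = [dict() for _ in range(n_processors)]
        # The copies of those directories held in the L2.
        self.l1_copies = [dict() for _ in range(n_processors)]
        self.messages_to_l1 = 0   # updates pushed down to the processors

    def l1_fill(self, cpu, line, state):
        """A processor caches a line; the L1 directory and its copy both record it."""
        self.l1_dirs[cpu][line] = state
        self.l1_copies[cpu][line] = state

    def snoop_invalidate(self, line, requester):
        """Resolve a snooped write using only the copies held in the L2.

        Returns the processors whose MODIFIED copies must be written back.
        """
        writebacks = []
        for cpu, copy in enumerate(self.l1_copies):
            if cpu == requester:
                continue
            state = copy.get(line, "I")
            if state == "I":
                continue                   # miss in the copy: that L1 is not disturbed
            if state == "M":
                writebacks.append(cpu)     # dirty line must be written back
            copy[line] = "I"               # update the copy in the L2...
            self.l1_dirs[cpu][line] = "I"  # ...and mirror the change into the L1
            self.messages_to_l1 += 1
        return writebacks


l2 = L2WithL1DirectoryCopies(4)
l2.l1_fill(1, 0x80, "M")
l2.l1_fill(2, 0x80, "S")
print(l2.snoop_invalidate(0x80, requester=0), l2.messages_to_l1)  # prints [1] 2
print(l2.snoop_invalidate(0xC0, requester=0), l2.messages_to_l1)  # prints [] 2
```

In this sketch, a snoop to a line that no L1 holds (the second call) is resolved entirely within the L2 without contacting any processor, which mirrors the claimed benefit that snoops need not be transmitted to the L1 caches for initial resolution.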
In the preferred embodiment, the L1 directory and L1 directory copy are initialized during system boot. A processor request for update is received and a check is made for a snoop hit in the L2 cache and in the copy of the L1 d
