Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
2002-02-15
2004-10-19
Gaffin, Jeffrey A. (Department: 2182)
C711S141000, C710S020000
active
06807608
ABSTRACT:
TECHNICAL FIELD
The present invention relates to the field of cache coherency in a multiprocessor environment, and more particularly to a multiprocessor system supporting issuing and receiving requests of multiple coherency granules.
BACKGROUND INFORMATION
A multiprocessor system may comprise multiple processors coupled to a common shared system memory. Each processor may comprise one or more levels of cache memory (cache memory subsystem). The multiprocessor system may further comprise a system bus coupling the processing elements to each other and to the system memory. A cache memory subsystem may refer to one or more levels of a relatively small, high-speed memory that is associated with a particular processor and stores a copy of information from one or more portions of the system memory. The cache memory subsystem is physically distinct from the system memory.
A given cache memory subsystem may be organized as a collection of spatially mapped, fixed size storage region pools commonly referred to as “sets.” Each of these storage region pools typically comprises one or more storage regions of fixed granularity. These storage regions may be freely associated with any equally granular storage region (storage granule) in the system as long as the storage region spatially maps to the set containing the storage region pool. The position of the storage region within the pool may be referred to as the “way.” The intersection of each set and way contains a cache line. The size of the storage granule may be referred to as the “cache line size.” A unique tag may be derived from an address of a given storage granule to indicate its residency in a given set/way position.
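The set, way, and tag relationship described above can be illustrated with a small sketch. It assumes a hypothetical cache with a 32-byte cache line size and 128 sets; these numbers, the `CacheAddress` type, and the `split_address` function are illustrative choices and do not come from the patent.

```python
from typing import NamedTuple


class CacheAddress(NamedTuple):
    tag: int        # identifies which storage granule occupies a set/way position
    set_index: int  # selects the set (storage region pool) the granule maps to
    offset: int     # byte position inside the cache line


def split_address(addr: int, line_size: int = 32, num_sets: int = 128) -> CacheAddress:
    """Split an address into tag, set index, and byte offset.

    line_size and num_sets are assumed example values, not taken from the patent.
    """
    offset = addr % line_size          # position within the storage granule
    line_number = addr // line_size    # which storage granule in memory
    set_index = line_number % num_sets # spatial mapping of the granule to a set
    tag = line_number // num_sets      # remaining bits distinguish granules in a set
    return CacheAddress(tag, set_index, offset)


# Address 0x12345 falls at byte 5 of a line that maps to set 26 with tag 18.
example = split_address(0x12345)
```

Any way within set 26 could hold this line; the stored tag is what lets a later lookup confirm that the line present in a given way is the one requested.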
When a processor generates a read request and the requested data resides in its cache memory subsystem, e.g., L1 cache, then a cache read hit takes place. The processor may then obtain the data from the cache memory subsystem without having to access the system memory. If the data is not in the cache memory subsystem, then a cache read miss occurs. The memory request may be forwarded to the system and the data may subsequently be retrieved from the system memory as would normally be done if the cache did not exist. On a cache miss, the data that is retrieved from the system memory may be provided to the processor and may also be written into the cache memory subsystem due to the statistical likelihood that this data will be requested again by that processor. Likewise, if a processor generates a write request, the write data may be written to the cache memory subsystem without having to access the system memory over the system bus.
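The read hit and read miss behavior described above can be sketched with a toy model. The `SimpleCache` class, its 4-word line size, and the use of a Python list as "system memory" are assumptions made for illustration only; the sketch omits sets, ways, and eviction.

```python
class SimpleCache:
    """Toy read-only cache in front of a list that stands in for system memory."""

    def __init__(self, memory, line_size=4):
        self.memory = memory        # hypothetical backing store (system memory)
        self.line_size = line_size  # assumed line size, in list elements
        self.lines = {}             # line number -> cached copy of that line
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        line_no = addr // self.line_size
        if line_no in self.lines:
            # Cache read hit: no access to system memory is needed.
            self.hits += 1
        else:
            # Cache read miss: fetch the whole line from system memory and keep
            # a copy, since the same data is likely to be requested again.
            self.misses += 1
            base = line_no * self.line_size
            self.lines[line_no] = self.memory[base:base + self.line_size]
        return self.lines[line_no][addr % self.line_size]


memory = list(range(100, 116))
cache = SimpleCache(memory)
first = cache.read(5)   # miss: line 1 is fetched from memory
second = cache.read(6)  # hit: line 1 is already cached
```

Because the cache keeps its own copy of the line, a later change to `memory` would not be visible through `cache.read`.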
Hence, data may be stored in multiple locations, e.g., the cache memory subsystem of a particular processor as well as system memory. If another processor altered the contents of a system memory location that is duplicated in a first processor's cache memory subsystem, the cache memory subsystem may be said to hold “stale” or invalid data. Problems may result if the first processor inadvertently referenced this stale data on a subsequent read. Therefore, it may be desirable to ensure that data is consistent between the system memory and caches. This may commonly be referred to as “maintaining cache coherency.” In order to maintain cache coherency, it may be necessary to monitor the system bus when the processor does not control the bus to see if another processor accesses system memory. This method of monitoring the bus is referred to in the art as “snooping.”
Each processor's cache memory subsystem may comprise a snooping logic unit configured to monitor the bus for the addresses requested by other processors. Each snooping logic unit may further be configured to determine if a copy of an address requested by another processor is within the cache memory subsystem associated with the snooping logic unit. The snooping logic unit may make this determination using a protocol commonly referred to as Modified, Exclusive, Shared and Invalid (MESI). In the MESI protocol, an indication of a coherency state is stored in association with each unit of storage in the cache memory subsystem. This unit of storage is referred to as a coherency granule and is typically the size of a cache line. Each coherency granule may have one of four states, modified (M), exclusive (E), shared (S), or invalid (I), which may be indicated by two or more bits in the cache directory. The modified state may indicate that a coherency granule is valid only in the cache memory subsystem containing the modified or updated coherency granule and that the value of the updated coherency granule has not been written to system memory. When a coherency granule is indicated as exclusive, the coherency granule is resident in only the cache memory subsystem having the coherency granule in the exclusive state. However, the data in the exclusive state is consistent with system memory. If a coherency granule is marked as shared, the coherency granule is resident in the associated cache memory subsystem and may be in at least one other cache memory subsystem in addition to the system memory. If the coherency granule is marked as shared, all of the copies of the coherency granule in all cache memory subsystems so marked are consistent with the system memory. Finally, the invalid state may indicate that the data and the address tag associated with the coherency granule are both invalid and thus are not contained within that cache memory subsystem.
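As a rough illustration of how a snooping logic unit might use the four MESI states, the sketch below maps a granule's current state and the kind of access observed on the bus to a next state. The text above defines only the states; the transitions shown are the commonly taught textbook ones and are an assumption here, as are the names `State` and `snoop`.

```python
from enum import Enum
from typing import Tuple


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def snoop(state: State, other_is_write: bool) -> Tuple[State, bool]:
    """Return (next_state, must_write_back) for one coherency granule.

    Assumed textbook MESI behavior, not taken from the patent text:
    - a snooped write by another processor invalidates the local copy;
    - a snooped read leaves a valid local copy in the shared state;
    - a modified copy must first be written back, since memory is out of date.
    """
    if state is State.INVALID:
        # The granule is not held locally, so there is nothing to do.
        return State.INVALID, False
    must_write_back = state is State.MODIFIED
    if other_is_write:
        return State.INVALID, must_write_back
    return State.SHARED, must_write_back
```

For example, `snoop(State.MODIFIED, False)` yields `(State.SHARED, True)`: the local copy stays valid but the updated value has to reach system memory before the other processor reads it.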
Typically, in a multiprocessor system, the cache memory subsystems associated with the various processors may comprise a plurality of cache line sizes. Such a system may be considered a heterogeneous multiprocessor system. In such a system, the size of the coherency granule for the system is considered to be the size of the smallest coherency granule for any entity within the system. Thus, when a processor with a relatively larger cache line size performs a read or write operation for a cache line in the system, the operation may be associated with a plurality of coherency granules in the system. Similarly, a system may contain some non-processor entities, such as an I/O device or a DMA (Direct Memory Access) controller. Such non-processor entities may also perform operations in the system, which are associated with a particular block of memory. The size of the operation may vary and may consist of a plurality of coherency granules within the system.
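The relationship between an operation's size and the system coherency granule can be made concrete with a short sketch. The line sizes (32, 64, and 128 bytes), the addresses, and the function names are hypothetical values chosen for illustration.

```python
from typing import Iterable, List


def system_granule_size(line_sizes: Iterable[int]) -> int:
    """The system coherency granule is the smallest granule of any entity."""
    return min(line_sizes)


def granules_touched(addr: int, length: int, granule_size: int) -> List[int]:
    """Return the base address of every coherency granule an operation covers."""
    first = addr // granule_size
    last = (addr + length - 1) // granule_size
    return [g * granule_size for g in range(first, last + 1)]


# Assumed heterogeneous system: caches with 32-, 64-, and 128-byte lines.
granule = system_granule_size([32, 64, 128])

# A 128-byte cache line read spans four 32-byte coherency granules.
line_read = granules_touched(0x1000, 128, granule)

# A 48-byte DMA transfer that starts mid-granule spans two granules.
dma_op = granules_touched(0x1010, 48, granule)
```

Under these assumptions, `line_read` holds four granule addresses and `dma_op` holds two, so the coherency status of each of those granules would be relevant to the respective operation.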
When an operation is associated with a plurality of coherency granules, then as part of the operation the snooping logic associated with each processor may examine the coherency status of each of these coherency granules and respond accordingly. This may be accomplished by performing the operation as a series of independent requests where each request may consist of a single coherency granule. By issuing separate requests for each coherency granule involved in the operation, several additional bus cycles may be used and additional power may be consumed. These additional bus cycles and additional power may be associated with the independent requests themselves and the responses by the slaves to those independent requests. The additional bus cycles and additional power may also be associated with the independent snooping operations that may be performed by the snooping logic associated with each of the processors in the system. Alternatively, the system may perform the multi-coherency granule operation as a single request, but the snooping logic associated with each processor in the system may provide a single snoop response for the entire operation. The system in turn may have to wait for the snooping logic associated with each processor in the system to complete all of the snoop operations associated with the request before proceeding to initiate the transfer of data between the master entity making the request and the slave device for which the request is targeted. Again this procedure involves additional delay in performing the
Augsburg Victor Roberts
Dieffenderfer James Norris
Drerup Bernard Charles
Hofmann Richard Gerard
Sartorius Thomas Andrew
Gaffin Jeffrey A.
International Business Machines - Corporation
Kim Harold
Reid Scott W.
Winstead Sechrest & Minick P.C.