Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
1999-09-02
2003-09-02
Lane, Jack A. (Department: 2186)
active
06615323
ABSTRACT:
TECHNICAL FIELD
The present invention relates to high-performance multiprocessor memory subsystems, and more particularly to maintaining cache coherency on multiprocessors using a snoopy protocol.
BACKGROUND OF THE INVENTION
Symmetric multiprocessing systems include multiple processing elements connected to main memory via an interface. Each processing element may contain one or more levels of cache, i.e. fast memory close to the central processing unit. A cache holds copies of memory data in blocks, typically 32 or 64 bytes. Each cache block can contain an exact copy of the main memory data, or newer data if the processor has performed stores to the block. Only one processing element may own, i.e. contain, data which differs from main memory. To enforce this cache coherency rule, each cache block contains state information.
In the MESI protocol, a cache block may be in one of four states:
INVALID: the block is not resident in the cache;
SHARED: the block is resident in the cache and contains the latest data, which is the same data as main memory, but other processing elements may also have copies;
EXCLUSIVE: the block is resident in the cache and contains the latest data, which is the same data as main memory, and no other processing element has a copy; or
MODIFIED: the block is resident in the cache and contains the latest data, which may be newer than that in the main memory.
If a processing element performs a load to a block marked Invalid in its cache, it must perform a load request to main memory. If a processing element performs a store to a block marked Shared or Invalid in its cache, it must perform a store request to main memory. In the case of a block marked Shared, the purpose of the store request is not to read the data, since the processing element already has a copy of the latest data, but to claim ownership of the block, i.e. set the state to Modified, and to ensure that no other processing element retains a stale copy of the data. A block that is otherwise valid is stale if it does not contain the latest data.
If a processing element performs a store to a block marked Exclusive, the processing element may mark the block Modified without performing a bus request, since Exclusive implies that no other processing element has a copy of the data. Loads to blocks marked Shared, Exclusive, or Modified, or stores to blocks marked Modified, may also be performed without a bus request, and do not change the MESI state of the block.
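The local load and store rules above can be summarized in a small lookup function. This is only an illustrative sketch, not code from the patent: the function name `local_access`, the single-letter state codes, and the `others_have_copy` flag (standing in for the bus responses that decide whether a freshly loaded block becomes Shared or Exclusive) are all assumptions made for the example.

```python
# Hypothetical sketch of MESI transitions caused by the local processor.
# States are written as single letters: I, S, E, M.

def local_access(state, op, others_have_copy=False):
    """Return (next_state, bus_request_needed) for a local 'load' or 'store'."""
    if state not in ("I", "S", "E", "M"):
        raise ValueError(f"unknown state: {state}")
    if op == "load":
        if state == "I":
            # Miss: a load request goes to main memory. Whether the block
            # arrives as Shared or Exclusive depends on the other caches
            # (modeled here by the assumed others_have_copy flag).
            return ("S" if others_have_copy else "E"), True
        # Loads to S, E, or M hit in the cache and leave the state alone.
        return state, False
    if op == "store":
        if state in ("I", "S"):
            # A store request is needed to claim ownership of the block.
            return "M", True
        # E -> M needs no bus request; M stays M.
        return "M", False
    raise ValueError(f"unknown operation: {op}")
```

For instance, `local_access("E", "store")` yields `("M", False)`, reflecting that an Exclusive block can be marked Modified without a bus request.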
In a snoopy protocol, every time a processing element requires a copy of data not already resident in its cache, it performs a request to main memory. For a system with one interface to main memory, all processing elements are connected to the same request bus, and can ‘see’ all requests from other processing elements. The process of capturing requests from other processing elements off the request bus, looking up the state of the block of memory in one's own cache(s), and responding with state information is called a snoop. Typically, each processing element performing a snoop must respond with its state or a retry signal within a fixed number of bus cycles.
In a system using the MESI protocol, a processing element which has a block in the Shared or Exclusive state will respond to a read request from another processing element with a shared signal. This signal prompts the processing element performing the read request to set the state of the block in its own cache to Shared instead of Exclusive—marking that the block may have multiple copies on multiple processing elements. In turn, the snooping processing element will change a block in its cache marked Exclusive to Shared when it snoops a read request from another processing element.
If the snooped request is a store instead of a read, the snooping processing element must change the state of its cache block to Invalid, since a store request implies that the requesting processor wants to set the state of the block to Modified in its own cache, and the MESI protocol prohibits multiple copies of a block marked Modified.
A processing element which has a block in the Modified state and snoops a read or a store to that block must signal, with a modified or retry response, that main memory cannot service the request with its own data. That processing element must then provide the data to the requesting processor and/or main memory and change its own state to Shared if the request was a read or to Invalid if the request was a store.
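The snooper's side of the protocol, as described in the preceding paragraphs, can be sketched as a function from the current state and the snooped request to a response and a new state. The function name `snoop`, the response strings, and the use of `"none"` for cases where the text names no particular response are assumptions for illustration. The text allows either a modified or a retry response for a Modified block; the sketch arbitrarily picks modified.

```python
# Hypothetical sketch of how a snooping cache reacts to a request on the bus.

def snoop(state, request):
    """Return (response, next_state, must_supply_data) for a snooped request.

    state   -- one of "I", "S", "E", "M"
    request -- "read" or "store", issued by another processing element
    """
    if state not in ("I", "S", "E", "M"):
        raise ValueError(f"unknown state: {state}")
    if request not in ("read", "store"):
        raise ValueError(f"unknown request: {request}")
    if state == "I":
        # Block not resident: nothing to report, nothing to change.
        return "none", "I", False
    if state == "M":
        # Main memory is stale, so this cache must supply the data, then
        # downgrade to Shared (read) or Invalid (store).
        return "modified", ("S" if request == "read" else "I"), True
    # state is S or E: the block is only a copy of main memory.
    if request == "read":
        # The shared signal tells the requester to load the block as Shared.
        return "shared", "S", False
    # A store by another element invalidates the local copy.
    return "none", "I", False
```

Only the Modified case sets `must_supply_data`, which is the case that takes longer than a simple state change.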
This process of providing modified data and then changing cache state is typically more complicated and takes longer than simply changing state from Exclusive or Shared to Shared or Invalid. Since a block in the Shared or Exclusive state is merely a copy of main memory data, the state of the block can be downgraded from Exclusive to Shared or Invalid, or from Shared to Invalid, speculatively (without knowing all the responses to the request) without violating coherency. Changing a Modified state requires first ensuring that there is somewhere for the data to go, since it is the only valid copy in the system.
Another example of a state change which cannot be performed speculatively involves semaphores. Systems which implement semaphores using load-and-reserve and store-conditional instruction pairs typically have a hardware reservation holding the address of the load-and-reserve instruction. The store-conditional instruction can ‘succeed’ and modify data only if the processing element still has a valid reservation at the time it gains ownership of the block and sets the state of the block to Modified. This coherency rule is typically enforced in a snoopy system by invalidating reservations upon snooping a store request on the bus. Since two processing elements may be vying to perform a store-conditional on the same block, it is preferable to invalidate a reservation on a snoop only if the snoop is successful (not retried) on the bus. Only if the store is not retried does it gain ownership of the block.
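The reservation rule can be illustrated with a toy model of a single hardware reservation. This is a sketch under stated assumptions, not the patented mechanism: the class and method names are invented, the reservation is tracked per block address, and clearing the reservation after every store-conditional attempt is a modeling choice that the text does not specify.

```python
# Hypothetical model of one processing element's load-and-reserve reservation.

class Reservation:
    def __init__(self):
        self.address = None  # block address currently reserved, if any

    def load_and_reserve(self, address):
        """Record a reservation on the given block address."""
        self.address = address

    def snoop_store(self, address, retried):
        """React to a store request from another element seen on the bus.

        The reservation is dropped only if the snooped store was not
        retried, because only then does the other element gain ownership.
        """
        if not retried and self.address == address:
            self.address = None

    def store_conditional(self, address):
        """Return True if the store may succeed (reservation still valid)."""
        ok = self.address is not None and self.address == address
        # Assumption: each attempt consumes the reservation either way.
        self.address = None
        return ok
```

A retried competing store leaves the reservation intact, so the local store-conditional can still succeed; a competing store that completes on the bus makes it fail.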
The state lookup, together with any state changes performed speculatively as part of the lookup, is called the snoop query. The state changes which may not be performed speculatively, or which require many cycles to perform, are called the snoop action. In real-world systems, the percentage of snoop queries that trigger snoop actions is small.
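One way to picture the split between snoop query and snoop action is a query function that applies only the speculative state changes and reports whether a follow-up action is still owed. The sketch below is an assumption-laden illustration: the name `snoop_query` and the `reservation_hit` flag (meaning the snooped store matches a held reservation) are hypothetical.

```python
# Hypothetical split of snoop handling into a fast query and a deferred action.

def snoop_query(state, request, reservation_hit=False):
    """Return (state_after_query, action_needed).

    The query applies only state changes that are safe to make
    speculatively; anything else is flagged for a later snoop action.
    """
    if state not in ("I", "S", "E", "M"):
        raise ValueError(f"unknown state: {state}")
    if request not in ("read", "store"):
        raise ValueError(f"unknown request: {request}")
    if state == "M":
        # The only valid copy lives here: keep the state until the data
        # has somewhere to go, and hand the work to a snoop action.
        return "M", True
    # A reservation may only be invalidated once the store is known not
    # to have been retried, so that also counts as a snoop action.
    action_needed = request == "store" and reservation_hit
    if state in ("S", "E"):
        # Copies of main memory can be downgraded speculatively.
        return ("S" if request == "read" else "I"), action_needed
    return "I", action_needed
```

In this model most calls return `action_needed == False`, matching the observation that few snoop queries trigger snoop actions.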
Previous high-performance multiprocessors typically included two or more snoop state machines. Each state machine could handle a snoop query and any snoop action resulting from that snoop query. Since the logic to handle all the cache state transitions and snoop actions is complicated, such an arrangement results in a large duplication of logic to handle the infrequent case (a snoop action), and adds complexity to arbitrate between multiple state machines.
Further, new memory requests requiring new snoop queries may occur while the response and/or action of a previous snoop is pending. Since all snoop queries require a snoop state machine to look up cache state, if multiple snoops require snoop actions and fill up the available state machines, all subsequent snoops, including those which do not need snoop actions, would be retried for lack of snoop resources until one of the pending snoop actions completes.
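The resource problem described above can be seen in a toy timing model. Everything here is an assumption for illustration: one snoop arrives per bus cycle, a query without an action occupies a state machine for one cycle, and a snoop action occupies it for `action_cycles` cycles. The function name `simulate` and its parameters are hypothetical and do not come from the patent.

```python
# Hypothetical model of a prior-art design in which every snoop needs a
# free snoop state machine, whether or not it results in a snoop action.

def simulate(snoops, machines=2, action_cycles=10):
    """Return the cycle numbers of snoops retried for lack of a free machine.

    snoops -- list of booleans, one per bus cycle; True means the snoop
              at that cycle triggers a snoop action.
    """
    busy_until = [0] * machines  # cycle at which each machine frees up
    retried = []
    for cycle, needs_action in enumerate(snoops):
        free = [i for i, t in enumerate(busy_until) if t <= cycle]
        if not free:
            # No machine available: the snoop is retried even if it
            # would only have needed a quick query.
            retried.append(cycle)
            continue
        duration = action_cycles if needs_action else 1
        busy_until[free[0]] = cycle + duration
    return retried
```

With two machines and four-cycle actions, `simulate([True, True, False, False, False], machines=2, action_cycles=4)` returns `[2, 3]`: two query-only snoops are retried while both machines are tied up with actions.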
Various approaches, all somewhat complex, to improving throughput in cache coherency systems appear in the prior art. Commonly assigned U.S. Pat. No. 5,659,710 to Sherman, et al., describes a cache coherency system employing serially encoded snoop responses. Commonly assigned U.S. Pat. No. 5,341,487 to Derwin, et al., discloses a memory system in which snoop cycles are pipelined. U.S. Pat. No. 5,774,700 to Fisch, et al., discloses a method and apparatus for determining the timing of snoop windows in a pipelined bus.
It is desirable in view of the complexity of prior art techniques to further simplify snoop handling for supporting a pipelined snoop response.
SUMMARY OF THE INVENTION
The present invention overcomes the shortcomings of the prior art by providing a method and system for maintaining cache coherency in a multiprocessor env
Nunez Jose Melanio
Petersen Thomas Albert
Sullivan Marie Jeannette