Title: Circuit arrangement and method of maintaining cache...
Type: Reexamination Certificate
Filed: 1998-06-25
Issued: 2002-02-26
Examiner: Thai, Tuan V. (Department: 2186)
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Additional classes: C711S122000, C711S141000, C711S144000
Status: Active
Patent number: 06351791
FIELD OF THE INVENTION
The invention is generally related to cache coherence in a shared memory architecture, and in particular to response collection in a snoopy cache coherence implementation.
BACKGROUND OF THE INVENTION
Computer technology continues to advance at a remarkable pace, with numerous improvements being made to the performance of both microprocessors—the “brains” of a computer—and the memory that stores the information processed by a computer.
In general, a microprocessor operates by executing a sequence of instructions that form a computer program. The instructions are typically stored in a memory system having a plurality of storage locations identified by unique memory addresses. The memory addresses collectively define a “memory address space,” representing the addressable range of memory addresses that can be accessed by a microprocessor.
Both the instructions forming a computer program and the data operated upon by those instructions are often stored in a memory system and retrieved as necessary by the microprocessor when executing the computer program. The speed of microprocessors, however, has increased relative to that of memory devices to the extent that retrieving instructions and data from memory can often become a significant performance bottleneck. To reduce this bottleneck, it is desirable to use the fastest memory devices available, e.g., static random access memory (SRAM) devices or the like. However, both memory speed and memory capacity are typically directly related to cost, and as a result, many computer designs must balance memory speed and capacity against cost.
A predominant manner of obtaining such a balance is to use multiple “levels” of memories in a memory system to attempt to decrease costs with minimal impact on system performance. Often, a computer relies on a relatively large, slow and inexpensive mass storage system such as a hard disk drive or other external storage device; an intermediate main memory that uses dynamic random access memory (DRAM) devices or other volatile memory storage devices; and one or more high speed, limited capacity cache memories, or caches, implemented with SRAMs or the like. One or more memory controllers then swap the information from segments of memory addresses, often known as “cache lines,” between the various memory levels to attempt to maximize the frequency with which requested memory addresses are stored in the fastest cache memory accessible by the microprocessor. Whenever a memory access request attempts to access a memory address that is not cached in a cache memory, a “cache miss” occurs, and the cache line for that memory address typically must be retrieved from a relatively slow, lower-level memory, often with a significant performance penalty.
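By way of illustration only (the cache geometry and all names below are assumptions, not taken from the patent), a direct-mapped cache might split each address into a tag, a set index, and a byte offset, and declare a miss when the stored tag does not match:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical direct-mapped cache geometry; all values illustrative. */
    #define LINE_BYTES  64u    /* bytes per cache line */
    #define NUM_SETS    1024u  /* number of cache entries */
    #define OFFSET_BITS 6u     /* log2(LINE_BYTES) */
    #define INDEX_BITS  10u    /* log2(NUM_SETS) */

    typedef struct {
        bool     valid;
        uint64_t tag;
    } cache_entry;

    static cache_entry cache[NUM_SETS];

    /* Split an address into tag/index, then check the selected entry.
     * A false return is the "cache miss" case, after which the line
     * must be fetched from a slower, lower level of the hierarchy. */
    bool cache_lookup(uint64_t addr)
    {
        uint64_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1u);
        uint64_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
        return cache[index].valid && cache[index].tag == tag;
    }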
Another manner of increasing computer performance is to use multiple microprocessors operating in parallel with one another to perform different tasks at the same time. Often, the multiple microprocessors share at least a portion of the same memory system to permit the microprocessors to work together to perform more complex tasks. The multiple microprocessors are typically coupled to one another and to the shared memory by a system bus or other like interconnection network. Because the memory system is shared, however, a concern arises as to maintaining “coherence” between the various memory levels in the shared memory system.
For example, in a given multi-processor environment, each microprocessor may have one or more dedicated cache memories that are accessible only by that microprocessor, e.g., a level one (L1) data and/or instruction cache, a level two (L2) cache, and/or one or more buffers such as a line fill buffer and/or a transition buffer. Moreover, more than one microprocessor may share certain caches as well. As a result, any given memory address may be stored from time to time in any number of places in the shared memory system.
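Purely as an illustrative sketch (the type and field names are assumptions, not the patent's), such a per-processor hierarchy might be modeled as:

    /* Hypothetical per-processor view of the cache hierarchy described
     * above; any given line may reside in any of these at once. */
    struct cache;  /* opaque cache type */

    typedef struct {
        struct cache *l1_data;     /* dedicated L1 data cache */
        struct cache *l1_inst;     /* dedicated L1 instruction cache */
        struct cache *l2;          /* L2 cache, possibly shared by peers */
        struct cache *line_fill;   /* line fill buffer */
        struct cache *transition;  /* transition buffer */
    } processor_caches;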
A number of different mechanisms exist for maintaining coherence within a shared memory system, including among others a directory-based coherence mechanism and a snoopy coherence mechanism. The directory-based coherence mechanism maintains a shared directory of the location of different memory addresses in the shared memory system. However, this mechanism may induce bottlenecks given that most if not all memory access requests need to access the same directory to determine the location of a given memory address.
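As a hedged sketch of the directory-based alternative, assuming a per-line bit-vector of sharers (the names and sizes are illustrative only):

    #include <stdint.h>

    /* Hypothetical directory entry tracked per cache line: which caches
     * hold a copy, and which one (if any) holds it modified. */
    typedef struct {
        uint32_t sharers;  /* bit i set => cache i holds a valid copy */
        int8_t   owner;    /* index of a modifying cache, or -1 for none */
    } dir_entry;

    /* Every memory access request must consult this one shared table,
     * which is the potential bottleneck noted above. */
    static dir_entry directory[1u << 20];  /* one entry per cache line */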
The snoopy coherence mechanism, on the other hand, in effect distributes the determination of where a given memory address resides to snoop logic associated with the memories themselves. As such, the mechanism that maintains the state information for a memory, e.g., a directory, together with the associated snoop logic that updates the state information in response to a memory access request and/or returns a response to the request, is cooperatively referred to in this context as a “snooper” device. Whenever a memory access request is issued by a given device on a bus, dedicated logic in each snooper device “snoops” the request and determines whether the cache line for the memory address specified by the request is stored in any of the devices. Typically, if a snooper device has a valid copy of the cache line, that device outputs the cache line to the system bus for access by the requesting device. In some embodiments, “intervention” is also supported, whereby a snooper device can output a cache line directly to a requesting device, e.g., by passing data through the system bus, or even bypassing the system bus altogether.
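As an illustrative sketch only, with hypothetical hook names standing in for real snoop logic, each snooper might watch bus requests and check its own state as follows:

    #include <stdbool.h>
    #include <stdint.h>

    /* A bus request names the cache line being accessed. */
    typedef struct {
        uint64_t line_addr;
    } bus_request;

    /* Hypothetical per-device hooks; each snooper implements these
     * against its own tag arrays and state information. */
    bool snooper_has_valid_copy(int snooper_id, uint64_t line_addr);
    void snooper_source_line(int snooper_id, uint64_t line_addr);

    /* Every snooper observes every request on the shared bus and decides
     * locally whether it can source the requested cache line. */
    void snoop(int snooper_id, const bus_request *req)
    {
        if (snooper_has_valid_copy(snooper_id, req->line_addr)) {
            /* With "intervention", the line can be sent straight to the
             * requester instead of being fetched from slower memory. */
            snooper_source_line(snooper_id, req->line_addr);
        }
    }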
Another important aspect of the snoopy coherence mechanism, however, is that all possible sourcing devices need to know which device will handle a memory access request, to prevent more than one device from attempting to handle it. Yet another important aspect is that all of the snooper devices must update their status information regarding the cache line when the request is fulfilled. Therefore, in response to a request, each of the snooper devices must update its status information and output a response indicating the status of the cache line in that device. The responses are then collected, and a single response is returned to the requesting device to inform it of the status of the information being requested.
One conventional snoopy coherence mechanism uses a MESI coherence protocol that tags information stored in a snooper device as being in one of four states: Modified, Exclusive, Shared, or Invalid. The Modified state indicates that the requested cache line is stored in the snooper device and that the device has the most recent copy thereof, i.e., all other copies, if any, are no longer valid. The Exclusive state indicates that the requested cache line is stored only in the snooper device, but has not been modified relative to the copy in the shared memory. The Shared state indicates that the requested cache line is stored in the snooper device, but that other valid copies of the cache line also exist in other devices. The Invalid state indicates that the cache line is not stored in the snooper device.
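These four states might be represented as follows (a sketch; the encoding is an assumption):

    /* The four MESI states, as described above. */
    typedef enum {
        MESI_INVALID,    /* line not stored in this snooper device */
        MESI_SHARED,     /* stored here; other valid copies may exist */
        MESI_EXCLUSIVE,  /* stored only here; matches the shared memory */
        MESI_MODIFIED    /* stored here; the only up-to-date copy anywhere */
    } mesi_state;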
If, in response to receipt of a request, a snooper device is capable of determining the state of a cache line, the state is returned with the appropriate response. However, if for some reason the snooper device is unable to determine the state of the cache line, the snooper device typically returns a “Retry” response instead, indicating the failure to process the request. Reasons for returning a Retry response may include, for example, no snoop buffer being available in the snooper device, the snooper device being busy with another operation, or colliding bus transactions, among others.
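Continuing the sketch (the busy/buffer checks are illustrative assumptions, and the Retry > Modified > Exclusive > Shared > Null priority ordering is a common convention rather than the patent's stated one), response generation and the prioritized collection described in the next paragraph might look like:

    #include <stdbool.h>
    #include <stdint.h>

    typedef enum {
        MESI_INVALID, MESI_SHARED, MESI_EXCLUSIVE, MESI_MODIFIED
    } mesi_state;  /* as in the previous sketch */

    /* Responses are ordered so that a larger value means higher priority. */
    typedef enum {
        RESP_NULL,       /* line Invalid here: nothing to report */
        RESP_SHARED,
        RESP_EXCLUSIVE,
        RESP_MODIFIED,
        RESP_RETRY       /* state undetermined: requester must reissue */
    } snoop_response;

    /* Hypothetical device-state queries standing in for real snoop logic. */
    bool snoop_buffer_available(void);
    bool device_busy(void);
    mesi_state lookup_state(uint64_t line_addr);

    snoop_response snooper_respond(uint64_t line_addr)
    {
        /* Any condition that prevents determining the line's state yields
         * Retry, e.g. no free snoop buffer, or a device that is busy
         * with another operation. */
        if (!snoop_buffer_available() || device_busy())
            return RESP_RETRY;

        switch (lookup_state(line_addr)) {
        case MESI_MODIFIED:  return RESP_MODIFIED;
        case MESI_EXCLUSIVE: return RESP_EXCLUSIVE;
        case MESI_SHARED:    return RESP_SHARED;
        default:             return RESP_NULL;
        }
    }

    /* Collection logic combines every snooper's response into the single
     * prioritized response returned to the requester: a simple maximum,
     * because the enum ordering encodes the priority. */
    snoop_response collect_responses(const snoop_response *resp, int n)
    {
        snoop_response best = RESP_NULL;
        for (int i = 0; i < n; i++)
            if (resp[i] > best)
                best = resp[i];
        return best;
    }

Ordering the response enumeration by priority lets the collection step reduce to a simple maximum over all snooper responses.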
The various responses from the snooper devices in a shared memory system are typically collected by snoop response collection logic to generate a prioritized snoop response signal that is returned to the requesting device. In a conventional MESI protocol snoo…
Inventors: Freerksen, Donald Lee; Lippert, Gary Michael; Mounes-Toussi, Farnaz
Law firm: Wood, Herron & Evans