Adaptive cache coherence protocols

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

C711S147000, C711S118000, C711S148000, C709S215000

active

06757787

ABSTRACT:

BACKGROUND
This invention relates to cache coherence in a distributed shared-memory system.
Many current computer systems make use of hierarchical memory systems to improve memory access from one or more processors. In a common type of multiprocessor system, the processors are coupled to a distributed shared-memory (DSM) system made up of a shared-memory system and a number of memory caches, each coupled between one of the processors and the shared-memory system. The processors execute instructions, including memory access instructions such as “Load” and “Store,” such that, from the point of view of each processor, a single shared address space is directly accessible, and changes made to the value stored at a particular address by one processor are “visible” to the other processors. Various techniques, generally referred to as cache coherence protocols, are used to maintain this type of shared behavior. For instance, if one processor updates a value for a particular address in its cache, caches associated with other processors that also have copies of that address are notified by the shared-memory system, and the notified caches remove or invalidate that address in their storage, thereby preventing the processors associated with those caches from using out-of-date values. The shared-memory system keeps a directory that identifies which caches have copies of each address and uses this directory to notify the appropriate caches of an update. In another approach, the caches share a common communication channel (e.g., a memory bus) over which they communicate with the shared-memory system. When one cache updates the shared-memory system, the other caches “snoop” on the common channel to determine whether they should invalidate or update any of their cached values.
In order to guarantee a desired ordering of updates to the shared-memory system and thereby permit synchronization of programs executing on different processors, many processors use instructions, generally known as “fence” instructions, to delay execution of certain memory access instructions until other previous memory access instructions have completed. The PowerPC “Sync” instruction and the Sun SPARC “Membar” instruction are examples of fence instructions in current processors. These fences are very “coarse grain” in that they require all previous memory access instructions (or a class such as all loads or all stores) to complete before a subsequent memory instruction is issued.
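The effect of such a coarse-grain fence can be illustrated with a small model. The sketch below is not from the patent: the `Processor` class, its store buffer, and the drain-on-fence behavior are illustrative assumptions that mimic how a Sync/Membar-style fence forces all earlier stores to reach shared memory before later accesses proceed.

```python
# Illustrative model (not from the patent): a processor whose stores
# buffer locally and only become globally visible when drained.

class Processor:
    def __init__(self, shared_memory):
        self.mem = shared_memory        # the shared-memory system (a dict)
        self.store_buffer = []          # pending (addr, value) stores

    def store(self, addr, value):
        # The store completes locally; it drains to shared memory later.
        self.store_buffer.append((addr, value))

    def fence(self):
        # Coarse grain: drain *every* pending store before anything after
        # the fence may issue, in the style of PowerPC Sync.
        for addr, value in self.store_buffer:
            self.mem[addr] = value
        self.store_buffer.clear()

    def load(self, addr):
        # Forward from the newest matching pending store, if any.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.mem.get(addr, 0)
```

Before the fence, another processor reading shared memory would not see the buffered store; after the fence, it would.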
Many processor instruction sets also include a “Prefetch” instruction that is used to reduce the latency of Load instructions that would have required a memory transfer between the shared-memory system and a cache. The Prefetch instruction initiates a transfer of data from the shared-memory system to the processor's cache but the transfer does not have to complete before the instruction itself completes. A subsequent Load instruction then accesses the prefetched data, unless the data has been invalidated in the interim by another processor or the data has not yet been provided to the cache.
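The non-blocking character of Prefetch described above can be sketched as follows. The class and method names (`prefetch`, `complete_transfers`) are hypothetical; the point is only that the prefetch instruction completes before the transfer does, so a subsequent load may still miss.

```python
# Illustrative sketch (assumed names): prefetch initiates a fill that a
# later load may or may not find completed.

class Cache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                 # addr -> value, lines present in cache
        self.pending = set()            # prefetch transfers still in flight

    def prefetch(self, addr):
        # The instruction completes immediately; the data arrives later.
        self.pending.add(addr)

    def complete_transfers(self):
        # Models the asynchronous arrival of prefetched data.
        for addr in self.pending:
            self.lines[addr] = self.memory.get(addr, 0)
        self.pending.clear()

    def load(self, addr):
        if addr in self.lines:
            return self.lines[addr], "hit"
        # Prefetch not yet arrived (or line invalidated): full-latency miss.
        return self.memory.get(addr, 0), "miss"
```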
Two types of cache coherence protocols have been used in prior systems: snoopy protocols for bus-based multiprocessor systems and directory-based protocols for DSM systems. In bus-based multiprocessor systems, since all the processors can observe an ongoing bus transaction, appropriate coherence actions can be taken when an operation threatening coherence is detected. Protocols that fall into this category are called snoopy protocols because each cache snoops bus transactions to watch memory transactions of other processors. Various snoopy protocols have been proposed. For instance in one protocol, when a processor reads an address not in its cache, it broadcasts a read request on the snoopy bus. Memory or the cache that has the most up-to-date copy will then supply the data. When a processor broadcasts its intention to write an address that it does not own exclusively, other caches invalidate their copies.
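A minimal sketch of the snoopy invalidation described above, assuming a write-through policy for simplicity (the patent text does not fix one); the `Bus` and `SnoopyCache` names are illustrative.

```python
# Illustrative snoopy protocol sketch: every cache observes ("snoops")
# each bus transaction and invalidates its stale copy on a remote write.

class Bus:
    def __init__(self):
        self.caches = []

    def broadcast_write(self, writer, addr):
        # All other caches snoop the transaction and invalidate the line.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)

class SnoopyCache:
    def __init__(self, bus, memory):
        self.bus, self.memory = bus, memory
        self.lines = {}                 # addr -> value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:      # miss: fetch from memory
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_write(self, addr)  # announce intent to write
        self.lines[addr] = value
        self.memory[addr] = value             # write-through (assumed)
```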
Unlike snoopy protocols, directory-based protocols do not rely upon the broadcast mechanism to invalidate or update stale copies. They maintain a directory entry for each memory block to record the cache sites in which the memory block is currently cached. The directory entry is often maintained at the site in which the corresponding physical memory resides. Since the locations of shared copies are known, a protocol engine at each site can maintain coherence by employing point-to-point protocol messages. The elimination of broadcast overcomes a major limitation on scaling cache coherent machines to large-scale multiprocessor systems.
A directory-based cache coherence protocol can be implemented with various directory structures. The full-map directory structure maintains a complete record of which caches are sharing the memory block. In a straightforward implementation, each directory entry contains one bit per cache site, indicating whether that cache holds a shared copy. Its main drawback is that the directory storage overhead becomes intolerable for large-scale systems. Alternative directory structures have been proposed to overcome this problem. Different directory structures represent different implementation tradeoffs between performance and implementation complexity and cost.
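The one-bit-per-site full-map entry can be sketched with a presence bitmask; class and method names here are illustrative, not the patent's. The sketch also shows why point-to-point invalidation follows directly from the entry: only the sites whose bits are set need to be notified.

```python
# Illustrative full-map directory: one presence bit per cache site,
# packed into an integer bitmask per memory block.

class FullMapDirectory:
    def __init__(self, num_sites):
        self.num_sites = num_sites
        self.entries = {}               # block -> presence bitmask

    def record_share(self, block, site):
        # Set the presence bit for a site that fetched a shared copy.
        self.entries[block] = self.entries.get(block, 0) | (1 << site)

    def sharers(self, block):
        mask = self.entries.get(block, 0)
        return [s for s in range(self.num_sites) if mask & (1 << s)]

    def invalidate_others(self, block, writer):
        # Point-to-point: only the sites listed in the entry are notified,
        # with no broadcast; the writer becomes the sole holder.
        targets = [s for s in self.sharers(block) if s != writer]
        self.entries[block] = 1 << writer
        return targets
```

The storage cost is visible in the structure itself: each entry needs `num_sites` bits, which is what makes the full map unattractive at large scale.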
Shared-memory programs have various access patterns. Empirical evidence suggests that no fixed cache coherence protocol works well for all access patterns. In shared-memory systems, memory references can suffer long latencies for cache misses. To ameliorate this latency, a cache coherence protocol can be augmented with optimizations for different access patterns. Generally speaking, memory accesses can be classified into a number of common sharing patterns, such as the read-modify-write pattern, the producer-consumer pattern and the migratory pattern. An adaptive system can change its actions to address changing program behaviors.
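One way an adaptive protocol might classify access patterns is with a per-block detector; the heuristic below (counting successive writes arriving from different sites as evidence of migratory sharing) is an assumed example, not the patent's mechanism, and the threshold is arbitrary.

```python
# Hypothetical pattern detector: flags a block as "migratory" once enough
# consecutive writes come from distinct sites, suggesting that ownership
# is moving from processor to processor.

class PatternDetector:
    def __init__(self, threshold=2):
        self.threshold = threshold
        self.last_writer = None
        self.handovers = 0              # writes by a site other than the last

    def observe(self, site, op):
        if op == "write":
            if self.last_writer is not None and site != self.last_writer:
                self.handovers += 1
            self.last_writer = site

    @property
    def pattern(self):
        return "migratory" if self.handovers >= self.threshold else "default"
```

A protocol engine could consult `pattern` to decide, for example, whether to hand over exclusive ownership eagerly on a remote read.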
Some cache memory systems employ different memory modes for different address ranges. For example, at a cache, one range of addresses may be local addresses while other addresses are global addresses. When a processor updates a value at a local address, the change is not reflected in a shared memory or in the caches of other processors. In this way, accesses to local addresses can be performed more rapidly than accesses to global addresses. However, the semantics of memory instructions executed by a processor depend on which address range is being accessed.
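The mode-by-address-range behavior can be sketched as a dispatch on the address; the range boundary and the `ModalCache` name are assumptions for illustration only.

```python
# Illustrative sketch: stores dispatch on an assumed local/global address
# split. Local stores never propagate, so other processors never see them.

GLOBAL_BASE = 0x8000                    # assumed range boundary

class ModalCache:
    def __init__(self, shared_memory):
        self.shared = shared_memory     # visible to all processors
        self.local = {}                 # private to this processor

    def store(self, addr, value):
        if addr < GLOBAL_BASE:
            self.local[addr] = value    # fast path: no coherence actions
        else:
            self.shared[addr] = value   # global: coherence must be maintained
```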
In other cache memory systems, the cache can support multiple types or modes of write operations. For instance, depending on a variant of a store instruction that is executed or the mode of an address or address range to which the store is directed, the store instruction may complete without necessarily maintaining a coherent memory model, at least for some period of time after the store instruction completes while coherency-related actions are performed. Various other approaches that enhance memory speed at the expense of maintaining a coherent memory model have also been proposed.
SUMMARY
As cache protocols become more complex, for example as a result of incorporating performance enhancing heuristics, correct operation of the overall memory system is difficult to guarantee. In a general aspect, this invention provides a methodology for designing a memory system that incorporates adaptation or selection of cache protocols during operation while guaranteeing semantically correct processing of memory instructions by the multiple processors. Furthermore, the adaptation can be controlled in a decentralized manner, possibly using heuristics local to a particular cache, subject only to specific status messages being passed between caches and a shared memory. As multi-processor systems scale in the number of processors, some prior cache coherence approaches are difficult to implement and to verify their correct operation. For instance, in a directory-based cache coherence approach in which each cache that has a copy of an address is indicated in the directory, the directory must be
