Cache coherency protocol with tagged state for modified values

Patent number: 06334172
Type: Reexamination Certificate
Status: active
Filed: 1998-02-17
Issued: 2001-12-25
Examiner: Kim, Matthew (Department: 2186)
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
U.S. classes: C711S122000, C711S141000
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to computer systems, and more particularly to a cache coherency protocol which provides a novel coherency state for modified data allowing improvements in cache intervention without requiring writing of the intervened data to system memory.
2. Description of the Related Art
The basic structure of a conventional multi-processor computer system 10 is shown in FIG. 1. Computer system 10 has several processing units, two of which, 12a and 12b, are depicted, which are connected to various peripheral devices, including input/output (I/O) devices 14 (such as a display monitor, keyboard, graphical pointer (mouse), and a permanent storage device (hard disk)), memory device 16 (such as random access memory or RAM) that is used by the processing units to carry out program instructions, and firmware 18 whose primary purpose is to seek out and load an operating system from one of the peripherals (usually the permanent memory device) whenever the computer is first turned on. Processing units 12a and 12b communicate with the peripheral devices by various means, including a generalized interconnect or bus 20, or direct memory access channels (not shown). Computer system 10 may have many additional components which are not shown, such as serial and parallel ports for connection to, e.g., modems or printers. There are other components that might be used in conjunction with those shown in the block diagram of FIG. 1; for example, a display adapter might be used to control a video display monitor, a memory controller can be used to access memory 16, etc. The computer can also have more than two processing units.
In a symmetric multi-processor (SMP) computer, all of the processing units are generally identical, that is, they all use a common set or subset of instructions and protocols to operate, and generally have the same architecture. A typical architecture is shown in FIG. 1. A processing unit includes a processor core 22 having a plurality of registers and execution units, which carry out program instructions in order to operate the computer. An exemplary processing unit includes the PowerPC™ processor marketed by International Business Machines Corp. The processing unit can also have one or more caches, such as an instruction cache 24 and a data cache 26, which are implemented using high speed memory devices. Caches are commonly used to temporarily store values that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the longer step of loading the values from memory 16. These caches are referred to as “on-board” when they are integrally packaged with the processor core on a single integrated chip 28. Each cache is associated with a cache controller (not shown) that manages the transfer of data and instructions between the processor core and the cache memory.
A processing unit can include additional caches, such as cache 30, which is referred to as a level 2 (L2) cache since it supports the on-board (level 1) caches 24 and 26. In other words, cache 30 acts as an intermediary between memory 16 and the on-board caches, and can store a much larger amount of information (instructions and data) than the on-board caches can, but at a longer access penalty. For example, cache 30 may be a chip having a storage capacity of 256 or 512 kilobytes, while the processor may be an IBM PowerPC™ 604-series processor having on-board caches with 64 kilobytes of total storage. Cache 30 is connected to bus 20, and all loading of information from memory 16 into processor core 22 must come through cache 30. Although FIG. 1 depicts only a two-level cache hierarchy, multi-level cache hierarchies can be provided where there are many levels (L3, L4, etc.) of serially connected caches. If a block is present in the L1 cache of a given processing unit, it is also present in the L2 and L3 caches of that processing unit. This property is known as inclusion. Henceforth, it is assumed that the principle of inclusion applies to the caches related to the present invention.
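By way of a hedged illustration (not part of the patent disclosure), the inclusion property can be modeled in a few lines of C++. The names below (TwoLevelHierarchy, install, evictFromL2, inclusionHolds) are hypothetical; the sketch only shows that installing a block into L1 also installs it into L2, and that evicting a block from L2 back-invalidates any L1 copy, so the set of L1 blocks is always a subset of the L2 blocks.

```cpp
// Minimal sketch of the inclusion property between an L1 and an L2 cache.
// All names are illustrative only; this is not taken from the patent.
#include <cassert>
#include <cstdint>
#include <unordered_set>

struct TwoLevelHierarchy {
    std::unordered_set<uint64_t> l1;  // block addresses currently in L1
    std::unordered_set<uint64_t> l2;  // block addresses currently in L2

    // Bringing a block into L1 also installs it in L2 (inclusion).
    void install(uint64_t block) {
        l2.insert(block);
        l1.insert(block);
    }

    // Evicting a block from L2 must also remove any L1 copy,
    // otherwise L1 would hold a block that L2 does not.
    void evictFromL2(uint64_t block) {
        l1.erase(block);   // back-invalidate the L1 copy first
        l2.erase(block);
    }

    // Inclusion invariant: every L1 block is also present in L2.
    bool inclusionHolds() const {
        for (uint64_t b : l1)
            if (l2.find(b) == l2.end()) return false;
        return true;
    }
};

int main() {
    TwoLevelHierarchy h;
    h.install(0x1000);
    h.install(0x2000);
    h.evictFromL2(0x1000);
    assert(h.inclusionHolds());  // L1 remains a subset of L2
    return 0;
}
```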
In an SMP computer, it is important to provide a coherent memory system, that is, to cause write operations to each individual memory location to be serialized in some order for all processors. For example, assume a location in memory is modified by a sequence of write operations to take on the values: 1, 2, 3, 4. In a cache coherent system, all processors will observe the writes to a given location to take place in the order shown. However, it is possible for a processing element to miss a write to the memory location. A given processing element reading the memory location could see the sequence 1, 3, 4, missing the update to the value 2. A system that implements these properties is said to be “coherent”. Virtually all coherency protocols operate only to the granularity of the size of a cache block. That is to say, the coherency protocol controls the movement of and write permissions for data on a cache block basis, and not separately for each individual memory location (hereinafter, the term “data” is used to refer to a memory value that is either a numeric value which is used by the program or a value that corresponds to a program instruction).
There are a number of protocols and techniques for achieving cache coherence that are known to those skilled in the art. All of these mechanisms for maintaining coherency require that the protocols allow only one processor to have a “permission” that allows a write operation to a given memory location (cache block) at any given point in time. As a consequence of this requirement, whenever a processing element attempts to write to a memory location, it must first inform all other processing elements of its desire to write the location and receive permission from all other processing elements to carry out the write.
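As a purely illustrative sketch, and not the protocol claimed by the patent, the requirement that only one processor hold write permission at a time can be modeled with a MESI-style state per cache block: before a processor writes, it broadcasts an invalidation so every other copy is discarded, leaving a single Modified copy. The class and function names below (Cache, snoopInvalidate, write) are assumptions made for the example.

```cpp
// Hypothetical sketch of exclusive write permission using MESI-like states.
// Before writing, a cache must invalidate every other cached copy of the block.
#include <cstdint>
#include <iostream>
#include <vector>

enum class State { Invalid, Shared, Exclusive, Modified };

struct CacheLine { State state = State::Invalid; uint32_t data = 0; };

struct Cache {
    CacheLine line;  // a single block, for brevity

    // Called when another cache announces its intent to write this block.
    void snoopInvalidate() { line.state = State::Invalid; }
};

// A processor may write only after all other copies have been invalidated.
void write(std::vector<Cache>& caches, size_t writer, uint32_t value) {
    for (size_t i = 0; i < caches.size(); ++i)
        if (i != writer) caches[i].snoopInvalidate();   // broadcast the invalidation
    caches[writer].line.state = State::Modified;        // sole writable copy remains
    caches[writer].line.data = value;
}

int main() {
    std::vector<Cache> caches(2);
    caches[0].line.state = State::Shared;   // both processors hold a read-only copy
    caches[1].line.state = State::Shared;

    write(caches, 0, 42);                   // processor 0 gains write permission
    std::cout << "cache 1 invalidated: "
              << (caches[1].line.state == State::Invalid) << "\n";
    return 0;
}
```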
To implement cache coherency in a system, the processors communicate over a common generalized interconnect (i.e., bus 20). The processors pass messages over the interconnect indicating their desire to read from or write to memory locations. When an operation is placed on the interconnect, all of the other processors “snoop” (monitor) this operation and decide if the state of their caches can allow the requested operation to proceed and, if so, under what conditions. There are several bus transactions that require snooping and follow-up action to honor the bus transactions and maintain memory coherency. The snooping operation is triggered by the receipt of a qualified snoop request, generated by the assertion of certain bus signals. Instruction processing is interrupted only when a snoop hit occurs and the snoop state machine determines that an additional cache snoop is required to resolve the coherency of the offended sector.
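The snooper-side decision described above can also be sketched in code. The fragment below is a hedged illustration rather than the patent's mechanism: on a snooped bus read, the snooper consults the state of its own copy and reports whether the operation may proceed, must be retried until modified data is made visible, or should complete with the block marked shared. The SnoopResponse names and the state model are assumptions.

```cpp
// Illustrative snooper-side logic: decide how to respond to a snooped bus read.
// The response names and state model are assumptions, not the patent's protocol.
#include <iostream>

enum class State { Invalid, Shared, Exclusive, Modified };
enum class SnoopResponse {
    Null,     // we hold no copy; memory may supply the data
    Shared,   // we hold a clean copy; requester must load the block as shared
    Retry     // we hold modified data; requester must wait until it is made visible
};

SnoopResponse snoopBusRead(State& myCopy) {
    switch (myCopy) {
        case State::Modified:
            // Our copy is the only valid one; force a retry so the data can be
            // written back (or supplied directly) before the read completes.
            return SnoopResponse::Retry;
        case State::Exclusive:
            myCopy = State::Shared;       // another reader now exists
            return SnoopResponse::Shared;
        case State::Shared:
            return SnoopResponse::Shared;
        default:
            return SnoopResponse::Null;
    }
}

int main() {
    State copy = State::Modified;
    std::cout << "response is Retry: "
              << (snoopBusRead(copy) == SnoopResponse::Retry) << "\n";
    return 0;
}
```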
This communication is necessary because, in systems with caches, the most recent valid copy of a given block of memory may have moved from the system memory 16 to one or more of the caches in the system (as mentioned above). If a processor (say 12a) attempts to access a memory location not present within its cache hierarchy, the correct version of the block, which contains the actual (current) value for the memory location, may either be in the system memory 16 or in one or more of the caches in another processing unit, e.g., processing unit 12b. If the correct version is in one or more of the other caches in the system, it is necessary to obtain the correct value from the cache(s) in the system instead of system memory.
For example, consider a processor, say 12a, attempting to read a location in memory. It first polls its own L1 cache (24 or 26). If the block is not present in the L1 cache, the request is forwarded to the L2 cache (30). If the block is not present in the L2 cache, the request is forwarded on to lower cache levels, e.g., the L3 cache. If the block is not present in the lower level caches, the request is then presented on the generalized interconnect (bus 20) to be serviced.
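The lookup sequence just described, L1, then L2, then any lower levels, and finally the generalized interconnect, can be sketched as a simple loop over cache levels. The code below is a hypothetical illustration under the assumption that each level either hits or forwards the miss downward; the names (CacheLevel, lookup, read) are not taken from the patent.

```cpp
// Sketch of a read request walking down a multi-level cache hierarchy.
// If no level holds the block, the request is issued on the bus (interconnect).
#include <cstdint>
#include <iostream>
#include <optional>
#include <unordered_map>
#include <vector>

using Addr = uint64_t;

struct CacheLevel {
    std::unordered_map<Addr, uint32_t> blocks;  // block address -> data

    std::optional<uint32_t> lookup(Addr a) const {
        auto it = blocks.find(a);
        if (it == blocks.end()) return std::nullopt;   // miss: forward to next level
        return it->second;                              // hit at this level
    }
};

// Poll L1 first, then L2, then any lower levels; fall back to the bus.
uint32_t read(const std::vector<CacheLevel>& levels, Addr a) {
    for (size_t i = 0; i < levels.size(); ++i) {
        if (auto hit = levels[i].lookup(a)) {
            std::cout << "hit in L" << (i + 1) << "\n";
            return *hit;
        }
    }
    std::cout << "miss in all caches: present request on the interconnect\n";
    return 0;  // stand-in for the value returned by memory or an intervening cache
}

int main() {
    std::vector<CacheLevel> levels(3);          // L1, L2, L3
    levels[1].blocks[0x40] = 7;                 // block resident only in L2
    read(levels, 0x40);                         // hits in L2
    read(levels, 0x80);                         // misses everywhere -> bus request
    return 0;
}
```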
Inventors: Arimilli, Ravi Kumar; Dodson, John Steven; Lewis, Jerry Don
Examiners: Kim, Matthew; Anderson, Matt
Attorneys/Agents: Bracewell & Patterson L.L.P.; Emile Volel
Assignee: International Business Machines Corporation