Patent: US 6,247,098 (Reexamination Certificate, active)
Filed: 1998-02-17
Issued: 2001-06-12
Examiner: Kim, Matthew (Department: 2186)
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
U.S. Classes: C711S144000, C711S117000, C711S119000
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to computer systems, and more particularly to a cache coherency protocol which provides a novel coherency state for modified data allowing improvements in cache intervention without requiring writing of the intervened data to system memory.
2. Description of the Related Art
The basic structure of a conventional multi-processor computer system 10 is shown in FIG. 1. Computer system 10 has several processing units, two of which, 12a and 12b, are depicted. They are connected to various peripheral devices, including input/output (I/O) devices 14 (such as a display monitor, keyboard, graphical pointer (mouse), and a permanent storage device (hard disk)), memory device 16 (such as random access memory, or RAM), which is used by the processing units to carry out program instructions, and firmware 18, whose primary purpose is to seek out and load an operating system from one of the peripherals (usually the permanent memory device) whenever the computer is first turned on. Processing units 12a and 12b communicate with the peripheral devices by various means, including a generalized interconnect or bus 20, or direct memory access channels (not shown). Computer system 10 may have many additional components which are not shown, such as serial and parallel ports for connection to, e.g., modems or printers. Other components might be used in conjunction with those shown in the block diagram of FIG. 1; for example, a display adapter might be used to control a video display monitor, and a memory controller can be used to access memory 16. The computer can also have more than two processing units.
In a symmetric multi-processor (SMP) computer, all of the processing units are generally identical; that is, they all use a common set or subset of instructions and protocols to operate, and generally have the same architecture. A typical architecture is shown in FIG. 1. A processing unit includes a processor core 22 having a plurality of registers and execution units, which carry out program instructions in order to operate the computer. An exemplary processing unit includes the PowerPC™ processor marketed by International Business Machines Corp. The processing unit can also have one or more caches, such as an instruction cache 24 and a data cache 26, which are implemented using high-speed memory devices. Caches are commonly used to temporarily store values that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the longer step of loading the values from memory 16. These caches are referred to as "on-board" when they are integrally packaged with the processor core on a single integrated chip 28. Each cache is associated with a cache controller (not shown) that manages the transfer of data and instructions between the processor core and the cache memory.
A processing unit can include additional caches, such as cache 30, which is referred to as a level 2 (L2) cache since it supports the on-board (level 1) caches 24 and 26. In other words, cache 30 acts as an intermediary between memory 16 and the on-board caches, and can store a much larger amount of information (instructions and data) than the on-board caches can, but at a longer access penalty. For example, cache 30 may be a chip having a storage capacity of 256 or 512 kilobytes, while the processor may be an IBM PowerPC™ 604-series processor having on-board caches with 64 kilobytes of total storage. Cache 30 is connected to bus 20, and all loading of information from memory 16 into processor core 22 must come through cache 30. Although FIG. 1 depicts only a two-level cache hierarchy, multi-level cache hierarchies can be provided where there are many levels (L3, L4, etc.) of serially connected caches. If a block is present in the L1 cache of a given processing unit, it is also present in the L2 and L3 caches of that processing unit; this property is known as inclusion. Henceforth, it is assumed that the principle of inclusion applies to the caches related to the present invention.
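The inclusion property just described can be illustrated with a small sketch. This is a hypothetical toy model (the class and method names are illustrative, not from the patent), showing why an L2 eviction forces a matching back-invalidation of the L1:

```python
class InclusiveCacheHierarchy:
    """Toy model of an inclusive two-level cache (block addresses only).

    Inclusion invariant: every block resident in L1 is also resident in L2.
    """

    def __init__(self):
        self.l1 = set()  # blocks held in the on-board (L1) cache
        self.l2 = set()  # blocks held in the larger L2 cache

    def load(self, block):
        # A fill from memory installs the block at every level,
        # so inclusion holds by construction.
        self.l2.add(block)
        self.l1.add(block)

    def evict_from_l2(self, block):
        # Inclusion forces a back-invalidation: evicting a block from
        # L2 must also remove any copy of it from L1.
        self.l2.discard(block)
        self.l1.discard(block)

    def inclusion_holds(self):
        return self.l1 <= self.l2  # L1 contents are a subset of L2


h = InclusiveCacheHierarchy()
h.load(0x100)
h.load(0x200)
h.evict_from_l2(0x100)
assert h.inclusion_holds()
assert 0x100 not in h.l1  # back-invalidated along with the L2 eviction
```

Real hardware enforces the same invariant in the cache controllers; the point of the sketch is only that every path that removes a block from an outer level must also remove it from the inner levels it supports.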
In an SMP computer, it is important to provide a coherent memory system, that is, to cause write operations to each individual memory location to be serialized in some order for all processors. For example, assume a location in memory is modified by a sequence of write operations to take on the values: 1, 2, 3, 4. In a cache coherent system, all processors will observe the writes to a given location to take place in the order shown. However, it is possible for a processing element to miss a write to the memory location. A given processing element reading the memory location could see the sequence 1, 3, 4, missing the update to the value 2. A system that implements these properties is said to be “coherent”. Virtually all coherency protocols operate only to the granularity of the size of a cache block. That is to say, the coherency protocol controls the movement of and write permissions for data on a cache block basis, and not separately for each individual memory location (hereinafter, the term “data” is used to refer to a memory value that is either a numeric value which is used by the program or a value that corresponds to a program instruction).
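The observation rule in the example above (a reader may miss intermediate values, but never see writes out of order) amounts to checking that each processor's observed sequence is a subsequence of the serialized global write order. A minimal sketch, with illustrative names not taken from the patent:

```python
def is_coherent_observation(global_writes, observed):
    """Return True if `observed` could be seen by a processor in a
    coherent system, i.e. it is an order-preserving subsequence of
    the serialized write order for one location."""
    it = iter(global_writes)
    # Each `v in it` advances the iterator, so matches must occur
    # in the same relative order as the global writes.
    return all(v in it for v in observed)


# Writes to one location are serialized as 1, 2, 3, 4.
assert is_coherent_observation([1, 2, 3, 4], [1, 3, 4])      # may miss the 2
assert not is_coherent_observation([1, 2, 3, 4], [2, 1, 4])  # never reordered
```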
There are a number of protocols and techniques for achieving cache coherence that are known to those skilled in the art. All of these mechanisms for maintaining coherency require that the protocols allow only one processor to have a “permission” that allows a write operation to a given memory location (cache block) at any given point in time. As a consequence of this requirement, whenever a processing element attempts to write to a memory location, it must first inform all other processing elements of its desire to write the location and receive permission from all other processing elements to carry out the write.
To implement cache coherency in a system, the processors communicate over a common generalized interconnect (i.e., bus 20). The processors pass messages over the interconnect indicating their desire to read from or write to memory locations. When an operation is placed on the interconnect, all of the other processors "snoop" (monitor) this operation and decide if the state of their caches can allow the requested operation to proceed and, if so, under what conditions. There are several bus transactions that require snooping and follow-up action to honor the bus transactions and maintain memory coherency. The snooping operation is triggered by the receipt of a qualified snoop request, generated by the assertion of certain bus signals. Instruction processing is interrupted only when a snoop hit occurs and the snoop state machine determines that an additional cache snoop is required to resolve the coherency of the offended sector.
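The single-writer permission rule and the snooping mechanism described above can be sketched together: before a processor writes a block, it broadcasts its intent on the bus, and every other cache snoops the message and invalidates its own copy. This is a simplified invalidate-based sketch; the class names and structure are illustrative and do not represent the patent's actual protocol states:

```python
class Bus:
    """Toy generalized interconnect: delivers every message to all caches."""

    def __init__(self):
        self.caches = []

    def broadcast(self, block, writer):
        for cache in self.caches:
            cache.snoop_write_intent(block, writer)


class SnoopingCache:
    """One processor's cache in a toy invalidate-before-write protocol."""

    def __init__(self, bus):
        self.valid = set()  # blocks this cache currently holds
        self.bus = bus
        bus.caches.append(self)

    def snoop_write_intent(self, block, writer):
        # Another processor announced a write: drop our now-stale copy.
        if writer is not self:
            self.valid.discard(block)

    def write(self, block):
        # Broadcast intent; every other cache snoops and invalidates,
        # leaving this cache as the sole holder with write permission.
        self.bus.broadcast(block, writer=self)
        self.valid.add(block)


bus = Bus()
a, b = SnoopingCache(bus), SnoopingCache(bus)
a.write(0x40)  # a now holds the block
b.write(0x40)  # b's announcement invalidates a's copy
assert 0x40 in b.valid and 0x40 not in a.valid
```

At any instant only one cache holds the block with write permission, which is exactly the property the preceding paragraphs require.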
This communication is necessary because, in systems with caches, the most recent valid copy of a given block of memory may have moved from the system memory 16 to one or more of the caches in the system (as mentioned above). If a processor (say 12a) attempts to access a memory location not present within its cache hierarchy, the correct version of the block, which contains the actual (current) value for the memory location, may be either in the system memory 16 or in one or more of the caches in another processing unit, e.g., processing unit 12b. If the correct version is in one or more of the other caches in the system, it is necessary to obtain the correct value from the cache(s) in the system instead of system memory.
For example, consider a processor, say 12a, attempting to read a location in memory. It first polls its own L1 cache (24 or 26). If the block is not present in the L1 cache, the request is forwarded to the L2 cache (30). If the block is not present in the L2 cache, the request is forwarded on to lower cache levels, e.g., the L3 cache. If the block is not present in the lower-level caches, the request is then presented on the generalized interconnect (20).
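The read-lookup cascade just described (L1, then L2, then lower levels, then the interconnect) can be sketched as a simple walk down the hierarchy. The helper below is hypothetical, not from the patent:

```python
def lookup(block, levels):
    """Walk the cache hierarchy top-down; return the name of the level
    that hits, or "bus" if the request must go out on the interconnect.

    `levels` is an ordered list of (name, resident-block set) pairs,
    e.g. [("L1", {...}), ("L2", {...}), ("L3", {...})].
    """
    for name, resident in levels:
        if block in resident:
            return name  # hit: the request is satisfied at this level
    return "bus"  # miss at every level: present on the interconnect (bus 20)


levels = [("L1", {0x10}), ("L2", {0x10, 0x20}), ("L3", {0x10, 0x20, 0x30})]
assert lookup(0x20, levels) == "L2"   # L1 miss, L2 hit
assert lookup(0x99, levels) == "bus"  # misses everywhere
```

Note that under the inclusion property assumed earlier, a hit at an inner level implies the block is also resident at every outer level, so the first hit found top-down is always the fastest copy.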
Title: Cache coherency protocol with selectively implemented tagged...
Inventors: Ravi Kumar Arimilli; John Steven Dodson; Jerry Don Lewis
Assignee: International Business Machines Corporation
Attorneys: Bracewell & Patterson L.L.P.; Emile Volel
Examiners: Matthew Kim; Fred F. Tzeng