Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
2001-03-12
2003-04-08
Nguyen, Hiep T. (Department: 2187)
C711S121000, C711S144000, C711S168000, C709S248000
active
06546469
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to computer systems, and more particularly to a method of efficiently scheduling snoop operations between caches in a multiprocessor computer system.
2. Description of Related Art
The basic structure of a conventional multiprocessor computer system 10 is shown in FIG. 1. Computer system 10 has several processing units, two of which, 12a and 12b, are depicted, which are connected to various peripheral devices, including input/output (I/O) devices 14 (such as a display monitor, keyboard, graphical pointer (mouse), and a permanent storage device or hard disk), memory device 16 (such as random access memory or RAM) that is used by the processing units to carry out program instructions, and firmware 18 whose primary purpose is to seek out and load an operating system from one of the peripherals (usually the permanent memory device) whenever the computer is first turned on. Processing units 12a and 12b communicate with the peripheral devices by various means, including a generalized interconnect or bus 20, or direct memory access channels (not shown). Computer system 10 may have many additional components which are not shown, such as serial, parallel, and universal serial bus (USB) ports for connection to, e.g., modems, printers or scanners. There are other components that might be used in conjunction with those shown in the block diagram of FIG. 1; for example, a display adapter might be used to control a video display monitor, a memory controller can be used to access memory 16, etc. The computer can also have more than two processing units.
In a symmetric multi-processor (SMP) computer, all of the processing units are generally identical, that is, they all use a common set or subset of instructions and protocols to operate, and generally have the same architecture. A typical processing unit includes a processor core 22 having a plurality of registers and execution units, which carry out program instructions in order to operate the computer. An exemplary processing unit includes the PowerPC™ processor marketed by International Business Machines Corp. The processing unit can also have one or more caches, such as an instruction cache 24 and a data cache 26, which are implemented using high speed memory devices. Caches are commonly used to temporarily store values that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the additional latency of loading the values from memory 16. These caches are referred to as “on-board” when they are integrally packaged with the processor core on a single integrated chip 28. Each cache is associated with a cache controller (not shown) that manages the transfer of data and instructions between the processor core and the cache memory.
A processing unit can include additional caches, such as cache 30, which is referred to as a level 2 (L2) cache since it supports the on-board (level 1) caches 24 and 26. In other words, cache 30 acts as an intermediary between memory 16 and the on-board caches, and can store a much larger amount of information (instructions and data) than the on-board caches can, but at a longer access penalty. For example, cache 30 may be a chip having a storage capacity of 512 kilobytes, while the processor may be an IBM PowerPC™ 604-series processor having on-board caches with 64 kilobytes of total storage. Cache 30 is connected to bus 20, and all loading of information from memory 16 into processor core 22 must come through cache 30. Although FIG. 1 depicts only a two-level cache hierarchy, multi-level cache hierarchies can be provided where there are many levels (L3, L4, etc.) of serially connected caches.
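By way of illustration only, the serial arrangement of cache levels described above can be sketched as a lookup that tries each level in turn and falls back to memory. This is a minimal sketch, not the patented mechanism: the names `CacheLevel` and `load`, the latency figures, and the policy of filling every level on a miss are all assumptions introduced here.

```python
class CacheLevel:
    """A toy cache level: a dict from address to value plus an access latency."""

    def __init__(self, name, latency):
        self.name = name
        self.latency = latency  # hypothetical cycle cost, for illustration only
        self.lines = {}         # address -> cached value


def load(address, levels, memory, memory_latency=100):
    """Search the levels in order (L1 first), then memory.

    Returns (value, total_latency, source). On a hit at a lower level the
    value is copied into the levels above it; on a miss everywhere it is
    copied into every level (an assumed fill policy).
    """
    cost = 0
    for i, level in enumerate(levels):
        cost += level.latency
        if address in level.lines:
            value = level.lines[address]
            for upper in levels[:i]:
                upper.lines[address] = value
            return value, cost, level.name
    cost += memory_latency
    value = memory[address]
    for level in levels:
        level.lines[address] = value
    return value, cost, "memory"
```

A first access to an address pays the latency of every level plus memory, while a repeated access is served by the first level alone, which is the speed-up the hierarchy is intended to provide.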
In a multi-level cache, if a copy of a value is in every level of the cache, the cache hierarchy is referred to as being “inclusive.” It is not necessary, however, to keep a copy of each value in the lower levels, and an inclusivity bit field may be added to the caches to indicate whether or not the cache is inclusive. For example, a three-level cache structure might provide an L3 cache which was not inclusive, such that a value residing in the L2 cache might not be present in the L3 cache. In this example, if an L2 cache issues a read command for a value that is not present in any of the caches of that processing unit, it can be passed to that L2 cache without (necessarily) loading it into the L3 cache.
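The inclusivity property can be stated compactly as a containment check. The following is a sketch under stated assumptions: each level is modelled as a plain set of block addresses, ordered from the level nearest the core outward, and the function name `is_inclusive` is hypothetical.

```python
def is_inclusive(levels):
    """Return True when every block held at one level is also held
    at the next level further from the core.

    levels: list of sets of block addresses, ordered L1, L2, L3, ...
    """
    return all(upper <= lower for upper, lower in zip(levels, levels[1:]))


# A three-level example in which block 1 resides in L2 but not in L3,
# so the hierarchy is not inclusive.
inclusive = is_inclusive([{1}, {1, 2}, {1, 2, 3}])
non_inclusive = is_inclusive([{1}, {1, 2}, {2, 3}])
```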
In an SMP computer, it is important to provide a coherent memory system, that is, to cause write operations to each individual memory location to be serialized in some order for all processors. By way of example, assume a location in memory is modified by a sequence of write operations to take on the values: 1, 2, 3, 4. In a cache coherent system, all processors will observe the writes to a given location to take place in the order shown. However, it is possible for a processing element to miss a write to the memory location. A given processing element reading the memory location could see the sequence 1, 3, 4, missing the update to the value 2. A system that implements these properties is said to be “coherent”. Nearly all coherency protocols operate only to the granularity of the size of a cache block. That is to say, the coherency protocol controls the movement of and write permissions for operand data or instructions on a cache block basis, and not separately for each individual memory location.
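The ordering requirement in the example above amounts to saying that the values any one processor observes must form a subsequence of the serialized writes: values may be skipped, but never seen out of order. A minimal sketch of that check follows; the function name is hypothetical, and it assumes each observed entry corresponds to a distinct write (repeated reads of the same value are not modelled).

```python
def is_coherent_observation(write_order, observed):
    """Return True if `observed` is a subsequence of `write_order`.

    Skipping a write is allowed; seeing writes out of order is not.
    """
    remaining = iter(write_order)
    # `value in remaining` advances the iterator past the match, so each
    # later value must be found after the previous one.
    return all(value in remaining for value in observed)
```

With the write order 1, 2, 3, 4, the observation 1, 3, 4 passes the check, whereas 1, 3, 2, 4 does not.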
There are a number of protocols and techniques for achieving cache coherence that are known to those skilled in the art. All of these mechanisms for maintaining coherency require that the protocols allow only one processor to have a “permission” that allows a write operation to a given memory location (cache block) at any given point in time. As a consequence of this requirement, whenever a processing element attempts to write to a memory location, it must first inform all other processing elements of its desire to write the location and receive permission from all other processing elements to carry out the write.
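The single-writer requirement can be illustrated with a toy model in which a processing element obtains write permission for a cache block only after every other processing element has relinquished it. This is a sketch, not any particular protocol: the `Processor` class, the `grant` method, and the 64-byte block size are assumptions made for the example.

```python
class Processor:
    """A toy processing element that tracks which blocks it may write."""

    def __init__(self, name):
        self.name = name
        self.writable = set()  # block indices this processor may currently write

    def grant(self, block):
        """Answer another processor's write request by giving up permission."""
        self.writable.discard(block)
        return True


def acquire_write_permission(requester, others, address, block_size=64):
    """Ask every other processor for permission to write the block
    containing `address`; permission is per cache block, not per location."""
    block = address // block_size
    if all(p.grant(block) for p in others):
        requester.writable.add(block)
        return True
    return False
```

Because each grant removes the block from the granting processor's writable set, at most one processor holds write permission for a given block at any time.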
To implement cache coherency in a system, the processors communicate over a common generalized interconnect (i.e., bus 20). The processors pass messages over the interconnect indicating their desire to read from or write to memory locations. When an operation is placed on the interconnect, all of the other processors “snoop” (monitor) this operation and decide if the state of their caches can allow the requested operation to proceed and, if so, under what conditions. There are several bus transactions that require snooping and follow-up action to honor the bus transactions and maintain memory coherency. The snooping operation is triggered by the receipt of a qualified snoop request, generated by the assertion of certain bus signals. Instruction processing is interrupted only when a snoop hit occurs and the snoop state machine determines that an additional cache snoop is required to resolve the coherency of the offended sector.
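The broadcast-and-snoop behavior described above can be sketched as follows. The sketch assumes a simplified three-state scheme (invalid, shared, modified) and invented response strings; neither the states nor the names `SnoopingCache` and `Bus` are taken from the text, and real snoop state machines are considerably more involved.

```python
class SnoopingCache:
    """A toy cache that reacts to operations issued by other caches."""

    def __init__(self, name):
        self.name = name
        self.state = {}  # block -> "shared" or "modified"; absent means invalid

    def snoop(self, op, block):
        """React to another cache's bus operation and return a response."""
        current = self.state.get(block)
        if current is None:
            return "null"                 # no copy here, nothing to do
        if op == "read":
            self.state[block] = "shared"  # a modified copy is downgraded
        else:                             # "write": requester needs sole permission
            del self.state[block]
        return current                    # report the state held before the snoop


class Bus:
    """A shared interconnect that broadcasts each operation to all snoopers."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def issue(self, source, op, block):
        """Every attached cache except the source snoops the operation."""
        return {c.name: c.snoop(op, block) for c in self.caches if c is not source}
```

The requester would combine the collected responses to decide whether the operation may proceed and whether the data must come from another cache rather than from memory.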
This communication is necessary because, in systems with caches, the most recent valid copy of a given block of memory may have moved from the system memory 16 to one or more of the caches in the system (as mentioned above). If a processor (say 12a) attempts to access a memory location not present within its cache hierarchy, the correct version of the block, which contains the actual (current) value for the memory location, may either be in the system memory 16 or in one or more of the caches in another processing unit, e.g. processing unit 12b. If the correct version is in one or more of the other caches in the system, it is necessary to obtain the correct value from the cache(s) in the system instead of system memory.
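The sourcing rule in this paragraph can be sketched as a lookup that prefers a current copy held in another processing unit's cache over a possibly stale copy in system memory. The function name `read_current` and the dictionary model are assumptions made for the example; in particular, each cache is assumed to hold only current copies.

```python
def read_current(address, own_cache, other_caches, memory):
    """Return (value, source) for `address`.

    own_cache:    dict address -> value for the requesting unit's hierarchy
    other_caches: dict unit name -> dict address -> value (current copies only)
    memory:       dict address -> value, which may be stale
    """
    if address in own_cache:
        return own_cache[address], "own cache"
    for unit, cache in other_caches.items():
        if address in cache:
            return cache[address], unit  # take the value from the other cache
    return memory[address], "memory"
```

If processing unit 12b holds a newer value for an address than system memory does, a read by 12a returns the value from 12b; only when no cache holds the block is system memory used.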
For example, consider a processor, say 12a, attempting to read a location in memory. It first polls its own L1 cache (24 or 26). If the block is not present in the L1 cache, the request is forwarded to the L2 cache (30). If the block is not present in the L2 cache, the requ
Arimilli Ravi Kumar
Fields, Jr. James Stephen
Ghai Sanjeev
Guthrie Guy Lynn
Joyner Jody B.
Bracewell & Patterson L.L.P.
Dinh Ngoc V
International Business Machines - Corporation
Nguyen Hiep T.
Salys Casimer K.