Removal of posted operations from cache operations queue

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

Details

C711S141000, C711S142000, C711S143000, C710S054000

active

06418514

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to computer systems, particularly to a method of maintaining cache coherency in a multi-processor computer system, while allowing the posting of certain cache operations such that a broadcast of an operation may be delayed but the operation may nevertheless execute immediately, and further relates to more efficient handling of such posted operations.
2. Description of Related Art
The basic structure of a conventional multi-processor computer system 10 is shown in FIG. 1. Computer system 10 has several processing units, two of which (12a and 12b) are depicted, which are connected to various peripheral devices, including input/output (I/O) devices 14 (such as a display monitor, keyboard, and permanent storage device), memory device 16 (such as random access memory or RAM) that is used by the processing units to carry out program instructions, and firmware 18 whose primary purpose is to seek out and load an operating system from one of the peripherals (usually the permanent memory device) whenever the computer is first turned on. Processing units 12a and 12b communicate with the peripheral devices by various means, including a generalized interconnect or bus 20. Computer system 10 may have many additional components which are not shown, like serial and parallel ports for connection to modems or printers. Those skilled in the art will further appreciate that there are other components that might be used in conjunction with those shown in the block diagram of FIG. 1; a display adapter might be used to control a video display monitor, a memory controller can be used to access memory 16, etc. The computer can also have more than two processing units.
In a symmetric multi-processor (SMP) computer, all of the processing units are generally identical, that is, they all use a common set or subset of instructions and protocols to operate, and generally have the same architecture. A typical SMP architecture is shown in FIG. 1. A processing unit includes a processor core 22 having a plurality of registers and execution units, which carry out program instructions in order to operate the computer. An exemplary processing unit includes the PowerPC™ processor marketed by International Business Machines Corp. The processing unit can also have one or more caches, typically an instruction cache 24 and a data cache 26, which are implemented using high speed memory devices. Caches are commonly used to temporarily store values (instructions and/or data) that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the longer step of loading the values from memory 16. These caches are referred to as “on-board” when they are integrally packaged with the processor core on a single integrated chip 28. Each cache is associated with a cache controller (not shown) that manages the transfer of data between the processor core and the cache memory.
A processing unit can include additional caches, such as cache 30, which is referred to as a level 2 (L2) cache since it supports the on-board (level 1) caches 24 and 26. In other words, cache 30 acts as an intermediary between memory 16 and the on-board caches, and can store a much larger amount of information (instructions and data) than the on-board caches can, but at a longer access penalty. Cache 30 may be a chip having a storage capacity of 256 or 512 kilobytes, while the processor may be an IBM PowerPC™ 604-series processor having on-board caches with 64 kilobytes of total storage. Cache 30 is connected to bus 20, and all loading of information from memory 16 into processor core 22 must come through cache 30. Although FIG. 1 depicts only a two-level cache hierarchy, multi-level cache hierarchies can be provided where there are many levels (L3, L4, etc.) of serially connected caches.
In an SMP computer, it is important to provide a coherent memory system, that is, to cause write operations to each individual memory location to be serialized in some order for all processors. Assuming that a location in memory is modified by a sequence of write operations to take on the specific successive values of “1,” “2,” “3,” and “4,” in a cache coherent system all processors will observe the writes to the given location to take place in the order shown. However, it is possible for a processing element to miss a write to the memory location. A given processing element reading the memory location could see the sequence 1, 3, 4, missing the update to the value 2. A system that implements these properties is said to be “coherent”. Virtually all coherency protocols operate only to the granularity of the size of a cache block. That is to say, the coherency protocol controls the movement of and write permissions for data on a cache block basis and not separately for each individual memory location.
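The ordering property described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name is_coherent_observation and the assumption that every written value is distinct are illustrative choices only. The check accepts an observation that skips values (a missed write) but rejects one that sees values out of order.

```python
def is_coherent_observation(write_order, observed):
    """Return True if `observed` is consistent with `write_order`.

    Assumes (for this sketch only) that each written value is distinct.
    A reader may miss writes or re-read the same value, but may never
    see an older value after a newer one.
    """
    index_of = {value: i for i, value in enumerate(write_order)}
    last_seen = -1
    for value in observed:
        if value not in index_of:
            return False  # a value that was never written
        if index_of[value] < last_seen:
            return False  # an older write observed after a newer one
        last_seen = index_of[value]
    return True
```

Under these assumptions, the sequence 1, 3, 4 from the text is accepted for the write order 1, 2, 3, 4, while a sequence such as 1, 3, 2 is rejected.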
There are a number of protocols and techniques for achieving cache coherence that are known to those skilled in the art. At the heart of all these mechanisms for maintaining coherency is the requirement that the protocols allow only one processor to have a “permission” that allows a write to a given memory location (cache block) at any given point in time. As a consequence of this requirement, whenever a processing element attempts to write to a memory location, it must first inform all other processing elements of its desire to write the location and receive permission from all other processing elements to carry out the write. The key issue is that all other processors in the system must be informed of the write by the initiating processor before the write occurs. Furthermore, if a block is present in the L1 cache of a given processing unit, it is also present in the L2 and L3 caches of that processing unit. This property is known as inclusion.
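The single-writer requirement and the inclusion property can be sketched in a few lines of Python. This is a hedged illustration, not the patent's mechanism: the class WritePermission, its methods, and the helper satisfies_inclusion are hypothetical names, and the sketch assumes the other units always grant permission.

```python
class WritePermission:
    """Tracks which processing unit may write one cache block (hypothetical model)."""

    def __init__(self, units):
        self.units = list(units)
        self.owner = None  # unit currently holding the write permission, if any

    def request_write(self, unit):
        """Inform every other unit, then give `unit` the sole write permission."""
        if unit not in self.units:
            raise ValueError(f"unknown unit: {unit}")
        informed = [u for u in self.units if u != unit]
        # Assumption: the informed units always agree. A real protocol
        # could instead force the requester to retry.
        self.owner = unit
        return informed

    def may_write(self, unit):
        return self.owner == unit


def satisfies_inclusion(l1_blocks, l2_blocks, l3_blocks):
    """True if every L1 block is also in L2, and every L2 block is also in L3."""
    return set(l1_blocks) <= set(l2_blocks) <= set(l3_blocks)
```

For example, after WritePermission(["12a", "12b"]).request_write("12a"), only unit "12a" passes may_write; a later request by "12b" moves the permission rather than sharing it.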
To implement cache coherency in a system, the processors communicate over a common generalized interconnect (i.e., bus 20). The processors pass messages over the interconnect indicating their desire to read or write memory locations. When an operation is placed on the interconnect, all of the other processors “snoop” (monitor) this operation and decide if the state of their caches can allow the requested operation to proceed and if so, under what conditions. There are several bus transactions that require snooping and follow-up action to honor the bus transactions and maintain memory coherency. The snooping operation is triggered by the receipt of a qualified snoop request, generated by the assertion of certain bus signals. Instruction processing is interrupted only when a snoop hit occurs and the snoop state machine determines that an additional cache snoop is required to resolve the coherency of the offended sector.
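A rough Python sketch of snooping follows. The text above does not name a particular protocol at this point, so the three block states (invalid, shared, modified), the response strings, and the names SnoopingCache and broadcast are all assumptions chosen for illustration.

```python
from enum import Enum


class State(Enum):
    # Assumed three-state model; not specified by the text above.
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class SnoopingCache:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> State

    def state(self, address):
        return self.lines.get(address, State.INVALID)

    def snoop(self, op, address):
        """React to another unit's bus operation and return a response string."""
        if op not in ("read", "write"):
            raise ValueError(f"unsupported operation: {op}")
        current = self.state(address)
        if current is State.INVALID:
            return "null"  # no copy here, nothing to do
        response = "modified" if current is State.MODIFIED else "shared"
        if op == "read":
            # Another unit wants to read: keep a copy but drop exclusivity.
            self.lines[address] = State.SHARED
        else:
            # Another unit wants to write: this copy becomes stale.
            self.lines[address] = State.INVALID
        return response


def broadcast(requester, caches, op, address):
    """Place `op` on the bus; every cache except the requester snoops it."""
    return {c.name: c.snoop(op, address) for c in caches if c is not requester}
```

In this sketch the requester collects one response per snooper and would use them to decide whether and how the operation may proceed; that decision logic is deliberately left out.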
This communication is necessary because, in systems with caches, the most recent valid copy of a given block of memory may have moved from the system memory 16 to one or more of the caches in the system (as mentioned above). If a processor (say 12a) attempts to access a memory location not present within its cache hierarchy, the correct version of the block, which contains the actual (current) value for the memory location, may either be in the system memory 16 or in one or more of the caches in another processing unit, such as processing unit 12b. If the correct version is in one or more of the other caches in the system, it is necessary to obtain the correct value from the cache(s) in the system instead of system memory.
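The choice between system memory and another unit's cache can be sketched as follows. This is only an illustration under stated assumptions: the function fetch_block, the per-entry "newer than memory" flag, and the dictionary layout are hypothetical and do not come from the patent.

```python
def fetch_block(address, requester, caches, memory):
    """Return (value, source) for a block the requester does not hold.

    caches: dict mapping unit name -> dict of address -> (value, newer_than_memory)
    memory: dict mapping address -> value
    The `newer_than_memory` flag is an assumption of this sketch.
    """
    for unit, cache in caches.items():
        if unit == requester:
            continue  # the requester already missed in its own hierarchy
        entry = cache.get(address)
        if entry is not None and entry[1]:
            # Another unit holds the current value; memory is stale.
            return entry[0], unit
    return memory[address], "memory"
```

With memory holding 7 at some address and unit "12b" caching a newer value 9 for it, a request from "12a" is served from "12b"; if no cache holds a newer copy, the value comes from memory.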
For example, consider a processor, say 12a, attempting to read a location in memory. It first polls its own L1 cache (24 or 26). If the block is not present in the L1 cache, the request is forwarded to the L2 cache (30). If the block is not present in the L2 cache, the request is forwarded on to lower cache levels, like the L3 cache. If the block is not present in the lower level caches, the request is then presented on the generalized interconnect (20) to be serviced. Once a
