Demand-based issuance of cache operations to a system bus

Electrical computers and digital processing systems: memory – Address formation – Address mapping

Reexamination Certificate


Details

C711S141000, C711S145000

active

06182201

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to computer systems and, more particularly, to a method of optimizing architectural-level operations such as cache instructions.
2. Description of the Related Art
The basic structure of a conventional computer system 10 is shown in FIG. 1. Computer system 10 may have one or more processing units, two of which (12a and 12b) are depicted, which are connected to various peripheral devices, including input/output (I/O) devices 14 (such as a display monitor, keyboard, and permanent storage device), memory device 16 (such as random access memory or RAM) that is used by the processing units to carry out program instructions, and firmware 18 whose primary purpose is to seek out and load an operating system from one of the peripherals (usually the permanent memory device) whenever the computer is first turned on. Processing units 12a and 12b communicate with the peripheral devices by various means, including a generalized interconnect or bus 20. Computer system 10 may have many additional components which are not shown, such as serial and parallel ports for connection to, e.g., modems or printers. Those skilled in the art will further appreciate that there are other components that might be used in conjunction with those shown in the block diagram of FIG. 1; for example, a display adapter might be used to control a video display monitor, a memory controller can be used to access memory 16, etc. Also, instead of connecting I/O devices 14 directly to bus 20, they may be connected to a secondary (I/O) bus which is further connected to an I/O bridge to bus 20. The computer can have more than two processing units.
In a symmetric multi-processor (SMP) computer, all of the processing units are generally identical; that is, they all use a common set or subset of instructions and protocols to operate, and generally have the same architecture. A typical architecture is shown in FIG. 1. A processing unit includes a processor core 22 having a plurality of registers and execution units, which carry out program instructions in order to operate the computer. An exemplary processing unit includes the PowerPC™ processor marketed by International Business Machines Corporation. The processing unit can also have one or more caches, such as an instruction cache 24 and a data cache 26, which are implemented using high-speed memory devices. Caches are commonly used to store temporarily values that might be accessed repeatedly by a processor, in order to speed up processing by avoiding the longer step of loading the values from memory 16. These caches are referred to as “on-board” when they are integrally packaged with the processor core on a single integrated chip 28. Each cache is associated with a cache controller (not shown) that manages the transfer of data between the processor core and the cache memory.
A processing unit 12 can include additional caches, such as cache 30, which is referred to as a level 2 (L2) cache since it supports the on-board (level 1) caches 24 and 26. In other words, cache 30 acts as an intermediary between memory 16 and the on-board caches, and can store a much larger amount of information (instructions and data) than the on-board caches can, but at a longer access penalty. For example, cache 30 may be a chip having a storage capacity of 256 or 512 kilobytes, while the processor may be an IBM PowerPC™ 604-series processor having on-board caches with 64 kilobytes of total storage. Cache 30 is connected to bus 20, and all loading of information from memory 16 into processor core 22 must come through cache 30. Although FIG. 1 depicts only a two-level cache hierarchy, multi-level cache hierarchies can be provided where there are many levels of serially connected caches.
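The load path through a two-level hierarchy can be illustrated with a minimal sketch. The class names and the latency figures (1, 10, and 100 time units) are assumptions chosen for illustration only; they do not come from the patent, and capacity limits are not modeled.

```python
class MainMemory:
    """Stands in for memory 16; the latency figure is a made-up illustration."""

    def __init__(self, data, latency=100):
        self.data = data
        self.latency = latency

    def load(self, addr):
        return self.data[addr], self.latency


class CacheLevel:
    """One cache level that forwards misses to the next lower level."""

    def __init__(self, backing, latency):
        self.backing = backing    # next lower level (another cache or memory)
        self.latency = latency    # hypothetical access cost of this level
        self.store = {}           # addr -> value; capacity is not modeled

    def load(self, addr):
        if addr in self.store:                  # hit at this level
            return self.store[addr], self.latency
        value, cost = self.backing.load(addr)   # miss: go one level down
        self.store[addr] = value                # keep a copy on the way back
        return value, self.latency + cost


memory = MainMemory({0x100: "instr"})
l2 = CacheLevel(memory, latency=10)   # plays the role of cache 30
l1 = CacheLevel(l2, latency=1)        # plays the role of an on-board cache

first = l1.load(0x100)    # misses in both levels, so memory is accessed
second = l1.load(0x100)   # hits in the on-board cache
```

In this sketch the first load pays the cost of every level it passes through, while the repeated load is served by the on-board cache alone, which is the speed-up the passage describes.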
A cache has many blocks or lines which individually store the various instructions and data values. An exemplary cache line (block) includes an address tag field, a state bit field, an inclusivity bit field, and a value field for storing the actual instruction or data. The state bit field and inclusivity bit fields are used to maintain cache coherency in a multiprocessor computer system (indicate the validity of the value stored in the cache). The address tag is a subset of the full address of the corresponding memory block. A compare match of an incoming effective address with one of the tags within the address tag field indicates a cache “hit.” The collection of all of the address tags in a cache (and sometimes the state bit and inclusivity bit fields) is referred to as a directory, and the collection of all of the value fields is the cache entry array. The cache 30 of FIG. 1 depicts such a cache entry array 32 and a cache directory 34.
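The tag comparison described above can be sketched as follows. This is a simplified model, not the patent's implementation: the 64-byte block size, the four sets, the two ways per set, and the names `CacheLine`, `split_address`, and `lookup` are all assumptions made for illustration, and a single `valid` flag stands in for the state bit field.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BLOCK_BITS = 6   # assumed 64-byte blocks
INDEX_BITS = 2   # assumed 4 sets, kept tiny for illustration


@dataclass
class CacheLine:
    tag: int = 0
    valid: bool = False       # simplified stand-in for the state bit field
    inclusive: bool = False   # stand-in for the inclusivity bit field
    value: bytes = b""        # the value field (instruction or data)


def split_address(addr: int) -> Tuple[int, int]:
    """Return (tag, set index); the tag is a subset of the full address."""
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    return tag, index


def lookup(sets: List[List[CacheLine]], addr: int) -> Optional[CacheLine]:
    """Compare the incoming tag with each tag in the set; a match is a hit."""
    tag, index = split_address(addr)
    for line in sets[index]:
        if line.valid and line.tag == tag:
            return line
    return None


# Two ways per set; together the tags form the directory, the values the array.
sets = [[CacheLine() for _ in range(2)] for _ in range(1 << INDEX_BITS)]
tag, index = split_address(0x1234)
sets[index][0] = CacheLine(tag=tag, valid=True, value=b"data")

hit = lookup(sets, 0x1234)    # same block: tags match
miss = lookup(sets, 0x1334)   # same set, different tag
```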
When all of the blocks in a set for a given cache are full and that cache receives a request, whether a “read” or “write,” to a memory location that maps into the full set, the cache must “evict” one of the blocks currently in the set. The cache chooses a block by one of a number of means known to those skilled in the art (least recently used (LRU), random, pseudo-LRU, etc.) to be evicted. An LRU unit 36 is depicted in FIG. 1. If the data in the chosen block is modified, that data is written to the next lowest level in the memory hierarchy which may be another cache (in the case of the L1 or on-board cache) or main memory (in the case of an L2 cache, as depicted in the two-level architecture of FIG. 1). By the principle of inclusion, the lower level of the hierarchy will already have a block available to hold the written modified data. However, if the data in the chosen block is not modified, the block is simply abandoned and not written to the next lowest level in the hierarchy. This process of removing a block from one level of the hierarchy is known as an “eviction.” At the end of this process, the cache no longer holds a copy of the evicted block.
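The eviction behavior can be sketched for a single set using an LRU policy, one of the replacement choices mentioned above. The `CacheSet` class, its method names, and the use of a plain dictionary as the next lower level are hypothetical simplifications, not details taken from the patent.

```python
from collections import OrderedDict


class CacheSet:
    """One set of a cache with LRU replacement and write-back on eviction."""

    def __init__(self, ways, lower_level):
        self.ways = ways
        self.lower_level = lower_level   # dict standing in for the next level
        self.blocks = OrderedDict()      # tag -> (value, modified), LRU first

    def access(self, tag, value=None):
        """Read the block (value is None) or write it (value given)."""
        if tag in self.blocks:
            current, modified = self.blocks.pop(tag)
            if value is not None:
                current, modified = value, True
            self.blocks[tag] = (current, modified)   # now most recently used
            return current
        if len(self.blocks) >= self.ways:            # set is full: evict first
            self.evict()
        if value is None:
            current, modified = self.lower_level.get(tag), False
        else:
            current, modified = value, True
        self.blocks[tag] = (current, modified)
        return current

    def evict(self):
        """Remove the LRU block, writing it down only if it is modified."""
        victim_tag, (victim_value, modified) = self.blocks.popitem(last=False)
        if modified:
            self.lower_level[victim_tag] = victim_value
        # An unmodified block is simply abandoned.
        return victim_tag


memory = {1: "a", 2: "b", 3: "c"}
cache_set = CacheSet(ways=2, lower_level=memory)
cache_set.access(1)        # load block 1
cache_set.access(2, "B")   # write block 2, so it becomes modified
cache_set.access(1)        # block 1 is now more recently used than block 2
cache_set.access(3)        # set is full: block 2 is evicted and written back
```

After the last access, the modified value of block 2 has been written to the lower level and the set holds blocks 1 and 3.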
A conventional cache has many queues: cacheable store queues 38 (which may include read and write queues for each of the cache directory, cache entry array, and other arrays, to fetch data coming in to reload this cache); a cache-inhibited store queue 40; a snoop queue 42 for monitoring requests to, e.g., intervene some data; and a cache operations queue 44 which handles cache instructions that execute control at an architectural level. For example, the PowerPC™ processor utilizes certain instructions that specially affect the cache, such as a flush instruction, a kill instruction, a clean instruction, and a touch instruction. These instructions are stored in cache operations queue 44.
Cache instructions allow software to manage the cache. Some of the instructions are supervisory level (performed only by the computer's operating system), and some are user level (performed by application programs). The flush instruction (data cache block flush—“dcbf”) causes a cache block to be made available by invalidating the cache block if it contains an unmodified (“shared” or “exclusive”) copy of a memory block or, if the cache block contains a modified copy of a memory block, then by first writing the modified value downward in the memory hierarchy (a “push”), and thereafter invalidating the block. The kill instruction (data cache block invalidate—“dcbi,” instruction cache block invalidate—“icbi,” or data cache block set to zero—“dcbz”) is similar to the flush instruction except that a kill instruction immediately forces a cache block to an invalidate state, so any modified block is killed without pushing it out of the cache. The clean instruction (data cache block store—“dcbst”) causes a block that has been modified to be written to main memory; it affects only blocks which have been modified. The touch instruction (data cache block touch—“dcbt”) provides a method for improving performance through the use of software-initiated prefetch hints.
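The effect of these instructions on a single cache block can be summarized in a small sketch. The function names mirror the instruction categories in the text, but the `State` enum, the `Block` class, and the dictionary used as main memory are hypothetical. The state a block is left in after a clean or a touch is not specified in the passage above, so those two choices are labeled assumptions in the comments.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    SHARED = "shared"
    EXCLUSIVE = "exclusive"
    MODIFIED = "modified"


class Block:
    def __init__(self, state=State.INVALID, value=None):
        self.state = state
        self.value = value


def push(block, addr, memory):
    """Write the modified value downward in the memory hierarchy."""
    memory[addr] = block.value


def flush(block, addr, memory):
    """dcbf: push a modified block first, then invalidate in every case."""
    if block.state is State.MODIFIED:
        push(block, addr, memory)
    block.state = State.INVALID


def kill(block):
    """dcbi-style kill: invalidate immediately, with no push."""
    block.state = State.INVALID


def clean(block, addr, memory):
    """dcbst: write a modified block to memory; other blocks are unaffected."""
    if block.state is State.MODIFIED:
        push(block, addr, memory)
        block.state = State.EXCLUSIVE   # assumption: resulting state not given


def touch(block, addr, memory):
    """dcbt: prefetch hint, modeled here as loading an invalid block."""
    if block.state is State.INVALID:
        block.value = memory[addr]
        block.state = State.SHARED      # assumption: resulting state not given


memory = {0x40: "old", 0x80: "kept"}

flushed = Block(State.MODIFIED, "new")
flush(flushed, 0x40, memory)            # memory[0x40] becomes "new"

killed = Block(State.MODIFIED, "lost")
kill(killed)                            # memory[0x80] is left untouched
```

The contrast between `flush` and `kill` is the point of the sketch: both end with an invalid block, but only the flush preserves a modified value by pushing it first.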
All of the foregoing cache instructions operate on a block whose size is referred to as the processor coherenc
