Integrated cache buffers

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

Type: Reexamination Certificate
Status: active
Patent number: 06240487
US Classifications: C711S133000, C711S143000

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to computer systems, and more particularly to a method of efficiently using buffers located in a cache memory of a processing unit of a computer system.
2. Description of Related Art
The basic structure of a conventional computer system includes one or more processing units connected to various input/output devices (such as a display monitor, keyboard, and permanent storage device), and a system memory device (such as random access memory or RAM) that is used by the processing units to carry out program instructions. The processing units communicate with the other devices by various means, including one or more generalized interconnects. A computer system may have many additional components such as serial and parallel ports for connection to, e.g., modems or printers, and other components that might be used in conjunction with the foregoing; for example, a display adapter might be used to control a video display monitor, a memory controller can be used to access the system memory, etc.
A typical processing unit includes various execution units and registers, as well as branch and dispatch units which forward instructions to the appropriate execution units. Caches are commonly provided for both instructions and data, to temporarily store values that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the longer step of loading the values from the system memory (RAM). These caches are referred to as “on-board” when they are integrally packaged with the processor core on a single integrated chip. Each cache is associated with a cache controller or bus interface unit that manages the transfer of values between the processor core and the cache memory.
A processing unit can include additional caches, such as a level 2 (L2) cache which supports the on-board (level 1) caches. In other words, the L2 cache acts as an intermediary between system memory and the on-board caches, and can store a much larger amount of information (both instructions and data) than the on-board caches can, but at a longer access penalty. Multi-level cache hierarchies can be provided where there are many levels of interconnected caches.
A cache has many blocks which individually store the various instruction and data values. The blocks in any cache are divided into groups of blocks called “sets” or “congruence classes.” A set is the collection of cache blocks that a given memory block can reside in. For any given memory block, there is a unique set in the cache that the block can be mapped into, according to preset mapping functions. The number of blocks in a set is referred to as the associativity of the cache, e.g., 2-way set associative means that for any given memory block there are two blocks in the cache that the memory block can be mapped into; however, several different blocks in main memory can be mapped to any given set.
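By way of concrete illustration (the parameters below are assumptions for this sketch, not figures from the patent), the following C fragment shows how a memory address maps to a congruence class in a 32 KB, 2-way set associative cache with 32-byte blocks: the middle address bits select the set, and the remaining high-order bits form the tag.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative parameters (assumptions, not taken from the patent):
     * a 32 KB, 2-way set associative cache with 32-byte blocks. */
    #define BLOCK_SIZE 32u
    #define NUM_WAYS   2u
    #define CACHE_SIZE (32u * 1024u)
    #define NUM_SETS   (CACHE_SIZE / (BLOCK_SIZE * NUM_WAYS))   /* 512 sets */

    /* Middle address bits select the set (congruence class). */
    static uint32_t set_index(uint32_t addr) {
        return (addr / BLOCK_SIZE) % NUM_SETS;
    }

    /* The remaining high-order bits form the address tag. */
    static uint32_t tag_of(uint32_t addr) {
        return addr / (BLOCK_SIZE * NUM_SETS);
    }

    int main(void) {
        uint32_t a = 0x12345678u;
        printf("addr 0x%08x -> set %u, tag 0x%x\n",
               (unsigned)a, (unsigned)set_index(a), (unsigned)tag_of(a));
        return 0;
    }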
An exemplary cache line (block) includes an address tag field, a state bit field, an inclusivity bit field, and a value field for storing the actual instruction or data. The state bit field and inclusivity bit fields are used to maintain cache coherency in a multi-processor computer system (indicating the validity of the value stored in the cache). The address tag is a subset of the full address of the corresponding memory block. A compare match of an incoming address with one of the tags within the address tag field indicates a cache “hit.” The collection of all of the address tags in a cache is referred to as a directory (and sometimes includes the state bit and inclusivity bit fields), and the collection of all of the value fields is the cache entry array.
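A minimal C sketch of such a cache line follows, mirroring the fields named above; the 32-byte block size and the MESI-style state encoding are assumptions for illustration only.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical layout of one cache line, mirroring the fields named
     * in the text: address tag, coherency state bits, inclusivity bit,
     * and the value field holding the cached instruction or data. */
    typedef struct {
        uint32_t tag;          /* subset of the full memory address       */
        uint8_t  state;        /* coherency state bits (e.g., MESI)       */
        bool     inclusive;    /* inclusivity bit for the cache hierarchy */
        uint8_t  value[32];    /* the cached instruction or data          */
    } cache_line_t;

    #define STATE_INVALID 0u

    /* A "hit" requires a tag compare match on a line whose state marks
     * the stored value as valid. */
    static bool is_hit(const cache_line_t *line, uint32_t incoming_tag) {
        return line->state != STATE_INVALID && line->tag == incoming_tag;
    }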
When all of the blocks in a set for a given cache are full and that cache receives a request, whether a read or write operation, to access another memory location that maps into the full set (a cache “miss”), the cache must evict one of the blocks currently in the set. The cache chooses a block to evict by one of a number of means known to those skilled in the art (least recently used (LRU), random, pseudo-LRU, etc.). At the end of this process, the cache no longer holds a copy of the evicted block. If another copy of the value was not already present somewhere else in the memory hierarchy, then the value must be written back to system memory (or to some other cache).
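The sketch below illustrates one such replacement policy, least-recently-used victim selection within a single set; the associativity and the age-counter representation are assumptions for illustration.

    #include <stdint.h>

    #define NUM_WAYS 4u   /* assumed associativity for this sketch */

    typedef struct {
        uint32_t age;     /* accesses since this way was last used     */
        int      dirty;   /* nonzero if the block must be written back */
    } way_meta_t;

    /* Scan the set and return the way holding the least recently used
     * block; the caller writes the victim back only if it is dirty,
     * i.e., if no current copy exists elsewhere in the hierarchy. */
    static unsigned choose_lru_victim(const way_meta_t set[NUM_WAYS]) {
        unsigned victim = 0;
        for (unsigned w = 1; w < NUM_WAYS; w++) {
            if (set[w].age > set[victim].age)
                victim = w;
        }
        return victim;
    }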
For a high-speed processor device such as a superscalar, reduced instruction set computing (RISC) processor wherein more than one instruction can be executed during a single processor cycle, demands for simultaneous multiple accesses to the cache memory are increasing. The processor device may have to access more than one effective address and/or real address of the cache memory in a single processor cycle. Hence, a cache memory is often partitioned into multiple subarrays (interleaved) in order to achieve single-cycle, multi-port access. An interleaved cache memory has the potential of being accessed by more than one address and producing more than one output value in a single processor cycle.
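A short sketch of how such interleaving might select a subarray from low-order bits of the block address follows; the subarray count and block size are assumptions, not taken from the patent.

    #include <stdint.h>

    /* Assumed parameters: four subarrays, 32-byte blocks. Low-order
     * bits of the block address select the subarray. */
    #define NUM_SUBARRAYS 4u
    #define BLOCK_SIZE    32u

    static unsigned subarray_of(uint32_t addr) {
        return (unsigned)((addr / BLOCK_SIZE) % NUM_SUBARRAYS);
    }

    /* Two accesses can complete in the same processor cycle only when
     * they fall into different subarrays. */
    static int conflict_free(uint32_t a, uint32_t b) {
        return subarray_of(a) != subarray_of(b);
    }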
Caches typically buffer a value that will be used to replace a line in the cache (a “cast-in”) as a result of a cache miss. On a cache miss, the value is loaded into the cache reload buffer from memory, or from the next lower level in the memory hierarchy, and then forwarded to the requester. The value in the reload buffer is eventually loaded into the cache entry array. Caches similarly can buffer a value that is cast out of a cache (evicted), presuming that a write-back is required, i.e., as dictated by the coherency state of the subject cache block. The value is loaded into a cast-out (or store-back) buffer from the cache, and forwarded to system memory, or otherwise propagated to another location in the memory hierarchy.
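A minimal sketch of this buffering scheme, with the block size and buffer structure assumed for illustration:

    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE 32u   /* assumed block size for this sketch */

    typedef struct {
        uint8_t data[BLOCK_SIZE];
        int     valid;
    } line_buffer_t;

    static line_buffer_t reload_buf;      /* holds the cast-in on a miss    */
    static line_buffer_t store_back_buf;  /* holds a cast-out awaiting RAM  */

    /* On a miss: fill the reload buffer from the next level of the
     * hierarchy and forward the block to the requester right away; the
     * buffer is drained into the cache entry array later. */
    static void handle_miss(const uint8_t *from_next_level,
                            uint8_t *to_requester) {
        memcpy(reload_buf.data, from_next_level, BLOCK_SIZE);
        reload_buf.valid = 1;
        memcpy(to_requester, reload_buf.data, BLOCK_SIZE);
    }

    /* On an eviction that requires write-back: park the victim block in
     * the store-back buffer on its way to memory. */
    static void handle_cast_out(const uint8_t *victim_block) {
        memcpy(store_back_buf.data, victim_block, BLOCK_SIZE);
        store_back_buf.valid = 1;
    }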
Buffers are used to reduce problems associated with trailing-edge latency. Otherwise, a considerable amount of wire (i.e., a large bus) would be required to dump the subarray contents into an external buffer, presenting additional problems. While these internal buffers can thus improve performance, they take up significant space in the cache and can be totally useless over long periods, i.e., those wherein no cache misses occur (for the cache reload buffer) and those wherein no cast-outs occur (for the cache store-back buffer). This inefficiency is exacerbated with interleaved caches that have separate reload and store-back buffers for each subarray in a given cache. It would, therefore, be desirable to devise an improved method of handling reload and cast-out values to provide more efficient cache utilization. It would be further advantageous if the method could be physically implemented in an efficient manner.
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide an improved cache memory for a data processing system.
It is another object of the present invention to provide such a cache memory which has at least one buffer, such as a reload buffer or store-back buffer.
It is yet another object of the present invention to provide an improved method of managing internal cache buffers to provide more efficient cache utilization.
The foregoing objects are achieved in a cache for providing values to a processing unit of a computer system, generally comprising an array for holding a value, a buffer connected to the array, and means for accessing the buffer to retrieve the value for the processing unit. The accessing means preferably includes a plurality of wires having a pitch which is substantially equal to a wire pitch of the cache array. Multiplexers can be used with a plurality of such buffers to create a common output path. The cache is furthermore preferably interleaved, with the array being a first subarray, and the buffer being a first buffer, and further comprising a second subarray and a second buffer, wherein the first and second buffers separate the first and second subarrays. The invention can be applied to a store-back buffer as well as a reload buffer. In the case of a store-back buffer, the i
