Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
1999-11-10
2002-08-20
Yoo, Do Hyun (Department: 2187)
C711S160000, C365S230030
active
06438655
BACKGROUND OF THE INVENTION
This invention relates to a method of cache locking on a bank-by-bank basis in a cache implementing a “least recently used” cache replacement scheme, and to a cache implementing such a method.
A popular type of cache is a set-associative cache, shown in FIG. 1. Cache 100 includes several cache banks 102, and each bank 102 has a TagRAM section 106, a StateRAM section 108, and a DataRAM section 104. Each cache bank has a number of addressable memory locations (typically, a power of 2), which are divided into a number of blocks. Illustratively in FIG. 1, the cache banks contain 16 memory locations each, each bank has 8 blocks, and each block corresponds to 2 of the cache bank's memory locations. TagRAM 106 locations (marked “T”) store some of the more significant bits of the main memory address that corresponds to the information in the block's DataRAM 104 (marked “D”), and the StateRAM 108 (marked “S”) indicates whether the information in the associated cache block is valid. Illustratively, if a given cache bank address contains 128 bits, or four 32-bit words, and 2 cache addresses form a block, then each block contains eight 32-bit words. Storing 8 words in a block accounts for the three least significant bits of the main memory addresses. If the employed cache banks have 16 addresses and 2 cache addresses form a block, then the main memory can be divided into 8 sets of addresses, with addresses from each set being cached in designated blocks of the cache. This accounts for three more bits of the main memory addresses.
FIG. 1 includes three cache banks, which make cache 100 a 3-way set-associative cache. A CPU 300 that wishes to obtain information (instructions or data) provides an address to control module 200, and control module 200, in turn, provides the address to cache 100. In response to the received address, the cache determines whether information corresponding to the provided address is found in the cache (i.e., a match is found in the TagRAM). If so, the cache raises the “hit” signal on line 111, places the identity of the bank where the hit occurred on line 112, and provides the retrieved information to control module 200. Control module 200 then provides the information to CPU 300.
Lines 111 and 112 are applied to replacement unit (RU) 150, which updates its information regarding the latest “hit” and provides a signal on line 113 to control module 200 to assist control module 200 in determining the bank into which new information should be written, if necessary.
When cache 100 determines that there is a “miss,” that information is communicated to control module 200 via line 111, which causes control module 200 to issue a Stall command to CPU 300 and to issue a request for information to memory 500. This request is processed via memory interface circuitry 400. Information is responsively fetched from main memory 500, is provided to control module 200 (via unit 400), and control module 200 writes the information into cache 100. Concurrently, control module 200 can provide the fetched information to CPU 300. The writing to cache 100 is executed by control module 200, as indicated above, based on information provided to control module 200 by replacement unit 150 on line 113. When the information is stored in the DataRAM of cache 100, the TagRAM and StateRAM are written to indicate that the corresponding address is now in the cache and that the information is valid.
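The miss-and-fill flow just described can be modeled as a short sketch. The patent describes hardware; the classes, dictionaries, and the round-robin stand-in replacement unit here are invented for illustration (round robin is one of the policies the text mentions).

```python
# Illustrative model of the read path: on a hit, return the data and
# report the access to the replacement unit; on a miss, fetch the block
# from main memory, fill the bank the replacement unit selects, and
# update TagRAM and StateRAM so the block is found next time.

def make_bank(num_blocks):
    # TagRAM, StateRAM (valid bits), and DataRAM entries for one bank
    return {'tag': [None] * num_blocks,
            'valid': [False] * num_blocks,
            'data': [None] * num_blocks}

class RoundRobinRU:
    """Stand-in replacement unit using the round-robin policy."""
    def __init__(self, num_banks, num_sets):
        self.nxt = [0] * num_sets
        self.num_banks = num_banks
    def touch(self, index, bank):
        pass                              # round robin ignores hit history
    def victim(self, index):
        v = self.nxt[index]
        self.nxt[index] = (v + 1) % self.num_banks
        return v

def read(banks, ru, memory, tag, index):
    for b, bank in enumerate(banks):
        if bank['valid'][index] and bank['tag'][index] == tag:
            ru.touch(index, b)            # hit: reported as on lines 111/112
            return bank['data'][index]
    # miss: (conceptually) stall the CPU and fetch from main memory
    block = memory[(tag, index)]
    v = ru.victim(index)                  # replacement bank, as via line 113
    banks[v]['data'][index] = block       # write DataRAM
    banks[v]['tag'][index] = tag          # write TagRAM
    banks[v]['valid'][index] = True       # write StateRAM: now valid
    ru.touch(index, v)
    return block
```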
During a lookup of a set-associative partitioned cache, each cache bank 102 selects one entry based on the sought address. The TagRAM for that entry is compared with the appropriate bits of the sought address, and if any of the comparisons match, a “hit” is declared, provided that StateRAM 108 for that entry indicates that the entry is valid.
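The per-bank comparison reduces to a very small amount of logic: a tag match qualified by the valid bit, combined into a hit flag and a bank identity. A minimal sketch, with an invented data layout:

```python
# Parallel lookup across banks: each bank's stored tag for the indexed
# entry is compared with the sought tag; a match counts only if the
# valid (StateRAM) bit is set. Returns (hit, bank_id), corresponding to
# the "hit" signal and the bank-identity signal described in the text.

def lookup(banks, tag, index):
    """banks: list of dicts with per-entry 'tag' and 'valid' lists."""
    for bank_id, bank in enumerate(banks):
        if bank['valid'][index] and bank['tag'][index] == tag:
            return True, bank_id
    return False, None
```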
Several schemes are known for determining the replacement bank. One is the least recently used (LRU) policy; another is the round-robin replacement policy. Replacement unit (RU) 150 contains the hardware that determines the replacement bank, i.e., the bank into which information is written, and it uses the history of cache accesses to compute the replacement bank.
The least recently used (LRU) replacement policy uses the notion that the cache entry that has not been used for the longest time is the least likely to be used again. Consequently, it is a good choice to be replaced and is so designated. Since the access patterns differ on different lines, there is a separate LRU bank computed for each cache line. In order to keep track of the LRU index for each line, a history of accesses must be maintained.
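One common way to keep the per-line history the LRU policy needs is an ordered list of bank indices for each cache line, with the most recently used bank at the tail and the LRU bank at the head. This is an illustrative software model (names invented), not the patent's hardware:

```python
# Per-line LRU bookkeeping: each line keeps its banks ordered from
# least recently used (front) to most recently used (back).

class LRUState:
    def __init__(self, num_banks, num_lines):
        # initial order is arbitrary; bank 0 starts as LRU on every line
        self.order = [list(range(num_banks)) for _ in range(num_lines)]

    def touch(self, line, bank):
        """Record an access: move 'bank' to the most-recent position."""
        o = self.order[line]
        o.remove(bank)
        o.append(bank)

    def victim(self, line):
        """The least recently used bank for this line."""
        return self.order[line][0]

lru = LRUState(num_banks=3, num_lines=8)
lru.touch(0, 2); lru.touch(0, 0); lru.touch(0, 1)
print(lru.victim(0))  # prints 2: bank 2 is now least recently used
```

Note that the history is kept separately per line, as the text requires, since access patterns differ from line to line.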
In order to improve the performance of programs, a technique called “cache locking” has been proposed. Frequently used instructions or time-critical program segments are locked in the cache. That avoids the penalty of a cache-miss when the program references those instructions. Conversely, by locking some parts of the cache, the cache size available to non-critical program segments is reduced, which can slow them. That tradeoff must be studied for particular intended uses of the cache in order to determine whether locking is advantageous.
Several techniques have been proposed to accomplish cache locking. U.S. Pat. No. 5,353,425 describes an approach that implements a pseudo-LRU by using a set of bits, called the MRU (Most Recently Used) bits, for each direct-mapped address. When an entry is loaded into the cache, the corresponding MRU bit is set. The replacement unit finds a bank that does not have the MRU bit set and uses it as the replacement bank. If all the bits are set, then it is a deadlock condition, since there is no place for the incoming cache line to be stored. The replacement unit detects this and resets all the MRU bits except the last one to be loaded. This erases the history of accesses, and for this reason it is called a pseudo-LRU scheme. However, by providing an additional bit (called a LOCK bit) for every MRU bit, that technique has the ability to implement cache locking. The LOCK bits are combined with the MRU bits to ensure that a locked line is never selected for replacement. When the cache line is no longer going to be frequently accessed, the corresponding LOCK bit can be reset, which allows the cache line to participate in replacement.
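The MRU/LOCK-bit scheme summarized above can be sketched as follows. This is a hedged model of the summary, not of the actual patent circuitry; the function name and return convention are invented, and the all-banks-locked case is left out as pathological.

```python
# Pseudo-LRU with lock bits: a replacement bank is any bank whose
# effective (MRU OR LOCK) bit is clear. If every effective bit is set,
# the MRU history is erased: all MRU bits are reset except the bank
# loaded last. Locked banks can never be chosen because their LOCK bit
# keeps their effective bit set.

def choose_victim(mru, lock, last_loaded):
    """mru, lock: lists of 0/1 per bank. Returns (victim_bank, mru),
    where mru reflects any deadlock-triggered reset."""
    effective = [m | l for m, l in zip(mru, lock)]
    if all(effective):
        # deadlock: clear the MRU history except the last bank loaded
        mru = [1 if b == last_loaded else 0 for b in range(len(mru))]
        effective = [m | l for m, l in zip(mru, lock)]
    return effective.index(0), mru
```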
U.S. Pat. No. 5,493,667 describes an arrangement with a true LRU replacement policy. The replacement computation uses a store of bank identifiers for each cache line. That memory uses the history of accesses and orders the banks according to the time since their last use. The technique works as follows. The cache bank that is the most recently used (MRU) is inserted into the right-most column of the memory. All the entries are shifted to the left, stopping at the entry that matches the current MRU index. This shifting operation results in the left-most entry being the least recently used (LRU) bank. When the cache misses, that LRU entry points to the bank that will receive the incoming instructions. When those instructions are loaded into the cache, this LRU index becomes the MRU and is inserted into the right-most column, resulting in a shift of the entries and a new LRU index being computed for the cache line. To perform cache locking, a lock signal is asserted. While that signal is asserted, the MRU index is inserted not into the right-most column but into the column to its left. That causes the cache bank indexed by the right-most column to remain resident in the cache. One may view the locking as treating the right-most column as fixed and the remaining N-1 columns as an N-1 way set-associative cache as long as the lock signal is asserted.
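The shift-column arrangement can be sketched as a per-line list of bank ids, LRU at the left. This models the summary above under one stated assumption: if the accessed bank is already in the locked right-most column, it simply stays there (the summary does not spell this case out).

```python
# Per-line shift columns: bank ids ordered left (LRU) to right (MRU).
# Normal access: the used bank shifts into the right-most column.
# Lock asserted: it is inserted one column to the left instead, so the
# bank in the right-most column stays resident in the cache.

def access(order, bank, lock):
    """order: list of bank ids for one line, LRU at index 0. Mutated."""
    if lock and order[-1] == bank:
        return                           # locked bank stays put (assumed)
    order.remove(bank)
    if lock:
        order.insert(len(order) - 1, bank)   # column left of right-most
    else:
        order.append(bank)                   # right-most column = MRU

def lru_bank(order):
    return order[0]                      # replacement bank on a miss
```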
To use that technique, it is envisioned that the lock signal is not asserted during the first pass of the time-critical program segment. That causes the critical code to be loaded into the cache, with the right-most column of the LRU memory storing the location of that code fragment. During subsequent iterations of the cri
Nicol Christopher John
Singh Kanwar Jit
Lucent Technologies, Inc.
Namazi Mehdi
Yacura Gary D.