Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
2000-05-22
2003-07-01
Kim, Matthew (Department: 2186)
Electrical computers and digital processing systems: memory
Storage accessing and control
Hierarchical memories
C711S144000, C711S128000
Reexamination Certificate
active
06587923
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to memory addressing schemes in computer systems, and specifically, to a cache directory structure and cache memory access method for reducing memory access latency in systems employing multiple cache memory levels.
BACKGROUND OF THE INVENTION
The design and use of caches in computer systems is a well-known and widely studied area in the field of memory hierarchy design (for example, see the text books
High Performance Computer Architecture,
by Harold Stone, third edition, Addison-Wesley, 1993, chapter 2 (herein referred to as “Stone”), and
Computer Architecture A Quantitative Approach,
by John Hennessy and David Patterson, second edition, Morgan Kaufman, 1996, chapter 5). Presently, typical computer systems have two levels of caching, where the caches are referred to as the L1 and L2 caches, above main memory. The units of transfer between the L1 and L2 caches, and between the L2 cache and main memory, are referred to as cache lines. Today, typical computer systems have a fixed line size, usually with the same line size for the L1 and L2 caches.
As main memory sizes increase, a third level (L3) cache may become necessary for efficient operation. The L3 cache may have a larger line size than that of the L1 and L2 caches. Furthermore, in computer system designs that utilize main memory compression, logically fixed size units in main memory are automatically stored in compressed form and decompressed on cache misses. In such designs, the logically fixed size units of main memory occupy varying amounts of space (due to compression). A number of techniques for efficiently storing and accessing variable size units of compressed data in main memory are now known, for example, and described in: commonly-owned U.S. Pat. No. 5,761,536 entitled SYSTEM AND METHOD FOR REDUCING MEMORY FRAGMENTATION BY ASSIGNING REMAINDERS TO SHARE MEMORY BLOCKS ON A BEST FIT BASIS; commonly-owned U.S. Pat. No. 5,864,859, entitled SYSTEM AND METHOD OF COMPRESSION AND DECOMPRESSION USING STORE ADDRESSING; and, a reference authored by P. Franaszek and J. Robinson entitled DESIGN AND ANALYSIS OF INTERNAL ORGANIZATIONS FOR COMPRESSED RANDOM ACCESS MEMORIES, IBM Research Report RC 21146, IBM Watson Research Center, Oct. 20, 1998.
In the case where main memory is compressed, it is natural to identify the unit of compression as a line size; for purposes of description, in a computer system with three levels of caching (L1, L2, L3), the main memory will be referred to as L4, and the unit of compression will be referred to as the L4 line size. As an example, the L1 and L2 caches could have line sizes of 64 bytes, the L3 cache could have a line size of 256 bytes, and L4 (main memory) could have a line size (that is, size of the unit of compression) of 1024 bytes.
FIG. 1
illustrates a cache/memory hierarchy with CPU
110
, main memory L
n
160
, and multiple cache levels L
i
120
, L
i−1
130
, L
i
140
, L
n−1
150
.
FIG. 2
illustrates a conventional cache directory
210
for a 4-way set associative cache that implements a 4-way content addressable memory
220
to determine whether a line corresponding to a given memory address is currently stored in the cache, and if so, its location and status. To determine whether L
i−1
includes a line of size s (s=256 bytes, for example) corresponding to a CPU generated address “A”, e.g., a 32-bit or 64-bit address, including tag bits
200
(“tag”) and offset bits
201
(“offset”), the set
215
corresponding to the offset
201
(in this example j) is found in the L
i−1
cache directory
210
, and the tag
200
is compared to all tags
216
,
217
,
218
,
219
in set
215
of the L
i−1
cache directory
210
using a 4-way content addressable memory
220
(i.e., comparators). As a result of the comparison, there will be generated either 0 or 1 matches, and if a match exists, the index of the match is used to find the corresponding line of size s in cache L
i−1
. Once the line is found, it is gated to a cache output-data buffer for use by the system/processor.
FIG. 2
corresponds to the cache directory portion as illustrated in FIG. 2-7 on page 38 of Stone.
Referring back to
FIG. 1
, due to possible differences in line sizes between L2 and L3 caches, and between L3 and L4 (main memory), certain problems in efficient operation may occur. In an example scenario, consideration is given to the interface between L3 cache and L4 in the case that main memory (L4) is compressed, and with line sizes as in the previous example. Additionally, it is assumed that caches are of the store-in type, that is, when a line in L3 cache is replaced, if the line is marked as modified, it is necessary to first write the line to L4 (referred to as a writeback). Since the L4 lines are compressed and of size 1024 bytes (when uncompressed), and given that the L3 cache lines are of size 256 bytes, this may involve decompressing the L4 line, performing the writeback (i.e., writing the appropriate 256 byte section of the 1024 byte uncompressed L4 line using the modified data in the 256 byte L3 line), and then re-compressing the L4 line. If a sequence of L3 writebacks from different modified L3 lines occur to the same L4 line, this would involve repeated decompressions and re-compressions for the same L4 line, which may result in excessive overhead. An approach for solving this problem is as follows: at the time the L3 line is written back, find all other modified L3 lines residing in the same L4 line, and write back all such L3 lines; that is, “batching” L3 writebacks. However, current techniques for cache directory design may result in this being an expensive operation. As described in the previously cited text book by Stone for example, a cache directory may be organized using a number of associativity sets, and a content addressable memory may be used to determine whether a line with a given address is in the cache in the associated set (see Stone, Section 2.2.2, pages 36-44). Using the usual mapping of addresses to sets, the four consecutive 256 byte L3 lines residing in a 1024 byte L4 line will be in four consecutive sets. In order to check whether each line is in the L3, and if so, whether it is modified, the L3 cache directory must be accessed four times. This problem is made worse for increasing differences between L3 and L4 line sizes. For example, with a 64 byte L3 line size, the L3 cache directory would have to be accessed 16 times in order to find all modified L3 lines residing in a given 1024 byte L4 line (because 16×64=1024).
Similar problems occur in certain situations even when there is no memory compression. Consider the previous example of an L3 line size of 256 bytes and an L2 line size of 64 bytes. It has been found advantageous, in memory hierarchy design, due to considerations having to do with cache coherency in multiprocessor designs and correct functioning of the I/O system in both uniprocessor and multiprocessor designs, to incorporate a principle known as inclusion. This is defined in the previously referenced text book by Hennessy and Patterson as follows (page 723):
every level of cache hierarchy is a subset of the level further away from the processor.
In the context of the current example, inclusion requires that when a 256 byte line is replaced in the L3, if any of the four 64 byte lines residing in the 256 byte line are currently present in the L2 (or in a multiprocessor system with multiple L2 caches, each for a different processor, any such L2), these lines must be marked as invalid (with a writeback required before invalidation if the line is marked as modified). As in the previous example, using current cache design techniques, this requires four L2 cache directory accesses (or in a multiprocessor system, four L2 cache directory accesses for every L2 cache). As before, this problem is made worse with increasing differences between L2 and L3 line sizes.
It would thus be highly desirable, in a computer system with multiple levels of caches L
1
, L
2
Benveniste Caroline D.
Franaszek Peter A.
Robinson John T.
Bataille Pierre-Michael
International Business Machines - Corporation
Jennings Derek S.
Kim Matthew
Scully Scott Murphy & Presser
LandOfFree
Dual line size cache directory does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Dual line size cache directory, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Dual line size cache directory will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3055443