Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Type: Reexamination Certificate (active)
Filed: 1999-04-15
Issued: 2001-08-28
Examiner: Nguyen, Than (Department: 2187)
U.S. Classes: C711S113000, C711S136000, C711S137000, C711S138000, C712S233000, C712S234000, C712S237000, C712S239000
Patent Number: 06282614
FIELD OF THE INVENTION
This invention relates to microprocessors which have two or more levels of memory caches interposed in the memory hierarchy between the main memory and the CPU.
BACKGROUND OF THE INVENTION
A memory cache or “cache” is a mechanism in the memory hierarchy between the main memory and the CPU that improves effective memory transfer rates and raises processing speed. The term “cache” refers to the fact that the cache mechanism is not apparent to the user, who observes only an apparently higher-speed main memory.
A cache has a smaller memory storage capacity than the main memory but has a much higher access speed. Caches are generally implemented by semiconductor devices, the speeds of which are comparable to that of the processor. By contrast, the main memory generally uses a less costly, lower speed technology, but has a much higher overall storage capacity.
The cache mechanism anticipates the likely re-use by the CPU of information, whether data or code, in the main memory by organizing a copy of the data or code in cache memory. When information is accessed from the main memory, it is common for associated information also to be accessed and stored in the cache. For example, if a required code is part of a sequence of instructions, the subsequent instructions should be received with the first instruction, so that access to the main memory can be minimized.
In modern microprocessors, it is common for one or more levels of memory caches to be included on a particular chip, and for an additional level of memory cache to be off-chip. Currently, some chips contain one or two on-chip caches, and it is anticipated that future products may contain more than two on-chip and off-chip caches.
In current microprocessors which access more than one level of cache, the higher-level caches generally have a larger storage capacity than the lower-level caches. For example, in the case of a two-level cache hierarchy, the second or L2 cache generally has a larger storage capacity than the first or L1 cache. The L2 cache has a significantly greater speed than the main memory, but it also has a significantly smaller storage capacity. The information on the L1 cache is usually a subset of the information on the L2 cache. The L2 cache only needs to be accessed if the desired code or data is not resident in the L1 cache.
For chips with multiple levels of caches, there is a tradeoff between the maximum processing speed and the minimum power usage of the caches. Processing speed can be maximized by simultaneously addressing more than one cache, for example, by simultaneously addressing the L1 cache and the L2 cache. However, simultaneously addressing the L1 and L2 caches uses power unnecessarily if the desired data or code is resident in the L1 cache.
Alternatively, power consumption can be reduced by accessing the higher-level cache or caches only when necessary. In the example of a chip with a 2-level cache hierarchy, power consumption can be reduced by accessing the L2 cache only when there is a “miss,” that is, when the L1 cache is addressed but the desired data or code is not currently in the L1 cache (as opposed to a “hit,” where the desired data or code is currently resident). However, this method results in a larger effective access time for the L2 cache and a consequent reduction in processing speed.
A prediction of L1 cache data read misses has been implemented in the Compaq Alpha 21264. See R. E. Kessler, E. J. McLellan, and D. A. Webb, “The Alpha 21264 Microprocessor Architecture,” International Conference on Computer Design (ICCD '98), pp. 90-95 (October 1998) (“Kessler”), which is incorporated herein by reference.
However, the predictor described in Kessler does not disable the L2 cache for power savings. Instead, the Kessler predictor is used to reduce the penalty in those cases where a read access results in a “miss” of the L1 cache and the consumer of the read instruction was dispatched for execution before knowing if the data access resulted in a hit, in order to keep the L1 cache latency low. Therefore, if there is a miss, the consumer instruction and all the subsequent instructions have to be re-fetched with a high penalty in cycles.
To mitigate this problem, the Alpha 21264 has a data read miss predictor consisting of a saturating 4-bit counter that tracks the hit/miss behavior of recent data reads. The counter decrements by two on cycles in which there is a read miss and increments by one when there is a hit. The most-significant bit of the counter is used to make the prediction.
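For illustration only, that counter can be modeled as follows. This is a minimal sketch in C, not code from Kessler or the Alpha 21264; the names update and predict_hit are hypothetical, and only the counter width, the update amounts, and the most-significant-bit rule are taken from the description above.

    #include <stdbool.h>

    /* 4-bit saturating counter in the range [0, 15]; starting at 15
       ("predict hit") is an arbitrary choice made for this sketch. */
    static unsigned counter = 15;

    /* Record the observed outcome of a data read. */
    void update(bool hit)
    {
        if (hit)
            counter = (counter < 15) ? counter + 1 : 15;  /* +1 on a hit  */
        else
            counter = (counter >= 2) ? counter - 2 : 0;   /* -2 on a miss */
    }

    /* The most-significant bit of the counter (counter >= 8) gives
       the prediction. */
    bool predict_hit(void)
    {
        return (counter & 0x8u) != 0;
    }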
It is an object of the present invention to provide a method for reducing power consumption in a multiple-cache microprocessor without creating an unacceptable reduction in processing speed.
SUMMARY OF THE INVENTION
The present invention seeks to overcome the disadvantages of optimizing either the processing speed or power consumption of caches independently. In the present invention, a dynamic predictor is used to provide a lower power consumption than the optimally fast condition, yet a faster processing speed than the condition of lowest power consumption. The misses of the lower-level cache are predicted based on the fact that misses occur in bursts which correspond to changes in the working set.
In the apparatus and method proposed, a window of size S is defined. After a miss occurs, a predicting device predicts that the next miss will occur in the next S accesses to the lower-level cache. If no miss occurs in these S accesses, a hit is predicted until the next miss.
A method of predicting access hits and misses of a lower-level cache and determining when to access one or more higher-level caches in a cache hierarchy includes the steps of: defining a window of size S for information which is sought on a cache, wherein S is measured in a number of accesses; detecting whether a hit or a miss occurs after each access of the cache; predicting, after each miss is detected, that the next miss will occur in a subsequent S accesses to the cache; and predicting, if no miss is detected within the subsequent S accesses, that the next access will be a hit.
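As an illustration of this method, the window mechanism can be sketched as follows. This is a minimal sketch in C under the description above; the type and function names (miss_predictor, mp_init, mp_predict_hit, mp_update) are hypothetical and do not come from the specification.

    #include <stdbool.h>

    /* Hypothetical predictor state: the window size S and the number
       of accesses seen since the last observed miss. */
    typedef struct {
        unsigned window;  /* window size S, measured in accesses    */
        unsigned count;   /* accesses since the last miss (<= S)    */
    } miss_predictor;

    void mp_init(miss_predictor *p, unsigned s)
    {
        p->window = s;
        p->count = s;     /* start outside any window: predict hits */
    }

    /* Predict the next access: a miss while still inside the window
       of S accesses that follows the last observed miss, a hit
       otherwise. */
    bool mp_predict_hit(const miss_predictor *p)
    {
        return p->count >= p->window;
    }

    /* Record the actual outcome of an access. */
    void mp_update(miss_predictor *p, bool hit)
    {
        if (hit) {
            if (p->count < p->window)
                p->count++;   /* advance through the window */
        } else {
            p->count = 0;     /* every miss re-opens the window */
        }
    }

Under the scheme described above, a predicted miss would be the cue to address the higher-level cache in parallel with the lower-level cache, while a predicted hit would leave the higher-level cache unaddressed to save power.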
This method is used for code accesses and for data reads, since data writes are handled by means of a write buffer. Therefore, the predictor method can have two parameters, Sc and Sd, that correspond to the window sizes for code accesses and data reads, respectively. These two windows can be independent of each other: the distances between L1 cache misses may be kept separately for code accesses and data reads.
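Continuing the sketch above (and reusing its hypothetical miss_predictor type), the two windows could simply be two independent predictor instances; the window sizes 16 and 8 below are arbitrary example values, not values from the specification.

    /* One predictor instance per stream; the two windows are
       independent of each other. */
    miss_predictor code_pred;  /* window size Sc, for code accesses */
    miss_predictor data_pred;  /* window size Sd, for data reads    */

    void predictors_init(void)
    {
        mp_init(&code_pred, 16);  /* Sc: example value only */
        mp_init(&data_pred, 8);   /* Sd: example value only */
    }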
An apparatus which implements the foregoing process includes: one or more detection circuits configured to receive a signal indicating whether a hit or a miss occurs after each access of a cache and in accordance therewith output a detection signal; one or more counters coupled to the one or more detection circuits, wherein the one or more counters are configured to receive the detection signal and in accordance therewith output a counting signal, wherein the one or more counters are set to a level of size S measured in a number of accesses to the cache and wherein the counters are reset each time a miss is detected and incremented each time a hit is detected; one or more comparison circuits coupled to the one or more counters, wherein the one or more comparison circuits are configured to receive the counting signal, to determine whether the number of accesses of the cache is above, below or equal to S based on the counting signal and in accordance therewith output a comparison signal; and one or more prediction circuits coupled to the one or more comparison circuits, wherein the one or more prediction circuits are configured to receive the comparison signal, to predict a hit or miss based on the comparison signal and in accordance therewith output a prediction signal.
The features and advantages of the present invention will be more clearly understood from the following detailed description, in connection with the accompanying drawings.
Assignee: National Semiconductor Corporation
Attorney, Agent, or Firm: Stallman & Pollock LLP