Type: Reexamination Certificate (active)
Patent number: 06453389
Filed: 1999-06-25
Issued: 2002-09-17
Examiner: Hudspeth, David (Department: 2651)
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Classification code: C711S213000
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to computer system optimization and, more particularly, to methods and apparatus that use data compression principles to prefetch data into cache.
2. Description of the Related Art
Prefetching data into cache is one approach to preventing Central Processing Unit (CPU) stalls in modern computer systems. A cache is a relatively small, high-speed random access memory that holds copies of the main-memory instructions and data most likely to be accessed in the near future. When the processor issues an instruction, the required data is first sought in the cache. If the data is not in the cache, a request is made to the slower main memory, and the resulting delay slows the computer system down and causes the CPU to stall.
Prefetching involves a prediction process: given the current memory state, the prefetch hardware predicts a memory address that will be referenced in the future and loads the corresponding data into fast memory, e.g., the cache. By doing so, accesses to the slower main memory are minimized, preventing CPU stalls and improving the overall speed of the computer system.
Several prefetching approaches have been developed over the years. One class of prefetching approaches is termed fixed prefetching. Fixed prefetching involves fetching a fixed function of the sequence of past misses. Examples of fixed prefetching algorithms include stride predictors and stream buffers. Stride predictors prefetch data references characterized by regular strides or address offsets. Stream buffers are designed to prefetch sequential streams of cache lines independent of program content.
Stride predictors tend to be more accurate than stream buffers. Stream buffers, on the other hand, tend to capture more cache lines and use resources more efficiently. Fixed prefetching algorithms like stride predictors and stream buffers work reasonably well on structured workloads. However, system performance degrades when fixed prefetching is used in combination with dynamically changing workloads.
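By way of illustration, a minimal stride-predictor sketch appears below. It assumes a per-instruction (per-PC) table and a simple "two matching strides before prefetching" confidence rule; the class name, table layout, and threshold are illustrative assumptions rather than any particular patented or product design.

```cpp
// Minimal sketch of a per-PC stride predictor (one fixed-prefetching scheme).
#include <cstdint>
#include <unordered_map>

struct StrideEntry {
    uint64_t last_addr = 0;   // last address referenced by this instruction
    int64_t  stride = 0;      // last observed address offset
    int      confidence = 0;  // how many times the same stride repeated
};

class StridePredictor {
public:
    // Returns the predicted next address to prefetch, or 0 if no confident
    // prediction exists for this load instruction (identified by its PC).
    uint64_t access(uint64_t pc, uint64_t addr) {
        StrideEntry& e = table_[pc];
        int64_t stride = static_cast<int64_t>(addr) - static_cast<int64_t>(e.last_addr);
        if (stride == e.stride)
            ++e.confidence;
        else
            e.confidence = 0;
        e.stride = stride;
        e.last_addr = addr;
        return (e.confidence >= 2) ? addr + stride : 0;  // prefetch only once the stride is stable
    }

private:
    std::unordered_map<uint64_t, StrideEntry> table_;  // indexed by instruction PC
};
```

A stream buffer, by contrast, would simply fetch the next several sequential cache lines after a miss, without tracking per-instruction strides.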
Another class of prefetching approaches is termed adaptive prefetching. Adaptive prefetching algorithms learn the behavior of a sequence of memory misses and then predict and prefetch data accordingly. An example of adaptive prefetching is described in U.S. Pat. No. 5,485,609 to Vitter et al., issued Jan. 16, 1996, titled “Online Background Predictors and Prefetchers for Locality Management.” Vitter teaches an online prediction technique based on data compression principles for prefetching pages from memory. The Vitter prediction technique uses a Lempel-Ziv (“LZ”) data compression algorithm to make predictions based on past history and other information available at the time of the prediction.
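The following is a minimal sketch in the general spirit of such LZ-based predictors; it is not a reproduction of the Vitter technique. The names LZPredictor and observe, the trie layout, the page-number keys, and the tie-breaking rule are illustrative assumptions (node cleanup is omitted for brevity).

```cpp
// Generic LZ78-style next-reference prediction sketch: parse the reference
// stream into phrases with a trie and predict the page most often seen after
// the current context.
#include <cstdint>
#include <map>

struct TrieNode {
    std::map<uint64_t, TrieNode*> next;   // children keyed by page number
    std::map<uint64_t, uint64_t>  counts; // how often each page followed this context
};

class LZPredictor {
public:
    LZPredictor() : current_(&root_) {}

    // Feed the next referenced page; returns the page predicted to follow.
    uint64_t observe(uint64_t page) {
        current_->counts[page]++;              // remember what followed this context
        TrieNode*& child = current_->next[page];
        if (child == nullptr) {                // end of an LZ phrase: grow the trie,
            child = new TrieNode;              // then restart parsing at the root
            current_ = &root_;
        } else {
            current_ = child;                  // keep extending the current phrase
        }
        // Predict the page seen most often after the (new) current context.
        uint64_t best_page = 0, best_count = 0;
        for (const auto& kv : current_->counts)
            if (kv.second > best_count) { best_count = kv.second; best_page = kv.first; }
        return best_page;
    }

private:
    TrieNode root_;
    TrieNode* current_;
};
```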
One disadvantage of the Vitter invention is that it makes predictions over a set of past events, so that it can only predict events seen before: i.e., it cannot predict unknown or never-before-seen events. Moreover, predicting the next most likely event prevents CPU stalls only if the page is prefetched from memory before it is actually requested by the CPU. Thus, CPU time is only partially spared if the CPU's request for data occurs before the data is actually loaded into cache, even if the prediction is accurate.
There are a number of sources for prefetching information, including the address reference stream. Using the address reference stream has the disadvantage of requiring the prefetcher hardware to be located on the same chip as the CPU. Additionally, the prefetcher hardware needs to be highly efficient since the predictor analyzes a large number of references.
Accordingly, a need remains for a prefetch method and apparatus that is accurate, timely, and provides greater coverage. A need also remains for a prefetch method and apparatus that can predict events never before observed, that uses the available resources more efficiently, and that can prefetch data completely before the CPU needs it.
SUMMARY OF THE INVENTION
An embodiment of the present invention includes a prefetch method for prefetching data into cache. The prefetch method is adaptive, but it also includes a fixed stage that generates a rough estimate of the address to be prefetched, which is subsequently refined. The adaptive stage of the prefetch method includes building a dynamic data structure in tree form, the tree having a root, a plurality of internal nodes, a plurality of branches, and a plurality of leaves. The root is also known as the root node, and leaves are also known as external nodes. Within this application, the term “node” generally refers to the root node, internal nodes, and leaves. The branches emanate from the internal nodes and the root; the leaves have no outgoing branches. The root has no incoming branch; each internal node and leaf has exactly one incoming branch. The branches represent address differences between cache misses and rough predictions; each internal node and leaf stores a corresponding weight of its incoming branch. The method traverses the tree so that the current node corresponds to index j. The method receives an address difference at the current node and updates branches from the current node up toward the root of the tree. The method maintains the data structure in a dynamic fashion. New nodes and branches are created and deleted, and weights are updated at various stages during data processing. The method traverses the tree in the direction of the address difference until the method reaches a new current node. The method uses the weights of the branches at the new current node and prefetches data accordingly.
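A minimal sketch of such a tree over address differences is given below; it is illustrative only, not the claimed implementation. The names DeltaTree, observe, and predict, the count-based weight update, and the "heaviest outgoing branch" prediction rule are assumptions made for the example, and the fixed rough-prediction stage and node deletion are omitted.

```cpp
// Sketch of the adaptive stage: a tree whose branches are keyed by address
// differences (deltas) and whose nodes store the weight of their incoming branch.
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct Node {
    double weight = 0.0;                  // weight of this node's incoming branch
    Node* parent = nullptr;               // node the incoming branch emanates from
    std::map<int64_t, Node*> children;    // outgoing branches, keyed by delta
};

class DeltaTree {
public:
    DeltaTree() {
        nodes_.emplace_back(new Node);
        root_ = current_ = nodes_.back().get();
    }

    // Record the delta observed at the current node: grow the tree if the delta
    // has never been seen here, bump weights on the path back toward the root,
    // then follow the branch so it becomes the new current node.
    void observe(int64_t delta) {
        Node*& child = current_->children[delta];
        if (child == nullptr) {                       // never-before-seen delta
            nodes_.emplace_back(new Node);
            child = nodes_.back().get();
            child->parent = current_;
        }
        for (Node* n = child; n != nullptr; n = n->parent)
            n->weight += 1.0;                         // simple count-style update
        current_ = child;
    }

    // Return the delta of the heaviest outgoing branch at the current node,
    // or 0 (i.e., keep the rough prediction) if the node has no branches yet.
    int64_t predict() const {
        int64_t best_delta = 0;
        double best_weight = -1.0;
        for (const auto& kv : current_->children) {
            if (kv.second->weight > best_weight) {
                best_weight = kv.second->weight;
                best_delta = kv.first;
            }
        }
        return best_delta;
    }

private:
    std::vector<std::unique_ptr<Node>> nodes_;  // owns every node in the tree
    Node* root_;
    Node* current_;
};
```

In use, each cache miss would be converted to an address difference relative to the rough prediction and passed to observe(), and the delta returned by predict() would be added back to the next rough prediction to form the prefetch address.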
Another aspect of the present invention is its capacity to predict events that have not been seen before.
Another aspect of the present invention is its capacity to prefetch cache lines far enough in advance to prevent CPU stalls, while simultaneously minimizing the cache lines' residency in the prefetch cache. Although Vitter teaches cache line prediction, Vitter does not teach a prediction technique that loads cache lines sufficiently early to prevent CPU stalls entirely. For example, Vitter always tends to predict the very next cache line needed. In contrast, the present invention predicts cache lines several steps in advance if so dictated by the memory latency.
Another embodiment of the present invention includes a method for optimizing computer system performance by taking actions that sequentially minimize a total loss function. Here, a “loss” is a quantity whose minimization results in a system performance benefit. For example, some measurement of expected stall time can be taken as a “loss.” The total loss L is an accumulation of the losses produced by individual outcomes in accordance with
L = \sum_{j=0}^{n} L(x_j), \qquad L(x_j) = \sum_{i=0}^{sl(j)-1} C_i(b_{j-i}, x_j)
where b_{j-i} is the (j-i)th action taken, x_j is the jth outcome received, and sl(j) (which may depend on j) is the number of past actions contributing to the loss caused by x_j, with sl(j) ≤ j and each action b_{j-i} contributing C_i(b_{j-i}, x_j). For every j, 1 ≤ j ≤ n, action b_j may depend on x_0, x_1, …, x_{j-1}, but is taken before seeing x_j. The possible actions are assumed to be in one-to-one correspondence with the possible outcomes, and for each outcome x there is one (and only one) action b for which C_i(b, x) = 0. The method constructs a data structure in tree form, the tree having a root, a plurality of internal nodes, a plurality of branches, and a plurality of leaves. The branches emanate from the internal nodes and the root; the leaves have no outgoing branches. The root has no incoming branch; each internal node and leaf has exactly one incoming branch. The branches represent differences between corresponding outcomes x_j and rough predictions y_j, where y_j depends on x_0, x_1, …, x_{j-1} through a fixed, predetermined function; each internal node and leaf stores a corresponding weight of its incoming branch.
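As a small worked illustration of the accumulated loss defined above, the following sketch simply evaluates L for a given sequence of actions and outcomes. The integer types for actions and outcomes and the callable parameters standing in for C_i and sl(j) are placeholder assumptions; in the prefetching setting the per-term cost would typically reflect expected stall time.

```cpp
// Evaluate L = sum_j sum_{i=0}^{sl(j)-1} C_i(b_{j-i}, x_j) for given sequences.
#include <cstddef>
#include <functional>
#include <vector>

double total_loss(
    const std::vector<int>& b,                               // actions b_0 .. b_n
    const std::vector<int>& x,                               // outcomes x_0 .. x_n
    const std::function<double(std::size_t, int, int)>& C,   // C(i, b_{j-i}, x_j)
    const std::function<std::size_t(std::size_t)>& sl)       // sl(j), with sl(j) <= j
{
    double L = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        double Lxj = 0.0;                 // L(x_j): loss caused by outcome x_j
        for (std::size_t i = 0; i < sl(j); ++i)
            Lxj += C(i, b[j - i], x[j]);  // contribution of past action b_{j-i}
        L += Lxj;                         // sl(j) <= j keeps the index in range
    }
    return L;
}
```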
Ferrandiz Josep M.
Gupta Rajiv
Merhav Neri
Rokicki Tomas G.
Seroussi Gadiel
Hudspeth David
Tzeng Fred F.