Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
2002-09-30
2003-09-16
Kim, Hong (Department: 2186)
Electrical computers and digital processing systems: memory
Storage accessing and control
Hierarchical memories
C711S128000, C711S210000, C711S216000, C711S108000, C711S137000, C711S208000, C711S208000, C711S208000
Reexamination Certificate
active
06622209
ABSTRACT:
FIELD OF THE INVENTION
The invention pertains to the storage of data in a cache, and more particularly to the efficient storage of prediction information such as branch prediction information in a cache. Branch prediction information is information which predicts whether an instruction of the form “branch to instruction X if condition Y is met” will require the branch to be taken or not taken.
If branch prediction is to be based on an algorithm other than “branches are always predicted taken” or “branches are always predicted not taken”, some means must exist for storing branch prediction information. Frequently, such a means comprises a table of 2-bit branch prediction history counts which are respectively incremented or decremented in response to branch instructions being taken or not taken.
BACKGROUND OF THE INVENTION
Many of today's microprocessors incorporate structures known as instruction pipelines. Instruction pipelines increase the efficiency of a processor by enabling a processor to simultaneously process a plurality of instructions. Instruction pipelines can be thought of as instruction assembly lines. Instruction
—
0 enters the first stage of the pipeline while Instruction
—
1 is simultaneously processed in the second stage of the pipeline, Instruction
—
2 is simultaneously processed in the third stage of the pipeline, and so on. Periodically, a new instruction is clocked into the instruction pipeline, and each instruction being processed in the pipeline is passed to the next stage of the pipeline, or is output from the pipeline.
To maximize instruction execution efficiency, it is desirable to keep instruction pipelines full as often as possible (with an instruction being processed in each stage of the pipeline) such that each periodic clocking of the instruction pipeline produces a useful output. However, whenever 1) there has been a transfer of program flow control from one section of program code to another, 2) instructions have been speculatively fetched and processed, and 3) it is determined that the speculatively fetched and processed instructions should not have been processed, an instruction pipeline will produce an output that is not useful. For each clock cycle that an instruction pipeline produces an output that is not useful, the instruction pipeline has a negative impact on a processor's efficiency.
Program flow control instructions such as branch instructions are one means by which program flow control can be transferred from one section of program code to another. Branch instructions can be conditional or unconditional. A conditional branch instruction determines program flow control based on the resolution of a specified condition. An unconditional branch instruction always results in a transfer of program flow control.
“Branch to instruction X if A>B” is one example of a conditional branch instruction. If A>B, program control flow will transfer to a section of program code beginning with instruction X (i.e., a target code section). If A≦B, program control flow will continue with a section of program code which sequentially follows the conditional branch instruction (i.e., a sequential code section).
Since instruction pipelines can be several stages deep, conditional branch instructions are often fetched before the conditions specified in the branch instructions can be resolved. A processor must therefore predict whether branches will be taken or not taken. After a prediction is made, instructions are speculatively fetched from either a target code section (if a branch is predicted to be taken) or a sequential code section (if a branch is predicted to be not taken).
Although many branch prediction algorithms exist, mispredictions still occur. By the time a misprediction is identified, it is possible for an instruction pipeline to be operating on many instructions which were fetched from an incorrect code section. On encountering such a misprediction, misfetched instructions which are being processed in one or more pipelines must be flushed from the pipelines, and instructions from the correct code section must be fetched and processed through the pipelines.
When flushing instructions from a pipeline, bubbles (or gaps) are injected into the pipeline. Unfortunately, pipeline flushing sometimes makes it necessary to clock a pipeline through several clock cycles before the instruction pipeline can once again produce a useful output. Since conditional branch instructions and other program flow control instructions are prevalent in program code (e.g., sometimes on the order of once every five instructions), the cumulative effect of branch misprediction can have a significant and detrimental impact on a processor's performance, even when branch prediction accuracy is relatively high.
A table of branch prediction history counts can be stored in any one of a number of cache types, including a direct mapped cache, a set-associative cache, and a non-tagged, n-way cache. However, most forms of these caches suffer from cache conflicts and/or aliasing.
Assuming that a cache is designed to store a table of branch prediction history counts, a cache conflict occurs when a count associated with address A and a count associated with address B need to be stored in the same cache entry. Cache conflicts may be reduced by, for example, increasing the size of a cache or implementing the cache as a set-associative cache (i.e., storing tags in a cache) or non-tagged, n-way cache.
Aliasing occurs when an attempt is made to read a count associated with address A from a cache, and instead, a count associated with address B is read from the cache. Aliasing can therefore lead to false hits being generated by a cache (i.e., a hit on data which is not the desired data).
Aliasing typically assumes one of three forms: compulsory aliasing, capacity aliasing or conflict aliasing. Compulsory aliasing occurs when an attempt to read a count associated with address A is made prior to an initialization of the count. One way to mitigate compulsory aliasing is to initialize the entries of a cache to one or more predetermined count values.
Capacity aliasing occurs when a working number of counts which need to be stored in a cache exceeds the number of cache entries which are available for storing the counts. One way to mitigate capacity aliasing is to increase the number of entries in a cache.
Conflict aliasing occurs as a result of a cache conflict. Conflict aliasing can be reduced by, for example, making a cache larger, implementing a cache as a set-associative cache, or implementing a cache as a non-tagged, n-way cache.
In a direct mapped cache of 2-bit counts, each count may be directly addressed by, for example, a number of bits of an instruction pointer. Unfortunately, a direct mapped cache is not a very effective means for storing branch prediction information, since cache conflicts and aliasing can only be reduced by increasing the size of the cache, and chip area for implementing a branch prediction cache is typically very limited.
One alternative to a direct mapped cache of 2-bit counts is a set-associative cache of 2-bit counts. In a set-associative cache, each count is stored with a tag, and a count is addressed by first reading a line of n counts from the cache (i.e., one count from each of the n ways, or a “set” of counts), and then comparing the tag of a requested count with the tags of the n counts which were just read. Cache conflicts can still occur, but the storage of tags helps to keep a cache conflict from leading to conflict aliasing. In fact, aliasing can be eliminated if complete tags are stored with the counts. Complete tags are tags which enable counts to be uniquely addressed. As the size of a tag is decreased, aliasing tends to increase. However, a smaller set-associative cache can achieve results which are comparable to those of a much larger direct mapped cache. One drawback to storing data such as 2-bit counts in a set-associative cache is that tags which are large enough to mitigate aliasing to an acceptable level will typically be much
Hewlett--Packard Development Company, L.P.
Kim Hong
LandOfFree
Use of non-count data and index hashing to reduce false hits... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Use of non-count data and index hashing to reduce false hits..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Use of non-count data and index hashing to reduce false hits... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3071463