Method and apparatus for caching victimized branch predictions

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C711S128000, C712S237000, C712S239000

Reexamination Certificate

active

06427192

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to storing branch predictions generated within a microprocessor.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance through the use of pipelining, parallel execution, and high clock rates. Pipelining is an implementation technique whereby multiple instructions are overlapped during the execution process. Parallel execution refers to the simultaneously executing multiple instructions in a clock cycle. As used herein, the term “clock cycle” refers to an interval of time during which the pipeline stages of a microprocessor perform their intended functions. At the end of a clock cycle, the resulting values are moved to the next pipeline stage.
Pipelining has several hazards associated with it. One particular hazard is stalling the pipeline due to branch instructions. When a branch instruction propagates through the pipeline, it is difficult to determine which instructions after the branch should be processed until the results of the branch instruction are know. For example, if the branch instruction is “taken”, then the next instruction to be executed after the branch may be located at a particular address that is offset from the branch instruction's address. In contrast, if the branch instruction is “not taken”, then the next instruction to be executed may be located at the address immediately following the branch instruction. As a result, the initial stages of the pipeline may be unable to determine which instructions should begin execution in the pipeline following the branch instruction. Thus, the pipeline may stall awaiting the results of the branch instruction.
In order to prevent the instruction pipeline from stalling, microprocessor designers may implement branch prediction schemes to provide the initial pipeline stages with a predicted result for each branch instruction. The initial stages of the pipeline speculatively execute instructions along the predicted path until the branch instruction executes and one of the following occurs: (1) the prediction is found to correct, in which case the instructions continue to execute and are no longer speculative, or (2) the prediction is found to be incorrect, in which case all pipeline stages executing instructions after the branch are flushed and the pipeline starts anew using the correct path.
Many branch predictions schemes involve storing a prediction bit indicating whether the branch instruction is taken or not taken, and a predicted target address for when the branch instruction is taken. If the prediction is determined to be incorrect upon execution of the branch instruction, then the prediction bit is updated to reflect the actual results of the branch instruction. Some microprocessors use more complex schemes for branch prediction rather a simple taken
ot taken prediction. For example, a two-bit prediction scheme may be used to increase prediction accuracy when branch instructions are either taken a high percentage of the time or not taken a high percentage of the time (e.g., in a loop). In two-bit prediction schemes, a prediction must miss twice before it is changed.
While the particular algorithms for each type of branch prediction scheme may vary, all tend to store some form of historical information that is developed as each branch instruction is executed. In some configurations, separate branch prediction information is stored for each branch instruction according to its address. This type of branch prediction scheme is illustrated in FIG.
1
. The hardware used to store the prediction information is typically referred to as a “branch target buffer”. One potential drawback of the branch target buffer illustrated in
FIG. 1
, is that the number of branch predictions is limited by the size of the branch target buffer. For example, assuming the branch target buffer has storage locations sufficient to store 64 branch predictions, then upon detecting a sixty-fifth branch instruction, the buffer must begin discarding the previously generated branch prediction information to make room for new branch prediction information. The size of this type of branch target buffer may be further limited by a number of factors including the desired access speed.
Other schemes that may be capable of storing more prediction information and or having faster access times may use branch target buffers that have structures mirroring the microprocessor's instruction cache. Instruction caches are high speed memory arrays that typically reside within the microprocessor. Instruction caches are characterized as having fast access times and high data output rates when compared with the access times and output rates of other memories that are further away from the microprocessor, e.g., main system memory. Instruction caches are typically organized into a plurality of blocks or “cache lines”. A cache line typically refers to the smallest amount of storage that may be allocated within the instruction cache. For example, an instruction cache may be 32 kilobytes large and may have cache lines that are 16 bytes long.
When instruction bytes are read from main system memory into the instruction cache, they are read in fixed byte-length sequences (e.g., 16 byte sequences) that typically match the cache line length. Each instruction sequence (referred to herein as a “prefetch line”) is typically stored in its own cache line along with an address “tag”. The address tag is a predetermined portion of the instruction sequence's address that serves to identify which instruction bytes are stored within a particular cache line.
Some cache configurations put limits on where a prefetch line having a particular address may be stored. A “fully associative” cache allows a prefetch line to be stored in any cache line within the cache. Conversely, a “direct mapped” cache forces a prefetch line to be stored in a particular location within the cache according to its address. “Set associative” caches define a set of storage locations within which a prefetch line may be stored. Which set the cache line is assigned to is a function of the prefetch line's address. These set associative caches may be visualized as two dimensional arrays with each row defining a set. The number of columns (or “ways”) defines the level of associatively of the cache. For example, a cache having two columns is referred to as a two-way set-associative cache.
The overall size of an instruction cache size is limited by a number of factors, including the process used to manufacture the microprocessor and die space allocated to the instruction cache. Typically, only a small portion of the total instructions for a particular program may reside in the instruction cache at any one time. Thus, various cache management schemes are utilized to load and replace the contents of the instruction cache. The goal of these management schemes is to ensure that the instructions stored in the instruction cache at any given time are the ones most likely to be needed by the microprocessor. Thus cache lines are continually being loaded and overwritten with new instructions.
As previously noted, some branch prediction schemes use branch target buffers that mirror the microprocessor's instruction cache structure. For example, if the instruction cache is 4-way set associative with 512 sets (i.e., a 4 by 512 array), the branch target buffer may be configured into an array having the same dimensions (4 by 512) and will store one set of branch prediction information for each cache line within the instruction cache. By mirroring the instruction cache's configuration, the branch target array may be easily accessed in parallel with the instruction cache using the same portion of the requested instruction address. Thus, the branch target information corresponding to a particular cache line may be available at the same time or sooner than the instruction bytes stored within the corresponding cache line.
However, as previously noted the cache lines within the instruction cache are continually

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for caching victimized branch predictions does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for caching victimized branch predictions, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for caching victimized branch predictions will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2853139

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.