Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
2000-05-25
2004-04-13
Chan, Eddie (Department: 2183)
Electrical computers and digital processing systems: processing
Processing control
Branching
C712S240000
Reexamination Certificate
active
06721876
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention is related to the field of processors and, more particularly, to branch prediction mechanisms in processors.
2. Description of the Related Art
Superscalar processors may achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term “clock cycle” refers to an interval of time accorded to various stages of an instruction processing pipeline within the processor. On the other hand, superpipelined processors may achieve high performance by implementing numerous stages in the instruction processing pipeline and overlapping execution of a large number of instructions using the numerous stages.
An important feature of a superscalar or superpipelined processor is its branch prediction mechanism. The branch prediction mechanism indicates a predicted direction (taken or not-taken) for a branch instruction and/or a predicted target address, allowing subsequent instruction fetching to continue within the predicted instruction stream indicated by the branch prediction. Instructions from the predicted instruction stream may be speculatively executed prior to execution of the branch instruction, and/or may be placed into the instruction processing pipeline prior to execution of the branch instruction. If the predicted instruction stream is correct, then the average number of instructions executed per clock cycle is advantageously increased. However, if the predicted instruction stream is incorrect (i.e. one or more branch instructions are predicted incorrectly), then the instructions from the incorrectly predicted instruction stream are discarded from the instruction processing pipeline and the number of instructions executed per clock cycle is decreased.
A branch instruction is an instruction which causes subsequent instructions to be fetched from one of at least two addresses: a sequential address identifying an instruction stream beginning with instructions which directly follow the branch instruction; and a target address identifying an instruction stream beginning at an arbitrary location in memory. The target address may be generated from: (i) either the instruction address of the branch instruction or the instruction sequential to the branch instruction (where the instruction address is the memory address at which the instruction is stored, and is often referred to as the program counter address, or PC); and/or (ii) one or more operands of the instruction. Unconditional branch instructions always branch to the target address, while conditional branch instructions may select either the sequential or the target address based on the outcome of a prior instruction. Branch instructions may also be categorized as direct or indirect. Direct branch instructions generate a target address from at most a displacement encoded into the instruction and the instruction address, and thus do not require an operand fetch to generate the target address. Indirect branch instructions require at least one operand fetch (e.g. from a register or a memory location) to generate the target address.
Accurately predicting indirect branch instructions has become increasingly important. Indirect branch instructions are typically more prevalent in object-oriented programming styles (e.g. Java, C++, etc.). For example, class member functions are typically called using indirect branch instructions.
When predicting indirect branch instructions, the target address is predicted since the target address cannot be calculated without fetching the operands of the branch instruction. Since the operands are in registers or memory locations, the operands may be changed between various executions of a particular indirect branch instruction and thus the target address of the particular indirect branch instruction may change from execution to execution. The target address resulting from an execution of the particular indirect branch instruction may be correlated to the previously encountered branch instructions (in other words, the target address may be correlated with the instructions executed prior to execution of the particular indirect branch). An indirect branch predictor, designed with cost of implementation and accuracy of prediction as design goals and taking into account the correlation that may exist between the target address of the particular indirect branch instruction and previously encountered branch instructions, is desired.
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by an indirect branch target predictor as described herein. The indirect branch predictor includes a buffer storing branch target addresses corresponding to previously executed indirect branch instructions. The buffer is indexed with an index derived from history information corresponding to previously predicted indirect branch instructions (e.g. a portion of the predicted target address corresponding to previously predicted indirect branch instructions may be used) and from the PC of the particular indirect branch instruction being predicted. The target address from the indexed entry may be used as the prediction for the particular indirect branch instruction. In one embodiment, the buffer may be tagless, thereby reducing cost by eliminating storage for the tags. Additionally, the buffer may be direct mapped in one embodiment, which may reduce power consumption during access to the buffer. In various embodiments, the indirect branch target predictor may generate the index to the buffer using one or more techniques to improve the accuracy of the prediction.
A first index generation technique involves offsetting the history information from the various previously predicted indirect branch instructions. In other words, bits in the same bit position within the history information corresponding to each previously predicted indirect branch instruction affect different bits of the generated index. By offsetting the history information, the indirect branch target predictor may more accurately reflect the order in which the previously predicted indirect branch instructions occur. Prediction accuracy may be increased for cases in which the order of the previously predicted branch instructions affects the target address generated by a particular indirect branch instruction.
A second index generation technique involves weighting the history information based on the age of the previously predicted indirect branch instructions. The number of bits of history information corresponding to more recently predicted indirect branch instructions used in generating the index may be greater than the number of bits of history information corresponding to less recently predicted indirect branch instructions. Prediction accuracy may be improved for those cases in which the correct prediction is more closely correlated to the more recently predicted indirect branch instructions than to the less recently predicted indirect branch instructions.
A third index generation technique involves reversing the bit order of the PC of the particular indirect branch instruction being predicted. In other words, the most significant bits of the portion of the PC used in generating the index may be used in generating least significant bits of the index, while most significant bits of the history information may be used in generating the most significant bits of the index. For code which exhibits locality, the most significant bits of the PC may be relatively stable at any given point in time. The least significant bits of the PC change for each byte and thus may be a quasi-tag for the indirect branch instruction. The most significant bits of the PC are combined with the most recent bits of the history information, and may thus preserve the most recent history information since the most significant bits of the PC are not changing very frequently. The least significant bits of the PC are combined with the least recent bits of the history information. Accordingly, entrie
Chen I-Cheng K.
Matus Francis M.
Advanced Micro Devices , Inc.
Chan Eddie
Harkness Charles
Merkel Lawrence J.
Meyertons Hood Kivlin Kowert & Goetzel P.C.
LandOfFree
Branch predictor index generation using varied bit positions... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Branch predictor index generation using varied bit positions..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Branch predictor index generation using varied bit positions... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3257829