Electrical computers and digital processing systems: processing – Processing control – Branching
Utility Patent
1997-06-27
2001-01-02
Coulter, Kenneth R. (Department: 2758)
Electrical computers and digital processing systems: processing
Processing control
Branching
Utility Patent
active
06170053
ABSTRACT:
TECHNICAL FIELD OF THE INVENTION
The present embodiments relate to microprocessor technology, and are more particularly directed to a microprocessor with circuits, systems, and methods for responding to branch instructions based on the history of past branch prediction accuracy.
BACKGROUND OF THE INVENTION
Significant advances have recently been made in the design of microprocessors to improve their performance, as measured by the number of instructions executed over a given time period. One such advance relates to microprocessors of the “superscalar” type, which can accomplish parallel instruction completion with a single instruction pointer. Typically, superscalar microprocessors have multiple execution units, such as multiple integer arithmetic logic units (ALUs), multiple load/store units (LSUs), and a floating point unit (FPU), each of which is capable of executing a program instruction. As such, multiple machine instructions may be executed simultaneously in a superscalar microprocessor, providing obvious benefits in the overall performance of the device and its system application.
Another common technique used in modern microprocessors to improve performance involves the “pipelining” of instructions. As is well known in the art, microprocessor instructions each generally involve several sequential operations, such as instruction fetch, instruction decode, retrieval of operands from registers or memory, execution of the instruction, and writeback of the results of the instruction. Pipelining of instructions in a microprocessor refers to the staging of a sequence of instructions so that multiple instructions in the sequence are simultaneously processed at different stages in the internal sequence. For example, if a pipelined microprocessor is executing instruction n in a given microprocessor clock cycle, a four-stage pipelined microprocessor may simultaneously (i.e., in the same machine cycle) retrieve the operands for instruction n+1 (i.e., the next instruction in the sequence), decode instruction n+2, and fetch instruction n+3. Through the use of pipelining, the performance of the microprocessor can effectively execute a sequence of multiple-cycle instructions at a rate of one per clock cycle.
Through the use of both pipelining and superscalar techniques, modern microprocessors may execute multi-cycle machine instructions at a rate greater than one instruction per machine clock cycle, assuming that the instructions proceed in a known sequence. However, as is well known in the art, many computer programs do not continuously proceed in the sequential order of the instructions, but instead include branches (both conditional and unconditional) to program instructions that are not in the current sequence. Such operations challenge the pipelined microprocessor because an instruction in the pipeline may not necessarily reach execution. For example, a conditional branch instruction may, upon execution, cause a branch to an instruction other than the next sequential instruction currently in the pipeline. In this event, the instructions currently in the pipeline and following the branch instruction are not used. Instead, these successive instructions are “flushed” from the pipeline and the actual next instruction (i.e., the target of the branch) is fetched and processed through the pipeline (e.g., by decoding, execution, writeback and the like). Flushing in this manner, however, expends multiple machine clock cycles before execution of the actual target instruction occurs, and the intervening clock cycles required to re-fill the pipeline appear as idle cycles from the viewpoint of completed instructions.
The effects of the above-described non-sequential operation, and of the resulting pipeline flush, may be worsened in the case of superscalar pipelined microprocessors. If, for example, a branch or other interruption in the sequential instruction flow of the microprocessor occurs in such microprocessors, the number of lost pipeline slots, or lost execution opportunities, is multiplied by the number of parallel pipelines. The performance reduction due to branches and non-sequential program execution is therefore amplified in superscalar pipelined microprocessors.
In order to minimize microprocessor performance reduction which results from non-sequential program execution, many modem microprocessors incorporate speculative execution based upon branch prediction. Branch prediction predicts, on a statistical basis, the results of each conditional branch (i.e., whether the branch will be “taken” or “not-taken”), and the microprocessor continues fetching instructions and operating the pipeline based on the prediction. For example, if a branch instruction is predicted not taken, then the next instruction fetched into the pipeline is simply the next sequential instruction following the branch instruction. On the other hand, if a branch instruction is predicted taken, then the next instruction fetched into the pipeline is the target instruction (i.e., the instruction to which the branch goes if taken). The instructions fetched based upon such a prediction proceed along the pipeline until the actual result of the condition is determined (typically upon execution of the branch instruction). If the prediction is correct, the speculative execution of the predicted instructions maintains the microprocessor at its highest performance level through full utilization of the pipeline. In the event that the prediction is incorrect, the pipeline is “flushed” to remove all instructions following the branch instruction in the pipeline.
By way of further background, conventional speculative execution techniques include the use of branch target buffers (BTBs). Conventional BTBs are cache-like buffers commonly used in the fetch units of microprocessors. The BTB commonly stores at least three items: (1) an identifier of a previously performed branch instruction as a tag; (2) the target address for the branch (i.e., the address to which the branch points in its predicted taken state); and (3) an indication relating to the branch's actual history, that is, whether or not the branch was taken in past occurrences of the branch. The indication relating to the branch's actual history either directly indicates a prediction, or is used to derive a prediction, of whether the branch is taken. Once a BTB entry is written to include this information for a given branch, subsequent fetches of the same branch are handled using this very information. Specifically, if the branch is predicted taken (based on the branch history), the target address is used as the next address to fetch in the pipeline. The history section of the BTB entry is also updated upon execution of the branch instruction. Specifically, the execution unit determines the actual target address for the branch instruction to determine whether or not the branch is taken. This information updates the history in the BTB entry and, therefore, affects the future prediction for that entry. Note also that the actual target address from the execution unit is also compared to the predicted address; if the two do not match, a misprediction has occurred and the instruction unit is so informed so that the pipeline may be flushed and begin fetching new instructions beginning at the actual address.
While branch prediction techniques are, in general, beneficial in certain instances, mispredictions of branch execution still occur and may be very costly in terms of microprocessor efficiency. For example, as the pipelines of modern superscalar machines get deeper (i.e., hold more instructions at varying stages at once), and as such machines include a greater number of pipelines, a mispredicted branch may heavily penalize performance by requiring a pipeline or pipelines to be emptied and subsequently refilled with instructions from the correct target address. In this instance, numerous cycles are required to reset the pipeline(s) to an operational state and, thus, valuable processor cycle time is lost. Thus, while modern branch target buffer technolog
Anderson Timothy D.
Dutta Simonjit
Shiell Jonathan H.
Coulter Kenneth R.
Donaldson Richard L.
Lake Rebecca Mapstone
Texas Instruments Incorporated
LandOfFree
Microprocessor with circuits, systems and methods for... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Microprocessor with circuits, systems and methods for..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Microprocessor with circuits, systems and methods for... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2537596