Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
1998-09-21
2001-08-21
Niebling, John F. (Department: 2812)
active
06279106
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to storing branch target addresses in a microprocessor, and more particularly, to methods for caching indirect branch target addresses.
2. Description of the Related Art
Modern microprocessor designers continually attempt to improve the performance of their products. One method to increase the performance of a microprocessor is to increase the clock frequency at which it operates. However, as clock frequencies increase, each functional unit within the microprocessor has less time in which to perform its designated task. Thus, other methods have been employed to further increase performance at a given clock frequency. Parallel execution is one such method. Parallel execution involves executing more than one instruction during each clock cycle. In order to implement parallel execution, many microprocessors use parallel functional units, each configured to operate independently and in parallel.
Another method for increasing the performance of a microprocessor at a given clock cycle is out of order execution. Most programs depend upon their instructions executing in a particular order. This order is referred to as “program order”. However, certain instructions within the program may execute out of program order as long as the desired functionality is still achieved. Instructions that may be executed out of order are those that do not “depend” upon the instructions around them. Microprocessors that implement out of order execution are able to determine which instructions do not depend on other instructions and therefore may be executed out of order.
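The notion of dependence can be illustrated with a short, hypothetical C fragment (the variables and operations are illustrative only and do not appear in the patent). The second statement reads a value produced by the first and therefore must wait for it, while the third statement uses only unrelated values and could be computed out of order or in parallel with the first two:

    #include <stdio.h>

    int main(void) {
        int a = 1, b = 2, c = 3, d = 4;

        int x = a + b;   /* (1)                                              */
        int y = x * 2;   /* (2) depends on (1): it reads x, so it must wait  */
        int z = c + d;   /* (3) independent of (1) and (2): an out-of-order  */
                         /*     microprocessor may compute z before, after,  */
                         /*     or in parallel with x and y                  */

        printf("%d %d %d\n", x, y, z);
        return 0;
    }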
There are several potential drawbacks to executing instructions out of order. One drawback in particular arises from branch instructions. Examples of branch instructions include conditional jump instructions (Jcc), unconditional jump instructions (JMP), return instructions (RET), and subroutine call instructions (CALL). These instructions make out of order execution particularly difficult because the microprocessor may be unable to determine whether or not the instructions after the branch instruction should be executed.
Another method used to increase a microprocessor's performance at a given clock cycle is pipelining. Pipelining involves dividing the tasks the microprocessor must complete in order to execute an instruction. The tasks are divided into a number of stages that form an instruction processing “pipeline”. The stages are typically serial in nature. For example, the first stage may be to “fetch” an instruction stored in an instruction cache, while the second stage may be to align the instruction and convey it to decode units for decoding or functional units for execution. Pipelines allow long streams of instructions to be executed in fewer clock cycles. The advantages of a pipelined microprocessor stem from its ability to process more than one instruction at a time. For example, a first instruction may be decoded at the same time a second instruction is being fetched from the instruction cache.
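The overlap that a pipeline provides can be sketched with a toy model. The C program below assumes a hypothetical three-stage fetch/decode/execute pipeline (a simplification of the stages described above) and prints which instruction occupies which stage in each clock cycle, showing several instructions in flight at once:

    #include <stdio.h>

    #define NUM_STAGES 3
    #define NUM_INSNS  5

    int main(void) {
        const char *stages[NUM_STAGES] = { "fetch", "decode", "execute" };

        /* In cycle c, instruction i occupies stage (c - i) when that value
           is a valid stage index, so each line of output shows several
           instructions being processed at the same time. */
        for (int cycle = 0; cycle < NUM_INSNS + NUM_STAGES - 1; cycle++) {
            printf("cycle %d:", cycle);
            for (int insn = 0; insn < NUM_INSNS; insn++) {
                int stage = cycle - insn;
                if (stage >= 0 && stage < NUM_STAGES)
                    printf("  I%d in %s", insn, stages[stage]);
            }
            printf("\n");
        }
        return 0;
    }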
However, as with out of order execution, pipelined microprocessors may suffer performance setbacks as the result of branch instructions. Branch instructions may cause the pipeline to “stall” because the initial stages of the pipeline are unable to determine which instructions to fetch until after the branch instruction has been executed. For example, a conditional branch instruction is either “taken” or “not taken”. If the branch instruction is not taken, then the instruction immediately following the branch instruction in program order is executed. Alternatively, if the branch instruction is taken, then a different instruction, located at the branch target address, is executed. In pipelined microprocessors, it is difficult for the initial or fetch stages of the pipeline to determine whether to fetch the “taken” or “not taken” target instruction until the branch instruction has completed execution. Thus, the microprocessor pipeline may stall or form “bubbles” (i.e., idle clock cycles that propagate through the pipeline) while the initial stages await the results of the branch instruction.
In order to prevent stalls or bubbles from forming in the pipeline, many microprocessors implement a branch prediction scheme. Branch prediction involves predicting whether each branch instruction is taken or not taken before the branch instruction actually completes execution. While a number of different algorithms may be used, most rely upon storing the taken/not-taken history of the particular branch instruction. For example, when a branch instruction is executed and taken, branch prediction hardware will store that information so that the next time the branch instruction is fetched, the pipeline will automatically assume the instruction is taken based upon its previous behavior.
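One common way of storing such history, offered here only as an illustration (the patent does not mandate any particular algorithm), is a two-bit saturating counter per branch: two consecutive outcomes in the same direction are required before the prediction flips. A minimal C sketch:

    #include <stdio.h>
    #include <stdbool.h>

    /* Two-bit saturating counter: values 0-1 predict not taken,
       values 2-3 predict taken.  This is a textbook scheme used to
       illustrate taken/not-taken history, not a detail of the patent. */
    typedef unsigned char counter_t;

    static bool predict(counter_t c) { return c >= 2; }

    static counter_t update(counter_t c, bool taken)
    {
        if (taken) return c < 3 ? c + 1 : 3;
        else       return c > 0 ? c - 1 : 0;
    }

    int main(void)
    {
        counter_t history = 0;   /* start strongly not taken */
        bool outcomes[] = { true, true, false, true, true, true };

        for (unsigned i = 0; i < sizeof outcomes / sizeof outcomes[0]; i++) {
            printf("prediction: %-9s actual: %s\n",
                   predict(history) ? "taken" : "not taken",
                   outcomes[i]      ? "taken" : "not taken");
            history = update(history, outcomes[i]);
        }
        return 0;
    }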
While storing branch prediction information reduces the probability of a stall or bubble forming, it also has disadvantages. One disadvantage is the complex hardware required to maintain and update the branch history information. A second disadvantage is the space required to store the branch history information. Die space within microprocessors is a relatively scarce resource, thus using large amounts of space to store branch predictions is disadvantageous. As instruction caches grow in size, more branch instructions may be stored therein. This in turn requires more storage space for branch history and prediction information. Furthermore, modern software tends to have a high concentration of branch instructions (e.g., one out of every four instructions). This high concentration, coupled with the inability to know exactly how many branches are stored in the instruction cache, typically results in large amounts of die space being devoted to storing branch history information.
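As a rough illustration of the cost (the cache size, average instruction length, and target width used here are assumptions rather than figures from the patent), consider a 64 KB instruction cache holding instructions that average 4 bytes each: it contains roughly 16,384 instructions, or about 4,096 branches at one branch per four instructions, and storing a full 32-bit target address for each would consume 4,096 × 4 bytes = 16 KB of storage, equal to a quarter of the instruction cache itself, before any history or prediction bits are counted.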
For these reasons, a method for efficiently storing branch prediction and history information is desired. In particular, a method that reduces the amount of storage space needed for branch history and prediction information is desirable.
SUMMARY OF THE INVENTION
The problems described above may in part be solved by a microprocessor and branch prediction unit in accordance with the present invention. In one embodiment, a microprocessor configured to efficiently store branch prediction information may comprise an instruction cache and a branch prediction unit. The branch prediction unit may be coupled to the instruction cache and may comprise an indirect branch target cache, a direct branch adder, and a branch selector array. The indirect branch target cache may be configured to store predicted branch target addresses for indirect branch instructions, while the direct branch adder may be configured to calculate the branch target addresses for direct branch instructions. The branch selector array may be coupled to the indirect branch target cache and may be configured to store a plurality of selector bits. In one embodiment, the selector bits may correspond to particular instruction bytes stored in the instruction cache. The selector bits indicate a source for the predicted next fetch address associated with the particular instruction bytes.
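The distinction between the two sources can be sketched in C. In the sketch below, the 16-entry direct-mapped cache organization, the 32-bit addresses, and the function names are illustrative assumptions rather than details from the summary above: the target of a direct branch is computed on the fly from the branch address and the displacement encoded in the instruction, so it need not be stored, whereas the target of an indirect branch comes from a register or memory value that is unknown at fetch time and is therefore predicted from a small cache of previously observed targets.

    #include <stdint.h>
    #include <stdio.h>

    #define ITC_ENTRIES 16

    typedef struct {
        uint32_t tag;      /* which branch the entry belongs to            */
        uint32_t target;   /* last observed target of that indirect branch */
        int      valid;
    } itc_entry_t;

    static itc_entry_t indirect_target_cache[ITC_ENTRIES];

    /* Direct branch: the target is a fixed displacement from the end of
       the branch instruction, so an adder can compute it at fetch time. */
    static uint32_t direct_branch_target(uint32_t branch_addr,
                                         uint32_t insn_length,
                                         int32_t  displacement)
    {
        return branch_addr + insn_length + (uint32_t)displacement;
    }

    /* Indirect branch: the target is unknown until the register or memory
       operand is available, so the last observed target is cached and
       used as the prediction. */
    static uint32_t predict_indirect_target(uint32_t branch_addr, int *hit)
    {
        itc_entry_t *e = &indirect_target_cache[(branch_addr >> 2) % ITC_ENTRIES];
        *hit = e->valid && e->tag == branch_addr;
        return *hit ? e->target : 0;
    }

    static void train_indirect_target(uint32_t branch_addr, uint32_t actual_target)
    {
        itc_entry_t *e = &indirect_target_cache[(branch_addr >> 2) % ITC_ENTRIES];
        e->tag = branch_addr;
        e->target = actual_target;
        e->valid = 1;
    }

    int main(void)
    {
        int hit;
        uint32_t target;

        printf("direct target:   0x%08x\n",
               (unsigned)direct_branch_target(0x1000, 2, 0x40));   /* 0x1042 */

        train_indirect_target(0x2000, 0x8000);
        target = predict_indirect_target(0x2000, &hit);
        printf("indirect target: 0x%08x (hit=%d)\n", (unsigned)target, hit);
        return 0;
    }

Because the direct target is recomputed on each fetch rather than stored, only indirect branches consume entries in the prediction storage, which is the storage reduction described above.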
In another embodiment, the branch prediction unit may further comprise a multiplexer configured to select the predicted next fetch address from the direct branch adder, the indirect branch target cache, or another source. The multiplexer may be configured to make its selection based upon the value of the selector bits that correspond to the current fetch address. The branch prediction unit may also comprise a sequential address adder coupled to the multiplexer. The sequential address adder may be configured to generate a predicted next sequential fetch address by summing a portion of the current fetch address and a constant corresponding to the fetch block or cache line size of the instruction cache. A return stack may also be utilized to store fetch addresses corresponding to the instructions immediately following call instructions.
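How the selector bits might steer the choice of predicted next fetch address is sketched below; the selector encoding, the 32-byte fetch block size, and the fixed-depth return stack are illustrative assumptions, not details given in the summary:

    #include <stdint.h>
    #include <stdio.h>

    #define FETCH_BLOCK_SIZE 32u   /* assumed fetch block / cache line size */

    typedef enum {
        SEL_SEQUENTIAL,   /* no taken branch: fall through to the next block      */
        SEL_DIRECT,       /* taken direct branch: use the direct branch adder     */
        SEL_INDIRECT,     /* taken indirect branch: use the indirect target cache */
        SEL_RETURN        /* return instruction: use the return stack             */
    } selector_t;

    typedef struct {
        uint32_t stack[8];
        int      top;
    } return_stack_t;

    /* The multiplexer: picks one of the candidate next fetch addresses
       according to the selector bits associated with the current fetch. */
    static uint32_t next_fetch_address(uint32_t current_fetch,
                                       selector_t selector,
                                       uint32_t direct_adder_target,
                                       uint32_t indirect_cache_target,
                                       const return_stack_t *ras)
    {
        /* Sequential address adder: current fetch address, aligned to the
           fetch block, plus the fetch block size. */
        uint32_t sequential =
            (current_fetch & ~(FETCH_BLOCK_SIZE - 1)) + FETCH_BLOCK_SIZE;

        switch (selector) {
        case SEL_DIRECT:   return direct_adder_target;
        case SEL_INDIRECT: return indirect_cache_target;
        case SEL_RETURN:   return ras->top > 0 ? ras->stack[ras->top - 1]
                                               : sequential;
        default:           return sequential;
        }
    }

    int main(void)
    {
        return_stack_t ras = { .stack = { 0 }, .top = 1 };
        ras.stack[0] = 0x1234;   /* address following the most recent CALL */

        printf("sequential: 0x%08x\n",
               (unsigned)next_fetch_address(0x1010, SEL_SEQUENTIAL, 0, 0, &ras));
        printf("return:     0x%08x\n",
               (unsigned)next_fetch_address(0x1010, SEL_RETURN, 0, 0, &ras));
        return 0;
    }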
A method for storing pre
Advanced Micro Devices, Inc.
Conley Rose & Tayon PC
Niebling John F.
Whitmore Stacy