Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
1998-07-06
2001-07-03
Booth, Richard (Department: 2812)
active
06256728
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is related to the field of processors and, more particularly, to branch prediction and fetch mechanisms within processors.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by dispatching and executing multiple instructions per clock cycle, and by operating at the shortest possible clock cycle time consistent with the design. To the extent that a given processor succeeds in dispatching and/or executing multiple instructions per clock cycle, high performance may be realized. In order to increase the average number of instructions dispatched per clock cycle, processor designers have been designing superscalar processors which employ wider issue rates. A “wide issue” superscalar processor is capable of dispatching (or issuing) a larger maximum number of instructions per clock cycle than a “narrow issue” superscalar processor. During clock cycles in which the number of dispatchable instructions exceeds what the narrow issue processor can handle, the wide issue processor may dispatch more instructions, thereby achieving a greater average number of instructions dispatched per clock cycle.
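The benefit of a wider issue rate can be illustrated with simple arithmetic. The sketch below is a simplified model and is not taken from the patent: it assumes a hypothetical trace giving the number of dispatchable instructions in each clock cycle, and it ignores instructions carried over from one cycle to the next, so each cycle simply dispatches at most the issue width.

```python
def average_dispatch_rate(dispatchable_per_cycle, issue_width):
    """Average instructions dispatched per cycle when at most `issue_width`
    instructions can be dispatched in any single clock cycle."""
    if not dispatchable_per_cycle:
        raise ValueError("trace must contain at least one cycle")
    # Each cycle dispatches whatever is available, capped at the issue width.
    dispatched = [min(n, issue_width) for n in dispatchable_per_cycle]
    return sum(dispatched) / len(dispatchable_per_cycle)


# Hypothetical trace: dispatchable instructions available in each of six cycles.
trace = [2, 5, 3, 6, 1, 4]
narrow = average_dispatch_rate(trace, issue_width=3)  # 15 / 6 = 2.5
wide = average_dispatch_rate(trace, issue_width=6)    # 21 / 6 = 3.5
```

The wide issue machine gains only in the cycles where more than three instructions are dispatchable (the second, fourth and sixth cycles of this trace), which is the effect the passage describes.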
In order to support wide issue rates, it is desirable for the superscalar processor to be capable of fetching a large number of instructions per clock cycle (on the average). For brevity, a processor capable of fetching a large number of instructions per clock cycle (on the average) will be referred to herein as having a “high fetch bandwidth”. If the superscalar processor is unable to achieve a high fetch bandwidth, then the processor may be unable to take advantage of the wide issue hardware due to a lack of instructions being available for issue.
Several factors may impact the ability of a particular processor to achieve a high fetch bandwidth. For example, many code sequences have a high frequency of branch instructions, which may redirect the fetching of subsequent instructions within that code sequence to a branch target address specified by the branch instruction. The processor may identify the branch target address only after fetching the branch instruction, and only then may the next instructions within the code sequence be fetched using the branch target address. Processors attempt to minimize the impact of branch instructions on the fetch bandwidth by employing highly accurate branch prediction mechanisms and by generating the subsequent fetch address (either branch target or sequential) as rapidly as possible.
As used herein, a branch instruction is an instruction which specifies the address of the next instructions to be fetched. The address may be the sequential address identifying the instruction immediately subsequent to the branch instruction within memory, or a branch target address identifying a different instruction stored elsewhere in memory. Unconditional branch instructions always select the branch target address, while conditional branch instructions select either the sequential address or the branch target address based upon a condition specified by the branch instruction. For example, the processor may include a set of condition codes which indicate the results of executing previous instructions, and the branch instruction may test one or more of the condition codes to determine if the branch selects the sequential address or the target address. A branch instruction is referred to as taken if the branch target address is selected via execution of the branch instruction, and not taken if the sequential address is selected. Similarly, if a conditional branch instruction is predicted via a branch prediction mechanism, the branch instruction is referred to as predicted taken if the branch target address is predicted to be selected upon execution of the branch instruction and is referred to as predicted not taken if the sequential address is predicted to be selected upon execution of the branch instruction.
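The address selection just described can be sketched as follows. This is a hypothetical model rather than the processor's actual logic: the `Branch` fields, the instruction length used to form the sequential address, and the dictionary of named condition codes (such as a zero flag, `ZF`) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Branch:
    address: int  # address of the branch instruction
    length: int   # size of the branch instruction in bytes (assumed known)
    target: int   # branch target address
    # Condition tested against the condition codes; None marks an
    # unconditional branch, which always selects the target.
    condition: Optional[Callable[[Dict[str, bool]], bool]] = None


def next_fetch_address(branch: Branch,
                       condition_codes: Dict[str, bool]) -> Tuple[int, bool]:
    """Return (next fetch address, taken?) after executing `branch`."""
    sequential = branch.address + branch.length  # instruction right after the branch
    if branch.condition is None:
        return branch.target, True               # unconditional: always taken
    taken = branch.condition(condition_codes)
    return (branch.target if taken else sequential), taken


# Hypothetical examples: a 2-byte "jump if zero" and a 5-byte unconditional jump.
jz = Branch(address=0x100, length=2, target=0x120, condition=lambda cc: cc["ZF"])
jmp = Branch(address=0x200, length=5, target=0x080)
```

For instance, `next_fetch_address(jz, {"ZF": False})` selects the sequential address `0x102` and reports the branch as not taken, while the unconditional `jmp` always selects its target.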
Unfortunately, even if highly accurate branch prediction mechanisms are employed, fetch bandwidth may still suffer. Typically, a plurality of instructions are fetched by the processor, and a first branch instruction within the plurality of instructions is detected. Instructions subsequent to the first branch instruction are discarded if the branch instruction is predicted taken, and the branch target address is fetched. Accordingly, the number of instructions fetched during the clock cycle in which a branch instruction is fetched and predicted taken is limited to the number of instructions prior to and including the first branch instruction within the plurality of instructions being fetched. Since branch instructions are frequent in many code sequences, this limitation may be significant. Performance of the processor may be decreased if the limitation to the fetch bandwidth leads to a lack of instructions being available for dispatch. A method for increasing the achievable fetch bandwidth in the presence of predicted taken branch instructions is therefore desired.
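The conventional limitation can be modeled as truncating each fetch block at the first predicted taken branch. In this hypothetical sketch a fetch block is a list of instruction strings and the branch predictions are supplied as a set of indices; neither representation comes from the text.

```python
def conventional_fetch(block, predicted_taken):
    """Instructions retained from one fetch block under conventional fetching.

    block: instructions in fetch order.
    predicted_taken: indices within `block` of branches predicted taken.
    Everything after the first predicted taken branch is discarded, and the
    next fetch (not modeled here) would begin at that branch's target.
    """
    for i in range(len(block)):
        if i in predicted_taken:
            return block[: i + 1]  # keep up to and including the branch
    return list(block)             # no predicted taken branch: keep all


# Hypothetical 8-instruction fetch block; the branch at index 1 is predicted taken.
block = ["add", "jz L1", "mov", "sub", "L1: or", "xor", "cmp", "inc"]
kept = conventional_fetch(block, predicted_taken={1})
```

In the example only 2 of the 8 fetched instructions survive the cycle, even though the branch target (`L1: or`) and the instructions after it were already in the fetched block.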
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by a processor in accordance with the present invention. The processor is configured to detect a branch instruction having a forward branch target address within a predetermined range of the branch fetch address of the branch instruction. If the branch instruction is predicted taken, instead of canceling subsequent instructions and fetching the branch target address, the processor allows sequential fetching to continue and selectively cancels the sequential instructions which are not part of the predicted instruction sequence (i.e., the instructions between the predicted taken branch instruction and the target instruction identified by the forward branch target address). Advantageously, instructions within the predicted instruction sequence which may already have been fetched prior to predicting the branch instruction taken may be retained within the pipeline of the processor, while subsequent instructions continue to be fetched. Higher fetch bandwidth may thereby be achieved, and hence more instructions may be available for dispatch in wide issue superscalar processors.
Broadly speaking, the present invention contemplates a method for fetching instructions in a processor. A plurality of instructions are fetched. A first branch instruction having a forward branch target address is detected within the plurality of instructions, and the first branch instruction is predicted. Instructions within the plurality of instructions which are between the first branch instruction and a subsequent instruction within the plurality of instructions identified by the forward branch target address are canceled. The canceling is performed responsive to both a taken prediction being selected for the first branch instruction and the forward branch target address being within a predetermined range of a first branch fetch address corresponding to the first branch instruction. Additionally, responsive to the forward branch target address being within the predetermined range, the subsequent instruction is retained within the plurality of instructions even if the taken prediction is selected.
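The selective cancellation can be sketched as a per-instruction keep mask over one fetched block. This is a minimal illustration under stated assumptions, not the patented circuit: instruction addresses are given explicitly, the “branch fetch address” is taken to be the branch instruction's own address, and the 64-byte `FORWARD_RANGE` is a hypothetical stand-in for the unspecified predetermined range. When the target lies beyond the current block, the sketch cancels the rest of the block; filtering of the following sequential blocks is not modeled.

```python
from typing import List

# Hypothetical size of the "predetermined range", in bytes.
FORWARD_RANGE = 64


def forward_collapse(addresses: List[int], branch_index: int, target: int,
                     predicted_taken: bool,
                     forward_range: int = FORWARD_RANGE) -> List[bool]:
    """Keep mask (True = retain) for one fetched block of instructions.

    addresses: byte address of each fetched instruction, in ascending fetch order.
    branch_index: position of the first branch instruction within the block.
    target: that branch's target address.
    """
    branch_addr = addresses[branch_index]  # assumed to be the "branch fetch address"
    keep = [True] * len(addresses)
    if not predicted_taken:
        return keep                        # predicted not taken: keep everything
    offset = target - branch_addr
    collapse = 0 < offset <= forward_range  # forward target within range?
    for i in range(branch_index + 1, len(addresses)):
        if collapse:
            # Cancel only the instructions skipped over by the branch;
            # the target instruction and those after it are retained.
            keep[i] = addresses[i] >= target
        else:
            # Backward or distant target: cancel everything after the branch
            # (fetch would be redirected to the target instead).
            keep[i] = False
    return keep


block_addrs = [0x100, 0x102, 0x104, 0x107, 0x10A, 0x10C]
mask = forward_collapse(block_addrs, branch_index=1, target=0x10A, predicted_taken=True)
```

With the branch at `0x102` jumping forward to `0x10A`, only the instructions at `0x104` and `0x107` are canceled; the target and the instruction after it stay in the pipeline, whereas the conventional scheme would discard all four instructions after the branch.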
The present invention further contemplates a processor comprising a branch scanner configured to identify a first branch instruction within a plurality of instructions, a branch history table, and a forward collapse unit. Coupled to the branch scanner, the branch history table is configured to select a first branch prediction from a plurality of branch predictions stored therein responsive to the first branch instruction identified by the branch scanner. Coupled to the branch scanner and the branch history table, the forward collapse unit is configured to indicate: (i) which instructions within the plurality of instructions and subsequent to the first branch instruction to cancel, and (ii) which instructions within the plurality of instructions and subsequent to the first branch instruction to retain. The forward collap
Johnson William M.
Witt David B.
Advanced Micro Devices , Inc.
Conley Rose & Tayon PC
Merkel Lawrence J.
Whitmore S.