Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
1998-08-11
2001-06-05
An, Meng-Al T. (Department: 2154)
Electrical computers and digital processing systems: processing
Processing control
Branching
C712S220000, C712S234000, C712S235000, C712S236000, C712S237000, C712S217000, C714S767000
Reexamination Certificate
active
06243805
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the field of microprocessors and, more particularly, a method and circuit for exact branch targeting.
2. Description of the Relevant Art
Pipelined microprocessors achieve high performance by executing an instruction per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term “clock cycle” refers to an interval of time accorded to various stages of the instruction processing pipeline within the microprocessor. Storage devices (e.g., registers) capture values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively.
The term “instruction processing pipeline” is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. The pipeline may be divided into any number of stages at which portions of the instruction processing are performed. The pipeline may include (1) a fetch stage for fetching the instruction, (2) a decode stage for decoding the instruction, (3) an execution stage for executing the instruction, and (4) a write back stage for writing the execution results in the destination identified by the instruction.
Many microprocessors employ a branch prediction mechanism. The branch prediction mechanism indicates a predicted direction (taken or not taken) for a conditional branch instruction, allowing subsequent instruction fetch to continue within the predicted instruction stream indicated by the branch prediction prior to actual resolution of the conditional branch instruction. The predicted instruction stream includes instructions immediately subsequent to the branch instruction in program order if the branch instruction is predicted not taken, or the instructions at the target address of the branch instruction if the branch instruction is predicted taken. Instructions from the predicted instruction stream may be speculatively executed prior to execution and resolution of the branch instruction, and in any case, are placed into the instruction processing pipeline prior to execution of the branch instruction. If the predicted instruction stream is correct, then the number of instructions executed per clock cycle is advantageously increased. However, if the predicted instruction stream is incorrect (i.e., one or more branch instructions are predicted incorrectly), then the instructions from the incorrectly predicted instruction stream are discarded or purged from the instruction processing pipeline and the number of instructions executing per clock cycle is decreased.
Clearly, branch misprediction and the stalling that ensues is contrary to the goal of executing the most instructions per clock cycle in addition to the goal of keeping the pipeline full. There are several alternatives to employing a branch prediction mechanism for handling conditional branch instructions. The easiest alternative is to freeze the pipeline, holding any instructions after the branch until the conditional branch instruction is resolved and the branch destination is known. The attractiveness of this solution lies primarily in its simplicity. However, this solution requires that the stages prior to the execution stage remain empty until the conditional branch instruction is executed. A better and only slightly more complex solution is to always fetch the next instruction in program order as if the conditional branch is not taken. If the branch is taken, however, this alternative requires that the pipeline be flushed and the fetch restarted at the address at which the branch is taken. Another alternative, yet similar approach, is to always presume the branch as taken. Again, problems arise if the branch is not taken.
Some microprocessors employ another alternative to branch prediction mechanisms called delay branching. In delay branching, the compiler places the branch instruction earlier in the instruction stream. The instruction slots between the initial position of the branch instruction and the new position provided by the compiler for the branch instruction are provided with instructions which are independent of the branch instruction, valid, and useful. For example, the conditional branch instruction may be compiled so that four independent, valid, and useful instructions execute after the conditional branch instruction is fetched. Presuming the pipeline to be four stages long, the conditional branch instruction can be evaluated ahead of time so that no prediction occurs. In delayed branching, the branching instruction executes in the cycle it would have been fetched in the original instruction stream.
One problem with delayed branching is that delay branching cannot be implemented in existing x86 code. Delayed branching must use a modified instruction set. However, it appears that delay branching can be implemented in existing x86 instruction set if the microcode program is adequately modified. Another problem associated with delayed branching relates to superscalar microprocessors which have several pipelines. If the branch instruction is moved up to an early position within the instruction stream by the compiler, the branch instruction may not execute in the execute stage at the appropriate time since superscalars can execute multiple instructions simultaneously. This is particularly problematic with respect to loops where the entire loop can be executed in one clock cycle. In other words, superscalar microprocessors may limit the positions at which the branch instruction can be moved. If the branch instruction cannot be moved far enough up into the instruction stream, the pipeline may have to be operated with no OPs to compensate for the limitation. Executing no OPs limits microprocessor operating speed.
SUMMARY OF THE INVENTION
The problems identified above are in large part addressed by a microprocessor having a branch target circuit coupled to a fetch stage, wherein the branch target circuit includes a branch FIFO buffer configured to receive and store results of executing decoded conditional branch calculation instructions. The results identify whether a subsequent conditional branch instruction is taken or not. The fetch stage is configured to fetch and receive instructions from main memory at addresses specified by program counter. When the fetch stage receives a conditional branch instruction, the branch FIFO buffer is accessed to read the results of a previously executed conditional branch calculation instruction. The result identifies whether a target address contained within the conditional branch instruction is dispatched to the program counter (i.e., the conditional branch is taken) or whether the program counter is to be incremented by a fixed integer (i.e., the branch is not taken).
In one embodiment, the microprocessor contains an execution stage which is configured to write the results of executing the decoded conditional branch calculation instructions into the branch FIFO buffer. The execution stage writes the result prior to the fetch stage receiving the conditional branch instruction which depends upon the results stored within the FIFO buffer.
In another embodiment of the present invention, the conditional branch instruction includes a dedicated bit, which is used for reading the branch FIFO buffer. In other words, when the dedicated bit is set, the fetch stage accesses the branch FIFO buffer to read the results of a previously executed conditional branch calculation instruction. If the dedicated bit is not set, the fetch stage does not access the branch FIFO buffer.
In another embodiment of the present invention, the branch target circuit includes a stall circuit configured to receive a signal from the branch FIFO buffer indicating that the branch FIFO buffer is empty. The stall circuit generates a stall signal upon notice that the branch FIFO buffer is empty. This s
Advanced Micro Devices , Inc.
An Meng-Al T.
Conley Rose & Tayon PC
Lin Wen-Tai
Merkel Lawrence J.
LandOfFree
Programming paradigm and microprocessor architecture for... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Programming paradigm and microprocessor architecture for..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Programming paradigm and microprocessor architecture for... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2482731