Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
1998-03-19
2002-04-16
Niebling, John F. (Department: 2812)
Electrical computers and digital processing systems: processing
Processing control
Branching
C712S233000
Reexamination Certificate
active
06374349
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to computer system central processors and more particularly to predicting outcomes of conditional branch instructions.
2. Description of the Background Art
Computer designers have developed a number of techniques to improve the performance of various computer architectures. These techniques include forms of memory caching and hardware parallelism, including pipelining.
Pipelined processors decompose the interpretation and execution of instructions into separate operations that can be performed in parallel, or simultaneously. Each processor stage of a pipelined processor can, ideally, complete one operation from an instruction during each machine cycle and pass the instruction on to the next stage. In theory, the effective speed of a P-stage pipelined processor is thus P times the speed of a non-pipelined equivalent, since pipelined processors need not wait until one instruction is completely finished before execution of the next instruction can begin.
Various practical limitations on pipeline performance can prevent a pipelined processor from achieving this theoretical improvement in performance. One of the most important of these limitations occurs when the sequence of instructions to be executed is not known in advance. In particular, the instruction to be executed next after a conditional branch instruction may not be known for certain until after the conditional branch is executed. In this case, the pipeline will have to wait, and performance suffers.
This application makes use of the following convention: conditional branch instructions test a condition specified by the instruction. If the condition is true, then the branch is “taken” (T); that is, instruction execution begins at the new address specified by the instruction. If the condition is false, the branch is “not-taken” (N) and instruction execution continues with the instruction sequentially following the branch instruction. Since most program code contains a large number of such branches, their impact is very significant. Avoiding branch condition delay penalties is critical to improving pipelined processor performance.
Branch prediction is the anticipatory designation of the branch condition. By predicting the direction of the conditional branch, the processor can, while waiting for branch condition resolution, begin or prepare to begin execution of the next instructions in that path. In other words, branch prediction mechanisms guide the pre-fetching or the conditional issuance of instructions in a particular path in an attempt to keep the pipeline full and free from stalls. Branch target prediction, i.e. prediction of specific address offsets or specific instructions to be executed, is not addressed by this invention.
Accurate prediction of branch instructions is vital to the efficient use of pipelines. Mispredicting a branch results in discarding much speculative work and delays execution of a program. If instructions in a wrong path have been fetched and decoded, those instructions must be flushed from the pipeline. The pipeline must then be loaded with new instructions corresponding to the correct path before the execution unit can resume processing. Conversely, since the correct instructions were not predicted and started early, an opportunity to advance is missed. Thus, a poor branch prediction scheme can have severe penalties that neutralize the potential parallelism advantages of a long processor pipeline.
Branch prediction techniques are typically categorized as static or dynamic. Static techniques make the same guess regarding branch direction each time a particular branch is encountered. One static branch prediction method simply assumes that all encountered branches follow a fixed assignment, i.e. they are either always “taken” or always “not-taken.” The validity of this assumption can vary greatly with the type of program being executed. For example, many branches are programmed merely for management of potential but rare error conditions, so for such branches it would usually be correct to predict that all branches are “not-taken.” Another static method is to use only the direction of the branches to make a prediction. The branch is predicted to be “taken” if the branch is backward, i.e. the target address is earlier in the program listing than the branch instruction; otherwise the branch is predicted to be “not-taken.” This strategy detects loops in a program and works particularly well when loops are iterated many times, as in scientific programs with equation evaluation loops. Fairly high prediction accuracy is possible with static predictors for loop control branches, but the exit from the loop is incorrectly predicted by this strategy. Yet another static method is to use information from compilation and pre-execution of the program as a profile to guide branch prediction. Ideally, the compiler can assign a branch prediction to every branch in the program, but there are drawbacks to this approach, i.e. pre-execution takes time, and it is not widely used. In the applications where static predictors work well, the outcome of any one branch tends to be independent of the outcomes of other branches.
There are many workloads where control transfers are intensive and thus the relation between branches is not as simple as the situations described above. The outcomes of branch decisions for such applications are usually neither constant nor looping, but are strongly affected by their own past histories and by the outcomes of preceding branches. Static branch prediction methods are therefore generally not adequate for accurately predicting actual program behaviors, and in some cases can actually reduce the branch prediction accuracy below that achievable by mere chance.
Dynamic branch prediction schemes differ from static schemes in that they base their predictions on the actual run-time behaviors of program branches. The execution sequence that a program follows can vary in ways that cannot be predicted by static algorithms. Different input data during different program runs can cause differences in execution sequences that neither optimizing compilers nor static mechanisms can successfully predict. A branch might also execute consistently one way in one part of a given program run, but the other way in another part of the run, so only a branch predictor that adapts to these changes during execution can make accurate predictions. Although branch outcomes are variable, they are usually not the result of random activities; most of the time they are correlated with past branch behavior. By keeping track of the history of branch outcomes, it is possible to anticipate with a high degree of certainty which direction future branches will take, and therefore to optimize program execution. Dynamic branch predictors are popular because they can be implemented entirely in hardware and can therefore accurately predict branches without changes to the processor instruction set or to compiled programs. All of the various dynamic branch prediction methods that have been proposed use the history of previous branches to predict how a current branch will behave.
Bimodal Predictor
Most conditional branches behave in a bimodally biased manner; they are either “taken” most of the time, or “not-taken” most of the time. The assumption that the most recent branch directions represent the probable next branch direction is usually valid, so the past behavior of the branch can provide some predictability about the future behavior of the branch. In U.S. Pat. No. 4,370,711 by Smith, an array of 2-bit saturating up/down counters is proposed to store information about the recent history of each branch in a program.
FIG. 1
is a block diagram of a typical bimodal predictor. Each counter
10
in the branch counter array
12
is addressed by the low order bits of the address of the branch instruction on line
14
in the program counter; building a full array addressed directly by all the program counter bits would be uneconomical.
FIG. 2
s
Carr & Ferrell LLP
Hite Eppa
Niebling John F.
Whitmore Stacey
LandOfFree
Branch predictor with serially connected predictor stages... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Branch predictor with serially connected predictor stages..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Branch predictor with serially connected predictor stages... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2874891