Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
1999-04-02
2001-08-07
Pan, Daniel H. (Department: 2183)
Electrical computers and digital processing systems: processing
Processing control
Branching
C712S240000, C711S216000, C711S206000
Reexamination Certificate
active
06272624
ABSTRACT:
BACKGROUND OF THE INVENTION
With the goal of making computers faster and more powerful, many techniques are being developed to execute a microprocessor's resources in parallel and as efficiently as possible. For instance, while microprocessors have, in the past, typically executed a single instruction at a time, modem super-scalar processors execute multiple instructions in a single cycle. To efficiently supply enough instructions to execute simultaneously, multiple instructions are fetched in a single fetch cycle. Due to the pipelined nature of such a microprocessor, it is often necessary to fetch instructions before some conditional branch instruction has executed, where that branch instruction may change the control flow such that the fetched instructions are the wrong instructions to execute. This presents a waste in memory access bandwidth, latency, processing and re-access time.
Research has shown that the outcome of conditional branches can be predicted with a high degree of accuracy based on past behavior. With such a prediction, a much more efficient use of memory-access bandwidth and faster execution can be achieved.
For example, Yeh and Patt, “Two-Level Adaptive Branch Prediction”, The 24th ACM/IEEE International Symposium and Workshop on Microarchitecture, (November 1991), pp. 51-61, developed a global history (ghist) branch predictor in 1991 as a generalization of their two-level branch prediction scheme. Based on a pattern of past branch taken
ot taken decisions, independent of instruction address, a prediction of a next decision is made. The ghist scheme maintains a shift register whose bits represent taken
ot-taken results of previous conditional branches in an instruction sequence. The next decision for a particular pattern is indicated by a value indexed by the shift register. The values are obtained by incrementing and decrementing individual counters associated with the pattern based on decisions already made.
Predictions are stored in n-bit saturating counters, where n is typically two, and an array of n-bit saturating counters is indexed by the ghist shift register. Thus, based on the history in the ghist shift register, an indexed counter is used to predict whether a branch is taken or not taken. Typically, a threshold is used to determine the prediction. For example, if a counter's value is above the threshold, a branch may be predicted as taken. If the counter value is below the threshold, the branch is predicted as not taken. If the threshold is one half of the full range of the counter, this may be simplified by using the most significant bit of the counter as the prediction indicator. Alternatively, various forms of hysteresis have been used or proposed. Regardless of the exact method used, the bit-pattern in the shift register has been found to accurately predict which branch is taken.
Yeh and Patt, “Increasing the Instruction Fetch Rate via Multiple Branch Prediction and a Branch Address Cache”, Proceedings of the 7th ACM International Conference on Supercomputing, July 1993, extended this work to allow multiple branch predictions to occur almost in parallel. The described branch predictor counter array, or pattern history table, is multi-ported, having seven read ports for three predictions. Two multiplexors (one 2:1, and one 4:1) are required after the arrays to choose the correct predictions. The multiplexor controls are serial, that is, the output of the first is a control input for the second. The capacity of the predictor is limited due to the need to have seven read ports.
McFarling, “Combining Branch Predictors”, WRL Technical Note TN-36, Digital Equipment Corporation, June 1993, extends Yeh and Patt's ghist work by hashing the contents of the global history register with the address of the branch instruction to compute the index into the counter table. This scheme is called “gshare” and it improves the accuracy of the predictions.
Dutta and Franklin, “Block-Level Prediction for Wide-Issue Superscalar Processors,” Proc. Ist International Conference on Algorithms and Architectures for Parallel Processing, pp. 143-152, April 1995, and “Control Flow Prediction with Tree-Like Subgraphs for Superscalar Processor,” Proc. 28th International Symposium on Microarchitecture, pp. 253-263, 1995 also recognized the need to perform multiple branch predictions in a cycle. They perform “block prediction” in which they determine which branch in a group of fetched instructions will be taken, and predict the target address using “block history”. There appear to be two limitations to this scheme. First, although the branch predictor must be able to begin a new access every cycle in order to maintain a fully populated pipeline, two array accesses are required in series which may limit the repeat rate of the predictor. Second, the latency of the predictor is at least two full cycles due to the series array structure.
This research presents a problem in designing a very wide issue superscalar processor. In order to achieve high-performance, many branch predictions must be predicted per cycle. However, the state of the branch predictor must be large to be able to accurately predict the branches of complex programs, and the machine must be buildable and operate at a high frequency. The above schemes suggest either series arrays or arrays with a large number of ports.
SUMMARY OF THE INVENTION
In accordance with a preferred embodiment of the invention, a system for predicting branch outcome comprises:
a register which provides a pattern of program control flow, wherein the register is modified based on a summary of control flow activity of a group of instructions fetched in a given slot;
a hash function which generates an index, by hashing the register's value with the group's program counter (PC) where outcomes of branch instructions within the group are to be predicted;
an array of branch predictions, which, when referenced by the generated index, provides naturally aligned branch predictions; and
a shuffler for reordering the provided branch predictions based on a portion of the register, into the same order as instructions.
The preferred embodiment addresses the problems discussed above. For n sequential instructions fetched, n branch predictions are made with one read port on the predictor array. The single read port allows the predictor to have a large amount of state. Two key concepts are group global history (gghist) and storing the branch predictions sequentially.
In the prior art, if a group of fetched instructions has a number of conditional branches in it, the branch predictor must be queried for each branch. Ghist branch predictors work well because the pattern in the shift register identifies the current execution point in the control flow graph of the program. In fact, given a program of sufficient length, predictor accuracy may be roughly proportional to the length of the ghist register.
A problem of the serial nature of making multiple predictions with a global history predictor, where a conditional branch prediction depends upon the last prediction made, thereby creating a critical loop, is overcome by the present invention. In solving this problem, a key observation is that for a group of sequential instructions, all that matters in constructing the global history is whether or not the group contains a predicted taken branch. For example, in a machine that fetches four instructions per cycle, assume the following instructions are fetched:
add
branch
compare
branch
and that the first branch is predicted to be not-taken and the second is predicted to be taken. In a conventional ghist scheme, “01” is shifted onto the ghist register. The rules for conventional ghist are:
for each conditional branch
if (branch is taken)
then shift 1 into ghist register
else shift 0 into ghist register
end
In the present invention, on the other hand, only a 1 is shifted into the history register. The rile for the present invention is:
for each group of fetched instructions
if (there is a taken conditional branch)
then shift 1 into gghist regi
Edmondson John H.
Giacalone Glenn P.
Compaq Computer Corporation
Hamilton Brook Smith & Reynolds P.C.
Pan Daniel H.
LandOfFree
Method and apparatus for predicting multiple conditional... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and apparatus for predicting multiple conditional..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for predicting multiple conditional... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2500142