Data processing: software development – installation – and managem – Software program development tool – Translation of code
Reexamination Certificate
1999-04-06
2001-10-23
Powell, Mark R. (Department: 2122)
Data processing: software development, installation, and managem
Software program development tool
Translation of code
C717S152000, C717S152000, C712S240000
Reexamination Certificate
active
06308322
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to optimizing compilers and, more specifically, to the use of optimizing compilers to reduce the degradation of processing speed resulting from indirect branches in computer code.
BACKGROUND OF THE INVENTION
The speed of microprocessors has been increased dramatically over recent years. One reason for this improvement in speed is that microprocessors have become more deeply “pipelined.” Pipelining refers to the division of labor by a microprocessor that allows it to operate much like an assembly line. For example, the popular Pentium® processor from Intel® divides its workload into five stages. As shown in
FIG. 1
, the Pentium® first performs a prefetch of an instruction in memory. While that first instruction, A, is passed to the first stage of decoding, the microprocessor prefetches the next instruction, B. Then, while instruction A is in the second decoding stage and instruction B is in the first decoding stage, the processor prefetches instruction C. This continues until all five stages of the processor are loaded, at which time the processor is essentially executing five different instructions at once. Obviously, pipelining instructions in this manner provides great benefit to overall system speed.
One factor that can severely hamper the performance of deeply pipelined processors, however, is the presence of branches in the computer code being executed by the processor. Generally, there are two types of branches: direct and indirect. A direct branch (such as an if-then-else statement) conditions the flow of execution control in a program on the value of a particular variable. Depending on the value of the variable, the execution flow of control will fall through to the next instruction in the sequence stored in memory, or it will “take the branch.” If the branch is “Taken,” the execution flow of control will jump to an instruction at an out-of-sequence address.
As will be appreciated, direct branches cause problems for deeply pipelined processors because instructions are not always executed in the order in which they are stored in memory. For example, with reference to
FIG. 1
, assume that instruction A is a direct branch instruction and that, depending on the value of some variable, execution flow of control will either fall through to instruction B, or, if the branch is taken, jump to instruction G. The processor is not able to determine until late in the pipelining of instruction A (i.e., execution and write back of instruction A) whether the branch to instruction G will be taken. By that time, as shown, instructions B-F are already in the pipeline. Therefore, all of the information in the pipeline must be “flushed,” and restarted with the prefetch of instruction G. Such pipeline flushes significantly degrade processor performance. Alternatively, some processors are designed to “stall” during execution of direct branch instruction A until it can be determined whether the branch is taken. During a stall, no further instructions enter the pipeline, which can also have significant negative effect on processor speed.
Programmers and hardware engineers have attempted to address the problems caused by direct branches by devising direct branch “prediction” schemes. These schemes are sometimes accomplished by a compiler. A compiler is a computer program that reads a program written in one language (the source language) and translates it into an intermediate code, which it then optimizes and assembles into an object code. The object code is then linked by a linker to create an executable object code that is readable by a computer. Source code is generally written in languages that are humanly readable, such as FORTRAN, C, and PERL. Object code is generally comprised of assembly language or machine language for a target machine, such as an Intel microprocessor-based computer.
Modem compilers are designed to optimize source code as it is translated into object code. One method of optimization, is through direct branch prediction, whereby the optimizing compiler attempts to predict whether each branch in the computer code is Taken Or Not Taken. Branch prediction in the compiler is accomplished using one or more heuristics, which can be either profile-based or rule-based.
FIG. 2A
illustrates a prior art profile-based branch prediction method. Source code
10
is first compiled
20
. During this compilation
20
, in addition to translating the source code
10
into an intermediate code, the compiler “instruments” the intermediate code to collect profile data on all of the direct branches in the code. “Instrumenting” refers to the practice of adding code to trace the performance of direct branches during execution. The intermediate code is then assembled into an object code and linked
30
to create an instrumented executable object code
40
. The instrumented executable object code
40
is then executed
50
using a representative workload
60
. During execution, the performance of the direct branches in the code is traced and analyzed
70
. That profile information is then fed back to the compiler, which predicts whether each direct branch in the code is Taken Or Not Taken and inserts those predictions into the code. Once the code is again linked
30
, it results in a direct-branch optimized executable object code
80
.
Alternatively, as shown in
FIG. 2B
, the direct branches can be predicted using rule-based heuristic(s). Here, the source code
90
is first compiled using 100 rule-based direct branch heuristic(s). A rule-based heuristic is a static rule or assumption. For example, a simple rule-based heuristic in this context is that branches are always Taken. A variety of other rule-based heuristics can be employed alone or in combination, as explained in U.S. Pat. No. 5,655,122, issued to
3
Youfeng Wu on Aug. 5, 1997, which is hereby incorporated by reference. After the source code is compiled
100
, it is linked
120
to create a direct-branch optimized executable object code
130
.
It will be appreciated that the correct prediction of whether a direct branch is Taken Or Not Taken can greatly increase the speed of deeply pipelined processors. In the example above with respect to
FIG. 1
, if it is correctly predicted that the branch from Instruction A to Instruction G will take place, the processor will begin fetching Instruction G directly after Instruction A, thereby avoiding a processor flush or stall. Of course, if the prediction is incorrect, processor flushes are still likely.
In addition, even when a branch is correctly predicted Taken, fetching at the branch target address cannot begin immediately because the branch target address must be calculated. Branch Taken/Not Taken predictions are typically inserted as part of the direct branch itself—not ahead of the direct branch. Because a branch target address is, by definition, not the next sequential address in memory, the processor must add or subtract to the current program counter to calculate the branch address when the branch is predicted Taken. This causes the processor pipeline to stall during calculation even upon a correct prediction of a direct branch Taken.
Considerably less attention has been paid to processor stalls or flushes caused by indirect branches. Indirect branches differ from direct branches in that they are always “Taken.” A typical indirect branch in a source language such as C reads as follows:
Source Code:
Switch (x)
[
case A:
<code for target A>
case B:
<code for target B>
case C:
<code for target C>
]
Through this indirect branch, execution flow of control is switched according to the value of x to one of the target addresses A, B, or C. The indirect branch is always “Taken” in the sense that execution flow of control will always switch to one of the target addresses A, B, or C—none of which are necessarily the next target address stored in memory. Therefore, the direct branch hinting mechanisms of the prior art, which predict only whether a branch is Taken, are inapplicable to indirect branches. Indirect bra
Holler Anne Marie
Serocki Steven
Dam Than Q.
Hewlett--Packard Company
Powell Mark R.
LandOfFree
Method and apparatus for reduction of indirect branch... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and apparatus for reduction of indirect branch..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for reduction of indirect branch... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2617629