Data processing: software development – installation – and managem – Software program development tool – Translation of code
Reexamination Certificate
1998-09-14
2001-07-10
Powell, Mark R. (Department: 2122)
active
06260189
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to microprocessors and other types of digital data processors, and more particularly to digital data processors which utilize pipelined processing techniques.
BACKGROUND OF THE INVENTION
Modern processors are often pipelined, meaning that execution of each instruction is divided into several stages.
FIG. 1 shows a functional block diagram of a conventional pipelined processor 10. This exemplary pipelined processor includes four stages: a fetch (F) stage 12, a decode (D) stage 14, an execute (E) stage 16, and a writeback (W) stage 18. Pipelined processors such as processor 10 may be register-based, i.e., other than for load or store instructions, the source(s) and destination(s) of each instruction are registers. The fetch stage 12 retrieves a given instruction from an instruction memory. The decode stage 14 reads the source register(s) of the instruction, and the writeback stage 18 writes to the destination register(s) of the instruction. In the execute stage 16, the instruction is executed by one of four specialized execution units, whose latencies are denoted in FIG. 1 by the number of boxes: a 1-cycle integer (I) unit 20, an 8-cycle integer/floating point multiplier (M) 22, a 4-cycle floating point adder (Fadd) 24, or a 15-cycle integer/floating point divider (Div) 26. The execution units in this example are fully pipelined, i.e., each can accept a new instruction on every clock cycle. These specialized units are used to execute particular types of instructions, and each of the units may have a different latency. An instruction is said to be “dispatched” when it has completed register read in the decode stage 14 and begun execution in the execute stage 16. In other words, a dispatch takes place when an instruction passes from the decode stage 14 to one of the execution units in the execute stage 16.
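To make the timing of the hazards concrete, the following Python sketch computes the cycle in which each stage of processor 10 handles an instruction. It assumes, as a simplification not stated in the text, that one instruction is fetched per cycle, that nothing stalls, and that F, D and W each take one cycle; `UNIT_LATENCY`, `Timing` and `schedule` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Execute-stage latencies, in cycles, of the four units described for FIG. 1.
UNIT_LATENCY = {"I": 1, "M": 8, "Fadd": 4, "Div": 15}


@dataclass(frozen=True)
class Timing:
    fetch: int      # F stage
    decode: int     # D stage: source registers are read here
    dispatch: int   # first E cycle: the instruction passes from D to an execution unit
    writeback: int  # W stage: destination registers are written here


def schedule(fetch_cycle: int, unit: str) -> Timing:
    """Stage cycles for one instruction, assuming no stalls, one cycle each for
    F, D and W, and UNIT_LATENCY[unit] cycles in E."""
    decode = fetch_cycle + 1
    dispatch = decode + 1
    writeback = dispatch + UNIT_LATENCY[unit]
    return Timing(fetch_cycle, decode, dispatch, writeback)


if __name__ == "__main__":
    # An 8-cycle multiply fetched in cycle 1, then a 1-cycle integer op in cycle 2.
    print(schedule(1, "M"))  # Timing(fetch=1, decode=2, dispatch=3, writeback=11)
    print(schedule(2, "I"))  # Timing(fetch=2, decode=3, dispatch=4, writeback=5)
```

Because the two instructions use different execution units, the integer operation writes back in cycle 5, six cycles before the multiply fetched ahead of it; this out-of-order completion is what produces the WAW example of FIG. 3.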
A significant problem with conventional pipelined processors such as processor 10 of FIG. 1 is that the use of a pipeline introduces data hazards which are not present in the absence of a pipeline, because results of previous instructions may not be available to a subsequent instruction. This is often attributable to the different latencies of the various execution units in the processor. Types of data hazards which can arise in conventional pipelined processors include, for example, Read After Write (RAW) data hazards, Write After Write (WAW) data hazards, and Write After Read (WAR) data hazards.
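The three hazard types correspond to three ways in which two instructions can share a register. As an illustrative sketch (the `Instr` and `dependences` names are hypothetical), the check below only classifies the register overlap between an earlier and a later instruction; whether a dependence actually becomes a hazard depends on pipeline timing, as the examples of FIGS. 2 through 4 show.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dest: str              # destination register
    srcs: Tuple[str, ...]  # source registers


def dependences(earlier: Instr, later: Instr) -> Set[str]:
    """Register dependences of `later` on `earlier` (in program order). Each one
    becomes a hazard only if the pipeline lets the two accesses occur out of order."""
    kinds = set()
    if earlier.dest in later.srcs:
        kinds.add("RAW")  # later reads what earlier writes
    if earlier.dest == later.dest:
        kinds.add("WAW")  # both write the same register
    if later.dest in earlier.srcs:
        kinds.add("WAR")  # later overwrites what earlier reads
    return kinds


# Register operands of the instruction pairs in FIGS. 2, 3 and 4.
print(dependences(Instr("r1", ("r2", "r3")), Instr("r4", ("r5", "r1"))))  # {'RAW'}
print(dependences(Instr("r4", ("r2", "r3")), Instr("r4", ("r6", "r8"))))  # {'WAW'}
print(dependences(Instr("r1", ("r2", "r3")), Instr("r3", ("r4", "r5"))))  # {'WAR'}
```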
FIG. 2 illustrates an exemplary RAW data hazard, showing how the pipelined processor 10 of FIG. 1 executes subtract (sub) instructions I1 and I2 for processor clock cycles 1 through 5. Instruction I1 subtracts the contents of its source registers r2 and r3 and writes the result to its destination register r1. Instruction I2 subtracts the contents of its source registers r5 and r1 and writes the result to its destination register r4. It can be seen that, unless otherwise prevented, instruction I2 in the conventional processor 10 will read register r1 in clock cycle 3, before the new value of r1 is written by instruction I1, resulting in a RAW data hazard. In a non-pipelined processor, the instructions as shown in FIG. 2 would not create a hazard, since instruction I1 would be completed before the start of instruction I2.
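The RAW example can be checked mechanically by comparing the cycle in which I2 reads r1 with the cycle in which I1 writes it. This is a minimal sketch under simplified timing assumptions (no stalls, one fetch per cycle, one cycle each for F, D and W), additionally assuming that the subtract executes in the 1-cycle integer unit and that a register read in the same cycle as a write sees the old value; `Instr` and `raw_hazard` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

UNIT_LATENCY = {"I": 1, "M": 8, "Fadd": 4, "Div": 15}


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str
    dest: str
    srcs: Tuple[str, ...]
    fetch: int  # fetch cycle; assumes no stalls

    @property
    def read_cycle(self) -> int:
        return self.fetch + 1  # sources are read in the D stage

    @property
    def write_cycle(self) -> int:
        return self.fetch + 2 + UNIT_LATENCY[self.unit]  # destination written in W


def raw_hazard(first: Instr, second: Instr) -> Optional[Tuple[str, int, int]]:
    """(register, read cycle, write cycle) if `second` reads a register before
    `first` has written it back, else None. A read in the same cycle as the write
    is treated as stale (no write-then-read register file is assumed)."""
    if first.dest in second.srcs and second.read_cycle <= first.write_cycle:
        return first.dest, second.read_cycle, first.write_cycle
    return None


i1 = Instr("I1", "I", "r1", ("r2", "r3"), fetch=1)  # sub r1, r2, r3
i2 = Instr("I2", "I", "r4", ("r5", "r1"), fetch=2)  # sub r4, r5, r1
print(raw_hazard(i1, i2))  # ('r1', 3, 4)
```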
FIG. 3 illustrates an exemplary WAW data hazard, arising when the processor executes instructions I1 and I2 for processor clock cycles 1 through 11. Instruction I1 multiplies the contents of its source registers r2 and r3 and writes the result to its destination register r4. Instruction I2 subtracts the contents of its source registers r6 and r8 and writes the result to destination register r4. It can be seen that, unless otherwise prevented, instruction I2 in the conventional pipelined processor will write to register r4 in clock cycle 5, before instruction I1, and then I1 will incorrectly overwrite the result of I2 in register r4 in clock cycle 11. This type of hazard could arise if, for example, instruction I1 were issued speculatively by a compiler for a branch which was statically mispredicted between I1 and I2. In the case of in-order instruction completion, instruction I1 will not affect the outcome, since in-order completion will discard the result of I1. However, the hazard is significant in the presence of out-of-order instruction completion, which the different latencies of the execution units make possible.
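The WAW case differs from the RAW case only in which accesses are compared: the two writebacks. The sketch below assumes the multiply uses the 8-cycle multiplier, the subtract uses the 1-cycle integer unit, and instructions proceed without stalls; `Op` and `waw_hazard` are illustrative names, not part of the original description.

```python
from typing import NamedTuple, Optional, Tuple

UNIT_LATENCY = {"I": 1, "M": 8, "Fadd": 4, "Div": 15}


class Op(NamedTuple):
    dest: str
    unit: str
    fetch: int  # fetch cycle, no stalls assumed

    def write_cycle(self) -> int:
        # F and D take one cycle each, E takes the unit latency, then W.
        return self.fetch + 2 + UNIT_LATENCY[self.unit]


def waw_hazard(first: Op, second: Op) -> Optional[Tuple[str, int, int]]:
    """(register, second's write cycle, first's write cycle) if both write the same
    register and the earlier instruction's write does not land strictly first."""
    w1, w2 = first.write_cycle(), second.write_cycle()
    if first.dest == second.dest and w2 <= w1:
        return first.dest, w2, w1
    return None


# I1 = mul r4, r2, r3 fetched in cycle 1; I2 = sub r4, r6, r8 fetched in cycle 2.
print(waw_hazard(Op("r4", "M", 1), Op("r4", "I", 2)))  # ('r4', 5, 11)
```

If both instructions were 1-cycle integer operations, their writes would land in cycles 4 and 5, in program order, and the check would return None.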
A WAR hazard can occur, e.g., when register reads are performed in later pipeline stages and register writes are performed in earlier stages. The exemplary four-stage pipelined processor 10 of FIG. 1, which reads registers in the decode stage 14 and writes them in the later writeback stage 18, is thus incapable of producing a WAR hazard, but such hazards can arise in other pipelined processors.
FIG. 4 illustrates an exemplary WAR data hazard arising in a five-stage pipelined processor including stages A, W1, B, R1 and C. In this processor, stages A, B and C are generic pipeline stages, stage W1 writes an intermediate result to a destination register, and stage R1 reads the source registers for processing in stage C. The processor executes instructions I1 and I2 for processor clock cycles 1 through 6. Instruction I1 applies an operation op1 to the contents of its source registers r2 and r3 and writes the result to its destination register r1. Instruction I2 applies an operation op2 to the contents of its source registers r4 and r5 and writes the result to destination register r3. Note that an intermediate result is written to destination register r3 in the W1 stage of I2 before the intended value of r3 can be read in the R1 stage of I1, thereby introducing a WAR hazard.
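For the five-stage pipeline of FIG. 4, the relevant comparison is between the W1 stage of the later instruction and the R1 stage of the earlier one. This sketch assumes each instruction enters stage A in its issue cycle and advances one stage per cycle with no stalls; `STAGES`, `stage_cycle` and `war_hazard` are hypothetical names.

```python
from typing import Optional, Tuple

# Stage order of the five-stage pipeline of FIG. 4; one stage per cycle.
STAGES = ("A", "W1", "B", "R1", "C")


def stage_cycle(issue: int, stage: str) -> int:
    """Cycle in which an instruction that enters stage A in `issue` occupies `stage`."""
    return issue + STAGES.index(stage)


def war_hazard(first_srcs: Tuple[str, ...], first_issue: int,
               second_dest: str, second_issue: int) -> Optional[Tuple[str, int, int]]:
    """(register, write cycle, read cycle) if the later instruction writes a register
    in W1 no later than the earlier instruction reads it in R1, else None."""
    write = stage_cycle(second_issue, "W1")
    read = stage_cycle(first_issue, "R1")
    if second_dest in first_srcs and write <= read:
        return second_dest, write, read
    return None


# I1 = op1 r1, r2, r3 issued in cycle 1; I2 = op2 r3, r4, r5 issued in cycle 2.
print(war_hazard(("r2", "r3"), 1, "r3", 2))  # ('r3', 3, 4)
```

In processor 10 of FIG. 1 the read (decode) stage precedes the write (writeback) stage, so a later instruction's write can never land before an earlier instruction's read, which is why that pipeline cannot produce a WAR hazard.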
Predicated instructions can also present a problem for pipelined processors. For example, the processor hardware generally must check the validity of the predicate used for each instruction before it can determine whether or not the instruction should be executed.
FIG. 5 shows an example of a predication hazard which can arise in the conventional four-stage pipelined processor 10 of FIG. 1. The processor executes instructions I1 and I2 for processor clock cycles 1 through 5. Instruction I1 is a setpred operation which sets the predicate p1 to a value of 0. It will be assumed that the predicate p1 is true, i.e., has a value of 1, before execution of this instruction. Instruction I2 is a predicated instruction which, if the predicate p1 is true, performs an add operation using source registers r2 and r3 and destination register r1. Note that I2 will be executed in this example even though p1 should be false at the point that I2 dispatches, thereby introducing a predication hazard. Wp and Wd in FIG. 5 represent writeback stages to predicate and data registers, respectively. It should be noted that predication hazards, like data hazards, can also be grouped into RAW, WAW or WAR hazards.
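The FIG. 5 case can be viewed as a RAW hazard on a predicate register. The sketch below models it under assumptions not stated in the text: I2 checks p1 in its decode stage (cycle 3), I1 writes p1 in its Wp stage (cycle 4), and a written value becomes visible only in the following cycle. `value_seen` is a hypothetical helper.

```python
from typing import List, Tuple


def value_seen(initial: int, writes: List[Tuple[int, int]], read_cycle: int) -> int:
    """Value a read in `read_cycle` observes, given (write_cycle, value) pairs.
    Assumes a write becomes visible only in the cycle after its writeback stage."""
    value = initial
    for write_cycle, new_value in sorted(writes):
        if write_cycle < read_cycle:
            value = new_value
    return value


# Assumed FIG. 5 timing: I1 (setpred p1 = 0) writes p1 in cycle 4; I2
# ((p1) add r1, r2, r3) checks p1 in cycle 3, before it dispatches.
p1_writes = [(4, 0)]
seen = value_seen(1, p1_writes, read_cycle=3)
print("I2 sees p1 =", seen, "->", "executes" if seen else "nullified")
# I2 sees p1 = 1 -> executes   (program order says p1 == 0: the hazard)
print("after a 2-cycle stall:", value_seen(1, p1_writes, read_cycle=5))
# after a 2-cycle stall: 0
```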
When using pipelined processors having multiple execution units with different latencies, it is generally necessary to control the dispatch of instructions so as to ensure proper program execution, i.e., so as to avoid the above-described data and predication hazards. A conventional method, known as pipeline interlock, determines the latency of each instruction and stalls the dispatch of subsequent instructions until the latencies are resolved. However, this method often degrades performance, since consecutive instructions are not necessarily interdependent and thus need not always be stalled. In addition, this method and other conventional approaches can require unduly complex bypass checking hardware or register renaming hardware.
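The cost of a blanket interlock can be illustrated by counting dispatch cycles. The sketch below compares it with a scoreboard-style policy that stalls only on pending writes to an instruction's own registers. Both policies are conventional and heavily simplified here (in-order single dispatch, full result forwarding assumed); neither is the compiler-controlled technique of this patent, and the scoreboard stands in for the kind of checking hardware the text calls unduly complex. `dispatch_cycles` and the three-instruction program are illustrative.

```python
from typing import List, Sequence, Tuple

UNIT_LATENCY = {"I": 1, "M": 8, "Fadd": 4, "Div": 15}

# (destination, sources, unit) in program order
Instr = Tuple[str, Tuple[str, ...], str]


def dispatch_cycles(program: Sequence[Instr], check_dependences: bool) -> List[int]:
    """Dispatch cycle of each instruction, assuming at most one in-order dispatch
    per cycle and that a result is usable UNIT_LATENCY cycles after its producer
    dispatched (full forwarding).

    check_dependences=False: blanket interlock; each instruction waits until the
    previous instruction's latency has resolved.
    check_dependences=True: scoreboard-style; an instruction waits only for pending
    writes to its own sources (RAW) or destination (WAW)."""
    ready = {}     # register -> first cycle its pending result is usable
    prev_done = 0  # cycle in which the previous instruction's latency resolves
    cycle = 0
    result = []
    for dest, srcs, unit in program:
        earliest = cycle + 1
        if check_dependences:
            earliest = max([earliest] + [ready.get(r, 0) for r in (*srcs, dest)])
        else:
            earliest = max(earliest, prev_done)
        cycle = earliest
        prev_done = ready[dest] = cycle + UNIT_LATENCY[unit]
        result.append(cycle)
    return result


program = [
    ("r4", ("r2", "r3"), "M"),  # mul r4, r2, r3  (8 cycles)
    ("r1", ("r6", "r8"), "I"),  # sub r1, r6, r8  (independent of the mul)
    ("r5", ("r1", "r7"), "I"),  # add r5, r1, r7  (needs r1 from the sub)
]
print(dispatch_cycles(program, check_dependences=False))  # [1, 9, 10]
print(dispatch_cycles(program, check_dependences=True))   # [1, 2, 3]
```

Under the blanket interlock, the independent subtract is delayed from cycle 2 to cycle 9 behind the multiply; with the dependence check it dispatches in the next cycle, and the add, whose only producer is the 1-cycle subtract, follows without delay.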
SUMMARY OF THE INVENTION
The invention provides techniques for improving the performance of pipelined processors by, for example, eliminating unnecessary stalling of instructions. These techniques are referred to herein as compiler controlled dynamic dispatch,
Batten Dean
D'Arcy Paul Gerard
Glossner C. John
Jinturkar Sanjay
Thilo Jesse
Ingberg Todd
Lucent Technologies - Inc.
Ryan & Mason & Lewis, LLP