Electrical computers and digital processing systems: processing – Architecture based instruction processing
Reexamination Certificate
1998-05-18
2001-11-13
Eng, David Y. (Department: 2155)
Electrical computers and digital processing systems: processing
Architecture based instruction processing
Reexamination Certificate
active
06317821
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to microprocessors and other types of digital data processors, and more particularly to digital data processors which utilize pipelined processing techniques.
BACKGROUND OF THE INVENTION
Modern processors are often pipelined, meaning that execution of each instruction is divided into several stages.
FIG. 1
shows a functional block diagram of a conventional pipelined processor
10
. This exemplary pipelined processor includes four stages: a fetch (F) stage
12
, a decode (D) stage
14
, an execute (E) stage
16
, and a writeback (W) stage
18
. Pipelined processors such as processor
10
may be register-based, i.e., other than for load or store instructions, the source(s) and destination(s) of each instruction are registers. The fetch unit
12
retrieves a given instruction from an instruction memory. The decode stage
14
reads the source register(s) of the instruction, and the writeback stage
18
writes to the destination register(s) of the instruction. In the execute stage
16
, the instruction is executed by one of four specialized execution units: a 1-cycle integer (I) unit
20
, an 8-cycle integer/floating point multiplier (M)
22
, a 4-cycle floating point adder (Fadd)
24
, or a 15-cycle integer/floating point divider (Div)
26
. The execution units in this example are fully pipelined, i.e., can accept a new instruction on every clock cycle. These specialized units are used to execute particular types of instructions, and each of the units may have a different latency. An instruction is said to be “dispatched” when it has completed register read in the decode stage
14
and begun execution in the execution stage
16
. In other words, a dispatch takes place when an instruction passes from the decode stage
14
to one of the execution units in execution stage
16
.
A significant problem with conventional pipelined processors such as processor
10
of
FIG. 1
is that the use of a pipeline introduces data hazards which are not present in the absence of a pipeline, because results of previous instructions may not be available to a subsequent instruction. In addition, even for in-order instruction dispatch, if instructions are allowed to be active in different execution units at the same time, the different latencies of the execution units can result in control hazards and out-of-order instruction completion. Data hazards and control hazards generally must be avoided in order to ensure proper operation of a pipelined processor.
A very common type of data hazard which can arise in a pipelined processor is known as a Read After Write (RAW) data hazard.
FIG. 2A
illustrates an exemplary RAW data hazard, showing how the pipelined processor
10
of
FIG. 1
executes add instructions i
1
and i
2
for processor clock cycles
1
through
5
. Instruction i
1
adds the contents of its source registers r
2
and r
3
and writes the result to its destination register r
1
. Instruction i
2
adds the contents of its source registers r
5
and r
1
and writes the result to its destination register r
4
. It can be seen that, unless otherwise prevented, the instruction i
2
in the conventional processor
10
will read register r
1
in clock cycle
3
, before the new value of r
1
is written by instruction i
1
. In a non-pipelined processor, the instructions as shown in
FIG. 2A
would not create a hazard, since instruction i
1
would be completed before the start of instruction i
2
.
FIG. 2B
illustrates a less common data hazard, referred to as a Write After Write (WAW) data hazard, that can arise in a conventional pipelined processor. In this example, the processor executes instructions i
1
and i
2
for processor clock cycles
1
through
11
. Instruction i
1
multiplies the contents of its source registers r
2
and r
3
and writes the result to its destination register r
1
. Instruction i
2
adds the contents of its source registers r
4
and r
5
and writes the result to its destination register r
1
. It can be seen that, unless otherwise prevented, instruction i
2
in the conventional processor will write to register r
1
in clock cycle
5
, before instruction i
1
, and then i
1
will incorrectly overwrite the result of i
2
in register r
1
in clock cycle
11
. This type of hazard could arise if, for example, instruction i
1
were issued specula and i
2
. In the case of in-order instruction completion, instruction i
1
will not affect the outcome, since in-order completion will discard the result of i
1
. However, as described above, the hazard is significant in the presence of out-of-order instruction completion.
FIG. 2C
shows an example of a control hazard which can arise in a conventional pipelined processor. Control hazards generally result from jumps in the instruction stream. For example, when a branch is taken, an instruction address register, which serves as a program counter, is changed to a new value. As a result, there may be instructions that have been already fetched into the pipeline but should not be executed. In the example of
FIG. 2C
, a control hazard arises when instructions i
1
through i
4
are executed for clock cycles
1
through
11
. Instruction i
2
is a branch instruction brz that will branch to label, i.e., to instruction i
4
, if the contents of its source register r
4
have a particular value. In the pipelined processor
10
of
FIG. 1
, it is assumed that the results of the branch instruction i
2
are not effective until i
2
reaches writeback (W) in clock cycle
5
. If the branch is taken, control should jump to instruction i
4
without ever reaching instruction i
3
, but by the time this is known, instruction i
3
is already executing.
A number of techniques have been developed in an attempt to address the problems associated with data and control hazards. One such technique, known as “scoreboarding,” provides dynamic scheduling of instructions, using a central controller known as a scoreboard, so as to allow out-of-order instruction issue. This approach is often associated with the Control Data 6600 computer, and is described in greater detail in D. A. Patterson and J. L. Hennessy, “Computer Architecture: A Quantitative Approach,” Second Edition, Morgan Kaufmann, San Francisco, Calif., pp. 240-251, 1996. A related technique which also utilizes dynamic scheduling to accommodate out-of-order instruction issue is known as Tomasulo's Algorithm, and is described in the above-cited D. A. Patterson and J. L. Hennessy reference at pp. 251-261. Another known technique involves utilizing a reorder buffer, also referred to as a retire buffer. In accordance with this technique, rather than allowing results to be written back to registers immediately after execution, the results are stored in the retire buffer until they can be written back in sequential program order.
Although these and other conventional techniques can resolve pipeline hazard problems, such techniques generally require the addition of substantial complexity to the processor. For example, scoreboarding requires a separate central control unit, Tomasulo's Algorithm requires additional structures such as a broadcast result bus, a register renaming mechanism, and reservation stations, and a retire buffer requires result storage and ordering logic. A need therefore exists for a different and simpler mechanism for avoiding pipeline hazards.
SUMMARY OF THE INVENTION
The invention provides methods and apparatus for avoiding hazards caused by execution unit latency and out-of-order instruction completion in a pipelined processor. The invention allows the processor to ignore the actual latency of execution units, essentially treating them as if they had single-cycle execution. The invention is thus referred to herein as “virtual single-cycle execution” or “impatient execution.” In accordance with the invention, data and control hazards are avoided by placing locks on registers and stalling instructions when necessary as determined by the register lock status. Instructions are dispatched in a program-specified order, but are permi
Batten Dean
D'Arcy Paul Gerard
Glossner C. John
Jinturkar Sanjay
Thilo Jesse
Eng David Y.
Lucent Technologies - Inc.
LandOfFree
Virtual single-cycle execution in pipelined processors does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Virtual single-cycle execution in pipelined processors, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Virtual single-cycle execution in pipelined processors will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2612488