Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or... – Reducing an impact of a stall or pipeline bubble
Reexamination Certificate
2000-01-27
2003-11-04
Chan, Eddie (Department: 2183)
Electrical computers and digital processing systems: processing
Dynamic instruction dependency checking, monitoring or...
Reducing an impact of a stall or pipeline bubble
C712S218000
Reexamination Certificate
active
06643767
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is related to a processor and in particular to the instruction scheduling of a processor.
2. Description of the Related Art
One of the factors preventing designers of processors from improving performance is the interdependencies between instructions. Dependences disturbing processor performance include the control dependence, the name dependence, the data dependence. There are many studies relating to the control dependence to attempt to remove the dependency by the branch prediction technique and the speculative execution. The name dependence is caused by hardware resource shortage, i.e., the shortage of available registers. This dependence can be eliminated using register renaming (R. M. Keller, Look-Ahead Processors, ACM Computing Surveys, vol.7, no.4, pp.177-195, 1975). On the other hand, however, the data dependence can not be removed by such techniques, as it is called true dependence. Hence, the data dependence is a serious obstacle to improvement of the instruction level parallelism.
An example will be explained with the instruction sequence as illustrated in FIG.
1
. In order to make it easy to understand the discussion, each operation of f
1
to f
4
has a single source operand. It is assumed that the execution latency is 1 except for the instruction I
1
. The instruction I
1
is assumed to be a load instruction, which is executed to occur cache miss, resulting in an execution latency of 4. The instruction sequence as illustrated in
FIG. 1
involves two data dependences. One data dependence occurs between the instruction I
1
and the instruction I
3
while the other occurs between the instruction I
3
and the instruction I
4
. In the case that there is a data dependence, the subsequent instruction can not be executed before completion of the execution of the previous instruction. In this case, the instruction I
3
can not be executed unless the execution of the instruction I
1
is completed while the instruction I
4
can not be executed unless the execution of the instruction I
3
is completed. Accordingly, in the conventional technique, a subsequent instruction having no data dependence are executed in advance of a preceding instruction which is stalled because of a data dependence. This treatment is called the dynamic instruction scheduling which is implemented, for example, by means of reservation station (R. M. Tomasulo, An Effect Algorithm for Exploiting Multiple Arithmetic Units, IBM Journal, vol.11, pp.25-33, 1967).
FIG. 2
shows one entry in an instruction window designed with a register uptake unit (G. S. Sohi, Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers, IEEE Trans. on Computer, vol.39, no.3, pp.349-359, 1990). Each entry of the instruction window is composed of two source operand fields
100
and
110
, a destination field
120
, a dispatch field
130
, a functional unit field
140
, an execution bit
150
and a program counter field
180
. If the source operand is not yet available, a read bit
101
(
11
) of the source operand fields is reset in order to indicate that the source operand is not available. The tag designating the source operand is also set. When the source operand becomes available, the value of the source operand is transferred to the content field
103
(
113
) of the source operand field followed by setting the ready bit
101
(
111
). On the other hand, a destination register number is stored in the register field
121
of the destination field
120
and the result of execution of the instruction is stored in the content field
122
. The dispatched bit
130
is provided in order to indicate whether or not the instruction has been dispatched to the functional unit as designated by the functional unit field
140
. The executed bit
150
is set when the execution of the instruction is completed. If the executed bit
150
is set, it is possible to dispatch a succeeding instruction(s) having the data dependence upon this instruction. Finally, the program counter field
180
is used to recover the status of the processor after the prediction fails and to implement precise exception.
The registration of instructions into the register update unit is performed in the sequential order of the instructions as in the target program. When the execution of a preceding instruction is completed, the result of the execution
122
and the destination register number
121
are broadcasted. The destination register number
121
is monitored for a succeeding instruction in order to obtain the result of the execution as the source operand
103
(
113
) if the source operand tag
102
(
112
) matches with the destination register number
121
. It is possible to execute the succeeding instruction when all the source operands are obtained. In this case, the succeeding instruction can be executed even if execution(s) between the preceding instruction and the succeeding instruction is(are) not yet completed.
FIG. 3
is a block diagram showing the prior art processor. As illustrated in the same figure, the prior art processor is composed of an instruction cache
200
, an instruction decoder
260
, a register file
210
, an instruction window
220
, functional units
240
to
243
and a data cache
250
. The processor serves to decode an instruction as fetched from the instruction cache
200
by means of the instruction decoder
260
and to register the instruction to the instruction window
220
. The source operand is read from the register file
210
. In the case that the source operand can not be obtained from the register file
210
, it is transferred from the functional units
240
to
243
after completion of the execution of the preceding instruction. The instruction of which the source operand(s) as required becomes available is dispatched to the functional units
240
to
243
. The result of the execution is written into the register file
210
through the instruction window
220
.
FIG. 4
is a schematic diagram showing an example of the instruction scheduling in the case that the instruction sequence as illustrated in
FIG. 1
is executed. In order to make it easy to understand the discussion, the following condition is assumed. Both the total number of instructions as fetched and the total number of instructions as dispatched are 1. Description about committing instructions is dispensed with. It is assumed that the destination register number with the tag information is equal to the architecture register number of the processor. Also, it is assumed that the register r
1
and the register r
2
have been available. The instruction scheduling will be explained on the above assumption. In the first cycle as illustrated in FIG.
4
(A), the instruction I
1
is issued. The source operand tag r
1
and the destination register tag r
11
are then saved in the corresponding fields. The source operand in the register r
1
is available so that the ready bit (r) is set. Furthermore, the instruction I
1
is dispatched so that the dispatched bit (d) is set. In the next cycle as illustrated in FIG.
4
(B), the instruction I
2
is fetched and dispatched. In the next cycle as illustrated in FIG.
4
(C), the instruction I
3
is issued. Since the executed bit of the instruction I
1
which generates the source operand r
11
is not yet set, it is impossible to dispatch the instruction I
3
. In the same cycle as illustrated in FIG.
4
(C), the instruction I
2
is executed in order to write back the result of the execution into the register r
12
followed by setting the executed bit (e). Since the destination register tag r
12
of the instruction I
2
does not match the source operand tag r
11
of the instruction I
3
, the instruction I
3
is not dispatched at this time. The arrows as illustrated in FIG.
4
(C) with dotted lines is used to indicate that the source operand tag of the instruction I
3
does not match the destination register tag of the instruction I
2
. In the next cycle as illustrated in FIG.
4
(D), the instruction I
4
is issu
Chan Eddie
Huisman David J.
Kabushiki Kaisha Toshiba
Oblon & Spivak, McClelland, Maier & Neustadt P.C.
LandOfFree
Instruction scheduling system of a processor does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Instruction scheduling system of a processor, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Instruction scheduling system of a processor will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3157431