Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or... – Scoreboarding – reservation station – or aliasing
Reexamination Certificate
1998-04-03
2001-07-24
Kim, Kenneth S. (Department: 2183)
C712S216000, C712S218000
active
06266766
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the field of use of register files. More particularly, the present invention relates to using additional bits in the register file to handle write-after-write hazards and reduce bypass comparators.
2. Description of Related Art
Register files are arrays in processors that store one or more registers. In processors capable of processing more than one instruction at a time, it is common to associate with each of these registers a bit which indicates whether the data inside each respective register is either: (1) updated and ready to be used; or, (2) being modified or produced and therefore not available. This bit is termed a “scoreboard” bit.
For example, if a scoreboard bit for a particular register is set, then the next instruction which needs to access this register cannot execute until the scoreboard bit for this register has been cleared. To clear this register bit, a preceding operation (i.e., the operation that is generating/modifying the data to be placed/returned to this register) needs to complete execution. Thus, if a program were to (1) execute a LOAD of a first value and place it into a register R4; and (2) execute an ADD of the first value with a second value contained in a register R5; then there is clearly a dependency on the LOAD operation. The use of the scoreboard bit by a circuit to “lock-out” access to a register that is being used is referred to as a “hardware interlock.” The hardware interlock is used instead of placing the extra burden in software.
Thus, in a processor with multiple execution units, where one of the execution units holds an operation that is waiting on a result from a previous operation, the register that is to receive the result is “locked-out” from being accessed until the register's scoreboard bit is cleared. After the result has been placed into the register and the scoreboard bit has been cleared, the execution unit containing the waiting operation can access the data in the register.
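The interlock described above can be illustrated with a minimal software sketch. This is only a toy model, not the circuit of the patent: the class name, method names, register count, and data values are all hypothetical, and it reuses the LOAD into R4 followed by an ADD with R5 from the text.

```python
class ScoreboardedRegisterFile:
    """Toy register file with one scoreboard bit per register (hypothetical model)."""

    def __init__(self, num_registers):
        self.values = [0] * num_registers
        # True = result pending (register locked out); False = data ready.
        self.scoreboard = [False] * num_registers

    def issue(self, dest):
        """Set the scoreboard bit when an operation that writes dest is issued."""
        self.scoreboard[dest] = True

    def can_read(self, sources):
        """An operation may read its operands only if none of them is pending."""
        return not any(self.scoreboard[r] for r in sources)

    def write_back(self, dest, value):
        """Store the returned result and clear the scoreboard bit."""
        self.values[dest] = value
        self.scoreboard[dest] = False


rf = ScoreboardedRegisterFile(8)
rf.values[5] = 10                    # second value already in R5 (made-up data)
rf.issue(4)                          # LOAD into R4 issued; R4 is now locked out
add_blocked = rf.can_read([4, 5])    # ADD of R4 and R5 must wait
rf.write_back(4, 32)                 # LOAD completes; scoreboard bit cleared
add_ready = rf.can_read([4, 5])      # ADD may now read both operands
add_result = rf.values[4] + rf.values[5]
```

In this sketch the waiting ADD simply polls `can_read`; in hardware the same check is what the text calls the hardware interlock.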
In cases where an operation is waiting for a result to return from an execution unit, time can be saved by not having to wait for the result to be first placed into the register and then read out again by the waiting execution unit. Instead, bypassing is used to send the result to the waiting execution unit at the same time the result is sent to the register, which significantly speeds up operations.
Bypassing is used where a processor contains some collection of data in a register file and also contains a set of execution units, each of which may take a varying amount of time to complete an operation. An execution unit can take a varying amount of time to complete an operation because, for example, the execution unit is a multicycle execution unit or because the processor has a pipelined implementation where no operation finishes immediately.
Without bypassing, an execution unit that is waiting for another operation to finish must wait until that operation is finished and the result sent back to the register file before reading the result out again. The execution unit must also wait until the scoreboard bit for the result is cleared and the result is read out before the instruction is issued. Thus, the time that elapses during the writing of the result into the register file and the reading out of the result again before the execution of the instruction that depends on the result adds additional delay.
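The delay described above can be made concrete with a small timing sketch. The function name and the one-cycle register file round trip are assumptions chosen for illustration; the source does not give cycle counts.

```python
def operand_available_cycle(result_cycle, bypass, regfile_round_trip=1):
    """Cycle at which a waiting execution unit can first see the operand.

    result_cycle: cycle in which the producing unit drives its result onto
        the result return data path.
    regfile_round_trip: ASSUMED extra cycles needed to write the register
        file and read the value back out (illustrative, not from the source).
    """
    if bypass:
        # The result is forwarded to the waiting unit as it returns.
        return result_cycle
    # The result must be written to the register file, then read out again.
    return result_cycle + regfile_round_trip


with_bypass = operand_available_cycle(result_cycle=5, bypass=True)
without_bypass = operand_available_cycle(result_cycle=5, bypass=False)
cycles_saved = without_bypass - with_bypass
```

Under these assumed numbers, the dependent operation sees its operand one cycle earlier when bypassing is used.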
FIG. 1 shows a prior art bypass circuit where a set of multiplexors (MUX) 12, 14, 22, and 24 is placed into a set of result return data paths 16 and 26. Set of result return data paths 16 and 26 returns results from execution units 10 and 20, respectively, to a register file 30 (no control circuit is shown in FIG. 1 for simplicity).
FIG. 1 contains a set of register file scoreboard bits 28 along with register file 30. The output of register file 30 is fed to MUX 12, MUX 14, MUX 22, and MUX 24. The output of MUX 12 is used as one input to execution unit 10, while the output of MUX 14 is used as the other input to execution unit 10. The output of MUX 22 is used as one input to execution unit 20, while the output of MUX 24 is used as the other input for execution unit 20.
The output of execution unit 10 is returned on a result return data path 16 to register file 30. Similarly, the output of execution unit 20 is returned to register file 30 on a result return data path 26. Note that result return data path 16 and result return data path 26 might also be used by other execution units not shown in the figure. In addition, MUX 12, MUX 14, MUX 22, and MUX 24 receive both the output from execution unit 10 and the output from execution unit 20 through the use of result return data path 16 and result return data path 26, respectively.
Thus, in FIG. 1, every input of every execution unit has one three-input multiplexor that provides, as input, either the output of the register file or the result that is returning on one of the two result return data paths. As described below, every execution unit may also be able to latch the values that appear on its inputs, to handle situations where all the inputs are not available simultaneously.
For example, if execution unit 10 is an adder which executes in one cycle and the next instruction, which is also an ADD instruction, needs the result, both operations can issue sequentially because the result from the first ADD instruction is written into the register file at the same time that result is bypassed into the adder again so that the subsequent ADD can use it immediately.
Each MUX selects the data from one of its three inputs depending on which control line is active. The control lines come from the system described in FIG. 2, below.
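The three-input selection can be sketched as a small function. The parameter names are hypothetical, and mapping the second select line to result return data path 16 and the third to path 26 is an assumption made for this illustration; the example values model the back-to-back ADD case above and are invented.

```python
def operand_mux(s_rf, s_b1, s_b2, regfile_out, path16, path26):
    """Three-input MUX sketch: exactly one select line is expected to be active.

    s_rf selects the register file output; s_b1 and s_b2 are ASSUMED to
    select result return data paths 16 and 26, respectively.
    """
    if (s_rf, s_b1, s_b2).count(True) != 1:
        raise ValueError("exactly one select line must be active")
    if s_rf:
        return regfile_out
    if s_b1:
        return path16
    return path26


# Back-to-back ADDs: the first ADD's result (7) is returning on path 16
# while the register file still holds a stale value (0).
first_add_result = 7
bypassed_operand = operand_mux(False, True, False,
                               regfile_out=0, path16=first_add_result, path26=99)
second_add_result = bypassed_operand + 3

# With no dependency pending, the register file output is selected instead.
plain_operand = operand_mux(True, False, False,
                            regfile_out=5, path16=7, path26=99)
```

Raising an error when zero or several select lines are active is a choice of this sketch only; the source does not say how the circuit behaves in that case.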
FIG. 2 shows a bypass circuit 40 having a select register file control line (S_RF) 66, a select B1 control line (S_B1) 68, and a select B2 control line (S_B2) 70 for determining from where an execution unit receives an operand. S_RF 66, S_B1 68, and S_B2 70 are sent to one of the MUXes of FIG. 1. Thus, each of the MUXes in FIG. 1, specifically, MUX 12, MUX 14, MUX 22, and MUX 24, receives control signals S_RF 66, S_B1 68, and S_B2 70 from a bypass control circuit similar to bypass control circuit 40. A scoreboard bit line, coming out of register file 30, in FIG. 2 provides the value of the scoreboard bit for the particular register being accessed for determining whether to use the value from the register file or a value from one of the result return data paths.
Bypass circuit 40 also contains a first comparator 50 and a second comparator 60. One of the inputs for both first comparator 50 and second comparator 60 indicates the operand register address of the operand for which the current operation is waiting. For first comparator 50, the other input is the result return data path 16 register address, which indicates the register file address into which the result contained on result return data path 16 is returned after first execution unit 10 has completed the previous operation. For second comparator 60, the other input is the result return data path 26 register address, which indicates the register file address into which the result contained on result return data path 26 is returned after second execution unit 20 has completed the other previous operation.
First comparator 50 and second comparator 60 both operate in the same manner, which is to output a logical one if both inputs are equal. For example, if the operand register address is equal to the result return data path 16 register address, then first comparator 50 outputs a logical one.
The output of first comparator 50 is received by a first AND gate 52. First AND gate 52 also receives the output of a NOT gate 64. Similarly, the output of second comparator 60 is received by a second AND gate 62. Second
Blakely , Sokoloff, Taylor & Zafman LLP
Intel Corporation
Kim Kenneth S.