Reexamination Certificate
1999-01-27
2002-07-02
Maung, Zarni (Department: 2154)
Electrical computers and digital processing systems: processing
Dynamic instruction dependency checking, monitoring or...
Scoreboarding, reservation station, or aliasing
C711S137000, C711S204000
active
06415380
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a data providing unit for a processor and to a processor having the data providing unit. More particularly, the present invention relates to a technology for improving the processing efficiency of the processor by processing a load instruction at high speed in a data providing unit, which provides the processor with the data to be read according to the load instruction.
2. Description of the Related Art
Technology for improving the processing efficiency of a processor by using a pipeline process has been put to practical use. The “pipeline process” can be defined as a scheme in which a plurality of instructions are executed concurrently, with their processing stages shifted sequentially by one cycle (i.e., the pipeline pitch).
FIG. 1 shows the respective stages of a standard five-stage pipeline in a RISC (Reduced Instruction Set Computer) type processor. As disclosed in “Computer Architecture” (Hennessy et al.; Morgan Kaufmann Publishers, Inc.) and elsewhere, this type of pipeline is the one employed in a very basic processor.
In this pipeline process, one arithmetic instruction is divided into five stages and then executed. As shown in FIG. 1, these five stages are the instruction fetch (IF) stage, the instruction decode (ID) stage, the execution (EX) stage, the memory access (MA) stage, and the write back (WB) stage. In the IF stage, an instruction is fetched from an instruction memory. In the ID stage, the instruction is interpreted, while an operand necessary for execution is fetched by accessing a register file. In the EX stage, an arithmetic operation is executed. When instructions that access a data memory (load instruction, store instruction, etc.) are executed, the data address is calculated in the EX stage. In the MA stage, the data memory is accessed and data are fetched from it by using the address calculated in the EX stage. In the WB stage, executed results and data read from the data memory are written back into the register file.
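The overlap of the five stages can be illustrated with a small timing model. This is only a sketch under the assumption of an ideal pipeline with no stalls, one instruction entering the IF stage per cycle; the function names `pipeline_schedule` and `total_cycles` are hypothetical and do not appear in the patent.

```python
# Stage names as given in the description of FIG. 1.
STAGES = ["IF", "ID", "EX", "MA", "WB"]

def pipeline_schedule(num_instructions):
    """Return one dict per instruction mapping stage name -> cycle (1-based).

    Assumption: ideal pipeline, no stalls, one instruction issued per cycle.
    """
    schedule = []
    for i in range(num_instructions):
        # Instruction i enters IF one cycle after instruction i-1,
        # so each of its stages is shifted by one pipeline pitch.
        schedule.append({stage: i + 1 + s for s, stage in enumerate(STAGES)})
    return schedule

def total_cycles(num_instructions):
    """Cycle in which the last instruction finishes its WB stage."""
    if num_instructions == 0:
        return 0
    return len(STAGES) + num_instructions - 1

if __name__ == "__main__":
    for n, row in enumerate(pipeline_schedule(3)):
        print(n, row)
```

Under these assumptions, three instructions finish in seven cycles rather than the fifteen that strictly sequential execution of five stages each would take.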
Next, an operation in the pipeline process when the load instruction is to be executed will be explained hereunder. For easy understanding, the operation in the pipeline process will be explained by using an example of a simple scalar processor which can execute only one instruction at a time.
FIGS. 2A and 2B show the behavior of the pipeline process when instructions are processed successively. As shown in FIG. 2A, when the preceding instruction is a standard arithmetic operation instruction (the add instruction in FIG. 2A), the succeeding instructions can be executed successively. Arrows in FIGS. 2A and 2B indicate bypasses of arithmetic results. On the other hand, as shown in FIG. 2B, when the preceding instruction is a load instruction that accesses the data memory (depicted as the Load Word (lw) instruction in FIG. 2B), the situation changes. The load instruction cannot acquire the data until its MA stage has finished. Therefore, the succeeding instruction (the add instruction in FIG. 2B) cannot obtain the data necessary for its operation by the time its own EX stage would start. In other words, the succeeding add instruction must delay its EX stage until execution of the load instruction has been completed. Execution of the load instruction comprises two operations, i.e., the data address calculation and the memory access. Therefore, an instruction that uses the result of a load instruction is subject to a longer data dependency than one that uses the result of another operation. This data dependency stalls the pipeline process and hinders improvement in processor performance.
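The difference between FIG. 2A and FIG. 2B can be expressed as a small calculation. The sketch below assumes full bypassing, an arithmetic result that becomes available at the end of the EX stage, and a load result that becomes available at the end of the MA stage; the latency table and the function name `stall_cycles` are illustrative assumptions, not taken from the patent.

```python
# Cycles from the start of the producer's EX stage until its result can be
# bypassed to a consumer's EX stage.
# Assumption: an ALU result is ready after EX (1 cycle); a load result is
# ready only after EX (address calculation) plus MA (memory access).
RESULT_LATENCY = {"alu": 1, "load": 2}

def stall_cycles(producer_kind, distance=1):
    """Bubbles a dependent instruction needs before its EX stage can start.

    distance is the number of instruction slots between producer and
    consumer; distance=1 means the consumer immediately follows.
    """
    return max(0, RESULT_LATENCY[producer_kind] - distance)

if __name__ == "__main__":
    print("add after add:", stall_cycles("alu"))   # case of FIG. 2A
    print("add after lw: ", stall_cycles("load"))  # case of FIG. 2B
```

In this model a dependent instruction directly after an arithmetic instruction needs no bubble, while one directly after a load needs one, which reflects the two operations contained in the load.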
Next, an operation in the pipeline process when the store instruction and the load instruction are to be executed successively will be explained hereunder. In this case, the operation will be explained by using an example of an out-of-order type processor, in which instructions can be dynamically reordered at execution time.
FIG. 3 shows an example of an instruction sequence in which the store instruction and the load instruction are issued successively. In FIG. 3, the store instruction is depicted as the Store Word (sw) instruction and the load instruction is depicted as the Load Word (lw) instruction. In the instruction sequence in FIG. 3, assume that the value in a register r2, which is used to calculate the address of the preceding store (sw) instruction, is not determined, while the value in a register r3, which is used to calculate the address of the load (lw) instruction, is determined. Assume that the values in registers r20 and r21, which are operands of the add instruction, are also determined. The sw instruction waits for execution because its operands have not been prepared. The add instruction can start execution and overtake the sw instruction because its operands have been prepared. It might seem that the lw instruction can also start execution because its operands have been prepared; nevertheless, the lw instruction cannot actually start execution because its dependency upon the sw instruction has not been resolved. In other words, until the data address into which data are to be stored by the preceding store instruction has been determined, the succeeding load instruction cannot be executed. This is because, if the data address calculated by the store instruction and the data address calculated by the load instruction coincide with each other, the load instruction must read out the data which the store instruction is trying to save. Therefore, the load instruction cannot overtake the store instruction that is standing by, even if these instructions employ different registers and the operands of the load instruction are prepared. For this reason, even in the out-of-order type processor, the instruction cannot be overtaken, and thus the stand-by time for execution of the instruction is increased, so that the performance of the processor cannot be improved. This problem of execution stand-by of the load instruction also applies to the above scalar processor.
A technology that can improve the processing efficiency of the load instruction by using the correspondence between the store instruction and the load instruction in successive execution has been disclosed in “Dynamic Speculation and Synchronization of Data Dependence” (“Proceedings of the 24th Annual International Symposium on Computer Architecture”, A. I. Moshovos, et al., 1997). In this technology, the correspondence between a store instruction and a load instruction which depend on particular data stored in the same memory address is recorded in advance, and this correspondence is then checked at execution. If the check detects no correspondence between the store instruction and the load instruction, the load instruction can be executed without waiting for execution of the store instruction.
However, in this technology, if the correspondence between the store instruction and the load instruction is detected, i.e., if these instructions access the data stored in the same memory address, the load instruction stalls until execution of the store instruction has been completed, as in the conventional scheme. Therefore, this technology has not been able to sufficiently improve the execution efficiency of the load instruction.
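The check described for this prior technology can be sketched roughly as below. The text does not say how the correspondence is stored or keyed, so the use of instruction addresses as keys, the class name `CorrespondenceTable`, and its methods are all assumptions made for illustration only.

```python
class CorrespondenceTable:
    """Hypothetical record of store/load pairs known to touch the same address."""

    def __init__(self):
        # Assumption: each pair is identified by the instruction addresses
        # (PCs) of the store and of the load.
        self._pairs = set()

    def record(self, store_pc, load_pc):
        """Remember that this store and this load accessed the same address."""
        self._pairs.add((store_pc, load_pc))

    def load_must_wait(self, load_pc, pending_store_pcs):
        """True if any not-yet-executed earlier store corresponds to the load."""
        return any((s, load_pc) in self._pairs for s in pending_store_pcs)

if __name__ == "__main__":
    table = CorrespondenceTable()
    table.record(0x100, 0x108)
    # No recorded correspondence: the load may run ahead of the store.
    print(table.load_must_wait(0x10C, [0x100]))
    # Recorded correspondence: the load stalls, as in the conventional scheme.
    print(table.load_must_wait(0x108, [0x100]))
```

The second case is the limitation the text points out: when a correspondence is found, the load still waits for the store to complete.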
As discussed above, there have been the following problems in the conventional scheme.
More particularly, first, there has been the problem that the processing efficiency of the load instruction is low and that the succeeding load instruction cannot be executed until the preceding store instruction has been executed. Since execution of the load instruction needs two operations, namely the address calculation and the memory access, the dependency path between the load instruction and other instructions becomes longer than that of other instructions.
Second, there has been the problem that, when
Foley & Lardner
Kabushiki Kaisha Toshiba
Lin Wen-Tai
Maung Zarni