Electrical computers and digital processing systems: processing – Instruction issuing
Reexamination Certificate
1997-11-14
2001-07-10
Eng, David Y. (Department: 2155)
active
06260135
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a parallel processing unit such as a microprocessor employing a data forwarding technique, and an instruction issuing system for the parallel processing unit.
2. Description of the Prior Art
Recent microprocessors try to improve processing efficiency by increasing operation frequencies, which is realized by increasing the number of stages in each pipeline and by issuing instructions to the pipelines at increased pitches. Increasing the instruction issuing pitches, however, lengthens the latency between the issuance of an instruction and the time when the resultant data of the instruction is ready for use by the next instruction. This degrades processing efficiency when a given instruction depends on the resultant data of the preceding instruction.
To shorten such a latency time and improve processing efficiency, a data forwarding technique is used. This technique writes the resultant data of a given instruction into a data holder, and at the same time, transfers the resultant data to an instruction issuer that issues instructions to be processed next, to save the time for writing and reading data to and from the data holder.
This technique also employs a data state holder that holds the data dependence of a given instruction. A typical example of the data state holder is a scoreboard. The scoreboard is a register to store an address of the data holder at which presently processed data is going to be stored. If the address of data required by a given instruction agrees with an address stored in the scoreboard, the given instruction is dependent on presently processed data, and therefore, the given instruction will not be issued until the data in question is completely processed and ready for use.
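The scoreboard check described above can be sketched as follows. This is a minimal illustration, not the circuit of any figure: the class name `Scoreboard`, its method names, and the register numbers are assumptions introduced here.

```python
class Scoreboard:
    """Hypothetical model: holds register numbers whose data is still being processed."""

    def __init__(self):
        self.pending = set()

    def mark_pending(self, dest_reg):
        # An instruction writing dest_reg has been issued but has not finished.
        self.pending.add(dest_reg)

    def mark_ready(self, dest_reg):
        # The resultant data for dest_reg is now ready for use.
        self.pending.discard(dest_reg)

    def can_issue(self, source_regs):
        # An instruction must wait if any source register agrees with a pending one.
        return not any(reg in self.pending for reg in source_regs)


sb = Scoreboard()
sb.mark_pending(3)               # some instruction writing r3 is in flight
blocked = sb.can_issue([3, 4])   # depends on r3, so it must wait
sb.mark_ready(3)
released = sb.can_issue([3, 4])  # r3 is ready, so it may issue
```

A set is used here only for brevity; the text describes the scoreboard as a register storing addresses of the data holder.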
High-performance microprocessors employ a parallel processing technique such as a superscalar architecture to simultaneously issue and execute a plurality of instructions, to improve IPC (instructions per clock). They also employ, instead of an in-order instruction issuing technique that issues a stream of instructions in order, an out-of-order instruction issuing technique that issues a stream of instructions out of order if there is no data dependence among the instructions, to improve processing efficiency.
If a first instruction is queued due to data dependence, the in-order instruction issuing technique also queues a second instruction that follows the first instruction. On the other hand, the out-of-order instruction issuing technique issues the second instruction before the first instruction, if the second instruction has no data dependence.
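The difference between the two issuing techniques can be illustrated with a small sketch. It is a simplification under stated assumptions: the function `issue_order` and its arguments are hypothetical, dependence is judged only against a set of pending register numbers, and dependences among the listed instructions themselves are ignored.

```python
def issue_order(instructions, pending_regs, out_of_order):
    """Return the names of instructions that may issue now.

    instructions: list of (name, source_regs) in program order (hypothetical format).
    pending_regs: register numbers whose data is not yet ready.
    """
    issued = []
    for name, sources in instructions:
        stalled = any(reg in pending_regs for reg in sources)
        if stalled:
            if not out_of_order:
                break      # in-order: every following instruction is queued too
            continue       # out-of-order: skip it and examine later instructions
        issued.append(name)
    return issued


program = [("first", [1]), ("second", [2])]
pending = {1}  # "first" depends on data that is still being processed

in_order_result = issue_order(program, pending, out_of_order=False)
out_of_order_result = issue_order(program, pending, out_of_order=True)
```

With these inputs the in-order case issues nothing, while the out-of-order case issues only the second instruction.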
Since the out-of-order instruction issuing technique issues instructions without regard to the order of the instructions, a later instruction may be processed before an earlier instruction. This may happen even in the in-order instruction issuing technique when processors having different processing periods are used. For example, it will occur when an adder having a single pipeline stage and a multiplier having three pipeline stages are used. If instructions complete in an order different from that of the instruction stream, a problem will occur when writing results of the executed instructions into the data holder. In particular, if an exception interrupt occurs, it will be difficult to restore processing conditions. A standard technique to solve this problem is to use a reorder buffer, which rearranges the results of the execution of instructions according to the instruction stream and writes the rearranged results into the data holder. The execution results are also forwarded to an instruction issuer so that they are used by the next instructions.
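The role of the reorder buffer can be sketched as below, using the adder and multiplier example from the text. The class `ReorderBuffer`, its methods, the tags, and the values are assumptions made for illustration; forwarding to the instruction issuer is not modeled.

```python
class ReorderBuffer:
    """Hypothetical model: results are written to the data holder in program order."""

    def __init__(self):
        self.entries = []  # kept in instruction-stream order

    def allocate(self, tag, dest_reg):
        # Reserve a slot when the instruction is issued.
        self.entries.append({"tag": tag, "dest": dest_reg, "done": False, "value": None})

    def complete(self, tag, value):
        # A processor reports its result, possibly out of order.
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["done"] = True
                entry["value"] = value
                return
        raise KeyError(tag)

    def retire(self, register_file):
        # Write back only from the head, so the write order matches the stream.
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.pop(0)
            register_file[entry["dest"]] = entry["value"]
            retired.append(entry["tag"])
        return retired


rob = ReorderBuffer()
regs = {}
rob.allocate("mul", 1)          # three-stage multiplier, earlier in the stream
rob.allocate("add", 2)          # single-stage adder, later in the stream
rob.complete("add", 7)          # the adder finishes first
early = rob.retire(regs)        # nothing retires: "mul" is still pending
rob.complete("mul", 42)
late = rob.retire(regs)         # both retire, in stream order
```

Because nothing is written to the register file until every earlier instruction has finished, the state at an exception corresponds to a definite point in the instruction stream.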
FIG. 1 is a block diagram showing a parallel processing unit according to a prior art employing the techniques mentioned above.
An instruction cache 100 stores instructions, which are executed in parallel by processors 110, 111, and 112.
Each instruction read out of the instruction cache 100 is once stored in a latch 10A. Thereafter, a register number of source data of the instruction is transferred to a register file 104, a scoreboard 105, and a reorder buffer 106. The register file 104 and scoreboard 105 correspond to the data holder and data state holder mentioned above.
The register file 104 stores resultant data provided by the processors 110 to 112. The scoreboard 105 stores register numbers of the register file 104. The reorder buffer 106 rearranges resultant data provided by the processors 110 to 112 according to an instruction stream and writes the rearranged data into the register file 104.
If valid data for a given register number is in the register file 104 and reorder buffer 106, the data is sent to an instruction issuer 107. The instruction issuer 107 issues instructions with source data to the processors 110 to 112 according to the out-of-order instruction issuing technique.
Data from the register file 104 and reorder buffer 106 are stored in an instruction queue (FIG. 2) incorporated in the instruction issuer 107. The instruction queue has, for each instruction, a data field and a validity flag Vi that indicates whether or not data in the data field is valid. If the flag Vi indicates invalidity, the instruction issuer 107 monitors addresses and data coming through a data forwarding path 108.
The processors 110 to 112 simultaneously fetch instructions from the instruction issuer 107, execute them, and send results and their register numbers to the reorder buffer 106 as well as to the instruction queue through the path 108.
This parallel processing unit has a problem that load on the instruction issuer 107 becomes heavier as the number of simultaneously issued instructions increases.
This problem will be explained with reference to FIG. 2, which is a block diagram showing the instruction queue incorporated in the instruction issuer 107.
The instruction queue has comparators whose number is at least equal to the number of the processors 110 to 112. Each of the comparators corresponds to source data for an instruction stored in the instruction queue. In FIG. 2, there are three comparators 201, 202, and 203 for the processors 110, 111, and 112. The comparators 201 to 203 compare resultant data addresses, i.e., resultant data register numbers sent through the path 108, with a source register number of a queued instruction. Based on comparison results, a select signal generator 204 generates a select signal S204. According to the select signal S204, a selector 205 selects a piece of data among those sent through the path 108. The comparison results are also sent to an OR gate 206, which provides an enable signal EN, to set/reset a validity flag Vi of the queued instruction. This arrangement involves a large number of comparators and a long path to select data, which extends the delay time.
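The comparison and selection performed for one queued source operand can be sketched in software as follows. This is only a behavioral illustration: the function `snoop_forwarding_path`, the list-of-pairs format for the path 108, and the sample values are assumptions, and the comments map each step to the reference numerals of FIG. 2 as described in the text.

```python
def snoop_forwarding_path(source_reg, forwarded):
    """Model one source operand of a queued instruction watching the path 108.

    forwarded: list of (result_reg, data) pairs, one per processor (hypothetical format).
    Returns (enable, data): enable plays the role of signal EN for the flag Vi.
    """
    # Comparators 201 to 203: one comparison per forwarding source.
    matches = [result_reg == source_reg for result_reg, _ in forwarded]
    # OR gate 206: any agreement produces the enable signal EN.
    enable = any(matches)
    if not enable:
        return False, None
    # Select signal generator 204 and selector 205: pick the matching data.
    select = matches.index(True)
    return True, forwarded[select][1]


path_108 = [(2, 10), (5, 99), (7, 3)]         # results from three processors
hit = snoop_forwarding_path(5, path_108)      # the second processor's result matches
miss = snoop_forwarding_path(4, path_108)     # no agreement, so Vi stays invalid
```

Every source operand in the queue needs its own set of comparisons against every forwarding source, which is why the comparator count grows quickly.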
Namely, the comparators 201 to 203 compare addresses, i.e., register numbers provided by the processors 110 to 112 with a source data address, i.e., a source data register number of a queued instruction. If one of the register numbers from the processors 110 to 112 agrees with the source data register number, data from the processor that provides the agreed register number is fetched from the path 108 through the selector 205 and is used as source data for the queued instruction. Then, the queued instruction is issued to one of the processors 110 to 112. In practice, data forwarding sources to the instruction queue of the instruction issuer 107 are not only the processors 110 to 112 but also the reorder buffer 106 and dummy pipelines. The dummy pipelines are connected to the processors 110 to 112, to reduce the scale of the reorder buffer 106. In this way, there are many data forwarding sources to the instruction queue.
If the register file 104 is of 32 words, each comparison operation needs a 5-bit comparator. If a register renaming t
Eng David Y.
Foley & Lardner
Kabushiki Kaisha Toshiba