Superscalar processor with direct result bypass between...

Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or... – Commitment control or register bypass

Reexamination Certificate


Status: active

Patent number: 06233670

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to superscalar processors and, more particularly, to a superscalar processor capable of directly transferring, between pipelines, data used by a plurality of instructions executed in parallel.
2. Description of the Background Art
A superscalar architecture is known as one technique for increasing the processing speed of a microprocessor. In a microprocessor using a superscalar architecture, instructions that can be executed simultaneously are detected among a given plurality of instructions, and the detected instructions are processed simultaneously, or in parallel, by a plurality of pipelines.
FIG. 7 is a block diagram of a superscalar processor illustrating the background of the present invention. Referring to FIG. 7, a superscalar processor 20 includes an instruction fetching stage 2 for fetching a plurality of instructions stored in an instruction memory 1, an instruction decoding stage 3 for decoding the instructions fetched in instruction fetching stage 2, function units 14 to 17 each having a pipeline structure, and a register file 9 for temporarily holding data used for executing the instructions. Function units 14 to 17 can access an external data memory 8 through a data bus 11. Register file 9 is implemented with a RAM and is accessed from function units 14 to 17.
Instruction fetching stage 2 includes a program counter (not shown) and gives an address signal generated from the program counter to instruction memory 1. The plurality of instructions designated by the given address signal are fetched and held in instruction fetching stage 2.
Instruction decoding stage 3 receives the plurality of instructions from instruction fetching stage 2 and decodes them. Simultaneously executable instructions are detected among the given plurality of instructions by decoding the instructions. In addition, instruction decoding stage 3 relays data between function units 14 to 17 and register file 9. Specifically, instruction decoding stage 3 reads the data to be used by function units 14 to 17 for executing the given instructions from register file 9 and gives the read data to function units 14 to 17.
Each of function units 14 to 17 has a pipeline structure. Specifically, superscalar processor 20 has four pipelines implemented with the four function units 14 to 17.
The four function units 14 to 17 perform predetermined arithmetic operations, for example as described in the following. Function units 14 and 15 perform integer arithmetic operations. Function unit 16 carries out loading and storing of data into data memory 8. Function unit 17 performs floating-point arithmetic operations. Each of function units 14 and 15 includes an execution stage (EXC) and a write back stage (WB) to register file 9. Function unit 16 includes an address processing stage (ADR), a memory accessing stage (MEM), and a write back stage (WB). Function unit 17 includes three execution stages (EX1, EX2, EX3) and a write back stage (WB). Generally, the execution stages perform arithmetic operations and address calculation, while the memory access stage performs reading from and writing into data memory 8.
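For illustration, the stage structure of the four function units described above can be tabulated in a short Python sketch. The dictionary and helper function below are our own hypothetical names, not part of the patent:

```python
# Hypothetical model of the four function units described above.
# Each unit is listed with the pipeline stages it contains; its
# latency is simply the number of stages traversed per instruction.
FUNCTION_UNITS = {
    14: ["EXC", "WB"],                # integer arithmetic
    15: ["EXC", "WB"],                # integer arithmetic
    16: ["ADR", "MEM", "WB"],         # load/store to data memory 8
    17: ["EX1", "EX2", "EX3", "WB"],  # floating-point arithmetic
}

def latency(unit: int) -> int:
    """Number of clock periods from issue to completed write back."""
    return len(FUNCTION_UNITS[unit])

for unit, stages in FUNCTION_UNITS.items():
    print(f"unit {unit}: {'-'.join(stages)} ({latency(unit)} periods)")
```

The differing latencies (two periods for the integer units, three for the load/store unit, four for the floating-point unit) are what make results available at different times in the different pipelines.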
Superscalar processor 20 operates in response to externally applied two-phase non-overlapping clock signals φ1 and φ2. Specifically, instruction fetching stage 2, instruction decoding stage 3, and the various stages in function units 14 to 17 operate in response to clock signals φ1 and φ2 under pipeline control. An example of the two-phase non-overlapping clock signals is illustrated in FIG. 6.
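FIG. 6 itself is not reproduced here, but the defining property of a two-phase non-overlapping clock, namely that the two phases are never high at the same time, can be sketched as follows (a toy model with arbitrary slot widths; the function name is ours):

```python
# Hypothetical sketch of two-phase non-overlapping clocks phi1/phi2.
# Each clock period is divided into four slots:
#   slot 0: phi1 high; slot 1: both low (non-overlap gap);
#   slot 2: phi2 high; slot 3: both low (non-overlap gap).
def two_phase_clock(periods: int):
    phi1, phi2 = [], []
    for _ in range(periods):
        phi1 += [1, 0, 0, 0]
        phi2 += [0, 0, 1, 0]
    return phi1, phi2

phi1, phi2 = two_phase_clock(3)
# The defining property: the two phases are never high simultaneously.
assert all(not (a and b) for a, b in zip(phi1, phi2))
```

The guaranteed gap between the phases is what lets latches clocked on φ1 and φ2 pass data between pipeline stages without races.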
In operation, instruction decoding stage 3 detects simultaneously executable instructions among the given plurality of instructions and gives the detected instructions to function units 14 to 17 (or, depending on circumstances, to some of function units 14 to 17). Function units 14 to 17 have a pipeline structure, so that they can execute the given instructions simultaneously or in parallel.
Now, it is assumed that a superscalar processor has three function units (pipelines), and that each function unit has an execution stage (EXC), a memory access stage (MEM), and a write back stage (WB). An example of the progress of pipeline processing in this case is illustrated in FIG. 8A. Referring to FIG. 8A, it is assumed that three pipelines PL1, PL2, and PL3 execute instructions 1, 2, and 3, respectively. In pipeline PL1, processing in instruction fetching stage 2 is performed in a period T1, and processing in instruction decoding stage 3 is performed in a period T2. Processing in the execution stage, the memory access stage, and the write back stage is executed in periods T3, T4, and T5, respectively. On the other hand, in pipeline PL2, processing in instruction fetching stage 2 is started in period T2. The remaining stages (ID, EXC, MEM, WB) are performed in periods T3 to T6, respectively, as in pipeline PL1. In pipeline PL3, after processing in instruction fetching stage 2 is started in period T3, processing in the respective stages is performed in periods T4 to T7. As seen from FIG. 8A, each of pipelines PL1 to PL3 executes a corresponding one of the given instructions 1 to 3, so it is understood that the respective stages proceed simultaneously and in parallel. However, a problem arises from the viewpoint of the time required for processing in the following case.
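The overlapped schedule of FIG. 8A can be reproduced with a few lines of Python. This is a sketch only; the stage and period names follow the text above, and the helper function is our own:

```python
# Sketch of the FIG. 8A schedule: three pipelines, each running the
# five stages IF-ID-EXC-MEM-WB, with pipeline PL(n) fetching one
# period after PL(n-1).
STAGES = ["IF", "ID", "EXC", "MEM", "WB"]

def schedule(pipeline: int) -> dict:
    """Map each stage to the period T1, T2, ... in which it runs.

    pipeline is 1-based: PL1 fetches in T1, PL2 in T2, PL3 in T3.
    """
    start = pipeline  # PL1 -> T1, PL2 -> T2, PL3 -> T3
    return {stage: f"T{start + i}" for i, stage in enumerate(STAGES)}

for pl in (1, 2, 3):
    print(f"PL{pl}:", schedule(pl))
# PL1 writes back in T5, PL2 in T6, and PL3 in T7, matching the text.
```

In any given period, up to three different stages of three different instructions are active at once, which is the parallelism the figure illustrates.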
Referring to FIG. 8B, it is assumed that two instructions 11 and 12 are given and that they are processed by pipelines PL1 and PL2, respectively. In addition, it is assumed that the data of the result obtained by executing instruction 11 is used in the processing of instruction 12. In other words, instruction 12, which executes its own processing using the data obtained by executing instruction 11, is given.
Conventionally, instruction 11 is executed and terminated first in such a case. Specifically, in pipeline PL1, instruction fetching stage 2 is executed in period T1, and instruction decoding stage 3 is executed in period T2. The execution stage, the memory access stage, and the write back stage are executed in periods T3, T4, and T5, respectively. The data obtained by executing instruction 11 is temporarily stored in register file 9 illustrated in FIG. 7 upon execution of the write back stage. On the other hand, in pipeline PL2, instruction fetching stage 2 is executed in period T2, and instruction decoding stage 3 is executed in period T3. However, execution of instruction 12 is stopped in periods T4 and T5. The reason is that instruction 12 uses the data obtained by executing instruction 11, as described above, and therefore must wait for the termination of execution of instruction 11. Accordingly, processing in pipeline PL2 is stopped until the write back stage in pipeline PL1 terminates in period T5. In other words, pipeline PL2 is brought to a standby state (a pipeline interlock) in periods T4 and T5.
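The cost of this interlock can be expressed as a small sketch: without a bypass, the dependent instruction's execution is pushed to two periods after the producer's write back, since decode must be re-run first. The helper below is a hypothetical illustration, not part of the patent:

```python
# Hypothetical sketch of the pipeline interlock of FIG. 8B: without a
# result bypass, instruction 12 (pipeline PL2) cannot execute until
# instruction 11 (pipeline PL1) has written its result back to
# register file 9 at the end of period T5.
def dependent_exec_period(producer_wb_period: int) -> int:
    """Earliest execution period for the dependent instruction.

    The operand becomes readable from the register file only after the
    producer's write back; decode is then re-run in the following
    period, and execution follows one period after that.
    """
    return producer_wb_period + 2

# Instruction 11 writes back in T5, so instruction 12 executes only in
# T7: the pipeline stands by in T4 and T5 and re-runs decode in T6,
# instead of executing in T4 as it would without the dependency.
delay = dependent_exec_period(5) - 4
print(f"EXC delayed to T{dependent_exec_period(5)} ({delay} periods late)")
```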
After period T5, the data obtained by executing instruction 11 is stored in register file 9. Therefore, execution of instruction 12 is restarted in pipeline PL2 in period T6. Specifically, after instruction decoding stage 3 is executed in period T6, the execution stage, the memory access stage, and the write back stage are executed in periods T7 to T9, respectively.
As described above, the data obtained by executing instruction 11 is first written into register file 9, and register file 9 is then accessed in the processing of the other instruction 12. In other words, the data obtained by executing processing in pipeline PL1 is given to another pipeline PL2 through register file 9. However, as illustrated in FIG. 8B, although the data obtained by executing instruction 11 has already been obtained by processing in the execution stage in period T3, transmission of data between two pipe
