Electrical computers and digital processing systems: processing – Instruction decoding – Decoding by plural parallel decoders
Reexamination Certificate
1999-03-29
2001-11-27
Pan, Daniel H. (Department: 2783)
C712S245000, C712S206000, C712S215000, C712S023000
active
06324639
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an instruction conversion apparatus, a processor, a storage medium storing parallel execution codes to which a plurality of instructions have been assigned, and a computer-readable storage medium storing an instruction conversion program that generates such parallel execution codes. In particular, the invention relates to a technique for decreasing the number of execution cycles and improving code efficiency by using parallel processing.
2. Description of the Background Art
In recent years, parallel processing methods have been widely used in the development of microprocessors. Parallel processing refers to the execution of a plurality of instructions in each machine cycle. Examples of classic parallel processing techniques are superscalar methods and VLIW (Very Long Instruction Word) methods.
In superscalar methods, specialized circuitry in the processor dynamically analyzes which instructions can be executed in parallel and then has these instructions executed in parallel. These methods have an advantage in that superscalar processors can be made compatible with serial processing methods. This means that object code that has been generated by a compiler for a serial processor can be executed in its original state by a superscalar processor. A disadvantage of superscalar techniques is that specialized hardware needs to be provided in the processor to dynamically analyze the parallelism of instructions, which leads to an increase in hardware costs. Another disadvantage is that the provision of specialized hardware makes it difficult to raise the operation clock frequency.
In VLIW methods, a plurality of instructions that can be executed in parallel are arranged into an executable code of a fixed length, with the instructions in the same executable code being executed in parallel. For VLIW methods, an “executable code” is a unit of data that is fetched from memory in one cycle or is decoded and executed in one cycle.
For VLIW methods, there is no need during execution for the processor to analyze which instructions can be executed in parallel. This means that little hardware is required, and that raising the operation clock frequency is easy. However, the use of fixed-length instructions leads to the problems described below.
In VLIW executable codes, there is a significant variation in the number of bits required to define different kinds of instructions. For example, instructions that deal with a long constant, such as an address or an immediate, require a large number of bits, while instructions that perform calculations using registers may be defined using fewer bits. As stated above, VLIW methods deal with executable codes of a fixed length, so that NOP codes need to be inserted into instructions that require only a small number of bits. This increases code size.
To solve this problem, a technique that fetches a fixed amount of code from memory in each cycle but decodes and executes a variable amount of code has been proposed in recent years. Hereafter, this technique will be referred to as the “fixed-supply/variable-execution method”.
FIG. 1A shows the instruction supply unit used in the fixed-supply/variable-execution method. Since there is variation in the number of bits needed to define different instructions, two different formats are used. Instructions that require a large number of bits use a first format composed of two units, units 1 and 2, while instructions that require only a few bits use a second format composed of one unit, unit 3. Here, instructions that have a length of one unit are called “short instructions”, while instructions that have a length of two units are called “long instructions”.
While there are both short and long instructions, instructions are supplied three units at a time, with no attention being paid to the differences in types.
FIG. 1B shows the units (hereafter called “packets”) for fetching instructions from memory in each cycle in this fixed-supply/variable-execution method. FIG. 1C, meanwhile, shows the minimum units (hereafter called “execution units”) for decoding and execution by this processor.
During execution, all instructions in an area in FIG. 1B demarcated by parallel processing boundaries are executed in parallel in one cycle. This means that in each cycle instructions are executed in parallel as far as the instruction at which the next parallel processing boundary is set, shown in FIG. 1B using shading. Instructions that have been supplied but are not executed are accumulated in an instruction buffer and are executed in a following cycle.
In FIG. 1B, the parallel processing boundary is set at unit 6, so that all units from unit 1 to unit 6 are set as one execution unit. Of these units, unit 1~unit 2, unit 3~unit 4, and unit 5~unit 6 each compose a long instruction, so that these three long instructions are executed in parallel.
The next parallel processing boundary in FIG. 1B is set at unit 11, so that all units from unit 7 to unit 11 are executed in one execution unit. Of these units, unit 7~unit 8 compose a long instruction, unit 9 composes a short instruction, and unit 10~unit 11 compose a long instruction. These three instructions are executed in parallel.
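As a rough illustration of the two examples above, the following Python sketch splits a unit queue into execution units at each parallel processing boundary and then groups the units of each execution unit into long and short instructions. The `Unit` class, its `starts_long` and `boundary` flags, and the function names are assumptions made for this illustration; the text here does not specify how the format or the boundary is encoded in a unit.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Unit:
    number: int         # position in the unit queue, numbered as in FIG. 1B
    starts_long: bool   # hypothetical flag: first unit of a two-unit long instruction
    boundary: bool      # hypothetical flag: a parallel processing boundary is set here


def split_execution_units(queue: List[Unit]) -> Tuple[List[List[Unit]], List[Unit]]:
    """Cut the queue after every unit that carries a boundary.

    Returns the complete execution units plus any trailing units that have
    no boundary yet (these would wait in the instruction buffer).
    """
    groups: List[List[Unit]] = []
    current: List[Unit] = []
    for unit in queue:
        current.append(unit)
        if unit.boundary:
            groups.append(current)
            current = []
    return groups, current


def group_instructions(execution_unit: List[Unit]) -> List[Tuple[int, ...]]:
    """Group the units of one execution unit into instructions.

    A long instruction is returned as a pair of unit numbers, a short
    instruction as a single unit number.
    """
    instructions: List[Tuple[int, ...]] = []
    i = 0
    while i < len(execution_unit):
        unit = execution_unit[i]
        if unit.starts_long:
            if i + 1 >= len(execution_unit):
                raise ValueError(f"long instruction at unit {unit.number} is cut off")
            instructions.append((unit.number, execution_unit[i + 1].number))
            i += 2
        else:
            instructions.append((unit.number,))
            i += 1
    return instructions


def fig_1b_queue() -> List[Unit]:
    """Units 1 to 11 as described for FIG. 1B: boundaries at units 6 and 11."""
    long_starts = {1, 3, 5, 7, 10}
    boundaries = {6, 11}
    return [Unit(n, n in long_starts, n in boundaries) for n in range(1, 12)]


if __name__ == "__main__":
    groups, leftover = split_execution_units(fig_1b_queue())
    for group in groups:
        print(group_instructions(group))
```

Run as a script, this prints `[(1, 2), (3, 4), (5, 6)]` for the first execution unit and `[(7, 8), (9,), (10, 11)]` for the second, which matches the grouping described above.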
In this method, instructions are supplied using a fixed-length packet, and a suitable number of units is issued in each cycle based on information that is found through static analysis. Using this method, there is absolutely no need to insert the no operation instructions (NOP codes) that are required in conventional VLIW methods with fixed length instructions. As a result, code size can be reduced.
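The saving can be checked with simple arithmetic. The Python sketch below counts units for the two execution units of FIG. 1B, first as they are stored in this method and then under an assumed fixed-length VLIW layout in which every instruction slot is two units wide and short instructions are padded with NOP codes. The slot width and the function names are assumptions for illustration only, not figures taken from this text.

```python
LONG, SHORT = 2, 1   # instruction lengths in units, as defined for FIG. 1A
SLOT_WIDTH = 2       # assumption: each fixed-length VLIW slot holds one long instruction


def variable_size(execution_units):
    """Units needed when each instruction occupies only its own length."""
    return sum(sum(instrs) for instrs in execution_units)


def fixed_size(execution_units):
    """Units needed when every instruction is padded to SLOT_WIDTH."""
    return sum(len(instrs) * SLOT_WIDTH for instrs in execution_units)


# The two execution units described for FIG. 1B:
# three long instructions, then long, short, long.
fig_1b = [[LONG, LONG, LONG], [LONG, SHORT, LONG]]

if __name__ == "__main__":
    print(variable_size(fig_1b), fixed_size(fig_1b))
```

For this small example the assumed fixed layout needs 12 units against 11, the difference being the one unit of padding for the short instruction at unit 9.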
The following describes the hardware construction of a processor for this fixed-supply/variable-execution method.
FIG. 2 is a block diagram showing the construction of the instruction register and periphery in a processor that is capable of executing three instructions in parallel. The broken lines in FIG. 2 show the control flows. The unit queue in FIG. 2 is a sequence of units. These units are transferred to the instruction registers in the order in which they were supplied from the instruction memory (or similar).
In this construction, the instruction register A 52a and the instruction register B 52b form one pair, as do the instruction register C 52c~the instruction register D 52d and the instruction register E 52e~the instruction register F 52f. Instructions are always arranged so as to start from one of the instruction register A 52a, the instruction register C 52c, and the instruction register E 52e. Only when an instruction is formed of two linked units is part of the instruction sent to the other instruction register in a pair. As a result, when the unit transferred to the instruction register A 52a is a complete instruction in itself, no unit is transferred to the instruction register B 52b.
The main characteristic of the above processor is that parallel processing can be performed for any combination of short and long instructions.
When three long instructions are to be executed in parallel, the three long instructions will be composed of three pairs, unit 1~unit 2, unit 3~unit 4, and unit 5~unit 6, in the unit queue 50. The present processor stores the first long instruction in the pair of the instruction register A 52a~instruction register B 52b, the second long instruction in the pair of the instruction register C 52c~instruction register D 52d, and the third long instruction in the pair of the instruction register E 52e~instruction register F 52f. After being stored in this way, the three long instructions are executed by the first instruction decoder 53a~third instruction decoder 53c.
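The register assignment described above can be sketched as follows. This Python example is only an illustration: the `load_registers` function and the dictionary representation are assumptions, and `None` stands for a register that receives no unit.

```python
# Register pairs as described: A/B, C/D and E/F.
PAIRS = [("A", "B"), ("C", "D"), ("E", "F")]


def load_registers(instructions):
    """Place up to three instructions into the instruction registers.

    Each instruction is a tuple of one unit number (short instruction)
    or two unit numbers (long instruction). Every instruction starts in
    A, C or E; the second register of a pair is used only for the
    second unit of a long instruction.
    """
    if len(instructions) > len(PAIRS):
        raise ValueError("at most three instructions can be loaded")
    registers = {name: None for pair in PAIRS for name in pair}
    for (first, second), instruction in zip(PAIRS, instructions):
        if len(instruction) not in (1, 2):
            raise ValueError("an instruction is one or two units long")
        registers[first] = instruction[0]
        if len(instruction) == 2:
            registers[second] = instruction[1]
    return registers


if __name__ == "__main__":
    # Three long instructions: units 1~2, 3~4 and 5~6.
    print(load_registers([(1, 2), (3, 4), (5, 6)]))
    # Long, short, long: units 7~8, 9 and 10~11 (second execution unit of FIG. 1B).
    print(load_registers([(7, 8), (9,), (10, 11)]))
```

In the second call, register D stays empty because the instruction held in register C is a complete short instruction.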
When the three instructions to be executed in parallel are the long instruction composed of unit 1~unit 2, the short instruction composed of unit 3, and the long instruction composed of unit 5~unit 6, the present processor stores the first instruction in the pair of the instruction register A 52a~inst
Heishi Taketo
Higaki Nobuo
Odani Kensuke
Takayama Shuishi
Tanaka Tetsuya
Matsushita Electric Industrial Co., Ltd.
Pan Daniel H.
Price and Gess