Electrical computers and digital processing systems: processing – Instruction decoding – Predecoding of instruction component
Reexamination Certificate
1999-11-22
2002-12-03
Trammell, James P. (Department: 3621)
C717S120000, C717S140000, C714S038110
active
06490673
ABSTRACT:
This application is based on an application Ser. No. 10-337186 filed in Japan, the content of which is hereby incorporated by reference.
BACKGROUND OF THE INVENTION
(1) Field of the Invention
The present invention relates to a processor, compiling apparatus, and compile program recorded on a recording medium, and especially relates to technologies of reducing the number of execute cycles in parallel processing by the processor.
(2) Description of the Related Art
As apparatuses with built-in microprocessors have gained improved functions and speeds, microprocessors (referred to as “processors” in this specification) with higher processing performance have been required.
To improve the throughput of a plurality of instructions on a processor, pipeline control is adopted. Pipeline control is described below. An instruction is divided into a plurality of unit instructions that are to be executed consecutively. The process of executing one instruction is likewise divided into a plurality of consecutive smaller processes (referred to as “stages” in this specification). The processor has executing units (hardware), each of which corresponds to a different stage. To execute the instruction, each of its unit instructions is executed in turn by a different executing unit at a different stage. When two instructions are executed consecutively, each unit instruction of the second instruction is executed by a different executing unit at a different stage, one stage behind the first instruction. In this way, a plurality of instructions are executed in parallel.
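The timing described above can be sketched as a small model. This is only an illustration: the stage names, the four-stage depth, and the assumption that every stage takes exactly one cycle with no stalls are not taken from the specification.

```python
# Illustrative model only: stage names and one-cycle-per-stage timing are assumptions.
STAGES = ["fetch", "decode", "execute", "writeback"]  # hypothetical stage names


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {cycle: [(instruction_index, stage_name), ...]} for an ideal pipeline."""
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(stages):
            # Instruction i reaches stage s one cycle after instruction i-1 did.
            schedule.setdefault(i + s, []).append((i, stage))
    return schedule


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed without stalls: fill the pipeline once, then finish one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return num_instructions + num_stages - 1
```

In this model, cycle 1 of a two-instruction run holds instruction 0 in its second stage and instruction 1 in its first stage, which is the "one stage behind" overlap the paragraph describes.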
For further improved performance, parallel processing is adopted at the level of individual instructions. Parallel processing at the instruction level is the simultaneous execution of a plurality of instructions in one machine cycle. It is realized by either dynamic scheduling or static scheduling.
One representative example of parallel processing at the instruction level by dynamic scheduling is the superscalar system. According to the superscalar system, the operations described below are executed when a plurality of instructions are executed on a processor. The instruction codes are decoded. Then, an instruction issuing control unit (hardware) of the processor analyzes the dependency relations of the plurality of instructions using the decoded instruction codes and judges whether the instructions can be executed in parallel. The processor then executes in parallel those instructions that can be executed in parallel.
On the other hand, one representative example of static scheduling is the VLIW (Very Long Instruction Word) system. According to the VLIW system, the operations described below are executed. When the execution code is generated, the dependency relations among the plurality of instructions are analyzed by the compiler or the like. According to the analysis, instruction codes are moved to generate an instruction stream that is executed more efficiently. Generally, in the VLIW system, a plurality of instructions that can be executed simultaneously are described in an instruction supply unit of fixed length (referred to as a “packet” in this specification).
In each of the scheduling systems, hazards due to data dependency relations are avoided during parallel instruction processing. More specifically, using information on the names of the registers from which data is read and in which data is stored, control is performed so that an instruction to store a value in a register and an instruction to refer to the stored value are not issued in the same cycle. With dynamic scheduling, the instruction issuing control unit ensures that the two instructions are executed serially rather than in parallel. With static scheduling, on the other hand, the compiler schedules instructions at compile time so that a group of instructions issued in the same cycle does not include instructions that have data dependency relations.
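A minimal sketch of this name-based control, written in Python for illustration. The `Instr` record, the two-wide issue width, and the in-order greedy grouping are assumptions made for the example; the specification does not prescribe this algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: only register names are tracked."""
    text: str
    dest: Optional[str] = None      # name of the register written, if any
    srcs: Tuple[str, ...] = ()      # names of the registers read


def has_data_dependency(earlier, later):
    """Name-based check: does `later` refer to the register that `earlier` stores to?"""
    return earlier.dest is not None and earlier.dest in later.srcs


def group_for_issue(instrs, width=2):
    """Form in-order issue groups of at most `width` instructions, starting a new
    group whenever an instruction depends on one already in the current group."""
    groups, current = [], []
    for ins in instrs:
        blocked = any(has_data_dependency(prev, ins) for prev in current)
        if current and (blocked or len(current) == width):
            groups.append(current)
            current = []
        current.append(ins)
    if current:
        groups.append(current)
    return groups
```

The same check could run in hardware at issue time (dynamic scheduling) or in a compiler when instruction groups are formed (static scheduling); only the point at which it is applied differs.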
Recently, to improve signal processing performance, an increasing number of processors have adopted, in addition to basic instructions, media processing instructions that deal with data larger than the data dealt with by basic instructions. For a media processing instruction, a plurality of pieces of data are stored in a register whose length is larger than the length of the registers used for basic instructions. The plurality of pieces of data are processed in parallel to improve signal processing performance. Some processors that adopt media processing instructions are not equipped with registers dedicated to those instructions. Instead, in those processors, the registers are shared by the basic instructions and the media processing instructions, and data for a basic instruction is written in part of a register.
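The shared-register arrangement can be illustrated with a small sketch. The widths are assumptions chosen for the example (a 64-bit register holding two 32-bit pieces of data); the specification does not state them here, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumed width of one piece of data


def pack_halves(upper, lower):
    """Store two 32-bit values in one 64-bit register value."""
    return ((upper & MASK32) << 32) | (lower & MASK32)


def unpack_halves(reg):
    """Return (upper, lower) 32-bit values held in a 64-bit register value."""
    return (reg >> 32) & MASK32, reg & MASK32


def packed_add32(a, b):
    """Media-style add: both halves are added independently, with wrap-around
    and no carry from the lower half into the upper half."""
    a_hi, a_lo = unpack_halves(a)
    b_hi, b_lo = unpack_halves(b)
    return pack_halves(a_hi + b_hi, a_lo + b_lo)


def write_lower_half(reg, value):
    """Basic-instruction-style write that touches only the lower half of the shared register."""
    return (reg & (MASK32 << 32)) | (value & MASK32)
```

Here `packed_add32` stands in for a media processing instruction that operates on the whole register, while `write_lower_half` stands in for a basic instruction that writes only part of the same register.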
When the dependency relations among a plurality of instructions are analyzed in those processors by referring to the register names shown in the instruction codes, according to the instruction issuing control method described above, an instruction to update the upper half of a register and an instruction to update the lower half of the same register are executed serially, because the shared register name in the instruction codes is treated as a data dependency relation between the instructions. This is problematic. Here, the data dependency relation refers to the dependency relation between an instruction to store data in a resource and another instruction to refer to the stored data.
SUMMARY OF THE INVENTION
It is accordingly the object of the present invention to provide a processor, a compiling apparatus, and a compile program recorded on a recording medium that reduce the number of execute cycles when parallel processing is performed in a processor that executes a plurality of instructions in one cycle.
The above-mentioned object may be achieved by a processor that processes a plurality of instructions in one cycle. The processor may include:
A) a register;
B) an instruction fetching unit for fetching the plurality of instructions that include at least a first instruction and a second instruction from an external program, the first instruction including a first access indication for accessing a first area, which is at least part of an area in the register, the second instruction including a second access indication for accessing a second area, which is at least part of the area in the register, wherein when the first area is a whole of the register, the second area is the part of the register, when the second area is the whole of the register, the first area is the part of the register, and at least one of the first and second access indications is for storing data in at least the part of the register;
C) a decoding unit for decoding each of the fetched instructions and outputting at least decoded information on the register and on areas in the register in one cycle, the decoded information including at least information on the register and on the first and second areas; and
D) an access unit for accessing the first and second areas according to the decoded information in one cycle.
In the processor, an instruction to access the first part of one register and another instruction to access the second part of the same register in a program can be executed in one cycle. As a result, the number of execute cycles is reduced compared with a conventional processor.
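The difference from a name-only check can be sketched as follows. This is an assumed illustration, not the claimed decoding logic: the bitmask encoding of areas, the `Access` record, and the conservative treatment of overlapping areas are all hypothetical choices for the example.

```python
from dataclasses import dataclass

# Hypothetical encoding of register areas as bitmasks.
LOWER, UPPER = 0b01, 0b10
WHOLE = LOWER | UPPER


@dataclass(frozen=True)
class Access:
    """Hypothetical decoded information: register name, area, and access type."""
    register: str
    area: int
    is_store: bool


def conflicts_by_name(a, b):
    """Conventional check: same register name and at least one store."""
    return a.register == b.register and (a.is_store or b.is_store)


def conflicts_by_area(a, b):
    """Area-aware check: additionally requires that the accessed areas overlap.
    Treating any overlap as a conflict is an assumption of this sketch."""
    return conflicts_by_name(a, b) and (a.area & b.area) != 0
```

With these definitions, a store to the upper half and a store to the lower half of the same register conflict under `conflicts_by_name` but not under `conflicts_by_area`, so in this model the two instructions could be placed in the same cycle.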
The above-mentioned object may be also achieved by the processor, wherein the first area, which is an object of the first access indication, and the second area, which is an object of the second access indication, are parts of the register and have no overlap, the first instruction includes an indication for storing data in the first area and the second instruction includes an indication for referring to data in the second area, and the access unit stores data in the first area and refers to data in the second area in one cycle.
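As a rough behavioral model of an access unit of this kind, the sketch below stores data in one area of a register and refers to data in a different, non-overlapping area in a single step. The 64-bit register, the two 32-bit areas, and the class and method names are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF
# Hypothetical area table: area name -> (bit shift, mask).
AREAS = {"lower": (0, MASK32), "upper": (32, MASK32)}


class Register:
    """Behavioral model of one register with two non-overlapping areas."""

    def __init__(self, value=0):
        self.value = value

    def access_in_one_cycle(self, store_area, store_value, read_area):
        """Store `store_value` in `store_area` and return the data held in
        `read_area`, modeling both accesses as happening in the same cycle."""
        if store_area == read_area:
            raise ValueError("this sketch only models non-overlapping areas")
        r_shift, r_mask = AREAS[read_area]
        s_shift, s_mask = AREAS[store_area]
        # The read is unaffected by the store because the areas do not overlap.
        read_result = (self.value >> r_shift) & r_mask
        cleared = self.value & ~(s_mask << s_shift)
        self.value = cleared | ((store_value & s_mask) << s_shift)
        return read_result
```

Because the two areas share no bits, the order in which the model performs the read and the store does not change the result, which is why the two instructions need not be serialized.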
In the processor, an instruction to store data in the first part of one regis
Heishi Taketo
Odani Kensuke
Elisca Pierre E.
Matsushita Electric Industrial Co. LTD
Price and Gess
Trammell James P.