Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
1997-09-10
2001-06-05
Kim, Kenneth S. (Department: 2183)
Electrical computers and digital processing systems: processing
Processing control
Branching
C708S212000
Reexamination Certificate
active
06243806
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a program execution method and a device using the same. In particular, the present invention relates to a program execution method which uses data holding unit based on a general-purpose register and to a program execution device using said method. The present invention can, for example, be applied in a microprocessor which uses a pipeline method.
2. Description of the Related Arts
A RISC (Reduced Instruction Set Computer) single-chip microprocessor is a device simultaneously offering high processing performance in all types of usage, low power consumption and small system area. This type of microprocessor often uses a pipeline method of internal data processing, in which processing is divided into multiple stages and multiple instructions are simultaneously processed in different stages, thereby increasing processing speed. Therefore, the consideration of branching instructions which disrupt the flow of pipeline processing is recognized as an important aspect of microprocessor design.
FIG. 3
shows pipeline processing in a generally used conventional microprocessor. In
FIG. 3
, processing is at a different stage at each clock and, in the example depicted, 1 instruction is processed completely in 5 stages. These 5 stages are expressed as IF, ID, EX, MEM, WB, and respectively denote: Instruction Fetch, Instruction Decode (and read-out from register), Computation, Memory Access, and Write-Back of necessary data to the register.
Now let us assume that instruction
1
is a branching instruction which generates branching when data in a general-purpose register referred to in the instruction is zero. Ordinarily, the EX stage determines if the data in the general-purpose register is zero. Consequently, when branching is actually generated, the branching destination address is fetched from the instruction memory by an instruction whose IF stage (shaded in the diagram) commences simultaneously with the end of the EX stage of instruction
1
(shaded in the diagram), namely instruction
4
. Therefore, in this configuration, branching delay is 2 cycles. “Branching delay” denotes the number of cycles (in this example, cycles
2
and
3
) between a cycle which processes a branching instruction and a cycle which actually commences processing the instruction at the branching destination. When there are many such cycles, the penalty for branching increases, hindering high-speed processing.
A technique for reducing branching delay was proposed in “Computer Architecture—A Quantitative Approach to Design, Realization and Evaluation” (David A. Patterson and John L. Hennessy, Nikkei BP Publications), pages 262~264.
FIG. 4
shows a virtual microprocessor DLX branching determining circuit as discussed in the above publication. In DLX, a zero determining portion
6
is provided for determining whether or not register data between a group of registers
2
and a computing unit
4
is zero; a branching determining portion
8
is provided for determining if branching has occurred based on the result of the zero determining portion. The determination result is sent to a circuit comprising an IF stage and the address of the next instruction to be fetched is specified.
In this configuration, data is read out from the register during the ID stage and the computation unit
4
computes the data during the EX stage. So far, this is identical to the conventional method. However, it is the zero determining portion
6
, and not the computation unit
4
, which determines if the register data is zero. The zero determining portion
6
is provided specifically for this function, and determining is thus completed during the ID stage. As a result, fetching of the instruction at a branching destination address can commence simultaneously with the end of the ID stage, without waiting for the end of the EX stage.
FIG. 5
is a diagram showing the pipeline processing in a DLX microprocessor when a program branches. Here, a branching destination address is fetched in the IF stage (shaded) of instruction
3
which commences when the ID stage (shaded) of instruction
1
ends. Branching delay can thereby be reduced by 1 cycle.
DLX has a branching delay of 1 cycle, which is regarded as the minimum delay possible with an ordinary pipeline-system microprocessor. However, even with the same branching delay, how high the clock frequency can be raised with the given circuit configuration is another matter. Even when branching delay is reduced to 1 cycle, the overall performance will lower if clock frequency has to be lowered by 10% due to the critical path resultantly created. In the case of DLX, a zero determining portion
6
is provided to the ID stage as shown in
FIG. 4
, but since this determining circuit determines the zero states of 32 bits of data input thereto, this naturally causes delay. Attempting to conclude the determining of zero and branching in the ID stage inevitably lowers the maximum clock frequency.
SUMMARY OF THE INVENTION
The present invention has been devised after consideration of the above points and aims to provide a program execution method and device capable not only of minimizing branching delay, but also of raising maximum clock frequency.
In order to achieve the above objectives, when storing data in data holding unit, a program execution method of the present invention records whether or not the data has a value. Then, when it is necessary to determine if data stored in the data holding unit has a value in a predetermined set of values, the record is referenced instead of the data holding unit. The “predetermined set of values” here is fixed as required on a case by case basis, and may consist of multiple values such as, for instance, “−1 and 1” or “positive integers.”
As explained above, the problem is the delay caused by judging the content of data stored in the data holding unit. In order to solve this problem, the fact of whether or not the value of the data belongs to a predetermined set of values is recorded together with the data. In order to judge if data has a value in a predetermined set of values, rather than performing an arithmetic computation on the data in the data holding unit, the content of this record is referenced. It is thus possible to reduce the time required for judging. Consequently, in addition to reducing branching delay, it becomes easier to raise clock frequency.
In the program execution device of the present invention, a flag for indicating whether or not data held by the data holding unit has a value belonging to a predetermined set of values is provided in a one-to-one relationship with the data holding unit. When this flag is a zero flag, it can be used for many branching judgement. The data holding unit may, for instance, comprise a general-purpose register. In this configuration, since it is possible to judge if data stored in the data holding unit has a value belonging to a predetermined set of values simply by checking the flag, processing performance can be improved.
A conventional microprocessor includes a system called condition coding and has a flag such as a zero flag which reflects the computation result. However, this flag only reflects the result of the previous computation and does not correspond directly to the register contents. Consequently, in the case of a register whose contents were written some time earlier, it is only possible to determine if the data in the register is zero by performing a computation for the data once again. This restricts high-speed processing.
One aspect of the program execution device of the present invention comprises data holding unit for holding data required to execute a program; a flag provided in a one-to-one relationship with the data holding unit; computing unit for performing computations to data held in the data holding unit; judging unit for judging if the value of data obtained by computation belongs to a predetermined set of values; rewriting unit for rewriting data obtained by computation in the data holding unit; and
Koumura Yasuhito
Matsumoto Kenshi
Miura Hiroki
Hogan & Hartson LLP
Kim Kenneth S.
Sanyo Electric Co,. Ltd.
LandOfFree
Program execution method and apparatus employing data flags... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Program execution method and apparatus employing data flags..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Program execution method and apparatus employing data flags... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2437797