Method and apparatus for reducing code size by executing no...

Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or... – Reducing an impact of a stall or pipeline bubble

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C712S226000, C712S219000, C712S245000

Reexamination Certificate

active

06564316

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to processor performance generally and to the instruction set architecture (ISA) of such processors, in particular.
BACKGROUND OF THE INVENTION
A chip that includes either a general purpose processor or a Digital Signal Processor (DSP) contains different elements besides the processor. One of the most significant elements in terms of area of the chip used are the memories. These memories may be program memories, data memories, RAM, ROM or any other type of memory. The silicon wafer die size consumed by the chip influences the cost of the chip and it therefore should be minimized. Thus, any reduction in memory size will reduce the cost of the chip.
In a chip using any kind of processor, there is usually a program or code memory as part of the chip die. The program includes the instructions required to be fetched by the processor. The nature of the program, meaning the encoding of the instructions itself, is defined as the instruction set architecture (ISA) of the processor.
One common way to reduce the area of the program memory is in the encoding of the instructions. If each instruction can be made to consume less bits, the program memory will be smaller. For example, a processor whose instructions are encoded in a 32-bit word is likely to consume more program space than another processor that uses 16-bit instruction words.
Pipelines are extensively used in processor architectures. Each pipeline includes the stages, each of which takes one cycle to complete, to be performed by the processor in order to complete an instruction.
Reference is now made to
FIG. 1
which illustrates an exemplary instruction pipeline which will be used for reference purposes throughout this patent application. The exemplary pipeline comprises six stages which each take up one clock cycle of the processor and add up to a total execution of an instruction. The exemplary pipeline comprises the following stages: Instruction Fetch (IF), Instruction Decode (ID), Address Generation (AG), Operand Fetch (OF), Execute (EX) and Flags & Conditions (FC). The architecture of the processor and its state machine together determine the pipeline and its depth.
Reference is now made to
FIGS. 2A and 2B
which illustrate a set of instructions which utilize the pipeline of FIG.
1
. The instructions are denoted I
1,
I
2,
I
3,
I
4
in FIG.
2
A and denoted I
1,
I
2,
I
3,
I
10
in FIG.
2
B. Each instruction goes through the stages of the pipeline where each stage uses one clock cycle, as described hereinabove. As far as the flow of the code is concerned there are two types of instruction: sequential (seq) and non-sequential (non-seq). Examples of sequential instructions are “add” and “subtract”. Non-sequential instructions are instructions which break the pipeline. The major reason for using them is to branch to a different memory location than the following one. The reason that this might be necessary is, for example, to perform a loop which repeats itself, to determine a condition and decide what instruction to take next or to accept an interrupt or any other non-sequential event.
Thus in
FIGS. 2A and 2B
, instructions I
1
and I
2
are sequential instructions whereas instruction I
3
is a non-sequential or branch instruction whose target, should a condition be met, is instruction I
10
(FIG.
2
B). If the condition is not met, the pipeline proceeds with the next sequential instruction, I
4
, which is shown in FIG.
2
A.
The non-sequential instructions usually involve a penalty for breaking the pipeline. There are two main reasons for this penalty. The first is that the non-sequential instruction has to be decoded before the target instruction can be fetched. Thus, the execution of the target instruction cannot begin until after the decoding stage of the non-sequential instruction has been completed. Sometimes the address of the target instruction has to be calculated prior to fetching it, further delaying the beginning of execution of the target instruction.
Another penalty arises when the non-sequential instruction is conditional. In this case, the execution of the non-sequential instruction has to be halted until the condition is checked and a true/false indication is ready.
FIG. 2B
shows the flow of a taken branch (i.e. a conditional branch in which the condition was met) in the pipeline where the target instruction is I
10
. In this case, the penalty of the branch instruction is 4 cycles because, as was mentioned above, the true/false condition is only known at a late stage. The true/false condition is known at the EX (or execute) stage of the I
3
branch instruction and therefore the IF (instruction fetch) stage of target instruction I
10
must wait for four cycles (cycles
4
,
5
,
6
,
7
) in order to start. Thus, the branch instruction takes 5 cycles to execute.
When the condition is found to be false and the branch is not taken, this branch instruction will only take four cycles, as shown in FIG.
2
A. This is illustrated by the I
4
pipeline, I
4
being the instruction following the branch instruction I
3,
as opposed to the target instruction I
10
(FIG.
2
B). This causes a penalty of three cycles (cycles
4
,
5
and
6
) over a single cycle instruction. The lower execution time in this case is due to the early instruction fetch mechanism, which starts fetching the next sequential instruction I
4
in the cycle before the condition is known (cycle
7
). The pipeline of this instruction is then halted if the assumed condition does not, in fact, occur.
The branch instruction may be, for example the machine code “branch, new, neq”, where “branch” indicates a branch instruction, “new” indicates the target instruction and “neq” is the condition for taking this branch, “new” and “neq” each respectively constitute a field in the branch instruction.
The penalty of a certain instruction is calculated by counting the number of cycles between its ID stage and the ID stage of the next instruction. In case of a single-cycle instruction, there is no penalty since the ID stage of the next instruction immediately follows. Since in a non-sequential instruction this is not the case, the intermediate cycles are termed “wasted cycles”.
For a branch instruction, at least three cycles are wasted (cycles
4
,
5
,
6
in
FIGS. 2A and 2B
marked with “wasted IF”), irrespective of whether or not the branch is taken.
Reference is now made to
FIGS. 3A and 3B
which illustrate a common way to eliminate these wasted cycles by using delay-slot instructions.
FIG. 3A
illustrates the case where the branch instruction is not taken and
FIG. 3B
where the branch instruction is taken. Similar items to previous figures have similar reference numerals and will not be described further. Delay slot instructions are instructions which use the wasted cycles. They appear after the branch instruction, but will be executed before the branch instruction is executed. Any sequential instruction can be used as a delay slot instruction.
Thus, instructions DS
1
, DS
2
and DS
3
demonstrate three delay slot instructions added to the pipeline of
FIGS. 2A and 2B
. These delay slot instructions are executed in the spaces corresponding to the “wasted IF” cycle (cycles
4
,
5
,
6
). The following code (written in machine code) demonstrates the concept of these instructions:
comp a0, a1
;the branch condition instruction
branch next,neq
;branch to address ‘next’ if a0 ≠ a1 (neq = not equal)
→move r0, r1
;these 3 instructions will be executed prior to the
;branch instruction, even though they appear after it.
;They don't affect the decision whether to take the
;branch or not.
→add r1, a0
→nop
Using the suggested pipeline of FIG.
1
and the pipeline flow of the branch instructions, a maximum of three delay slot instructions can be supported. These will fit into the three wasted cycles (“wasted IF”) in the branch instruction as mentioned above in respect of
FIGS. 2A and 2B
. The three instructions marked with a→are the three delay slot instructions. The nop instru

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for reducing code size by executing no... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for reducing code size by executing no..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for reducing code size by executing no... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3008129

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.