Pipeline control for high-frequency pipelined designs

Electrical computers and digital processing systems: processing – Instruction issuing

Reexamination Certificate


Details

Type: Reexamination Certificate
Status: active
Patent number: 06192466

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to management of pipelined processing to reduce the frequency impact of pipeline synchronization and, more particularly, to in-order and out-of-order pipelined processing.
2. Description of the Related Art
Contemporary high-performance processor designs rely on two aspects to improve performance, namely increasing the speed with which each instruction is processed, and processing multiple instructions (parallel processing) at the same time to process more instructions in a given time period.
Typically, improving one of these aspects degrades the other unless appropriate design choices are made. For example, using more pipeline stages can increase the achievable frequency, but if pipeline utilization cannot be sustained, the design achieves less than the potential peak performance of the architecture.
As processing speed, measured by the clock frequency of processor units, increases, the synchronization of pipeline elements becomes a limiting factor. This is because improvements in the speed of process technologies (the processing elements) are not accompanied by a similar improvement in signal transmission speed. Thus, the cost of transmitting control information from one part of the chip to another increases relative to the cost of performing operations and is becoming a limiting factor in the achievable processor frequency.
A simple form of parallel processing is pipelining, wherein multiple operations in different stages of processing proceed through a sequence of operation elements.
Referring to FIG. 1, a typical processing pipeline is shown. The evaluation of the expressions t=a[i]-a[i-1]; i++; may be translated into the following sequence of machine instructions, for example for the IBM PowerPC™ architecture. In the following code fragment, it is assumed that the address of the array a is stored in register r5, and that variables i and t are stored in registers r6 and r0, respectively.
PowerPC™ machine code:

 1:  slwi  r1, r6, 2    ; compute offset i*4
 2:  addi  r3, r6, -1   ; compute i-1
 3:  slwi  r3, r3, 2    ; compute offset (i-1)*4
 4:  lwzx  r1, r5, r1   ; load a[i]
 5:  lwzx  r3, r5, r3   ; load a[i-1]
 6:  addi  r6, r6, 1    ; i++
 7:  subfc r0, r3, r1   ; compute t
 8:  ...                ; program continues
 9:  ...
10:  ...
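For orientation, the source-level view of this fragment can be written in C as follows. This is only an illustrative sketch: the function name and parameters are not part of the patent text, which merely states that r5 holds the address of a, r6 holds i, and r0 receives t.

/* Illustrative C sketch of the expression compiled above. */
int compute_t(const int *a /* address held in r5 */,
              int *i       /* variable i, held in r6 */)
{
    int t = a[*i] - a[*i - 1];   /* slwi/addi/slwi, two lwzx, subfc */
    (*i)++;                      /* addi r6, r6, 1 */
    return t;                    /* result t, held in r0 */
}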
The processing steps involved in processing each of these instructions are shown in the block/flow diagram of FIG. 1. An instruction to be executed by the processor waits to be executed in block 101. This may be performed using an instruction buffer, or some other indicator which tracks which instructions need to be executed, such as a program counter for an in-order processor. In block 103, the instruction is issued for execution. The instruction operates in one or more processing units (two processing units are represented by blocks 105 and 107), passing from one unit to the next until it is passed to a final processing block 109. In block 109, the operation commits its results to the machine state, typically by writing the results of a computation to a register file. In some implementations, some of the above described blocks include steps which may be re-arranged or occur multiple times; e.g., some parts of the machine state may be committed earlier than other parts, e.g., by inserting a second commit stage 109 between blocks 105 and 107 if certain conditions are met.
In a non-pipelined architecture, the above processing may take one cycle for each of blocks 103, 105, 107 and 109, for a total of four cycles for the execution of each instruction.
Typically, execution of an instruction requires operation in a sequence of distinct units, while most or all other units are idle. Thus, when instruction 2 executes in unit 105, the processing elements for steps 103, 107, and 109 are idle.
Pipelining addresses this resource inefficiency by processing several instructions in parallel, where each instruction is in a different step of processing; e.g., when operation 2 is processed by the unit for block 105, instruction 1 may be simultaneously processed by the unit implementing block 107, and the unit for block 103 may be processing instruction number 3.
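To see the benefit in numbers, assume every instruction can advance to the next block each cycle. A non-pipelined machine then spends one full pass through all blocks per instruction, while a pipelined machine fills the pipeline once and thereafter completes one instruction per cycle. The following C sketch captures this arithmetic; the function names are illustrative and not taken from the patent.

/* Illustrative sketch (not from the patent): cycle counts for n
 * instructions on a machine with a given number of processing blocks,
 * assuming every instruction can advance each cycle.                  */
unsigned cycles_non_pipelined(unsigned n, unsigned stages)
{
    return n * stages;               /* one full pass per instruction */
}

unsigned cycles_pipelined(unsigned n, unsigned stages)
{
    return n ? stages + (n - 1) : 0; /* fill once, then one per cycle */
}

/* With the four blocks 103, 105, 107 and 109 and the 10-instruction
 * fragment above: 40 cycles non-pipelined versus 13 cycles pipelined. */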
The procession of processing steps in a pipeline is usually visualized with a “pipeline diagram”. The following Table 1 shows the processing of the previous program segment in the exemplary pipeline shown in FIG. 1. Presuming a traditional RISC (reduced instruction set computer) type processing pipeline, the blocks 101, 103, 105, 107 and 109 include instruction fetch (IF), instruction decode (ID), execution (EX), memory access (MEM), and register file writeback (WB), and are labeled accordingly in Table 1.
TABLE 1
Exemplary execution without stalls in a typical RISC pipeline.

              Cycle Number
Instr     1    2    3    4    5    6    7    8    9    10   11   12
slwi      IF   ID   EX   MEM  WB
addi           IF   ID   EX   MEM  WB
slwi                IF   ID   EX   MEM  WB
lwzx                     IF   ID   EX   MEM  WB
lwzx                          IF   ID   EX   MEM  WB
addi                               IF   ID   EX   MEM  WB
subfc                                   IF   ID   EX   MEM  WB
8:...                                        IF   ID   EX   MEM  WB
9:...                                             IF   ID   EX   MEM
10:...                                                 IF   ID   EX
This diagram shows which instructions are in the pipeline at any given time (by reading the column for that time point) and in which cycle each instruction is performed in a particular processing step. Pipeline diagrams and the functioning of pipelines are described in more detail in “Computer Architecture—A Quantitative Approach” by J. L. Hennessy and D. A. Patterson, 2nd edition, Morgan Kaufmann Publishers, 1996.
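To illustrate how such a diagram is built, the short C program below prints the diagonal pattern of Table 1 for an ideal five-stage pipeline with no stalls; it is an illustrative sketch only and not part of the patent.

#include <stdio.h>

int main(void)
{
    const char *stage[] = { "IF", "ID", "EX", "MEM", "WB" };
    const int n_stages = 5, n_instr = 10, n_cycles = 12;

    printf("%-9s", "Instr");
    for (int c = 1; c <= n_cycles; c++)
        printf("%5d", c);
    printf("\n");

    /* With no stalls, instruction i (0-based) occupies stage s in
     * cycle i + s + 1, which yields the diagonal of Table 1.          */
    for (int i = 0; i < n_instr; i++) {
        printf("instr %-3d", i + 1);
        for (int c = 1; c <= n_cycles; c++) {
            int s = c - i - 1;
            printf("%5s", (s >= 0 && s < n_stages) ? stage[s] : "");
        }
        printf("\n");
    }
    return 0;
}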
A fundamental property of a pipeline is that only one instruction can be processed by any given unit at any time, unless units are duplicated for a given step. It is the purpose of a pipeline control mechanism to enforce this property by ensuring that instructions proceed only if the next processing unit is available, i.e., if it is empty or will be surrendered by the instruction currently being processed. There are many reasons why an instruction may not surrender a processing unit: it may be a long-running operation which takes multiple cycles, it may be waiting for missing operands, a cache miss may have occurred which has to be serviced, and so forth.
If a processing unit for the next step will not become available, an instruction has to remain in its current processing unit, a condition referred to as a “stall”. As a result, it does not vacate its own processing unit, so the next instruction upstream, which expects to receive the stalling instruction's processing unit, stalls as well, and so forth. Enforcing these stalls is referred to as flow control, and is the main aim of the pipeline control unit.
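One common way to realize such flow control is a chain of stall signals that propagates from the downstream end of the pipeline toward the front: a stage asserts stall if its own instruction cannot leave, or if the stage behind it is stalled and therefore cannot accept a new instruction. The following C sketch models this behaviour cycle by cycle; the structure and the names (valid, busy, stall) are assumptions made for illustration, not the patent's mechanism.

#include <stdbool.h>

#define N_STAGES 5

struct stage {
    bool valid;   /* stage currently holds an instruction           */
    bool busy;    /* that instruction cannot leave this cycle       */
    int  instr;   /* which instruction is held (for tracing only)   */
};

/* A stage stalls if it holds an instruction that is busy, or if the
 * downstream stage stalls and therefore cannot be vacated.           */
static void compute_stalls(const struct stage p[N_STAGES],
                           bool stall[N_STAGES])
{
    bool downstream_stall = false;        /* nothing follows WB */
    for (int s = N_STAGES - 1; s >= 0; s--) {
        stall[s] = p[s].valid && (p[s].busy || downstream_stall);
        downstream_stall = stall[s];
    }
}

/* Advance one cycle: non-stalled stages take the instruction from
 * upstream, or a bubble if the upstream stage is itself stalled.      */
static void advance(struct stage p[N_STAGES], const bool stall[N_STAGES])
{
    for (int s = N_STAGES - 1; s > 0; s--) {
        if (!stall[s]) {
            if (!stall[s - 1])
                p[s] = p[s - 1];
            else
                p[s].valid = false;       /* insert a bubble */
        }
    }
    if (!stall[0])
        p[0].valid = false;               /* IF refilled by fetch logic */
}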
An example of a stall operation is presented in the following pipeline diagram, Table 2, where the first load operation experiences a cache miss and takes two cycles. As a result, subsequent operations are stalled for one cycle and resume later.
TABLE 2
Exemplary execution with stalls in a typical RISC pipeline.

              Cycle Number
Instr     1    2    3    4    5    6    7     8     9     10   11   12
slwi      IF   ID   EX   MEM  WB
addi           IF   ID   EX   MEM  WB
slwi                IF   ID   EX   MEM  WB
lwzx                     IF   ID   EX   MEM   MEM   WB
lwzx                          IF   ID   EX    stall MEM   WB
addi                               IF   ID    stall EX    MEM  WB
subfc                                   IF    stall ID    EX   MEM  WB
8:...                                               IF    ID   EX   MEM
9:...                                                     IF   ID   EX
10:...                                                         IF   ID
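The shape of Table 2 can be reproduced with a small model in which instruction i may enter a stage only after it has finished the preceding stage and after instruction i-1 has vacated that stage; the first load is simply given a two-cycle MEM stage. The C sketch below (constants and names are illustrative, not from the patent) prints a diagram equivalent to Table 2 for the first eight instructions.

#include <stdio.h>

#define N_INSTR  8
#define N_STAGES 5
#define N_CYCLES 12

int main(void)
{
    const char *name[N_STAGES] = { "IF", "ID", "EX", "MEM", "WB" };
    int dur[N_INSTR][N_STAGES], start[N_INSTR][N_STAGES];

    for (int i = 0; i < N_INSTR; i++)
        for (int s = 0; s < N_STAGES; s++)
            dur[i][s] = 1;
    dur[3][3] = 2;                 /* first lwzx: MEM takes two cycles */

    /* Entry cycle of instruction i into stage s: after finishing stage
     * s-1, and after instruction i-1 has vacated stage s.             */
    for (int i = 0; i < N_INSTR; i++) {
        for (int s = 0; s < N_STAGES; s++) {
            int t = (s == 0) ? i + 1 : start[i][s - 1] + dur[i][s - 1];
            if (i > 0) {
                int vacated = (s + 1 < N_STAGES)
                                  ? start[i - 1][s + 1]
                                  : start[i - 1][s] + dur[i - 1][s];
                if (vacated > t)
                    t = vacated;
            }
            start[i][s] = t;
        }
    }

    /* Print one row per instruction; cycles spent waiting beyond a
     * stage's own duration are shown as "stall", as in Table 2.       */
    for (int i = 0; i < N_INSTR; i++) {
        printf("instr %d:", i + 1);
        for (int c = 1; c <= N_CYCLES; c++) {
            const char *cell = "";
            for (int s = 0; s < N_STAGES; s++) {
                int leave = (s + 1 < N_STAGES) ? start[i][s + 1]
                                               : start[i][s] + dur[i][s];
                if (c >= start[i][s] && c < leave)
                    cell = (c < start[i][s] + dur[i][s]) ? name[s] : "stall";
            }
            printf("%6s", cell);
        }
        printf("\n");
    }
    return 0;
}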
The mechanism generating the stall signal, and how each pipeline stage actually processes such a signal, is now described. It will first be discussed how a single execution pipeline stage in blocks 105 or 107 of FIG. 1 performs a stall operation, and how this is controlled by the pipeline control. It will then be discussed how the execution of instructions in blocks 105 or 107 is controlled by a pipeline control mechanism implemented according to FIG. 2.
Referring to FIG. 2, a single execution pipeline stage, represented by one of blocks 105 or 107 of FIG. 1, for a pipeline according to the prior art in the implementation of microprocessors is shown. In block 151, the pipeline stage receives the instruction from the upstream pipeline stage, which may b
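In general terms, a stage register of this kind either latches the instruction offered by the upstream stage or, when its stall signal is asserted, holds its current contents for another cycle. The C fragment below is a simplified illustration under that assumption only; it does not reproduce the circuit of FIG. 2, and the type and field names are invented for the sketch.

struct latch {
    int instr;   /* instruction (or operation) currently held        */
    int valid;   /* non-zero if the latch holds a valid instruction  */
};

/* Per-cycle update of one pipeline stage register: load from upstream
 * when not stalled, otherwise hold the current contents.             */
static void stage_update(struct latch *stage, const struct latch *upstream,
                         int stall)
{
    if (!stall)
        *stage = *upstream;   /* accept the instruction from upstream */
    /* else: keep holding the same instruction (stall)                */
}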
