Pipeline decoupling buffer for handling early data and late...

Electrical computers and digital data processing systems: input/ – Input/output data processing – Input/output data buffering

Reexamination Certificate

Details

C709S241000, C712S218000, C712S219000

active

06629167

ABSTRACT:

TECHNICAL FIELD
The invention relates to computers and microprocessors. More particularly, this invention relates to a method and apparatus for improving the performance of pipelined microprocessors.
BACKGROUND ART
Making computers run faster has been an eternal goal of the computer industry. Since its introduction in the early 1950s, the pipelining technique has proven to be more than a transient trend, and has taken a foothold in modern computing as a major performance enhancement technique. Almost all microprocessors today employ some level of pipelining to maximize their speed.
The pipelining technique involves breaking down a task, e.g., execution of an instruction, processing of data or performance of an arithmetic operation, into a number of smaller sub-tasks. The task travels down a pipeline having a number of stages arranged in an assembly line fashion, each stage processing one of the sub-tasks. The task is completed when all of the sub-tasks are completed, i.e., when the task has been processed by every stage of the pipeline. For example, if a pipeline comprises N stages, a task would take N clocks to complete, i.e., N sub-tasks must be completed.
A key feature of the pipelining technique is that a new task can be fed into the pipeline on every clock cycle. For instance, while the first task has moved on to the second stage of the pipeline, a second task can be fed into the pipeline to occupy the first stage. Thus, ideally, after the first N clock cycles, the pipeline should be completely filled, i.e., hold N tasks. Under these ideal circumstances, the completion of a task can be observed on every clock cycle. Thus, a significant performance enhancement may be realized from pipelined execution of instructions.
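The fill-and-drain timing described above can be sketched with a small simulation. This is only an illustration under idealized assumptions (one clock per stage, no stalls); the function name `simulate_pipeline` and its parameters are hypothetical and do not come from the patent.

```python
def simulate_pipeline(num_stages, num_tasks):
    """Return {task_id: completion_cycle} for an ideal, stall-free pipeline.

    Assumption: each stage takes exactly one clock, and a new task can
    enter the first stage on every clock.
    """
    stages = [None] * num_stages  # stages[0] is the first stage
    completed = {}
    next_task = 0
    cycle = 0
    while len(completed) < num_tasks:
        cycle += 1
        # Feed a new task into the first stage if any remain.
        new_task = next_task if next_task < num_tasks else None
        if new_task is not None:
            next_task += 1
        # Every task already in the pipeline advances by one stage.
        stages = [new_task] + stages[:-1]
        # A task occupying the last stage finishes its final sub-task now.
        if stages[-1] is not None:
            completed[stages[-1]] = cycle
    return completed


if __name__ == "__main__":
    for task, cycle in simulate_pipeline(num_stages=3, num_tasks=5).items():
        print(f"task {task} completes at clock {cycle}")
```

With three stages, the first task completes after three clocks, and each subsequent task completes one clock after the previous one, so five tasks finish in 3 + 5 - 1 = 7 clocks rather than 15.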
Some computer systems employ multiple pipelines arranged in a serial manner as shown, e.g., in FIG. 1, which shows a first pipeline 101, commonly referred to as the front-end pipeline, and a second pipeline 102, commonly referred to as the back-end pipeline. The first pipeline 101 may comprise, e.g., stages A, B and C. The second pipeline 102 may comprise, e.g., stages D, E and F.
In this arrangement, a task is completed when it has traveled through each of the stages A, B, C, D, E and F, i.e., it has to travel through both pipelines 101 and 102. The decoupling buffer 103 provides a decoupling between the two pipelines 101 and 102 so that a stall condition in one pipeline does not affect the other pipeline.
For example, when the second pipeline 102 becomes "stalled", i.e., cannot receive data output by the first pipeline 101, the data output from the last stage of the first pipeline 101, i.e., from stage C, is temporarily stored in the decoupling buffer 103, and fed therefrom to the initial stage, i.e., stage D, of the second pipeline 102 when it once again becomes available to receive the data. When the first pipeline 101 is stalled, i.e., produces no data for the second pipeline 102, the second pipeline 102 receives data from the decoupling buffer 103. Thus, the buffer 103 may provide each of the first pipeline 101 and the second pipeline 102 with immunity from the effects of any stall conditions in the other, and thus increase overall throughput.
An example of the above-described operation of a conventional pipeline including the decoupling buffer is shown in FIG. 2, which shows data objects 0-9 progressing through the various stages of the pipelines. In particular, FIG. 2 shows a back-end pipeline stall condition during clock cycles t+5 through t+7. During the back-end pipeline stall, no progression of data objects was made, i.e., stages D and E continued to hold data object 2 and data object 1, respectively. During clock cycles t+6 and t+7, the data objects 3 and 4 retired from the front-end pipeline but could not be accepted by the back-end pipeline, and were thus stored in the decoupling buffer 103.
A front-end pipeline stall condition is illustrated during clock cycles t+8 through t+10. It can be seen that no data objects are exiting the front-end pipeline, yet the data objects in the back-end pipeline continue their progression uninterrupted, because the back-end pipeline receives data objects, e.g., data objects 4 and 5, from the decoupling buffer 103.
Decoupling buffers are designed to have a variable size, and can be made not to affect the performance of the pipelines when the buffer is empty, i.e., by providing a direct (un-buffered) path between the pipelines, e.g., between stages C and D. For example, in FIG. 2, the decoupling buffer 103 is shown to have a variable size ranging from empty, e.g., during clock cycles t through t+5, to a size sufficient to hold two data objects, e.g., during clock cycles t+7 to t+9.
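The stall handling and variable buffer occupancy described above can be sketched as a producer/consumer model. This is a simplified illustration, not a reproduction of FIG. 2: each pipeline is reduced to "emits one object per clock" or "accepts one object per clock", the stall schedules are made up for the example, and the name `simulate_decoupling` is hypothetical.

```python
from collections import deque


def simulate_decoupling(num_cycles, front_stalls, back_stalls):
    """Model a FIFO decoupling buffer between a front-end and a back-end.

    Assumptions (not from the source): the front-end emits one data object
    per clock unless the clock is in front_stalls; the back-end accepts one
    object per clock unless the clock is in back_stalls.

    Returns a list of (clock, produced, consumed, occupancy) tuples.
    """
    buffer = deque()
    next_obj = 0
    log = []
    for t in range(num_cycles):
        produced = None
        if t not in front_stalls:
            produced = next_obj
            next_obj += 1
            buffer.append(produced)
        consumed = None
        if t not in back_stalls and buffer:
            # With an empty buffer, the object appended above is taken in
            # the same clock, which models the direct (un-buffered) path.
            consumed = buffer.popleft()
        log.append((t, produced, consumed, len(buffer)))
    return log


if __name__ == "__main__":
    # Hypothetical schedule: back-end stalls at clocks 2-3,
    # front-end stalls at clocks 5-6.
    for row in simulate_decoupling(8, front_stalls={5, 6}, back_stalls={2, 3}):
        print("clock=%d produced=%s consumed=%s occupancy=%d" % row)
```

In this schedule the buffer grows to two objects while the back-end is stalled, then drains while the front-end is stalled, so the back-end keeps receiving one object per clock even though the front-end emits nothing during clocks 5 and 6.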
Unfortunately, while the use of a pipeline decoupling buffer has provided a significant improvement in the overall throughput of a pipelined system, the conventional decoupling buffer described above still suffers from significant drawbacks.
In particular, a data object may be made available in an earlier stage of the first pipeline 101, e.g., in stage B. The same data object may be processed by a stage in the second pipeline 102, e.g., by stage D. However, the same data object must travel through other stages of the first pipeline, e.g., the stages B and C, to reach the stage D of the second pipeline 102. That is, stage D ends up waiting for the data object despite the fact that it is ready to process the same. This type of data object, which is operable by a stage of the second pipeline before the data object reaches the last stage of the first pipeline, is hereinafter referred to as "early data". When early data is forced to flow through the last stage of the first pipeline in order to reach the second pipeline, the pipeline system is not running at optimum performance.
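The cost of forcing early data through the remaining front-end stages can be put in numbers with a short sketch. It assumes one clock per stage and no stalls, uses the example stage names A, B and C from the text, and the helper `wasted_cycles` is hypothetical.

```python
# Example front-end stages from the text; one clock per stage is assumed.
FRONT_END_STAGES = ["A", "B", "C"]


def wasted_cycles(ready_after_stage, stages=FRONT_END_STAGES):
    """Clocks a data object still spends in the front-end after the stage
    at whose output it was already usable by the back-end."""
    index = stages.index(ready_after_stage)
    return len(stages) - 1 - index


if __name__ == "__main__":
    for stage in FRONT_END_STAGES:
        print(f"usable after stage {stage}: {wasted_cycles(stage)} clock(s) lost")
```

Under these assumptions, a data object that is usable at the output of stage B loses one clock passing through stage C, while an object that only becomes usable at the output of stage C loses none.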
On the other hand, there may be a data object that does not become available when other data objects are ready to be retired from the first pipeline 101, i.e., available for the second pipeline 102 for processing. This type of data object is referred to herein as "late data". That is, the term late data is defined herein as a data object that becomes available in the first pipeline later in time than at least one other data object from the first pipeline.
For example, in a typical pipelined system, the first pipeline 101 comprises a front-end pipeline that is responsible for fetching the instructions. The second pipeline 102 comprises a back-end pipeline that executes the instructions fetched by the front-end pipeline 101.
While the initial stages, e.g., the stage D, of the back-end pipeline 102 may be ready to receive an instruction that is already fetched and available in a stage of the first pipeline, e.g., stage B, some other information associated with the instruction may not be available at the time the instruction reaches stage B, and would only become available when the instruction finally reaches stage C. In this situation, stage C is provided solely to accommodate the late data, i.e., to add delay so that the instruction does not retire from the front-end pipeline before the late data is available.
For example, the instruction portion of a branch instruction may be fetched and available at stage B. The instruction can be operated upon by the second pipeline at the first stage of execution. However, the branch target of the branch instruction may not yet be calculated, and thus is not available, when the instruction is ready at the output of stage B. Thus, stage C is added as padding to prevent the instruction from entering the back-end pipeline 102 too early. Moreover, the branch target may not be required during the earlier stages of the execution, e.g., in stage D, and may only be required at a later stage, e.g., at stage E.
Because stage C is fixed in place in the first pipeline 101, all instructions (whether or not the i
