Mechanism for software pipelining loop nests

Data processing: software development – installation – and managem – Software program development tool – Programming language

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C717S150000

Reexamination Certificate

active

06820250

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates to mechanisms for optimizing computer code, and in particular, to mechanisms for improving the performance of software pipelined loops.
2. Background Art
Software pipelining is a method for scheduling non-dependent instructions from different logical iterations of a program loop to execute concurrently. Overlapping instructions from different logical iterations of the loop increases the amount of instruction level parallelism (ILP) in the program code. Code having high levels of ILP uses the execution resources available on modern, superscalar processors more effectively.
A loop is software pipelined by organizing the instructions of the loop body into stages of one or more instructions each. These stages form a software pipeline having a pipeline depth equal to the number of stages (the “stage count” or “SC”) of the loop body. The instructions for a given loop iteration enter the software pipeline stage by stage, on successive initiation intervals (II), and new loop iterations begin on successive initiation intervals until all iterations of the loop have been started. Each loop iteration is thus processed in stages through the software pipeline in much the same way that an instruction is processed in stages through a processor pipeline. When the software pipeline is full, stages from SC sequential loop iterations are in process concurrently, and one loop iteration completes every initiation interval. Various methods for software pipelining loops are discussed, for example, in B.R. Rau, M.S. Schlansker, P.P. Tirumalai, Code Generation Schema for Modulo Scheduled Loops IEEE MICRO Conference 1992 (Portland, Oregon).
FIG. 1A
is a schematic representation of a software pipeline as it processes an exemplary loop through 100 iterations (trip count=100). The instructions of the loop are organized into five stages, labeled A through E (SC=5), and each stage is indexed by the logical iteration to which it corresponds. For example, A(
1
) is the first iteration of the loop body instruction(s) in stage A, and D(
98
) is the 98
th
iteration of the loop body instruction(s) in stage D. The software pipeline fills during a prolog phase
120
, as stages from the first iterations of the loop begin on successive initiation intervals, II(
1
)-II(
4
). The software pipeline is full during a kernel phase
130
from II(5) through II(100), and it empties during an epilog phase
140
, as the last four iterations complete during II(
101
)-II(
104
).
In practice, the different stages of the loop body do not physically traverse a pipeline, as indicated in FIG.
1
A. For example, a software pipelined loop that is implemented through predication provides all stages of the loop on each initiation interval. The pipeline filling/emptying behavior is effected by predicates associated with the stages (“stage predicates”).
FIG. 1B
represents the software pipeline of
FIG. 1A
in compact, predicated form. Here, stages A through E are associated with predicates p1 through p5, respectively. Setting the predicates p1 through p5 true on successive initiation intervals activates the associated stages and replicates the pipeline filling behavior of prolog phase
120
. Similarly, setting predicates p1 through p5 false on successive initiation intervals, deactivates the associated stages and replicates the pipeline emptying behavior of epilog phase
140
. During kernel phase
130
, all predicates are true, and an instruction stage from each of five sequential loop iterations is active.
Predication thus simplifies the scheduling issues associated with software pipelined loops. All stages of the software pipeline are present in each initiation interval. Only instructions in those stages that are activated by their associated stage predicates update the architectural state of the processor (“processor state”) when they retire. Instructions in stages that are deactivated by their associated stage predicates are treated as no-operations (NOPs), and any results they generate do not update the processor state.
The dynamics of a software pipelined loop depend, in part, on the relative sizes of its trip count (TC) and its stage count (SC). A kernel phase, in which a new iteration begins and one iteration ends on each iteration interval, is present only for loops in which the trip count is greater than the stage count. For loops having relatively short trip counts, e.g. those in which the number of iterations is less than SC, the software pipeline never fills because there are more stages than there are loop iterations to fill the stages. Consequently, a short trip count loop does not have a kernel phase.
Independent of the number of iterations and stages, prolog and epilog phases
120
,
140
, respectively, represent overhead associated with filling and emptying the software pipelined loop. The empty stages in
FIG. 1A
during prolog and epilog phases
120
and
140
, respectively, or the deactivated stages in
FIG. 1B
, represent unused processor resources. These unused resources can lead to significant efficiency losses when the software pipelined loop is nested within an outer loop, because the overhead is incurred on each iteration of the outer loop.
FIG. 1C
represents a nested loop
170
that includes the software pipelined loop of
FIG. 1A
within a counted outer loop.
FIG. 1D
represents a nested loop
180
that includes a short trip count loop within a counted outer loop. In both cases, overhead associated with inner loop prolog and epilog phases
120
and
140
, respectively, is incurred on each iteration of the outer loop. Between each iteration of the inner loop, the outer loop implements any other instructions in its loop body, including those necessary to set up the inner loop for execution.
The present invention addresses these and other inefficiencies related to implementing nested loops using software pipelining techniques.
SUMMARY OF THE INVENTION
The present invention supports efficient processing of nested loops that include a software pipelined inner loop within an outer loop. The inner loop is executed without draining its software pipeline until the final iteration of the outer loop is reached. This eliminates the overhead associated with prolog and epilog stages of the inner loop for all but the first and last iterations, respectively, of the outer loop.
In accordance with the present invention, the software pipelined inner loop of the nested loop is executed in a “no-drain” mode for the first N−1 iterations of an N-iteration outer loop. For the last iteration of the outer loop, the software pipelined inner loop is executed in a “drain” mode.
For one embodiment of the present invention, no-drain mode is established by initializing an epilog counter for the inner loop to a first value for the first N−1 iterations of the outer loop. When the final iteration of the outer loop is detected, the epilog counter for the inner loop is set to a second value. The first value indicates that the pipeline should not be drained. The second value indicates the actual number of stages in the software pipelined inner loop to be drained. For other embodiments, the values written to the epilog counter for the drain and no-drain modes may be adjusted to account for instruction dependencies between the inner and outer loops.


REFERENCES:
patent: 5230053 (1993-07-01), Zaiki
patent: 5361354 (1994-11-01), Greyzck
patent: 5704053 (1997-12-01), Santhanam
patent: 5774370 (1998-06-01), Giomi
patent: 5790859 (1998-08-01), Sarkar
patent: 5842022 (1998-11-01), Nakahira et al.
patent: 5852734 (1998-12-01), Komatsu et al.
patent: 5901318 (1999-05-01), Hsu
patent: 5920724 (1999-07-01), Chang
patent: 5950007 (1999-09-01), Nishiyama et al.
patent: 5958048 (1999-09-01), Babaian et al.
patent: 5974538 (1999-10-01), Wilmot, II
patent: 6016399 (2000-01-01), Chang
patent: 6055627 (2000-04-01), Kyushima et al.
patent: 6070011 (2000-05-01), Liu et al.
patent: 6074433 (2000-06-01), Haraguchi et al.
patent: 6148437 (2000-11-01), Shah et al.
pa

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Mechanism for software pipelining loop nests does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Mechanism for software pipelining loop nests, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Mechanism for software pipelining loop nests will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3346216

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.