Electrical computers and digital processing systems: support – Synchronization of clock or timing signals – data – or pulses
Reexamination Certificate
2000-10-31
2004-02-10
Heckler, Thomas M. (Department: 2185)
Electrical computers and digital processing systems: support
Synchronization of clock or timing signals, data, or pulses
C713S400000, C712S029000, C712S224000, C714S100000
Reexamination Certificate
active
06691240
ABSTRACT:
BACKGROUND
1. Field of the Invention
The invention relates to methods and apparatus for improving the scheduling of instructions in a pipelined loop. In particular, the invention relates to methods and apparatus for implementing variable length delay instructions.
2. Description of Related Art
Generally, a software pipelined loop implements multiple loop iterations, which execute in parallel. This is accomplished by exploiting the fine-grained, instruction level parallelism (IPL) that is available across the branches of a loop. The goal of software pipelining is to maximize the number of loop iterations executing in parallel and, thereby, increase loop throughput. Software pipelining is an instruction scheduling technique. Pipelining is a method for executing instructions in an assembly-line fashion. It is a design technique for reducing the effective propagation delay per series of stages, each of which performs a portion of the operation. A series of data is typically clocked through the pipeline in sequential fashion, advancing one stage per clock period. A scheduling algorithm for use in software pipelining constructs a legal instruction schedule.
The rate at which new loop iterations are started is called the iteration interval (II). The minimum iteration interval (MII) is the lower bound on the iteration interval for which a valid schedule may be established. Modulo scheduling is a known technique for establishing software pipelined-loops. In modulo scheduling, the schedule for a single iteration is divided into a sequence of stages, each having the length of the iteration interval. In the steady state execution of the software pipelined-loop, each of the stages executes parallel.
The instruction schedule for a software-pipelined loop has at least three (3) components: a kernel, a prolog, and an epilog. The kernel is the instruction schedule that executes in the steady state. In the kernel, an instruction scheduled at the kth cycle executes in parallel with all the instructions scheduled at the kth cycle modulo iteration interval. This is known as the modulo constraint, and this constraint defines the basic feature of “modulo scheduling.” The prolog and epilog are the instruction schedules that setup, i.e., “prime” the software pipelines, and conclude the execution of the loop kernel, i.e., “drain” the software pipelines, respectively.
Referring to
FIG. 1
, an example of modulo scheduling is depicted. A, B, C, D, and E are the stages of a loop iteration. In the steady state, all five (5) stages execute in parallel. However, each executes from a different iteration, i.e., at a different stage with respect to the previous iteration. A stage in the software pipeline is completed after the cycles required for each iteration interval (II) are completed. A single iteration produces a predetermined result in the time required to complete five (5) stages. In the steady state of the software-pipelined loop, however, a result is available upon completion of each stage, i.e., after the cycles required to complete each iteration interval. This is one motivation for performing software pipelining. The minimum iteration interval required for a software pipelined-loop is determined by resource and dependency constraints.
The lifetime of any value in any such loop is the number of cycles between the placement in the schedule for the operation that defines a particular value (the “def”) and the last operation that uses that value (the “last use”). The value must be “alive,” i.e., available for recovery, until its last use. A schedule is invalidated if a value has a lifetime that is greater than iteration interval. If a value “lives” too long, i.e., its lifetime is greater than the iteration interval, an operation in a successive iteration executing in parallel with the current iteration may overwrite the value before its last use in the current iteration. Referring to
FIG. 2
, two (2) iterations are shown from the execution trace of a software-pipelined loop. Each S
i
is an operation that computes a result in one cycle. Assuming an architecture similar to the c6x architecture, available from Texas Instruments, Inc., of Dallas, Tex., a variable may be written and read on the same cycle, such that the read step logically occurs before the write step. The operations depicted in
FIG. 2
are annotated to indicate whether they define or use the variable v.
A register is a small area of a high-speed memory, located within a processor or electronic device that is used for temporarily storing information or instructions. Each register is given a name, contains a few bytes of information and is referenced by programs. Values (v1, v2, v3, . . . ) generated in each loop iteration exist sequentially in a single register.
Referring again to
FIG. 2
, however, the lifetimes of values v1 and v2 are depicted as overlapping. This overlapping causes v2 to overwrite (or kill) v1 before the last values are used. Such excessive lifetimes may occur as an artifact of the scheduling algorithm or due to data dependencies that split and then join in the data precedence graph for the loop block. The values for v1 and v2 (actually v1, v2, v3, . . . , vN; where N equals the number of loop iterations, i.e., the loop trip count) are sharing the same register. However, only one value v# may exist on the register at a time.
This problem is experienced in VLIW architectures. Moreover, this problem is experienced in architectures that implement and support instruction level parallelism and use instruction scheduling techniques, such as software pipelining, to exploit parallelism.
One technique for solving the problem of excessive lifetimes is to insert move instructions into the loop in order to write the value to another register before the value is overlapped by a successive iteration. Nevertheless, the lifetime of a register value may be so long that it may take multiple move instructions to split the register value into multiple live ranges and thereby avoid overlapping. Moreover, insertion of these additional move instructions may reduce the performance of the loop and may further complicate the scheduling and register allocation processes. Another technique is modulo variable expansion. In this technique, the kernel is unrolled, and values with overlapping lifetimes are renamed.
Pipelined loops may be especially important for the efficient and effective programming of digital signal processors (DSPs), such as those used extensively in the telecommunications industry. In DSPs, operating time and information storage are critical constraints on programming. In particular, information storage in registers and memory is extremely limited, and allocation of these resources limits the number of operations, which may be performed simultaneously. Further, the need to store information to and to retrieve information memory imposes significant time constraints on DSP operation. The time which a DSP spends “waiting” for the retrieval of information from memory limits the speed with which the DSP may complete operations and, therefore, may limit the total number of operations which the DSP may execute within a given period of time.
SUMMARY OF THE INVENTION
Thus, a need has arisen for a method for eliminating, reducing, or compensating for the problem of excessive and overlapping value lifetimes by writing information (values) to another register before the information (values) is (are) overwritten by a successive iteration. Further, a need has arisen for a method of writing information one register to another register with a variable delay operand. Further, a need has arisen for a method and apparatus for delaying the use of information without storing it to memory and retrieving it or holding it for a prolonged period within a single register.
An illustrative embodiment of the present invention seeks to provide a microprocessor, and method for operating a microprocessor, that improves digital signal processing performance. Aspects of the invention are specified in the claims. In an embodiment of the present invention, a
Hoyle David
Stotzer Eric J.
Zbiciak Joseph
Brady III W. James
Heckler Thomas M.
Marshall, Jr. Robert D.
Patel Nitin C
Telecky , Jr. Frederick J.
LandOfFree
System and method of implementing variabe length delay... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with System and method of implementing variabe length delay..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method of implementing variabe length delay... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3281346