Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
1998-12-15
2001-01-23
Treat, William M. (Department: 2783)
Electrical computers and digital processing systems: processing
Processing control
Branching
C712S244000
Reexamination Certificate
active
06178499
ABSTRACT:
FIELD OF THE INVENTION
The present invention pertains generally to the operation of multiple execution unit microprocessors, and pertains more particularly to methods usable on such microprocessors for exiting and reentering software pipelined loops in response to runtime conditions.
BACKGROUND OF THE INVENTION
A microprocessor is a circuit that combines the instruction-handling, arithmetic, and logical operations of a computer on a single chip. A digital signal processor (DSP) is a microprocessor optimized to handle large volumes of data efficiently. Such processors are central to the operation of many of today's electronic products, such as high-speed modems, high-density disk drives, digital cellular phones, and complex automotive systems, and will enable a wide variety of other digital systems in the future. The demands placed upon DSPs in these environments continue to grow as consumers seek increased performance from their digital products.
Designers have succeeded in increasing the performance of DSPs generally by increasing clock frequencies, by removing architectural bottlenecks in DSP circuit design, by incorporating multiple execution units on a single processor circuit, and by developing optimizing compilers that schedule operations to be executed by the processor in an efficient manner. As further increases in clock frequency become more difficult to achieve, designers have embraced the multiple execution unit processor as a means of achieving enhanced DSP performance. For example,
FIG. 1
shows a block diagram of a DSP having eight execution units, L
1
, S
1
, M
1
, D
1
, L
2
, S
2
, M
2
, and D
2
. These execution units operate in parallel to perform multiple operations, such as addition, multiplication, addressing, logic functions, and data storage and retrieval, simultaneously.
Theoretically, the performance of a multiple execution unit processor is proportional to the number of execution units available. However, utilization of this performance advantage depends on the efficient scheduling of operations so that most of the execution units have a task to perform each clock cycle. Efficient scheduling is particularly important for looped instructions, since in a typical runtime application the processor will spend the majority of its time in loop execution.
One effective way in which looped instructions can be arranged to take advantage of multiple execution units is with a software pipelined loop. In a conventional scalar loop, all instructions execute for a single iteration before any instructions execute for following iterations. In a software pipelined loop, the order of operations is rescheduled such that one or more iterations of the original loop begin execution before the preceding iteration has finished. Referring to FIG.
2
a
, a simple loop containing
7
iterations of the operations A, B, and C is shown. FIG.
2
b
depicts an alternative execution schedule for the loop of FIG.
2
a
, where a new iteration of the original loop is begun each clock cycle. For clock cycles I
3
-I
7
, the same instruction (A
n
,B
n−1
,C
n−2
) is executed each clock cycle in this schedule; if multiple execution units are available to execute these operations in parallel, the code can be restructured to perform this repeated instruction in a loop. The repeating pattern of A,B,C (along with loop control operations) thus forms the loop kernel of a new, software pipelined loop that executes the instructions at clock cycles I
3
-I
7
in
5
loops. FIG.
2
c
depicts such a loop. The instructions executed at clock cycles I
1
and I
2
of FIG.
2
b
must still be executed first in order to properly “fill” the software pipelined loop; these instructions are referred to as the loop prolog. Likewise, the instructions executed at clock cycles I
8
and I
9
of FIG.
2
b
must still be executed in order to properly “drain” the software pipeline; these instructions are referred to as the loop epilog (note that in many situations the loop epilog may be deleted through a technique known as speculative execution).
The simple example of FIGS.
2
a-
2
c
illustrates the basic principles of software pipelining, but other considerations such as dependencies and conflicts may constrain a particular scheduling solution. For an explanation of software pipelining in more detail, see Vicki H. Allan,
Software Pipelining,
27 ACM Computing Surveys 367 (1995).
Another technique commonly used with multiple execution unit processors and software pipelined loops is multiple assignment of registers. Referring again to
FIG. 1
, registers A
0
-A
15
and B
0
-B
15
are connected to execution units L
1
, S
1
, M
1
, D
1
, L
2
, S
2
, M
2
, and D
2
. These registers are used to store, e.g., operands, results of operations, counter values, and conditional values during execution. Typically, single assignment of such registers is preferred; in single assignment, once an execution unit utilizes a register in an operation, that register may not be re-used until the original operation completes. With multiple assignment of registers, however, registers may be re-used according to known hardware delays. For example, a first operation may begin execution on unit D
1
to load a value into register A
4
from memory, an operation that requires five clock cycles to complete (if an operation requires more than one clock cycle to complete, the additional clock cycles are typically referred to as delay slots). With single assignment, A
4
could not be used for any other purpose during those five clock cycles, although the value loaded from memory will not actually appear in register A
4
until the end of the fifth cycle. With multiple assignment, A
4
could be used for other purposes during the delay slots of the load from memory operation, as long as the register is freed prior to completion of the load operation.
Multiple assignment of registers is often desirable in pipelined loops, where the overlapping of many multiple-clock cycle instructions such as loads, multiplies, and branches is required. Single assignment of registers under these conditions may require that parallelism be decreased, as some schedules will be impossible to implement because of the large number of registers required by single assignment. Multiple assignment will often result in reduced register pressure and increased parallelism under these conditions. The main drawback of multiple assignment is that it requires that the instructions execute in an expected order—if an unexpected order of execution is encountered, there is a high possibility that register data will be corrupted.
SUMMARY OF THE INVENTION
In microprocessor operation, it is often desirable to allow for prompt response to the occurrence of runtime conditions (e.g. interrupts, data conditions, or processing irregularities). Unfortunately, because of the data corruption condition that may result from unexpected execution order, responses to runtime conditions such as interrupts must generally be disabled around code blocks utilizing multiple assignment of registers. It has now been recognized that if such a code block is a software pipelined loop requiring a large number of iterations to complete, the time delay in responding to runtime conditions due to disabling around such a code block may be unacceptable.
The prior art has attempted to solve this problem by using software pipelined loops with single assignment of registers. Although this approach does allow a microprocessor to respond relatively quickly to runtime conditions such as interrupts, it is recognized herein that this approach often results in suboptimal scheduling for a software pipelined loop itself. Since performance of a software pipelined loop may be as critical as interruptability, the single register assignment approach may also be unacceptable.
The prior art has also attempted to solve this problem by adding state-saving hardware to a processor architecture. This specialized hardware saves the entire operating state of the processor before responding to an interrupt, and restores this state up
Scales Richard H.
Stotzer Eric
Brady III W. James
Marshall, Jr. Robert D.
Telecky , Jr. Frederick J.
Texas Instruments Incorporated
Treat William M.
LandOfFree
Interruptable multiple execution unit processing during... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Interruptable multiple execution unit processing during..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Interruptable multiple execution unit processing during... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2470371