Method and apparatus that supports multiple assignment code

Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or... – Reducing an impact of a stall or pipeline bubble

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C712S216000, C712S228000, C712S241000, C712S244000

Reexamination Certificate

active

06360315

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the field of computers, and more specifically to techniques for pipeline control in processors.
2. Description of the Related Art
The ever-growing requirement for high performance computers demands that computer hardware architecture maximize software performance. One technique for improving computer performance is to increase the number of instructions executed per clock cycle by the processor (“processor throughput”). Processor throughput may be increased by pipelining, where the processor is divided into separate processing stages (collectively termed a “pipeline”). Whereas early computer processors executed only one instruction at a time, pipelined designs allow a new instruction to begin execution before a previous instruction is finished executing, increasing the rate at which instructions can be executed.
Two common types of pipeline operand management that can be implemented in a processor are “protected” and “unprotected”. They each have an effect on the portability of code and implementation difficulty. Historically, general-purpose microprocessors have implemented protected pipelines, because the original microprocessors were not pipelined and executed code serially. For code compatibility, later designs were required to maintain this “serial execution model” and implement protected pipelines. Digital signal processors, on the other hand, generally did not migrate code from one generation to the next, so they tend to implement unprotected pipelines.
Software for protected pipelines is written as if each instruction is executed in its entirety before the next one is started. This means that hardware must detect dependencies between instructions in order to correctly execute them in parallel. Interrupts (e.g., page faults or divide-by-zero) and unknown latencies (e.g., cache misses or data-dependent instruction timing) are handled through hardware interlocks that track register targets and register sources. In this scheme, once an instruction is issued, it can write its results at anytime thereafter, because all subsequent instructions expect to see its result, and no subsequent instructions expect to see the prior value of that result register.
The unprotected pipeline approach moves complexity out of the hardware and into the software by requiring that the code understand how instructions are executed in parallel and how long each instruction takes to produce a result, and also that the code handle interrupts and unknown latencies. Thus, rather than tracking dependencies between instructions, the hardware simply assumes there are not any. This saves a significant amount of hardware and exposes the operation of the pipeline to the programmer.
Efficient software can be written for an unprotected pipeline. If a register-writing instruction requires more than one clock cycle to complete, a register may be read after a write to it is initiated, but before that write actually occurs. Since the hardware is not checking dependencies between instructions as in protected pipelines, it can intentionally read “stale” values from the register file before they are updated. The ability to read “stale” values provides two benefits. First, the program gets to see a slightly larger register set for a short period of time, and second, the scheduling of operands in loops can be made simpler (which reduces code size). Multiple Assignment Code is code that relies on reading these “stale” values from registers.
Multiple Assignment Code
Multiple assignment code can increase the apparent size of a register file and decrease the size of iterative code loops. The following example in TABLE 1 shows a typical case where a value is loaded into register R
0
and then added to another value, contained in register R
1
. (In all of the following examples, the LOAD instructions require a two cycle latency period to write their result into the register file, and all other instructions require one cycle to write their result.)
TABLE 1
LOAD
(address)
−> R0;
cycle 1
ADD
R0, R1
−> R2;
cycle 2
FIG. 1A
illustrates a protected pipeline stalling an instruction until a particular value is available. A protected pipeline would detect that register R
0
is not available until cycle
3
(because of the latency) and stall the ADD until it is available
70
.
FIG. 1B
illustrates an unprotected pipeline using a “stale value”. An unprotected pipeline relies on the code itself to properly schedule operand accesses. In this case, because there is no cycle (caused by a non-operation or other instruction) between the LOAD and ADD, the value of R
0
before the end of the latency period, the “stale value”, will be used by the ADD instruction
72
.
A protected pipeline experiences inefficiency by introducing stalls, and an unprotected pipeline experiences inefficiency by including non-operation instructions. If an unprotected pipeline is to appear to operate as a protected pipeline does, it must assume outstanding register results make registers “off limits” until the result has been written. Accordingly, any LOAD requires one register to be allocated the following cycle simply to keep track of where the result is written; for loads with more than one additional cycle of latency and multiple pipelined loads, more than one such register is required. This is not a big problem with two cycle latency loads, but if the latency increases, then so does the number of registers required.
LOAD instructions on the Texas Instruments TMS320C62XX family of processors require five cycles of latency, or at most four registers for scheduling the return of these results. With a register file containing 32 entries, the incremental advantage is to make the register file appear to be about 13 percent larger (four more registers). Future designs, moreover, are likely to increase operation latencies, and not just for LOAD instructions. It is probable that all operations will require more than one cycle of latency, and LOADs could require as many as a dozen. It should be obvious that, without support for a multiple assignment code, a register file would have to be unreasonably large just to schedule operands. Accordingly, what is needed is an apparatus and method that supports multiple assignment code to control a processor pipeline and allows for pipeline stalls and interruptions.
SUMMARY
The present invention is an apparatus and method that supports multiple assignment code, comprising a register that may be assigned multiple values, instructions that receive a value and dispatch a result, a stall queue that stores the results of an instruction during a pipeline stall, and an interrupt queue that stores the state of the register when an interrupt is taken. The value assigned to a register may be available for processing only after a latency period has passed. The value received by an instruction from a register is the most recent value assigned to the register for which the latency period has passed. The result of an instruction that requires an N unit latency period appears in a register file before the Nth following group of instructions are dispatched, and after the group of instructions immediately preceding the present one. The stall queue eliminates the need for global stalls in the context of various pipeline implementations. The interrupt queue allows for interruptibility.
The present invention further comprises a mode where the software identifies that it does not intend to exploit multiple assignment code.


REFERENCES:
patent: 4970641 (1990-11-01), Hester et al.
patent: 5201057 (1993-04-01), Uht
patent: 5325495 (1994-06-01), McLellan
patent: 5588113 (1996-12-01), Johnson
patent: 5682492 (1997-10-01), McFarland et al.
patent: 6035122 (2000-03-01), Ando

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus that supports multiple assignment code does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus that supports multiple assignment code, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus that supports multiple assignment code will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2854361

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.