Changing instruction order by reassigning only tags in order...

Electrical computers and digital processing systems: processing – Instruction issuing

Reexamination Certificate


Details

C712S215000, C712S241000

active

06813704

ABSTRACT:

TECHNICAL FIELD OF THE INVENTION
The present invention is directed, in general, to digital signal processors (DSPs) and, more specifically, to an instruction queue for executing and retiring instructions in a DSP.
BACKGROUND OF THE INVENTION
Over the last several years, DSPs have become an important tool, particularly in the real-time modification of signal streams. They have found use in all manner of electronic devices and will continue to grow in power and popularity.
As time has passed, greater performance has been demanded of DSPs. In most cases, performance increases are realized by increases in speed. One approach to improve DSP performance is to increase the rate of the clock that drives the DSP. As the clock rate increases, however, the DSP's power consumption and temperature also increase. Increased power consumption is expensive, and intolerable in battery-powered applications. Further, high circuit temperatures may damage the DSP. The DSP clock rate may not increase beyond a threshold physical speed at which signals may traverse the DSP. Simply stated, there is a practical maximum to the clock rate that is acceptable to conventional DSPs.
An alternate approach to improve DSP performance is to increase the number of instructions executed per clock cycle by the DSP (“DSP throughput”). One technique for increasing DSP throughput is pipelining, which calls for the DSP to be divided into separate processing stages (collectively termed a “pipeline”). Instructions are processed in an “assembly line” fashion in the processing stages. Each processing stage is optimized to perform a particular processing function, thereby causing the DSP as a whole to become faster.
“Superpipelining” extends the pipelining concept further by allowing the simultaneous processing of multiple instructions in the pipeline. Consider, as an example, a DSP in which each instruction executes in six stages, each stage requiring a single clock cycle to perform its function. Six separate instructions can therefore be processed concurrently in the pipeline; i.e., the processing of one instruction is completed during each clock cycle. The instruction throughput of an n-stage pipelined architecture is therefore, in theory, n times greater than the throughput of a non-pipelined architecture capable of completing only one instruction every n clock cycles.
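The throughput claim above can be checked with a small cycle-count sketch. It assumes an idealized pipeline with no stalls, in which the first instruction takes n cycles to complete and each later instruction completes one cycle after its predecessor; the function names are illustrative and do not come from the patent.

```python
def cycles_unpipelined(num_instructions: int, stages: int) -> int:
    """Cycles needed when each instruction must finish all stages
    before the next one may start."""
    return num_instructions * stages


def cycles_pipelined(num_instructions: int, stages: int) -> int:
    """Cycles needed in an ideal, stall-free pipeline: `stages` cycles
    to fill the pipeline, then one completion per cycle."""
    if num_instructions == 0:
        return 0
    return stages + num_instructions - 1


def speedup(num_instructions: int, stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return (cycles_unpipelined(num_instructions, stages)
            / cycles_pipelined(num_instructions, stages))


if __name__ == "__main__":
    # Six-stage example from the text, for increasingly long instruction streams.
    for count in (6, 100, 10_000):
        print(count, round(speedup(count, 6), 3))
```

For long instruction streams the ratio approaches, but never reaches, the stage count n, which is why the text qualifies the n-fold gain as theoretical.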
Another technique for increasing overall DSP speed is “superscalar” processing. Superscalar processing calls for multiple instructions to be processed per clock cycle. Assuming that instructions are independent of one another (the execution of each instruction does not depend upon the execution of any other instruction), DSP throughput is increased in proportion to the number of instructions processed per clock cycle (“degree of scalability”). If, for example, a particular DSP architecture is superscalar to degree three (i.e., three instructions are processed during each clock cycle), the instruction throughput of the DSP is theoretically tripled.
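As a rough illustration of the proportionality described above, the sketch below counts cycles for an idealized machine that issues a fixed number of fully independent instructions per clock cycle. The stall-free model and the function name are assumptions made for the example, not details taken from the patent.

```python
import math


def ideal_cycles(num_instructions: int, stages: int, degree: int) -> int:
    """Cycles for an ideal pipelined machine that issues `degree`
    independent instructions per cycle, with no stalls."""
    if num_instructions == 0:
        return 0
    issue_cycles = math.ceil(num_instructions / degree)
    return stages + issue_cycles - 1


if __name__ == "__main__":
    scalar = ideal_cycles(3000, stages=6, degree=1)
    degree_three = ideal_cycles(3000, stages=6, degree=3)
    # The ratio approaches 3 as the instruction stream grows.
    print(scalar, degree_three, round(scalar / degree_three, 3))
```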
These techniques are not mutually exclusive; DSPs may be both superpipelined and superscalar. However, operation of such DSPs in practice is often far from ideal, as instructions tend to depend upon one another and are also often not executed efficiently within the pipeline stages. In actual operation, instructions often require varying amounts of DSP resources, creating interruptions (“bubbles” or “stalls”) in the flow of instructions through the pipeline. Consequently, while superpipelining and superscalar techniques do increase throughput, the actual throughput of the DSP ultimately depends upon the particular instructions processed during a given period of time and the particular implementation of the DSP's architecture.
The speed at which a DSP can perform a desired task is also a function of the number of instructions required to code the task. A DSP may require one or many clock cycles to execute a particular instruction. Thus, in order to enhance the speed at which a DSP can perform a desired task, both the number of instructions used to code the task as well as the number of clock cycles required to execute each instruction should be minimized.
Among superscalar DSPs, some execute instructions in order (so-called “in-order issue” DSPs). In such DSPs, each instruction is written into the slots of a register within an instruction queue of an instruction logic circuit and marked with a “tag” to identify the order of the instructions. Typically, such tags are numerically arranged to specify only the order that the instructions are written into the registers, and not the order of execution of the instructions. At each clock cycle, one or more instructions within the registers are executed (“grouped”) in accordance with grouping rules embedded in the DSP. After being grouped, if an instruction is no longer needed, it is simply overwritten (“retired”).
Unfortunately, in even the most advanced DSPs found in the prior art, the re-ordering of instructions within the registers of an instruction logic circuit suffers from significant problems. If some instructions within the instruction register are grouped and retired in a given clock cycle in an order that differs from the order in which the instructions were originally written into the register, the remaining instructions are re-ordered within the individual slots of the register.
To illustrate this point, if four instructions are written into four consecutive slots, the instructions are conventionally identified by four consecutive tags associated with the slots. If only the first and third instructions are grouped and retired in a first clock cycle, the second and fourth instructions are re-ordered, or “shifted,” within the slots. More specifically, the second and fourth instructions are shifted into the first and second slots, and are thus associated with the first and second tags. Those skilled in the art understand that, since each instruction may comprise a number of data bits, shifting remaining instructions from slot to slot within a register so that they are associated with appropriate tags requires shifting a relatively large number of bits after each clock cycle.
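The four-slot example above can be modeled with a short sketch of the conventional shifting behavior. It assumes that a slot's position doubles as its tag and that each instruction is 32 bits wide; both the width and the names used here are hypothetical, chosen only to make the cost of shifting visible.

```python
INSTRUCTION_BITS = 32  # assumed instruction width, for illustration only


def retire_and_shift(slots, retired_positions):
    """Model a conventional queue in which slot position equals tag.

    Retired instructions are dropped and the survivors are shifted
    toward slot 0. Returns the new slot contents and the number of
    instruction bits that had to move.
    """
    survivors = [(pos, ins) for pos, ins in enumerate(slots)
                 if pos not in retired_positions]
    new_slots = [None] * len(slots)
    bits_moved = 0
    for new_pos, (old_pos, ins) in enumerate(survivors):
        new_slots[new_pos] = ins
        if new_pos != old_pos:
            bits_moved += INSTRUCTION_BITS  # whole instruction is rewritten
    return new_slots, bits_moved


if __name__ == "__main__":
    # Retire the first and third instructions, as in the text.
    print(retire_and_shift(["I1", "I2", "I3", "I4"], {0, 2}))
```

Under these assumptions, retiring the first and third of four instructions forces both survivors to move, so two full instruction widths are shifted in a single cycle.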
The shifting of such large numbers of bits after each clock cycle typically leads to routing congestion within the instruction queue. In addition, shifting a large number of bits may also result in other routing problems, due primarily to a combination of the complexity of the routing circuit employed and the number of bits shifted. Of course, such routing congestion and other problems typically lead to undesired timing delay within the DSP.
Accordingly, what is needed in the art is an instruction queue for a DSP or other processor that consumes less power than those found in the prior art.
SUMMARY OF THE INVENTION
To address the above-discussed deficiencies of the prior art, the present invention provides for use in an instruction queue having a plurality of instruction slots, a mechanism for queueing and retiring instructions. In one embodiment, the mechanism includes a plurality of tag fields corresponding to the plurality of instruction slots, and control logic, coupled to the tag fields, that assigns tags to the tag fields to denote an order of instructions in the instruction slots. In addition, the mechanism includes a tag multiplexer, coupled to the control logic, that changes the order by reassigning only the tags.
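A minimal software sketch of the idea of changing order by reassigning only tags follows. It is not the patented circuit: the class `TaggedQueue`, its methods, and the policy of renumbering surviving tags consecutively from zero are assumptions made for illustration. The point it shows is that instructions stay in their slots while only the small tag fields are rewritten.

```python
class TaggedQueue:
    """Illustrative queue in which instructions never move between slots."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots  # instruction storage, never shifted
        self.tags = [None] * num_slots   # one tag field per slot; None = free

    def enqueue(self, instruction):
        """Write into the first free slot and give it the next tag."""
        free = self.tags.index(None)     # raises ValueError if the queue is full
        next_tag = sum(tag is not None for tag in self.tags)
        self.slots[free] = instruction
        self.tags[free] = next_tag

    def _live(self):
        """(tag, slot) pairs for occupied slots, in tag order."""
        return sorted((tag, slot) for slot, tag in enumerate(self.tags)
                      if tag is not None)

    def retire(self, retired_tags):
        """Free the retired slots, then renumber only the surviving tags."""
        retired_tags = set(retired_tags)
        for slot, tag in enumerate(self.tags):
            if tag in retired_tags:
                self.tags[slot] = None
                self.slots[slot] = None
        for new_tag, (_, slot) in enumerate(self._live()):
            self.tags[slot] = new_tag    # tag bits change; instruction stays put

    def in_order(self):
        """Instructions in the order denoted by their tags."""
        return [self.slots[slot] for _, slot in self._live()]


if __name__ == "__main__":
    queue = TaggedQueue(4)
    for name in ("I1", "I2", "I3", "I4"):
        queue.enqueue(name)
    queue.retire({0, 2})                 # retire the first and third instructions
    print(queue.slots, queue.tags, queue.in_order())
```

In this sketch, after the first and third instructions retire, the second and fourth remain in their original slots and simply receive the first and second tags, so no instruction bits are shifted.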
In one embodiment of the present invention, the mechanism further includes loop detection logic, coupled to the control logic, that prevents ones of the instructions that are in a loop from being retired. In addition, the loop detection logic may prevent multiple instructions that are in a loop from being retired.
In one embodiment of the present invention, the mechanism further includes an input ordering multiplexer coupled to the control logic and to the plurality of instruction slots and configured to write the instructions into the plurality of instruction slots. In a related embodiment, the mechanism further includes an output ordering multiplexer coupled to
