Method and apparatus for instruction queue compression

Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking, monitoring or... – Scoreboarding, reservation station, or aliasing


Details

Classification: C712S226000, C712S206000
Type: Reexamination Certificate
Status: active
Patent number: 06185672

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to microprocessors and, more particularly, to compressing instruction queues that are accessed in an out-of-order fashion.
2. Description of the Relevant Art
Superscalar microprocessors are capable of attaining performance characteristics which surpass those of conventional scalar processors. They achieve this greater performance by concurrently executing more than one instruction per clock cycle, and they use a number of different techniques to do so. Pipelining is one such technique. Pipelining refers to dividing the execution of an instruction into a number of stages. This allows multiple instructions to be overlapped during the execution process. For example, one pipeline stage may be configured to fetch instructions from the microprocessor's instruction cache. Another pipeline stage may be configured to decode the fetched instructions. Decoding typically refers to determining the boundaries of the instruction and what (if any) operands the instruction requires (e.g., source and destination operands). Additional pipeline stages may include instruction execution and instruction retiring (i.e., storing the results generated by executing the instruction). After an instruction completes the first pipeline stage, it advances to the second stage while the next instruction in program order enters the first pipeline stage.
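The overlap described above can be made concrete with a toy model (a sketch only; the four generic stage names and the single-issue assumption below are not drawn from this patent): each in-flight instruction advances one stage per cycle while the next instruction in program order enters the fetch stage behind it.

    #include <stdio.h>

    /* Toy in-order pipeline: stage 0=fetch, 1=decode, 2=execute, 3=retire.
     * These generic stages are an illustrative assumption only. */
    #define N_INSNS  4
    #define N_STAGES 4

    int main(void) {
        int stage[N_INSNS];
        for (int i = 0; i < N_INSNS; i++)
            stage[i] = -1;                      /* -1: not yet fetched */

        for (int cycle = 0; cycle < N_INSNS + N_STAGES; cycle++) {
            /* Every in-flight instruction advances one stage per clock. */
            for (int i = 0; i < N_INSNS; i++)
                if (stage[i] >= 0 && stage[i] < N_STAGES)
                    stage[i]++;
            /* One new instruction enters the fetch stage each cycle. */
            if (cycle < N_INSNS)
                stage[cycle] = 0;

            printf("cycle %d:", cycle);
            for (int i = 0; i < N_INSNS; i++)
                printf("  insn%d=%2d", i, stage[i]);
            printf("\n");       /* the printout shows several instructions in flight at once */
        }
        return 0;
    }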
In addition to pipelining, most superscalar microprocessors are configured with multiple functional units. The functional units (also referred to as functional pipelines or execution units) are responsible for executing most instructions. For example, a superscalar microprocessor may have two or more add/subtract functional units, each configured to execute a separate instruction in parallel. Examples of these instructions include integer operations such as addition and subtraction, logic functions such as ORs and ANDs, and other simple operations such as shifts and compares.
In addition to pipelining and multiple functional units, many superscalar microprocessors rely upon branch prediction to further improve performance. Branch prediction attempts to prevent the functional units from stalling. Branch instructions control the flow of program execution and dictate which instructions are executed and which are not. During program execution, when a branch instruction is received, the microprocessor determines whether the instruction is “taken” or “not taken”. When a branch instruction is taken, the next instruction fetched is non-sequential (i.e., the next instruction is read from a destination address specified in the branch instruction). Conversely, when a branch instruction is “not taken”, the destination address in the branch instruction is ignored, and the next instruction fetched is the instruction immediately following the branch instruction. Branch prediction attempts to predict whether a branch instruction will be “taken” or “not taken” before the branch instruction has actually been executed.
The advantages of branch prediction are particularly evident in a pipelined microprocessor. Without branch prediction, when a branch instruction completes the first stage of an instruction processing pipeline, the microprocessor will have to wait until the branch instruction completes execution before fetching the next instruction. Thus, the first pipeline stage would sit idle (i.e., stall) while waiting for the results of the branch instruction. Branch prediction allows the first pipeline stage to fetch the predicted “target” instruction without stalling. If the prediction is incorrect, all pipeline stages are flushed and the microprocessor begins anew by fetching the “correct” next instruction according to the results of the executed branch instruction. While branch prediction techniques vary, most achieve at least a 90% prediction accuracy rate.
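The passage above does not tie the invention to any particular prediction scheme, but the idea can be illustrated with one widely used technique: a table of two-bit saturating counters indexed by the branch address. The table size, function names, and training loop below are assumptions made for this sketch, not part of the claimed invention.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative two-bit saturating counter predictor (an assumption, not
     * the mechanism claimed by the patent).
     * Counter values 0-1 predict "not taken"; values 2-3 predict "taken". */
    #define TABLE_SIZE 1024
    static uint8_t counters[TABLE_SIZE];            /* all start at 0 */

    static bool predict(uint32_t branch_addr) {
        return counters[branch_addr % TABLE_SIZE] >= 2;
    }

    /* Called after the branch actually resolves, to train the predictor. */
    static void train(uint32_t branch_addr, bool taken) {
        uint8_t *c = &counters[branch_addr % TABLE_SIZE];
        if (taken && *c < 3)
            (*c)++;
        else if (!taken && *c > 0)
            (*c)--;
    }

    int main(void) {
        uint32_t pc = 0x400123;                     /* hypothetical branch address */
        for (int i = 0; i < 6; i++) {
            bool actual = (i < 5);                  /* a loop branch: taken 5 times */
            printf("predicted %d, actual %d\n", predict(pc), actual);
            train(pc, actual);
        }
        return 0;
    }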
Another technique used in superscalar microprocessors is out-of-order execution. Software programs consist of a number of instructions that are executed in a particular order. In some cases, if this order is changed, the functionality of the program may be changed. Turning now to FIG. 1A, a sample portion of a program is illustrated. As the figure illustrates, instructions are ordered to achieve a desired result (i.e., A=4, B=6, C=10, and D=9). Turning now to FIG. 1B, an example of out-of-order execution is shown. The instruction “D=A+5” is executed out-of-order, but the functionality of the original code segment is maintained (i.e., the same results are achieved as with the original code segment). This is possible because the instruction “D=A+5” is not dependent upon either of the instructions “B=2” or “B=A+B”. However, not all instructions are capable of out-of-order execution. Turning now to FIG. 1C, an example of improper out-of-order instruction execution is shown. In this example, the instruction “C=B+A” is executed out of order, before instructions “B=A+B” and “D=A+5”. This changes the functionality of the original code segment (i.e., resulting in C=6 instead of C=10).
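What distinguishes the legal reordering of FIG. 1B from the illegal one in FIG. 1C is whether the reordered instructions share data dependencies. A minimal sketch of such a check appears below; the three-operand instruction format and the can_reorder helper are assumptions made for illustration and are not the dependency-checking logic of this patent.

    #include <stdbool.h>
    #include <stdio.h>

    /* Simplified instruction: one destination and up to two sources.
     * A value of -1 means "no operand". */
    struct insn {
        int dest;
        int src1, src2;
    };

    /* Two instructions may execute in either order only if there is no
     * read-after-write, write-after-read, or write-after-write hazard. */
    static bool can_reorder(struct insn older, struct insn younger) {
        bool raw = (younger.src1 == older.dest) || (younger.src2 == older.dest);
        bool war = (older.src1 == younger.dest) || (older.src2 == younger.dest);
        bool waw = (older.dest == younger.dest);
        return !raw && !war && !waw;
    }

    int main(void) {
        enum { A, B, C, D };                        /* the example's variables */
        struct insn b_eq_a_plus_b = { B, A, B };    /* B = A + B */
        struct insn c_eq_b_plus_a = { C, B, A };    /* C = B + A */
        struct insn d_eq_a_plus_5 = { D, A, -1 };   /* D = A + 5 */

        /* "D = A + 5" is independent of "B = A + B": reorderable (FIG. 1B). */
        printf("%d\n", can_reorder(b_eq_a_plus_b, d_eq_a_plus_5));   /* prints 1 */
        /* "C = B + A" reads the B written by "B = A + B": not reorderable (FIG. 1C). */
        printf("%d\n", can_reorder(b_eq_a_plus_b, c_eq_b_plus_a));   /* prints 0 */
        return 0;
    }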
Turning now to FIGS. 2A and 2B, an example illustrating why out-of-order instruction execution is particularly desirable in superscalar microprocessors is shown. For simplicity, this example assumes a microprocessor having only one addition pipeline and one multiplication pipeline, with each operation taking one clock cycle to execute. FIG. 2A shows that the original code sequence will take four clock cycles to complete execution. In contrast, by executing the instruction “C=C*5” out-of-order, the code sequence in FIG. 2B takes only three clock cycles to complete. Thus, out-of-order execution may allow instructions that are “ready to execute” to bypass those that are not, thereby more efficiently utilizing the hardware resources of the microprocessor.
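The figures themselves are not reproduced in the text above, so the sketch below uses an assumed instruction mix (three additions followed by the independent multiplication “C=C*5”) and a deliberately simplified issue model: the in-order baseline issues one instruction per clock cycle in program order, while out-of-order issue may send one ready addition and one ready multiplication to their pipelines in the same cycle. Under these assumptions the cycle counts come out as the description states: four in order, three out of order.

    #include <stdio.h>

    enum op_type { OP_ADD, OP_MUL };

    /* Assumed sequence: three additions, then the independent "C = C * 5". */
    static const enum op_type program[] = { OP_ADD, OP_ADD, OP_ADD, OP_MUL };
    #define N_OPS ((int)(sizeof(program) / sizeof(program[0])))

    /* Simplified baseline: one instruction per clock, strictly in program order. */
    static int in_order_cycles(void) {
        return N_OPS;
    }

    /* Simplified out-of-order issue: each clock, the add pipeline and the
     * multiply pipeline may each accept one pending instruction of its type. */
    static int out_of_order_cycles(void) {
        int adds = 0, muls = 0, cycles = 0;
        for (int i = 0; i < N_OPS; i++) {
            if (program[i] == OP_ADD) adds++;
            else                      muls++;
        }
        while (adds > 0 || muls > 0) {
            if (adds > 0) adds--;    /* add pipeline issues one instruction   */
            if (muls > 0) muls--;    /* multiply pipeline issues in parallel  */
            cycles++;
        }
        return cycles;
    }

    int main(void) {
        printf("in order:     %d cycles\n", in_order_cycles());      /* prints 4 */
        printf("out of order: %d cycles\n", out_of_order_cycles());  /* prints 3 */
        return 0;
    }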
To effectively implement an out-of-order microprocessor, many designers have resorted to large buffers called “instruction queues” that store decoded instructions waiting to be executed. The instruction queue is searched each clock cycle to determine which instructions should be dispatched for execution. The larger the buffer, the greater the number of decoded instructions that may be stored. The greater the number of instructions that may be stored, the greater the probability of finding a set of instructions to execute in parallel (i.e., thereby preventing any functional units from stalling).
Turning now to FIG. 3A, a figure illustrating the functionality of an instruction queue 160 is shown. As instructions are decoded, they are stored into instruction queue 160. During normal operation, each functional pipeline 162-168 is configured to receive one instruction per clock cycle. For example, add pipeline 162 may be configured to receive one add instruction per clock cycle from instruction queue 160. Similarly, add pipeline 164 may also be configured to receive one add instruction per clock cycle, while multiply pipeline 166 may receive one multiply instruction per clock cycle, and load/store pipeline 168 may receive one load or store instruction per clock cycle.
As the figure illustrates, the instructions may be conveyed to pipelines 162-168 in an out-of-order fashion. For example, assuming all instructions stored in instruction queue 160 are ready for dispatch (i.e., ready to be conveyed to functional pipelines 162-168), the two oldest add instructions are conveyed to add pipelines 162 and 164. Instruction queue 160 may comprise control logic (not shown) that is configured to select the oldest instruction ready for dispatch to each functional pipeline. The control logic is also responsible for shifting the instructions remaining in instruction queue 160 after dispatch to make room for new instructions in the next clock cycle.
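As a rough sketch of the behavior just described, the control logic can be modeled as selecting the oldest ready entry for each functional pipeline and then compressing the queue by shifting the surviving entries toward the head. The entry layout, queue depth, and the simple shift loop below are assumptions made for illustration; they are not the particular compression mechanism claimed by this patent.

    #include <stdbool.h>
    #include <stdio.h>

    /* Assumed entry layout and queue depth (illustrative only). */
    enum pipe_class { PIPE_ADD0, PIPE_ADD1, PIPE_MUL, PIPE_LDST, N_PIPES };
    enum op_class   { OP_ADD, OP_MUL, OP_LDST };

    struct entry {
        enum op_class op;
        bool ready;
        int  tag;       /* program-order identifier, used here for display only */
        bool valid;
    };

    #define Q_DEPTH 8
    static struct entry queue[Q_DEPTH];     /* index 0 holds the oldest entry */

    /* Which pipeline accepts which operation class (two add pipelines). */
    static bool accepts(int pipe, enum op_class op) {
        switch (pipe) {
        case PIPE_ADD0:
        case PIPE_ADD1: return op == OP_ADD;
        case PIPE_MUL:  return op == OP_MUL;
        default:        return op == OP_LDST;
        }
    }

    /* One dispatch cycle: give each pipeline the oldest ready entry it can
     * accept, then compress the queue by shifting the survivors down so new
     * instructions can be written into the freed slots at the tail. */
    static void dispatch_and_compress(void) {
        for (int p = 0; p < N_PIPES; p++) {
            for (int i = 0; i < Q_DEPTH; i++) {
                if (queue[i].valid && queue[i].ready && accepts(p, queue[i].op)) {
                    printf("pipeline %d receives instruction %d\n", p, queue[i].tag);
                    queue[i].valid = false;          /* dispatched */
                    break;                           /* at most one per pipeline */
                }
            }
        }
        int dst = 0;                                 /* compression (shift) step */
        for (int src = 0; src < Q_DEPTH; src++)
            if (queue[src].valid)
                queue[dst++] = queue[src];
        while (dst < Q_DEPTH)
            queue[dst++].valid = false;
    }

    int main(void) {
        /* Example: five decoded instructions, all marked ready for dispatch. */
        enum op_class ops[5] = { OP_ADD, OP_MUL, OP_ADD, OP_ADD, OP_LDST };
        for (int i = 0; i < 5; i++)
            queue[i] = (struct entry){ ops[i], true, i, true };
        dispatch_and_compress();
        /* The two oldest adds go to the add pipelines; the remaining add
         * (tag 3) shifts toward the head for consideration next cycle. */
        return 0;
    }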
