Method for compacting an instruction queue

Electrical computers and digital processing systems: processing – Instruction issuing – Simultaneous issuance of multiple instructions

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C712S217000, C712S218000, C712S023000

Reexamination Certificate

active

06704856

ABSTRACT:

BACKGROUND OF THE INVENTION
An instruction queue is typically a random-access storage array which holds instructions between the time they are fetched from memory and when they are issued to an execution unit. The queue is typically structured as a set of rows, each of which holds one instruction.
In many modern microprocessors, instructions issue from the instruction queue out-of-order, with instruction prioritization managed with pointers to the oldest and newest instructions in the queue. The concept of out-of-order execution is also called “dynamic execution” or “dynamic scheduling”. The queue structure itself may also be called an “instruction buffer”, “re-order buffer”, or “scoreboard”.
In some CPUs, for example, the instruction queue is called a “Re-order Buffer.” There are two buffers, one for ALU instructions and one for memory operations, each containing twenty-eight entries. Instructions remain in a buffer from the time they are fetched until they are retired, and are not removed at issue time. Instructions are inserted into a queue in a round-robin fashion based on the “newest” instruction pointer. Issue arbitration starts at the oldest instruction pointer, looking for the first data-ready instruction. Because instructions remain in the buffer during their life and retire inorder, empty rows are not produced within the buffer.
Other instruction queue architectures, sometimes called re-order buffers, appear to hold twenty-four instructions through similar execute and retirement operations.
Other out-of-order issue machines with a 16-entry or larger re-order buffer track the status of each in-flight instruction, and twelve integer and eight floating-point “rename buffers” assign instructions to execution units. Each execution unit has a “reservation station,” that is, an instruction buffer dedicated to an execution unit from which data-ready instructions are issued.
SUMMARY OF THE INVENTION
In each cycle of a computer system encompassing the present invention, any combination of up to four instructions can be issued from the queue, and up to four new instructions can enter the queue. Instructions are removed from the queue two cycles after they are issued, creating empty queue rows. New instructions can enter the queue only when there are a sufficient number of empty rows in which to place the instructions.
While in the queue, instructions are prioritized to prevent deadlock by insuring all instructions are issued from the queue in a finite amount of time, and to meet performance goals by issuing oldest instructions first.
A preferred embodiment of the present invention compacts older instructions toward the bottom of the queue each cycle while maintaining their original order, using an update logic circuit which generates control signals to perform the compaction. This creates room at the top of the queue where new instructions enter. This greatly simplifies the issue prioritization process, allowing the use of fast, simple arbitration circuits.
One system employing the present invention has two instruction queues: an integer queue with twenty entries, or rows, and a floating point queue with fifteen entries. The bottom row is numbered row
0
, and the top row is
19
for the integer queue and
14
for the floating-point queue. Operation of the queues is similar, thus attention is focused primarily on the integer queue.
Pointers are not used. Instructions in the queue are ordered, from the bottom to the top, in the relative order in which they entered the queue. Instructions are removed from the queue when they are issued, i.e., sent to functional units for execution.
Because instructions are issued out-of-order, removal of instructions from the queue leaves empty rows, marked as invalid, scattered throughout the queue. The remaining instructions are physically compacted in the queue toward the bottom, i.e., row
0
, each cycle. This leaves empty queue rows toward the top of the queue, where they are filled with instructions entering in subsequent cycles. This operation is preferably simplified by moving instructions at most four rows lower each cycle. Since no more than four instructions enter the queue each cycle, maximum input bandwidth is guaranteed.
Instructions are moved through the queue via multiplexors associated with each queue row. Each multiplexor has five data inputs. For row N, these inputs correspond to the contents of rows N through N+4. An instruction in row N+2 is moved to row N by asserting the “N+2” multiplexor select signal for the multiplexor associated with row N. The Update Logic circuit generates each row's multiplexor selects to control the compaction of the queue.
Maintaining instructions in-order from bottom to top of the queue eliminates the use of pointers to track oldest
ewest queue instructions. This greatly simplifies the issue prioritization process, allowing the use of fast, simple arbitration circuits.
In a fast computer system, e.g., one having a frequency of 600 MHZ, simplifying the arbitration stage, i.e., the primary critical path in the issue logic, is essential to meet performance goals. Adding an extra stage of logic to the issue signal critical path to prioritize instructions based on pointers would mean running at a much slower cycle time, reducing the performance of the entire machine.
Accordingly, a method of compacting an instruction queue in an out of order processor, comprises determining the number of invalid instructions below and including each row in the queue, by counting invalid bits or validity indicators associated with rows below and up to the current row.
For each row, a select value is determined from the previously determined counts for the N rows above and including the present row, and from the validity indicators associated with the N rows, where N is a predetermined value. A multiplexor associated with a particular row selects one of the N rows according to the select value, and moves or passes the instruction held in the selected row to the present row.
Where a maximum of N new instructions can enter the queue during any given cycle, it is sufficient to limit each count to N.
Preferably, each count is a flat vector, where each position in the vector indicates a different number of valid instructions up to the present row, and in which only one bit is set at any time.
A row's select value is preferably determined by forming a diagonal from the N counts corresponding to the N rows above and including the present row, and logically ANDing, or masking, each diagonal bit with the valid bit associated with the same row.
Preferably, only valid queue instructions are moved.
In a preferred implementation, however, validity indicators must be moved regardless of the validity indicator. Thus, for each row, an additional modified select value is determined, similar to the select value already determined. However, the most significant bit is not masked, and is derived from a modified diagonal. A second multiplexor associated with each row moves a valid bit from a row indicated by the modified select value to the validity indicator storage location associated with the present row.
Preferably, each row's count is determined in two stages. In the first stage, a local count is determined for each row in a local group of rows, and a global count is determined for the entire local group. Each local count is determined by counting the validity indicators associated with rows in the local group. In the second stage, a final count is determined for each row in the queue, by combining the local and global counts generated for the local group in the first stage, with global counts generated in local groups below the local group.
Preferably, the N rows can extend to the queue's input pipeline.


REFERENCES:
patent: 4847755 (1989-07-01), Morrison et al.
patent: 5155843 (1992-10-01), Stamm et al.
patent: 5627983 (1997-05-01), Popescu et al.
patent: 5822559 (1998-10-01), Narayan et al.
patent: 5870578 (1999-02-01), Mahalingaiah et al.
patent: 5872946 (1999-02-01), Narayan et al.
patent: 6112

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method for compacting an instruction queue does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method for compacting an instruction queue, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method for compacting an instruction queue will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3229020

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.