Method and circuits for early detection of a full queue

Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or... – Scoreboarding – reservation station – or aliasing

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C712S219000, C712S245000, C714S021000

Reexamination Certificate

active

06542987

ABSTRACT:

BACKGROUND OF THE INVENTION
An instruction queue is typically a random-access storage array which holds instructions between the time they are fetched from memory and when they are issued to an execution unit. The queue is typically structured as a set of rows, each of which holds one instruction.
In many modern microprocessors, instructions issue from the instruction queue out-of-order, with instruction prioritization managed with pointers to the oldest and newest instructions in the queue. The concept of out-of-order execution is also called “dynamic execution” or “dynamic scheduling”. The queue structure itself may also be called an “instruction buffer”, “re-order buffer”, or “scoreboard”.
In some CPUs, for example, the instruction queue is called a “Re-order Buffer.” There are two buffers, one for ALU instructions and one for memory operations, each containing twenty-eight entries. Instructions remain in a buffer from the time they are fetched until they are retired, and are not removed at issue time. Instructions are inserted into a queue in a round-robin fashion based on the “newest” instruction pointer.
Other instruction queue architectures, sometimes called re-order buffers, appear to hold twenty-four instructions through similar execute and retirement operations.
Other out-of-order issue machines with a 16-entry or larger re-order buffer track the status of each in-flight instruction, and twelve integer and eight floating-point “rename buffers” assign instructions to execution units. Each execution unit has a “reservation station,” that is, an instruction buffer dedicated to an execution unit from which data-ready instructions are issued.
SUMMARY OF THE INVENTION
In processors with out-of-order instruction queues, the instruction input pipeline must stall when the queue is full. Traditional queue-full mechanisms rely on a threshold indicator which asserts a pipeline stall signal when the number of instructions in the queue reaches a fixed level. This signal is typically based on pointer comparisons, and the fixed threshold must be reached before the queue is determined to be completely filled, thus wasting queue space.
The present invention does not use pointers to manage queue allocation and prioritization, and hence, the above scheme is not generally applicable. Instead, a pipelined detection scheme is used which first counts the number of free queue slots, and then modifies this count based on queue events which free or allocate queue space each cycle. The resulting free-entry count is then used in one of two ways.
In a floating point queue, the queue-free-slot count is compared to the number of instructions entering that cycle. Pipeline stall is asserted if there are not enough free slots available for the entire entering instruction block.
In an integer queue, a simpler, faster scheme is used which stalls the pipeline if there are less than a predetermined number of free queue entries in any cycle, regardless of the number of instructions in the enqueue stage that cycle.
At least one computer system employing the present invention has a 20-row integer queue and a 15-row floating-point queue. Each cycle, up to four instructions can be issued from the integer queue, and up to four new instructions can enter the queue. Up to two instructions can be issued from and can enter the floating-point queue. Instructions are removed from the queues two cycles after they are issued, creating empty queue rows. New instructions can enter the queue only when there are a sufficient number of empty rows in which to place the instructions. If there are not a sufficient number of empty rows, the input pipeline is stalled.
Instructions in the queue are prioritized to ensure that all instructions are issued from the queue in a finite amount of time, thus preventing deadlock as well as meeting performance goals by issuing oldest instructions first.
In a preferred embodiment, older instructions in a queue are compacted toward the bottom of the queue each cycle, while their original order is maintained. An update logic circuit generates control signals to perform the compaction. Compaction creates room at the top of the queue where new instructions enter. Maintaining instructions in-order from the bottom to the top of the queue eliminates the need for pointers to track oldest
ewest queue instructions, and greatly simplifies the issue prioritization process, allowing the use of fast, simple arbitration circuits.
Because instructions are issued out-of-order, removal of instructions from the queue leaves empty, or invalid, rows scattered throughout the queue. The remaining, i.e., valid, instructions are physically compacted in the queue toward the bottom each cycle. This leaves empty queue rows toward the top of the queue, where they are filled with instructions entering in subsequent cycles.
This operation can be simplified by moving instructions at most a predetermined number of rows lower each cycle. For example, since no more than four instructions enter the integer instruction queue each cycle, maximum input bandwidth is guaranteed if the predetermined number is four.
Instructions are moved through the queue via multiplexors associated with each queue row. In the integer queue, each multiplexor has five data inputs. For queue row N, the inputs correspond to the contents of rows N through N+4. An instruction in row N+2 is moved to row N by asserting the “N+2” multiplexor select signal. An update logic circuit generates each row's multiplexor selects to control the compaction of the queue.
In a fast computer system, e.g., one having a clock frequency on the order of 600 MHz, simplifying the arbitration stage, i.e., the primary critical path in the issue logic, is essential to meeting performance goals. Adding an extra stage of logic to the issue signal critical path to prioritize instructions based on pointers would mean running at a much slower cycle time, reducing the performance of the entire machine.
In a preferred embodiment, update logic used to compact the queue provides a count of free queue rows in cycle K−2 to the full-queue detection circuit. The number of instructions issued in cycle K−1 is added to the free row count. Next, the number of instructions enqueued in cycle K−1 is subtracted from the sum of the free row count and the issue count. Finally, the number of speculatively issued instructions issued in cycle K−1 which produce a cache hit is added to the above remainder.
The counting, addition and subtraction operations can be simplified by using flat-vectors to represent counts and shifting operations to increment and decrement the counts.
The result is then encoded and compared with the number of incoming instructions. Alternatively, to save space, the number of free rows in the queue is compared with a predetermined value, preferably the maximum number of incoming instructions allowed.
Accordingly, in a pipelined computer architecture in which instructions may be removed from the instruction queue out of sequence, a method for detecting instruction queue status at a cycle K comprises adding together the number of invalid instructions or free rows in the queue during cycle K−2, the number of instructions issued for cycle K−1 and the number of instructions speculatively issued in cycle K−1 that have produced a cache hit, and subtracting from the sum the number of instructions enqueued for cycle K−1.
The result of this calculation indicates the number of invalid instructions in the queue cycle K.
The number of invalid entries instructions, the number of issued instructions, and the number of enqueued instructions are each preferably represented as a flat vector. Adding can therefore be performed by shifting in one direction, while subtracting can be performed by shifting in the opposite direction.
If the result of the calculated value indicates that the queue is full, a stall signal is generated.
Alternatively, a stall signal can be generated if the indicative value is less than a predetermined value, w

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and circuits for early detection of a full queue does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and circuits for early detection of a full queue, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and circuits for early detection of a full queue will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3011403

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.