Electrical computers and digital processing systems: processing – Processing architecture – Superscalar
Reexamination Certificate
1998-06-02
2001-06-19
Pan, Daniel H. (Department: 2183)
Electrical computers and digital processing systems: processing
Processing architecture
Superscalar
C712S028000, C712S212000, C712S214000, C712S217000, C712S218000
Reexamination Certificate
active
06249855
ABSTRACT:
BACKGROUND OF THE INVENTION
A central processing unit (CPU) that executes instructions out-of-order utilizes issue logic to clear instructions for passage to the CPU's execution unit(s). This issue logic can be divided into a number of discrete components that perform aspects of the issue clearance. An instruction scoreboard is used by the issue logic to weigh the register resource requirements of each instruction in an instruction queue to ultimately prioritize the instructions for issuance. Instructions waiting in the instruction queue are represented as flat bit vectors in the scoreboard logic. Each bit represents a register of the CPU and is set, or not, based on whether the associated instruction utilizes the register. Request logic identifies which instructions in the instruction queue are ready to issue to the execution units. Finally, an arbiter actually selects the instructions for issue based upon information derived by the scoreboard and request logic.
In older CPU architectures, the arbiter circuit merely had to select one instruction for issue per cycle. In these CPUs, there was typically one integer and one floating point execution unit, with separate instruction queues.
More modern CPUs utilize multiple, parallel execution units to maximize the number of instructions that can issue per machine cycle. These architectures complicate somewhat the design of the arbiter. Not only must it be possible to execute the two instructions simultaneously based upon the register requirements, for example, but if two arbiters are used to select the two instructions for issue, the two arbiters must coordinate their mutual operation to ensure that the same instruction is not sent to different execution units. This of course wastes compute resources since the multiple execution units will be duplicating each others work.
One solution to this problem is to a priori assign each instruction in the instruction queue to issue to one or the other of the execution units. One of the arbiters is then assigned to select from the instructions to issue to one execution unit, and the other arbiter is assigned to select instructions to issue in the other execution unit.
SUMMARY OF THE INVENTION
The problem with the conventional approach to coordinating the operation of the arbiters is that it sub-optimally utilizes the CPU's compute resources. The instructions are assigned to one or the other of the execution units without any assessment as to the state of the execution units when the instructions are ready to issue. As a result, due to prior instructions being executed by one of the execution units, it may occur that the execution unit to which a particular instruction has been assigned is not ready to execute that instruction due to a long latency event such as waiting for bus access, for example. Thus, the instruction cannot issue, even though it may be the lowest instruction in the instruction queue.
The present invention is directed to an arbiter system for the instruction issue logic of a CPU having two or more execution units operating in parallel. The arbiter system comprises two encoder circuits that select instructions in an instruction queue for issue to first and second execution units, respectively, based upon the positions of the instructions within the queue and requests by the instructions for the first and/or second execution units. As a result, since the instruction can request different execution units, this system is compatible with situations where the execution units may have different capabilities to execute different instructions, i.e., each execution unit may not be able to execute all of the instructions in the CPU's instruction set. According to the present invention, one of the encoder circuits is subordinate to the other circuit. The subordinate encoder circuit selects instructions from the instruction queue based not only on the positions of the instructions and their requests, but also on the instruction selection of the dominant encoder circuit. As a result, the encoder circuits coordinate their respective operations, allowing the highest-priority queued instruction to be executed in either of at least two execution units, avoiding the need for execution unit assignments at this level.
In specific embodiments, the dominant and subordinate encoder circuits select at least two instructions for issue within one machine cycle, for issue in the next machine cycle. As a result, the circuitry of the encoder circuits must be fast enough to allow for this operation. This is accomplished by having each encoder circuit receive a request signal indicating whether each instruction unit in the instruction queue can issue to the execution unit for which the encoder circuits select instructions. The dominant encoder circuit then generates a first execution unit grant signal for the oldest queued or highest priority instruction that requests the first execution unit. Accordingly, the first or dominant encoder circuit operates as a greedy picking circuit. The subordinate encoder circuit then generates a grant signal to the highest priority instruction that requests the second execution unit and for which the dominant encoder has not generated a grant signal.
In the implementation of the preferred embodiment, the dominant encoder forms the grant signal by pre-charging a grant line for each row if the instruction of the row requests the first execution unit. The grant lines are then de-asserted for all higher-queued or lower priority instructions with respect to the lowest row for which a request signal is generated for the first execution unit. The subordinate encoder circuit then generates a grant signal also by pre-charging the grant lines for each row if the instruction for the row requests the second execution unit and then de-asserts the grant lines of all higher-queued or lower priority instructions for the lowest row for which a request signal is generated and the dominant encoder circuit did not generate a grant signal for the instruction of that row.
An array of state elements is used at the output of the grant lines for both encoder circuits. A state element for each row only generates an issue signal to the second execution unit in response to an asserted grant line for the row from the subordinate encoder and the de-asserting of the grant line for the dominant encoder. As a result, the state elements essentially use the grant line de-assertion from the dominant encoder circuit as a clock edge to set the output of the grant line from the subordinate encoder circuits, with the pre-charge clock being used to latch any grant signal from the subordinate encoder. In this way, the subordinate encoder circuit operates in response to the output of the dominant encoder circuit even though no clock edge is available because the dominoing of the two encoder circuits happens within a single machine cycle.
In specific embodiments, multiple pairs of dominant/subordinate encoders may be used to select additional instructions for issue to additional execution units. Particularly, in the preferred embodiment, two pairs of encoder circuits are used to select instructions for four execution units. Specifically, a second dominant encoder circuit is used to select instructions for the third execution unit and a second subordinate encoder circuit is used to select instructions for a fourth execution unit. Instructions, upon entering the instruction queue, are assigned to either pair of the instruction units.
In general, according to another aspect, the invention also features a prioritizing latching circuit for dynamic logic, which operates within a single clock period. Specifically, in response to grant lines from a dominant encoder and a subordinate encoder, which are not commonly asserted, the prioritizing latching circuit is set in response to a de-assertion of a grant line from the dominant encoder circuit to pass any grant signal from the subordinate encoder circuit.
In general, according to still another aspect, the invention also features a method for assigning instruc
Farrell James A.
Gieseke Bruce A.
Compaq Computer Corporation
Hamilton Brook Smith & Reynolds P.C.
Nguyen Dzung C.
Pan Daniel H.
LandOfFree
Arbiter system for central processing unit having dual... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Arbiter system for central processing unit having dual..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Arbiter system for central processing unit having dual... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2472738