Reexamination Certificate
2000-10-10
2002-02-26
Pan, Daniel H. (Department: 2183)
Electrical computers and digital processing systems: processing
Dynamic instruction dependency checking, monitoring or...
Scoreboarding, reservation station, or aliasing
C712S245000, C712S228000, C712S023000, C712S213000
active
06351804
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention is related to the field of superscalar microprocessors and, more particularly, to the storing of control bit vectors representing instructions prior to the execution of these instructions.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by executing multiple instructions during a clock cycle and by specifying the shortest possible clock cycle consistent with the design. As used herein, the term “clock cycle” refers to an interval of time accorded to various stages of an instruction processing pipeline. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively.
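As an illustrative sketch only, the edge-triggered capture described above may be modeled in software as follows. The class name RisingEdgeRegister and its tick method are hypothetical and do not appear in the specification; the model assumes a storage device that captures on the rising edge and holds its value until the next rising edge.

```python
class RisingEdgeRegister:
    """Hypothetical model of a storage device that captures on the rising clock edge."""

    def __init__(self, initial=0):
        self.value = initial   # value held between capturing edges
        self._prev_clk = 0     # clock level seen on the previous call

    def tick(self, clk, data_in):
        # Capture the input only on a 0 -> 1 transition of the clock signal.
        if self._prev_clk == 0 and clk == 1:
            self.value = data_in
        self._prev_clk = clk
        return self.value


reg = RisingEdgeRegister()
# Each pair is (clock level, data input) sampled in sequence.
trace = [reg.tick(clk, d) for clk, d in [(0, 5), (1, 7), (1, 9), (0, 3), (1, 4)]]
print(trace)  # [0, 7, 7, 7, 4]
```

The input values 9 and 3 are never captured because no rising edge occurs while they are presented.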
Many superscalar microprocessor manufacturers design microprocessors in accordance with the x86 microprocessor architecture. Due to its wide acceptance in the computer industry, the x86 microprocessor architecture is employed by a large body of software. Microprocessors designed in accordance with the x86 microprocessor architecture advantageously enjoy compatibility with this large body of software. Computer systems manufacturers may be more inclined to choose an x86-compatible microprocessor than a non-x86-compatible microprocessor for computer systems. Information regarding the x86 microprocessor architecture may be found in the publication entitled: “PC Magazine Programmer's Technical Reference: The Processor and the CoProcessor” by Hummel, Ziff-Davis Press, Emeryville, Calif. 1992. This publication is incorporated herein by reference in its entirety.
The x86 microprocessor architecture is characterized by a plurality of complex, variable byte length instructions. A particular instruction may comprise a single byte or multiple bytes. Additionally, an instruction may specify one or more instruction operations. As used herein, an “instruction operation” refers to an operation which may be executed by a functional unit to produce a result. Exemplary instruction operations may include an arithmetic operation, a logical operation, an address generation, etc. Instructions may have explicit instruction operations (i.e. operations defined by the opcode of the instruction) and implicit operations defined by the particular instruction coded by the programmer (e.g. an address generation and a load or store operation for an operand stored in memory).
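The distinction between explicit and implicit instruction operations may be sketched as follows. This is a simplified illustration, not x86 decode logic: the Instruction fields, the operation names, and the ordering of the operations are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    opcode: str           # explicit operation defined by the opcode, e.g. "add"
    src_in_memory: bool   # True if a source operand is a memory operand
    dest_in_memory: bool  # True if the destination is a memory location


def instruction_operations(insn):
    """Return the instruction operations implied by one (hypothetical) instruction."""
    ops = []
    if insn.src_in_memory or insn.dest_in_memory:
        ops.append("address_generation")  # implicit: locate the memory operand
    if insn.src_in_memory:
        ops.append("load")                # implicit: fetch the source operand
    ops.append(insn.opcode)               # explicit: defined by the opcode
    if insn.dest_in_memory:
        ops.append("store")               # implicit: write the result to memory
    return ops


print(instruction_operations(Instruction("add", False, False)))
# ['add']
print(instruction_operations(Instruction("add", True, True)))
# ['address_generation', 'load', 'add', 'store']
```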
Unfortunately, complex instructions employing more than one instruction operation are often difficult to execute in superscalar microprocessors. Superscalar microprocessors typically employ multiple functional units to perform concurrent execution of instructions. Therefore, it is desirable that the functional units be relatively simple, such that each functional unit occupies a small amount of silicon area. For example, a functional unit may be configured to execute one instruction operation during a clock cycle. Complex instructions may utilize multiple clock cycles within such functional units. The complex instruction must be interpreted differently by the functional unit during each clock cycle to perform the instruction operations specified by that instruction. Complex logic may be employed to correctly execute these complex instructions, deleteriously enlarging the size of the functional unit. A less costly solution (in terms of complexity and silicon area) to the execution of complex instructions employing multiple instruction operations is desired.
The multiple functional units employed by a superscalar microprocessor may be equipped with reservation stations to store instructions and operands prior to their execution by the respective functional unit. Reservation stations are useful in a superscalar microprocessor because instructions may be decoded and dispatched prior to the source operands of the instruction being available. An “operand” or “operand value” of an instruction is a value the instruction is intended to operate upon. Operands may be located by an “operand address” which may define a register or a memory location storing the operand. Operands may be register operands in which the value is stored in a register, memory operands in which the value is stored in a memory location, or immediate operands in which the value is stored within the instruction itself. A source operand value is a value upon which the instruction operates, and a destination operand is a location to store the result of executing the instruction. A result is a value generated by operating upon the source operands according to the instruction operation(s) defined by the instruction.
Generally speaking, a reservation station comprises one or more storage locations (referred to as “reservation station entries”). Each reservation station entry may store a decoded instruction and operands or operand values. Other useful information may also be stored in a reservation station entry.
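A reservation station of the kind described above may be sketched as a small data structure. The names Entry, ReservationStation, dispatch, provide_operand and select are hypothetical, and the oldest-slot-first selection policy is an assumption for illustration; an actual design may select entries differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    decoded_instruction: str
    operand_values: List[Optional[int]]  # None marks an operand not yet available

    def ready(self):
        return all(v is not None for v in self.operand_values)


class ReservationStation:
    def __init__(self, size):
        self.entries = [None] * size  # reservation station entries

    def dispatch(self, entry):
        """Place an instruction in a free entry; return its index, or None if full."""
        for i, slot in enumerate(self.entries):
            if slot is None:
                self.entries[i] = entry
                return i
        return None

    def provide_operand(self, index, position, value):
        """Fill in an operand value that became available after dispatch."""
        self.entries[index].operand_values[position] = value

    def select(self):
        """Remove and return the first entry whose operands are all available."""
        for i, entry in enumerate(self.entries):
            if entry is not None and entry.ready():
                self.entries[i] = None
                return entry
        return None


rs = ReservationStation(size=2)
idx = rs.dispatch(Entry("add", [1, None]))  # dispatched before one operand is known
print(rs.select())                          # None: nothing is ready yet
rs.provide_operand(idx, 1, 2)
print(rs.select().decoded_instruction)      # add
```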
Typically, a decoded instruction is transferred to a storage device within a functional unit when the operand values have been provided. The decoded instruction is then decomposed into a plurality of control bits. The control bits are conveyed to the dataflow elements within the functional unit, and cause the dataflow elements to perform the instruction operation. A “dataflow element” is a device which performs a particular manipulation upon an input operand or operands according to a set of control bits conveyed thereto. For example, a multiplexor is a dataflow element which selects one of multiple input operands. The control bits provided to the multiplexor indicate which of the multiple input operands should be selected. As another example, an arithmetic unit is a dataflow element which may add or subtract input operands dependent upon the state of its input control bits.
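The two exemplary dataflow elements may be sketched as functions steered by control bits. The function names, the two control bits mux_select and subtract, and the choice of an immediate value as the second multiplexor input are assumptions made for the example.

```python
def multiplexor(select, inputs):
    """Dataflow element: the control bits in 'select' pick one of the input operands."""
    return inputs[select]


def arithmetic_unit(subtract, a, b):
    """Dataflow element: adds or subtracts depending on one control bit."""
    return a - b if subtract else a + b


def functional_unit(control, operand_a, operand_b, immediate):
    """Route the operands through the dataflow elements under control-bit direction."""
    second = multiplexor(control["mux_select"], [operand_b, immediate])
    return arithmetic_unit(control["subtract"], operand_a, second)


print(functional_unit({"mux_select": 0, "subtract": 0}, 5, 3, 10))  # 8  (5 + 3)
print(functional_unit({"mux_select": 1, "subtract": 1}, 5, 3, 10))  # -5 (5 - 10)
```

The same dataflow hardware performs different instruction operations solely because the control bits differ.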
Unfortunately, decomposing a decoded instruction into control bits and then performing the instruction operation defined by the instruction during a clock cycle may limit the frequency (i.e. the inverse of the clock cycle) of a superscalar microprocessor. It would be desirable to perform an equivalent function to the reservation station/functional unit pair wherein an instruction is selected from the reservation station during a clock cycle and the result is produced in a subsequent clock cycle without limiting the frequency of the superscalar microprocessor.
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by a control bit vector storage according to the present invention. The present control bit vector storage (preferably included within a functional unit) stores control bits indicative of a particular instruction. The control bits are divided into multiple control vectors, each vector indicative of one instruction operation. The control bits control dataflow elements within the functional unit to cause the instruction operation to be performed. Advantageously, logic for determining the control bits for the dataflow elements of a functional unit is removed from the functional unit. The clock cycle time characterizing the functional unit may be advantageously reduced by the amount of time previously used by the logic to generate the control bits.
Additionally, the present control bit vector storage allows complex instructions (or instructions which produce multiple results) to be divided into simpler operations. The hardware included within the functional unit may be reduced to that employed to perform the simpler operations. Advantageously, the amount of silicon area occupied by the functional unit may be reduced. Superscalar microprocessors which employ multiple functional units may particularly benefit from this utility.
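The arrangement summarized above may be sketched as follows: control vectors are generated before the instruction reaches the functional unit, stored, and then applied one per clock cycle. The sketch is a simplified software analogy under stated assumptions, not the claimed circuit. The DECODE_TABLE contents, the instruction name add_and_sub_imm, the two-bit vector layout, and the class ControlVectorStorage are all hypothetical.

```python
from collections import namedtuple

# One control vector per instruction operation (hypothetical two-bit layout).
ControlVector = namedtuple("ControlVector", ["mux_select", "subtract"])

# Control bits are determined outside the functional unit (here, a lookup table).
DECODE_TABLE = {
    "add": [ControlVector(0, 0)],
    "sub": [ControlVector(0, 1)],
    # Hypothetical complex instruction producing two results from simple operations.
    "add_and_sub_imm": [ControlVector(0, 0), ControlVector(1, 1)],
}


class ControlVectorStorage:
    """Holds the control vectors of one instruction until each is executed."""

    def __init__(self):
        self.vectors = []

    def store(self, vectors):
        self.vectors = list(vectors)

    def next_vector(self):
        return self.vectors.pop(0) if self.vectors else None


def execute_cycle(vector, a, b, imm):
    """One clock cycle: the dataflow is steered directly by the stored bits."""
    second = imm if vector.mux_select else b
    return a - second if vector.subtract else a + second


storage = ControlVectorStorage()
storage.store(DECODE_TABLE["add_and_sub_imm"])
results = []
vector = storage.next_vector()
while vector is not None:
    results.append(execute_cycle(vector, 5, 3, 10))
    vector = storage.next_vector()
print(results)  # [8, -5]
```

In this sketch, execute_cycle performs no decoding of its own, which mirrors the stated advantage of removing control bit generation from the functional unit.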
In one embodiment, the control bit vector storage comprises a plurality of vector storages. Each vector storage comprises a pair of individual vector storages and a shared vector storage. The shared vector storage
Merkel Lawrence J.
Pan Daniel H.