Electrical computers and digital processing systems: processing – Instruction decoding – Decoding by plural parallel decoders
Reexamination Certificate
1998-09-21
2001-02-20
An, Meng-Ai T. (Department: 2784)
Electrical computers and digital processing systems: processing
Instruction decoding
Decoding by plural parallel decoders
C712S216000
Reexamination Certificate
active
06192465
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to decoding instructions out of program order within a microprocessor.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance through the use of pipelining, parallel execution, and high clock rates. Pipelining is an implementation technique whereby multiple instructions are overlapped during the execution process. Parallel execution refers to the simultaneously executing multiple instructions in a single clock cycle. As used herein, the term “clock cycle” refers to an interval of time during which the pipeline stages of a microprocessor perform their intended functions. At the end of a clock cycle, the resulting values are moved to the next pipeline stage.
Pipelining has several hazards associated with it. One particular hazard is stalling the pipeline due to branch instructions. When a branch instruction propagates through the pipeline, it is difficult to determine which instructions after the branch should be processed until the results of the branch instruction are know. For example, if the branch instruction is “taken”, then the next instruction to be executed after the branch may be located at a particular address that is offset from the branch instruction's address. In contrast, if the branch instruction is “not taken”, then the next instruction to be executed may be located at the address immediately following the branch instruction. As a result, the initial stages of the pipeline may be unable to determine which instructions should begin execution in the pipeline following the branch instruction. Thus, the pipeline may stall awaiting the results of the branch instruction.
In order to prevent the instruction pipeline from stalling, microprocessor designers may implement branch prediction schemes to provide the initial pipeline stages with a predicted result for each branch instruction. The initial stages of the pipeline speculatively execute instructions along the predicted path until the branch instruction executes and one of the following occurs: (1) the prediction is found to correct, in which case the instructions continue to execute and are no longer speculative, or (2) the prediction is found to be incorrect, in which case all pipeline stages executing instructions after the branch are flushed and the pipeline starts anew using the correct path.
While parallel execution and branch prediction improve overall instruction throughput for a microprocessor at a given clock cycle, process improvements have led to dramatically increased operating frequencies that have further increased the number of instructions that a microprocessor may execute in a fixed period of time. These advancements have placed increasing importance upon a microprocessor's ability to decode instructions. Instruction decoding typically refers to identifying the different fields within the instruction (e.g., the opcode field and any prefixes or operands) and then expanding the instruction into an internal format so that the microprocessor's functional units may easily execute the instruction.
While RISC (Reduced Instruction Set Computer) microprocessors have been implemented to simplify instruction decoding, microprocessors capable of executing older variable-length instruction sets such as the x86 instruction set have remained commercially important due to the vast amount of software available for the older instruction sets. Furthermore, operating frequencies have climbed so quickly that even RISC microprocessors may eventually need faster methods for decoding instructions.
One proposed method for quickly decoding large numbers of instructions involves using a number of parallel decoders. However, current implementations using parallel decoders have been limited in their throughput because of the “in-order” (i.e., in program order) nature of decoding. Most programs rely upon their instructions being executed in a particular order. This order is referred to as “program order”. As previously noted, most modern microprocessors support out-of-order execution. However, these microprocessors must ensure that the instructions that are executed out-of-order do not aversely affect the intended operation of the program. This is accomplished through “dependency checking”. Dependency checking refers to determining which instructions rely upon other instructions' prior execution to finction properly. Thus, dependency checking ensures that the only instructions that are executed out of order are those that will not adversely affect the desired operation of the program. For typical dependency checking hardware to operate correctly, it relies upon receiving decoded instructions that are in-order. Thus, typical instruction decoders receive and decode instructions in program order so that the program order will be preserved for the dependency checking hardware (typically the next stage in the instruction processing pipeline).
This in-order configuration affects decoder throughput by causing some decoders to stall in certain instances. For example, when a new set of instruction bytes is received by the decoders, each decoder must wait to output its results (i.e., its decoded instructions) until all decoders before it have output their results. If not, the following pipeline stages may receive the decoded instructions out-of-order.
For these reasons, a method and apparatus for quickly decoding a large number of instructions is desirable. In particular, a method capable of quickly decoding large numbers of instructions out of order is desirable.
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by a microprocessor capable of decoding instructions out-of-order while still performing dependency checking in program order. Broadly speaking, in one embodiment the microprocessor comprises an instruction cache, two decode units, a reorder queue, and dependency checking logic. The instruction cache may be configured to output sequential groups of instruction bytes called cache lines. The cache line is divided into portions, which are routed to respective decode units, which decode the individual instructions contained therein. The decode units operate independently of each other and may decode the cache line portions out of program order. The decode units output the decoded instructions according to their relative position within the cache line portions. The decoded instructions are received by the reorder queue, which comprises a plurality of storage lines. Each storage line in turn comprises a fixed number of instruction storage locations. The number of storage locations may equal the maximum possible number of instructions within each cache line portion. The reorder queue allocates one storage line for each decoded cache line, and the decoded instructions are stored according to their relative cache line portion positions. The decoded instructions may be read out of the reorder queue in program order, thereby enabling the dependency checking logic to perform dependency checking in program order.
In another embodiment, the microprocessor may further comprise a third decoder and routing logic. The routing logic may be configured to receive cache lines as they are output from the instruction cache and then route portions of them to one of the three decoders. The third decoder may be configured to operate as a split instruction decoder, and the routing logic may be configured to route instructions that extend across cache line portion boundaries to the third decoder.
A method for decoding instructions out-of-order and then reordering them for dependency checking is also disclosed. In one embodiment, the method may comprise fetching a plurality of instruction bytes and then decoding the instructions contained within the plurality of instruction bytes out of program order. The decoded instructions are then reordered to match program order and dependency checking is performed. The instructions may then be issued to reservation stations for eventual out of order execution.
In
Advanced Micro Devices , Inc.
An Meng-Ai T.
Christen Dan R.
Conley & Rose & Tayon P.C.
Whitmore Stacy
LandOfFree
Using multiple decoders and a reorder queue to decode... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Using multiple decoders and a reorder queue to decode..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Using multiple decoders and a reorder queue to decode... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2607485