Electrical computers and digital processing systems: processing – Instruction decoding – Predecoding of instruction component
Reexamination Certificate
2000-11-07
2002-09-24
Ellis, Richard L. (Department: 2653)
Electrical computers and digital processing systems: processing
Instruction decoding
Predecoding of instruction component
C712S204000, C712S206000, C712S207000, C712S211000, C712S213000, C712S227000, C712S235000, C712S237000, C712S239000
Reexamination Certificate
active
06457117
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention is related to the field of processors and, more particularly, to predecoding techniques within processors.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by dispatching and executing multiple instructions per clock cycle, and by operating at the shortest possible clock cycle time consistent with the design. To the extent that a given processor is successful at dispatching and/or executing multiple instructions per clock cycle, high performance may be realized. In order to increase the average number of instructions dispatched per clock cycle, processor designers have been designing superscalar processors which employ wider issue rates. A “wide issue” superscalar processor is capable of dispatching (or issuing) a larger maximum number of instructions per clock cycle than a “narrow issue” superscalar processor is capable of dispatching. During clock cycles in which a number of dispatchable instructions is greater than the narrow issue processor can handle, the wide issue processor may dispatch more instructions, thereby achieving a greater average number of instructions dispatched per clock cycle.
Many processors are designed to execute the x86 instruction set due to its widespread acceptance in the computer industry. For example, the K5 and K6 processors from Advanced Micro Devices, Inc., of Sunnyvale, Calif. implement the x86 instruction set. The x86 instruction set is a variable length instruction set in which various instructions occupy differing numbers of bytes in memory. The type of instruction, as well as the addressing modes selected for a particular instruction encoding, may affect the number of bytes occupied by that particular instruction encoding. Variable length instruction sets, such as the x86 instruction set, minimize the amount of memory needed to store a particular program by only occupying the number of bytes needed for each instruction. In contrast, many RISC architectures employ fixed length instruction sets in which each instruction occupies a fixed, predetermined number of bytes.
Unfortunately, variable length instruction sets complicate the design of wide issue processors. For a wide issue processor to be effective, the processor must be able to identify large numbers of instructions concurrently and rapidly within a code sequence in order to provide sufficient instructions to the instruction dispatch hardware. Because the location of each variable length instruction within a code sequence is dependent upon the preceding instructions, rapid identification of instructions is difficult. If a sufficient number of instructions cannot be identified, the wide issue structure may not result in significant performance gains. Therefore, a processor which provides rapid and concurrent identification of instructions for dispatch is needed.
Another feature which is important to the performance achievable by wide issue superscalar processors is the accuracy and effectiveness of its branch prediction mechanism. As used herein, the branch prediction mechanism refers to the hardware which detects control transfer instructions within the instructions being identified for dispatch and which predicts the next fetch address resulting from the execution of the identified control transfer instructions. Generally, a “control transfer” instruction is an instruction which, when executed, specifies the address from which the next instruction to be executed is fetched. Jump instructions are an example of control transfer instructions. A jump instruction specifies a target address different than the address of the byte immediately following the jump instruction (the “sequential address”). Unconditional jump instructions always cause the next instruction to be fetched to be the instruction at the target address, while conditional jump instructions cause the next instruction be fetched to be either the instruction at the target address or the instruction at the sequential address responsive to an execution result of a previous instruction (for example, by specifying a condition flag set via instruction execution). Other types of instructions besides jump instructions may also be control transfer instructions. For example, subroutine call and return instructions may cause stack manipulations in addition to specifying the next fetch address. Many of these additional types of control transfer instructions include a jump operation (either conditional or unconditional) as well as additional instruction operations.
Control transfer instructions may specify the target address in a variety of ways. “Relative” control transfer instructions include a value (either directly or indirectly) which is to be added to an address corresponding to the relative control transfer instruction in order to generate the target address. The address to which the value is added depends upon the instruction set definition. For x86 control transfer instructions, the address of the byte immediately following the control transfer instruction is the address to which the value is added. Other instruction sets may specifying adding the value to the address of the control transfer instruction itself. For relative control transfer instructions which directly specify the value to be added, an instruction field is included for storing the value and the value is referred to as a “displacement”.
On the other hand, “absolute” control transfer instructions specify the target address itself (again, either directly or indirectly). Absolute control transfer instructions therefore do not require an address corresponding to the control transfer instruction to determine the target address. Control transfer instructions which specify the target address indirectly (e.g. via one or more register or memory operands) are referred to as “indirect” control transfer instructions.
Because of the variety of available control transfer instructions, the branch prediction mechanism may be quite complex. However, because control transfer instructions occur frequently in many program sequences, wide issue processors have a need for a highly effective (e.g. both accurate and rapid) branch prediction mechanism. If the branch prediction mechanism is not highly accurate, the wide issue processor may issue a large number of instructions per clock cycle but may ultimately cancel many of the issued instructions due to branch mispredictions. On the other hand, the number of clock cycles used by the branch prediction mechanism to generate a target address needs to be minimized to allow for the instructions that the target address to be fetched.
The term “branch instruction” is used herein to be synonymous with “control transfer instruction”.
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by a processor in accordance with the present invention. The processor is configured to predecode instruction bytes prior to their storage within an instruction cache. During the predecoding, relative branch instructions are detected. The displacement included within the relative branch instruction is added to the address corresponding to the relative branch instruction, thereby generating the target address. The processor replaces the displacement field of the relative branch instruction with an encoding of the target address, and stores the modified relative branch instruction in the instruction cache. Advantageously, the branch prediction mechanism employed by the processor may more rapidly generate the target address corresponding to relative branch instructions. The branch prediction mechanism may simply select the target address from the displacement field of the relative branch instruction instead of performing an addition to generate the target address. The rapidly generated target address may be provided to the instruction cache for fetching instructions more quickly than might otherwise be achieved. The amount of time elapsing between fetching a branch instruction and generating the corresponding target address may adv
Advanced Micro Devices , Inc.
Conley Rose & Tayon PC
Ellis Richard L.
Merkel Lawrence J.
Patel Gautam R.
LandOfFree
Processor configured to predecode relative control transfer... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Processor configured to predecode relative control transfer..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Processor configured to predecode relative control transfer... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2861570