Cumulative lookahead to eliminate chained dependencies

Electrical computers and digital processing systems: processing – Processing architecture – Superscalar

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C712S001000, C712S214000, C711S200000, C711S204000, C711S213000

Reexamination Certificate

active

06332187

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention is related to the field of processors and, more particularly, to mechanisms for enhancing parallelism within processors.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by simultaneously executing multiple instructions in a clock cycle and by specifying the shortest possible clock cycle consistent with the design. As used herein, the term “clock cycle” refers to an interval of time during which the pipeline stages of a processor perform their intended functions. For example, superscalar processors are typically configured with instruction processing pipelines which process instructions. The processing of instructions includes the actions of fetching, dispatching, decoding, executing, and writing back results. Each action may be implemented in one or more pipeline stages, and an instruction flows through each of the pipeline stages where an action or portion of an action is performed. At the end of a clock cycle, the instruction and the values resulting from performing the action of the current pipeline stage are moved to the next pipeline stage. When an instruction reaches the end of an instruction processing pipeline, it is processed and the results of executing the instruction have been recorded.
A problem associated with executing a large number of instructions concurrently is that instructions often have dependencies on instructions prior to them in program order. As used herein, the term “dependency” or “dependent” refers to the condition in which an instruction receives the result of executing a previous instruction as one of its operands. In other words, the dependent instruction operates on the result of the previous instruction. Generally speaking, instructions which are dependent on a previous instruction do not execute in parallel with that previous instruction. Instead, the previous instruction executes, the result is forwarded to the dependent instruction, and the dependent instruction executes in a subsequent clock cycle.
In many cases, instruction dependencies limit the number of instructions which may be executed in a given clock cycle. It is not uncommon in programs for a particular instruction to be dependent on an instruction immediately prior to the particular instruction or to be dependent on an instruction two instructions prior to the particular instruction. Further, it is not uncommon in programs for a majority of the instructions to be dependent in this way. This type of program severely limits the number of instructions which may be executed concurrently.
While a problem in any instruction set, programs written using the x86 instruction set (also referred to as IA-32 or APX) frequently are even more sensitive to the problem of dependencies limiting parallelism. For example, due to the relatively small number of registers available in the x86 instruction set, many operands are stored on a memory stack pointed to by the ESP register. Accordingly, the ESP is an operand of many instructions. Furthermore, many instructions update the ESP register as well (e.g. pushing and popping values on the stack). Accordingly, instructions may exhibit a chain of dependencies on the ESP register, limiting overall concurrent execution. Other instruction sets may not specify a dedicated stack pointer register such as ESP, but software may employ a stack model in which a register is used as a stack pointer register. Such a model may exhibit chained dependencies as well.
As used herein, an “operand” is a value operated upon by an instruction. Source operands are input values to be operated upon in response to the instruction to produce a result, which is the destination operand. Operands may be register operands if they are stored in registers internal to the processor, or memory operands if they are stored in a memory location external to the processor. Register operands are specified by a register address which may be directly encoded in the instruction or may be implicit in the definition of the instruction assigned to a particular opcode. Memory operands are specified by a memory address which may be specified via one or more address operands of the instruction (e.g. a displacement coded into the instruction, one or more register operands, etc.).
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by a processor configured to generate lookahead values using a cumulative constant. The processor classifies operations to a particular register (e.g. the stack pointer register, or ESP in an embodiment employing the x86 instruction set architecture) as either accelerated or non-accelerated. For example, instructions which are defined to increment/decrement the particular register by an explicit or implicit constant value may be accelerated operations. Upon the occurrence of a non-accelerated operation, the processor may begin accumulating the cumulative effect of accelerated operations to the result of the non-accelerated operation as a cumulative offset. The result of the non-accelerated operation (upon execution thereof) may then be added to the cumulative offset values corresponding to each accelerated operation to generate the particular register value corresponding to that accelerated operation. Accordingly, dependencies upon the register due to the accelerated operations may be alleviated. Accelerated operations may execute in parallel upon provision of the value generated by the non-accelerated operations. The cumulative value may be maintained across multiple cycles of instruction dispatch, thereby allowing for dependency alleviation across the multiple cycles of instruction dispatch. Performance of the processor may be increased due to the alleviation of dependencies due to the particular register.
Broadly speaking, in one embodiment a processor is contemplated. The processor comprises a lookahead unit and a second unit. The lookahead unit is configured to detect an instruction having a particular register as an operand. The lookahead unit is configured to generate a constant corresponding to the instruction, wherein the constant is indicative of a modification of a value stored into the particular register in response to executing a previous instruction. The lookahead unit is configured to generate the constant responsive to: (i) a cumulative offset maintained by the lookahead unit, the cumulative offset reflecting a cumulative modification of the value due to each instruction between the previous instruction and the instruction; and (ii) a modification due to the instruction. Coupled to the lookahead unit, the second unit is configured to combine the constant with the value to generate the operand.
In another embodiment, a method for enhancing parallelism in a processor is contemplated. A particular instruction defined to generate a value for storage into a particular register is executed. A cumulative offset reflecting a cumulative modification to the value is maintained. The cumulative modification is due to one or more instructions subsequent to the particular instruction. A constant corresponding to a first instruction is generated responsive to the cumulative offset and a modification of the value due to the first instruction.
In yet another embodiment, a computer system is contemplated. The computer system comprises a processor and an input/output (I/O) device. The processor includes a lookahead unit and a second unit. The lookahead unit is configured to detect an instruction having a particular register as an operand. The lookahead unit is configured to generate a constant corresponding to the instruction, wherein the constant is indicative of a modification of a value stored into the particular register in response to executing a previous instruction. The lookahead unit is configured to generate the constant responsive to: (i) a cumulative offset maintained by the lookahead unit, the cumulative offset reflecting a cumulative modification of the value due to each instruction between the previous instruction and the inst

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Cumulative lookahead to eliminate chained dependencies does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Cumulative lookahead to eliminate chained dependencies, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Cumulative lookahead to eliminate chained dependencies will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2568814

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.