Method and apparatus for pipeline streamlining where...

Electrical computers and digital processing systems: processing – Instruction issuing

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Reexamination Certificate

active

06393550

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to pipelined microprocessors, and more particularly to achieving maximum throughput of dependent operations in a pipelined processor.
2. Art Background
Simple microprocessors generally process instructions one at a time. Each instruction is processed using four sequential stages: instruction fetch, instruction decode, execute, and result write back to the register file or memory. Within such microprocessors, different dedicated logic blocks perform each processing stage. Each logic block waits until all the previous logic blocks complete operations before beginning its operation.
To improve microprocessor efficiency, microprocessor designers overlapped the operations of the fetch, decode, execute, and write back stages such that the microprocessor operates on several instructions simultaneously. In operation, the fetch, decode, execute, and write back stages concurrently process different instructions. At each clock cycle the results of each processing stage are passed to the following processing stage. Microprocessors that use the technique of overlapping the fetch, decode, execute, and write back stages are known as “pipelined” microprocessors.
In order for pipelined microprocessors to operate efficiently, an instruction fetch unit at the head of the pipeline must continually provide the pipeline with a stream of instructions. However, conditional branch instructions within an instruction stream prevent an instruction fetch unit at the head of a pipeline from fetching the correct instructions until the condition is resolved. Since the condition will not be resolved until further down the pipeline, the instruction fetch unit cannot necessarily fetch the proper instructions.
To alleviate this problem, some newer pipelined microprocessors use branch prediction mechanisms that predict the outcome of branches, and then fetch subsequent instructions according to the branch prediction. Branch prediction is achieved using a branch target buffer (BTB) to store the history of a branch instruction based only upon the instruction pointer or address of that instruction. Every time a branch instruction is fetched, the BTB predicts the target address of the branch using the branch history. For a more detailed discussion of branch prediction, please refer to Tse Yu Yeh and Yale N. Patt,
Two
-
Level Adaptive Branch Prediction
, the 24th ACM/IEEE International Symposium and Workshop on MicroArchitecture, November 1991, and Tse Yu Yeh and Yale N. Patt,
Alternative Implementations of Two
-
Level Adaptive Branch Prediction
, Proceedings of the Nineteenth International Symposium on Computer Architecture, May 1992.
In combination with speculative execution, out-of-order dispatch of instructions to the execution units results in a substantial increase in instruction throughput. With out-of-order completion, any number of instructions are allowed to be in execution in the execution units, up to the total number of pipeline stages in all the functional units. Instructions may complete out of order because instruction dispatch is not stalled when a functional unit takes more than one cycle to compute a result. Consequently, a functional unit may complete an instruction after subsequent instructions have already completed. For a detailed explanation of speculative out-of-order execution, refer to M. Johnson,
Superscalar Microprocessor Design
, Prentice Hall, 1991, Chapters 2, 3, 4, and 7.
In a processor using out-of-order completion, instruction dispatch is stalled when there is a conflict for a functional unit or when an issued instruction depends on a result that is not yet computed. In order to prevent or mitigate stalls in decoding, a buffer (known as a reservation station (RS) may be provided between the decode and execute stages. The processor decodes instructions and places them into the reservation station as long as there is room in the buffer, and at the same time, examines instructions in the reservation station to find those that can be dispatched to the execution units (that is, instructions for which all source operands and the appropriate execution units are available).
Instructions are dispatched from the reservation station with little regard for their original program order. However, the capability to issue instructions out-of-order introduces a constraint on register usage. To understand this problem, consider the following pseudo-microcode sequence:
1. t←load (memory)
2. eax←add (eax, t)
3. ebx←add (ebx, eax)
4. eax←mov (2)
5. edx←add (eax, 3)
The micro-instructions and registers shown above are those of the well known Intel Microprocessor Architecture. For further information, reference may be made to the
i
486™
Microprocessor Programmers Reference Manual
, published by Osborne-McGraw-Hill, 1990, which is also available directly from Intel Corporation of Santa Clara, Calif.
In an out-of-order machine executing these instructions, it is likely that the machine would complete execution of the fourth instruction before the second instruction, because the third ADD instruction may require only one clock cycle, while the load instruction and the immediately following ADD instruction may require a total of four clock cycles, for example. However, if the fourth instruction is executed before the second instruction, then the fourth instruction would probably incorrectly overwrite the first operand of the second instruction, leading to an incorrect result. Instead of the second instruction producing a value that the third instruction would use, the third instruction produces a value that would destroy a value that the second one uses.
This type of dependency is called a storage conflict, because the reuse of storage locations (including registers) causes instructions to interfere with one another, even though the conflicting instructions are otherwise independent. Such storage conflicts constrain instruction dispatch and reduce performance.
Storage conflicts may be avoided by providing additional registers that are used to reestablish the correspondence between registers and values. Using register renaming, these additional “physical” registers are associated with the original “logical” registers and values needed by the program. To implement register renaming, the processor may allocate a new register for every new value produced, i.e., for every instruction that writes a register. An instruction identifying the original logical register for the purpose of reading its value obtains instead the value in the newly allocated register. Thus, the hardware renames the original register identifier in the instruction to identify the new register and the correct value. The same register identifier in several different instructions may access different hardware registers depending on the locations of register references with respect to the register assignments.
With renaming, the example instruction sequence depicted above becomes:
1. t
a
←load (mem)
2. eax
b
←add (eax
a
,t
a
)
3. ebx
b
←add (ebx
a
,eax
b
)
4. eax
c
←mov (2)
5. edx
a
←add (eax
c
,3)
In this sequence, each assignment to a register creates a new instance of the register, denoted by an alphabetic subscript. The creation of a renamed register for eax in the fourth instruction avoids the resource dependency on the second and third instructions, and does not interfere with correctly supplying an operand to the fifth instruction. Renaming allows the fourth instruction to be dispatched immediately, whereas, without renaming, the instruction must be delayed until execution of the second and third instructions. When an instruction is decoded, its result value is assigned a location in a functional unit called a reorder buffer (ROB), and its destination register number is associated with this location. This renames the destination register to the reorder buffer location. When a subsequent instruction refers to the renamed destination register, in order to obtain the value considered to be stored in the reg

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for pipeline streamlining where... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for pipeline streamlining where..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for pipeline streamlining where... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2892243

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.