Rename finish conflict detection and recovery

Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or...

Reexamination Certificate

Details

C712S023000, C712S217000

active

06829699

ABSTRACT:

BACKGROUND OF INVENTION
1. Field of the Invention
The present invention relates to improving the performance of out of order CPU architectures. In particular, it relates to an improved method and system for operating a high frequency out of order processor with increased pipeline length.
2. Description and Disadvantages of Prior Art
The present invention has a quite general scope, which is not limited to a vendor specific processor architecture, because its key concepts are independent thereof.
Despite this fact, it will be discussed with reference to a specific prior art processor architecture.
Said prior art out of order processor, in this example an IBM S/390 processor, has as an essential component a so called Instruction Window Buffer, further referred to herein as IWB. After coming from an instruction cache and passing through a decode and branch prediction unit, the instructions are dispatched still in order. In this out of order processor, the instructions are allowed to be executed, and the results written back into the IWB, out of order.
In other words, after the instructions have been fetched by a fetch unit, stored in the instruction queue and renamed in a renaming unit, they are stored in order into a part of the IWB called the reservation station. From the reservation station the instructions may be issued out of order to a plurality of instruction execution units, abbreviated herein as IEU, and the speculative results are stored in a temporary register buffer, called the reorder buffer, abbreviated herein as ROB. These speculative results are committed (or retired) in the actual program order, thereby transforming the speculative result into the architectural state within a register file, a so called Architected Register Array, further abbreviated herein as ARA. In this way it is assured that the out of order processor, with respect to its architectural state, behaves like an in order processor.
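The in order commit step described above can be sketched as a small Python model. This is only an illustration under stated assumptions: the class `ReorderBuffer` and its methods `allocate`, `write_back` and `commit` are hypothetical names, and the dictionary `ara` merely stands in for the Architected Register Array.

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB: results may arrive out of order but retire in program order."""

    def __init__(self):
        self.entries = deque()  # oldest instruction sits at the left end
        self.ara = {}           # hypothetical stand-in for the architected registers

    def allocate(self, tag, dest_reg):
        # Entries are allocated in program order when instructions are dispatched.
        self.entries.append({"tag": tag, "dest": dest_reg, "value": None, "done": False})

    def write_back(self, tag, value):
        # Execution units may deliver their speculative results in any order.
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"] = value
                entry["done"] = True
                return
        raise KeyError(tag)

    def commit(self):
        # Retire only from the head and stop at the first unfinished entry,
        # so the architectural state changes strictly in program order.
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.ara[entry["dest"]] = entry["value"]
            retired.append(entry["tag"])
        return retired
```

If the younger of two instructions finishes first, `commit()` retires nothing until the older one has also written back; after that both retire together, oldest first.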
Within the above summarized scheme, “Renaming” is the process of allocating a new register in the reorder buffer for every new speculative execution result. Renaming is done to avoid the so called “write after read” and “write after write” hazards that would otherwise prevent the out of order execution of the instructions. Each time a new register is allocated, a destination tag (the instruction ID) is associated with this register. With the help of this tag, the speculative result of the execution is written into the newly allocated register. Later on, the in order completion process sets the architectural state by writing the speculative data into an architectural register or by setting a flag bit that specifies that the data has become part of the architectural state. In this way, the out of order processor behaves from an architectural point of view as if it executes all instructions in an in order sequence.
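A minimal sketch of the renaming idea, assuming a simple map from logical registers to the tag of their latest producer. The class `Renamer`, its `rename` method and the `("ROB", tag)` / `("ARA", name)` markers are illustrative assumptions, not structures taken from the patent.

```python
class Renamer:
    """Toy renamer: every destination gets a fresh ROB register (hypothetical model)."""

    def __init__(self):
        self.map = {}      # logical register -> tag of the latest producer
        self.next_tag = 0  # stands in for the instruction ID used as destination tag

    def rename(self, dest, sources):
        # Sources use the current mapping. A register with no pending producer
        # is read from the architected register file, marked ("ARA", name).
        renamed_sources = [
            ("ROB", self.map[s]) if s in self.map else ("ARA", s) for s in sources
        ]
        # The destination always receives a new tag, so a later writer never
        # overwrites a value that an older instruction still has to read.
        tag = self.next_tag
        self.next_tag += 1
        self.map[dest] = tag
        return tag, renamed_sources
```

In a sequence such as `r1 = r2 + r3`, `r2 = r1 * 2`, `r1 = r4 + 1`, the second write to `r1` receives its own tag, so the "write after write" conflict on `r1` and the "write after read" conflict on `r2` no longer constrain the execution order.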
In a state of the art approach renaming is done according to the schemes shown in FIG. 1 and FIG. 2. In the upper portion of the figures the pipeline stages are illustrated whereas in the respective bottom part a structural overview is given. The main difference between the two schemes is the storing of source data or not storing of source data, respectively, into the issue queue. Therefore, the cycle in which the source data is read from the register file is different.
In particular, the first approach is illustrated in FIG. 1. During renaming 110 the logical register addresses are assigned with physical register addresses in which the source data for the instruction resides. Further, a new register is allocated in which the speculative result of the instruction will be stored after execution. Next, 110, the instruction is written into the issue queue 160, together with all its control bits (like opcode), source validity (if the source data is already available in the register file) and other bits as resulting from the renaming process. The wake up logic 170 of the issue queue will monitor the results produced by the execution units and will set the source that is dependent on the target result to valid for those instructions that are waiting in the issue queue for the specific result in stage 120. The select logic 170 will select, commonly in an “oldest first” manner, those instructions that will be issued to the execution units when all source data is available (i.e. source valid bits are ON). Once the select logic has selected the instruction that will be issued, the source address will be sent in the next cycle to the register file and the source data will be read from there, 130. Finally, in the last cycle as shown in FIG. 1, the execution 140 of the instruction is performed in an execution unit 190, thereby calculating the speculative result.
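The wake up and select behaviour of this first scheme can be sketched in Python as follows. This is a simplified model under stated assumptions: the names `IssueQueue`, `wake_up`, `select` and `read_and_execute` are hypothetical, each instruction has exactly two sources, and only the two illustrative opcodes `add` and `sub` are modelled.

```python
class IssueQueue:
    """Toy issue queue that holds no source data, only addresses and valid bits."""

    def __init__(self):
        self.entries = []  # kept in program order, oldest first

    def insert(self, instr_id, opcode, sources, valid):
        # sources: physical register addresses; valid: the matching source valid bits
        self.entries.append(
            {"id": instr_id, "op": opcode, "src": list(sources), "valid": list(valid)}
        )

    def wake_up(self, result_reg):
        # Set the valid bit of every waiting source that depends on this result.
        for entry in self.entries:
            for i, reg in enumerate(entry["src"]):
                if reg == result_reg:
                    entry["valid"][i] = True

    def select(self):
        # Oldest first: issue the first entry whose source valid bits are all ON.
        for idx, entry in enumerate(self.entries):
            if all(entry["valid"]):
                return self.entries.pop(idx)
        return None


def read_and_execute(entry, register_file):
    # In this scheme the source data is read from the register file only
    # after the instruction has been selected for issue.
    a, b = (register_file[reg] for reg in entry["src"])
    return a + b if entry["op"] == "add" else a - b
```

An instruction with a pending source stays in the queue while a younger, ready instruction is selected ahead of it; once `wake_up` reports the missing result, the older instruction becomes selectable.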
In FIG. 2 the alternative pipeline scheme is shown. The difference is that in this case the data is read from the register file 260 directly after renaming 210, 250 in case the source data is available. In stage 220, the instruction is inserted into the issue queue 270, together with its source data read from the register file. It should be noted that the wake up logic 280 is required to firstly, set the valid bit of the source data and secondly, take care that the speculative results produced by the execution units 290 are written into the source data fields of the specific instruction that uses the speculative result as an input.
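The two duties of the wake up logic in this second scheme can be sketched as follows. The class `CapturingIssueQueue`, its method names and the `ready` set (the registers whose data is already available at insertion time) are assumptions made for illustration only.

```python
class CapturingIssueQueue:
    """Toy issue queue that stores source data with each entry (hypothetical model)."""

    def __init__(self):
        self.entries = []

    def insert(self, instr_id, sources, register_file, ready):
        # Available source data is read right after renaming and stored in
        # the entry; for a pending source only the tag is kept.
        entry = {"id": instr_id, "src": []}
        for reg in sources:
            if reg in ready:
                entry["src"].append({"tag": reg, "valid": True, "data": register_file[reg]})
            else:
                entry["src"].append({"tag": reg, "valid": False, "data": None})
        self.entries.append(entry)

    def wake_up(self, result_tag, result_data):
        # Two duties: set the valid bit, and capture the speculative result
        # into the source data field of each instruction that waits for it.
        for entry in self.entries:
            for src in entry["src"]:
                if not src["valid"] and src["tag"] == result_tag:
                    src["valid"] = True
                    src["data"] = result_data

    def ready_entries(self):
        # Instructions whose source data fields are all filled in.
        return [e["id"] for e in self.entries if all(s["valid"] for s in e["src"])]
```

Compared with the first scheme, no register file read is needed after selection, at the cost of wider queue entries and a wake up path that carries data as well as tags.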
Both pipeline models are currently in use. The MIPS R10000, HP PA 8000 and the DEC 21264 are examples of processors that use the model shown in FIG. 1. On the other hand, Intel Pentium, Power PC 604 and HAL SPARC64 are based on the model shown in FIG. 2.
With the increasing number of circuits that fit onto a chip, processor designers enhance the performance of a processor by expanding the number of queue entries, by providing more execution units and, especially, by designing the processor for a much higher frequency. The trend in industry is towards very high frequency designs in particular.
For processors with such a very high frequency target, the pipeline schemes shown in FIGS. 1 and 2 are no longer applicable, since the logic delay between the pipeline registers becomes too large to support the requested high frequency of operation. To support a much higher frequency, the pipeline depth has to increase. For example, the pipeline shown in FIG. 3 has been published in an article entitled “Intel Willamette Processor”, c't Magazin, Vol 5, 2000, pp 16-17. The total pipeline has 20 stages, which is double the number of pipeline stages of its predecessor, the Intel P6 processor (Pentium III).
The introduction of a much deeper pipeline has the advantage that the processor can run at a much higher frequency and therefore support a much higher instruction throughput. The drawback is, however, that the number of cycles needed for each instruction to go through the pipeline also increases. Since the performance of the processor in MIPS (Million Instructions Per Second) is equal to the frequency (in MHz) divided by the cycles per instruction (CPI), the performance gain from introducing a very deep pipeline remains limited.
Therefore, techniques that can reduce the pipeline length in performance critical cases are of great importance to increase the overall processor performance.
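The relation between frequency, CPI and MIPS stated above can be illustrated with a short calculation. The frequencies and CPI values below are invented purely for illustration and do not describe any real processor; the function name `mips` is likewise an assumption.

```python
def mips(frequency_hz, cpi):
    """Performance in millions of instructions per second: frequency / CPI."""
    return frequency_hz / cpi / 1e6


# Hypothetical numbers: a deeper pipeline raises the clock by 50 percent,
# but longer stalls and refills also raise the average CPI.
baseline = mips(1.0e9, 1.25)  # shorter pipeline
deeper = mips(1.5e9, 1.6)     # deeper pipeline
speedup = deeper / baseline
```

With these assumed values the 50 percent frequency increase yields a performance gain of only about 17 percent, which is why shortening the pipeline in performance critical cases pays off.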
With reference to FIG. 4 the IWB macros are shown schematically. In this processor, the so called Instruction Window Buffer (IWB) comprises a renaming logic 415, an issue queue referred to herein as reservation station (RS) 418, 420 and, amongst others, a register buffer 425 referred to herein as ReOrder Buffer (ROB) for holding the speculative results. The architectural results are stored in a Register File 430 called Architectural Register Array (ARA). The reservation station, the ARA and the ROB are connected with a multiplexer unit 450.
In FIG. 5 the respective pipeline scheme is shown. The IWB implementation scheme uses the basic pipeline scheme of FIG. 2 where the data is stored in the queue. It is, however, like the processor in ref 1 designed for a much higher frequency...
