Local stall/hazard detect in superscalar, pipelined...

Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or...

Reexamination Certificate

Details

C712S219000, C712S245000, C711S125000

active

06587940

ABSTRACT:

FIELD OF INVENTION
The invention relates to computers and superscalar, pipelined microprocessors. More particularly, this invention relates to the method and apparatus for improving the performance of pipelined microprocessors.
BACKGROUND OF THE INVENTION
Typical computer systems have a number of common components. These components, as seen in FIG. 1, include a CPU, a bus, memory, and peripheral devices. In high-speed computers, the CPU may be a superscalar, pipelined microprocessor. As shown in FIG. 2, a superscalar, pipelined microprocessor can include an instruction fetch unit, multiple pipelines, and a centralized data-dependency hazard detection mechanism. The instruction fetch unit fetches instructions and forwards them to a pipeline. In the pipeline, the instructions flow through multiple pipeline stages, after which the results of the instructions are committed to an architectural state (i.e., memory).
The stages in a standard pipelined microprocessor may include: a rename register identification or instruction decode stage (“REN”); a register reading or operand fetch stage (“REG”); a first instruction execution stage (“EX1”); a second instruction execution stage (“EX2”); and a write-back stage (“WRB”). A pipelined microprocessor performs parallel processing in which instructions are executed in an assembly-line fashion. Consecutive instructions are operated upon in sequence, but several instructions are initiated before a first instruction is complete. In this manner, instructions step through each stage of a particular pipeline, one instruction per stage per pipeline at a time. For example, a first instruction is fetched and then forwarded to the REN stage. When the first instruction is finished in the REN stage, i.e., it is decoded and the instruction's register identification (“RegID”) is renamed from virtual to real space, it is forwarded to the REG stage and a second instruction is fetched and forwarded to the REN stage. This process continues until each instruction makes its way through every stage of the pipeline. However, in some situations, as discussed below, it is necessary to stall an instruction or multiple instructions in the pipeline. Stalling an instruction involves holding the instruction in a stage of the pipeline until the situation is resolved and the stall is no longer asserted.
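The assembly-line flow described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single pipeline, no stalls, and one instruction entering REN per cycle. The function name `pipeline_timeline` and the instruction labels are hypothetical and do not come from the patent.

```python
# Stage names as listed in the text above.
STAGES = ["REN", "REG", "EX1", "EX2", "WRB"]

def pipeline_timeline(instructions):
    """Return one dict per cycle mapping each stage to the instruction in it.

    Assumes a single pipeline with no stalls: instruction i enters REN in
    cycle i and advances exactly one stage per cycle.
    """
    n_cycles = len(instructions) + len(STAGES) - 1
    timeline = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            idx = cycle - depth  # which instruction has reached this stage
            row[stage] = instructions[idx] if 0 <= idx < len(instructions) else None
        timeline.append(row)
    return timeline

if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_timeline(["I1", "I2", "I3"])):
        print(cycle, row)
```

Each printed row shows that several instructions are in flight at once, one per stage, before the first one reaches WRB.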
Instructions in pipelined microprocessors have producers and consumers. In a pipelined microprocessor, one instruction in an earlier stage (e.g., REG) may be dependent (a consumer) on data from an instruction (a producer) in a later stage (e.g., EX1 or EX2). A producer is an instruction generating data, such as an add instruction. A target register is where the producer is going to write the results (destination operands) of the add. There may be a following add instruction which is earlier in the pipeline (earlier means it is a younger instruction in program order) that takes the results of the first add instruction from the target register (its source register) and adds them to something else, creating a second result. Therefore, the second add instruction is a consumer, and the relationship between the consumer and the producer is called a data-dependency. The process of the consumer reading data from its source register is known as consumer operand generation.
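As an illustration of the producer/consumer relationship, the following Python sketch scans a short program and reports each data-dependency as a (producer, consumer, register) triple. The `Instr` record, the register names, and `find_dependencies` are hypothetical names chosen for this example. The sketch assumes each instruction writes exactly one target register.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str        # label for the instruction (hypothetical)
    target: str      # register the producer writes its result to
    sources: tuple   # registers the instruction reads as a consumer

def find_dependencies(program):
    """Return (producer, consumer, register) triples for a program.

    `program` is listed in program order. A consumer depends on the most
    recent earlier instruction that targets one of its source registers.
    """
    last_writer = {}
    deps = []
    for instr in program:
        for reg in instr.sources:
            if reg in last_writer:
                deps.append((last_writer[reg].name, instr.name, reg))
        last_writer[instr.target] = instr
    return deps

if __name__ == "__main__":
    program = [
        Instr("add1", "r3", ("r1", "r2")),  # producer: writes r3
        Instr("add2", "r5", ("r3", "r4")),  # consumer: reads r3
    ]
    print(find_dependencies(program))
```

Here the second add reads the target register of the first add, so it is reported as the consumer of that producer.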
Often an instruction needs multiple stages or cycles to complete its operation and make the data it generates available. This delay or latency can vary from instruction to instruction, with simple instructions taking one stage (one-cycle latency) and complex instructions taking multiple stages (multiple-cycle latency). If a producer has multiple-cycle latency, then its data will not be available to the consumer until the producer moves to a later stage and completes its operation. Such a situation is called a data-dependency hazard, and if a code segment is written with the consumer immediately following the producer or otherwise not separated by enough pipeline stages from the producer, the hardware has to detect the data-dependency hazard. In this situation, the hardware must stall the consumer in some pipeline stage until the producer can make its data available.
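The relationship between producer latency and the required stall can be sketched as follows. This is a simplified textbook-style model, not the patent's mechanism: it assumes a single pipeline, that the producer's result is forwarded to the consumer as soon as it exists, and that `distance` counts how many instruction slots after the producer the consumer was issued. The function name `stall_cycles` is hypothetical.

```python
def stall_cycles(producer_latency, distance):
    """Cycles the consumer must be held so the producer's data is ready.

    producer_latency: cycles the producer needs before its result exists.
    distance: instruction slots between producer and consumer in program
              order (1 means the consumer immediately follows the producer).
    Assumes the result is forwarded as soon as it is produced.
    """
    if producer_latency < 1 or distance < 1:
        raise ValueError("latency and distance must both be at least 1")
    return max(0, producer_latency - distance)

if __name__ == "__main__":
    for latency, distance in [(1, 1), (2, 1), (3, 1), (2, 3)]:
        print(latency, distance, stall_cycles(latency, distance))
```

Under these assumptions a one-cycle producer never stalls its consumer, while a multiple-cycle producer does unless enough other instructions separate the two.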
As illustrated in FIG. 2, conventional superscalar pipelined designs have a centralized data-dependency hazard detection mechanism whose output is a stall signal. This stall signal is a global stall that effectively holds the consumer in the EX1 stage, the stage where the consumer is waiting for its source operands because the global stall arrives after the consumer has moved from the REG stage. The global stall applies to all pipelines and all stages prior to and including the stage in which the data-dependency hazard is detected. The centralized data-dependency hazard detection circuitry detects all possible consumer-producer data-dependency hazards. The global stall signal that is generated must traverse earlier pipeline stages: to stall something in the REG stage, the stall must traverse any prior stages, such as the REN stage. Likewise, the global stall signal must traverse the physical dimensions of the CPU to move back across stages. The distance alone across the die of a CPU can be relatively long, and there are usually a large number of stages.
Accordingly, the global stall signal may arrive at any one point late in a cycle, giving late notice of a stall. The lateness worsens as pipelines are added, because the amount of logic needed to generate the global stall grows non-linearly with the number of pipelines. This non-linear growth is a function of the number of source operands multiplied by the width (the number of pipelines) and by the depth of the pipelines (the number of stages). Consequently, faster circuitry is required with the global stall in order to operate at intended frequencies. This circuitry can limit the frequency of operation of the entire CPU.
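To make the growth concrete, the sketch below counts register-ID comparisons under one plausible counting model. The model is an assumption made for illustration and is not a formula given in the patent: every source operand of every instruction in one consumer stage (one instruction per pipeline) is compared against the target register of every producer in the later stages of every pipeline. The name `comparator_count` is hypothetical.

```python
def comparator_count(sources_per_instr, width, producer_stages):
    """Number of source-vs-target comparisons in a centralized detector.

    Assumed model (not taken from the patent):
      consumer source operands = sources_per_instr * width
      in-flight producers      = width * producer_stages
    and every consumer source operand is compared with every producer.
    """
    consumer_sources = sources_per_instr * width
    producers = width * producer_stages
    return consumer_sources * producers

if __name__ == "__main__":
    for width in (1, 2, 4, 6):
        print(width, comparator_count(2, width, 2))
```

Under this model the count grows with the square of the number of pipelines, which is one way the logic for a global stall could scale non-linearly.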
Another problem with the late arrival of the global stall is that it necessitates a complete recalculation of data-forwarding architecture, including a register file re-read to ensure correct operand data. If consumer instructions Y1 and Y2 are in the REG stage when a data-dependency hazard occurs for an operand of Y1, the global stall may not arrive or be asserted until Y1 and Y2 are already in a later stage, such as the EX1 stage. Since there was a data-dependency hazard for an operand of Y1 when Y1 was in the REG stage and Y1 was forwarded to the EX1 stage before the global stall arrived (i.e., before the producer instruction finished its computation and made its data available for Y1), the data in Y1 is incorrect. Y1 will receive the correct data from its producer via the data-forwarding architecture when the producer data is available.
Since there was no data-dependency hazard for Y2, the data for Y2 is correct. However, since the global stall does not indicate in which pipeline nor for which instruction in REG the data-dependency hazard occurred, the operand data for each instruction forwarded to the EX1 stage must be re-read during every cycle of the global stall to ensure correct data. Consequently, despite the fact that Y2 read the correct data while in REG, Y2 must re-read the register and re-compute during every cycle of the stall.
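The cost of this blanket re-read can be illustrated with a small count. The sketch assumes each instruction has the same number of source operands and that every source is re-read once per stall cycle. The `redundant` figure is a hypothetical comparison showing how many of those re-reads belong to instructions, such as Y2, that had no hazard. The function name `reread_counts` and the stall length used in the example are assumptions.

```python
def reread_counts(hazard_flags, sources_per_instr, stall_cycles):
    """Return (total, redundant) register re-reads during a global stall.

    hazard_flags: dict mapping instruction name -> True if that instruction
                  actually had a data-dependency hazard.
    total:     re-reads performed because the global stall cannot say which
               instruction had the hazard.
    redundant: the part of `total` spent on instructions without a hazard.
    """
    per_instr = sources_per_instr * stall_cycles
    total = len(hazard_flags) * per_instr
    no_hazard = sum(1 for has_hazard in hazard_flags.values() if not has_hazard)
    return total, no_hazard * per_instr

if __name__ == "__main__":
    # Y1 had the hazard, Y2 did not; two sources each, three stall cycles.
    print(reread_counts({"Y1": True, "Y2": False}, 2, 3))
```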
Re-reading is problematic considering that there are multiple source registers for each pipeline. For example, if there are six execution pipelines and two source operands per instruction, a total of twelve different register values must be read from the register file. These register values will be used unless there is data-forwarding from a producer in a later stage of the pipeline. As discussed above, data-forwarding is performed for the consumer with the data-dependency hazard. The data-forwarding architecture performs the calculations necessary to forward the data generated by producer instructions. If a producer is in-flight, it has not written to the register file yet and a consumer can read directly from the producer when it
