Electrical computers and digital processing systems: processing – Dynamic instruction dependency checking – monitoring or... – Reducing an impact of a stall or pipeline bubble
Reexamination Certificate
2000-01-18
2003-07-08
Pan, Daniel H. (Department: 2183)
Electrical computers and digital processing systems: processing
Dynamic instruction dependency checking, monitoring or...
Reducing an impact of a stall or pipeline bubble
C712S023000, C712S245000, C712S237000
active
06591360
ABSTRACT:
FIELD OF INVENTION
The invention relates to computers and superscalar, pipelined microprocessors. More particularly, this invention relates to a method and apparatus for improving the performance of pipelined microprocessors.
BACKGROUND OF THE INVENTION
Typical computer systems have a number of common components. These components, as seen in FIG. 1, include a CPU, a bus, memory, and peripheral devices. In high-speed computers, the CPU may be a superscalar, pipelined microprocessor. As shown in FIG. 2, a superscalar, pipelined microprocessor can include an instruction fetch unit, multiple pipelines, and a centralized data-dependency hazard detection mechanism. The instruction fetch unit fetches instructions and forwards them to a pipeline. In the pipeline, the instructions flow through multiple pipeline stages, after which the results of the instructions are committed to an architectural state (i.e., memory).
The stages in a standard pipelined microprocessor may include: a rename register identification or instruction decode stage (“REN”); a register reading or operand fetch stage (“REG”); a first instruction execution stage (“EX1”); a second instruction execution stage (“EX2”); and a write-back stage (“WRB”). A pipelined microprocessor performs parallel processing in which instructions are executed in an assembly-line fashion. Consecutive instructions are operated upon in sequence, but several instructions are initiated before a first instruction is complete. In this manner, instructions step through each stage of a particular pipeline, one instruction per stage per pipeline at a time. For example, a first instruction is fetched and then forwarded to the REN stage. When the first instruction is finished in the REN stage, i.e., it is decoded and the instruction's register identification (“RegID”) is renamed from virtual to real space, it is forwarded to the REG stage and a second instruction is fetched and forwarded to the REN stage. This process continues until each instruction makes its way through every stage of the pipeline. However, in some situations, as discussed below, it is necessary to stall an instruction or multiple instructions in the pipeline. Stalling an instruction involves holding the instruction in a stage of the pipeline until the situation is resolved and the stall is no longer asserted.
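The assembly-line flow described above can be sketched as a small simulation. This is only an illustration for a single pipeline with no stalls; the function name `pipeline_schedule` and the instruction labels are hypothetical and do not come from the patent.

```python
# Stage names as listed in the text; everything else here is an assumption.
STAGES = ["REN", "REG", "EX1", "EX2", "WRB"]


def pipeline_schedule(instructions):
    """Map each cycle to the instruction occupying each stage.

    Assumes one pipeline, one instruction entering REN per cycle,
    and no stalls.
    """
    schedule = {}
    for issue_cycle, instr in enumerate(instructions):
        for offset, stage in enumerate(STAGES):
            # An instruction issued in cycle c is in stage k during cycle c + k.
            schedule.setdefault(issue_cycle + offset, {})[stage] = instr
    return schedule


if __name__ == "__main__":
    sched = pipeline_schedule(["i0", "i1", "i2"])
    for cycle in sorted(sched):
        print(cycle, sched[cycle])
```

In this model the second instruction enters REN in the same cycle that the first instruction moves to REG, which is the overlap the paragraph describes.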
Instructions in pipelined microprocessors are producers and consumers. In a pipelined microprocessor, one instruction in an earlier stage (e.g., REG) may be dependent (a consumer) on data from an instruction (a producer) in a later stage (e.g., EX1 or EX2). A producer is an instruction generating data, such as an add instruction. A target register is where the producer is going to write the results (destination operands) of the add. There may be a following add instruction that is earlier in the pipeline (earlier meaning that it is a younger instruction in program order) and that takes the result of the first add instruction from the target register (its source register) and adds it to something else, creating a second result. Therefore, the second add instruction is a consumer, and the relationship between the consumer and the producer is called a data-dependency. The process of the consumer reading data from its source register is known as consumer operand generation.
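The producer and consumer relationship between the two add instructions can be illustrated with a short sketch. The `Instr` type, the register names, and the function `is_data_dependent` are hypothetical names chosen for this example only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    target: str                # register the instruction writes (producer side)
    sources: Tuple[str, ...]   # registers the instruction reads (consumer side)


def is_data_dependent(consumer, producer):
    """True when the consumer reads the register the producer writes."""
    return producer.target in consumer.sources


# First add writes r3; the younger add reads r3 as one of its sources.
first_add = Instr("add1", target="r3", sources=("r1", "r2"))
second_add = Instr("add2", target="r5", sources=("r3", "r4"))

if __name__ == "__main__":
    print(is_data_dependent(second_add, first_add))
```

Here `second_add` is the consumer because one of its source registers is the target register of `first_add`.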
Often it takes an instruction multiple stages or cycles before it completes its operation and the data generated by the instruction is available. This delay or latency can vary from instruction to instruction, with simple instructions taking one stage (one-cycle latency) and complex instructions taking multiple stages (multiple-cycle latency). If a producer has multiple-cycle latency, then its data will not be available to the consumer until the producer moves to a later stage and completes its operation. Such a situation is called a data-dependency hazard, and if a code segment is written with the consumer immediately following the producer or otherwise not separated by enough pipeline stages from the producer, the hardware has to detect the data-dependency hazard. In this situation, the hardware must stall the consumer in some pipeline stage until the producer can make its data available.
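As a rough illustration of why separation matters, the following sketch estimates how long a consumer must be held. It is a simplified model, not taken from the patent: it assumes data forwarding, so a result is usable in the cycle after it is produced, and it measures `distance` as the number of pipeline stages by which the consumer trails the producer (1 means the consumer immediately follows). The name `stall_cycles` is hypothetical.

```python
def stall_cycles(producer_latency, distance):
    """Cycles a consumer must be held, under the simplified model above.

    producer_latency: cycles before the producer's data is available.
    distance: stages separating the consumer from the producer
              (1 = consumer immediately follows the producer).
    """
    if producer_latency < 1 or distance < 1:
        raise ValueError("latency and distance must both be at least 1")
    # No stall is needed once the separation covers the latency.
    return max(0, producer_latency - distance)


if __name__ == "__main__":
    print(stall_cycles(1, 1))  # one-cycle producer, back-to-back consumer
    print(stall_cycles(2, 1))  # two-cycle producer, back-to-back consumer
```

Under these assumptions a one-cycle producer never stalls a back-to-back consumer, while a two-cycle producer stalls it unless the two instructions are separated by at least two stages.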
As illustrated in FIG. 2, conventional superscalar pipelined designs have a centralized data-dependency hazard detection mechanism whose output is a stall signal. This stall signal is a global stall that effectively holds the consumer in the EX1 stage, the stage where the consumer is waiting for its source operands, because the global stall does not issue until the consumer has moved from the REG stage. The global stall applies to all pipelines and all stages prior to and including the stage in which the data-dependency hazard is detected. The centralized data-dependency hazard detection circuitry detects all possible consumer-producer data-dependency hazards. The global stall signal that is generated must traverse earlier pipeline stages: to stall something in the REG stage, the stall must traverse any prior stages, such as the REN stage. Likewise, the global stall signal must traverse the physical dimensions of the CPU to move back across stages. The distance alone across the die of a CPU can be relatively long, and there are usually a large number of stages.
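The reach of the global stall can be pictured with a small sketch. The function name `globally_stalled` and the pipeline numbering are hypothetical; only the stage names and the rule "all pipelines, all stages up to and including the detecting stage" come from the text.

```python
STAGES = ["REN", "REG", "EX1", "EX2", "WRB"]


def globally_stalled(detect_stage, num_pipelines):
    """Return every (pipeline, stage) slot held by a global stall.

    The stall covers all pipelines and every stage prior to and
    including the stage in which the hazard is detected.
    """
    held_stages = STAGES[: STAGES.index(detect_stage) + 1]
    return [(pipe, stage) for pipe in range(num_pipelines) for stage in held_stages]


if __name__ == "__main__":
    for slot in globally_stalled("EX1", num_pipelines=2):
        print(slot)
```

With a hazard detected in EX1 on a two-pipeline machine, six slots are held, which hints at how far the single stall signal has to travel.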
Accordingly, arrival of the global stall signal at any one point may be late in a cycle, giving late notice of a stall. The resulting late notice increases when additional pipelines are added because it takes a non-linear increase in the amount of logic to generate the global stall as the number of pipelines is increased. This non-linear calculation is a function of the number of source operands by the width or number of pipelines by the depth of the pipelines (or number of stages). Consequently, faster circuitry is required with the global stall in order to operate at intended frequencies. This circuitry can limit the entire CPU frequency of operation.
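The non-linear growth can be made concrete with a back-of-the-envelope count. The formula below is an assumption made for illustration and is not given in the patent: it supposes that every source operand of every consumer (one consumer per pipeline) is compared against one producer in every stage of every pipeline. The name `centralized_comparisons` is hypothetical.

```python
def centralized_comparisons(sources_per_instr, width, depth):
    """Rough count of RegID comparisons for centralized hazard detection.

    Assumed model (not from the patent):
      consumer operands   = sources_per_instr * width
      producers in flight = width * depth
    and every consumer operand is compared against every producer.
    """
    consumer_operands = sources_per_instr * width
    producers_in_flight = width * depth
    return consumer_operands * producers_in_flight


if __name__ == "__main__":
    for width in (1, 2, 4):
        print(width, centralized_comparisons(2, width, 3))
```

In this model, doubling the number of pipelines quadruples the comparison count, which is one way the logic behind the global stall can grow faster than the pipeline width.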
SUMMARY OF THE INVENTION
The present invention is a method and apparatus that generates a localized and simplified version of the global stall (“a local stall”) and uses the local stall to improve the operation of a pipelined microprocessor. The invention locates simplified data-dependency hazard detection nearer to the consumer operand generation than the centralized data-dependency hazard detection, thereby overcoming the inherent problems in centralized data-dependency hazard detection discussed above. The simplified data-dependency hazard detection reuses existing circuitry from a data forwarding architecture to generate a local stall. The data forwarding architecture performs calculations necessary to forward the data generated by producer instructions to consumer instructions. Accordingly, the simplified data-dependency hazard detection can generate a local stall with a very limited increase of logic by re-using data forwarding circuitry.
In an embodiment, the simplified data-dependency hazard detection performs operations on a local consumer re-using comparators used in the data forwarding calculations. These comparators compare pipeline producer RegIDs with the local consumer RegID to detect data dependencies. The producer RegIDs are the register addresses or identifiers for the target or destination register to which the producer is going to write. Likewise, the consumer RegID is the register address or identifier of the source register from which the consumer is going to read. If a producer destination register and the consumer source register match, there is a data-dependency between the producer and the consumer.
After determining that the consumer is data-dependent on a producer(s) (the “matched producer(s)”), the apparatus of the present invention evaluates the matched producer(s) to determine if their data is available yet. If the matched producer(s)' data is not available, then there is a data-dependency hazard and a local stall will be generated.
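The two steps just described, matching RegIDs and then checking data availability, can be sketched in software as follows. This is a behavioral illustration only, not the patented circuitry: the `Producer` type, its `data_ready` flag, and the function `local_stall` are hypothetical, and the sketch assumes that a stall is raised if any matched producer's data is not yet available.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Producer:
    target_regid: int   # RegID of the register the producer will write
    data_ready: bool    # whether the producer's result is available yet


def local_stall(consumer_regid: int, producers: Iterable[Producer]) -> bool:
    """Return True when the local consumer must stall.

    Step 1: compare each producer RegID with the consumer RegID
            (the comparison the forwarding logic already performs).
    Step 2: stall if any matched producer has no data available yet.
    """
    matched = [p for p in producers if p.target_regid == consumer_regid]
    return any(not p.data_ready for p in matched)


if __name__ == "__main__":
    in_flight = [Producer(target_regid=7, data_ready=False),
                 Producer(target_regid=3, data_ready=True)]
    print(local_stall(7, in_flight))
    print(local_stall(3, in_flight))
```

A consumer reading register 7 stalls because its matched producer is not ready, while a consumer reading register 3 can proceed with forwarded data.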
The simplified data-dependency hazard detection need only be concerned about its consumer across all producers. Specifically, the simp
Bhatia Rohit
Gibson Mark
Soltis, Jr. Donald C.
Hewlett-Packard Development Company
Pan Daniel H.