Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
1999-06-30
2002-05-07
Pan, Daniel H. (Department: 2183)
Electrical computers and digital processing systems: processing
Processing control
Branching
C712S239000, C712S237000, C712S240000, C712S215000, C712S206000, C712S023000, C711S125000, C711S135000
Reexamination Certificate
active
06385719
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates in general to data processing systems and in particular to a processor in a data processing system. More particularly, the present invention relates to synchronizing parallel pipelines in a superscalar processor.
2. Description of the Related Art
Reduced instruction set computer (“RISC”) processors are employed in many data processing systems and are generally characterized by high instruction throughput. RISC processors usually operate at a high clock frequency and, because of the minimal instruction set, do so very efficiently. In addition to high clock speed, processor efficiency is further improved by the inclusion of multiple execution units, allowing the execution of two, and sometimes more, instructions per clock cycle.
Processors with the ability to execute multiple instructions per clock cycle are described as “superscalar.” Superscalar processors, such as the PowerPC™ family of processors available from IBM Corporation of Armonk, N.Y., provide simultaneous dispatch of multiple instructions. Included in the processor are an Instruction Cache (“IC”), an Instruction Dispatch Unit (“DU”), at least one Execution Unit (“EU”) and a Completion Unit (“CU”). Generally, a superscalar RISC processor is “pipelined,” meaning that a second instruction group is waiting to enter the execution unit(s) as soon as the previous instruction group is finished.
In a superscalar processor, instruction processing is usually accomplished in six stages: fetch, decode, dispatch, execute, writeback and completion. The fetch stage is primarily responsible for fetching instructions from the instruction cache, utilizing the Instruction Fetch Unit (IFU), and for determining the address of the next instruction to be fetched. The decode stage generally handles all time-critical instruction decoding for instructions in the instruction buffer. The dispatch stage (utilizing the DU) is responsible for non-time-critical decoding of instructions supplied by the decode stage and for determining which of the instructions can be dispatched in the current cycle.
The execute stage executes the instruction selected in the dispatch stage, which may come from the reservation stations or directly from dispatch. The writeback stage is used to write back any information from the rename buffers that is not written back by the completion stage. The completion stage maintains the correct architectural machine state by considering instructions residing in the completion buffer and utilizes information about the status of instructions provided by the execute stage.
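The six stages named above can be sketched as a simple ordered enumeration. This is only an illustrative model: the names `Stage`, `next_stage` and `trace` are hypothetical, and the sketch assumes a single instruction moving strictly in order, ignoring reservation stations and out-of-order execution.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The six processing stages, in the order listed in the text."""
    FETCH = 0
    DECODE = 1
    DISPATCH = 2
    EXECUTE = 3
    WRITEBACK = 4
    COMPLETION = 5


def next_stage(stage):
    """Return the stage that follows `stage`, or None after completion."""
    if stage is Stage.COMPLETION:
        return None
    return Stage(stage + 1)


def trace(start=Stage.FETCH):
    """List the stage names one instruction visits from `start` onward."""
    path = []
    stage = start
    while stage is not None:
        path.append(stage.name.lower())
        stage = next_stage(stage)
    return path
```

Calling `trace()` returns the stage names from fetch through completion; calling `trace(Stage.EXECUTE)` returns only the last three.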
Pipelined superscalar processors provide for out-of-order execution of instructions but utilize in-order fetch and completion to maintain sequential consistency of the instruction stream. Pipelining allows high operating frequencies at the cost of start-up latencies. To minimize pipeline latencies, the processor predicts the next pipeline state. When the processor is correct, no additional latencies are introduced. When a prediction is wrong, the pipeline must be restored to the correct state. Generally, instruction queues in the pipeline help absorb latencies by supplying queued data during upstream flush and re-fetch events.
In complex superscalar processors utilizing multiple pipelines, it is critical that the pipelines be synchronized with each other. For example, if two pipelines operate in parallel, such as a normal instruction pipeline and a separate pipeline for branch state instructions, the instruction pipeline must not get ahead of the branch pipeline, or a branch could execute before its state is available. The branch pipeline can detect and flush/invalidate conditions in the instruction pipeline.
Instructions provided from an Instruction Fetch Unit to an Instruction Decode Unit (IDU) can be invalidated quite late in the decode section of the instruction pipeline. This may occur when more branches are fetched than can be processed per clock cycle. If the fetch predictor is determined to be in error, all later instructions in the instruction pipeline must be cleared. No internal operations (IOPs) may pass from the decode section of the instruction pipeline to the dispatch unit until it is determined that the branch state will be available before the branch executes. A branch predictor utilizes additional information (history, address, etc.) about an instruction to improve the probability of a correct prediction, whereas the fetch predictor simply uses the next sequential instruction, without information on the instructions to be retrieved. The delay associated with determining whether the fetch prediction matches the branch prediction, or whether the fetch prediction is wrong and the decode pipeline must be flushed, slows instruction processing and becomes an undesirable bottleneck in a complex processor.
It would therefore be desirable to provide a method of synchronizing parallel pipelines, in addition to supplying queued data, to assure that branch executions are accomplished with correct information.
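The difference between the two predictors, and the flush that follows a mismatch, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the fixed 4-byte instruction width, the function names, and the reduction of the branch predictor's "additional information" to a single taken/not-taken guess plus a target address.

```python
INSTRUCTION_BYTES = 4  # assumption: fixed-width 4-byte instructions


def fetch_predict(addr):
    """Fetch predictor: simply assume the next sequential instruction."""
    return addr + INSTRUCTION_BYTES


def branch_predict(addr, predicted_taken, target):
    """Branch predictor: uses extra information about the instruction.

    Here that information is reduced to a hypothetical taken/not-taken
    guess and a branch target address.
    """
    return target if predicted_taken else addr + INSTRUCTION_BYTES


def resolve(addr, predicted_taken, target, later_instructions):
    """Compare the two predictions for the instruction at `addr`.

    Returns (next_fetch_address, surviving_instructions). When the fetch
    prediction disagrees with the branch prediction, every later
    instruction already in the pipeline is cleared.
    """
    fetched = fetch_predict(addr)
    predicted = branch_predict(addr, predicted_taken, target)
    if fetched == predicted:
        return predicted, list(later_instructions)
    return predicted, []  # fetch prediction was wrong: flush
```

For a not-taken branch the sequentially fetched instructions survive; for a taken branch they are discarded and fetching restarts at the target.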
SUMMARY OF THE INVENTION
It is therefore one object of the present invention to provide a method and apparatus that will prevent premature execution of Internal Operations in parallel pipelines.
It is another object of the present invention to provide a method and apparatus for invalidating Internal Operations and individual instructions in parallel pipelines after a mis-predicted fetch operation.
It is a further object of the present invention to provide a method and apparatus for reducing instruction validation steps.
The foregoing objects are achieved as is now described. A transfer tag is generated by the Instruction Fetch Unit and passed to the decode unit in the instruction pipeline with each group of instructions fetched during a branch prediction by a fetcher. Individual instructions within the fetched group for the branch pipeline are assigned a concatenated version of the transfer tag (the group tag concatenated with the instruction lane), which is used to match on requests to flush any newer instructions. All potential instruction or Internal Operation latches in the decode pipeline must perform a match, and if a match is encountered, all valid bits associated with newer instructions or internal operations upstream from the match are cleared. The transfer tag representing the next instruction to be processed in the branch pipeline is passed to the Instruction Dispatch Unit. The Instruction Dispatch Unit queries the branch pipeline to compare its transfer tag with transfer tags of instructions in the branch pipeline. If the transfer tag matches a branch instruction tag, the Instruction Decode Unit is stalled until the branch instruction is processed, thus providing a synchronizing method for the parallel pipelines.
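The tag scheme described above can be sketched in software as follows. This is a hedged illustration, not the patented hardware: the lane field width (`LANE_BITS`), the `Latch` class, the oldest-first list ordering, and the choice to compare only the group portion of a branch tag when deciding whether to stall are all assumptions introduced here.

```python
from dataclasses import dataclass

LANE_BITS = 3  # assumption: up to 8 instruction lanes per fetched group


def instruction_tag(group_tag, lane):
    """Concatenate the group transfer tag with the instruction lane."""
    return (group_tag << LANE_BITS) | lane


def group_of(tag):
    """Recover the group transfer tag from a concatenated tag."""
    return tag >> LANE_BITS


@dataclass
class Latch:
    """One decode-pipeline latch: a concatenated tag and a valid bit."""
    tag: int
    valid: bool = True


def flush_newer(latches, flush_tag):
    """Clear valid bits of every latch newer than the one matching flush_tag.

    `latches` is assumed to be ordered oldest first. Returns the number of
    valid bits cleared; nothing changes if no latch matches.
    """
    for i, latch in enumerate(latches):
        if latch.valid and latch.tag == flush_tag:
            cleared = 0
            for newer in latches[i + 1:]:
                if newer.valid:
                    newer.valid = False
                    cleared += 1
            return cleared
    return 0


def must_stall(transfer_tag, pending_branch_tags):
    """Stall while an unprocessed branch belongs to the same fetched group."""
    return any(group_of(t) == transfer_tag for t in pending_branch_tags)
```

In this model, a flush request carrying the tag for group 5, lane 1 leaves lanes 0 and 1 valid and clears the newer lanes of that group. `must_stall(5, ...)` stays true until the branch pipeline no longer holds a tag from group 5.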
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.
REFERENCES:
patent: 5142634 (1992-08-01), Fite et al.
patent: 5649225 (1997-07-01), White et al.
patent: 5764946 (1998-06-01), Trane et al.
Derrick John Edward
Eisen Lee Evan
Konigsburg Brian R.
Levitan David Stephen
Bracewell & Patterson L.L.P.
England Anthony V. S.
International Business Machines Corporation
Pan Daniel H.