Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
1999-03-05
2001-10-16
Follansbee, John A. (Department: 2154)
Electrical computers and digital processing systems: processing
Processing control
Branching
C712S215000
Reexamination Certificate
active
06304959
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to a deeply pipelined superscalar processor and more particularly to one which employs aggressive branch and fetch prediction mechanisms.
BACKGROUND OF THE INVENTION
Superscalar processors employ aggressive techniques to exploit instruction-level parallelism. Wide dispatch and issue paths place an upper bound on peak instruction throughput. Large issue buffers are used to maintain a window of instructions necessary for detecting parallelism, and a large pool of physical registers provides destinations for all of the in-flight instructions issued from the window beyond the dispatch boundary. To enable concurrent execution of instructions, the execution engine is composed of many parallel functional units. The fetch engine speculates past multiple branches in order to supply a continuous instruction stream to the decode, dispatch, and execution pipelines in order to maintain a large window of potentially executable instructions.
The trend in superscalar design is to scale these techniques: wider dispatch/issue, larger windows, more physical registers, more functional units, and deeper speculation. To maintain this trend, it is important to balance all parts of the processor-any bottlenecks diminish the benefit of aggressive techniques.
Instruction fetch performance depends on a number of factors. Instruction cache hit rate and branch prediction accuracy have been long recognized as important problems in fetch performance and are well-researched areas.
Modern microprocessors routinely use a plurality of mechanisms to improve their ability to efficiently fetch past branch instructions. These prediction mechanisms allow a processor to fetch beyond a branch instruction before the outcome of the branch is known. For example, some mechanisms allow a processor to speculatively fetch beyond a branch before the branch's target address has been computed. These techniques use run-time history to speculatively predict which instructions should be fetched and eliminate “dead” cycles that might normally be wasted. Even with these techniques, current microprocessors are limited in fetching instructions during a clock cycle. As superscalar processors become more aggressive and attempt to execute many more instructions per cycle, they must also be able to fetch many more instructions per cycle.
High performance superscalar processor organizations divide naturally into an instruction fetch mechanism and an instruction execution mechanism. The fetch and execution mechanisms are separated by instruction issue buffer(s), for example, queues, and reservation stations, etc. Conceptually, the instruction fetch mechanism acts as a “producer” which fetches, decodes, and places instructions into a reorder buffer. The instruction execution engine “prepares” instructions for completions. The completion engine is the “consumer” which removes instructions from the buffer and executes them, subject to data dependence and resource constraints. Control dependencies (branches and jumps) provide a feedback mechanism between the producer and consumer.
As instruction fetch decode and dispatch pipelines become wider, it becomes important to optimize the translation from the complex instruction set with a large amount of implicit information to an explicit instruction set that does not require the use of architected registers. This is particularly true in situations where the internal instructions do not have a direct one to one relationship to the external instructions. This is typically done to facilitate faster cycle times, simplify design or reduce the execution and/or register resources required for that instruction's execution. This uneven expansion of instructions as they flow down the pipeline have the side effect of making it extremely difficult to track and correlate instructions in the fetch unit and the decode unit after they have been expanded.
As processors are optimized for higher frequency operation they are being implemented with more pipeline stages. This along with a requirement for more accurate branch and fetch prediction forces increased instructions to be passed further down the processor pipeline before fetch predictors or branch predictors can be resolved.
Speculative branch prediction requires that each branch instruction is denoted by some unique value or enumeration to allow canceling out instructions which were fetched, passed down the pipeline and then determined not to be from the proper instruction stream. Some processor architectures distinguish different branch types from one another, in this case multiple “branch tags” would be required.
Accordingly, what is needed is a mechanism for forming independently generated identifying branch tags in multiple units which can be used to track, flush, and cancel fetch groups and portions of fetch groups upon resolution of the fetch and branch predictors. The present invention addresses such a need.
SUMMARY OF THE INVENTION
A method and system for assigning unique branch tag (BTAG) values in a decode unit in a processing system are disclosed. The method and system comprise providing at least one BTAG value and incrementing the at least one BTAG value for each fetch group as required. The method includes allowing the decode unit to generate the appropriate BTAG values for all dispatch groups formed by instructions within the same fetch group.
In the preferred implementation, the BTAG values comprise a major branch tag and two minor branch tags, a count branch tag, and a link branch tag. The “seed” value for each of the BTAGs is provided each time a branch redirection occurs. Because the branches are passed to the decode unit with little or no processing by the instruction fetch unit, and conditions can cause the branch execution to be delayed, more branches could be decoded and processed than the number of branch entry queues in the instruction fetch unit. Therefore the value of the next entry in the branch entry queue is broadcast to the decode unit and whenever the current branch in the last stage of the decode unit is identical to the broadcast value, the decode unit ceases to process any output instructions until the broadcast value changes.
REFERENCES:
patent: 5742783 (1998-04-01), Azmoodeh et al.
patent: 5822575 (1998-10-01), Tran
patent: 5930508 (1999-07-01), Faraboschi et al.
patent: 5961636 (1999-10-01), Brooks et al.
patent: 6044450 (2000-03-01), Tsushima et al.
patent: 6092176 (2000-07-01), Iadonato et al.
patent: 6108774 (2000-08-01), Muthusamy
patent: 6170051 (2001-01-01), Dowling
Derrick John Edward
Konigsburg Brian R.
Levitan David Stephen
England Anthony V. S.
Follansbee John A.
International Business Machines - Corporation
Sawyer Law Group LLP
LandOfFree
Simplified method to generate BTAGs in a decode unit of a... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Simplified method to generate BTAGs in a decode unit of a..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Simplified method to generate BTAGs in a decode unit of a... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2614857