Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
2000-03-06
2004-05-04
Ellis, Richard L. (Department: 2183)
Electrical computers and digital processing systems: processing
Processing control
Branching
C712S207000
Reexamination Certificate
active
06732260
ABSTRACT:
FIELD
The present invention relates generally to microprocessors, and more specifically to microprocessors employing branch target prediction and prefetch mechanisms.
BACKGROUND
Many modern microprocessors have large instruction pipelines that facilitate high speed operation. “Fetched” program instructions enter the pipeline, undergo operations such as decoding and executing in intermediate stages of the pipeline, and are “retired” at the end of the pipeline. When the pipeline receives a valid instruction each clock cycle, the pipeline remains full and performance is good. When valid instructions are not received each cycle, the pipeline does not remain full, and performance can suffer. For example, performance problems can result from branch instructions in program code. If a branch instruction is encountered in the program and the processing branches to the target address, a portion of the instruction pipeline may have to be flushed, resulting in a performance penalty.
Branch Target Buffers (BTB) have been devised to lessen the impact of branch instructions on pipeline efficiency. A discussion of BTBs can be found in: David A. Patterson & John L. Hennessy, Computer Architecture A Quantitative Approach 271-275 (2d ed. 1990). A typical BTB application is also shown in FIG.
1
A.
FIG. 1A
shows BTB
10
coupled to instruction pointer (IP)
18
, and processor pipeline
20
. Also included in
FIG. 1A
are cache
30
and fetch buffer
32
.
The location of the next instruction to be fetched is specified by IP
18
. As execution proceeds in sequential order in a program, IP
18
increments each cycle. The output of IP
18
drives port
34
of cache
30
and specifies the address from which the next instruction is to be fetched. Cache
30
provides the instruction to fetch buffer
32
, which in turn provides the instruction to processor pipeline
20
. Fetch buffer
32
typically has a latency associated therewith, herein referred to as “Icache latency.”
When instructions are received by pipeline
20
, they proceed through several stages shown as fetch stage
22
, decode stage
24
, intermediate stages
26
, and retire stage
28
. Information on whether a branch instruction results in a taken branch is typically not available until a later pipeline stage, such as retire stage
28
. When BTB
10
is not present and a branch is taken, fetch buffer
32
and the portion of instruction pipeline
20
following the branch instruction hold instructions from the wrong execution path. The invalid instructions in processor pipeline
20
and fetch buffer
32
are flushed, and IP
18
is written with the branch target address. A performance penalty results, in part because the processor waits while fetch buffer
32
and instruction pipeline
20
are filled with instructions starting at the branch target address. The performance penalty is roughly equal to the sum of the Icache latency and the processor pipeline latency.
Branch target buffers lessen the performance impact of taken branches. BTB
10
includes records
11
, each having a branch address (BA) field
12
and a target address (TA) field
14
. TA field
14
holds the branch target address for the branch instruction located at the address specified by the corresponding BA field
12
. When a branch instruction is encountered by processor pipeline
20
, the BA fields
12
of records
11
are searched for a record matching the address of the branch instruction. If found, IP
18
is changed to the value of the TA field
14
corresponding to the found BA field
12
. As a result, instructions are next fetched starting at the branch target address. This mechanism is commonly referred to as “branch target prefetch.”
Branch target prefetch can occur while the branch instruction is still early in the processor pipeline, such as in decode stage
24
. In this case, when the predicted branch is actually taken, the latency is reduced from the sum of the Icache latency and the processor pipeline latency described above; however, the penalty associated with fetch buffer
32
(Icache latency) remains.
The latency associated with the use of BTB
10
is shown in FIG.
1
B. In region
60
, the processor pipeline has filled, and performance is good. In region
70
, a branch is taken, and the fetch buffer is flushed and refilled. As shown in region
70
, performance drops as the pipeline is flushed, and then performance is regained as the pipeline is refilled. Performance drops during latency period
50
. Latency period
50
is a function of the fetch buffer depth and the relative speeds of the processor pipeline and the cache. As the processor pipeline increases in speed, latency period
50
increases when expressed as a number of cycles.
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for an alternate method and apparatus for branch target prefetch.
REFERENCES:
patent: 5163140 (1992-11-01), Stiles et al.
patent: 5412799 (1995-05-01), Papadopoulos
patent: 5506976 (1996-04-01), Jaggar
patent: 5515518 (1996-05-01), Stiles et al.
patent: 5742804 (1998-04-01), Yeh et al.
patent: 5764940 (1998-06-01), Mallick et al.
Microsoft Company. Microsoft® Computer Dictionary Fourth Edition. Microsoft Press, ©1999. P377.*
Microsoft Company. Microsoft® Computer Dictionary Fourth Edition. Microsoft Press, ©1999. pp. 56 and 60.*
Seznec, A., et al., “Multiple-Block Ahead Branch Predictors”,ASPLOS VII, 116-127, (Oct. 1996).
Yeh, T., et al., “Increasing the Instruction Fetch Rate via Multiple Branch Prediction and a Branch Address Cache”,1993 ACM, 67-76, (1993).
Grochowski Edward T.
Kling Ralph
Ramakrishnan Kalpana
Wang Hong
LandOfFree
Presbyopic branch target prefetch method and apparatus does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Presbyopic branch target prefetch method and apparatus, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Presbyopic branch target prefetch method and apparatus will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3208848