Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
2001-02-16
2002-05-28
Pan, Daniel H. (Department: 2183)
Electrical computers and digital processing systems: processing
Processing control
Branching
C712S239000, C712S245000, C711S220000, C711S213000
Reexamination Certificate
active
06397326
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to microprocessors, and more particularly to a circuit and method for preloading a prediction circuitry within the microprocessor.
2. Description of the Relevant Art
Microprocessor based computer systems have become prevalent in today's society. The increasing use of computer systems in large part is related to advances in semiconductor chip technology, which technology is increasing circuit densities so that microprocessors can be implemented on one or a very small number of semiconductor chips. Additionally, speeds within microprocessors are increasing with the use of scalar computation with superscalar technology being the next logical step in the evolution of microprocessor. The term superscalar describes an implementation that improves performance by a concurrent execution of scalar instructions. Scalar instructions are the type of instruction typically found in general purpose microprocessors. Using today's semiconductor processing technology, a single microprocessor chip can incorporate high performance techniques that were once applicable only to large scale scientific processors.
Microprocessors run application programs. An application program comprises a group of instructions. In running application programs, microprocessors receive and execute instructions in some sequence. There are several steps generally performed by the microprocessor in executing a single instruction, including: fetching the instruction, decoding the instruction, assembling the operands required by the instruction, performing the operations specified by the instructions, and writing the results of the instruction to storage. These steps are controlled by a periodic clock signal. The period of the clock signal is the processor cycle time.
The time taken by a microprocessor to complete a program is determined by at least three factors: the number of instructions required to execute the program, the average number of processor cycles required to execute an instruction, and the processor cycle time. Microprocessor performance is improved by reducing the time taken by the microprocessor to complete the program, which dictates reducing one or more of these three factors.
One way to improve the performance of microprocessors is by overlapping the steps of different instructions, using a technique called pipelining. In pipelining, the various steps of instruction execution described above are performed by independent units called pipelined stages. Pipelining reduces the average number of cycles to execute an instruction, though not the total amount of time required to execute an instruction, by overlapping instructions and thus permitting the processor to handle more than one instruction at a time. Pipelining reduces the average number of cycles per instruction by as much as a factor of three. However, when executing a conditional branch instruction, the pipeline may sometimes stall until the result (resolution) of the conditional branch operation is known and the correct next instruction is fetched for execution. This stall is known as branch delay penalty and is a limiting factor in the speed enhancing effects of pipelining.
A typical pipelined scalar microprocessor executes one instruction per processor cycle. A superscalar microprocessor further reduces the average number of cycles per instruction beyond what is possible in a pipelined scalar processor, by concurrent execution of several instructions in different pipelines. While superscalar processors are simple in theory, there is more to achieving increased performance than simply increasing the number of pipelines. Increasing the number of pipelines makes it possible to execute more than one instruction per cycle, but there is no guarantee that any given sequence instructions can take advantage of the capability. Instructions are not always independent of one another, but are often dependent. Instruction dependencies can be either data dependent or control dependent. A control dependency occurs when a control decision, such as for example, a conditional branch decision must be made before subsequent instructions can be executed.
Branch prediction mechanisms are often employed in superscalar microprocessor to predict the outcome of a conditional branch before its resolution. Once a branch prediction is made the microprocessor pursues the likely execution path prior to decode and subsequent execution of the conditional branch instruction. At any point within the path of instruction execution, if the microprocessor determines that a prior prediction was incorrect, the microprocessor backs up in the instruction stream and proceeds down the correct path. There is a penalty from employing branch prediction mechanisms within a microprocessor. The penalty relates to wasted time associated with instructions completed after the conditional branch is predicted but before the branch outcome is actually determined. These completed instructions are discarded after a branch misprediction, and the time the microprocessor spent executing them is wasted.
Dynamic branch prediction mechanisms generally include a branch prediction unit. Several different dynamic branch prediction mechanisms have been studied extensively. One mechanism involves a technique referred to as bimodal branch prediction. In bimodal branch prediction, a prediction is made based on the direction the particular branch went the last few times the particular branch was executed. It is possible that more accurate predictions can be made using more history for the branch instruction. Another mechanism considers the history of each branch independently and takes advantage of repetitive patterns. This technique is referred to a local branch prediction. Another technique uses the combined history of all recent branches in making a prediction. This technique is often referred to as global branch prediction.
In the global branch prediction technique, a dedicated N-bit shift register is used to record the resolution of the most recent N conditional branches. When a branch instruction is encountered, the contents of the shift register are subsequently used to access, directly or indirectly, a branch history table that stores a plurality of two bit counters. As will be more fully described below, the most significant bit of the accessed two bit counter defines the prediction for the encountered branch instruction. The prediction (logical one for taken or logical zero for not taken) is then shifted into the shift register. After resolution of the branch instruction, the appropriate counter in the table is incremented, if the branch instruction is resolved as taken. Likewise, for each not taken branch, the appropriate counter is decremented. The counter is saturating such that the counter is not decremented past zero nor incremented past three. Furthermore, the contents of the shift register is corrected in the event of misprediction.
As noted above, the most significant counter bit determines the prediction. Repeatedly taken branches will be predicted to be taken, and repeatedly not taken branches will be predicted to be not taken. By using the two bit counter, a prediction scheme can tolerate a branch going an unusual direction one time and keep predicting the usual branch direction.
A variation of the global branch prediction involves the use of the branch instruction address or program counter (pc) of the branch instruction. This technique is often referred to as global branch prediction with index selection. In this scheme, the branch history table is indexed with a concatenation of the shift register contents and the pc. Another variation of the global branch prediction technique involves XORing the pc with the contents of the shift register, the results of which are used to access the branch history table for a particular counter. This technique is often referred to as global prediction with index sharing.
Most hardware prediction schemes use one of the global prediction techniques discussed abov
Gupta Amit R.
Horton David C.
Conley Rose & Tayon PC
Merkel Lawrence J.
Pan Daniel H.
LandOfFree
Method and circuit for preloading prediction circuits in... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and circuit for preloading prediction circuits in..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and circuit for preloading prediction circuits in... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2902907