Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
1999-01-22
2001-06-12
Kim, Matthew (Department: 2186)
C711S123000, C711S213000
active
06247097
ABSTRACT:
This invention generally deals with enabling processors to increase their speed of fetching instructions for executing programs. More specifically, this invention generates a fetch history table that operates directly with an instruction cache to deliver memory-accessed instructions to processor execution hardware (such as instruction-execution pipelining hardware), allowing multiple predicted basic blocks to be fetched in a single fetch cycle.
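The abstract does not spell out the structure of the fetch history table, so the following is a purely hypothetical sketch (all names and fields are assumptions for illustration, not drawn from the patent's claims) of the kind of per-branch entry such a table might hold so that several predicted basic blocks can be requested from the instruction cache in one fetch cycle.

```python
# Purely hypothetical sketch (names and fields assumed, not taken from the
# patent's claims): the kind of per-branch entry a fetch history table might
# keep so that several predicted basic blocks can be requested from the
# instruction cache in a single fetch cycle.

from dataclasses import dataclass
from typing import List

@dataclass
class FetchHistoryEntry:
    branch_address: int            # address of the branch ending the current basic block
    predicted_targets: List[int]   # predicted start addresses of the following basic blocks
    confidence: int = 0            # assumed saturating confidence counter

# With entries of this kind, the fetch unit could present the current fetch
# address together with the predicted target addresses to the instruction
# cache in the same cycle, covering multiple predicted basic blocks.
entry = FetchHistoryEntry(branch_address=0x1F08,
                          predicted_targets=[0x2000, 0x2040, 0x20C0])
print(entry)
```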
BACKGROUND
To understand this invention, it is essential to understand the basic-block characteristic of all computer programs. Programs are stored linearly in a storage hierarchy, including a processor's main memory, so that all instruction locations therein have a linear characteristic. However, when these instructions are executed, the branch instructions in the program require the processor to sequence nonlinearly through the same instructions it obtained linearly from the hierarchy. Thus, the execution sequence of every computer program is determined by the branch instructions contained in that program. Execution breaks the program up into basic blocks, each starting with the target instruction of a branch and ending with a branch instruction whose target address begins the next basic block in the program's instruction execution sequence. A basic block may contain any number of instructions, from a single instruction (a branch instruction) to a very large number (such as thousands of instructions). The processor fetches instructions fastest when they are sequentially located in memory, and slowest when an instruction is the target of a branch, which requires the processor to intervene, calculate a target address, and often go elsewhere in memory to find the instruction, sometimes incurring page faults that greatly slow processing. Short basic blocks are often responsible for substantial degradation in processor performance.
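As a rough illustration of the basic-block partitioning described above (a sketch assuming a simplified instruction representation, not code from the patent), a linear instruction stream can be split into basic blocks at branch instructions and branch targets as follows.

```python
# Hypothetical illustration (not from the patent): split a linear instruction
# stream into basic blocks.  Each instruction is an (address, is_branch, target)
# tuple; a block ends at a branch, and a new block starts at every branch target.

def split_into_basic_blocks(instructions):
    # Collect branch-target addresses: each one starts a new basic block.
    targets = {t for _, is_branch, t in instructions if is_branch and t is not None}
    blocks, current = [], []
    for addr, is_branch, target in instructions:
        # A branch target begins a new block even when control can also
        # fall through into it from the preceding instruction.
        if addr in targets and current:
            blocks.append(current)
            current = []
        current.append(addr)
        if is_branch:          # a branch always terminates the current block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

# Example: addresses 0..5, a branch at 2 targeting 4 and a branch at 5 targeting 0.
program = [(0, False, None), (1, False, None), (2, True, 4),
           (3, False, None), (4, False, None), (5, True, 0)]
print(split_into_basic_blocks(program))   # [[0, 1, 2], [3], [4, 5]]
```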
The abundance of conditional branches in programs impedes the instruction-fetching mechanisms of modern processors. To support execution of a significantly larger number of instructions per cycle, future microprocessors will have to support speculative fetching and execution of large numbers of instructions. The method described here can speculatively fetch instructions across multiple conditional branches (with different target addresses) in each cycle, based on dynamic multilevel branch prediction and on code reorganization during compilation or instruction-cache loading.
Over the last decade, microprocessor performance has increased at a rate of about 60% per year. To maintain these rates of improvement, future microprocessors will have to execute (and commit) a significantly larger number of instructions per cycle. Non-numerical workloads are laden with conditional branches, which makes superscalar processor implementation difficult. One study has shown that average C programs have about 18% of their instructions as conditional branches, and C++ programs about 11% (on the RS/6000 platform). These branches limit basic block sizes to only 6 to 10 instructions, which necessitates speculative execution beyond basic blocks.
Most processors employ sophisticated branch prediction mechanisms to predict the path taken by a conditional branch and its target address. However, these mechanisms are used only to predict the outcome of the next conditional branch to be executed. Because of the small size of basic blocks and the increased need for speculation, future microprocessors will have to predict the outcomes of multiple branches in a single cycle with a high degree of accuracy and be able to fetch instructions from the target addresses of those branches in a single cycle.
To give a better idea, path-based correlated dynamic branch prediction algorithms can provide branch prediction accuracy as high as 97% for many non-numerical workloads (such as SPECint). With such high accuracy, the outcome of four successive conditional branches can be predicted with an accuracy of 88.5%; similarly, three successive conditional branches can be predicted with 91.3% accuracy and two successive conditional branches with 94.1% accuracy. This means that, with an average basic block size of 6 instructions, the expected number of speculatively executed instructions that end up on the taken path is 28.3 with four levels of branch prediction (versus only 11.8 instructions with a single level of branch prediction).
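These figures follow directly from the stated assumptions of 97% per-branch accuracy and an average basic block of 6 instructions; a minimal calculation (a sketch, not part of the patent text) reproduces them.

```python
# Reproduce the figures quoted above, assuming 97% per-branch prediction
# accuracy and an average basic block of 6 instructions.

accuracy = 0.97
block_size = 6

for depth in (2, 3, 4):
    # Predicting `depth` successive branches correctly requires every
    # individual prediction to be correct.
    print(f"{depth}-level path accuracy: {accuracy ** depth:.1%}")
# 2 levels: 94.1%, 3 levels: 91.3%, 4 levels: 88.5%

def expected_useful_instructions(levels):
    # The block i branches beyond the current one ends up on the taken path
    # only if all i predictions leading to it are correct.
    return block_size * sum(accuracy ** i for i in range(levels + 1))

print(round(expected_useful_instructions(1), 1))  # ~11.8 with one level of prediction
print(round(expected_useful_instructions(4), 1))  # ~28.3 with four levels of prediction
```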
As branches get further away from the current point of execution, the accuracy with which they can be predicted decreases. The ability to fetch a large number of instructions can greatly increase the number of instructions that can be executed in a given cycle within the constraints of data, control, and structural hazards.
Each computer program comprises a set of instructions stored in a permanent memory of a computer system in storage-location sequence, which may also be represented as a virtual-address sequence to a processor fetching the program for execution. The fetched sequence of instructions is resolved by the processor into an instruction execution sequence which usually differs significantly from the program's storage-location sequence because of the branch instructions in the program.
Thus, programs are generally stored in a permanent computer memory (such as a hard disk) in their virtual-address sequence, which processors generally use to transfer portions of a program (such as in page units) into the computer system's random access memory; a processor then fetches lines of the program from memory using the program's virtual addresses for execution. The fetched lines are often placed in an instruction cache in the executing processor.
Therefore, a program's instruction execution sequence in the processor does not follow the program's compiled instruction sequence; instead, the execution sequence is determined by the branch instructions executed in the program, resulting in the program's architected instruction execution sequence. Not pertinent to this invention is the so-called out-of-sequence instruction execution found in some complex processors, which nevertheless must follow the program's architected instruction execution sequence.
Each program's architected execution sequence of instructions depends on the data used by a particular execution of the program, and varying data may unpredictably vary the program's execution sequence. The data often controls whether a branch instruction is taken or not taken. A not-taken branch instruction generates a target address to the next instruction in the program's virtual-address sequence, while a taken branch instruction generates a target address to a non-sequential instruction at any virtual address.
The instruction fetch mechanisms in all processors operate fastest as long as they are accessing sequential instructions in the virtual-address sequence, because each next instruction address is then generated by merely incrementing the current instruction address; the next sequential instruction is often in the same line in the processor cache (a cache hit), from which it can be provided immediately to the processor for execution.
However, a taken branch may incur the additional overhead of fetching a target instruction at a location that is not currently in the cache (a cache miss), which initiates a fetch cycle to copy another line containing that target instruction from memory into the cache. It is well known that taken branch instructions slow the rate at which instructions are fetched for program execution in any processor, because they deviate from the virtual-address sequence of instructions in the program.
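The next-address behavior described in the two preceding paragraphs can be sketched as follows (an illustrative simplification assuming a fixed instruction length and cache-line size, not the patent's mechanism).

```python
# Illustrative sketch (assumed parameters, not the patent's mechanism):
# next-fetch-address selection and the cache-line check that makes
# sequential fetching cheap.

INSN_LEN = 4          # assumed fixed instruction length in bytes
LINE_SIZE = 128       # assumed cache-line size in bytes

def next_fetch_address(pc, is_branch, taken, target):
    # A non-branch, or a not-taken branch, falls through to the next
    # sequential virtual address; a taken branch redirects to its target.
    return target if (is_branch and taken) else pc + INSN_LEN

def same_cache_line(addr_a, addr_b):
    # Sequential fetches usually stay within the current line (a hit);
    # a taken branch can leave the line and force a line fill from memory.
    return addr_a // LINE_SIZE == addr_b // LINE_SIZE

pc = 0x1000
print(hex(next_fetch_address(pc, is_branch=False, taken=False, target=None)))  # 0x1004
print(same_cache_line(0x1000, 0x1004))   # True  (sequential fetch: likely a cache hit)
print(same_cache_line(0x1000, 0x2000))   # False (taken branch: possible cache miss)
```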
Consequently, program execution is slowed.
Augspurger Lynn L.
Cutter Lawrence D.
Goldman Bernard M.
International Business Machines Corporation
Kim Matthew
Aligned instruction cache handling of instruction fetches...
Profile ID: LFUS-PAI-O-2443898