Method and apparatus for implementing execution predicates in a computer processing system

Electrical computers and digital processing systems: processing – Architecture based instruction processing

Reexamination Certificate

Details

Classification: C712S233000
Type: Reexamination Certificate
Status: active
Patent number: 06513109

ABSTRACT:

BACKGROUND
1. Technical Field
The present invention relates generally to computer processing systems and, in particular, to a method and apparatus for implementing execution predicates in a computer processing system.
2. Background Description
Early microprocessors generally processed instructions one at a time. Each instruction was processed using four sequential stages: instruction fetch; instruction decode; instruction execute; and result writeback. Within such microprocessors, different dedicated logic blocks performed each different processing stage. Each logic block waited until all the preceding logic blocks completed operations before beginning its operation.
Improved computational speed has been obtained by increasing the speed with which the computer hardware operates and by introducing parallel processing in one form or another. One form of parallel processing relates to the recent introduction of microprocessors of the “superscalar” type, which can effect parallel instruction computation. Typically, superscalar microprocessors have multiple execution units (e.g., multiple integer arithmetic logic units (ALUs)) for executing instructions and, thus, have multiple “pipelines”. As such, multiple machine instructions may be executed simultaneously in a superscalar microprocessor, providing obvious benefits in the overall performance of the device and its system application.
For the purposes of this discussion, latency is defined as the delay between the fetch stage of an instruction and the execution stage of the instruction. Consider an instruction which references data stored in a specified register. Such an instruction requires at least four machine cycles to complete. In the first cycle, the instruction is fetched from memory. In the second cycle, the instruction is decoded. In the third cycle, the instruction is executed and, in the fourth cycle, data is written back to the appropriate location.
To improve efficiency and reduce instruction latency, microprocessor designers overlapped the operations of the fetch, decode, execute, and writeback logic stages such that the microprocessor operated on several instructions simultaneously. In operation, the fetch, decode, execute, and writeback logic stages concurrently process different instructions. At each clock pulse the result of each processing stage is passed to the subsequent processing stage. Microprocessors that use the technique of overlapping the fetch, decode, execute, and writeback stages are known as “pipelined” microprocessors. In principle, a pipelined microprocessor can complete the execution of one instruction per machine cycle when a known sequence of instructions is being executed. Thus, it is evident that the effects of the latency time are reduced in pipelined microprocessors by initiating the processing of a second instruction before the actual execution of the first instruction is completed.
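The following sketch, which is illustrative only and not part of the patent disclosure, compares total cycle counts for a four-stage machine with and without stage overlap; the assumption of an ideal, stall-free pipeline and the constants used are simplifications for illustration.

/* Illustrative sketch only (not from the patent): cycle counts for a
 * four-stage machine, contrasting strictly sequential execution with an
 * ideal pipeline that overlaps fetch, decode, execute, and writeback. */
#include <stdio.h>

#define STAGES 4UL

/* Without overlap, every instruction occupies all four stages by itself. */
static unsigned long cycles_sequential(unsigned long n_instr)
{
    return n_instr * STAGES;
}

/* With an ideal pipeline, after the first instruction fills the pipe,
 * one instruction completes every machine cycle. */
static unsigned long cycles_pipelined(unsigned long n_instr)
{
    return n_instr == 0 ? 0 : (STAGES - 1) + n_instr;
}

int main(void)
{
    unsigned long n = 1000;
    printf("sequential: %lu cycles\n", cycles_sequential(n));
    printf("pipelined:  %lu cycles\n", cycles_pipelined(n));
    return 0;
}

For 1000 instructions the sequential model needs 4000 cycles, while the ideal pipeline needs only 1003: three cycles to fill the pipeline, then one completion per cycle.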
In general, instruction flow in a microprocessor requires that the instructions are fetched and decoded from sequential locations in a memory. Unfortunately, computer programs also include branch instructions. A branch instruction is an instruction that causes a disruption in this flow, e.g., a taken branch causes decoding to be discontinued along the sequential path, and resumed at a new location in memory. Such an interruption in pipelined instruction flow results in a substantial degradation in pipeline performance.
Accordingly, many pipelined microprocessors use branch prediction mechanisms that predict the existence and the outcome of branch instructions (i.e., taken or not taken) within an instruction stream. The instruction fetch unit uses the branch predictions to fetch subsequent instructions.
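As one concrete, purely illustrative example of such a mechanism, the sketch below implements a classic two-bit saturating-counter predictor indexed by the branch address. The scheme, the table size, and the training policy are common textbook choices and are not taken from the patent.

/* Illustrative sketch only: a two-bit saturating-counter branch predictor,
 * one of many prediction schemes of the kind the text alludes to. */
#include <stdbool.h>
#include <stdio.h>

#define TABLE_BITS 10
#define TABLE_SIZE (1u << TABLE_BITS)

static unsigned char counters[TABLE_SIZE]; /* 0,1 = not taken; 2,3 = taken */

/* Prediction consulted by the fetch unit before the branch resolves. */
static bool predict(unsigned int pc)
{
    return counters[pc & (TABLE_SIZE - 1)] >= 2;
}

/* After the branch resolves, train the counter toward the actual outcome. */
static void train(unsigned int pc, bool taken)
{
    unsigned char *c = &counters[pc & (TABLE_SIZE - 1)];
    if (taken && *c < 3)
        (*c)++;
    if (!taken && *c > 0)
        (*c)--;
}

int main(void)
{
    /* A loop-closing branch taken nine times and then not taken once:
     * the predictor quickly learns the taken bias. */
    unsigned int pc = 0x400128;
    int mispredicts = 0;
    for (int iter = 0; iter < 10; iter++) {
        bool actual = (iter != 9);
        if (predict(pc) != actual)
            mispredicts++;
        train(pc, actual);
    }
    printf("mispredictions: %d of 10\n", mispredicts);
    return 0;
}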
The pool of instructions from which the processor selects those that are dispatched at a given point in time is enlarged by the use of out-of-order execution. Out-of-order execution is a technique by which the operations in a sequential stream of instructions are reordered so that operations appearing later are executed earlier, if the resources required by the later appearing operations are free. Thus, out-of-order execution reduces the overall execution time of a program by exploiting the availability of the multiple functional units and using resources that would otherwise be idle. Reordering the execution of operations requires reordering the results produced by those operations, so that the functional behavior of the program is the same as what would be obtained if the instructions were executed in their original sequential order.
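The toy scheduler below is an illustrative sketch of this idea, not the patent's mechanism: each cycle it issues at most one operation whose source operand is available, so independent operations that appear later in program order begin executing while an earlier operation waits on a long-latency load. The instruction names, latencies, and single-issue restriction are assumptions made for brevity.

/* Toy sketch (illustration only): issue any operation whose operands are
 * ready, letting later independent operations overtake a stalled one. */
#include <stdio.h>

typedef struct {
    const char *name;
    int dep;      /* index of the producing op, or -1 if none */
    int latency;  /* cycles to execute */
    int done_at;  /* cycle the result becomes available, -1 if not issued */
} Op;

int main(void)
{
    Op ops[] = {
        { "load r1, [r9]",   -1, 5, -1 },  /* long-latency load         */
        { "add  r2, r1, 1",   0, 1, -1 },  /* depends on the load       */
        { "mul  r4, r5, r6", -1, 1, -1 },  /* independent: can go early */
        { "sub  r7, r8, r3", -1, 1, -1 },  /* independent: can go early */
    };
    int n = sizeof ops / sizeof ops[0];

    for (int cycle = 0, issued = 0; issued < n; cycle++) {
        for (int i = 0; i < n; i++) {
            if (ops[i].done_at != -1)
                continue;                   /* already issued */
            int d = ops[i].dep;
            if (d != -1 && (ops[d].done_at == -1 || ops[d].done_at > cycle))
                continue;                   /* operand not ready yet */
            ops[i].done_at = cycle + ops[i].latency;
            printf("cycle %2d: issue %s\n", cycle, ops[i].name);
            issued++;
            break;                          /* at most one issue per cycle */
        }
    }
    return 0;
}

In this sketch the add that depends on the load cannot issue until cycle 5, while the independent multiply and subtract issue at cycles 1 and 2; the reordering of results back into program order that the text mentions is omitted for brevity.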
In general, there are two basic approaches to scheduling the execution of instructions: dynamic scheduling and static scheduling. In dynamic scheduling, the instructions are analyzed at execution time and the instructions are scheduled in hardware. In static scheduling, a compiler/programmer analyzes and schedules the instructions when the program is generated. Thus, static scheduling is accomplished through software. These two approaches can be jointly implemented.
Effective execution in pipelined architectures requires that the pipeline have a high utilization rate, i.e., that each unit in the pipeline is steadily executing instructions. This utilization is disrupted by some operations, such as branch instructions, and by unpredictable dynamic events, such as cache misses.
A number of techniques have been developed to address these issues. For example, branch prediction is utilized to eliminate or reduce the branch penalty for correctly predicted branches. Moreover, dynamic scheduling of instructions in out-of-order superscalar processors is utilized to maintain a high utilization rate even when such dynamic events occur.
However, branch prediction does not fully eliminate the cost of branching, since even correctly predicted branches disrupt the fetching of new instructions. Furthermore, branches reduce the compiler's ability to schedule instructions statically. This degrades the performance of program execution, especially for implementations which execute instructions in-order, such as very long instruction word (VLIW) architectures.
To address these problems, a technique referred to as predication has been introduced to eliminate branches for some code sequences. This technique replaces control flow instructions with conditionally executed instructions (called "predicated instructions"), which are executed only when an associated condition (the "predicate") evaluates to a specified value, either TRUE or FALSE. For an article describing predicated execution, see "On Predicated Execution", Park and Schlansker, Technical Report No. HPL-91-58, Hewlett-Packard, Palo Alto, Calif., May 1991.
Predication has been discussed as a strategy which converts control dependencies into data dependencies. This is achieved by converting conditional branches into guards for each instruction on a conditional path. This process is called "if-conversion". Predicated architectures offer benefits because they reduce the number of branches which must be executed. This is especially important in statically scheduled architectures, where if-conversion allows for the execution of instructions on both paths of a branch and eliminates the branch penalty.
Consider the following example code sequence:
if (a < 0)
    a--;
else
    a++;
Further, consider a translation of this code for a microprocessor without a predication facility, and with the variable "a" assigned to general purpose register r3. For an architecture such as the IBM PowerPC™ architecture, this translates into an instruction sequence with two branches:
cmpwi  cr0, r3, 0         ; compare a < 0
bc     false, cr0.lt, L1  ; branch if a >= 0
addic  r3, r3, -1         ; a--
b      L2                 ; skip else path
L1:
addic  r3, r3, 1          ; a++
L2:
. . .
On a microprocessor similar to the IBM PowerPC™ but with a predication facility, this translated code sequence can be converted to a branch-free sequence. In the following example, predicated instructions are indicated by an if clause, specifying the predicate, that immediately follows the instruction.
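One branch-free form this sequence could take is sketched below; the exact if-clause notation (if cr0.lt, if !cr0.lt) is an illustrative assumption consistent with the convention just described rather than a definitive rendering of the patent's syntax:

cmpwi  cr0, r3, 0             ; compute the predicate: a < 0
addic  r3, r3, -1 if cr0.lt   ; a--, executes only when the predicate is TRUE
addic  r3, r3, 1  if !cr0.lt  ; a++, executes only when the predicate is FALSE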
