Method and apparatus for reordering memory operations along...

Electrical computers and digital processing systems: processing – Processing control – Branching

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

91

Reexamination Certificate

active

06381691

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to computer processing systems and, in particular, to method and apparatus for reordering load operations along multiple execution paths in a computer program. The invention is applicable to operations reordered when the program is generated (static reordering) as well as to operations reordered at execution time (dynamic reordering).
2. Background Description
Contemporary high-performance processors rely on superscalar, superpipelining, and/or very long instruction word (VLIW) techniques for exploiting instruction-level parallelism in programs (i.e., for executing more than one instruction at a time). In general, these processors contain multiple functional units, execute a sequential stream of instructions, are able to fetch from memory more than one instruction per cycle, and are able to dispatch for execution more than one instruction per cycle subject to dependencies and availability of resources.
The pool of instructions from which the processor selects those that are dispatched at a given point in time is enlarged by the use of out-of-order execution. Out-of-order execution is a technique by which the operations in a sequential stream of instructions are reordered so that operations appearing later are executed earlier, if the resources required by the later appearing operations are free. Thus, out-of-order execution reduces the overall execution time of a program by exploiting the availability of the multiple functional units and using resources that would otherwise be idle. Reordering the execution of operations requires reordering the results produced by those operations, so that the functional behavior of the program is the same as what would be obtained if the instructions were executed in their original sequential order.
In the case of memory-related operations, a memory load operation reads a datum from memory, loads it in a processor register, and frequently starts a sequence of operations that depend on the datum loaded. Thus, in addition to using idle resources, the early (out-of-order) initiation of memory load operations may hide delays in accessing memory, including potential cache misses.
In general, there are two basic approaches to implementing out-of-order execution and reordering of results: dynamic reordering and static reordering. In dynamic reordering, the instructions are analyzed at execution time, and the instructions and results are reordered in hardware. In static reordering, a compiler/programmer analyzes and reorders the instructions and the results produced by those instructions when the program is generated, thus the reordering tasks are accomplished through software. These two approaches can be jointly implemented.
In current processor designs, instructions which can be executed out-of-order typically come from a single instruction stream which is usually the most likely path. The likeliness of a path can be determined by a number of methods, such as static or dynamic branch prediction, runtime profiling, or synthetically generated probabilities. Instructions from the most likely path are executed out-of-order and, if this path is taken, then the execution time of the path is reduced. However, if another, less likely path is taken, then no benefit is obtained by the out-of-order execution. In fact, the out-of-order execution of instructions on the predicted path may actually increase the runtime if the prediction is incorrect. This execution approach is referred to as a “single path” execution approach, since out-of-order execution occurs along a single predicted path which is deemed most likely.
Thus, effective out-of-order execution using the single path approach requires that the predictability of the most likely path be sufficiently high to generate any gains. Note that even though the predictability of a single branch may be deemed high, following the execution of several branches progressively degrades the probability that a given point on the path may be reached.
To reduce the risk of mispredicting the path and losing all of the gains from out-of-order execution, it is more advisable to make instructions from multiple paths available for execution. Various schemes and designs have been described to either statically or dynamically select instructions from multiple paths for out-of-order execution. These approaches are referred to as “multi path” approaches, since out-of-order execution occurs along multiple likely paths of execution.
Consider the following code fragment:
a[16]=0;
if (i<j)
t=a[i]+1;
else
t=a[j]+1;
In a single path approach, it has to be determined whether the condition i<j is more likely to be true or false over the whole execution of the program. When i<j is true, out-of-order execution and speculation may generate the following code:
t′=a[i]+1;
a[16]=0;
if (i<j)
t=t′;
else
t=a[j]+1;
If a sufficient number of execution units are available, execution of this code may be improved for the expected path. However, if the alternate path is taken, no code improvements are achieved.
In a multipath scheme, instructions from both branches of the if statement can be moved out-of-order. This allows more aggressive code to be generated which will have good performance regardless of the branch direction (provided that enough execution units are available to execute the code from both branch paths without penalty):
t′=a[i]+1;
t″=a[j]+1;
a[16]=0;
if (i<j)
t=t′;
else
t=t″;
One factor that limits the ability to reorder operations is ambiguous memory references; this is the case when a memory load operation appears after a memory store operation in a sequential instruction stream, and it is not possible to determine ahead of time whether the memory locations accessed by the load and the store operations are different. For example, consider the following code fragment:
*X=(a+b+2)<<4
r=((*Y)+c){circumflex over ( )}d
wherein:
*X indicates the memory location whose address is contained in X;
<< indicates a left-shift operation; and
{circumflex over ( )} indicates an exclusive-or (XOR) operation.
Assuming that a, b, c, and d are values stored in registers r1 through r4 of a processor, respectively, and that X and Y are in registers r8 and r9, respectively, then this code fragment can be represented by the following instruction sequence (wherein the first register after the name of the instruction is the target register, and the remaining registers are the input operands):
add
r10,r1,r2
  ; r10 = a+b
add
r11,r10,2
  ; r11 = a+b+2
shift_left
r12,r11,4
; r12 = a+b+2<<4
store
r12,(r8)
; *X = a+b+2<<4
load
r20,(r9)
  ; r20 = *Y
add
r21,r20,r3
; r21 = *Y+c
xor
r22,r21,r4
; r22 = (*Y+c) {circumflex over ( )}d
If it can be determined that X and Y are different, then the two expressions can be scheduled for parallel execution, yielding a sequence as follows (wherein the symbol || denotes parallel execution):
add
r10, r1, r2 ||
load r20, (r9)
add
r11, r10, 2
shift_right
r12, r11, 4 ||
add r21, r20, r3
Store
r12, (r8) ||
xor r22, r21, r4
In a machine with two execution units, the above sequence would take 4 cycles to complete (assuming that a load takes two cycles, and other operations take a single cycle).
On the other hand, if it cannot be determined whether X and Y are always different, i.e., the addresses are ambiguous, then the two expressions would have to be scheduled in the original order, taking 8 cycles (assuming again that a load takes two cycles).
The above example is not atypical. Ambiguity in memory references degrades performance fairly severely by forcing the seq

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for reordering memory operations along... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for reordering memory operations along..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for reordering memory operations along... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2829467

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.