Loop allocation for optimizing compilers

Data processing: software development – installation – and managem – Software program development tool – Translation of code

Reexamination Certificate

C717S149000, C717S150000, C717S151000, C717S157000

active

06651246

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field
The present invention is directed to an improvement in computing systems and in particular to computer systems which provide for optimized loop code generation in the compilation of computer programs.
2. Prior Art
Optimizing compilers permit efficient object code to be emitted given a particular piece of source code to be compiled. Source code which includes loops is typically the subject of optimization in compilers. For a given segment of source code containing loops, and for a given target machine microarchitecture, cache geometry and parallel processing capability, the loop allocation phase of an optimizing compiler attempts to determine a collection of object code loop nests which will give efficient execution at an acceptable compilation-time cost.
Loop allocation optimization found in known compilers typically relies upon a set of ordered loop allocation transformations, as well as optimizations for data locality and parallelism. For example, loop source code may be optimized by emitting object code which minimizes off-chip access when the loop object code is executed. Another optimization for loop source code is to emit object code which may be executed in parallel by a multi-processor machine.
Typically, prior art optimizing compilers which carry out loop allocation include loop distribution early in the set of transformations, followed by parallelism and data locality transformations and finish with loop fusion and array contraction as a cleanup phase.
In prior art optimizing compilers, nested loops are optimized on a loop-by-loop basis. A prior art approach to optimizing sibling loops is to merge such nests. This approach is described by Sarkar, V. and Gao, G. R., “Optimization of Array Accesses by Collective Loop Transformations,” 5th International Conference on Supercomputing, Cologne, Germany, June 1991, pp. 194-205.
This prior art approach involves a profitability and correctness test for the merger of the sibling loops. The optimization determines first if fusion of the sibling loops is desirable (a profitability analysis). Another prior art approach to loop optimization is to first distribute the loop code, to then optimize the distributed code and then to fuse the code after optimization.
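The two transformations named above can be illustrated with a small Python sketch. It is only an illustration of what loop distribution and loop fusion do to a loop body; the function names and the arrays are hypothetical and are not taken from the patent.

```python
def fused(b, c):
    """One loop computes both statements; a[i] is reused right after it is written."""
    n = len(b)
    a = [0] * n
    d = [0] * n
    for i in range(n):
        a[i] = b[i] + 1
        d[i] = a[i] * c[i]
    return a, d


def distributed(b, c):
    """Loop distribution: each statement gets its own loop over the same range."""
    n = len(b)
    a = [0] * n
    d = [0] * n
    for i in range(n):
        a[i] = b[i] + 1
    for i in range(n):
        d[i] = a[i] * c[i]
    return a, d


# Both forms compute the same values; fusion is the inverse of distribution.
print(fused([1, 2, 3], [4, 5, 6]) == distributed([1, 2, 3], [4, 5, 6]))  # True
```

The two forms are interchangeable here because the second statement only reads the element of a written in the same iteration, so no dependence is reversed by either transformation.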
Each of the above approaches to optimization involves optimizations of the loop code independent of, or following, loop distribution steps. Where nested loops are optimized on a loop-by-loop basis, optimizations which may be possible due to relationships between code in different nested loops may be missed. Similarly, where the loop code is distributed, optimized and then fused, the optimization is carried out on distributed portions of the code and interrelationships between those sections of code may not be considered in the optimization.
It is therefore desirable to have a computer system which carries out the loop allocation in an optimizing compiler without performing the loop distribution step at an early point in the sequence of loop transformations.
SUMMARY OF THE INVENTION
According to one aspect of the present invention, there is provided an improved system for the optimization of loop code compilation.
According to another aspect of the present invention, there is provided a computer program product for compilation of a source code segment. The computer program product has instruction means to generate a program dependence graph for the source code segment. The program dependence graph includes a control dependence graph and a data dependence graph. Each of the control dependence graph and the data dependence graph has nodes, and each node in the data dependence graph is associated with one or more statements in the source code segment. There is also instruction means to generate an interference graph from the data dependence graph, with instruction means for deriving nodes for the interference graph from the nodes in the data dependence graph. The nodes in the interference graph are thereby each associated with one or more statements in the source code segment. There is instruction means for generating a node weight for each node in the interference graph, each node having a node weight reflecting the resource usage for the one or more statements associated with the node.
There is also instruction means for generating edges for the interference graph, each edge connecting a pair of nodes in the interference graph, with instruction means for generating an associated edge weight for each edge reflecting the desirability of maintaining the one or more statements associated with each of the pair of nodes connected by the edge within the same loop. There is also provided instruction means for partitioning the interference graph into subgraphs based on the edge weights and the node weights of the interference graph, and instruction means for emitting code conforming to the partitioned interference graph.
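As a minimal sketch of the weighted interference graph described above, the Python below builds node weights and edge weights from a per-node summary of array usage. The weight heuristics are assumptions made for illustration only: the text says that node weights reflect resource usage and edge weights reflect the desirability of keeping statements in the same loop, but it does not give formulas at this point. Here the node weight is the number of distinct arrays a node touches and the edge weight is the number of arrays two nodes share. The name build_interference_graph and the arrays_used input are hypothetical.

```python
from itertools import combinations


def build_interference_graph(arrays_used):
    """Build (node_weight, edge_weight) from {node id: set of array names}.

    Assumed heuristics, not taken from the patent:
      node weight = number of distinct arrays the node's statements touch
      edge weight = number of arrays shared by the two nodes
    """
    node_weight = {node: len(arrays) for node, arrays in arrays_used.items()}
    edge_weight = {}
    for a, b in combinations(sorted(arrays_used), 2):
        shared = len(arrays_used[a] & arrays_used[b])
        if shared:  # only nodes with something in common get an edge
            edge_weight[(a, b)] = shared
    return node_weight, edge_weight


usage = {"S1": {"a", "b"}, "S2": {"b", "c"}, "S3": {"d"}}
node_weight, edge_weight = build_interference_graph(usage)
print(node_weight)  # {'S1': 2, 'S2': 2, 'S3': 1}
print(edge_weight)  # {('S1', 'S2'): 1}
```

A partitioner would then group nodes joined by heavy edges while keeping each group's total node weight within what the target machine can support.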
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for partitioning the interference graph comprises instruction means for first conducting a profitability test to select a pair of nodes in the interference graph, instruction means for then conducting a correctness test on the selected pair, and instruction means for merging the selected pair of nodes into a coalesced node where the correctness test is satisfied for the selected pair.
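The sequence described above (profitability test, then correctness test, then merge) can be sketched as a greedy loop. This is an illustrative sketch under several assumptions that the text does not state: the two tests are passed in as caller-supplied functions, a rejected pair is not retried, the weight of a coalesced node is the sum of the weights of the two nodes it replaces, and parallel edges created by a merge have their weights added. All names are hypothetical.

```python
def coalesce(nodes, node_weight, edge_weight, pick_pair, is_legal):
    """Greedily merge node pairs; returns {surviving node id: list of statements}.

    nodes:       {node id: list of statements}
    edge_weight: {frozenset({a, b}): weight}
    pick_pair:   profitability test, returns a frozenset pair or None
    is_legal:    correctness test on the two node ids
    """
    nodes = {n: list(stmts) for n, stmts in nodes.items()}
    node_weight = dict(node_weight)
    candidates = dict(edge_weight)
    while candidates:
        pair = pick_pair(node_weight, candidates)
        if pair is None:
            break
        a, b = sorted(pair)
        del candidates[pair]
        if not is_legal(a, b):
            continue  # assumption: a pair that fails the test is dropped
        # Merge b into a: combine statements and (assumed) add the weights.
        nodes[a].extend(nodes.pop(b))
        node_weight[a] += node_weight.pop(b)
        # Redirect every remaining edge of b to the coalesced node a.
        for edge in [e for e in candidates if b in e]:
            weight = candidates.pop(edge)
            (other,) = edge - {b}
            merged_edge = frozenset({a, other})
            candidates[merged_edge] = candidates.get(merged_edge, 0) + weight
    return nodes


stmts = {"S1": ["a[i] = b[i] + 1"], "S2": ["c[i] = a[i] * 2"], "S3": ["d[i] = e[i]"]}
weights = {"S1": 2, "S2": 2, "S3": 2}
edges = {frozenset({"S1", "S2"}): 5, frozenset({"S2", "S3"}): 1}
result = coalesce(
    stmts, weights, edges,
    pick_pair=lambda nw, cand: max(cand, key=cand.get),  # heaviest edge first
    is_legal=lambda a, b: "S3" not in (a, b),             # stand-in test
)
print(result)
# {'S1': ['a[i] = b[i] + 1', 'c[i] = a[i] * 2'], 'S3': ['d[i] = e[i]']}
```

Each surviving node corresponds to one emitted loop, so the example would produce one loop holding the first two statements and a separate loop for the third.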
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for conducting a profitability test comprises instruction means for selecting the pair of nodes in the interference graph having the highest associated edge weight, the selected pair of nodes having a sum of node weights lower than a pre-defined resource limit for a target machine for the compiler.
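The profitability test described above can be sketched in a few lines of Python: among the pairs whose combined node weight is below the resource limit, choose the one with the highest edge weight. The function name, the dictionary layout and the sample numbers are hypothetical, and ties are broken by iteration order, which the text does not specify.

```python
def pick_most_profitable_pair(node_weight, edge_weight, resource_limit):
    """Return the pair with the highest edge weight that fits the limit, or None."""
    best = None
    for (a, b), weight in edge_weight.items():
        if node_weight[a] + node_weight[b] >= resource_limit:
            continue  # merged node would not be below the target's limit
        if best is None or weight > edge_weight[best]:
            best = (a, b)
    return best


node_weight = {"A": 6, "B": 5, "C": 2}
edge_weight = {("A", "B"): 9, ("A", "C"): 4, ("B", "C"): 7}

print(pick_most_profitable_pair(node_weight, edge_weight, 12))  # ('A', 'B')
print(pick_most_profitable_pair(node_weight, edge_weight, 10))  # ('B', 'C')
print(pick_most_profitable_pair(node_weight, edge_weight, 5))   # None
```

With a limit of 10 the heaviest edge is skipped because nodes A and B together weigh 11, so the next heaviest pair that fits is selected instead.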
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for conducting a correctness test comprises instruction means for comparing the selected pair of nodes in the interference graph with nodes in the interference graph corresponding to nodes in the data dependence graph defined to be reachable by the data dependence graph from those nodes in the data dependence graph reachable from the nodes in the data dependence graph corresponding to the selected pair of nodes in the interference graph.
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for comparing nodes comprises instruction means for defining a test set by (i) generating a merged reachability set, by taking the union of the nodes reachable from the selected pair of nodes and removing the selected pair of nodes, and (ii) taking the union of the nodes reachable from the merged reachability set, and instruction means for comparing the intersection of the test set and the union of the pair of selected nodes with the null set.
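The set operations in this test can be written out directly. The sketch below assumes that "reachable" means reachable through one or more dependence edges (transitive reachability) and that the dependence graph is given as a dictionary of direct successors; both are interpretations for illustration, and the function names are hypothetical. The merge is treated as legal when the intersection is the null set.

```python
def reachable_from(graph, start):
    """Return every node reachable from start through one or more edges."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def merge_is_legal(graph, a, b):
    """Apply the test-set comparison to the selected pair (a, b)."""
    pair = {a, b}
    # Merged reachability set: union of both reach sets, minus the pair itself.
    merged = (reachable_from(graph, a) | reachable_from(graph, b)) - pair
    # Test set: union of everything reachable from the merged set.
    test_set = set()
    for node in merged:
        test_set |= reachable_from(graph, node)
    return not (test_set & pair)


# S1 -> S2 -> S3 and S1 -> S4
deps = {"S1": ["S2", "S4"], "S2": ["S3"], "S3": [], "S4": []}
print(merge_is_legal(deps, "S1", "S3"))  # False: S2 sits between S1 and S3
print(merge_is_legal(deps, "S1", "S2"))  # True
print(merge_is_legal(deps, "S3", "S4"))  # True
```

The first call fails because S3 is reachable from S2, which is itself reachable from S1: coalescing S1 and S3 would leave S2 both after and before the merged node.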
According to another aspect of the present invention, there is provided the above computer program product in which each node in the interference graph has an associated reachability vector representing which nodes in the interference graph are reachable from the node and in which set operations to determine reachability of nodes are carried out using the reachability vectors.
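One way to realize reachability vectors is as bit vectors, so that set union, difference and intersection become bitwise operations. The sketch below uses Python integers as bit vectors, with bit j of a node's vector set when node j is reachable from it. It assumes an acyclic graph with nodes numbered from 0 and transitive reachability; these choices and the function names are illustrative and are not taken from the text.

```python
def reachability_vectors(successors):
    """successors[i] lists the direct successors of node i (acyclic graph).

    Returns a list of ints; bit j of vectors[i] is 1 when j is reachable from i.
    """
    vectors = [None] * len(successors)

    def visit(i):
        if vectors[i] is None:
            mask = 0
            for j in successors[i]:
                mask |= (1 << j) | visit(j)
            vectors[i] = mask
        return vectors[i]

    for i in range(len(successors)):
        visit(i)
    return vectors


def merge_is_legal(vectors, a, b):
    """Same comparison as with sets, carried out on bit vectors."""
    pair = (1 << a) | (1 << b)
    merged = (vectors[a] | vectors[b]) & ~pair  # union, then remove the pair
    test = 0
    index = 0
    while merged:
        if merged & 1:
            test |= vectors[index]  # union of the reach sets of merged nodes
        merged >>= 1
        index += 1
    return (test & pair) == 0  # legal when the intersection is empty


# 0 -> 1 -> 2 and 0 -> 3
vectors = reachability_vectors([[1, 3], [2], [], []])
print([bin(v) for v in vectors])      # ['0b1110', '0b100', '0b0', '0b0']
print(merge_is_legal(vectors, 0, 2))  # False
print(merge_is_legal(vectors, 0, 1))  # True
```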
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for generating a program dependence graph comprises instruction means for ordering the generation of the program dependence graph from an innermost level of nested loops in the source code segment to an outermost level of nested loops in the source code segment.
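An innermost-to-outermost ordering can be obtained with a post-order walk of the loop nest, sketched below. The dictionary representation of the nest and the function name are hypothetical, and a post-order walk is only one reading of the ordering: it guarantees that every loop is visited after all loops nested inside it.

```python
def innermost_first(loop):
    """Yield loop names so that each nested loop precedes its enclosing loop."""
    for child in loop.get("children", []):
        yield from innermost_first(child)
    yield loop["name"]


# Loop i contains loops j and m; loop j contains loop k.
nest = {
    "name": "i",
    "children": [
        {"name": "j", "children": [{"name": "k"}]},
        {"name": "m"},
    ],
}
print(list(innermost_first(nest)))  # ['k', 'j', 'm', 'i']
```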
According to another aspect of the present invention, there is provided the above computer program product in which the instruction means for generating a program dependence graph comprises instruction means for generating the control dependence graph for a level of nested loops in the sour
