Method, apparatus, and product for optimizing compiler with...

Data processing: software development – installation – and managem – Software program development tool – Translation of code

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C717S159000, C717S152000, C709S241000, C712S226000, C712S235000, C712S241000

Reexamination Certificate

active

06651247

ABSTRACT:

BACKGROUND
1. Field of the Invention
The invention generally relates to methods and devices for optimizing computer register allocation and assignment, particularly as implemented in an optimizing compiler using instruction level scheduling.
2. Related Art
A compiler is a computer program that transforms a source computer program written in one language, such as Fortran, or C, into a target computer program that has the same meaning but is written in another language, such as an assembler or machine language. A compiler's tasks may be divided into an analysis stage followed by a synthesis stage, as explained in
Compilers: Principles, Techniques, and Tools
by A. Aho et al. (Addison Wesley, 1988) pp. 2-22. The product of the analysis stage may be thought of as an intermediate representation of the source program; i.e., a representation in which lexical, syntactic, and semantic evaluations and transformations have been performed to make the source code easier to synthesize. The synthesis stage may be considered to consist of two tasks: code optimization, in which the goal is generally to increase the speed at which the target program will run on the computer, or possibly to decrease the amount of resources required to run the target program; and code generation, in which the goal is to actually generate the target code, typically relocatable machine code or assembly code.
A compiler that is particularly well suited to one or more aspects of the code optimization task may be referred to as an “optimizing compiler.” Optimizing compilers are of increasing importance for several reasons. First, the work of an optimizing compiler frees programmers from undue concerns regarding the efficiency of the high-level programming code that they write. Instead, the programmers can focus on high-level program constructs and on ensuring that errors in program design or implementation are avoided. Second, designers of computers that are to employ optimizing compilers can configure hardware based on parameters dictated by the optimization process rather than by the non-optimized output of a compiled high-level language. Third, increased use of microprocessors that are designed for instruction level parallel processing, such as RISC and VLIW microprocessors, presents new opportunities to exploit this processing through a balancing of instruction level scheduling and register allocation.
There are various strategies that an optimizing compiler may pursue. Many of them are described in S. Muchnick,
Advanced Compiler Design and Implementation
(Morgan Kaufmann Publishers, 1997). One large group of these strategies focus on optimizing transformations, such as are described in D. Bacon et al., “Compiler Transformations for High-Performance Computing,” in
ACM Computing Surveys
, Vol. 26, No. 4 (December 1994) at pp. 345-520. These transformations often involve high-level, machine-independent, programming operations: for example, removing redundant operations, simplifying arithmetic expressions, removing code that will never be executed, removing invariant computations from loops, and storing values of common sub-expressions rather than repeatedly computing them. These machine-independent transformations are hereafter referred to as high level optimizations.
Other strategies employ machine-dependent transformations. These machine-dependent transformations are hereafter referred to as low level optimizations. Two important types of low level optimizations are: (a) instruction scheduling and (b) register allocation. An important portion of both types of low level optimization strategies are focused on loops in the code, where in many applications the majority of execution time is spent.
A principal goal of some instruction scheduling strategies is to permit two or more operations within a loop to be executed in parallel, a process referred to as instruction level parallel (ILP) processing. ILP processing generally is implemented in processors with multiple execution units. One way of communicating with the central processing unit (CPU) of the computer system is to create “very long instruction words” (VLIW's). VLIW's specify the multiple operations that are to be executed in a single machine cycle. For example, a VLIW may instruct one execution unit to begin a memory load and a second to begin a memory store, while a third execution unit is processing a floating point multiplication. Each of these execution tasks has a latency period; i.e., the task may take one, two, or more cycles to complete. The objective of ILP processing is thus to optimize the use of the execution units by minimizing the instances in which an execution unit is idle during an execution cycle. ILP processing may be implemented by the CPU or, alternatively, by an optimizing compiler. Utilizing CPU hardware, however, may be complex and result in an approach that is not as easy to change or update as the use of an appropriately designed optimizing compiler.
One known technique for improving instruction level parallelism in loops is referred to as software pipelining. As described in the work by D. Bacon et al. referred to above, the operations of a single loop iteration are separated into s stages. After transformation, which may require the insertion of startup code to fill the pipeline for the first s−1 iterations and cleanup ode to drain it for the last s−1 iterations, a single iteration of the transformed code will perform stage 1 from pre-transformation iteration i, stage 2 from pre-transformation iteration i-l, and so on. This single iteration is known as the kernel of the transformed code. A particular known class of algorithms for achieving software pipelining is referred to as modulo scheduling, as described in James C. Dehnert and Ross A. Towle, “Compiling for the Cydra 5,” in
The Journal of Supercomputing
, vol. 7, pp. 181, 190-197 (1993; Kluwer Academic Publishers).
Typically, the application of an instruction scheduling algorithm depends on information provided by a dependence graph (as well as information about the machine on which the instructions will be executed). As is known to those skilled in the art, the dependence graph represents source program dependencies at the machine instruction level. The construction of the dependence graph is based upon general data flow information that may be computed and maintained across several optimization phases. There are several alternative forms of data flow representation described in the literature, and a typical optimizer may choose to use any one or more of these. Among them are so-called “def-use” (definition-use) chains, static single assignment (SSA) form, and dynamic single assignment (DSA) form. From the instruction scheduling point of view, the fewer dependencies there are in the dependence graph, the more freedom the scheduler has to achieve higher degrees of ILP. Some forms of data flow representation (such as SSA) enable more accurate and more resource-efficient construction of instruction dependence graphs than others.
As noted, another group of low level optimization strategies involves register allocation and assignment. Some of these strategies have as their goal improved allocation and assignment of registers used in performing loop operations. The allocation of registers generally involves the selection of variables to be stored in registers during certain portions of the compiled computer program. The subsequent step of assignment of registers involves the choosing of specific registers in which to place the variables. The term “variable” will generally be understood to refer to a quantity that has a “live range” during the portion of the computer program under consideration. Specifically, a variable has a live range at a particular point in the computer program if that point may be included in a control path having a preceding point at which the variable is defined and a subsequent point at which the variable is used. Thus, register allocation may be described as referring to the selection of live ranges to be stored in registers

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method, apparatus, and product for optimizing compiler with... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method, apparatus, and product for optimizing compiler with..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method, apparatus, and product for optimizing compiler with... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3159394

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.