Compiler method and apparatus for elimination of redundant...

Data processing: software development – installation – and managem – Software program development tool – Translation of code

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C717S152000, C712S024000, C712S241000

Reexamination Certificate

active

06301706

ABSTRACT:

I. FIELD OF THE INVENTION
The present invention relates to processors and computing devices and more particularly to compiler techniques for optimizing loop constructs.
II. BACKGROUND OF THE INVENTION
Many practical applications require processing of very large amounts of information in a short period of time. One of the basic approaches to minimizing the time to perform such computations is to apply some sort of parallelism, so that tasks which are logically independent can be performed in parallel. This can be done, for example, by executing two or more instructions per machine cycle, i.e., by means of instruction-level parallelism. Thus, in a class of computers using superscalar processing, hardware is used to detect independent instructions and execute them in parallel, often using techniques developed in the early supercomputers.
Another approach to exploiting instruction level parallelism is used by the Very Long Instruction Word (VLIW) processor architectures in which the compiler performs most instruction scheduling and parallel dispatching at compile time, reducing the operating burden at run time. By moving the scheduling tasks to the compiler, a VLIW processor avoids both the operating latency problems and the large and complex circuitry associated with on-chip instruction scheduling logic. Both superscalar and VLIW processors take advantage of techniques know as pipelining for instruction scheduling optimization.
As known, each VLIW instruction includes multiple independent operations for execution by the processor in a single cycle. A VLIW compiler processes these instructions according to precise conformance to the structure of the processor, including the number and type of the execution units, as well as execution unit timing and latencies. The compiler groups the operations into a wide instruction for execution in one cycle. At run time, the wide instruction is applied to the various execution units with little decoding. The execution units in a VLIW processor typically include arithmetic units such as floating point arithmetic units. An example of a VLIW processor that includes floating point execution units is described by R. K. Montoye, et. al. in “Design of the IBM RISC System/6000 floating point execution unit”, IBM J.Res. Develop., V. 43 No.1, pp. 61-62, January 1990.Additional examples are provided in U.S. Pat. No. 5,418,975, which is incorporated herein by reference in the entirety.
Predicated and speculative computations are known in the art, see e.g.
Parallel and Distributed Computing Handbook,
Albert Y. Zomaya, Editor, McGraw-Hill 1996, chapter 21, Superscalar and VLIW Processors, pp 621-647, incorporated herein by reference. To improve efficiency, certain instructions may be executed speculatively and their results may then be retired or discarded. Also it is known that profile data that characterizes program behavior can be obtained by performing test runs of the program. Such a technique is employed, for example, for profiled branch prediction. This generated profile data enables the compiler to identify probable alternatives of a conditional statement so as to enhance the efficiency of speculative computations.
While these processors are capable of performing a variety of tasks adequately, it is perceived that the performance of VLIW processors can be improved further by improving optimization techniques employed by compilers that compile programs for VLIW processing. More specifically, redundant speculative computations in the loop body may reduce effectiveness of loop software pipelining. Thus, it desirable to provide for program compilation that reduces such redundant speculative calculations in the innermost loops.
III. SUMMARY OF THE INVENTION
A novel method and system is presented for use with VLIW and other parallel processing architectures for avoiding redundant speculative computations in the compilation of the innermost loops.
The method includes: a) determining whether the loop has an inductive variable (such as a counter or a loop iteration variable) and a conditional statement, which depends on the inductive variable; b) determining if a set of values of the inductive variable includes two subsets, in at least one of which the conditional statement is a loop invariant; and c) if conditions in steps a) and b) are satisfied, transforming the loop into two consecutive loops executable with a reduced set of values of the inductive variable.
The method also includes: a) determining if the loop has an inductive variable and does not have cross-iterational data dependencies; b) if the conditions in a) are satisfied, dividing the loop at a conditional statement, which brings redundant speculative calculations into the loop, so as to form at least a first and a second consecutive loops; and c) providing intermediate vectors so as to enable communication between the first and the second newly formed loops, wherein one vector is provided for each data dependence between the new loops.
The method also includes: a) determining if the loop has a conditional statement, which has improbable alternatives on certain control flow paths leading to the loop; and b) if the condition in a) is satisfied, creating a copy of the loop that contains the improbable alternatives.
The method also includes: a) determining if the loop has a conditional statement, which has no improbable alternatives, but the result of the condition remains the same in successive loop iterations with high probability; and b) if this condition in the previous step is satisfied, forming an outer loop, which contains two copies of the original loop, wherein in each of the copies the conditional statement is transformed such that the improbable result of the statement leads to exit from a given loop copy and transfer to another loop copy.
The method also includes: a) dividing the loop into a probable area and an improbable area; b) duplicating parts of the probable area so that all control flow paths from the improbable area to the probable area are eliminated; and c) forming a new outer loop such that all the back edges of the loop that originated in the improbable area are moved from a head of the original loop to the head of the new outer loop.
In general, the disclosed method includes: a) identifying if there is a condition within the loop that permits modification of the loop for optimization so as at least to reduce redundant speculative calculations; b) if the condition in step (a) has been identified, transforming the loop into one or more new modified loops, otherwise optimizing a program within the loop and returning to step (a) with another loop; and (c) repeating steps (a) and (b) for each of the new modified loops obtained by the transformation in step (b).


REFERENCES:
patent: 4710872 (1987-12-01), Scarborough
patent: 4821181 (1989-04-01), Iwasawa
patent: 5202995 (1993-04-01), O'Brien
patent: 5481723 (1996-01-01), Harris et al.
patent: 5584027 (1996-12-01), Smith
patent: 5958048 (1999-09-01), Babaian et al.
patent: 6026240 (2000-02-01), Subramanian
patent: 6049669 (2000-04-01), Holler
patent: 6148439 (2000-11-01), Nishiyama
Tang et al., “Impact of Self-Scheduling Order on Performance of Multiprocessor Systems,” ACM Proceedings on Int'l Conference on Supercomputing, Nov. 14-18, 1988, pp. 593-603.*
Tang et al., “Compiler Techniques for Data Synchronization in Nested Parallel Loops,” ACM Proceedings on the 1990 Int'l Conference on Supercomputing, Jun. 11-15, 1990, pp. 177-186.*
Ebcioglu et al., “VLIW Compilation Techniques in a Superscalar Environment,” Proceedings of the ACM Sigplan '94 Conf. on Programming Language Design and Implementation, Jun. 1994, pp. 36-48.*
Yu et al., “Control Mechanism for Software Pipelining on Nested Loop,” Proceedings of the IEEE on Advances in Parallel and Distributed Computing, Mar. 19-21, 1997. pp. 345-350.*
Muchnick, Steven S.,Advanced Compiler Design and Implementation, 1997, pp. 588-589.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Compiler method and apparatus for elimination of redundant... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Compiler method and apparatus for elimination of redundant..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Compiler method and apparatus for elimination of redundant... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2599695

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.