Data processing: software development – installation – and managem – Software program development tool – Translation of code
Reexamination Certificate
2000-12-21
2004-04-06
Morse, Gregory (Department: 2122)
Data processing: software development, installation, and managem
Software program development tool
Translation of code
C717S151000, C717S152000, C717S155000, C717S156000, C717S161000
Reexamination Certificate
active
06718541
ABSTRACT:
BACKGROUND OF THE INVENTION
The modem optimizing compiler for the Explicit Parallel Instruction Computing (EPIC) architecture performs the global task of detection and refining the potential parallelism of a source program being compiled. The EPIC architecture creates new possibilities for code optimization due to its ability to perform many operations in parallel.
Improvements in both programming techniques and hardware support to improve compiler optimization techniques are important to realizing the full potential of the EPIC architecture.
A Critical Path Algorithm can be utilized to schedule a basic block of operations. A critical path is defined as the longest path through a data dependency graph. A latest start time list is generated forming a number of time frames equal to the length of the critical path. Each operation is examined to determine whether it is dependent on another operation. For example, if an operation is not dependent on any other operation it can be scheduled last.
The EPIC approach is platform oriented and therefore code translation mechanisms are very important and include many platform dependent optimizations. Critical path algorithms facilitate the utilization of the full parallelism inherent in a target machine. However, the critical path strategy is not useful for scheduling when potential parallelism of a basic block being scheduled is much higher than the parallelism of the target hardware because the target hardware may have insufficient register resources required to hold data until dependencies and latencies are resolved.
One of the possible approaches to register allocation for processors with multiple functional units is described in U.S. Pat. No. 5,987,259 Patent titled “Functional unit switching for the allocation of registers”. If some instruction is able to be performed onto a variety of possible functional units, it is scheduled at the unit with the least subscribed register resources. This method is useful for an architecture with separate register files connected to each functional unit. In case of two equivalent units and if there are insufficient registers associated with the first unit to keep all the temporary values necessary for some code region, instructions of that code region scheduled to perform on the first functional unit that may be executed on the second unit are detected. Those instructions are rescheduled for execution on the second functional unit. This method is not applicable for architectures with register files shared among a number of functional units.
Another approach to the problem of optimal register allocation is expounded in the paper “Combining Register Allocation and Instruction Scheduling” by Rajeev Motwani, Krishna V. Palem, Vivek Sarkar and Salem Reyen (Technical Report, Courant Institute, TR 698, July 1995). The main idea of that method is the formulation of combined register allocation and instruction scheduling within a basic block as a single optimization problem. An objective cost function that more directly captures the primary measure of interest in code optimization is the completion time of the last instruction. Unfortunately, a heuristic algorithm designed could not have been formulated in the context of an existing instruction scheduling/register allocation phase-ordered solution.
One more paper should be mentioned in context of the invention proposed: “Scheduling Expression DAGs for Minimal Register Need” by Christoph W. Kessler (University of Trier, Germany, faculty of Mathematics and Informatics, Monthly report 1996-12). This article presents a new algorithm for generating schedules for expression Data Dependency Graph (DAG) that uses a minimal number of registers. Classically it is a NP-complete problem, but the algorithm shows good results for expressions up to 50 operations. Extended to cope with multiple functional units this algorithm might be applied for EPIC architecture unless there is a requirement of minimal register usage. This may penalize performance due to restricted Instruction Level Parallelism (ILP).
Accordingly, improved techniques for scheduling operations in a basic block are required. The main requirements to be satisfied are: multiple functional units, shared register file, sensitivity to register pressure and operation latencies, compatibility with traditional phase-ordered compilation technique (separate operation scheduling and register allocation), unrestricted ILP.
SUMMARY OF THE INVENTION
Platform specific optimization methods for the EPIC architecture include full predication and speculation support. Optimizations based on the estimation of earliest and latest start times of operations and on a critical path strategy are applied to Extended Scalar Blocks (ESBs, also referred to as Basic Blocks). Extended Scalar Blocks are regions of the predicated code where all dependencies between operations are represented explicitly as a relation between two operations for a considerable number of operations. For each ESB the compiler works out the critical path which is defined as a sequence of operations that will take the longest CPU time and cannot be executed in parallel because of dependencies. The main goal of any optimizer in optimizing the compiler is to reduce the execution time of the program being optimized. Reduction of the critical path estimated in terms of target processor cycles is the application criterion applied to optimizations to achieve the main goal.
An optimizing compiler for an Explicit Parallel Instruction Computing (EPIC) architecture performs the global task of detecting and refining the potential parallelism of a source program being compiled.
According to one aspect of the invention, a heuristic for a List Scheduling algorithm schedules registers based on register economy which is helpful when scheduling scalar blocks of great potential parallelism. The critical path in this case is much longer than theoretical value. The register pressure is reduced at the scheduling phase.
According to another aspect of the invention, scheduling priority for a cycle-driven scalar scheduler is changed from operation of Latest Start Time to a function of two arguments: the number of successors of a particular operation and an order of operation in a tree-like IR (intermediate representation) structure.
According to another aspect of the invention, the use of new heuristic for well-known List Scheduling algorithm leads to smaller register pressure.
According to another aspect of the invention, an algorithm for calculation of register economy priority values (REP) includes the act of ordering operations of a basic block in a data dependency graph, indicating the number of predecessors for each ordered operation, and generating sequential REPs starting with the last operation in the ordered list.
Other features and advantages of the invention will be apparent in view of the following detailed description and appended drawings.
REFERENCES:
patent: 5202975 (1993-04-01), Rasbold et al.
patent: 5202993 (1993-04-01), Tarsy et al.
patent: 5317734 (1994-05-01), Gupta
patent: 5819088 (1998-10-01), Reinders
patent: 5987259 (1999-11-01), Goebel
patent: 6526573 (2003-02-01), Babaian et al.
Dani, Amod K., Ramanan, V. Janaki Govindarajan, R., “Register-Sensitive Software Pipelining”, http://pdps.eece.unm.edu/1998/papers/293.pdf. retrieved using www.google.com, Nov. 6, 2002.*
Gibbons, Phillip B., Muchnick, Steven S., “Efficient Instruction Scheduling for a Pipelined Architecture”, Hewlett-Packard Laboratories, Palo Alto, CA, 1986 ACM Portal, Nov. 7, 2002.*
Goodman, James R., Univ of Wisconsin-Madison CS Dept, Hsu, Wei-Chung, Cray Research Inc., “Code Scheduling and Register Allocation in Large Basic Blocks”, 1988 ACM, retrieved from ACM Portal, Nov. 7, 2002.*
Rau, B. R., Glaeser, C. D., Advanced Processor Technology Group, ESL, Inc., “Some Scheduling Techniques and an Easily Schedulable Architecture for High Performance Scientific Computing”, 1981 IEEE, retrieved from IEEE database, Nov. 7, 2002.*
Rau, B. Ramakrishna, Glaeser, Christopher D., Picard, Raymond L., Advanced
Ostanevich Alexander Y.
Volkonsky Vladimir Y.
Elbrus International Limited
Morse Gregory
Steelman Mary
Townsend and Townsend / and Crew LLP
LandOfFree
Register economy heuristic for a cycle driven multiple issue... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Register economy heuristic for a cycle driven multiple issue..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Register economy heuristic for a cycle driven multiple issue... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3217564