Loop optimization with mapping code on an architecture

Data processing: software development, installation, and management – Software program development tool – Translation of code

Reexamination Certificate


Details

Classification: C712S241000
Status: active
Patent number: 06772415

ABSTRACT:

FIELD OF THE INVENTION
The invention relates to methods and environments for mapping source code on a target architecture, preferably a parallel architecture.
BACKGROUND OF THE INVENTION
Before actually mapping source code onto a target architecture, one often transforms the original first code into a second code, because one expects said second code to execute more efficiently on said architecture. Said source code often comprises loops and loop nests. Among said code transformations are loop transformations, which are the topic of research in many papers and publications.
Said source codes typically contain many loops and loop nests. For data transfer and storage optimization, it is essential that global loop transformations are performed, across artificial boundaries between subsystems (procedures). It is exactly at these boundaries that the largest memory buffers are usually needed [K. Danckaert, K. Masselos, F. Catthoor, H. De Man, C. Goutis, Strategy for power efficient design of parallel systems, IEEE Trans. on VLSI Systems, Vol. 7, No. 2, pp. 258-265, June 1999]. To optimize these buffers, global (inter-procedural) transformations are necessary. Most existing transformation techniques can only be applied locally, to one procedure or even one loop nest [N. Passos, E. Sha, Achieving full parallelism using multidimensional retiming, IEEE Trans. on Parallel and Distributed Systems, Vol. 7, No. 11, pp. 1150-1163, November 1996].
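The buffer issue at subsystem boundaries can be illustrated with a small, hypothetical sketch (not taken from the patent itself): fusing a producer loop and a consumer loop that communicate through an intermediate array shrinks the buffer between the two "procedures" from N elements to a single scalar, which can then live in a local storage level.

```python
# Hypothetical illustration: two "subsystems" communicating through
# an intermediate buffer, before and after a global loop fusion.

def unfused(a):
    """Producer loop, then consumer loop: the intermediate buffer
    `tmp` must hold all len(a) elements between the two loops."""
    n = len(a)
    tmp = [0] * n
    for i in range(n):
        tmp[i] = a[i] * 2          # producer "procedure"
    out = [0] * n
    for i in range(n):
        out[i] = tmp[i] + 1        # consumer "procedure"
    return out

def fused(a):
    """After fusion, each produced value is consumed immediately:
    the n-element buffer collapses to one scalar."""
    out = [0] * len(a)
    for i in range(len(a)):
        t = a[i] * 2               # scalar replaces the buffer
        out[i] = t + 1
    return out

assert unfused(list(range(8))) == fused(list(range(8)))
```

The transformation crosses the boundary between the two loops, which is exactly what purely local (single-loop-nest) techniques cannot do.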
Most existing transformation methods assume that there is a dependency between two statements which access the same memory location whenever at least one of these accesses is a write [W. Shang, E. Hodzic, Z. Chen, On uniformization of affine dependence algorithms, IEEE Trans. on Computers, Vol. 45, No. 7, pp. 827-839, July 1996]. In this way, output-, anti- and flow-dependencies are all considered real dependencies.
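The three dependence kinds can be made concrete with a small, hypothetical sketch: only a flow (true) dependence actually moves data from a write to a read; anti- and output dependences arise from reusing a storage location and can in principle be removed by renaming, which is why treating all three as equally "real" is conservative.

```python
# Hypothetical sketch of the three dependence kinds between two
# statements S1 and S2 that touch the same memory location:
#   flow (true):  S1 writes, S2 reads   (write -> read, moves data)
#   anti:         S1 reads,  S2 writes  (read  -> write, storage reuse)
#   output:       S1 writes, S2 writes  (write -> write, storage reuse)

def classify(access1, access2):
    """Return the dependence kind for two successive accesses to
    the same location, each given as 'read' or 'write'."""
    if access1 == 'write' and access2 == 'read':
        return 'flow'
    if access1 == 'read' and access2 == 'write':
        return 'anti'
    if access1 == 'write' and access2 == 'write':
        return 'output'
    return None  # read after read: no dependence

assert classify('write', 'read') == 'flow'
assert classify('read', 'write') == 'anti'
assert classify('write', 'write') == 'output'
```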
As for parallel target architectures, the previously applied compiler techniques tackle the parallelization and load balancing issues as the only key point. [S.Amarasinghe, J.Anderson, M.Lam, and C.Tseng, The SUIF compiler for scalable parallel machines, Proc. of the 7th SIAM Conf. on Parallel Proc. for Scientific Computing}, 1995.]. They ignore the global data transfer and storage related cost when applied on data dominated applications like multi-media systems. Only speed is optimized and not the power or memory size. The data communication between processors is usually taken into account in most recent methods [C.Diderich, M.Gengler, Solving the constant-degree parallelism alignment problem, Proc. EuroPar Conference, Lyon, France, August 1996. Lecture notes in computer science, series, Springer Verlag, pp.451-454, 1996.] but they use an abstract model (i.e. a virtual processor grid, which has no relation with the final number of processors and memories). In this abstract model, the real (physical) data transfer costs cannot be taken into account.
To adequately express and optimize the global data transfers in an algorithm, an exact and concise modeling of all dependencies is necessary. The techniques currently used in compilers do not use an exact modeling of the dependencies, but an approximation in the form of direction vectors or an extension thereof (see [M. Wolf, M. Lam, A loop transformation theory and an algorithm to maximize parallelism, IEEE Trans. on Parallel and Distributed Systems, Vol. 2, No. 4, pp. 452-471, October 1991] for a detailed description). An example is [K. McKinley, A compiler optimization algorithm for shared-memory multiprocessors, IEEE Trans. on Parallel and Distributed Systems, Vol. 9, No. 8, pp. 769-787, August 1998], which combines data locality optimization and advanced interprocedural parallelization. However, it does not use an exact modeling, and as a result it cannot analyze the global data-flow.
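The information loss of direction vectors can be shown with a hypothetical sketch: an exact distance vector records how far apart dependent iterations are, while a direction vector keeps only the signs, so quite different dependences become indistinguishable to the compiler.

```python
# Hypothetical sketch: exact dependence distances vs. the
# direction-vector approximation. For A[i][j] = f(A[i-1][j+2])
# the exact distance vector is (1, -2); a direction vector keeps
# only the signs, discarding the magnitudes an exact (polyhedral)
# model would retain.

def direction(distance):
    """Collapse an exact distance vector to a direction vector,
    using '<' for positive, '>' for negative, '=' for zero."""
    sign = lambda d: '<' if d > 0 else ('>' if d < 0 else '=')
    return tuple(sign(d) for d in distance)

# Two very different exact dependences become indistinguishable:
assert direction((1, -2)) == ('<', '>')
assert direction((5, -7)) == ('<', '>')
```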
Many techniques have been developed using an exact modeling, but these have not led to real compilers. The first method which used an exact modeling was the hyperplane method [L. Lamport, The parallel execution of do loops, Communications of the ACM, Vol. 17, No. 2, pp. 83-93, February 1974], where a linear ordering vector is proposed to achieve optimal parallelism. It works only for uniform loop nests, and all statements in the loop nest are considered as a whole: they all get the same ordering vector. Some particular cases of the hyperplane method have been proposed too. For example, selective shrinking and true dependence shrinking are in fact special cases of the hyperplane method, in which a particular scheduling vector is proposed.
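The hyperplane idea can be sketched on a classic uniform recurrence (a hypothetical illustration, not the patent's method): with ordering vector (1, 1), every iteration (i, j) is scheduled at "time" i + j, and all iterations on one hyperplane i + j = t depend only on earlier hyperplanes, so they could run in parallel.

```python
# Hypothetical sketch of the hyperplane (wavefront) method for the
# uniform recurrence A[i][j] = A[i-1][j] + A[i][j-1]. The linear
# ordering vector (1, 1) assigns iteration (i, j) the time i + j.

def sequential(n):
    A = [[1] * n for _ in range(n)]
    for i in range(1, n):
        for j in range(1, n):
            A[i][j] = A[i - 1][j] + A[i][j - 1]
    return A

def wavefront(n):
    A = [[1] * n for _ in range(n)]
    for t in range(2, 2 * n - 1):                  # hyperplane time i + j
        # every (i, j) with i + j == t is independent of the others
        for i in range(max(1, t - n + 1), min(t, n)):
            j = t - i
            A[i][j] = A[i - 1][j] + A[i][j - 1]
    return A

# Reordering along hyperplanes preserves the result:
assert sequential(4) == wavefront(4)
```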
Often a linear programming method is proposed to determine the optimal scheduling vector for the hyperplane method. Extensions in which all statements of a loop nest are scheduled separately are called affine-by-statement scheduling. In [M. Dion, Y. Robert, Mapping affine loop nests: new results, Lecture Notes in Computer Science, Vol. 919, High-Performance Computing and Networking, pp. 184-189, 1995], a further extension is considered, namely for affine dependences. These methods are mainly theoretical: the ILPs to be solved become too complicated for real-world programs. Moreover, they are designed only to optimize parallelism, and they neglect the data reuse and locality optimization problem.
Some papers have addressed the optimization of algorithms using the polytope model; however, these methods are quite complex and not suitable for models on the order of hundreds of polytopes.
AIM OF THE INVENTION
It is an aim of the invention to provide methods for transforming source code with loop transformations which focus on global data storage and transfer issues in parallel processors and which are feasible for realistic, data-dominated real-world codes.
SUMMARY OF THE INVENTION
In a first aspect of the invention a method (3) for transforming a first code (1) to a second code (2) is disclosed. Said first code and said second code describe at least the same functionality. Said first code and said second code are executable on a predetermined target architecture (4). The invented method transforms said first code into said second code, such that said target architecture can deliver said functionality while executing said code in a more cost-optimal way. Said cost can relate to energy consumption of said target architecture but is not limited thereto. Said target architecture comprises at least two processors (5) and at least three levels of storage (6). Said target architecture can be denoted a parallel architecture. Each level comprises storage units, wherein variables or signals defined in said codes can be stored. Said levels define a hierarchy between storage units. Storage units in a first level are denoted local storage units. Storage units in the second level are less local than the ones in said first level. Storage units in a level N are less local than the ones in level N−1, N−2, . . . and said first level. Storage units in the highest level are denoted global storage units. Said invented method transforms said codes in order to optimize transfers between said storage levels, meaning that data transfers between lower-level storage units are preferred for optimality. The method thus improves at least the data locality. In each of said processors at least two storage levels can be found. The characteristics of said storage levels within a processor can be considered part of internal processor characteristics information. The invented method is characterized in that it exploits such internal processor characteristics information explicitly while transforming said first code into said second code. The codes considered in the invented method are characterized in that they contain at least a plurality of loop nests.
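The preference for transfers between lower (more local) storage levels is what classic loop tiling targets. A hypothetical sketch (the buffer model, sizes, and names below are illustrative, not the claimed method): locality is measured by counting how often data must be fetched from the global level into a small local level.

```python
# Hypothetical sketch: tiling a traversal so each tile fits in a
# small "local" storage level. Transfers from the "global" level
# are counted with a toy fully-associative LRU buffer.
from collections import OrderedDict

def count_global_fetches(trace, capacity):
    """Count accesses in `trace` that miss a `capacity`-element
    LRU-managed local buffer and must be fetched globally."""
    cache, fetches = OrderedDict(), 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # reuse hits locally
        else:
            fetches += 1                   # transfer from global level
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return fetches

N, CAP = 8, 4  # 8 data elements, local level holds 4 (assumed)

# Untiled: two full sweeps over the data; between the two uses of an
# element, N - 1 others are touched, so nothing survives locally.
untiled = [i for _ in range(2) for i in range(N)]

# Tiled: sweep twice over one CAP-sized tile before moving on; the
# second sweep of each tile hits entirely in the local level.
tiled = [i for t in range(0, N, CAP)
           for _ in range(2)
           for i in range(t, t + CAP)]

assert count_global_fetches(untiled, CAP) == 2 * N  # every access global
assert count_global_fetches(tiled, CAP) == N        # half the transfers
```

The same computation is performed in both orders; only the schedule changes, which is the sense in which such transformations trade no functionality for fewer costly transfers.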
While transforming said first code into said second code, said loop nests in said codes are considered simultaneously. While transforming said first code into said second code, said loop nests are taken into account globally, meaning said transformation does not work step-by-step.
