Data processing: software development – installation – and managem – Software program development tool – Translation of code
Reexamination Certificate
1997-02-28
2001-05-01
Hafiz, Tariq R. (Department: 2762)
Data processing: software development, installation, and managem
Software program development tool
Translation of code
C717S152000, C717S152000, C717S152000, C717S152000, C717S152000, C717S152000, C717S152000
Reexamination Certificate
active
06226790
ABSTRACT:
FIELD OF THE INVENTION
The present invention pertains to compilers. More particularly, the present invention optimizes code transformations for attaining superior overall performance.
BACKGROUND OF THE INVENTION
Computers are being used today to perform a wide variety of tasks. Many different areas of business, industry, government, education, entertainment, and most recently, the home, are tapping into the enormous and rapidly growing list of applications developed for today's increasingly powerful computer devices. Computers have also become a key technology for communicating ideas, data, and trends between and among business professionals. These devices have become so useful and ubiquitous, it would be hard to imagine today's society functioning without them.
Computers operate by executing programs, or a series of instructions stored in its memory. These programs, and their series of instructions, are collectively referred to as software. Software is key to utility of computers. Software is what makes the computer devices function and perform useful tasks. Good software makes for effective machines, whereas poor software makes for difficult to use, less effective machines. Thus, the utility of the computer device often hinges upon the utility of the software written for the device.
Software is written by professionals referred to as programmers or software engineers. As programs have become larger and more complex, the task of writing software has become correspondingly more difficult. As a result, programmers typically code in “high level languages” to improve productivity. The use of high level language makes the task of writing extremely long and complex programs more manageable. The completed program, however, must be translated into machine executable language in order to run on a computer. Programmers rely upon compilers to translate their program written in high level language into a program comprised of machine executable code, known as “machine language.”
Compiler efficiency and sophistication is directly related to the speed and reliability of the machine executable code. The process of translating the program written in high level language into a program written in machine language is referred to as compiling. The actual translation is performed by a software program referred to as a compiler. The compiler operates on the program written in high level language. The high level language program is referred to as source code. The compiler translates the source code into machine executable code. Ultimately, it is the machine executable code which will run on the computer. Thus, the speed and reliability of the executable code depends upon the performance and efficiency of the compiler. If the compiler is inefficient, the machine executable code will run slower. Other attributes, such as executable code size and reliability, may also be affected. Hence, it is critical to the speed and efficiency of the program that the compiler thoroughly optimizes the executable code during the translation process.
There are several different methods and techniques used to optimize source code. One technique, commonly referred to as “loop interchange,” involves changing the order in which loops are executed. Rather than executing loops according to the way that the human programmer had originally written the computer program, the compiler rearranges the loops into a different, more efficient order so that the code can be executed much faster without impacting the final result. Another technique, known as “cache tiling” or “blocking,” involves the compiler breaking large operational blocks into several smaller blocks. Subdividing the larger blocks into smaller blocks generally reduces the total number of cache misses that are required. Reducing the number of cache misses, directly increases the speed at which the code may be executed. Yet another method for improving the execution speed involves “register tiling” or “unrolling.” A register tiling process further subdivides operations blocks so as to minimize the number of loads and stores which are required. Associated with any given hardware design is a limited number of registers. By keeping data in registers for an elapsed period of time, data items need not be loaded to or stored from memory each time the data is accessed.
In the past, various compilers have used one or more of these techniques to optimally compile their code. First, loop interchange as applied to change the ordering of the loops. The resulting code was then altered according to the most optimal cache tiling. Next, the modified code was then changed to reflect the most optimal register tiling. However, it has been discovered by the present inventors that although each of these techniques separately optimizes the code, that their final combined effects might not produce the most optimal performance. This is due to the observation made by the present inventors that these techniques are highly interdependent. Changing one of these factors transforms the other techniques. For instance, optimizing loop ordering might cause register tiling to become worse or vice versa. In addition, the various transformations might have contradictory effects on different characteristics of the machine that contribute to performance. For example, one loop ordering may improve cache behavior, but it might also seriously degrade scheduling behavior.
Thus, there is a need in the prior art for a compiler that can automatically determine how best to transform source code considering the tree optimizations discussed and considering various machine characteristics such as caches and instruction scheduling. The present invention provides a highly effective, elegant solution to this problem by employing a total machine model to estimate performance characteristics for a wide range of transformations; the set of transformations which produces the best overall execution time estimate is then selected.
SUMMARY OF THE INVENTION
The present invention pertains to a method for determining an optimal loop interchange, set of register tiling amount, and cache tiling size for compiling source code into object code. It has been discovered that these different parameters are highly interdependent. Optimizing just one of the factors separately, might adversely affect the performance characteristics of the other parameters. In order to find the particular loop interchange, set of register tiling amount, and cache tiling size for achieving the best overall performance, the present invention first constructs a model of the specific computer system upon which the object code is to be run. Next, the search space comprising all of the different possibilities of the loop interchanges, register tiling amounts, and cache tiling sizes is run through the model. The model calculates an estimated performance rating for each of the possible combinations based on anticipated cache misses, instruction schedules, and loop overhead. The particular loop interchange, set of register tiling amounts, and cache tiling sizes corresponding to the best estimated performance is then selected as being the most optimal. The source code is then compiled according to this optimal loop interchange, register tiling amount, and cache tiling size.
REFERENCES:
patent: 5442790 (1995-08-01), Nosenchuck
patent: 5485619 (1996-01-01), Lai et al.
patent: 5535393 (1996-07-01), Reeve et al.
patent: 5790859 (1998-08-01), Sarkar
patent: 5805863 (1998-09-01), Chang
patent: 5809308 (1998-09-01), Tirumalai
patent: 5842022 (1998-11-01), Nakahira et al.
patent: 5862384 (1999-01-01), Hirai
patent: 5867711 (1999-02-01), Subramanian et al.
patent: 5910900 (1999-06-01), Mangelsdorf
patent: 5920724 (1999-07-01), Chang
patent: 5930507 (1999-07-01), Nakahira et al.
patent: 5940620 (1999-08-01), Graham
patent: 5956498 (1999-09-01), Mangelsdorf
Swamy Punyamurtula et al, “Minimum Dependence Distance Tiling of Nested Loops with Non-uniform Dependences”, IEEE pp. 74-81, Apr. 1994.*
Josep Torrellas et al, “Optimizing Instruction Cache Performance for Operati
Maydan Dror
Wolf Michael
Hafiz Tariq R.
Silicon Graphics Inc.
Vo Ted T.
Wagner , Murabito & Hao LLP
LandOfFree
Method for selecting optimal parameters for compiling source... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method for selecting optimal parameters for compiling source..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method for selecting optimal parameters for compiling source... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2564838