Method, apparatus and computer programmed product for binary...

Data processing: software development, installation, and management – Software program development tool – Translation of code

Reexamination Certificate


Details

C717S152000


active

06289505

ABSTRACT:

BACKGROUND OF THE INVENTION
Field of the Invention
This invention relates to the field of optimizing compilers. Specifically, this invention is a method, apparatus and computer program product for providing a high level language compiler with the capability to re-optimize a previously compiled binary executable.
Background
FIG. 1 illustrates a prior art optimizing compiler, indicated by general reference character 100, for compiling a source program to create an optimized binary executable. The compiler 100 consumes source information 101 through a compiler front-end segment 103. The compiler front-end segment 103 processes the syntax and semantics of the source information 101 according to the rules of the programming language applicable to the source information 101. The compiler front-end segment 103 generates at least one version of an intermediate code representation 104 (IR) of the source information 101. The intermediate code representation generally includes data structures that either represent, or can be used to create, data dependency graphs (DDGs) and execution flow graphs. The intermediate code representation 104 is then optimized by an intermediate representation optimizer segment 105. The intermediate representation optimizer segment 105 operates on, and adjusts, the intermediate code representation 104 of the source information 101 to optimize the execution of a program in a variety of ways well understood in the art. The intermediate representation optimizer segment 105 generates an optimized intermediate representation 106. A code generator segment 107 consumes the optimized intermediate representation 106, performs low-level optimizations, allocates physical registers and generates a binary module 109 (and conditionally assembler source code) from the optimized intermediate representation 106. The binary module comprises binary computer instructions (binary code) in a module that can be linked with other modules to create a binary executable. The assembler source code is a series of symbolic statements in an assembler source language. Both the assembler source code and the binary code are targeted to a particular computer application binary interface (ABI).
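The pipeline described above (front end → IR → IR optimizer → code generator) can be sketched as a toy model. All function names and data structures below are illustrative inventions for exposition, not taken from the patent:

```python
# Toy model of the compile pipeline: front end -> IR -> optimizer -> code gen.
# Source statements are of the form "dest = a + b".

def front_end(source_lines):
    """Parse 'x = a + b' statements into IR tuples ('add', dest, a, b)."""
    ir = []
    for line in source_lines:
        dest, expr = [s.strip() for s in line.split("=")]
        a, b = [s.strip() for s in expr.split("+")]
        ir.append(("add", dest, a, b))
    return ir

def optimize(ir):
    """Constant-fold additions whose operands are both integer literals."""
    out = []
    for op, dest, a, b in ir:
        if a.isdigit() and b.isdigit():
            out.append(("const", dest, str(int(a) + int(b)), None))
        else:
            out.append((op, dest, a, b))
    return out

def code_gen(ir):
    """Emit symbolic 'assembler' strings from the optimized IR."""
    asm = []
    for op, dest, a, b in ir:
        if op == "const":
            asm.append(f"mov {dest}, {a}")
        else:
            asm.append(f"add {dest}, {a}, {b}")
    return asm

asm = code_gen(optimize(front_end(["x = 1 + 2", "y = x + 3"])))
# asm == ["mov x, 3", "add y, x, 3"]
```

A real code generator would also schedule instructions and allocate physical registers, as discussed below; this sketch shows only the staged hand-off between segments.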
DDGs embody the information required for an optimizer to determine which statements are dependent on other statements. The nodes in the graph represent statements in a program block and arcs represent the data dependencies between nodes. In particular, the scope of a variable extends from a “def” of the variable to a “use” of the variable. A “def” corresponds to an instruction that modifies a variable (an instruction “defines” a variable if the instruction writes into the variable). A “use” corresponds to an instruction that uses the contents of the variable. For example, the instruction “x=y+1;” “defs x” and “uses y”. An arc in the DDG extends from the def of a variable to the use of the variable. DDGs are described in chapter 4 of Supercompilers for Parallel and Vector Computers, by Hans Zima, ACM Press, © 1991, ISBN 0-201-17560-6.
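The def/use arcs described above can be computed with a simple scan over a straight-line block. The representation below (statements as `(dest, uses)` pairs, arcs as index pairs) is an illustrative simplification, not the patent's data structure:

```python
# Minimal sketch of building def-to-use arcs for a data dependency graph (DDG).
# Each statement is (dest, uses): the variable it defs and the variables it uses.

def build_ddg(statements):
    """Return arcs (def_index, use_index) linking the most recent def of a
    variable to each later statement that uses it."""
    last_def = {}   # variable name -> index of the statement that last defined it
    arcs = []
    for i, (dest, uses) in enumerate(statements):
        for var in uses:
            if var in last_def:
                arcs.append((last_def[var], i))
        last_def[dest] = i
    return arcs

# "y = ...", then "x = y + 1" (defs x, uses y), then "z = x + y":
stmts = [("y", []), ("x", ["y"]), ("z", ["x", "y"])]
arcs = build_ddg(stmts)
# arcs == [(0, 1), (1, 2), (0, 2)]
```

An optimizer walking these arcs can tell, for instance, that the second and third statements cannot be reordered ahead of the first.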
As mentioned above, the code generator segment 107 performs low-level optimizations and generates either (or both) binary code or assembler source code. The intermediate representation of the program generally references virtual registers. That is, the intermediate representation optimizer assumes that the target computer contains an unlimited number of registers. During the operation of the code generator segment 107, these virtual registers are assigned to the physical registers of the target computer. This resource management is performed in the code generator segment 107 by a register allocation (expansion) process.
One aspect of the register allocation process is that the contents of physical registers are often “spilled” to memory at various points during the execution of the program so that the limited number of physical registers can be used to hold values of more immediate relevance to the program at those various points. Those values that are spilled to memory are often restored to the registers when the program advances to different points of execution. An example of register allocation and register spilling techniques is provided in Compilers: Principles, Techniques and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, Addison-Wesley Publishing Co., © 1988, ISBN 0-201-10088-6, pages 541-546.
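The spilling behavior described above can be illustrated with a deliberately simplified allocator: virtual registers compete for a small pool of physical registers, and when the pool is full the least recently used value is spilled to memory. This is an expository sketch (real allocators use liveness analysis or graph coloring), and the names are hypothetical:

```python
# Simplified register-allocation sketch showing spilling.
NUM_PHYS = 2   # pretend the target machine has only two physical registers

def allocate(accesses):
    """accesses: virtual-register names in the order the program touches them.
    Returns the list of spill events (virtual register evicted to memory)."""
    in_regs = []   # virtual regs currently held in physical registers, LRU order
    spills = []
    for v in accesses:
        if v in in_regs:
            in_regs.remove(v)              # refresh its LRU position
        elif len(in_regs) == NUM_PHYS:
            spills.append(in_regs.pop(0))  # spill least recently used to memory
        in_regs.append(v)
    return spills

spills = allocate(["a", "b", "c", "a"])
# touching "c" spills "a"; touching "a" again spills "b": spills == ["a", "b"]
```

The final access to "a" models the restore described above: a spilled value comes back into a register when the program needs it again, at the cost of evicting something else.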
Execution flow graphs represent the sequence of operations executed by the program. Information can be included on the graph's edges to provide scheduling information such as dependency information, frequency of execution information or other information that is useful for optimizing the operations represented by the execution flow graph.
Software pipelining is a technique for scheduling the execution of instructions. In the case of simple basic block loops, software pipelining schedules different overlapping iterations of the loop body to exploit a computer's underlying parallel computation units. The execution schedule consists of a prologue, a kernel, and an epilogue. The prologue initiates the first p iterations, starting each iteration in turn. A steady state is reached after the first p*II cycles, where II is the initiation interval and each of the initiated iterations is executing instructions in parallel. In this steady state, or kernel, one iteration of the loop is completed every II cycles. Once the kernel initiates the last iteration in the loop, the epilogue completes the last p iterations of the loop that were initiated by the kernel. Often the instruction schedule requires that a particular instruction be initiated after some delay; thus, unfilled instruction slots in the instruction schedule are filled with “no-operation” (NOP) instructions.
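The payoff of the prologue/kernel/epilogue schedule described above can be shown with a back-of-the-envelope cycle count. The model below assumes each iteration spans p overlapping stages of II cycles each, so a new iteration is initiated every II cycles and the last one drains through the epilogue; it is an illustration of the arithmetic, not the patent's scheduler:

```python
# Rough cycle-count model for a software-pipelined loop.
# n_iters: loop trip count; p: overlapped iterations (pipeline depth);
# II: initiation interval in cycles.

def pipelined_cycles(n_iters, p, II):
    """The last iteration is initiated at cycle (n_iters - 1) * II and then
    takes p * II cycles to flow through all p stages (epilogue drain)."""
    return (n_iters - 1) * II + p * II

def sequential_cycles(n_iters, p, II):
    """Without overlap, each iteration takes the full p * II cycles."""
    return n_iters * p * II

fast = pipelined_cycles(100, 4, 2)   # 206 cycles
slow = sequential_cycles(100, 4, 2)  # 800 cycles
```

Once the kernel is in steady state, one iteration retires every II cycles regardless of p, which is why the pipelined count grows with n_iters * II rather than n_iters * p * II.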
Computer manufacturers often make a family of computers with similar architectures. One problem for both computer manufacturers and application developers is the conflict between the manufacturers' desire to provide more powerful computers with extended capabilities and the developers' tendency to optimize an application to execute on the largest number of computers in a particular family. Although the models in an architecture family are similar, they often have differences. (For example, the SPARC™ architecture includes different application binary interfaces (ABIs), numbers of pipelines, and other differences between the V8, V8+, and V9 SPARC based models.) These differences are generally the result of cost/performance trade-offs or of new architectural features added to later models. Commercial applications are generally compiled and optimized to execute using only the capabilities of the architecture that are shared by every model of the architecture family. Thus, the application does not use the advanced capabilities available to the more advanced models. Because application developers generally do not provide source code, the user of the application is unable to optimize the application to take advantage of the additional features of the more advanced models; thus, the application will not perform as efficiently as if it were optimized specifically for the computer model that executes the application.
Another problem is that compiler optimization generally assumes that each execution flow path is equally likely to be executed during the operation of the application. This means that the compiler does not optimize the execution flow paths according to how the program actually operates. However, applications can be instrumented to capture an execution profile when operating on a particular data set. This profile information could be captured and used to optimize an application specifically for use with that particular data set. In addition, some computer architectures provide memory performance information, such as cache-miss information for the memory caches. This information could also be used to optimize memory organization for a particular data set and usage pattern. These optimizations could
