Method and apparatus for patching program text to improve...

Data processing: software development – installation – and managem – Software program development tool – Translation of code

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C717S152000, C711S125000, C712S227000

Reexamination Certificate

active

06295644

ABSTRACT:

TECHNICAL FIELD OF THE INVENTION
This invention relates to a method and an apparatus and article of manufacture for improving the performance of computer applications or programs via the use of an optimizer which avoids the utilization of excessive emulation and lookup tables by patching selective program text to directly access the code cache.
BACKGROUND OF THE INVENTION
Prior art systems for improving the performance of applications (i.e. run time optimizations) may be based on one of several approaches. One approach is to provide compilers known as “optimizers” or “optimizing compilers” which serve to improve the run time of a given application while it is executing. In prior art approach, the compiler software or runtime optimizer takes complete control of the existing binary. The runtime optimizer then determines which code is used most often (i.e. the “hot code”), translates the hot code, and emulates the remaining code. When branches are encountered in the emulated code, the target address is checked to see if there is a translation. If there is a translation, the system will execute the translated code instead of the original code.
In the prior art approaches to the running of an application on a computer, the binary of a computer application is not actually executed. This is because the emulator controls the process, such that the emulator actually emulates the existing binary and checks on each control transfer instruction (branch, call, return, etc.). However, this approach is problematic because the system needs to translate a lot of code to get the requisite speed. If the system has certain code and does not translate it, it will emulate that code, in which case a prior art system will spend too much time on the given task.
Such a system is expressed in FIG.
4
.
FIG. 4
depicts prior art run-time optimizer
130
. The control loop
131
begins execution of a block of program code via emulation performed by the profiling emulator
132
. The profiling aspect of emulator
132
allows the control loop
131
to track the number of times the particular block of code has been executed via emulation. Note that a run-time optimization system is different from a run-time binary translation system, in that the latter is usually for architecture migration while the former is to decrease execution time. The run-time optimization system is using the emulator
132
for profiling in order to guide optimizations, i.e. the code is running on its native system. After a predetermined number of executions via emulation, the control loop
131
designates the block of code as hot code, and thus, desirable for optimization. The control loop
131
then activates trace selector
133
to form a trace and then translate/optimize the formed trace. The trace selector
133
forms a trace of the instructions that comprise the block of code by following the instructions in the block. When a branch instruction is encountered, the trace selector makes a prediction as to whether the branch is taken or falls through. If the selector decides the branch is mostly taken, then the trace is formed by extending the code from the branch target block. If the selector decides not to take the branch, then the branch falls through, and the trace continues within the fall through block. The trace terminates at a backward branch predicted to be taken or when the trace becomes sufficiently large. After the trace is completed, the code is rewritten with machine dependent and machine independent optimizations. The optimized code is then placed into the code cache
134
. The next time the control loop
131
encounters a condition to execute this block of code, then the control loop
131
will execute the code in the code cache
134
and not emulate the code via emulator
132
.
As shown in
FIG. 5
, if the target of a branch which is taken to exit trace
1
, as shown by branch instruction
141
, then control is returned to the run-time system optimizer
130
and to control loop
131
, which determines if the target resides in the code cache. If the target resides in code cache, then the control loop
131
modifies the target of the branch instruction
141
to be the trace
2
in
142
in code cache as shown by branch instruction
143
. This modification is called backpatching. Thus, if the exit of the trace is already translated, then the branch is backpatched such that a subsequent execution will directly branch to the new trace without returning to the control loop. Backpatching increases the speed of execution of the code, because returning to the control loop significantly slows down execution time.
A particular problem with the prior art optimizer is that it cannot backpatch an indirect branch. The optimizer cannot backpatch an indirect branch because the target address is unknown. The target address is typically in a register or memory location and not written directly in code. Thus, the optimizer will shift control back to the control loop
141
to determine whether the target address has been translated, which is expensive in terms of time. The prior art has attempted to minimize this problem by inlining a code sequence to search a smaller lookup table in the optimized traces, however, these mechanism still incur high overhead. Without such a lookup, the branch will switch execution back to the original binary, and not return to trace execution. Moreover, if small table lookup fails then the execution will shift control back to the control loop, as described above. Examples of indirect branches are return branches and switch branches.
Another problem with the prior art optimizer is that it attempts to translate any code that is deemed hot based on a small threshold. A higher threshold would incur too much overhead while a lower threshold would end up with traces in poor quality, i.e. too many traces, and traces with early exits. Thus, it is rather difficult to come up with an ideal threshold value. Note that emulation time overhead is such that a higher threshold would require the code to be emulated much longer before they get a chance to be translated. Without a translation, the execution of the trace would be performed by software simulation or emulation. However, given that emulation is necessarily slow, all hot code is usually translated but this can present a problem, some traces are very difficult to translate. For example, it is difficult to translate a trace with branches in the delay slot of another branch. The requirement of translating all hot code therefore increases the translation time and complexity.
Hence, there are several drawbacks to this approach. This type of prior art system necessitates emulation of the code regions of the application that have not been translated, or otherwise requires the translation of every region the very first time it is encountered. This adds significant overhead because both emulation and translation are slow. Moreover, a system cannot effectively optimize every trace identified as “hot.” Some traces in the prior art optimizer cannot be optimized to run better than the original sequence. Moreover, optimization of some traces results in a reduction of performance after the optimization because the approach of the prior art must handle indirect branches efficiently in the optimized traces. For example, in PA 8000 systems made by the Hewlett-Packard Corporation of Palo Alto, Calif. (USA), procedure return branches are usually mispredicted by hardware. An effectively selected trace, which successfully predicted the target of a return branch could save misprediction cost, say about 5 cycles. However, for other architectures, such as the implementation of the new IA-64 architecture, where return branches will often by correctly predicted by return branch stack, the overhead of handling return branches would cost more than a correctly predicted branch. Hence, the approach in prior art is more suitable for architectures with expensive return branches, but not suitable for architectures with inexpensive return branches.
The second problem with the prior art concerns th

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for patching program text to improve... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for patching program text to improve..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for patching program text to improve... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2517694

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.