Low overhead speculative selection of hot traces in a...

Data processing: software development – installation – and managem – Software program development tool – Testing or debugging

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Reexamination Certificate

active

06470492

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to techniques for identifying portions of computer programs that are particularly frequently executed. The present invention is particularly useful in dynamic translators needing to identify candidate portions of code for caching and/or optimization.
BACKGROUND
Dynamic translators translate one sequence of instructions into another sequence of instructions which is executed. The second sequence of instructions are ‘native’ instructions—they can be executed directly by the machine on which the translator is running (this ‘machine’ may be hardware or this machine may be defined by software that is running on yet another machine with its own architecture). A dynamic translator can be designed to execute instructions for one machine architecture (i.e., one instruction set) on a machine of a different architecture (i.e., with a different instruction set). Alternatively, a dynamic translator can take instructions that are native to the machine on which the dynamic translator is running and operate on that instruction stream to produce an optimized instruction stream. Also, a dynamic translator can include both of these functions (translation from one architecture to another, and optimization).
Caching dynamic translators attempt to identify program hot spots (frequently executed portions of the program, such as certain loops) at runtime and use a code cache to store translations of those frequently executed portions. Subsequent execution of those portions can use the cached translations, thereby reducing the overhead of executing those portions of the program.
A dynamic translator may take instructions in one instruction set and produce instructions in a different instruction set. Or, a dynamic translator may perform optimization: producing instructions in the same instruction set as the original instruction stream; thus, dynamic optimization is a special native-to-native case of dynamic translation. Or, a dynamic translator may do both—converting between instruction sets as well as performing optimization.
In general, the more sophisticated the execution profiling scheme, the more precise the hot spot identification can be, and hence (i) the smaller the translated code cache space required to hold the more compact set of identified hot spots of the working set of the running program, and (ii) the less time spent translating hot spots into native code (or into optimized native code). Unless special hardware support for profiling is provided, it is generally the case that a more complex profiling scheme will incur a greater overhead. Thus, dynamic translators typically have to strike a balance between minimizing overhead on the one hand and selecting hot spots very carefully on the other.
Depending on the profiling technique used, the granularity of the selected hot spots can vary. For example, a fine-grained technique may identify single blocks (a straight-line sequence of code without any intervening branches), whereas a more coarse approach to profiling may identify entire procedures. Since there are typically many more blocks that are executed compared to procedures, the latter requires much less profiling overhead (both memory space for the execution frequency counters and the time spent updating those counters) than the former. In systems that are doing program optimization, another factor to consider is the likelihood of useful optimization and/or the degree of optimization opportunity that is available in the selected hot spot. A block presents a much smaller optimization scope than a procedure (and thus fewer types of optimization techniques can be applied), although a block is easier to optimize because it lacks any control flow (branches and joins).
Traces offer yet a different set of tradeoffs. Traces (also known as paths) are single-entry multi-exit dynamic sequences of blocks. Although traces often have an optimization scope between that for blocks and that for procedures, traces may pass through several procedure bodies, and may even contain entire procedure bodies. Traces offer a fairly large optimization scope while still having simple control flow, which makes optimizing them much easier than a procedure. Simple control flow also allows a fast optimizer implementation. A dynamic trace can even go past several procedure calls and returns, including dynamically linked libraries (DLLs). This allows an optimizer to perform inlining, which is an optimization that removes redundant call and return branches, which can improve performance substantially.
Unfortunately, without hardware support, the overhead required to profile hot traces using existing methods (such as described by T. Ball and J. Larus in “Efficient Path Profiling”, Proceedings of the 29th Symposium on Micro Architecture (MICRO-29), December 1996) is often prohibitively high. Such methods require instrumenting the program binary (inserting instructions to support profiling), which makes the profiling non-transparent and can result in binary code bloat. Also, execution of the inserted instrumentation instructions slows down overall program execution and once the instrumentation has been inserted, it is difficult to remove at runtime. In addition, such method requires sufficiently complex analysis of the counter values to uncover the hot paths in the program that such method is difficult to use effectively on-the-fly while the program is executing. All of these make traditional schemes inefficient for use in a caching dynamic translator.
Hot traces can also be constructed indirectly, using branch or basic block profiling (as contrasted with trace profiling, where the profile directly provides trace information). In this scheme, a counter is associated with the taken target of every branch (there are other variations on this, but the overheads are similar). When the caching dynamic translator is interpreting the program code, it increments such a counter each time a taken branch is interpreted. When a counter exceeds a preset threshold, its corresponding block is flagged as hot. These hot blocks can be strung together to create a hot trace. Such a profiling technique has the following shortcomings:
1. A large counter table is required, since the number of distinct blocks executed by a program can be very large.
2. The overhead for trace selection is high. The reason can be intuitively explained: if a trace consists of N blocks, this scheme will have to wait until N counters all exceed their thresholds before they can be strung into a trace. It does not take advantage of the fact that after the first counter gets hot, the next N-
1
counters are very likely to get hot in quick succession, making it unnecessary to bother incrementing them and doing the bookkeeping of the past blocks that have just executed.
SUMMARY OF THE INVENTION
According to the present invention, traces are identified as hot on a speculative basis, rather than based on full trace profile data. The series of blocks beginning at a hot start-of-trace condition and continuing until an end-of-trace condition is identified as a hot trace. Such a trace is identified as hot without the need to incur the overhead of actually measuring whether successive blocks have been executed a sufficient number of times to be considered hot.
The identification of what constitutes the trace is accomplished as the trace is executed. A translation of the trace is emitted as the trace is being executed, is available for optimization in a system that performs optimization, and is captured in the code cache.
A particularly useful start-of-trace condition is when the last interpreted branch was backward taken. A useful end-of-trace condition is when one of the following three conditions occurs: (1) the last interpreted branch was backward taken, (2) the number of interpreted branches exceeded a threshold value, or (3) the number of native instructions emitted for the trace exceeded another threshold value.
Thus, according to the present invention, rather than use higher overhead, sophisticated profiling techniques for

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Low overhead speculative selection of hot traces in a... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Low overhead speculative selection of hot traces in a..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Low overhead speculative selection of hot traces in a... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2999453

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.