Method and apparatus for profiling of non-instrumented...

Electrical computers and digital processing systems: processing – Processing control – Branching

Reexamination Certificate

Details

U.S. Class: C712S239000
Status: active
Patent number: 06233678

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to the runtime profiles of software programs executing on computers.
2. Description of the Related Art
Runtime profiling is a mechanism for understanding a program's runtime behavior. A runtime profile is a collection of information indicating the control flow path of a program, i.e., which instructions executed and where branches in the execution took place. Profile-based optimizations can then be used for instruction scheduling, loop scheduling, data preloading, function in-lining, and instruction cache performance enhancement.
The runtime profile of a program is used by optimizing compilers and dynamic translators to focus their analysis efforts on parts of the program where greater performance benefit is likely. Advanced compilers perform optimizations across block boundaries to increase instruction-level parallelism, enhance resource usage and improve cache performance. Profile data is also useful for software developers in tuning the performance of their programs.
Program profiling typically counts the occurrences of an event during a program's execution. The measured event is typically a local portion of a program, such as a routine, line of code or branch. More fine-grained profiling is also possible based upon basic blocks and control-flow edges. Profile information for a program can consist of simple execution counts or more elaborate metrics gathered from counters within the computer executing the program.
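As a present-day illustration of reading such a hardware counter, the sketch below counts retired instructions for a code region; the Linux perf_event_open interface used here postdates this patent and is purely illustrative, not part of the described method.

    /* Minimal sketch: counting retired instructions for a code region via
       Linux's perf_event_open. Illustrative only; the patent does not
       prescribe any particular counter interface. */
    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_INSTRUCTIONS; /* event to measure */
        attr.disabled = 1;                        /* start stopped */
        attr.exclude_kernel = 1;                  /* user-mode only */

        int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        volatile long sum = 0;                    /* region being measured */
        for (long i = 0; i < 1000000; i++) sum += i;
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t count;
        read(fd, &count, sizeof(count));
        printf("instructions retired: %llu\n", (unsigned long long)count);
        return 0;
    }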
One conventional approach to profiling is to instrument the program code by adding profiling probes to it. Profiling probes are additional instructions that log the execution of the basic block of code containing the probe. Typically, the program is compiled with a profiling probe placed within each basic block of code. The instrumented code is then executed using several different suites of test inputs to obtain several sets of profile data. The program is subsequently recompiled using the resulting profile data to give a profile-based compilation of the original program.
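The following sketch suggests what such instrumentation amounts to at the source level; the PROBE macro, the block numbering, and the counter array are hypothetical names chosen for illustration, not the patent's mechanism.

    /* Illustrative sketch of compiler-inserted profiling probes; PROBE
       and block_counts are hypothetical names. */
    #include <stdint.h>

    #define NUM_BLOCKS 128                 /* blocks numbered by the compiler */
    static uint64_t block_counts[NUM_BLOCKS];

    /* One probe per basic block: logs that the block executed. */
    #define PROBE(id) (block_counts[(id)]++)

    int abs_value(int x)
    {
        PROBE(0);                          /* entry block */
        if (x < 0) {
            PROBE(1);                      /* block on the taken path */
            return -x;
        }
        PROBE(2);                          /* fall-through block */
        return x;
    }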
Instrumentation-based methods for gathering profile data tend to be complex and time consuming. Instrumenting the code can cause a code-size explosion due to the added instructions. The additional probe instructions also slow execution of the code; a profiled, or instrumented, version of a program can run as much as thirty times slower than the original version. Execution slowdown is more than an inconvenience: experience has shown that it is a major reason profile-based optimizations are not widely used in the user community.
Selection of representative test input suites for instrumented programs is important to the accuracy of the profile data. If the inputs are not selected carefully, the profile will not reflect actual usage. Programs that are highly data-dependent, such as a sort routine or a database application, have branches that are highly sensitive to user inputs. Validating the profile is difficult without a large-scale study of user habits. In the absence of such a study, profiling is typically done using a large set of inputs, which increases the time required to produce accurate profile data.
Reducing the time required for instrumented profiling therefore means profiling the program with smaller test input suites, but smaller suites reduce the accuracy of the resulting profile data. There is thus a trade-off between the accuracy of profiling and the time required to perform it.
There remain, however, some programs for which it is difficult or impossible to construct representative test input data. Real-time applications, such as operating system (OS) kernels and embedded systems, are excluded from the benefits of profile-driven optimizations because of the way they execute. Long-running applications, such as database systems, are often excluded from profiling as well.
Furthermore, analyzing and using the profile data requires additional processing steps. A program must be compiled with profiling enabled, executed using the test input suites, and then recompiled based upon the profile data. For small programs this does not involve a large amount of overhead or difficulty. For large systems, however, such as commercial database applications, it requires significant alteration of build scripts in which a large number of man-hours have been invested. Large systems also require a significant amount of processing time for analysis and recompilation. As a result, software vendors are hesitant to adopt profile-driven optimizations.
An alternative to instrumenting the program code is statistical program counter (PC) sampling. The actual program runs under the control of a profiler program which acts as a wrapper around the code being profiled. The profiler program makes an operating system call to set up a timer interrupt to be delivered to the profiler program at a predetermined frequency of X times per second. It also registers a “handler procedure” for this interrupt with the operating system. The actual program execution is then started and driven by a test input suite.
When a timer interrupt occurs, the handler procedure is invoked and the running program is suspended. At this point, the state of the machine (in other words, the program counter of the process) is passed to the profiler program for recordation. The handler procedure also often records the values of many of the processor's registers at the time of the interrupt.
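A minimal sketch of this arrangement on a POSIX system follows. SIGPROF and setitimer are one concrete realization of the timer interrupt described above, and REG_RIP assumes x86-64 Linux; every name here is illustrative rather than prescribed by the patent.

    /* Minimal sketch of statistical PC sampling: a profiling timer
       delivers SIGPROF X times per second, and the handler records the
       interrupted program counter for later analysis. */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <string.h>
    #include <stdint.h>
    #include <sys/time.h>
    #include <ucontext.h>

    #define MAX_SAMPLES 65536
    static uintptr_t samples[MAX_SAMPLES];
    static volatile size_t nsamples;

    /* Handler procedure: the program is suspended, its PC recorded. */
    static void prof_handler(int sig, siginfo_t *si, void *uc_raw)
    {
        (void)sig; (void)si;
        ucontext_t *uc = uc_raw;
        if (nsamples < MAX_SAMPLES)
            samples[nsamples++] = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
    }

    /* Register the handler and request hz interrupts per second of CPU
       time (assumes 1 < hz <= 1000000). */
    static void start_sampling(long hz)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = prof_handler;
        sa.sa_flags = SA_SIGINFO | SA_RESTART;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGPROF, &sa, NULL);

        struct itimerval it;
        memset(&it, 0, sizeof it);
        it.it_interval.tv_usec = 1000000 / hz;
        it.it_value = it.it_interval;
        setitimer(ITIMER_PROF, &it, NULL);
    }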
The overhead of statistical PC sampling is determined by the sampling frequency X that is selected: decreasing the frequency reduces the overhead and increases execution speed. However, the accuracy of the profile data also depends on the sampling frequency, and it increases as the frequency is increased. Selecting a sampling frequency therefore involves a trade-off between overhead and accuracy.
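To quantify this trade-off (standard sampling statistics, not part of the patent's text): if samples land in a basic block in proportion to the time spent there, a block observed n times carries a relative standard error of roughly 1/√n. One hundred samples in a block thus pin down its weight only to about 10%, and halving the error requires quadrupling the number of samples, by raising the frequency X or lengthening the run.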
Further, the statistical PC sampling approach described above typically results in too fine a level of granularity. It also tracks the control flow poorly and requires a high level of analysis before the data can be used to optimize the code. In order to perform optimization of the program code, the profile information indicating which parts of the code are hot (in other words, which parts execute frequently) needs to be mapped back to the program's control flow. This is difficult to do when the profile data is merely a set of program counter values taken at arbitrary intervals. Moreover, because of the high level of analysis required, the profile data is usually analyzed after the program's run, by which point some of the dynamic addressing information may have been lost, such as the runtime control flow within a dynamically linked library. The requirement of post-runtime analysis also prevents statistical PC sampling from being used for on-the-fly optimization.
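The mapping step might look like the following sketch, in which the symbol table and all names are hypothetical; note that a sampled PC falling in a shared library whose load address went unrecorded simply cannot be attributed.

    /* Hypothetical post-run analysis step: attributing raw PC samples to
       functions via address ranges from a symbol table. */
    #include <stddef.h>
    #include <stdint.h>

    struct sym { uintptr_t start, end; const char *name; };

    static const char *attribute(uintptr_t pc, const struct sym *tab, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (pc >= tab[i].start && pc < tab[i].end)
                return tab[i].name;
        return "<unknown>";   /* e.g. a dynamically linked library whose
                                 runtime load address was lost */
    }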
Alternatively, static methods exist which are based upon compiler assumptions; they do not involve profile data obtained through instrumentation of the code or execution interrupts, and they do not require the code to be recompiled. However, these static estimates are typically not as accurate as profiling. For example, when static estimates are used to predict branch behavior, the inaccuracy of the predictions is approximately twice that of predictions based upon profile information. Furthermore, control flow within dynamically bound procedures is difficult to estimate statically.
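One classic static heuristic of this kind is the “backward taken, forward not taken” rule; the sketch below is an example of such a compiler estimate, not the method of this patent.

    /* Sketch of the classic "backward taken, forward not taken" static
       branch heuristic: backward branches usually close loops, so they
       are predicted taken. Purely illustrative. */
    #include <stdint.h>

    static int predict_taken(uintptr_t branch_pc, uintptr_t target_pc)
    {
        return target_pc < branch_pc;   /* backward branch: assume a loop */
    }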
Another approach is to use existing branch handling hardware to speed up profiling. The use of hardware to reduce overhead avoids the need to trade off accuracy against profiling overhead, as is the case with statistical PC sampling. The use of hardware can also reduce the level
