Method for determining program control flow

Data processing: software development – installation – and managem – Software program development tool – Translation of code

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C717S152000, C717S152000

Reexamination Certificate

active

06308321

ABSTRACT:

BACKGROUND OF THE INVENTION
Analysis of binary software executables is a fundamental tool for many useful applications, such as binary translation (from one computer architecture to another), post-link optimization, and binary instrumentation. All these tools need to distinguish executable code from un-reachable code and other data, and need to determine the structure of the code in terms of basic blocks and control flow paths between them.
U.S Pat. No. 5,986,541, filed Dec. 4, 1997 (Agarwal), incorporated herein by reference in its entirety, discloses a method of instrumenting original binary code to create augmented or remediated binary code, which can then perform many useful functions such as error detecting and repair. Various embodiments of the Agarwal application accomplish tasks including, but not limited to, remediation, assertion checking, test certification and coverage, continuous internal value testing, bootstrap regression testing, test path identification and statistical pattern matching.
An executable program is the final result of the software development process. That process consists of compiling source modules written by the programmer into object modules which are then linked together to produce the final executable program. The executable program thus consists of the actual machine instructions of the program, together with whatever data and descriptive information is needed to run those instructions. Higher-level source constructs such as names and structured control flow are in general no longer available.
The ability to modify an executable program to, for example, incorporate new functionality, for example as described in the Agarwal application, enables many important applications, as it obviates the need to go back to the original source, which may not be available, and to re-build the program.
Modifying the executable program is not straightforward, however, because of the structure of machine instructions. One problem is displacement update, which pertains to the manner in which instructions refer to each other.
FIG. 1A
demonstrates the displacement update problem. A small piece of code
10
accesses registers named r
1
, r
2
, r
3
and r
5
in this example. Instruction
10
A adds the contents of registers r
2
and r
3
, and places the resulting sum into register r
1
. Instruction
10
B tests the contents of register r
1
, i.e. the sum resulting from instruction
10
A, and if it is less than or equal to zero, jumps 346 bytes to the last instruction
10
Z of the sample code
10
, as indicated by arrow
12
. The number 346, called the displacement
11
, has previously been calculated by the compiler and is part of the jump instruction
10
B.
In
FIG. 1B
, additional code
14
has been inserted between the jump instruction
10
B and its target
10
Z, perhaps for one or more of the reasons described in the Agarwal application. As a result, there are more than 346 bytes in the code
15
between the jump
10
B and its target
10
Z, and the displacement
11
must be updated or the jump indicated by arrow
12
A will be to the wrong target, possibly with catastrophic results. In addition, those displacements of any other relative jump instructions which cross that new code must also be updated, as must the target addresses of any absolute jump instructions where the target instruction has been shifted. Note that the 346 bytes separating the jump
10
B and its target
10
Z in
FIG. 1A
may contain both code and data.
Another important issue is register usage. Registers are fast storage locations within the computer's central processing unit (CPU), which typically has only a small, fixed number of registers, e.g. thirty-two registers. If the inserted code needs to make use of registers, then either registers must be chosen that do not interfere with the program's use of registers, or the programrs register usage must be changed. Register usage is a flow-sensitive analysis, since it is unlikely that there are any registers that are unused throughout the program. Thus, register usage must be calculated separately for each point in the program. In general, this analysis requires knowledge of the program's control flow structure.
FIG. 2
illustrates the concept of register usage. The original code includes instructions
20
A and
20
B. Instruction
20
A copies the contents of register r
5
into register r
1
. Instruction
20
B is a conditional jump instruction. Since data is loaded into register r
1
at instruction
20
A, it is clear that just prior to instruction
20
A there is no further use of whatever data may be in register r
1
, so that register r
1
may be used freely for other purposes. Here, two new instructions
22
which use register r
1
have been inserted before instructions
20
A and
20
B. The new instructions
22
calculate and save in register r
1
the sum of the values stored in registers r
2
and r
3
. The content of r
1
, i.e., the sum, is then stored in memory at a location defined by the contents of register r
6
plus an offset of 12 bytes. Since the following instruction
20
A copies the value of r
5
into r
1
, the insertion of these two instructions
22
does not affect the execution of the program.
The insertion of these instructions
22
is acceptable because the following original instruction
20
A moves the contents of register r
5
into r
1
, and the original program has no further use for register r
1
at the point where the new instructions
22
have been inserted. In general, all possible paths from the new instructions must be examined to determine which registers are available.
Thus, to perform the analyses that indicate what new code can be inserted at each point in the program, the control flow structure must be determined. In some cases determining this structure is simple. In first example of
FIGS. 1A and 1B
, for instance, the jump target is hard-coded into the instruction. This is not always true, however. Often the jump target is computed at run-time.
SUMMARY OF THE INVENTION
The present invention provides an analysis method which, starting from external entry points, discovers all reachable instructions and the control-flow paths between them in an executable program.
A major challenge is that control flow is often determined by run-time values. For instance, a branch or jump instruction might use register contents or even memory contents to determine the target. Analysis must thus include constant propagation to determine control flow.
A further challenge is that standard constant propagation algorithms are dependent on the control-flow graph, which causes a chicken-and-egg problem in analyzing binaries: constant propagation is needed to determine the control-flow, but the control-flow graph is needed to perform constant propagation.
The present invention is an iterative, incremental method that interleaves control-flow analysis and constant propagation. In brief, control-flow analysis does as much as it can without constant propagation information. Constant propagation then runs on the partial control-flow graph, which enables more control-flow analysis to be done, and so on.
In accordance with the present invention, a method of generating a program control flow definition from the program code comprises determining entry points in the program. The code is followed, or sequentially scanned or examined, from an entry point to a control flow instruction such as a branch or jump instruction. A code block is then defined as the code from the entry point up to and including the control flow instruction. From the control flow instruction, additional entry points are identified. This is repeated for each entry point having a known value, resulting in a partial control flow definition.
For entry points having unknown values, constant propagation analysis is performed on the partial control flow definition to convert unknown entry point values to known values. Finally, the entry points determined by the constant propagation analysis are used as starting points in the scanning step to define additional entry p

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method for determining program control flow does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method for determining program control flow, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method for determining program control flow will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2593048

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.