Data processing: software development – installation – and managem – Software program development tool – Translation of code
Reexamination Certificate
2001-04-02
2004-01-13
Ingberg, Todd (Department: 2124)
Data processing: software development, installation, and managem
Software program development tool
Translation of code
C717S135000, C717S136000, C717S140000, C717S146000, C717S154000, C717S158000, C717S159000
Reexamination Certificate
active
06678886
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The invention relates to an apparatus and a method for generating an optimization object for providing an execution program module by linking necessary routines etc. to an object program generated by compiling a source program made up of a number of source files and, more specifically, to an apparatus and a method for generating an optimization object which can generate an execution program module in which instruction strings are so located as to be suitable for execution of a computer's cache functions and predictive branch functions.
2. Background Art
In recent computer systems, to speedily execute instructions of a database processing program etc., future execution is predicted before actual execution. For example, a computer speed-up apparatus utilizing such prediction may come in:
I. a translation lookaside buffer (TLB) used for speedily referencing a page table provided to correlate with each other memory's physical locations and their logical locations observable from the program,
II. a cache apparatus for using both time-wise and space-wise localities of a memory accessed by instructions to thereby hold necessary data in prediction of future memory accesses, or
III. a branch prediction apparatus for predicting whether a branch instruction branches or not before it is determined, to previously execute the instruction speculatively according to the results of the prediction.
The above-mentioned database has a large memory region of for example 8 GB and its source program is comprised of an aggregate of roughly 1000 through 10000 source files, each typically having more than 100 to less than 10,000 instruction strings, so that if each file is supposed to have an average of 1000 instructions, the program as a whole comprises a vast number of instruction strings of as many as 100 thousand to 10 millions. Whether an execution speed-up apparatus would properly function for such a large sized program depends largely on the instruction strings, or their execution scheme, of the application program as an execution program module generated by linking necessary routines to the object program. For example, an instruction cache apparatus can be reserved for reducing an average memory access time for fetching necessary instructions required for execution thereof. When the instructions are executed consecutively without executing a branch instruction, those instructions are fetched one after another, so that by expanding a cache block in units of which they are fetched at a time by the instruction cache apparatus from the memory, the number of time-consuming memory accesses can be decreased to thereby speed up the execution.
An actual application program, however, contains conditional branch instructions, unconditional branch instructions, and subroutine calling instructions and so its execution efficiency is largely deteriorated because:
I. the instructions are not executed smoothly in that some of the instructions that are not executed in a block especially fetched from the memory, are arranged in the cache memory, thus decreasing its utilization ratio; and
II. when a branch instruction is executed or a subroutine is called, control may be transferred to an instruction string described in a different file, so that instruction strings actually running in the execution program module may be discontinuous; to frequently update a TLB for the instructions provided for speedily referencing an instruction page table for use in correlating physical locations in the memory and logical locations observable from the program with each other.
Also, there are similar problems in data access; if data accessed and data not accessed are mixed in a certain execution environment or if data pieces accessed proximately time-wise are arranged at space-wise separate locations from each other, the execution speed is largely deteriorated because a data TLB is frequently updated which is used to speedily reference a logical address/physical address translation table for data addresses for use in correlating the physical locations in the memory and the logical locations observable from the program. Those situations can be improved by increasing the capacity of the cache memory capacity and the data TLB However, this is difficult to carry out in terms of cost and space because it increases the hardware size required.
Further, a branch prediction apparatus actually involves:
I. dynamic branch prediction for recording a hysteresis of whether conditions for a conditional branch instruction have been satisfied hitherto to thereby predict its future branch
on-branch;
II. explicit static branch prediction for recording beforehand branch
on-branch information of an instruction in the instruction itself when it is described to use it later when it is executed; and
III. implicit static branch prediction for observing a location relationship between a branch destination and a branch source to thereby decide branch
on-branch when a relevant instruction is executed; which are combined and used actually. Dynamic prediction by the branch prediction apparatus uses a hysteresis about how a branch instruction has been executed hitherto in prediction of its future. Therefore, if a relevant program has a large loop configuration such that once an instruction is executed, a large number of other instructions are executed until that instruction is executed next time, all of the hystereses, which are necessary for hysteresis recording, of executed branch instructions cannot be recorded due to limitations on the hardware size, so that some of the past hystereses may not be utilized in many cases. Even in such a case where past hystereses cannot be utilized, static branch prediction can be applied. By implicit static branch prediction used often, a branch is often predicted according to an experience rule that conditional branch instructions of a loop typically “have often a skip destination instruction with a lower numbered address (i.e., backward branch) and hardly have a skip destination instruction with a higher numbered address (i.e., forward branch)”. This experience rule, however, does not always work sufficiently.
Thus, in order to effectively operate the speed-up mechanism of hardware for executing instructions, when an object program or an execution program module is generated, an execution state is guessed or a thus generated program is executed experimentally. This is done to thereby decide instruction strings highly likely to be executed based on a record etc. of the execution state, or discriminate between execution portions and unexecution portions in a certain operational environment or between portions likely to be executed frequently and portions not likely to be done so. By dividing such a program into execution portions, unexecution portions, and frequently used portions and thus controlling it, the updating frequency of the TLB can be reduced, and also the location relationship between branch sources and branch destinations can be adjusted so that the implicit static branch prediction can function properly.
However, a source program made up as an aggregate of a large number of source files is compiled by the compiler processing each of its files, subroutines, functions, etc. as one unit to thereby generate an object program. Therefore, each compilation unit for the object file, object subroutine, and object function is divided into execution portions, unexecution portions, and frequently used portions, so that for example a plurality of files, i.e. compilation units, has not been processed so much. Accordingly, in order to concentrate execution instruction strings over a plurality of compilation units or optimize a system as a whole by adjusting those in front of and behind those strings, a vast source program comprised of a large number of source files must be compiled in a batch by a compiler newly developed, which takes a long time in compilation and is problematic.
Also, in order to generate an execution program module (application program) from an
Fujitsu Limited
Nahar Qamrun
Staas & Halsey , LLP
LandOfFree
Apparatus and method for generating optimization objects does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Apparatus and method for generating optimization objects, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Apparatus and method for generating optimization objects will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3220151