Insertion of prefetch instructions into computer program code

Data processing: software development, installation, and management – Software program development tool – Translation of code

Details

Classification: C717S131000, C717S142000, C717S154000, C712S235000, C712S237000
Type: Reexamination Certificate
Status: active
Patent number: 06675374

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a technique for inserting memory prefetch instructions (e.g., instructions that prefetch data into a processor's on-chip cache memory from off-chip main memory) into computer-executable program code, and more specifically, to such a technique wherein the prefetch instructions may be inserted into the program code in such a way as to improve efficiency and speed of execution of the code, avoid both cache memory conflicts and the overtaxing of processor resources, and reduce program execution inefficiencies (e.g., stalling of program execution by the processor) that can result if the data required by the processor to execute the code is not present in the cache memory when needed by the processor. Although the present invention will be described in connection with embodiments that are particularly well suited to use in connection with inserting of prefetch instructions into program code having one or more program loops in which memory array accesses are present, it will be appreciated that the present invention also may be advantageously used to insert such instructions into other types of program code.
2. Brief Description of Related Prior Art
As computer processors have increased their processing speeds, main computer memory systems have lagged behind. As a result, the speed of the computer system's main memory can be the limiting factor in the speed of execution of application programs by the computer system, particularly in the case of programs that manipulate large data structures (e.g., large arrays stored in memory, such as those needed in scientific and engineering programs). More specifically, when data stored in main memory is required by the computer system's processor to execute a given program, latency in transferring that data from the main memory to the processor may reduce the speed with which the processor may execute the program.
In order to try to increase program execution speed and reduce the aforesaid type of data transfer latency, in many conventional computer systems, the processor is used in conjunction with an associated high-speed cache memory. Typically, when the processor is implemented in a microprocessor integrated circuit chip, this cache memory is comprised in the same chip as the processor. In such processors, when the data contained in the cache is accessed by the processor, that memory operation may stay on-chip (i.e., within the processor chip); such on-chip memory operations may be orders of magnitude faster to execute than similar memory operations that must access main memory.
In a further effort to increase program execution speed and efficiency, many conventional high-performance processors (e.g., the Alpha 21264™ microprocessor manufactured by, and commercially available from the Assignee of the subject application) have been configured to be able to issue instructions out-of-order, and to process certain instructions in parallel. By implementing these features in a given processor, the processor's program instruction throughput may be increased. However, in a sequence of program instructions there may be a so-called “critical path” of instructions that are dependent upon one another and cannot be issued in parallel. When such a critical path exists in a given set of program instructions, the execution time of the instructions tends to approach the latency of execution of the critical path. In some important types of application programs (e.g., scientific and engineering application programs), memory operations comprise a significant portion of the total instructions in the programs' respective critical paths.
By appropriately inserting prefetch instructions into a program, the time required for the processor to execute the program's critical path can be decreased. That is, by inserting prefetch instructions at appropriate places prior to the point in the program where the data being prefetched is required by the processor, the time required to execute the program's critical path of instructions may be reduced, by enabling the prefetched data to be in the cache and available to the processor at or near the time when it will be needed. This can improve the program's efficiency and speed of execution.
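As a concrete illustration of this idea (a sketch of the general technique, not code taken from the patent), the following C fragment sums an array while issuing a prefetch a fixed number of iterations ahead of the load that actually consumes the data. The GCC/Clang __builtin_prefetch intrinsic and the PFDIST constant are assumptions chosen for the example.

#include <stddef.h>

/* Illustrative only: sum an array while prefetching data a fixed number of
 * iterations ahead, so that each element is (ideally) already in cache when
 * the load on the critical path needs it.  PFDIST is a placeholder
 * "prefetch distance" chosen for this sketch, not a value from the patent. */
#define PFDIST 16

double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PFDIST < n) {
            /* Advisory GCC/Clang builtin: request that a[i + PFDIST] be
             * brought into cache; the arguments mark the access as a read (0)
             * with low temporal locality (1).  The hint may be ignored by
             * the target processor. */
            __builtin_prefetch(&a[i + PFDIST], 0, 1);
        }
        sum += a[i];   /* the load the prefetch above is trying to cover */
    }
    return sum;
}

In a compiler-driven scheme of the kind described in this patent, a value playing the role of PFDIST would be derived automatically rather than hard-coded.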
Further problems, in the form of cache conflicts, can arise if the timing of data prefetching during execution of the program is not carefully managed to avoid such conflicts and the prefetched data is transferred from the main memory to a cache memory that is not fully associative. That is, when such a cache memory is used, depending upon the timing of prefetching and the address in main memory of the newly prefetched data, the newly prefetched data may displace (i.e., overwrite) useful data previously stored in the cache just prior to the processor requesting the useful data. When the processor references (e.g., requests) the useful data after it has been displaced from the cache, a cache miss occurs. This, in turn, causes retrieval from the main memory of the previously-displaced useful data, which is again stored in the cache, thereby displacing the data that previously displaced it. The operations involved in this type of cache conflict are wasteful, as they increase the time that it takes the processor to be able to use the useful data and also consume memory system bandwidth.
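To make the conflict mechanism concrete, the following small C sketch models a direct-mapped cache in which an address maps to set (address / line size) mod (number of sets). The 64-byte line and 512-set (32 KB) geometry, and the example addresses, are assumptions for illustration, not parameters of any cache discussed in the patent: two addresses that differ by an exact multiple of the cache size map to the same set, so a line prefetched at one address can evict the still-needed line at the other.

#include <stdint.h>
#include <stdio.h>

/* Illustrative model of a direct-mapped (one-way) cache: an address maps to
 * set (addr / LINE_SIZE) % NUM_SETS.  The geometry below is assumed for the
 * sketch only. */
#define LINE_SIZE 64u
#define NUM_SETS  512u   /* 64 B x 512 sets = 32 KB cache */

static unsigned cache_set(uintptr_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

int main(void)
{
    /* Two hypothetical addresses that differ by an exact multiple of the
     * cache size map to the same set, so prefetching one line can evict the
     * other even though the rest of the cache is empty. */
    uintptr_t addr_useful     = 0x100000u;
    uintptr_t addr_prefetched = addr_useful + LINE_SIZE * NUM_SETS;

    printf("set of useful data    : %u\n", cache_set(addr_useful));
    printf("set of prefetched data: %u\n", cache_set(addr_prefetched));
    return 0;
}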
Computer programmers typically develop computer programs for conventional processors using relatively high-level source code computer languages (e.g., C++, Pascal, Fortran, etc.). This is because programmers often find developing computer software using such high-level languages to be much easier than developing the software using relatively low-level languages (e.g., assembly and machine language code). Compilation programs (e.g., compilers, linkers, assemblers, etc.) are typically used to translate or convert the source code developed by a programmer into a machine-executable form or image code for execution by the target processor. The compilation programs often implement processes (hereinafter “optimization processes”) that structure and generate the machine-executable code in such a way as to try to ensure that the execution of the machine-executable code by the target processor consumes a minimum amount of resources of the target computer system.
One such conventional optimization process is disclosed in U.S. Pat. No. 5,704,053 to Santhanam. The optimization process described in Santhanam involves inserting prefetch instructions that prefetch array accesses in scientific application program loops. This patent also describes performing reuse analysis using only subscript expression analysis, where previous methods had relied on dependence analysis. The patent also describes generating and inserting prefetch instructions, and taking into account reuse of data, to eliminate unnecessary prefetch instructions. Santhanam also teaches determining a “prefetch distance” (i.e., in essence, a time interval between the beginning of execution of the prefetch instruction and the expected time that the processor will require the data being prefetched by the instruction) that is used to calculate where in the program to insert the prefetch instruction. It is said that the prefetch distance may be calculated in terms of a number of loop iterations, in advance of the expected time that the processor will require the prefetched data.
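A rough calculation of such a prefetch distance in loop iterations might look like the following C sketch. The latency and per-iteration cycle counts are made-up example numbers, not values taken from Santhanam or from the present patent: the distance is the memory latency divided by the cycles spent per iteration, rounded up so the prefetched data arrives no later than it is needed.

#include <stdio.h>

int main(void)
{
    unsigned mem_latency_cycles   = 200;  /* assumed main-memory latency      */
    unsigned cycles_per_iteration = 12;   /* assumed cost of one loop body    */

    /* Round up so the data arrives no later than the iteration that uses it. */
    unsigned prefetch_distance =
        (mem_latency_cycles + cycles_per_iteration - 1) / cycles_per_iteration;

    printf("prefetch distance: %u iterations ahead\n", prefetch_distance);
    return 0;
}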
Santhanam nowhere discloses or suggests employing any kind of cache conflict analysis when determining whether and where to insert a prefetch instruction. Thus, disadvantageously, Santhanam's disclosed optimization process is unable to prevent cache conflict problems, of the type described above, from occurring during execution of the machine code generated by that process. Santhanam also nowhere discloses or suggests generating the machine-executable code in such a way that the number of
