Method of, system for, and computer program product for...

Data processing: software development, installation, and management – Software program development tool – Translation of code

Reexamination Certificate

Details

Classification: C717S152000

Status: active

Patent number: 06175957

ABSTRACT:

A portion of the Disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to optimizing compilers for development of computer programs for use on a computer, and more particularly to code restructuring.
2. Description of the Related Art
Advances in processor designs have provided very high speed instruction processing. In order to sustain this speed, processors need very effective usage of the memory hierarchy of the machines. Ensuring effective usage is difficult for large programs like compilers or database engine codes that have complex control flow. This occurs, for example, as a result of having to consider inputs of different types or handle error conditions appropriately. Consequently, code generated for large complex programs consists of sections that are heavily executed intermingled with sections that are infrequently executed, resulting in poor utilization of the memory hierarchy. Code restructuring may improve memory hierarchy utilization and, consequently, performance.
The state of the art of code restructuring techniques is described in Simons (B. Simons, "Code Positioning for Procedures and Basic Blocks," IBM Technical Report ADTI-1994-006; also available as IBM Santa Teresa Laboratory Technical Report TR 03.580). Two representative techniques described by Simons are the IBM Heatshrink and IBM AOPT products. Heatshrink, a trace-directed program restructuring (TDPR) tool, was designed and implemented by Randy R. Heisch. A more detailed description of Heatshrink is given in Heisch (R. R. Heisch, "Trace-Directed Program Restructuring for AIX Executables," IBM Journal of Research and Development, vol. 38, September 1994). AOPT, a stand-alone tool developed by Itai Nahshon, attempts to locate a "hot path" in the code and then move the entire path as a single entity, rather than move individual basic blocks. Profile information, used by both Heatshrink and AOPT, is obtained by profiling. Profiling involves running code on sample or representative inputs; based on the behavior of the code on those inputs, one makes assumptions about its behavior on all inputs. Heatshrink and AOPT both perform global reordering of basic blocks and are bundled together as part of FDPR (Feedback Directed Program Restructuring), an IBM Licensed Program Product for the IBM AIX Operating System, Version 4.1.
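As a rough illustration of what such block-level profile collection looks like, the hypothetical C sketch below plants an execution counter at the entry of each basic block and reports the counts after a run on representative input. The block identifiers, counter array, and PROFILE macro are invented for illustration; they are not part of Heatshrink, AOPT, or FDPR.

```c
/* Minimal sketch of basic-block profiling: a counter increment is planted at
 * the entry of every basic block, and the counts gathered on representative
 * inputs later drive code reordering.  NUM_BLOCKS and the block ids are
 * hypothetical, not taken from any of the tools named above. */
#include <stdio.h>

#define NUM_BLOCKS 4
static unsigned long block_count[NUM_BLOCKS];

#define PROFILE(block_id) (block_count[(block_id)]++)

static int classify(int x)
{
    PROFILE(0);                  /* entry block                      */
    if (x < 0) {
        PROFILE(1);              /* rarely taken error path          */
        return -1;
    }
    PROFILE(2);                  /* hot fall-through path            */
    return x % 2;
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        classify(i);             /* representative input: non-negative ints */

    PROFILE(3);
    for (int b = 0; b < NUM_BLOCKS; b++)
        printf("block %d executed %lu times\n", b, block_count[b]);
    return 0;
}
```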
Pettis et al. (K. Pettis and R. C. Hansen, "Profile Guided Code Positioning," Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation (PLDI), pp. 16-27, June 1990) follow a different approach that also relies on profiling information, but that moves executed basic blocks only within a procedure, in addition to moving procedures. Basic blocks that are not executed according to the profile data are moved to a separate region at the end of the ordering.
These and other conventional techniques for code restructuring attempt to keep frequently executed code close together in memory. Intuitively, such an approach tries to retain the working set of frequently executed code in cache. If the working set fits in the cache, this approach is reasonable; if, however, the cache is small, or if the effective portion of the cache available to an application is small due to multiprogramming, conventional code restructuring techniques may not perform well. A major flaw of these conventional approaches is that "closeness" in memory does not actually determine cache or paging performance. Two basic blocks may be "close" together in memory and yet incur a cache miss or page fault on the transition between them, or they may be very far apart and incur that same cache miss or page fault. Conventional code restructuring techniques thus fail to realize that blocks which interact frequently should be kept in the same cache line or page rather than merely "close" to each other in memory. They also fail to realize that memory hierarchy hardware parameters such as cache line size, page size, cache associativity, and translation lookaside buffer (TLB) size should be considered in determining the code layout of an application.
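To make this point concrete, the small C sketch below (an illustration added here, not part of the patent) maps hypothetical code addresses to cache line and page numbers by integer division, assuming a 128-byte line and a 4 KB page. Two blocks only 16 bytes apart can land on different lines, while two blocks 112 bytes apart can share a line; what decides whether a transition can miss is line or page membership, not distance.

```c
/* Illustration: byte distance between two code addresses says little about
 * cache behavior; sharing (or not sharing) a cache line or page is what
 * matters.  Line and page sizes are assumptions (128-byte line, 4 KB page). */
#include <stdio.h>

#define LINE_SIZE 128u
#define PAGE_SIZE 4096u

static unsigned line_of(unsigned addr) { return addr / LINE_SIZE; }
static unsigned page_of(unsigned addr) { return addr / PAGE_SIZE; }

int main(void)
{
    unsigned a = 0x1078, b = 0x1088;   /* 16 bytes apart, different lines */
    unsigned c = 0x1000, d = 0x1070;   /* 112 bytes apart, same line      */

    printf("a,b distance %u, same line: %d\n", b - a, line_of(a) == line_of(b));
    printf("c,d distance %u, same line: %d\n", d - c, line_of(c) == line_of(d));
    printf("a,b same page: %d\n", page_of(a) == page_of(b));
    return 0;
}
```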
Thus, conventional code restructuring techniques fail to consider the memory hierarchy. As such, there is a need for a method of, system for, and computer program product for providing code restructuring that takes the memory hierarchy into account.
SUMMARY OF THE INVENTION
The invention disclosed herein comprises a method of, a system for, and an article of manufacture for providing code restructuring based on profiling information and the memory hierarchy, and it is scalable to an arbitrary number of levels of the memory hierarchy.
The present invention reorders code to reduce communication across cache line boundaries and across page boundaries. The present invention also reorders code to reduce the likelihood that chunks of code that communicate with each other and that map to the same cache or TLB associativity set will be assigned to different cache lines or pages. The present invention may be generalized to deal with multiple levels of cache or other types of memory hierarchy.
The present invention accomplishes this by reordering code such that chunks of code that communicate extensively with each other are on the same cache line, page, or other level of the memory hierarchy. Reordering is performed by first clustering the nodes, then performing first-level padding of unfilled clusters with basic blocks that were not accessed in profiling, and then performing additional padding using no-operation instructions (NOP or NO-OP) to fill out any unfilled portion of that level of the memory hierarchy. For example, to reorder code for a first-level cache line, nodes are clustered to reduce communication across cache lines, any unfilled clusters are padded with basic blocks that were not accessed in profiling, and any cache lines that remain unfilled are further padded with NOPs to fill out the cache lines.
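A minimal greedy sketch of this idea is given below. It is not the patent's clustering algorithm; the block names, sizes, and execution counts are invented, and the layout policy is only one simple way to realize "hot blocks first, cold-block filler, then NOP padding" for a 128-byte line.

```c
/* Greedy sketch (not the patent's clustering algorithm): lay out hot basic
 * blocks into 128-byte cache lines in descending execution-count order.
 * When a line cannot take the next hot block, first try to fill its leftover
 * space with never-executed ("cold") blocks, then pad the rest with NOPs so
 * the next hot block starts on a fresh line. */
#include <stdio.h>
#include <stdlib.h>

#define LINE_SIZE  128
#define MAX_BLOCKS 64

struct block { const char *name; int size; long count; int placed; };

static int by_count_desc(const void *a, const void *b)
{
    const struct block *x = a, *y = b;
    return (y->count > x->count) - (y->count < x->count);
}

/* Close out the current line: cold-block padding first, then NOP padding. */
static void finish_line(struct block *blk, int n, int line, int used)
{
    for (int j = 0; j < n; j++)
        if (!blk[j].placed && blk[j].count == 0 &&
            used + blk[j].size <= LINE_SIZE) {
            printf("line %d: %-3s (cold filler, %d bytes)\n",
                   line, blk[j].name, blk[j].size);
            used += blk[j].size;
            blk[j].placed = 1;
        }
    if (used < LINE_SIZE)
        printf("line %d: %d bytes of NOP padding\n", line, LINE_SIZE - used);
}

int main(void)
{
    struct block blk[MAX_BLOCKS] = {
        { "B0", 40, 9000 }, { "B1", 56, 8500 }, { "B2", 64,  120 },
        { "B3", 24,    0 }, { "B4", 80,    0 }, { "B5", 32, 7000 },
    };
    int n = 6, line = 0, used = 0;

    qsort(blk, n, sizeof blk[0], by_count_desc);

    for (int i = 0; i < n && blk[i].count > 0; i++) {   /* hot blocks only */
        if (used + blk[i].size > LINE_SIZE) {
            finish_line(blk, n, line, used);
            line++;
            used = 0;
        }
        printf("line %d: %-3s (%d bytes, count %ld)\n",
               line, blk[i].name, blk[i].size, blk[i].count);
        used += blk[i].size;
        blk[i].placed = 1;
    }
    if (used > 0)
        finish_line(blk, n, line, used);                /* flush last line */
    return 0;
}
```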
The present invention avoids set collisions by considering both memory hierarchy size and associativity. Associativity means that there is a connection between a memory location and the place in the memory hierarchy to which the data for that memory location is assigned. 4-way associativity, for example, means that the cache is divided into four sets, and each memory location is uniquely mapped into one of those four sets. This mapping may be done by using two bits of the memory location address. If a cache line is 128 bytes in size, and if a memory word contains 4 bytes, then 32 words will fit on one cache line. A primary objective of the associativity mapping is to have each consecutive chunk of 32 words map to a different associativity set. If bits 6 and 7 of the memory location address are used to determine to which associativity set a 32-word chunk maps, then adjacent 32-word chunks map to different sets. A set collision occurs when all the cache lines in a particular set are full; this may occur even if one of the other sets has space. Because conventional processor implementations hard-wire the associativity between memory and cache sets, data may not arbitrarily occupy a cache line in an associativity set that differs from the one to which the particular memory location is assigned. The present invention considers both memory hierarchy size and associativity to avoid such associativity set collisions by dividing the cache size by its associativity; for example, if an 8K cache is 4-way set associative, then n2=2048 bytes. This n2 falls between the cache line size, n1, of 128 bytes and the page size, n3, of 4096 bytes. Similarly, the TLB size divided by its associativity is considered. Alternative embodiments ...
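The arithmetic in the paragraph above can be made explicit with a short C sketch. The 8 KB cache, 4-way associativity, 128-byte line, and 4 KB page are the sizes quoted in the text; the set-index computation is one common textbook mapping, used here as an illustrative assumption rather than a description of the patent's exact bit selection or of any particular processor.

```c
/* Worked example of the memory hierarchy parameters discussed above:
 * n1 = cache line size, n2 = cache size / associativity, n3 = page size,
 * plus a simple set-index computation showing that two chunks n2 bytes
 * apart fall into the same set and can therefore collide even while other
 * sets still have free lines.  The bit-level mapping is illustrative. */
#include <stdio.h>

#define LINE_SIZE  128u      /* n1: bytes per cache line           */
#define CACHE_SIZE 8192u     /* 8 KB cache (example from the text) */
#define ASSOC      4u        /* 4-way set associative              */
#define PAGE_SIZE  4096u     /* n3: bytes per page                 */

int main(void)
{
    unsigned n1 = LINE_SIZE;
    unsigned n2 = CACHE_SIZE / ASSOC;      /* 8192 / 4 = 2048 bytes */
    unsigned n3 = PAGE_SIZE;
    unsigned num_sets = n2 / LINE_SIZE;    /* lines per way = 16    */

    printf("n1=%u  n2=%u  n3=%u  sets=%u\n", n1, n2, n3, num_sets);

    /* Two code chunks that are n2 bytes apart map to the same set. */
    unsigned a = 0x1000, b = 0x1000 + n2;
    unsigned set_a = (a / LINE_SIZE) % num_sets;
    unsigned set_b = (b / LINE_SIZE) % num_sets;
    printf("addr 0x%x -> set %u, addr 0x%x -> set %u\n", a, set_a, b, set_b);
    return 0;
}
```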
