Compiler-based cache line optimization

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories


Details

Type: Reexamination Certificate

Status: active

Patent number: 06564297

ABSTRACT:

BACKGROUND OF THE INVENTION
Generally, a microprocessor operates much faster than main memory can supply data to the microprocessor. Therefore, many computer systems temporarily store recently and frequently used data in smaller, but much faster cache memory. Cache memory may reside directly on the microprocessor chip (Level 1 cache) or may be external to the microprocessor (Level 2 cache). In the past, on-chip cache memory was relatively small, 8 or 16 kilobytes (KB); however, more recent microprocessor designs have on-chip cache memories of 256 and even 512 KB.
Referring to FIG. 1, a typical computer system includes a microprocessor (10) having, among other things, a CPU (12), a load/store unit (14), and an on-board cache memory (16). The microprocessor (10) is connected to external cache memory (17) and a main memory (18) that both hold data and program instructions to be executed by the microprocessor (10). Internally, the execution of program instructions is carried out by the CPU (12). Data needed by the CPU (12) to carry out an instruction are fetched by the load/store unit (14) and loaded into internal registers (15) of the CPU (12). A memory queue (not shown) maintains a list of outstanding memory requests; the load/store unit adds requests to the memory queue and also loads registers with values from the memory queue. The list of outstanding memory requests held in the memory queue is referred to as a memory transaction. A memory transaction is released, i.e., guaranteed to complete, together with other instructions, and this correspondence between starting and releasing memory transactions and instructions helps a compiler manage the memory queue.
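By way of illustration only, the following C sketch models that bookkeeping in software: a fixed-size queue of outstanding requests, an add operation used by the load/store unit, and a release operation that marks the whole transaction as guaranteed to complete. The structure and function names (mem_queue, queue_add, queue_release) and the queue depth are assumptions for illustration, not details taken from the patent.

```c
/* Minimal sketch of the memory-queue bookkeeping described above.
 * Names, fields, and the queue depth are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define QUEUE_DEPTH 8

struct mem_request {
    uint64_t address;   /* memory address being fetched           */
    int      dest_reg;  /* register the value will be loaded into */
    int      completed; /* set when the data has arrived          */
};

struct mem_queue {
    struct mem_request entries[QUEUE_DEPTH];
    int count;          /* number of outstanding requests         */
};

/* Load/store unit adds an outstanding request to the queue. */
static int queue_add(struct mem_queue *q, uint64_t addr, int reg)
{
    if (q->count == QUEUE_DEPTH)
        return -1;                      /* queue full: must stall  */
    q->entries[q->count].address   = addr;
    q->entries[q->count].dest_reg  = reg;
    q->entries[q->count].completed = 0;
    q->count++;
    return 0;
}

/* Releasing the transaction: every queued request must complete
 * before later instructions may rely on the loaded registers. */
static void queue_release(struct mem_queue *q)
{
    for (int i = 0; i < q->count; i++)
        q->entries[i].completed = 1;    /* in hardware: wait for data */
    q->count = 0;
}

int main(void)
{
    struct mem_queue q = { .count = 0 };
    queue_add(&q, 0x1000, 3);           /* start a memory transaction  */
    queue_add(&q, 0x2040, 4);
    queue_release(&q);                  /* transaction guaranteed done */
    printf("outstanding after release: %d\n", q.count);
    return 0;
}
```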
Upon command from the CPU (12), the load/store unit (14) searches for the data first in the fast on-board cache memory (16), then in the external cache memory (17), and finally in the slow main memory (18). Finding the data in the cache memory is referred to as a “hit.” Not finding the data in the cache memory is referred to as a “miss.”
The time between when a CPU requests data and when the data is retrieved and available for use by the CPU is termed the “latency” of the system. If requested data is found in cache memory, i.e., a data hit occurs, the requested data can be accessed at the speed of the cache and the latency of the system is reduced. If, on the other hand, the data is not found in cache, i.e., a data miss occurs, the data must be retrieved from the slower main memory and the latency of the system is increased.
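The search order and its effect on latency can be sketched in C as follows; the probe functions and the cycle counts below are made-up illustrative values, not figures from the patent.

```c
/* Illustrative sketch of the hierarchical lookup order and its latency.
 * The "hit" predicates and cycle counts are arbitrary placeholders. */
#include <stdbool.h>
#include <stdio.h>

enum level { L1_CACHE, L2_CACHE, MAIN_MEMORY };

/* Hypothetical hit predicates; a real memory system decides this in hardware. */
static bool in_l1(unsigned long addr) { return (addr & 0xff) < 0x40; }
static bool in_l2(unsigned long addr) { return (addr & 0xff) < 0xc0; }

/* Returns where the data was found and accumulates the cycles spent. */
static enum level lookup(unsigned long addr, unsigned *cycles)
{
    *cycles = 2;                        /* probe the on-board (L1) cache */
    if (in_l1(addr)) return L1_CACHE;   /* hit: lowest latency           */
    *cycles += 10;                      /* probe the external (L2) cache */
    if (in_l2(addr)) return L2_CACHE;
    *cycles += 150;                     /* miss: fetch from main memory  */
    return MAIN_MEMORY;
}

int main(void)
{
    unsigned cycles;
    enum level where = lookup(0x12345678, &cycles);
    printf("found at level %d after %u cycles\n", (int)where, cycles);
    return 0;
}
```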
In pursuit of increasing efficiency by reducing latency and increasing the hit-to-miss ratio associated with cache memory, prefetch operations have been implemented in many computer systems. Prefetch operations retrieve the data associated with a memory operation before the memory operation occurs, so that when the memory operation does occur, the data is already present in the cache memory. It is important to schedule prefetch operations at optimal points in an instruction line and to prefetch only data that is likely to be referenced.
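As a concrete illustration of a compiler-inserted prefetch, GCC and Clang expose software prefetching through the __builtin_prefetch builtin; the sketch below hoists a prefetch of the next node ahead of its use in a pointer-chasing loop, the kind of irregular access pattern discussed here. The list layout and the one-node prefetch distance are illustrative assumptions, not the patent's method.

```c
/* Software prefetching in a pointer-chasing loop.  __builtin_prefetch is a
 * real GCC/Clang builtin; the node layout and the one-node prefetch
 * distance are illustrative choices. */
#include <stdio.h>
#include <stddef.h>

struct node {
    struct node *next;
    long payload;
};

static long sum_list(struct node *head)
{
    long sum = 0;
    for (struct node *n = head; n != NULL; n = n->next) {
        /* Issue the miss for the next node before it is needed so the
         * line is (ideally) already resident when the loop reaches it. */
        if (n->next != NULL)
            __builtin_prefetch(n->next, 0 /* read */, 3 /* keep in cache */);
        sum += n->payload;
    }
    return sum;
}

int main(void)
{
    struct node c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };
    printf("sum = %ld\n", sum_list(&a));   /* prints: sum = 6 */
    return 0;
}
```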
SUMMARY OF THE INVENTION
In general, in accordance with an embodiment of the present invention, a method for cache line optimization of programs with irregular access patterns comprises selecting references for optimization, identifying cache lines, mapping the selected references to the identified cache lines, determining dependencies within the cache lines, and scheduling the cache lines based on the determined dependencies, with the goal of increasing the number of outstanding cache line misses at all times.
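A compressed, hypothetical C sketch of those steps, assuming 64-byte cache lines, a hand-written dependency table, and a simple greedy schedule (none of which are prescribed by the patent), might look like the following.

```c
/* Sketch of the steps named above: map selected references to 64-byte
 * cache lines, record dependencies between lines, and emit the lines in
 * an order that respects those dependencies so independent misses can be
 * outstanding together.  The data and the greedy pass are assumptions. */
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 64
#define MAX_LINES  16

int main(void)
{
    /* Step 1: references selected for optimization (addresses made up). */
    uint64_t ref_addr[] = { 0x1000, 0x1008, 0x2040, 0x3080 };
    int nrefs = 4;

    /* Step 2: identify the distinct cache lines the references map to. */
    uint64_t lines[MAX_LINES];
    int nlines = 0;
    for (int i = 0; i < nrefs; i++) {
        uint64_t l = ref_addr[i] / LINE_BYTES;
        int found = 0;
        for (int j = 0; j < nlines; j++)
            if (lines[j] == l) { found = 1; break; }
        if (!found)
            lines[nlines++] = l;
    }

    /* Step 3: dependencies between lines (index of the line that must be
     * fetched first, or -1).  Here the second line depends on the first. */
    int dep[MAX_LINES] = { -1, 0, -1 };

    /* Step 4: greedy schedule that emits a line as soon as its dependency
     * has been emitted, keeping as many misses outstanding as possible. */
    int emitted[MAX_LINES] = { 0 };
    int remaining = nlines;
    while (remaining > 0) {
        for (int i = 0; i < nlines; i++) {
            if (emitted[i]) continue;
            if (dep[i] < 0 || emitted[dep[i]]) {
                printf("schedule prefetch of line 0x%llx\n",
                       (unsigned long long)lines[i]);
                emitted[i] = 1;
                remaining--;
            }
        }
    }
    return 0;
}
```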
In general, in accordance with an embodiment of the present invention, a method of cache line optimization comprises a cache line scheduling step, and an instruction line scheduling step based on the cache line scheduling step.
In general, in accordance with an embodiment of the present invention, a software tool for cache line optimization comprises a program stored on computer-readable media for selecting references for optimization, identifying cache lines, mapping the selected references to the identified cache lines, determining dependencies within the cache lines, and scheduling the cache lines based on the determined dependencies.
In general, in accordance with an embodiment of the present invention, a software tool for cache line optimization comprises a program stored on computer-readable media for scheduling a cache line, and scheduling an instruction line based on the cache line scheduling.
In general, in accordance with an embodiment of the present invention, an apparatus for cache line optimization comprises cache line scheduling means, and instruction line scheduling means, wherein the instruction line scheduling means schedules instructions based on the cache line scheduling means.
Other advantages and features will become apparent from the following description, including the figures and the claims.


REFERENCES:
patent: 5900022 (1999-05-01), Kranich
patent: 5950007 (1999-09-01), Nishiyama et al.
patent: 6049669 (2000-04-01), Holler
patent: 0 743 598 (1996-11-01), None
patent: 0 919 921 (1999-06-01), None
Kandemir et al., “Improving Cache Locality by a Combination of Loop and Data Transformations”, 1999, IEEE, pp. 159-167.
Trancoso et al., “Cache Optimization for Memory-Resident Decision Support Commercial Workloads”, 1999, IEEE, pp. 546-554.
Carr et al., “Compiler Optimizations for Improving Data Locality”, 1994, ACM, pp. 252-262.
McKinley et al., “Improving Data Locality with Loop Transformations”, 1996, ACM, pp. 424-453.
IBM, “Executable Program Restructuring for Optimal Cache/Real Memory Utilization”, IBM TDB, vol. 36, issue 12, pp. 153-154, 1993.
IBM, “Code Optimization by Hints to the Compiler”, IBM TDB, vol. 40, issue 9, pp. 125-128, 1997.
Gupta et al., “Improving Instruction Cache Behavior by Reducing Cache Pollution”, 1990, IEEE, pp. 82-91.
Hwu et al., “Achieving High Instruction Cache Performance with an Optimizing Compiler”, 1989, ACM, pp. 242-251.
IBM, “Selective Prefetching Based on Miss Latency”, IBM TDB, vol. 36, issue 10, pp. 411-412, 1993.
IBM, “Grouping of Instructions”, IBM TDB, vol. 38, issue 8, pp. 531-534, 1995.
Holler, A.M., “Optimization for a Superscalar Out-of-Order Machine,” Proceedings of the 29th Annual IEEE/ACM Int'l Symposium on Microarchitecture (MICRO-29), Paris, Dec. 2-4, 1996, Los Alamitos, IEEE Comp. Soc. Press, pp. 336-348.
Wolf, M.E. et al., “Combining Loop Transformations Considering Caches and Scheduling,” Proceedings of the 29th Annual IEEE/ACM Int'l Symposium on Microarchitecture (MICRO-29), Paris, Dec. 2-4, 1996, Los Alamitos, IEEE Comp. Soc. Press, pp. 274-286.
PCT International Search Report, Sep. 30, 2002, 4 pgs.
