System and method for scheduling memory instructions to provide adequate prefetch latency

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories


Details

Classification: C712S207000; C711S213000
Type: Reexamination Certificate
Status: active
Patent number: 06678796


BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention pertains generally to software prefetching algorithms. More particularly, the invention is a software method and apparatus for scheduling instructions to provide adequate prefetch latency.
2. The Prior Art
Current computer systems include, among other things, a memory system and a processing unit (or processor or central processing unit (CPU)). A memory system serves as a repository of information, while the CPU accesses information from the memory system, operates on it, and stores it back.
However, it is well known that CPU clock speeds are increasing at a faster rate than memory speeds. When a processor attempts to read a memory location from the memory system, the request is “very urgent”. That is, in most computer systems, the processor stalls or waits while the memory system provides the data requested to the CPU. The “latency” of the memory is the delay from when the CPU first requests data from memory until that data arrives and is available for use by the CPU.
A cache is a special high-speed memory in addition to the conventional memory (or main memory). FIG. 1 depicts a conventional hierarchical memory system, where a CPU is operatively coupled to a cache, and the cache is operatively coupled to the main memory. By placing the cache (very fast memory) in front of the main memory (large, slow memory), the memory system is able to satisfy most requests from the CPU at the speed of the cache, thereby reducing the overall latency of the system.
When the data requested by the CPU is in the cache (known as a “hit”), the request is satisfied at the speed of the cache. However, when the data requested by the CPU is not in the cache (known as a “miss”), the CPU must wait until the data is provided from the slower main memory to the cache, and then to the CPU, resulting in greater latency. In memory-bound applications, such as database servers and other commercial applications, data requests very often miss the cache, as is generally known in the art.
To address the problem of latency and to increase the “hit” to “miss” ratio associated with cache memory, many modern computer systems have introduced instructions for prefetching data from memory to cache. For example, instruction set architectures (ISAs), such as SPARC™ V9, support software data prefetch operations. The details of exploiting prefetch operations have been left to the designers of optimizing compilers, who must find ways to use them to reduce the frequency of cache misses. In general, prefetch instructions require adequate latency for proper operation.
For many scientific applications that work on arrays, the CPU generally operates on a contiguous region of memory. Predicting which areas of memory will be required for a memory operation can therefore be done relatively far in advance (in CPU cycles) of the operation, and scheduling prefetches is a relatively simple task: the prefetch is placed far enough ahead of the memory operation to cover the prefetch latency.
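By way of illustration only (the following sketch is not part of the patent; the array, element type, and prefetch distance of 16 elements are assumptions), this easy case may be written in C using GCC's __builtin_prefetch intrinsic as a stand-in for an ISA-level prefetch instruction:

    #include <stddef.h>

    /* Streaming over a contiguous array: the address &a[i + DIST] is known
     * long before the data is needed, so the prefetch can be issued DIST
     * iterations ahead of the load, covering the prefetch latency. */
    long sum_array(const long *a, size_t n)
    {
        enum { DIST = 16 };  /* prefetch distance in elements (a tuning assumption) */
        long total = 0;
        for (size_t i = 0; i < n; i++) {
            if (i + DIST < n)
                __builtin_prefetch(&a[i + DIST], 0, 3);  /* 0 = read, 3 = high locality */
            total += a[i];
        }
        return total;
    }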
For database applications and other commercial applications, however, predicting which areas of memory will be required is much more difficult, in large part because there is normally insufficient time (in CPU cycles) between the address-forming operation and the memory operation (associated with the address) to cover the prefetch latency (the time required to carry out the prefetch instruction). In these cases, the prefetch cannot simply be moved far enough in advance of the memory operation to cover the prefetch latency.
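A contrasting illustration (again hypothetical, not taken from the patent) is a linked-list traversal, common in database code. Here the address-forming operation is itself a load, so the next address is unknown until the current node arrives, and the prefetch cannot be hoisted far enough ahead:

    #include <stddef.h>

    struct node {
        long         key;
        struct node *next;
    };

    /* Pointer chasing: p->next (the address-forming operation) only becomes
     * known when the load of *p completes, so a prefetch can run at most one
     * link ahead -- typically far fewer cycles than a main-memory access. */
    long count_matches(const struct node *p, long key)
    {
        long hits = 0;
        while (p != NULL) {
            if (p->next != NULL)
                __builtin_prefetch(p->next, 0, 3);  /* too late to hide full latency */
            hits += (p->key == key);
            p = p->next;
        }
        return hits;
    }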
Accordingly, there is a need for a method and apparatus which schedules memory instructions to provide adequate latency for proper prefetch operation. The present invention satisfies these needs, as well as others, and generally overcomes the deficiencies found in the background art.
BRIEF DESCRIPTION OF THE INVENTION
The present invention is a method and apparatus embodied in software suitable for use with compilation of source code. The invention further relates to machine readable media on which are stored embodiments of the present invention. It is contemplated that any media suitable for retrieving instructions is within the scope of the present invention. By way of example, such media may take the form of magnetic, optical, or semiconductor media.
The present invention also relates to a method and use of prefetch operations to load data from memory into a cache. It is contemplated that the invention may be used for loading data from conventional main memory as well as from other “slow” data storage structures, such as disk storage or network storage, for example. Although the invention is described herein with respect to a single cache, it is contemplated that any suitable cache arrangement (e.g., various levels of cache) is within the scope of the present invention.
In its most general terms, the invention comprises software for scheduling memory operations to provide adequate prefetch latency. The invention is generally used in conjunction with, and incorporated into, compilation software (a compiler), which converts source code into a compiled program (or executable file). During compilation, the source code is converted into an intermediary “program code” which is processed by the compiler. After the compiler has finished processing the program code, a compiled program is generated from it.
More particularly, the invention is embodied in a prefetch scheduler component having a martyr load locating routine and a prefetch placement routine.
The prefetch scheduler sorts memory operations (such as loads) into two groups: loads that are not likely to miss the cache (hit candidates) and loads that are likely to miss the cache (miss candidates). Various algorithms known in the art may be used for carrying out this classification.
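The patent leaves the classification algorithm open. Purely as a hedged illustration, a toy heuristic over a hypothetical compiler intermediate representation might classify loads as follows (the load_op record and its flags are invented for this sketch, not drawn from the patent):

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical IR record for a load instruction. */
    typedef struct load_op {
        bool from_induction_var;   /* address derived from a loop counter    */
        bool from_prior_load;      /* address produced by an earlier load    */
        bool has_prefetch;         /* a prefetch is currently planned for it */
        struct load_op *next;      /* next load in program order             */
    } load_op;

    /* Toy classifier, not the patent's: contiguous induction-variable
     * accesses are assumed to hit, pointer-chasing accesses to miss. */
    bool is_miss_candidate(const load_op *ld)
    {
        return ld->from_prior_load && !ld->from_induction_var;
    }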
The martyr load locating routine carries out the operation of locating memory operations within the program code which are likely to miss the cache. The routine then designates one of these memory operations as a “martyr” load and removes any prefetch associated with it. As a result, the martyr load will have to access the requested data from memory, thereby inherently generating latency (i.e., a CPU stall).
The present invention utilizes the latency created by the martyr load to provide the necessary latency for other prefetches to complete. More particularly, the prefetch placement routine carries out the operation of locating one or more “succeeding” memory operations, subsequent to the martyr load. The prefetch placement routine then schedules prefetches associated with each succeeding memory operation “behind” (i.e., prior to) the martyr load operation. Since the martyr load will generate latency to carry out its operation, the same latency is used to allow the scheduled prefetches (placed prior to the martyr load) to complete.
The number of prefetches which may be placed “behind” the martyr load varies according to the hardware implementation on which the program will execute. For example, the UltraSPARC-III™ chip supports eight (8) in-flight memory operations. In this case, up to seven (7) prefetches would be placed “behind” the martyr load, with the eighth in-flight slot occupied by the martyr load itself.
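Putting these pieces together, and reusing the hypothetical load_op record from the classification sketch above, martyr selection and prefetch placement might look as follows. This is a sketch under the stated assumptions, not the patent's implementation; only the in-flight limit of eight is taken from the UltraSPARC-III example:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct load_op {       /* same hypothetical IR record as above */
        bool from_induction_var;
        bool from_prior_load;
        bool has_prefetch;
        struct load_op *next;
    } load_op;

    bool is_miss_candidate(const load_op *ld)
    {
        return ld->from_prior_load && !ld->from_induction_var;
    }

    enum { IN_FLIGHT = 8 };  /* e.g., UltraSPARC-III: eight in-flight memory ops */

    /* Designate the first miss candidate as the martyr and strip its own
     * prefetch, so it stalls on main memory; then keep prefetches for up to
     * IN_FLIGHT - 1 succeeding miss candidates.  Those prefetches would be
     * emitted before the martyr, completing during the martyr's stall. */
    load_op *schedule_prefetches(load_op *loads)
    {
        load_op *martyr = NULL;
        for (load_op *ld = loads; ld != NULL; ld = ld->next) {
            if (is_miss_candidate(ld)) {
                martyr = ld;
                martyr->has_prefetch = false;  /* the martyr takes the cache miss */
                break;
            }
        }
        if (martyr == NULL)
            return NULL;                       /* no miss candidate to martyr */

        int slots = IN_FLIGHT - 1;             /* one slot is used by the martyr */
        for (load_op *ld = martyr->next; ld != NULL && slots > 0; ld = ld->next) {
            if (is_miss_candidate(ld)) {
                ld->has_prefetch = true;       /* scheduled "behind" the martyr */
                slots--;
            }
        }
        return martyr;
    }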
The prefetch scheduler component as described herein is used in conjunction with a compiler. The compiler normally includes other compilation components for carrying out other compilation tasks as is known in the art. In general, the compiler processes a program code (converted from a source code file) and generates a compiled file (or executable program). The prefetch scheduler component executes during this compilation process and carries out the method of the present invention.


