Method and apparatus for prefetching linked data structures

Electrical computers and digital processing systems: processing – Instruction fetching

Reexamination Certificate


Details

C712S207000, C711S137000


active

06687807

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is generally directed to reducing the latency of memory operations in modern computer systems. More specifically, the present invention is directed to additional computer system hardware and a method for more efficient generation of prefetch instructions in compiler-inserted prefetching of pointer-based or recursive data structures.
2. Background
Processor utilization in modern computer systems is diminished by the latency of memory accesses. Memory speeds have not kept pace with continually increasing processor speeds, and the widening gap between the two makes it difficult to realize high processor efficiencies.
Most computer systems use a cache hierarchy to help alleviate the problem of memory latency. As illustrated for example in FIG. 1, different levels of cache memory 104, 106 are located closer to the central processing unit (CPU) 100 than the main memory 102. Other cache structures, such as separate instruction, data and prefetch caches, are also known in the art. Cache memory is a short-term memory which typically contains the data most likely to be accessed by the CPU 100. Therefore, fewer data requests from the CPU 100 result in data accesses that extend all the way out to the main memory 102, and there is a reduction in the amount of cycle time that the CPU 100 sits idly waiting for the data it requests. A typical first-level cache 104 as shown in FIG. 1 may be only 2 to 3 cycles away from the core of the CPU 100, while a second-level cache 106 may be 10 to 20 cycles away. However, main memory 102 may be on the order of 100 to 500 cycles away from the core of the CPU 100, which can result in significant latency for a CPU 100 that is required to access the main memory 102 on a frequent basis.
A typical program working on a linked data structure directed to the computer's memory subsystem and executed on the CPU 100 might include a process such as that demonstrated by the flow chart of FIG. 2. The process searches through the memory subsystem one data node at a time until the desired data is located, at which point the search stops and the data is used by the CPU 100 for further processing. The first step of the process 200 determines whether the end of the data structure has been reached by checking the node address against null. If the node address is equal to null and thus not valid, the process might move on to another data structure 202 within the memory subsystem and continue the search. If the node address is not equal to null, the data at that address is fetched 210 and checked against a key value 204 to determine whether it is the desired data. If it is, the search stops and the data is further processed 206 by the CPU 100. If it is not, the search continues at the next data node 208 within the data structure.
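The search loop described by the FIG. 2 flow chart can be sketched in C as follows. The node layout and function names are illustrative assumptions, not taken from the patent; the step numbers in the comments refer to the flow chart described above.

```c
#include <stddef.h>

/* Hypothetical node layout for a linked data structure; the field
   names are illustrative, not from the patent. */
struct node {
    int data;
    struct node *next;
};

/* Search one linked structure for key, following the FIG. 2 flow:
   stop at null (end of structure), otherwise fetch the node's data,
   compare it against the key, and advance to the next node on a miss. */
struct node *search(struct node *n, int key)
{
    while (n != NULL) {       /* step 200: end of structure reached? */
        if (n->data == key)   /* steps 210/204: fetch data, compare key */
            return n;         /* step 206: desired data found */
        n = n->next;          /* step 208: advance to the next node */
    }
    return NULL;              /* step 202: not found in this structure */
}
```

Each iteration must load a node from memory before the address of the next node is even known, which is exactly where the memory latency discussed below becomes visible.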
Significant latency can occur if the CPU 100 sits idly waiting for data requests to be answered by the memory subsystem 102, 104, 106. Although cache memory is useful in mitigating the memory latency problem, it is not a complete solution. Other techniques have been developed to address the increasing divergence between processor speeds and memory speeds. Automatic compiler techniques such as software-controlled prefetching have had moderate success. Software-controlled prefetching is a compiler technique for tolerating memory latency by executing data prefetch instructions which move data into the cache and closer to the processor before it is needed by the processor. Although software-controlled prefetching offers some benefit to the memory latency problem, it suffers from several disadvantages such as the need for a sophisticated compiler to insert prefetch instructions into the code, execution overhead created by the new prefetch instructions, and its mostly limited application to array-based data structures.
Along with multi-dimensional data array structures, pointer-based or recursive data structures, which include linked lists, trees and graphs, are among the most common methods of building large data structures. As illustrated in FIG. 3A, each node 300 within a recursive data structure has a field which contains a pointer address 302 pointing to the address 304 of the next node in the data structure. By contrast, in a data array structure as shown for example in FIG. 3B, consecutive data elements or nodes 306 are located at contiguous addresses 308. Therefore, predicting which data nodes in memory need to be prefetched because they are likely to generate a cache miss, and inserting a prefetch instruction sufficiently far in advance to avoid memory latency, is significantly more difficult in a recursive data structure than in a data array structure. This is so because addresses to consecutive data elements or nodes in data array memory structures can always be calculated given a reference point address, while the lack of spatial locality due to the arbitrary addressing of data nodes in a recursive data structure precludes the discovery of the next desired node address until the address value stored in the current node is actually read. Thus, the limitation of software-controlled prefetching to mostly multi-dimensional data array structures is apparent from the problems encountered in compiler-inserted prefetching of recursive data structures.
Accordingly, there exists a need for a method of efficiently generating prefetch instructions in compiler-inserted prefetching of pointer-based or recursive data structures which incurs minimal costs in overhead and hardware.
SUMMARY OF THE INVENTION
Additional memory hardware in a computer system which is distinct in function from the main memory system architecture permits the storage and retrieval of prefetch addresses and allows the compiler to more efficiently generate prefetch instructions for execution while traversing pointer-based or recursive data structures. The additional memory hardware makes up a content addressable memory (CAM) or a hash table/array memory that is relatively close in cycle time to a processor such as a central processing unit (CPU) and relatively small when compared to the main memory system. The additional CAM hardware permits the compiler to write data access loops which remember the addresses for each node visited while traversing the linked data structure by providing storage space to hold a prefetch address or a set of prefetch addresses. Since the additional CAM is separate from the main memory system and acts as an alternate cache for holding the prefetch addresses, it prevents the overwriting of desired information in the regular cache and thus leaves the regular cache unpolluted. Furthermore, rather than having the addresses for the entire memory system stored in the CAM, only the addresses to those data nodes traversed along the pointer-based data structure are stored and thus remembered, which allows the size of the CAM to remain relatively small and access to the CAM by the CPU, relatively fast.
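The summary describes dedicated hardware, but the idea can be sketched in software: a small table, standing in for the patent's CAM, remembers the successor address seen at each node on a previous traversal, so a later traversal can prefetch the predicted next node before it is actually read. All names, the table size, and the hash are illustrative assumptions, not the patent's design.

```c
#include <stdint.h>
#include <stddef.h>

#define CAM_SLOTS 64   /* small by design, unlike main memory */

struct node { int data; struct node *next; };

/* Direct-mapped stand-in for the CAM: indexed by a hash of the
   node's own address, holding the successor seen on the last pass. */
static struct node *cam[CAM_SLOTS];

static size_t slot(const struct node *n)
{
    return ((uintptr_t)n >> 4) % CAM_SLOTS;  /* simple address hash */
}

/* Traverse the list; at each node, look up the successor remembered
   from an earlier pass (the prefetch candidate), then record the
   current successor for the next traversal. */
long traverse(struct node *head)
{
    long sum = 0;
    for (struct node *n = head; n != NULL; n = n->next) {
        struct node *predicted = cam[slot(n)];
        if (predicted != NULL) {
            /* real hardware would issue a prefetch of predicted here */
            (void)predicted;
        }
        cam[slot(n)] = n->next;   /* remember for the next pass */
        sum += n->data;
    }
    return sum;
}
```

Because only the nodes actually visited are recorded, the table stays small and fast relative to the main memory system, and the regular cache is left unpolluted, which mirrors the rationale given above.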


REFERENCES:
patent: 5107418 (1992-04-01), Cramer et al.
patent: 5185810 (1993-02-01), Freischlad
patent: 5317718 (1994-05-01), Jouppi
patent: 5317740 (1994-05-01), Sites
patent: 5659754 (1997-08-01), Grove et al.
patent: 5694568 (1997-12-01), Harrison et al.
patent: 5761468 (1998-06-01), Emberson
patent: 5778233 (1998-07-01), Besaw et al.
patent: 5790859 (1998-08-01), Sarkar
patent: 5822788 (1998-10-01), Kahn et al.
patent: 5905876 (1999-05-01), Pawlowski et al.
patent: 5948095 (1999-09-01), Arora et al.
patent: 5996061 (1999-11-01), Lopez-Aguado et al.
patent: 6119218 (2000-09-01), Arora et al.
patent: 6175898 (2001-01-01), Ahmed et al.
patent: 6253306 (2001-06-01), Ben-Meir et al.
patent: 6317810 (2001-11-01), Lopez-Aguado et al.
patent: 6418516 (2002-07-01), Arimilli et al.
Stallings, William, "Operating Systems", 3rd edition, 1998, Prentice-Hall, Upper Saddle River, New Jersey, pp. 325-327.
Doug Jose
