Methods and apparatus for controlling a cache memory

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

Classification: C711S217000
Status: active
Patent number: 06816946


BACKGROUND OF THE INVENTION
The present invention relates to methods and apparatus for controlling a cache memory and, more particularly, to methods and apparatus for pre-fetching data into the cache memory.
In recent years, there has been an insatiable demand for faster data throughput in computer processing, because cutting-edge computer applications are becoming more and more complex and are placing ever increasing demands on microprocessing systems. Graphics applications are among those that place the highest demands on a microprocessing system because they require vast numbers of data accesses, data computations, and data manipulations in relatively short periods of time to achieve desirable visual results. Conventional microprocessors have very rapid cycle times (i.e., the unit of time in which a microprocessor is capable of manipulating data), such as two nanoseconds, although the time required to access data stored in main memory may be considerably longer than the cycle time of the microprocessor. For example, the access time required to obtain a byte of data from a main memory implemented using dynamic random access memory (DRAM) technology is on the order of about 60 nanoseconds.
In order to ameliorate the bottleneck imposed by the relatively long access time to DRAM memory, those skilled in the art have utilized cache memories. Cache memory is significantly faster than DRAM memory, and augments the function of data storage provided by the main memory. For example, an L2 cache memory may be coupled externally to the microprocessor or an L1 cache memory may be coupled internally with the microprocessor, which memories are significantly faster than a main memory implemented utilizing DRAM technology. An L2 cache memory may be implemented utilizing, for example, static random access memory (SRAM) technology, which is approximately two to three times faster than DRAM technology. An L1 cache memory is even faster than an L2 cache memory.
Due to the relatively high cost of cache memories, they are typically much smaller than main memory. Consequently, conventional algorithms have been employed to determine what data should be stored in the cache memory. These conventional algorithms may be based on, for example, the theoretical concept of “locality of reference,” which takes advantage of the fact that relatively small portions of a very large executable program are used at any particular point in time. Thus, in accordance with the concept of locality of reference, only the small portions of the overall executable program are stored in cache memory at any particular point in time.
The particularities of the known algorithms for taking advantage of locality of reference, or any other concept, for controlling the storage of data in a cache memory are too numerous to present in this description. Suffice it to say, however, that any given algorithm may not be suitable in all applications as the data processing goals of various applications may differ significantly.
In graphics applications, for example, strong data storage localities may exist in main memory, which localities are a function of the coordinates of pixel data within main memory. More particularly, the main memory utilized in a graphics application may be conceptually organized into a two-dimensional plane, where the information for each pixel in an image is stored at a particular X, Y coordinate. The main memory utilized in a graphics application may be further organized into a plurality of rectangular blocks, where each block contains the information for a plurality of pixels. The addressing of the pixels within a particular block may be highly linear, i.e., the access to the pixel information may be highly sequential within a given rectangular block.
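The block-linear layout described above can be made concrete with a short sketch. The block dimensions, blocks per row, and pixel size below are illustrative assumptions, not values taken from the patent.

```python
# Sketch of a block-linear pixel layout: the 2D plane is divided into
# rectangular blocks, and pixel addresses are sequential within each block.
# BLOCK_W, BLOCK_H, BLOCKS_PER_ROW, and BYTES_PER_PIXEL are assumed values.
BLOCK_W, BLOCK_H = 16, 16      # pixels per block (assumed)
BLOCKS_PER_ROW = 64            # blocks across the image (assumed)
BYTES_PER_PIXEL = 4            # e.g. 32-bit RGBA (assumed)
BLOCK_BYTES = BLOCK_W * BLOCK_H * BYTES_PER_PIXEL

def pixel_address(x: int, y: int) -> int:
    """Map pixel coordinate (x, y) to a byte address in main memory."""
    bx, by = x // BLOCK_W, y // BLOCK_H    # which block the pixel falls in
    px, py = x % BLOCK_W, y % BLOCK_H      # offset within that block
    block_index = by * BLOCKS_PER_ROW + bx
    return block_index * BLOCK_BYTES + (py * BLOCK_W + px) * BYTES_PER_PIXEL
```

Because consecutive x values within a block map to consecutive byte addresses, a horizontal scan inside one block touches memory sequentially, which is the linearity the passage refers to.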
Thus, conventional algorithms for controlling a cache memory in a graphics application typically dictate that a given cache line stores the pixel information for an entire rectangular block. When a cache hit occurs (i.e., when a memory access request for a particular byte of data can be satisfied by accessing the cache memory), the high linearity of the pixel information within the given rectangular block is likely to result in a cache hit for the next data access request. On the other hand, when a cache miss occurs (i.e., the memory access request for a particular byte of data cannot be satisfied by accessing the cache memory), all of the pixel information within the rectangular block in which the desired byte of data is located is read from main memory and stored in the cache memory. Unfortunately, these conventional algorithms do not adequately exploit the characteristics associated with the linearity and strong localities of memories used in graphics applications. Indeed, such algorithms result in an undesirable number of cache misses and, therefore, result in less efficient memory control, lower processing throughput, and inferior graphics imaging.
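A toy model may help illustrate the conventional scheme just described, in which one cache line holds one rectangular block and a miss loads the entire block from main memory. The block geometry, cache capacity, and LRU replacement policy are assumptions for illustration only; the patent does not specify them.

```python
# Toy model of the conventional scheme: one cache line per rectangular
# block; a miss reads the whole block from main memory into the cache.
# Block size, capacity, and the LRU policy are illustrative assumptions.
BLOCK_W, BLOCK_H = 16, 16

class BlockCache:
    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.lines = []          # resident block coordinates, LRU order

    def access(self, x: int, y: int) -> bool:
        """Return True on a cache hit; on a miss, load the whole block."""
        block = (x // BLOCK_W, y // BLOCK_H)
        if block in self.lines:
            self.lines.remove(block)
            self.lines.append(block)     # refresh LRU position
            return True
        if len(self.lines) >= self.capacity:
            self.lines.pop(0)            # evict least-recently-used block
        self.lines.append(block)         # "read the whole block from DRAM"
        return False

cache = BlockCache()
hits = [cache.access(x, 0) for x in range(32)]  # sequential scan of two blocks
```

The sequential scan hits on every access except the first touch of each block, which is exactly the miss pattern at block boundaries that the passage identifies as the weakness of the conventional approach.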
Accordingly, there are needs in the art for new methods and apparatus for controlling an overall memory, including a cache memory, that exploit the linearity and strong localities exhibited in memories used in graphics applications in order to improve memory efficiency, increase processing throughput, and improve the quality of graphics images.
SUMMARY OF THE INVENTION
In accordance with at least one aspect of the present invention, an apparatus for controlling a memory includes: a main memory partitioned into a plurality of two-dimensional blocks, each block including a plurality of data units containing one or more bytes of data, and each given block being partitioned into a plurality of zones, each zone containing one or more of the data units and at least some of the zones being associated with a respective block adjacent to the given block; an address decoder operable to receive addresses to the bytes of data in the main memory, and to decode the addresses to determine in which zones the respective data bytes are located; and a pre-fetch address generator operable to generate a request to pre-fetch the blocks that are associated with respective zones into a cache memory when the address decoder receives addresses to bytes of data located in the respective zones.
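The cooperation of the address decoder and the pre-fetch address generator can be sketched as follows. The block size, zone width, and function names are hypothetical choices for illustration; the patent does not fix these values.

```python
# Minimal sketch of the claimed apparatus, assuming 16x16-pixel blocks and
# a fixed-width boundary zone on each edge. ZONE_W and the names below are
# illustrative assumptions, not values specified by the patent.
BLOCK_W, BLOCK_H = 16, 16
ZONE_W = 4   # width (in data units) of each boundary zone (assumed)

def decode_zones(x: int, y: int) -> list:
    """Address decoder: report which boundary zones pixel (x, y) falls in."""
    px, py = x % BLOCK_W, y % BLOCK_H
    zones = []
    if px < ZONE_W:
        zones.append("left")
    if px >= BLOCK_W - ZONE_W:
        zones.append("right")
    if py < ZONE_W:
        zones.append("upper")
    if py >= BLOCK_H - ZONE_W:
        zones.append("lower")
    return zones

def prefetch_requests(x: int, y: int) -> list:
    """Pre-fetch address generator: block coordinates of the adjacent
    blocks associated with the zones the access falls in.
    (Clamping at the image border is omitted for brevity.)"""
    bx, by = x // BLOCK_W, y // BLOCK_H
    step = {"left": (-1, 0), "right": (1, 0), "upper": (0, -1), "lower": (0, 1)}
    return [(bx + dx, by + dy)
            for zone in decode_zones(x, y)
            for dx, dy in [step[zone]]]
```

An access near a block's left boundary thus issues a pre-fetch request for the block to its left before a sequential scan actually crosses the boundary, so the neighboring block can already be resident in the cache when it is needed.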
By way of example, each of the blocks may include at least one of: a left boundary common with a left adjacent block, a right boundary common with a right adjacent block, an upper boundary common with an upper adjacent block, and a lower boundary common with a lower adjacent block. Further, each of the blocks may include at least one of: a left zone associated with the left adjacent block and including at least one data unit disposed along the left boundary, a right zone associated with the right adjacent block and including at least one data unit disposed along the right boundary, an upper zone associated with the upper adjacent block and including at least one data unit disposed along the upper boundary, and a lower zone associated with the lower adjacent block and including at least one data unit disposed along the lower boundary.
Utilizing this partitioning: the left zone may be defined at least in part by a left dividing line extending from the upper boundary to the lower boundary, the left zone including at least some of the data units located between the left boundary and the left dividing line; the right zone may be defined at least in part by a right dividing line extending from the upper boundary to the lower boundary, the right zone including at least some of the data units located between the right boundary and the right dividing line; the upper zone may be defined at least in part by an upper dividing line extending from the left boundary to the right boundary, the upper zone including at least some of the data units located between the upper boundary and the upper dividing line; and the lower zone may be defined at least in part by a lower dividing line extending from the left boundary to the right boundary, the lower zone including at least some of the data units located between the lower boundary and the lower dividing line.
Prefera
