Method and apparatus for optimizing bcache tag performance...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C711S122000

Reexamination Certificate

active

06401173

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to data processing system design and in particular to accelerating cache memory performance.
BACKGROUND OF THE INVENTION
A number of microprocessor design features are now considered essential in order to obtain high performance at low cost. For example, processor implementations now typically allow issuance of multiple instructions during each clock cycle. Most such processors also employ cache memories to provide a latency and bandwidth advantage for reasonably large data blocks such as on the order of one megabyte or more. The cache memories permit high speed pipelined execution to occur while minimizing delays associated with reading and writing data.
Cache memories operate by mirroring the contents of main memory in a way which is transparent to a Central Processing Unit (CPU). For example, each memory address referenced by an instruction is first passed to a cache controller. The cache controller keeps track of which portions of main memory are currently assigned to the cache. If the cache is currently assigned to hold the contents of the requested address, a “cache hit” occurs and the cache is enabled to complete the memory reference access, whether it be a write or a read access. If this is not the case, a “cache miss” has occurred, and the main memory is enabled for access. When a miss occurs, the cache controller typically assigns the miss address to the cache, fetches or “fills” the data contained at that address from main memory and stores it in the cache, and if necessary, displaces the contents of a corresponding cache location.
Cache memories are implemented in a hierarchy with a primary data store or main memory being the lowest order of the hierarchy, a secondary cache or backup cache (“bcache”) being a middle level of the hierarchy, and a primary level cache or “dcache” being the highest level cache. The bcache, for example, may be a “board level” cache implemented with memory chips external to the processor chip and the dcache may be implemented with on-chip memory devices.
It is desirable for the physical existence of the various cache hierarchy levels to be transparent. For example, the programmer should only have to worry about implementing instructions and not be concerned with the details of whether a particular target address is located in the dcache, bcache, or main memory. Furthermore, the programmer should be permitted to assume that the data written back to memory by a store instruction (STx) will always be written back properly. This important property of cache hierarchies is known as cache coherency.
In general, cache memories consist of a tag portion in addition to a data storage portion. The tag portion contains address and status bits for the data contained in the storage portion. The data portion contains typically multiple data bytes for each addressable cache location.
To complete an instruction reference to a cache memory, the data and tag memories are first read. If the referenced address matches the address in the tag portion, a hit occurs and then data associated with the tag is delivered to the consuming instruction. If the tags do not match, the data referenced by the consuming instruction must then be fetched and written into the cache. Cache filling is an operation by which the contents of the cache are copied back to main memory, and must typically be performed prior to displacing a “victim” cache location with the new data in order to avoid losing the contents of the victim location. It is therefore common to include a so-called “dirty” bit with each cache location, indicating whether the data for the cache location is different from the corresponding data in the next lower level of the hierarchy.
After the victim block has been written, but before the memory fill for the new block may proceed, the tag array still contains the victim address. During this period of time, if the same location is again accessed by another outside agent such as a second processor, the cache might provide a false hit response. One way of dealing with this problem is to allow this false hit response to occur, but then depend upon the fact that the data in the cache is the same as the data in the main memory until the memory fill updates the processor. This assumption is valid when the processor and the cache use a shared data bus. For example, in most known computing system architectures, the caches and main memory typically share a data bus. Therefore, the necessary fill operations may be completed for each level of the cache simultaneously.
A complication for cache management occurs if the system architecture permits sharing of write access to main memory locations among processors. Probe commands are therefore typically used in such architectures to allow one processor to inform another processor that it is attempting to write to a particular location. This allows the first processor to properly execute the store conditional instructions. However, the need to support such probe commands requires each processor to be able to determine whether it presently has the only valid contents of a main memory location in one or more of its own caches.
SUMMARY OF THE INVENTION
In the present invention, the primary level cache (dcache) and second level cache (bcache) do not share a common bus for access to the main memory. Rather, the dcache is provided with two different data buses to separately access the main memory and the bcache in order to provide higher bandwidth to each of these structures. In this case, a memory fill operation from the main memory may be consumed directly by the dcache without first having to wait for a fill operation in the bcache to complete.
Since the bcache bus is normally a high speed pipelined read bus, this avoids necessarily turning around the bus in order to update the bcache during the pendency of some other critical operation. Otherwise, one would have to wait for the bcache pipeline to drain, initiate the store operation, wait for the pipeline to drain again, and then turn the pipeline back around for subsequent read operations.
While this architecture improves processor performance by allowing the higher speed dcache memory to complete a fill operation without waiting for the slower bcache memory, there is a problem in that strict cache hierarchy rules are violated. In effect, the rules of cache hierarchy coherency are temporarily “bypassed” in the sense that the bcache is not immediately updated with the fill data. Thus, the bcache may not be updated for a long period of time and there is no guarantee that “stale” data in the bcache is the same as that in main memory.
A simple solution to this problem might be to either invalidate the bcache tag on all bcache victim operations, or otherwise to insure that all fill operations are cycled through the bcache, i.e., disable the bypassing mode. However, these approaches either consume precious bcache tag or dcache memory bandwidth.
Thus, while a processor according to the present invention uses two independent memory access ports for the bcache memory and the main memory, a set of rules are also observed by the processor to enable it to infer the bcache state without unnecessarily performing bcache reads.
In accordance with the invention, upon the issuance of a memory reference instruction such as a load or store instruction, the dcache memory array is first checked to see if it has the contents of the referenced location as is typical. If there is a hit in the dcache, then the memory access is complete.
However, if a dcache miss occurs, then a bcache read is initiated. In the process of reading data from the bcache, if it becomes apparent that a dcache victim operation will be required, i.e., that the dcache is already full and dcache locations will need to be displaced in order to copy new information from the bcache to the dcache, a determination is first made as to whether or not the dcache victim block is dirty. If the dcache victim block is dirty, this block must be scheduled for eviction either to the bcache or to main

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for optimizing bcache tag performance... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for optimizing bcache tag performance..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for optimizing bcache tag performance... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2977526

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.