Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
2001-07-31
2003-12-09
Sparks, Donald (Department: 2187)
C711S141000, C711S144000, C711S146000, C711S147000, C711S148000
active
06662277
ABSTRACT:
FIELD OF INVENTION
This invention relates generally to computer systems and more specifically to cache memory systems.
BACKGROUND OF THE INVENTION
Most computer systems employ a multilevel hierarchy of memory systems, with relatively fast, expensive, limited-capacity memory at the highest level of the hierarchy and proceeding to relatively slower, lower cost, higher-capacity memory at the lowest level of the hierarchy. Typically, the hierarchy includes a small fast memory called a cache, either physically integrated within a processor integrated circuit, or mounted physically close to the processor for speed. There may be separate instruction caches and data caches. There may be multiple levels of caches.
Caches are commonly organized around an amount of memory called a line, or block, or page. The present patent document uses the term “line,” but the invention is equally applicable to systems employing blocks or pages.
Many computer systems employ multiple processors, each of which may have multiple levels of caches. Some caches may be shared by multiple processors. All processors and caches may share a common main memory. A particular line may simultaneously exist in memory and in the cache hierarchies for multiple processors. All copies of a line in the caches must be identical, a property called coherency. The protocols for maintaining coherence for multiple processors are called cache coherence protocols.
A cache “owns” a line if the cache has permission to modify the line without issuing any further coherency transactions. There can be only one “owner” of a line. For any cache coherence protocol, the most current copy of a cache line must be retrieved from the current owner, if any, and a copy of the data must be delivered to the requester. If the line is to be modified, ownership must be acquired by the requester, and any shared copies must be invalidated.
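The ownership rule described above can be illustrated with a minimal sketch. The class `Line`, its methods, and the cache names are hypothetical and introduced only for illustration; the downgrade of the previous owner to a sharer on a read is likewise an assumption, not something the description specifies.

```python
class Line:
    """Tracks which caches hold one memory line (hypothetical model)."""

    def __init__(self):
        self.owner = None      # at most one owning cache, or None
        self.sharers = set()   # caches holding a read-only copy

    def read(self, cache):
        """Deliver a copy to `cache`; return where the data came from."""
        source = self.owner if self.owner is not None else "memory"
        if self.owner is not None and self.owner != cache:
            # Assumption: a read by another cache downgrades the owner to a sharer.
            self.sharers.add(self.owner)
            self.owner = None
        if self.owner != cache:
            self.sharers.add(cache)
        return source

    def write(self, cache):
        """Acquire ownership for `cache`; return the set of invalidated caches."""
        holders = set(self.sharers)
        if self.owner is not None:
            holders.add(self.owner)
        invalidated = holders - {cache}
        self.sharers.clear()   # any shared copies must be invalidated
        self.owner = cache     # the requester becomes the single owner
        return invalidated
```

In this sketch, a write by cache "A" to a line shared by "A" and "B" invalidates the copy in "B", and a later read by "B" is served from "A" rather than from memory.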
There are three common approaches to determine the location of the owner of a line, with many variations and hybrids. In one approach, called a snooping protocol, or snoop-based protocol, the owner is unknown, and all caches must be interrogated (snooped) to determine the location of the most current copy of the requested line. All requests for access to a cache line, by any device in the system, are forwarded to all caches in the system. Eventually, the most current copy of a line is located and a copy is provided to the requester. In a single-bus system, coherence (snooping) traffic, addresses, and often data all share a common bus.
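As a rough illustration of why snoop traffic grows with the number of caches, the following sketch broadcasts one read request to every other cache and counts the messages. The function `snoop`, the dictionary layout, and the state names "owned" and "shared" are hypothetical choices made for this example.

```python
def snoop(caches, requester, address):
    """Broadcast a read request to every cache except the requester.

    `caches` maps a cache name to a dict of {address: (state, data)},
    where state is "owned" or "shared" (hypothetical encoding).
    Returns (data, source, messages_sent).
    """
    messages = 0
    found = None
    for name, lines in caches.items():
        if name == requester:
            continue
        messages += 1                      # every other cache is interrogated
        entry = lines.get(address)
        if entry is not None and entry[0] == "owned":
            found = (entry[1], name)       # the owner holds the most current copy
    if found is None:
        return None, "memory", messages    # no owner: memory supplies the line
    return found[0], found[1], messages
```

With three caches, a request from one of them costs two snoop messages whether or not any cache owns the line, which is the traffic cost the paragraph above refers to.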
In a second approach, called a directory-based protocol, memory is provided to maintain information about the state of every line in the memory system. For example, for every line in memory, a directory may include a bit for each cache hierarchy to indicate whether that cache hierarchy has a copy of the line, and a bit to indicate whether that cache hierarchy has ownership. For every request for access to a cache line, the directory must be consulted to determine the owner, and then the most current copy of the line is retrieved and delivered to the requester. Typically, tags and status bits for a directory are stored in main memory, so that a request for state information cycles main memory and has the latency of main memory. In a multiple bus system, directory traffic may be on a separate bus.
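The directory entry described above, with one presence bit and one ownership bit per cache hierarchy, might be sketched as follows. The class `Directory` and its method names are hypothetical, and a real directory would hold these bits in main memory rather than in Python dictionaries.

```python
class Directory:
    """Per-line presence and ownership bitmasks (hypothetical model)."""

    def __init__(self, num_hierarchies):
        self.n = num_hierarchies
        self.presence = {}    # address -> bitmask, bit i set if hierarchy i has a copy
        self.ownership = {}   # address -> bitmask, bit i set if hierarchy i owns the line

    def record_copy(self, address, hierarchy, owned=False):
        bit = 1 << hierarchy
        self.presence[address] = self.presence.get(address, 0) | bit
        if owned:
            self.ownership[address] = bit   # only one owner at a time

    def lookup(self, address):
        """Return (owner index or None, list of hierarchies holding a copy)."""
        present = self.presence.get(address, 0)
        owned = self.ownership.get(address, 0)
        holders = [i for i in range(self.n) if present & (1 << i)]
        owners = [i for i in range(self.n) if owned & (1 << i)]
        return (owners[0] if owners else None), holders
```

A single `lookup` replaces the broadcast of a snooping protocol, at the cost of one access to wherever the directory is stored.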
A third approach is a global coherency filter, which has a tag for every valid line in the cache system. A coherency filter is a snoop system with a second set of tags, stored centrally, for all caches in the system. A request for a cache line is forwarded to the central filter, rather than to all the caches. The tags for a coherency filter are typically stored in a small high-speed memory. Some coherency filters may only track owned lines, and may not be inclusive of all shared lines in the system. In a multiple bus system, coherency filter traffic may be on a separate bus.
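A coherency filter of the kind that tracks only owned lines could be sketched like this. The class `CoherencyFilter`, its methods, and the convention of returning the string "memory" when no owner is recorded are assumptions made for illustration.

```python
class CoherencyFilter:
    """Central second set of tags; this variant tracks owned lines only."""

    def __init__(self):
        self.owned_tags = {}   # line address -> name of the owning cache

    def note_ownership(self, address, cache):
        """Record that `cache` now owns the line at `address`."""
        self.owned_tags[address] = cache

    def note_eviction(self, address):
        """Drop the tag when the owner gives up the line."""
        self.owned_tags.pop(address, None)

    def route(self, address):
        """Return the single target a request should be forwarded to."""
        return self.owned_tags.get(address, "memory")
```

Because the request goes to the central filter instead of to every cache, only the one cache that owns the line (if any) sees the request.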
For relatively small systems, with one bus or with only a few buses, snoop-based protocols provide the best performance. However, snoop-based systems with one bus increase bus traffic, and for large systems with one bus or with only a few buses, snoop traffic can limit overall performance. Directory-based systems increase the time required to retrieve a line (latency) relative to snooping on a single bus, but in a multiple-bus system a directory requires less coherency traffic on the system buses than snoop-based systems. For large multiple-bus systems, where bus traffic may be more important than latency, directory-based systems typically provide the best overall performance. Many computer systems use some sort of hybrid of snoop-based and directory-based protocols. For example, for a multiple bus system, snoop-based protocols may be used for coherency on each local bus, and directory-based protocols may be used for coherency across buses.
If a processor requests a line, the overall time required to retrieve the line (overall latency) includes (1) the time required to acquire access rights using a cache coherency protocol, (2) the time required to process an address, and (3) the time required to retrieve and transfer the data. As discussed above, bus traffic for coherency requests can limit overall performance.
One way to decrease bus traffic for coherency requests is to increase the line size. For example, if contiguous lines are requested, each line requires a separate coherency request. If line size is doubled, twice as much data is read for each coherency request. In addition, a substantial part of overall latency is the time required to route a memory request to the various memory components and to get the data from those components. Larger lines provide more data for each request. However, as lines become even larger, much of the data transferred is not needed, and much of the cache space is filled with data that is not needed. This increases the bus traffic for data transfer, and increases the cache miss rate, both of which negatively impact overall performance. In addition, some fraction of a line may be needed exclusively by more than one processor or node. This can cause excessive cache-to-cache copy activity as the two processors or nodes fight for ownership, and the resulting number of coherency requests may increase.
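The trade-off can be made concrete with a little arithmetic. The sketch below assumes a contiguous, line-aligned read; the function names and the byte counts used with them are hypothetical examples, not figures from the description.

```python
def coherency_requests(total_bytes, line_bytes):
    """Coherency requests needed to read a contiguous, line-aligned region."""
    return -(-total_bytes // line_bytes)   # ceiling division


def wasted_bytes(useful_bytes, line_bytes):
    """Bytes transferred but not needed when only `useful_bytes` are used."""
    return coherency_requests(useful_bytes, line_bytes) * line_bytes - useful_bytes
```

Under these assumptions, reading 1024 bytes takes 16 requests with 64-byte lines and 8 requests with 128-byte lines. If only 100 bytes are actually needed, 64-byte lines transfer 28 unneeded bytes, while 512-byte lines transfer 412.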
As an alternative, it is known to permit partial line (or partial block) invalidation. It is also known to prefetch extra sub-lines. For example, see C. K. Liu and T. C. King, A Performance Study on Bounteous Transfer in Multiprocessor Sectored Caches, The Journal of Supercomputing, 11, 405-420 (1997). Liu and King describe a coherence protocol for invalidating sub-lines, and for prefetching of multiple sub-lines.
There is an ongoing need to reduce overall latency while maintaining coherency, particularly for large multiple-bus systems.
SUMMARY OF THE INVENTION
A computer system retrieves and stores groups of lines. Coherency states are maintained for groups of lines and for individual lines. Alternatively, the coherency state of a group of lines can be deduced from the coherency states of all of its sublines. A single coherency transaction, and a single address transaction, can then result in the transfer of multiple lines of data, reducing overall latency. Even though lines may be retrieved as a group, the lines can subsequently be treated separately. This avoids many of the problems caused by long lines, such as increased cache-to-cache copy activity. There may be multiple owners of lines within a group of lines. Special instructions may be implemented that request up to a group of lines. That is, depending on ownership, the instruction may result in only one line being transferred, or up to an entire group of lines being transferred. For multiple-bus systems, latency may be further reduced by preferably retrieving unowned lines from memory rather than from caches.
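One possible reading of the "request up to a group of lines" behavior is sketched below. Everything here is an illustrative assumption: the group size of four, the class `GroupDirectory`, the state names, and in particular the policy that the requested line is always transferred while other lines in the group are transferred only if no cache owns them. The description above does not fix these details.

```python
LINES_PER_GROUP = 4   # hypothetical group size


class GroupDirectory:
    """Per-line owner records, organized by group of lines (hypothetical model)."""

    def __init__(self):
        self.owner = {}   # line number -> owning cache; absent if unowned

    def group_state(self, group):
        """Deduce a group's state from the states of its individual lines."""
        first = group * LINES_PER_GROUP
        owners = {self.owner.get(n) for n in range(first, first + LINES_PER_GROUP)}
        if owners == {None}:
            return "unowned"
        if len(owners) == 1:
            return "owned by " + owners.pop()
        return "mixed"   # several owners, or some lines owned and some not

    def request_up_to_group(self, cache, line):
        """One transaction: take `line` plus every unowned line in its group."""
        first = (line // LINES_PER_GROUP) * LINES_PER_GROUP
        granted = []
        for n in range(first, first + LINES_PER_GROUP):
            if n == line or self.owner.get(n) is None:
                self.owner[n] = cache
                granted.append(n)
        return granted
```

In this sketch, if another cache already owns line 2, a request for line 0 transfers lines 0, 1, and 3 in one transaction and leaves line 2 with its owner, so the group ends up with multiple owners. A request into a completely unowned group transfers all four lines.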
Dinh Ngoc
Hewlett-Packard Development Company, L.P.
Sparks Donald
Winfield Augustus W.