Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
1999-05-03
2003-09-23
Portka, Gary J (Department: 2188)
C711S144000, C711S159000
Reexamination Certificate
active
06625694
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field
The present invention relates generally to cache coherence in multiprocessor data processing systems, and more particularly to enhancing operation of caches with an algorithm for selecting a cache directory entry.
2. Discussion of the Background Art
A computer system node may be divided into a memory subsystem and a processor subsystem. The memory subsystem includes the main Dynamic Random Access Memory (DRAM) and provides data from memory in response to requests from any number of connected processors. Normally, the time required to access data in the memory subsystem is quite long relative to the processor's speed, so processors are often built with caches to improve their performance. The processor subsystem includes the processors and one or more caches. A cache is a small memory, connected between the processor and main memory, that stores recently used data from the main memory. A cache is much faster to access than the main memory subsystem, and is usually much smaller. The smallest unit of data that can be transferred into and out of a cache is called a cached “line.” The data in memory that corresponds to a cached line is called a memory line. A data line refers to either a cached line or a memory line.
All caching architectures divide main memory into physically consecutive segments comprising one or more memory lines, many of which correspond to a plurality of cached lines. Accessing a cached line requires a segment tag to identify the segment that corresponds to the line and a line index to identify the line within the segment. Those skilled in the art will recognize that if a segment has only one line then a line index is not required. If a processor requests a data line that is already contained in the local cache, then that data line is delivered to the processor. Otherwise, the processor gets the data line from main memory.
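As a concrete illustration of this addressing scheme, the sketch below splits a physical address into a segment tag and a line index. The 64-byte line size and 64-line segment size are assumptions chosen for the example, not parameters taken from the patent.

```c
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES    64u   /* assumed size of one cached line     */
#define LINES_PER_SEG 64u   /* assumed number of lines per segment */

/* Split a physical address into the segment tag that identifies the
 * segment and the line index that locates the line within it. */
static void split_address(uint64_t addr, uint64_t *seg_tag, uint64_t *line_idx)
{
    uint64_t line_number = addr / LINE_BYTES;  /* which memory line     */
    *line_idx = line_number % LINES_PER_SEG;   /* line within segment   */
    *seg_tag  = line_number / LINES_PER_SEG;   /* which segment         */
}

int main(void)
{
    uint64_t tag, idx;
    split_address(0x12345678u, &tag, &idx);
    printf("tag=%llu index=%llu\n",
           (unsigned long long)tag, (unsigned long long)idx);
    return 0;
}
```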
Set-associative and fully-associative caches are “multiple way,” meaning a directory entry references multiple cached lines that have the same memory segment index but are from different segments. Compared to a direct-mapped cache, this can improve the cache-hit rate because the multiple-way directory reduces contention between active cache lines that map to the same index. Direct mapping of cache lines avoids the question of selecting a directory entry to replace when an entry is needed to reference a newly requested cached line, but fully-associative and set-associative cache mapping schemes require a replacement protocol to select which directory entry, and hence which cached line, should be replaced. The most popular protocol is the Least Recently Used (LRU) protocol, which replaces the cache line that has not been used for the longest time.
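The following sketch shows one common software model of LRU victim selection within a set, assuming each way records a last-use counter; real caches usually approximate LRU with cheaper hardware state. The four-way set size is an assumption drawn from the typical range noted below.

```c
#include <stdint.h>

#define NUM_WAYS 4u  /* assumed four-way set */

struct way_state {
    uint64_t tag;        /* segment tag of the cached line           */
    uint64_t last_used;  /* value of a global counter at last access */
    int      valid;      /* nonzero if this way holds a line         */
};

/* Return the way to replace: any invalid (free) way first, otherwise
 * the way whose last_used counter is oldest (classic LRU). */
static unsigned lru_victim(const struct way_state set[NUM_WAYS])
{
    unsigned victim = 0;
    for (unsigned w = 0; w < NUM_WAYS; w++) {
        if (!set[w].valid)
            return w;                   /* free way: no replacement needed */
        if (set[w].last_used < set[victim].last_used)
            victim = w;                 /* older than current candidate */
    }
    return victim;
}
```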
Typically, a set-associative cache is four- to eight-way while a fully-associative cache is thirty-two- to sixty-four-way.
In a shared-memory multiprocessor system, each processor usually has its own cache, so the system has multiple caches. Since each cache can hold a copy of a given data line, it is important to keep all cached copies of that line consistent and up-to-date with the latest version written by any one of the processors. A memory subsystem is usually responsible for returning, from the caches or main memory, the correct value as prescribed by the processor's memory model, which includes a cache-coherence protocol having a set of rules to govern the operation of caches.
To maintain cache coherence across the system, the cache-coherence protocol uses a directory that contains cache-coherence control information. The directory, usually part of the memory subsystem, has an entry for each main memory location with state information indicating whether the memory data may also exist in a cache elsewhere in the system. The coherence protocol specifies all transitions and transactions to be taken in response to a memory request. Any action taken on a cache line is reflected in the state stored in the directory. A common cache coherence scheme uses three permanent states to accomplish this:
Invalid: Line is not cached anywhere. Main memory has the only copy.
Shared: Line is valid in at least one cache at a remote node.
Dirty: Line is valid in one cache at a remote node. The copy may be modified by the processor in that remote node. The main memory may contain old data.
The coherence protocol may use other transient states to indicate that a line is in transition. Given enough time, these transient states revert to one of the above permanent states.
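One way to represent these directory states in software is shown below; the field names and the sharer bit-vector are illustrative assumptions, not structures drawn from the patent.

```c
#include <stdint.h>

enum dir_state {
    DIR_INVALID,   /* line not cached anywhere; memory has only copy */
    DIR_SHARED,    /* valid in at least one cache at a remote node   */
    DIR_DIRTY,     /* valid in exactly one remote cache; main memory
                      may contain old data                           */
    DIR_TRANSIENT  /* line is in transition between permanent states */
};

struct dir_entry {
    uint64_t       seg_tag;  /* identifies the memory line            */
    uint64_t       sharers;  /* assumed bit-vector of caching nodes   */
    enum dir_state state;
};
```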
On every memory request from a processor, a memory subsystem must determine which cache, if any, holds the most recent version of the requested memory line. In a “snoopy protocol,” each cache “snoops” every request and signals the memory subsystem if it has the most recent version of the cached line. Alternatively, the memory subsystem can keep a duplicate of each cache's tags and use it to locate the most recent version of the cached line. A duplicate-tag-based method is sometimes called a “directory-based cache-coherence protocol.”
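A rough sketch of the duplicate-tag alternative follows. The array dimensions and the linear scan over every tag are simplifying assumptions; a real memory subsystem would index the duplicate tags the same way the caches themselves do.

```c
#include <stdint.h>

#define NUM_CACHES     4u
#define TAGS_PER_CACHE 256u

/* dup_tags[c][i] mirrors tag i of cache c; 0 is treated as "empty". */
static uint64_t dup_tags[NUM_CACHES][TAGS_PER_CACHE];

/* Return the index of a cache holding the line, or -1 if none does,
 * in which case main memory supplies the data. */
static int find_owner(uint64_t seg_tag)
{
    for (unsigned c = 0; c < NUM_CACHES; c++)
        for (unsigned i = 0; i < TAGS_PER_CACHE; i++)
            if (dup_tags[c][i] == seg_tag)
                return (int)c;
    return -1;
}
```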
FIG. 1 shows a prior art system 100 including multiple CPUs 102A, 102B, 102C, and 102D having respective local caches 110A, 110B, 110C, and 110D connected by a bus 118 to a memory controller 120 for the main DRAM memory 122. In this example, main memory 122 has, for each memory line, a space reserved for a directory 124 entry, and therefore wastes memory space because the total number of cached lines, which determines the number of entries in directory 124, is usually much smaller than the total number of memory lines in memory 122. Further, the cache-coherence protocols for prior art system 100 are deficient in that, as the number of caches 110 and the size of memory 122 increase, the size of directory 124 becomes objectionably large.
System 100 may be improved by using a sparse directory, which is a cache of directory entries. However, a replacement algorithm that selects a directory entry for referencing a new cached line without regard to the state of the existing cached line can cause heavy data traffic between memory 122 and caches 110, and thus degrade system performance.
Therefore, what is needed is a replacement algorithm for use in a sparse directory that can solve the above deficiencies.
SUMMARY OF THE INVENTION
The present invention provides an algorithm to allocate a directory entry to store the state of a cached line in response to a memory request from a processor. The algorithm searches the directory for an entry. If at least one free entry is available, then the algorithm uses one of the available entries. Otherwise, the algorithm searches for a “shared” entry, and if at least one shared entry is found, the algorithm preferably applies a least-recently-used (LRU) criterion to choose among the available shared entries. Otherwise, the algorithm searches for a “dirty” entry; if at least one dirty entry is found, the algorithm preferably applies the LRU criterion to choose among the available dirty entries. The algorithm uses an LRU criterion because entries that were allocated long ago and have not been used recently are more likely to be stale. To increase system performance, the algorithm preferably searches for a shared entry before searching for a dirty entry.
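Under the assumption of a set-associative sparse directory with a per-entry last-use counter, the allocation priority described above might look like the following sketch. It is one reading of this summary, not the patented implementation; the eight-way set size and the handling of transient entries are assumptions.

```c
#include <stdint.h>

#define SET_WAYS 8u  /* assumed associativity of the sparse directory */

enum dir_state { DIR_INVALID, DIR_SHARED, DIR_DIRTY, DIR_TRANSIENT };

struct sparse_entry {
    enum dir_state state;
    uint64_t       last_used;  /* counter value at last access, for LRU */
};

/* Pick the entry to allocate within one directory set:
 * 1. any free (invalid) entry;
 * 2. else the least recently used shared entry;
 * 3. else the least recently used dirty entry. */
static unsigned allocate_entry(const struct sparse_entry set[SET_WAYS])
{
    int best_shared = -1, best_dirty = -1;

    for (unsigned w = 0; w < SET_WAYS; w++) {
        switch (set[w].state) {
        case DIR_INVALID:
            return w;  /* free entry: use it immediately */
        case DIR_SHARED:
            if (best_shared < 0 ||
                set[w].last_used < set[best_shared].last_used)
                best_shared = (int)w;
            break;
        case DIR_DIRTY:
            if (best_dirty < 0 ||
                set[w].last_used < set[best_dirty].last_used)
                best_dirty = (int)w;
            break;
        default:
            break;     /* skip transient entries */
        }
    }
    if (best_shared >= 0)
        return (unsigned)best_shared;  /* prefer shared over dirty */
    if (best_dirty >= 0)
        return (unsigned)best_dirty;
    return 0;  /* all entries transient: a real design would retry */
}
```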
These and other advantages of the invention will become apparent to those skilled in the art from the following detailed description and the accompanying drawings.
REFERENCES:
patent: 5265232 (1993-11-01), Gannon et al.
patent: 5377345 (1994-12-01), Chang et al.
patent: 5848434 (1998-12-01), Young et al.
patent: 5897655 (1999-04-01), Mallick
patent: 5933849 (1999-08-01), Srbljic et al.
patent: 6138217 (2000-10-01), Hamaguchi
patent: 6185658 (2001-02-01), Arimilli et al.
U.S. patent application Ser. No. 09/287,650, Takeshi Shimizu, Credit-Based Protocol for Over-Run Protection in a Multi-Processor Computer System, filed Apr. 4, 1999.
U.S. patent application Ser. No. 0
Masri Nabil N.
Weber Wolf-Dietrich
Carr & Ferrell LLP
Fujitsu Ltd.
Portka Gary J