System and method for deleting read-only head entries in...

Type: Reexamination Certificate
Filed: 1999-01-22
Issued: 2001-11-20
Examiner: Kim, Matthew (Department: 2186)
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
U.S. classes: C711S102000, C711S103000, C711S145000
Status: Active
Patent number: 06321304
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to cache coherence in computer systems having multiple processors with caches, and more particularly to a system and method for deleting a read-only head entry in systems supporting mixed-coherence protocol options.
2. Description of the Background Art
Multiple-processor computer systems contain several processors that may work simultaneously, each on a separate portion of one problem or on different problems altogether.
FIG. 1 shows a multi-processor system, including a plurality of Central Processing Units (CPUs) or processors 102A, 102B . . . 102N, communicating with memory 104 via interconnect 106, which could be, for example, a bus or a collection of point-to-point links. Processors 102 access data from memory 104 for a read or a write. In a read operation, processor 102 receives data from memory 104 without modifying the data, while in a write operation processor 102 modifies the data transmitted to memory 104.
Each processor 102 generally has a respective cache unit 108A, 108B, . . . 108N, which is a relatively small group of high-speed memory cells dedicated to that processor. A processor 102's cache 108 is usually on the processor chip itself or may be on separate chips, but is local to processor 102. Cache 108 for each processor 102 is used to hold data that was accessed recently by that processor. Since a processor 102 does not have to go through the interconnecting bus 106 and wait for bus traffic, the processor 102 can generally access data in its cache 108 faster than it can access data in the main memory 104. In normal operation, a processor 102N first reads data from memory 104 and copies that data to the processor's own cache 108N. During subsequent accesses to the same data, the processor 102N fetches the data from its own cache 108N. In effect, after the first read, the data in cache 108N is the same copy of the data in memory 104, except that the data now resides in high-speed local storage. Typically, cache 108N can be accessed in one or two cycles of CPU time, while a processor 102 needs 15 to 50 cycles to access memory 104. A typical processor 102 runs at about 333 MHz, or 3 ns (nanoseconds) per cycle, but takes at least 60 ns, or 20 cycles, to access memory 104.
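To make those round numbers concrete, the following sketch computes the cycle counts they imply. It is illustrative arithmetic only; nothing beyond the quoted 333 MHz, 60 ns, and one-to-two-cycle figures comes from the patent.

    #include <stdio.h>

    /* Round numbers quoted in the text above, not measured data. */
    #define CLOCK_MHZ        333
    #define MEM_LATENCY_NS   60.0
    #define CACHE_HIT_CYCLES 2

    int main(void) {
        double cycle_ns   = 1000.0 / CLOCK_MHZ;        /* ~3 ns per cycle */
        double mem_cycles = MEM_LATENCY_NS / cycle_ns; /* ~20 cycles      */
        printf("cycle time: %.1f ns\n", cycle_ns);
        printf("memory: %.0f cycles vs. %d-cycle cache hit (%.0fx slower)\n",
               mem_cycles, CACHE_HIT_CYCLES,
               mem_cycles / CACHE_HIT_CYCLES);
        return 0;
    }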
A measure of data, typically 32, 64, 128, or 2^n bytes, brought from memory 104 to cache 108 is usually called a “cache line.” The data of which a copy was brought to cache 108, and which remains in memory 104, is called a “memory line.” The size of a cache line or memory line is set by balancing the per-operation overhead of a read/write against the amount of data usually transferred between memory and cache. An efficient cache-line size results in transfers spending about 25% of their time on overhead and 75% on actual data transfer.
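One way to see where such a balance comes from is the minimal model below. It assumes a hypothetical fixed per-transaction overhead and bus width; the constants OVERHEAD_CYCLES and BUS_WIDTH_BYTES are assumptions for illustration, not values from the patent.

    #include <stdio.h>

    /* Hypothetical transfer model: each read/write pays a fixed overhead
     * (command/address phases) plus one cycle per bus-width chunk of data. */
    #define OVERHEAD_CYCLES 8.0
    #define BUS_WIDTH_BYTES 8.0

    /* Fraction of a transfer spent moving actual data, per line size. */
    static double data_fraction(double line_bytes) {
        double data_cycles = line_bytes / BUS_WIDTH_BYTES;
        return data_cycles / (data_cycles + OVERHEAD_CYCLES);
    }

    int main(void) {
        double sizes[] = { 32, 64, 128, 256 };
        for (int i = 0; i < 4; i++)
            printf("%3.0f-byte line: %2.0f%% data, %2.0f%% overhead\n",
                   sizes[i], 100.0 * data_fraction(sizes[i]),
                   100.0 * (1.0 - data_fraction(sizes[i])));
        return 0;
    }

Under these assumed constants the 75%/25% split falls between 128- and 256-byte lines; different overhead and bus-width figures move that point.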
A particular problem with using caches is that data becomes “stale.” A first processor 102A may access data in the main memory 104 and copy the data into its cache 108A. If the first processor 102A then modifies the cache line of data in its cache 108A, then at that instant the corresponding memory line becomes stale. If a second processor, 102B for example, subsequently accesses the original data in the main memory 104, the second processor 102B will not find the most current version of the data, because the most current version is in cache 108A. For each cache-line address, cache coherence guarantees that only one copy of the data in cache 108 can be modified. Identical copies of a cache line may be present in multiple caches 108, and thus be read by multiple processors 102 at the same time, but only one processor 102 is allowed to write, i.e., modify, the data. After a processor 102 writes to its cache 108, that processor 102 must “invalidate” any copies of that data in other caches to notify the other processors 102 that their cache lines are no longer current.
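As an illustration of this write-invalidate rule, the sketch below models the step in C over a flat array of caches. The types and names (line_state, write_line) are hypothetical and deliberately simplified; the patent itself addresses a distributed list-based mechanism rather than this centralized view.

    #include <stddef.h>

    /* Simplified per-cache state for one cache-line address (MSI-style). */
    enum line_state { INVALID, SHARED, MODIFIED };

    struct cache {
        enum line_state state;  /* this cache's copy of the line */
        /* data, tags, etc. omitted */
    };

    /* Before modifying its copy, the writing processor invalidates every
     * other cached copy, so that only one modifiable copy remains. */
    void write_line(struct cache caches[], size_t ncaches, size_t writer) {
        for (size_t i = 0; i < ncaches; i++)
            if (i != writer && caches[i].state != INVALID)
                caches[i].state = INVALID; /* copy no longer current */
        caches[writer].state = MODIFIED;   /* the sole writable copy */
    }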
FIG. 2A shows valid cache lines D0 for caches 108A to 108N, whereas FIG. 2B shows cache 108B with an updated cache line D1 and the other caches 108A, 108C, and 108N with invalidated cache lines D0. The processors 102A, 102C, and 102N with invalidated cache data D0 in their respective caches 108 must fetch the updated version of cache line D1 if they want to access that data line again.
Normally, and for illustrative purposes in the following discussion, cache coherence protocols are executed by processors 102 associated with their related caches. However, in other embodiments these protocols may be executed by one or more specialized and dedicated cache controllers.
There are different cache coherence management methods for permitting a processor 102 to modify its cache line in cache 108 and invalidate the other cache lines. One method (related to the present invention) utilizes, for each cache line, a respective “shared list” representing the cache-line correspondences by “double-links,” in which each cache has a forward pointer pointing to the next cache entry in the list and a backward pointer pointing to the previous cache entry in the list. Memory 104 has a pointer which always points to the head of the list.
FIG. 3 shows a linked list 300 of caches 108A . . . 108N with the associated memory 104. Memory 104 has a pointer which always points to the head of the list (cache 108A), while the forward pointers Af, Bf, and Cf of caches 108A, 108B, and 108C respectively point forward to the succeeding caches 108B, 108C, and 108D (not shown). Similarly, the backward pointers Nb, Cb, and Bb of caches 108N, 108C, and 108B respectively point backward to the preceding caches. Because each cache unit 108 is associated with a respective processor 102, a linked-list representation of caches 108 is also understood as a linked-list representation of processors 102.
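The doubly linked structure of FIG. 3 can be sketched in C roughly as follows. The field names fwd and bwd stand in for the Af/Bb-style pointers in the figure, and the pointer layout is illustrative; real hardware would encode these links as node identifiers in the cache tags.

    /* One shared-list entry per cache, per cache-line address. */
    struct list_entry {
        struct list_entry *fwd;  /* forward pointer: next cache in list      */
        struct list_entry *bwd;  /* backward pointer: previous cache in list */
        /* cached data and entry state omitted */
    };

    /* Memory-side state for one memory line. */
    struct mem_line {
        struct list_entry *head; /* always points at the head of the list */
    };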
There are typically two types of cache sharing lists. The first type is the read-only (sometimes called “fresh”) list of caches, for which none of the processors 102 has permission to modify the data. The second type is a read-write (sometimes called “owned”) list of caches, for which the head-of-list processor 102 may have permission to write to its cache 108. A list is considered “stable” after an entry has been completely entered into or completely deleted from the list. Each of the stable list states is defined by the state of the memory and the states of the entries in the shared list. Relevant states of memory include HOME, FRESH, and STALE. HOME indicates no shared list exists, FRESH indicates a read-only shared list, and STALE indicates the shared list is a read-write list whose data can be modified. A processor 102 must get authorization to write to or read from memory. A list entry always enters the list as the list head, and the action of entering is referred to as “prepending” to the list. If a list is FRESH (the data is the same as in memory), the entry that becomes the newly created head receives data from memory; otherwise it receives data from the previous list head. In a read-write list, only the head is allowed to modify (or write to) its own cache line, and after the head has written the data, the head must invalidate the other stale copies in the shared list. In one embodiment, invalidation is done by purging the pending invalidated entries of the shared list.
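Below is a minimal sketch of the prepend operation just described, with hypothetical types. It models the HOME-to-FRESH transition for a first reader; it is an illustrative model, not the patent's hardware sequence.

    #include <stddef.h>

    /* Memory states named in the text above. */
    enum mem_state { HOME, FRESH, STALE };

    struct entry {                 /* one cache's shared-list entry */
        struct entry *fwd, *bwd;
    };

    struct line {
        enum mem_state state;
        struct entry *head;        /* memory always points at the list head */
    };

    /* New entries always enter at the head ("prepending"). A read request
     * against a HOME line creates a one-entry read-only (FRESH) list, as in
     * FIG. 4A; a later reader would prepend ahead of the existing head. */
    void prepend_reader(struct line *m, struct entry *e) {
        e->fwd = m->head;          /* old head becomes the successor */
        e->bwd = NULL;             /* the head has no predecessor    */
        if (m->head)
            m->head->bwd = e;
        m->head = e;
        if (m->state == HOME)
            m->state = FRESH;      /* first reader: HOME -> FRESH    */
    }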
FIGS. 4A-4F illustrate how the two types of lists are created and grown. Each of FIGS. 4A-4F includes a before list and an after list, together with the states of the list and of memory 104. In FIG. 4A, memory 104 is initially in the HOME state, indicating there is no cache shared list. Processor 102A requests permission to read a cache line. Since this is a read request, memory 104 changes from the HOME state to the FRESH state, and the resulting after list 402AR is a read-only list with one entry, 108A. Cache 108A receives data from memory 104 because cache 108A access . . .
Inventor: Tzeng, Fred F.
Assignee: Apple Computer Inc.
Attorney: Simon, Nancy R. (Simon & Koerner LLP)
Examiner: Kim, Matthew