Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
2002-01-31
2004-01-27
Nguyen, Hiep T. (Department: 2187)
C711S128000, C711S133000
Reexamination Certificate
active
06684297
ABSTRACT:
BACKGROUND
1. Field of the Invention
The present invention relates to the design of multiprocessor systems. More specifically, the present invention relates to a method and an apparatus for using a reverse directory located at a lower-level cache to facilitate operations involving higher-level caches that perform accesses through the lower-level cache.
2. Related Art
In order to achieve high rates of computational performance, computer system designers are beginning to employ multiple processors that operate in parallel to perform a single computational task. One common multiprocessor design includes a number of processors 151-154 coupled to level one (L1) caches 161-164 that share a single level two (L2) cache 180 and a memory 183 (see FIG. 1A). During operation, if a processor 151 accesses a data item that is not present in local L1 cache 161, the system attempts to retrieve the data item from L2 cache 180. If the data item is not present in L2 cache 180, the system first retrieves the data item from memory 183 into L2 cache 180, and then from L2 cache 180 into L1 cache 161.
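The read path described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the caches are unbounded dictionaries, and the names `read`, `l1_cache`, `l2_cache` and `memory` are hypothetical and do not come from the patent.

```python
# Toy model of the read path: L1 miss -> L2 -> memory.
# Assumption: caches are plain dicts with no capacity limit or eviction.

def read(addr, l1, l2, memory):
    """Return (value, source), where source names the level that held the data."""
    if addr in l1:
        return l1[addr], "L1"
    if addr in l2:
        source = "L2"
    else:
        # L2 miss: the item is first brought from memory into L2.
        l2[addr] = memory[addr]
        source = "memory"
    # The item is then copied from L2 into the requesting L1.
    l1[addr] = l2[addr]
    return l1[addr], source


memory = {0x40: "A", 0x80: "B"}
l1_cache, l2_cache = {}, {}
first = read(0x40, l1_cache, l2_cache, memory)   # misses in both caches
second = read(0x40, l1_cache, l2_cache, memory)  # now hits in L1
```

After the first access the item resides in both caches, so the second access is satisfied locally.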
Note that coherence problems can arise if a copy of the same data item exists in more than one L1 cache. In this case, modifications to a first version of a data item in L1 cache 161 may cause the first version to be different than a second version of the data item in L1 cache 162.
In order to prevent coherency problems, computer systems often provide a coherency protocol that operates across bus 170. A coherency protocol typically ensures that if one copy of a data item is modified in L1 cache 161, other copies of the same data item in L1 caches 162-164, in L2 cache 180 and in memory 183 are updated or invalidated to reflect the modification.
Coherence protocols typically perform invalidations by broadcasting invalidation messages across bus 170. If such invalidations occur frequently, these invalidation messages can potentially tie up bus 170, and can thereby degrade overall system performance.
In order to remedy this problem, some designers have begun to explore the possibility of maintaining directory information within L2 cache 180. This directory information specifies which L1 caches contain copies of specific data items. This allows the system to send invalidation information to only the L1 caches that contain the data item instead of sending a broadcast message to all L1 caches. (This type of system presumes that there exist separate communication pathways for invalidation messages to each of the L1 caches 161-164, unlike the example illustrated in FIG. 1A, which uses a single shared bus 170 to communicate with L1 caches 161-164.)
However, note that storing directory information for each entry in L2 cache 180 is wasteful because L2 cache 180 typically has many more entries than L1 caches 161-164. This means that most of the entries for directory information in L2 cache 180 will be empty.
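A short calculation makes the waste concrete. The cache sizes below are illustrative assumptions only (the text above gives no sizes): a 4 MB L2 cache, four 64 KB L1 caches, and 64-byte lines.

```python
# Illustrative sizes only; none of these numbers come from the patent text.
LINE_BYTES = 64
L2_BYTES = 4 * 1024 * 1024   # assumed 4 MB L2 cache
L1_BYTES = 64 * 1024         # assumed 64 KB per L1 cache
NUM_L1 = 4                   # four L1 caches, as in FIG. 1A

l2_lines = L2_BYTES // LINE_BYTES                    # directory slots if kept per L2 entry
l1_lines_total = NUM_L1 * (L1_BYTES // LINE_BYTES)   # lines the L1 caches can hold in total

# Upper bound on the fraction of per-L2-entry directory slots that can be
# non-empty at any one time (reached only if no line is shared between L1s).
max_occupied_fraction = l1_lines_total / l2_lines
```

Under these assumed sizes, at most 4096 of 65536 directory slots (6.25%) could describe a line that is actually held in some L1 cache.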
Furthermore, note that L1 caches 161-164 are typically set-associative. Hence, when an invalidation message is received by L1 cache 161, a lookup and comparison must be performed in L1 cache 161 to determine the way location of the data item. For example, in a four-way set-associative L1 cache, a data item that belongs to a specific set (that is specified by a portion of the address) can be stored in one of four possible “ways”. Consequently, tags from each of the four possible ways must be retrieved and compared to determine the way location of the data item. This lookup is time-consuming and can degrade system performance.
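The way lookup described above can be illustrated with a small sketch. The geometry (64 sets, 64-byte lines) and the helper names `split_address` and `find_way` are assumptions made for the example. Hardware compares the tags in parallel; the loop here only shows that every way of the set has to be examined.

```python
# Assumed geometry for illustration: 4 ways, 64 sets, 64-byte lines.
NUM_WAYS = 4
NUM_SETS = 64
LINE_BYTES = 64

def split_address(addr):
    """Split a byte address into (tag, set_index)."""
    line = addr // LINE_BYTES
    return line // NUM_SETS, line % NUM_SETS

def find_way(tags, addr):
    """Compare the tag of addr against every way of its set.

    tags[set_index][way] holds a tag, or None if that way is empty.
    Returns the matching way, or None if the line is not cached.
    """
    tag, set_index = split_address(addr)
    for way in range(NUM_WAYS):
        if tags[set_index][way] == tag:
            return way
    return None


tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
addr = 0x12340
tag, set_index = split_address(addr)
tags[set_index][2] = tag          # pretend the line was placed in way 2

hit_way = find_way(tags, addr)
# Same set, different tag: all four ways are compared and none matches.
miss_way = find_way(tags, addr + NUM_SETS * LINE_BYTES)
```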
What is needed is a method and an apparatus for maintaining directory information for L1 caches without wasting memory.
Furthermore, what is needed is a method and an apparatus for invalidating an entry in an L1 cache without performing a lookup to determine the way location of the entry.
SUMMARY
One embodiment of the present invention provides a multiprocessor system that includes a number of processors with higher-level caches that perform memory accesses through a lower-level cache. This multiprocessor system also includes a reverse directory coupled to the lower-level cache, which includes entries corresponding to lines in the higher-level caches, wherein each entry identifies an associated entry in the lower-level cache.
In one embodiment of the present invention, the lower-level cache is configured to receive a request from a higher-level cache to retrieve a line from the lower-level cache. If the line is present within the lower-level cache, the system sends the line to the higher-level cache so that the line can be stored in the higher-level cache. The system also stores information in the reverse directory to indicate that the line is stored in the higher-level cache.
In a variation on this embodiment, the higher-level cache is an N-way set-associative cache, and storing the information in the reverse directory involves storing way information identifying a way location in the higher-level cache in which the line is to be stored. The multiprocessor system is additionally configured to use this way information during a subsequent invalidation operation to invalidate the line in the higher-level cache without having to perform a lookup in the higher-level cache to determine the way location of the line in the higher-level cache.
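As a rough sketch of this variation, the model below records the way chosen at fill time and uses it later to invalidate the line by direct indexing. It is a simplification under stated assumptions: there is a single higher-level cache, the set index is taken as the line address modulo the number of sets, and the reverse directory is a dictionary keyed by line address rather than a structure with entries corresponding to higher-level cache lines. The names `fill`, `invalidate`, `l1_valid` and `reverse_dir` are hypothetical.

```python
# Hypothetical single-L1 model: way information saved at fill time
# lets a later invalidation skip the tag comparison entirely.
NUM_SETS, NUM_WAYS = 4, 4

l1_valid = [[False] * NUM_WAYS for _ in range(NUM_SETS)]
reverse_dir = {}  # line address -> (l1_set, l1_way); a simplification

def fill(line_addr, way):
    """Store a line in the given L1 way and record where it went."""
    l1_set = line_addr % NUM_SETS
    l1_valid[l1_set][way] = True
    reverse_dir[line_addr] = (l1_set, way)

def invalidate(line_addr):
    """Invalidate the line in L1 using the recorded set and way."""
    location = reverse_dir.pop(line_addr, None)
    if location is None:
        return None               # line is not held in the L1 cache
    l1_set, way = location
    l1_valid[l1_set][way] = False  # direct index, no tag lookup needed
    return location
```

For example, `fill(9, 3)` followed by `invalidate(9)` clears the valid bit at set 1, way 3 without comparing any tags.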
In one embodiment of the present invention, the lower-level cache is additionally configured to generate a miss to pull the line into the lower-level cache, if the line is not present within the lower-level cache.
In one embodiment of the present invention, upon receiving an update request that causes a target entry in the lower-level cache to be updated, the system performs a lookup in the reverse directory to determine if the target entry is contained in one or more higher-level caches. For each higher-level cache that contains the target entry, the system sends an invalidation request to the higher-level cache to invalidate the target entry, and updates a corresponding entry in the reverse directory to indicate that the target entry has been invalidated in the higher-level cache.
Note that this update request can include a load miss, a store miss, and a store hit on the target entry. If the update request is a store hit, the lookup in the reverse directory involves looking up the target entry in all higher-level caches, except for a higher-level cache that caused the store hit.
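The update handling described above might be modeled as follows. This is a sketch under assumptions, not the patented circuit: the reverse directory is a nested list with one slot per (cache, set, way), the lower-level cache location is an opaque tuple, and the higher-level set index is assumed to be known from the address so that only one set per cache is examined. The names `record_fill` and `handle_update` are hypothetical.

```python
# Assumed small geometry for illustration.
NUM_L1, L1_SETS, L1_WAYS = 4, 2, 2

# One fixed slot per (cache, set, way). None marks an invalid entry;
# a valid entry holds the (hypothetical) L2 location of the cached line.
reverse_dir = [[[None] * L1_WAYS for _ in range(L1_SETS)]
               for _ in range(NUM_L1)]

def record_fill(cache_id, l1_set, l1_way, l2_location):
    """Note that an L1 slot now holds the line stored at l2_location."""
    reverse_dir[cache_id][l1_set][l1_way] = l2_location

def handle_update(l2_location, l1_set, store_hit_from=None):
    """Return the invalidation requests (cache_id, set, way) to send.

    On a store hit, the cache that caused the hit is left out of the lookup.
    """
    requests = []
    for cache_id in range(NUM_L1):
        if cache_id == store_hit_from:
            continue
        for way in range(L1_WAYS):
            if reverse_dir[cache_id][l1_set][way] == l2_location:
                requests.append((cache_id, l1_set, way))
                # Mark the reverse-directory entry as invalidated.
                reverse_dir[cache_id][l1_set][way] = None
    return requests


record_fill(0, 1, 0, (5, 3))   # caches 0 and 2 share the line at L2 (set 5, way 3)
record_fill(2, 1, 1, (5, 3))
record_fill(3, 1, 0, (7, 0))   # an unrelated line

store_hit_requests = handle_update((5, 3), 1, store_hit_from=0)
later_requests = handle_update((5, 3), 1)
```

Each request already names the set and way, so the receiving cache can invalidate the line without its own lookup.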
In one embodiment of the present invention, the reverse directory includes a fixed entry corresponding to each entry in each of the higher-level caches.
In one embodiment of the present invention, each entry in the reverse directory includes information specifying a location of a corresponding entry in the lower-level cache.
In one embodiment of the present invention, the lower-level cache is organized as an M-way set associative cache. In this embodiment, each entry in the reverse directory includes: a way identifier that identifies a way location of a corresponding entry within the lower-level cache; a set identifier that identifies a set location of the corresponding entry within the lower-level cache, wherein the set identifier does not include set information that can be inferred from a location of the entry within the reverse directory; and a valid flag indicating whether the entry in the reverse directory is valid.
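One possible encoding of such an entry is sketched below. The bit widths are assumptions chosen for illustration (1024 lower-level sets, 64 higher-level sets, 4 lower-level ways), as is the assumption that both caches index by the low-order bits of the line address, so that the low bits of the lower-level set index equal the higher-level set index implied by the entry's position. The names `pack_entry` and `unpack_entry` are hypothetical.

```python
# Assumed widths for illustration only.
L2_SET_BITS = 10   # 1024 sets in the lower-level cache
L1_SET_BITS = 6    # 64 sets in each higher-level cache
L2_WAY_BITS = 2    # 4 ways in the lower-level cache
TRUNC_SET_BITS = L2_SET_BITS - L1_SET_BITS  # set bits that must be stored

def pack_entry(l2_set, l2_way, valid=True):
    """Pack [valid | way | truncated set] into a small integer."""
    truncated = l2_set >> L1_SET_BITS   # drop bits implied by the entry's position
    return ((int(valid) << (L2_WAY_BITS + TRUNC_SET_BITS))
            | (l2_way << TRUNC_SET_BITS)
            | truncated)

def unpack_entry(entry, l1_set):
    """Recover (l2_set, l2_way, valid); l1_set comes from the entry's position."""
    truncated = entry & ((1 << TRUNC_SET_BITS) - 1)
    l2_way = (entry >> TRUNC_SET_BITS) & ((1 << L2_WAY_BITS) - 1)
    valid = bool(entry >> (L2_WAY_BITS + TRUNC_SET_BITS))
    l2_set = (truncated << L1_SET_BITS) | l1_set
    return l2_set, l2_way, valid


l2_set = 0b1011010101                     # 725
l1_set = l2_set & ((1 << L1_SET_BITS) - 1)  # 21, implied by the entry's position
entry = pack_entry(l2_set, 3)
decoded = unpack_entry(entry, l1_set)
```

With these assumed widths each entry needs 7 bits rather than 13, because 6 of the 10 set bits are inferred from where the entry sits in the reverse directory.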
In one embodiment of the present invention, the multiprocessor system is located on a single semiconductor chip.
In one embodiment of the present invention, the lower-level cache is an L2 cache, and each of the higher-level caches is an L1 cache.
In one embodiment of the present invention, the higher-level caches are organized as write-through caches, so that updates to the higher-level caches are immediately written through to the lower-level cache.
In one embodiment of the present invention, the lower-level cache includes multiple banks that can be accessed in parallel.
REFERENCES:
patent: 5375220 (1994-12-01), Ishikawa
Chaudhry Shailender
Tremblay Marc
Nguyen Hiep T.
Park Vaughan & Fleming LLP