Using an L2 directory to facilitate speculative loads in a...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

C711S122000, C711S141000, C711S146000, C712S234000, C712S237000, C712S239000

Reexamination Certificate

active

06721855

ABSTRACT:

BACKGROUND
1. Field of the Invention
The present invention relates to the design of multiprocessor systems. More specifically, the present invention relates to a method and an apparatus for facilitating speculative load operations and/or speculative store operations in a multiprocessor system.
2. Related Art
In order to achieve high rates of computational performance, computer system designers are beginning to employ multiple processors that operate in parallel to perform a single computational task. One common multiprocessor design includes a number of processors 151-154 coupled to level one (L1) caches 161-164 that share a single level two (L2) cache 180 and a memory 183 (see FIG. 1). During operation, if a processor 151 accesses a data item that is not present in local L1 cache 161, the system attempts to retrieve the data item from L2 cache 180. If the data item is not present in L2 cache 180, the system first retrieves the data item from memory 183 into L2 cache 180, and then from L2 cache 180 into L1 cache 161.
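As a rough illustration of this lookup path, the following sketch (in C, with toy direct-mapped caches and invented names such as cache_t, lookup, and fill that do not appear in the patent) models a load that misses in the L1 cache, misses in the L2 cache, and is filled from memory into the L2 cache and then into the L1 cache:

/* Minimal sketch (hypothetical names, not from the patent) of the
   lookup path: L1 miss -> L2 lookup -> memory fill -> L2 -> L1. */
#include <stdint.h>
#include <stdio.h>

#define LINES 256

typedef struct {                 /* toy direct-mapped cache */
    uint64_t tag[LINES];
    uint64_t data[LINES];
    int      valid[LINES];
} cache_t;

static uint64_t memory_read(uint64_t addr) { return addr * 2; } /* stand-in for memory 183 */

static int lookup(cache_t *c, uint64_t addr, uint64_t *out)
{
    unsigned idx = addr % LINES;
    if (c->valid[idx] && c->tag[idx] == addr) { *out = c->data[idx]; return 1; }
    return 0;
}

static void fill(cache_t *c, uint64_t addr, uint64_t val)
{
    unsigned idx = addr % LINES;
    c->valid[idx] = 1; c->tag[idx] = addr; c->data[idx] = val;
}

/* Load from a processor: try the local L1 cache, then the shared L2
   cache, then memory, installing the line at each level on the way back. */
static uint64_t load(cache_t *l1, cache_t *l2, uint64_t addr)
{
    uint64_t v;
    if (lookup(l1, addr, &v)) return v;          /* L1 hit */
    if (!lookup(l2, addr, &v)) {                 /* miss in L1 and L2 */
        v = memory_read(addr);
        fill(l2, addr, v);                       /* memory -> L2 */
    }
    fill(l1, addr, v);                           /* L2 -> L1 */
    return v;
}

int main(void)
{
    cache_t l1 = {0}, l2 = {0};
    printf("%llu\n", (unsigned long long)load(&l1, &l2, 42)); /* cold-miss path */
    return 0;
}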
Note that coherence problems can arise if a copy of the same data item exists in more than one L1 cache. In this case, modifications to a first version of a data item in L1 cache 161 may cause the first version to be different than a second version of the data item in L1 cache 162.
In order to prevent such coherency problems, computer systems often provide a coherency protocol that operates across bus 170. A coherency protocol typically ensures that if one copy of a data item is modified in L1 cache 161, other copies of the same data item in L1 caches 162-164, in L2 cache 180 and in memory 183 are updated or invalidated to reflect the modification.
Coherence protocols typically perform invalidations by broadcasting invalidation messages across bus 170. However, as multiprocessor systems increase in performance, such invalidations occur more frequently. Hence, these invalidation messages can potentially tie up bus 170, and can thereby degrade overall system performance.
In order to remedy this problem, some designers have begun to explore the possibility of maintaining directory information within L2 cache 180. This directory information specifies which L1 caches contain copies of specific data items. This allows the system to send invalidation information to only the L1 caches that contain the data item instead of sending a broadcast message to all L1 caches. (This type of system presumes that there exist separate communication pathways for invalidation messages to each of the L1 caches 161-164, unlike the example illustrated in FIG. 1, which uses a single shared bus 170 to communicate with L1 caches 161-164.)
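The directory idea can be pictured as a per-line record of sharers kept alongside each L2 line. The following sketch uses hypothetical names (l2_line_t, sharers, send_invalidate) that are not taken from the patent; it only illustrates sending invalidations to the recorded sharers rather than broadcasting to every L1 cache:

/* Sketch of directory-based invalidation: each L2 line carries a bit
   vector recording which L1 caches hold a copy, so an invalidation is
   delivered only to those caches. All names are assumptions. */
#include <stdint.h>
#include <stdio.h>

#define NUM_L1_CACHES 4

typedef struct {
    uint64_t tag;
    uint8_t  sharers;        /* bit i set => L1 cache i has a copy */
} l2_line_t;

/* Assumed point-to-point hook: deliver an invalidation to one L1 cache. */
static void send_invalidate(int l1_id, uint64_t tag)
{
    printf("invalidate tag %llu in L1 cache %d\n", (unsigned long long)tag, l1_id);
}

/* Record that L1 cache 'l1_id' fetched this line through the L2 cache. */
static void directory_add_sharer(l2_line_t *line, int l1_id)
{
    line->sharers |= (uint8_t)(1u << l1_id);
}

/* On a modification, invalidate only the L1 caches the directory lists,
   instead of broadcasting to all of them. */
static void directory_invalidate(l2_line_t *line)
{
    for (int i = 0; i < NUM_L1_CACHES; i++)
        if (line->sharers & (1u << i))
            send_invalidate(i, line->tag);
    line->sharers = 0;
}

int main(void)
{
    l2_line_t line = { .tag = 7, .sharers = 0 };
    directory_add_sharer(&line, 0);
    directory_add_sharer(&line, 2);
    directory_invalidate(&line);          /* messages go to caches 0 and 2 only */
    return 0;
}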
As multiprocessor systems continue to increase in performance, it is becoming increasingly difficult to support memory models that significantly restrict the ordering of load and store operations. One commonly used memory model is the “Total Store Order” (TSO) memory model. Under the TSO memory model, loads and stores from a given processor typically execute in program order, except that loads can overtake previous stores. More specifically, under the TSO memory model: loads cannot overtake previous loads; stores cannot overtake previous stores; and stores cannot overtake previous loads; but loads can overtake previous stores. This allows previous stores to take place in a lazy fashion while the system performs subsequent loads.
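These four ordering rules can be captured in a small helper; the code below is only an illustration of the rules stated above, and its names are invented for this sketch:

/* TSO ordering sketch: of the four older/younger combinations, only a
   younger load overtaking an older store is permitted. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { OP_LOAD, OP_STORE } op_t;

/* May 'younger' be performed before 'older' under TSO? */
static bool tso_may_overtake(op_t younger, op_t older)
{
    return younger == OP_LOAD && older == OP_STORE;
}

int main(void)
{
    printf("load  past store: %d\n", tso_may_overtake(OP_LOAD,  OP_STORE)); /* 1 */
    printf("load  past load : %d\n", tso_may_overtake(OP_LOAD,  OP_LOAD));  /* 0 */
    printf("store past load : %d\n", tso_may_overtake(OP_STORE, OP_LOAD));  /* 0 */
    printf("store past store: %d\n", tso_may_overtake(OP_STORE, OP_STORE)); /* 0 */
    return 0;
}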
Unfortunately, placing these restrictions on the ordering of load and store operations can seriously degrade multiprocessor performance, because the multiprocessor system often has to wait for previous memory operations to complete before executing subsequent memory operations.
A less restrictive memory model is “release consistency,” in which the only restriction is that processors see a consistent view of shared data whenever a critical region is exited. This memory model is less restrictive than TSO and can lead to better multiprocessor performance. Unfortunately, many existing legacy applications make use of restrictive memory models, such as TSO.
Hence, in order to run these legacy applications, what is needed is a method and an apparatus for facilitating efficient parallel execution of programs under a restrictive memory model, such as the TSO memory model.
SUMMARY
One embodiment of the present invention provides a system that facilitates speculative load operations in a multiprocessor system. The system operates by maintaining a record of speculative load operations that have completed at a processor in the multiprocessor system, wherein a speculative load operation is a load operation that is speculatively initiated before a preceding load operation has returned. Next, the system receives an invalidation signal at an L1 cache that is coupled to the processor, wherein the invalidation signal indicates that a specific line in the L1 cache is to be invalidated. In response to this invalidation signal, the system examines the record of speculative load operations to determine if there exists a matching speculative load operation that is completed and is directed to the same location in the L1 cache that the invalidation signal is directed to. If there exists a matching speculative load operation, the system replays the matching speculative load operation so that the matching speculative load operation takes place after an event that caused the invalidation signal completes.
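A minimal sketch of this mechanism, under assumed names and structure layouts that are not taken from the patent, might keep a small table of completed speculative loads indexed by the L1 set and way that hold their data (anticipating the set/way and completion-indicator embodiments described below), and replay any matching entry when an invalidation arrives:

/* Sketch of the invalidation-triggered replay described above. All
   names, field layouts, and sizes here are assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_SPEC_LOADS 32

typedef struct {
    bool     valid;       /* entry in use */
    bool     completed;   /* indicator: has the speculative load returned? */
    uint16_t set;         /* L1 set holding the loaded line */
    uint8_t  way;         /* L1 way holding the loaded line */
    uint64_t pc;          /* which load to replay */
} spec_load_t;

static spec_load_t record[MAX_SPEC_LOADS];

/* Assumed hook into the pipeline: re-execute the load at 'pc'. */
static void replay_load(uint64_t pc)
{
    printf("replaying load at pc %llx\n", (unsigned long long)pc);
}

/* Called when a speculative load completes: remember where its data landed. */
static void record_spec_load(uint64_t pc, uint16_t set, uint8_t way)
{
    for (int i = 0; i < MAX_SPEC_LOADS; i++) {
        if (!record[i].valid) {
            record[i] = (spec_load_t){ true, true, set, way, pc };
            return;
        }
    }
}

/* Called when the L1 cache receives an invalidation for (set, way): any
   completed speculative load to that location is replayed so that it
   logically takes place after the invalidating event. */
static void on_invalidation(uint16_t set, uint8_t way)
{
    for (int i = 0; i < MAX_SPEC_LOADS; i++) {
        if (record[i].valid && record[i].completed &&
            record[i].set == set && record[i].way == way) {
            replay_load(record[i].pc);
            record[i].valid = false;
        }
    }
}

int main(void)
{
    record_spec_load(0x400123, 5, 1);   /* a speculative load has completed */
    on_invalidation(5, 1);              /* matching invalidation -> replay */
    return 0;
}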
In one embodiment of the present invention, the record of speculative load operations includes a plurality of banks, wherein each bank contains speculative load operations directed to a specific bank of the L2 cache.
In one embodiment of the present invention, the record of speculative load operations maintains set and way information for entries in the L1 cache that contain results of speculative load operations.
In one embodiment of the present invention, the invalidation signal is received as a result of a cache coherency protocol operation.
In one embodiment of the present invention, the invalidation signal is received as a result of a store operation associated with the specific line in the L1 cache.
In one embodiment of the present invention, the invalidation signal is received as a result of an invalidation of a corresponding line in the L2 cache.
In one embodiment of the present invention, the record of speculative load operations includes an indicator for each speculative load operation. This indicator specifies whether the speculative load operation has completed.
In one embodiment of the present invention, maintaining the record of speculative load operations involves updating the record whenever a new speculative load operation completes.
In one embodiment of the present invention, the system receives a replay signal at the processor from the L2 cache, wherein the replay signal identifies a specific set and way location. In response to this replay signal, the system replays any speculative load operation that has completed and is directed to the specific set and way location. Note that the system performs this replay without performing a corresponding invalidation.
In one embodiment of the present invention, the multiprocessor system implements a total store ordering (TSO) memory model in which loads can overtake previous stores, loads cannot overtake previous loads, stores cannot overtake previous loads, and stores cannot overtake previous stores.
Another embodiment of the present invention provides a system that facilitates speculative load operations in a multiprocessor system. This system operates by maintaining a record at an L2 cache of speculative load operations that have returned data values through the L2 cache to associated L1 caches, wherein a speculative load operation is a load operation that is speculatively initiated before a preceding load operation has returned. In response to receiving an invalidation event, the system invalidates a target line in the L2 cache. The system also performs a lookup in the record to identify affected L1 caches that are associated with speculat
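Since the paragraph above is cut off, the following sketch only illustrates the stated idea of an L2-side record: a per-line bit vector notes which L1 caches have received speculative load data through the L2 cache, and an invalidation of the line looks up that record to identify the affected L1 caches. The notification sent to each affected cache (here a replay request) and all names are assumptions for this sketch:

/* Sketch of an L2-side record of speculative loads; hypothetical names. */
#include <stdint.h>
#include <stdio.h>

#define NUM_L1_CACHES 4

typedef struct {
    uint64_t tag;
    uint8_t  spec_loaders;   /* bit i set => L1 cache i returned a speculative load through this line */
} l2_spec_record_t;

/* Assumed hook: ask L1 cache 'l1_id' to replay speculative loads for 'tag'. */
static void send_replay(int l1_id, uint64_t tag)
{
    printf("replay request for tag %llu to L1 cache %d\n", (unsigned long long)tag, l1_id);
}

/* A speculative load from L1 cache 'l1_id' returned data through this L2 line. */
static void note_spec_load(l2_spec_record_t *r, int l1_id)
{
    r->spec_loaders |= (uint8_t)(1u << l1_id);
}

/* Invalidation event for the line: look up the record to identify and
   notify only the affected L1 caches. */
static void invalidate_line(l2_spec_record_t *r)
{
    for (int i = 0; i < NUM_L1_CACHES; i++)
        if (r->spec_loaders & (1u << i))
            send_replay(i, r->tag);
    r->spec_loaders = 0;
}

int main(void)
{
    l2_spec_record_t r = { .tag = 3, .spec_loaders = 0 };
    note_spec_load(&r, 1);
    invalidate_line(&r);     /* only L1 cache 1 is notified */
    return 0;
}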
