Method and apparatus for facilitating speculative loads in a...

Reexamination Certificate

Details

C711S133000, C711S146000, C712S237000, C712S239000

active

06718839

ABSTRACT:

BACKGROUND
1. Field of the Invention
The present invention relates to the design of multiprocessor systems. More specifically, the present invention relates to a method and an apparatus for facilitating speculative load operations and/or speculative store operations in a multiprocessor system.
2. Related Art
In order to achieve high rates of computational performance, computer system designers are beginning to employ multiple processors that operate in parallel to perform a single computational task. One common multiprocessor design includes a number of processors 151-154 coupled to level one (L1) caches 161-164 that share a single level two (L2) cache 180 and a memory 183 (see FIG. 1). During operation, if a processor 151 accesses a data item that is not present in local L1 cache 161, the system attempts to retrieve the data item from L2 cache 180. If the data item is not present in L2 cache 180, the system first retrieves the data item from memory 183 into L2 cache 180, and then from L2 cache 180 into L1 cache 161.
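The lookup path described above can be sketched roughly as follows. This is a minimal illustrative model, not the patent's hardware: the class name Hierarchy, its dictionary-based caches, and the load method are all assumptions made for the example, and capacity, eviction, and line granularity are ignored.

```python
# Hypothetical model of the lookup path: L1 -> L2 -> memory.
# All names are illustrative and do not come from the patent.

class Hierarchy:
    def __init__(self, memory):
        self.memory = dict(memory)  # backing store (memory 183)
        self.l2 = {}                # shared L2 cache (180)
        self.l1 = {}                # one processor's local L1 cache (161)

    def load(self, addr):
        """Return (value, source), where source names the level that hit."""
        if addr in self.l1:
            return self.l1[addr], "L1"
        if addr in self.l2:
            source = "L2"
        else:
            # L2 miss: fill L2 from memory first.
            self.l2[addr] = self.memory[addr]
            source = "memory"
        # Then fill L1 from L2.
        self.l1[addr] = self.l2[addr]
        return self.l1[addr], source


h = Hierarchy({0x40: 7})
first = h.load(0x40)   # misses in L1 and L2, filled from memory
second = h.load(0x40)  # now hits in L1
```

In this sketch the first access fills both cache levels, so a repeated access to the same address is served from the L1 cache.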
Note that coherence problems can arise if a copy of the same data item exists in more than one L1 cache. In this case, modifications to a first version of a data item in L1 cache 161 may cause the first version to be different than a second version of the data item in L1 cache 162.
In order to prevent such coherency problems, computer systems often provide a coherency protocol that operates across bus 170. A coherency protocol typically ensures that if one copy of a data item is modified in L1 cache 161, other copies of the same data item in L1 caches 162-164, in L2 cache 180 and in memory 183 are updated or invalidated to reflect the modification.
Coherence protocols typically perform invalidations by broadcasting invalidation messages across bus 170. However, as multiprocessor systems increase in performance, such invalidations occur more frequently. Hence, these invalidation messages can potentially tie up bus 170, and can thereby degrade overall system performance.
In order to remedy this problem, some designers have begun to explore the possibility of maintaining directory information within L2 cache 180. This directory information specifies which L1 caches contain copies of specific data items. This allows the system to send invalidation information to only the L1 caches that contain the data item instead of sending a broadcast message to all L1 caches. (This type of system presumes that there exist separate communication pathways for invalidation messages to each of the L1 caches 161-164, unlike the example illustrated in FIG. 1, which uses a single shared bus 170 to communicate with L1 caches 161-164.)
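The difference between broadcast and directory-based invalidation can be illustrated with a small sketch. The Directory class, its sharers map, and the method names below are hypothetical and chosen only for this example; the text above does not specify how the directory information is organized.

```python
# Hypothetical directory kept alongside the L2 cache: for each line address,
# the set of L1 cache IDs that hold a copy. Names are illustrative only.

class Directory:
    def __init__(self, l1_ids):
        self.l1_ids = set(l1_ids)
        self.sharers = {}  # line address -> set of L1 IDs holding the line

    def record_fill(self, addr, l1_id):
        self.sharers.setdefault(addr, set()).add(l1_id)

    def broadcast_targets(self, writer):
        # Bus-style broadcast: every other L1 cache sees the invalidation.
        return self.l1_ids - {writer}

    def directed_targets(self, addr, writer):
        # Directory-style: only L1 caches that actually hold the line.
        return self.sharers.get(addr, set()) - {writer}

    def invalidate(self, addr, writer):
        targets = self.directed_targets(addr, writer)
        # After the write, only the writer holds a valid copy.
        self.sharers[addr] = {writer}
        return targets


d = Directory([161, 162, 163, 164])
d.record_fill(0x80, 161)
d.record_fill(0x80, 163)
broadcast = d.broadcast_targets(writer=161)  # three caches are notified
directed = d.invalidate(0x80, writer=161)    # only cache 163 is notified
```

With two sharers, a write from cache 161 sends one directed invalidation instead of three broadcast messages.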
As multiprocessor systems continue to increase in performance, it is becoming increasingly hard to support memory models that significantly restrict the ordering of load and store operations. One commonly used memory model is the “Total Store Order” (TSO) memory model. Under the TSO memory model, loads and stores from a given processor typically execute in program order, except that loads can overtake previous stores. More specifically, under the TSO memory model: loads cannot overtake previous loads; stores cannot overtake previous stores; and stores cannot overtake previous loads. However, loads can overtake previous stores. This allows previous stores to take place in a lazy fashion while the system performs subsequent loads.
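The four TSO ordering rules listed above reduce to a single predicate. The function name may_overtake and the string encoding of operations are assumptions for this sketch; it only restates the rules and does not model a real memory system.

```python
def may_overtake(later, earlier):
    """Return True if `later` may complete before the program-order-earlier
    operation `earlier` under the TSO rules stated above.

    Operations are encoded as the strings "load" and "store" (an assumption
    made for this sketch).
    """
    # The only permitted reordering: a load overtaking a previous store.
    return later == "load" and earlier == "store"


# Enumerate all four (later, earlier) combinations.
table = {
    (later, earlier): may_overtake(later, earlier)
    for later in ("load", "store")
    for earlier in ("load", "store")
}
```

Only the ("load", "store") entry of the table is True, which is the case that lets previous stores drain lazily while later loads proceed.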
Unfortunately, placing these restrictions on the ordering of load and store operations can seriously degrade multiprocessor performance, because the multiprocessor system often has to wait for previous memory operations to complete before executing subsequent memory operations.
A less restrictive memory model is “release consistency,” in which the only restriction is that processors see a consistent view of shared data whenever a critical region is exited. This memory model is less restrictive than TSO and can lead to better multiprocessor performance. Unfortunately, many existing legacy applications make use of restrictive memory models, such as TSO.
Hence, in order to run these legacy applications, what is needed is a method and an apparatus for facilitating efficient parallel execution of programs under a restrictive memory model, such as the TSO memory model.
SUMMARY
One embodiment of the present invention provides a system that facilitates speculative load operations in a multiprocessor system. The system operates by maintaining a record of speculative load operations that have completed at a processor in the multiprocessor system, wherein a speculative load operation is a load operation that is speculatively initiated before a preceding load operation has returned. Next, the system receives an invalidation signal at an L1 cache that is coupled to the processor, wherein the invalidation signal indicates that a specific line in the L1 cache is to be invalidated. In response to this invalidation signal, the system examines the record of speculative load operations to determine if there exists a matching speculative load operation that is completed and is directed to the same location in the L1 cache that the invalidation signal is directed to. If there exists a matching speculative load operation, the system replays the matching speculative load operation so that the matching speculative load operation takes place after an event that caused the invalidation signal completes.
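The record-and-replay behavior described above can be sketched in software as follows. This is a hedged illustration rather than the claimed apparatus: the class SpeculativeLoadRecord, its entry layout, and the use of a (set, way) tuple to identify an L1 location are assumptions, and retirement of loads that are no longer speculative is left out.

```python
# Hypothetical software model of a per-processor speculative load record.
# Each entry tracks a load ID, the L1 location it targets, and whether the
# load has completed. All names are illustrative, not from the patent.

class SpeculativeLoadRecord:
    def __init__(self):
        self.entries = []  # kept in issue order

    def issue(self, load_id, location):
        self.entries.append(
            {"id": load_id, "location": location, "completed": False}
        )

    def complete(self, load_id):
        for entry in self.entries:
            if entry["id"] == load_id:
                entry["completed"] = True

    def on_invalidation(self, location):
        """Return IDs of completed speculative loads that must be replayed."""
        replay = [
            e["id"] for e in self.entries
            if e["completed"] and e["location"] == location
        ]
        for entry in self.entries:
            if entry["id"] in replay:
                # The load must execute again, so it is no longer complete.
                entry["completed"] = False
        return replay


rec = SpeculativeLoadRecord()
rec.issue("ld1", (3, 0))
rec.issue("ld2", (5, 1))
rec.issue("ld3", (3, 0))
rec.complete("ld1")
rec.complete("ld2")
replayed = rec.on_invalidation((3, 0))  # ld1 matches; ld3 has not completed
```

Only loads that have already completed and that target the invalidated location are selected for replay; a load that is still outstanding is left alone in this sketch.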
In one embodiment of the present invention, the record of speculative load operations includes a plurality of banks, wherein each bank contains speculative load operations directed to a specific bank of the L2 cache.
In one embodiment of the present invention, the record of speculative load operations maintains set and way information for entries in the L1 cache that contain results of speculative load operations.
In one embodiment of the present invention, the invalidation signal is received as a result of a cache coherency protocol operation.
In one embodiment of the present invention, the invalidation signal is received as a result of a store operation associated with the specific line in the L1 cache.
In one embodiment of the present invention, the invalidation signal is received as a result of an invalidation of a corresponding line in the L2 cache.
In one embodiment of the present invention, the record of speculative load operations includes an indicator for each speculative load operation. This indicator specifies whether the speculative load operation has completed.
In one embodiment of the present invention, maintaining the record of speculative load operations involves updating the record whenever a new speculative load operation completes.
In one embodiment of the present invention, the system receives a replay signal at the processor from the L2 cache, wherein the replay signal identifies a specific set and way location. In response to this replay signal, the system replays any speculative load operation that has completed and is directed to the specific set and way location. Note that the system performs this replay without performing a corresponding invalidation.
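The distinction between a replay signal and an invalidation signal can be illustrated with a small sketch. The class L1Front, its valid set, and its method names are hypothetical; the only behavior taken from the text is that both signals trigger replay of completed speculative loads at the given set and way, while only the invalidation removes the line.

```python
# Hypothetical model contrasting a replay signal with an invalidation signal.
# Names and data structures are illustrative assumptions.

class L1Front:
    def __init__(self):
        self.valid = set()        # (set, way) locations currently valid
        self.completed_spec = {}  # load ID -> (set, way) of a completed load

    def complete_speculative_load(self, load_id, loc):
        self.valid.add(loc)
        self.completed_spec[load_id] = loc

    def _take(self, loc):
        # Collect and remove completed speculative loads at this location.
        ids = sorted(i for i, l in self.completed_spec.items() if l == loc)
        for i in ids:
            del self.completed_spec[i]
        return ids

    def on_invalidation(self, loc):
        # Invalidation: the line is dropped and matching loads are replayed.
        self.valid.discard(loc)
        return self._take(loc)

    def on_replay(self, loc):
        # Replay signal: matching loads are replayed, the line stays valid.
        return self._take(loc)


l1 = L1Front()
l1.complete_speculative_load("ld1", (2, 1))
l1.complete_speculative_load("ld2", (2, 1))
l1.complete_speculative_load("ld3", (4, 0))
replayed = l1.on_replay((2, 1))
still_valid = (2, 1) in l1.valid
invalidated = l1.on_invalidation((4, 0))
```

After the replay signal, location (2, 1) remains valid in this model, whereas the invalidation at (4, 0) both replays the matching load and removes the line.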
In one embodiment of the present invention, the multiprocessor system implements a total store ordering (TSO) memory model in which loads can overtake previous stores, loads cannot overtake previous loads, stores cannot overtake previous loads, and stores cannot overtake previous stores.
Another embodiment of the present invention provides a system that facilitates speculative load operations in a multiprocessor system. This system operates by maintaining a record at an L2 cache of speculative load operations that have returned data values through the L2 cache to associated L1 caches, wherein a speculative load operation is a load operation that is speculatively initiated before a preceding load operation has returned. In response to receiving an invalidation event, the system invalidates a target line in the L2 cache. The system also performs a lookup in the record
