Method and apparatus for relaxing the FIFO ordering...

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Details

U.S. Classification: C711S147000
Type: Reexamination Certificate
Status: active
Application number: 06253291

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to memory models for microprocessor control. More particularly, the present invention relates to relaxing the first-in-first-out (FIFO) constraint on the processing of all snoop requests received from the system bus. Some concurrent processing of snoop requests is possible through the separation of locally generated requests from remotely generated requests, allowing selected requests to proceed without strict order constraints.
2. The Prior Art
In computer systems, memory access times often limit the throughput of the system. Relative to current processor speeds, data access times for the main memory can be quite long. One scheme to minimize this data access time limitation is to store some of the more frequently-used data in a location that is more quickly accessible to the processor than is the main memory. For example, in systems with multiple processors, a cache memory associated with each of the processors is used to store copies of certain data so that the data can be accessed more quickly than from the main memory. Cache memory generally has faster access times than does the main memory.
Unfortunately, cache memory systems are not without their own problems. Because the use of cache memory involves creating a copy of some of the main memory data, multiple copies of the same data may exist in different locations within the multi-processor computer system. When one copy of the data is changed, multiple copies of the same data held in other locations must also be updated. Data errors will occur within the multi-processor system if different processors within the system are operating (performing reads and writes) on such inconsistent copies of data. This problem is known as a cache consistency or cache coherence problem. To avoid this problem, a common solution is to maintain a total order for all memory accesses with the help of cache consistency protocols and hardware.
Hardware-based solutions to the cache coherence problem generally follow either a centralized or distributed approach. In a centralized approach, directory protocols maintain information about where copies of information reside in a centralized directory. The directory contains information about the contents of local caches for the entire multi-processor system. A centralized controller keeps this information up to date and interacts with all of the local caches to ensure that data consistency is maintained.
In a distributed approach, “snoopy” protocols distribute the responsibility for maintaining cache coherence among all of the processors. The updates each processor makes to a shared memory block must be broadcast to all other processors. Each cache controller “snoops”, or reads, these broadcast messages and updates its own cache accordingly.
In the “snoopy” system, each individual processor and its cache is connected to a shared system bus that is connected to the shared main memory. As data operations are performed in each processor, the processor will broadcast these operations onto the shared system bus. For example, as a first processor performs read and write operations on shared data copies located in its cache, it broadcasts this information to the system bus to alert other processors to update the status of their data copies. By “snooping” the system bus, a second processor knows that it must invalidate its copy of a piece of data after it receives the broadcast that the first processor has operated on that same piece of data. Other examples of the messages broadcast by processors onto the shared system bus are well known to those of ordinary skill in the art.
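The broadcast-and-snoop interaction described above can be sketched as a small simulation. This is illustrative only; names such as `Bus` and `SnoopyCache` are assumptions, not terminology from the patent:

```python
# Minimal sketch of a "snoopy" invalidation protocol: every write is
# broadcast on a shared bus, and every other cache controller snoops the
# broadcast and invalidates its own stale copy. (Illustrative only; the
# class and method names are assumptions, not from the patent.)

class Bus:
    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, sender, address):
        # Every cache except the sender snoops the broadcast message.
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(address)

class SnoopyCache:
    def __init__(self, bus):
        self.lines = {}          # address -> cached value
        self.bus = bus
        bus.attach(self)

    def write(self, address, value):
        self.lines[address] = value
        # Alert the other processors that their copies are now stale.
        self.bus.broadcast_invalidate(self, address)

    def snoop_invalidate(self, address):
        self.lines.pop(address, None)   # drop the stale copy, if present

    def read(self, address):
        return self.lines.get(address)  # None models a cache miss

bus = Bus()
c1, c2 = SnoopyCache(bus), SnoopyCache(bus)
c1.write(0x10, "A")
c2.write(0x10, "B")           # invalidates c1's now-stale copy of 0x10
assert c1.read(0x10) is None  # c1 must re-fetch the up-to-date value
assert c2.read(0x10) == "B"
```

In this sketch the invalidation is applied immediately on receipt; the buffered, asynchronous variant the patent describes defers that step through a FIFO snoop buffer.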
In the asynchronous cache system 10 shown in FIG. 1, each processor 12-1 through 12-n has an associated cache 14-1 through 14-n and an associated first-in-first-out (FIFO) snoop buffer 16-1 through 16-n. Each snoop buffer 16-1 through 16-n is responsible for storing the snoop broadcasts received from the system bus 18 until they can be processed by each individual processor 12-1 through 12-n.
When a cache coherence transaction such as an invalidation is broadcast on the system bus 18, the invalidation request is buffered in each individual snoop buffer 16-1 through 16-n. The data selected for invalidation located in a cache 14 is not invalidated immediately. Instead, the broadcast messages are propagated separately following a FIFO order through each snoop buffer 16 to its associated cache 14. Accordingly, a copy of the data found in a cache 14 is invalidated independently or “asynchronously” of the invalidations occurring in other processor caches.
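The buffered, asynchronous behavior of the snoop buffers 16-1 through 16-n can be sketched as follows (an illustrative simulation; the class and method names are assumptions):

```python
from collections import deque

# Sketch of the FIFO snoop buffers of FIG. 1: a broadcast invalidation is
# queued in every processor's buffer, and each cache applies it only when
# its own processor drains the buffer, so the caches invalidate
# "asynchronously" of one another. (Names here are illustrative
# assumptions, not from the patent.)

class BufferedCache:
    def __init__(self):
        self.lines = {0x10: "old"}     # a pre-loaded shared copy
        self.snoop_buffer = deque()    # FIFO of pending snoop requests

    def enqueue_invalidate(self, address):
        self.snoop_buffer.append(address)

    def drain_one(self):
        # Process the oldest pending snoop request, in strict FIFO order.
        if self.snoop_buffer:
            self.lines.pop(self.snoop_buffer.popleft(), None)

caches = [BufferedCache(), BufferedCache()]
for c in caches:                 # the bus broadcast reaches every buffer
    c.enqueue_invalidate(0x10)

caches[0].drain_one()            # processor 1 services its buffer now...
assert 0x10 not in caches[0].lines
assert 0x10 in caches[1].lines   # ...while processor 2 has not yet
caches[1].drain_one()
assert 0x10 not in caches[1].lines
```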
Total Store Ordering (TSO) and Sequential Consistency (SC) are two well-known techniques for specifying the order in which memory accesses are performed. Neither SC nor TSO allows two processors in the system to observe two different write updates as having occurred in different orders. SC is a stronger model than TSO because SC additionally forbids a processor from reading and returning the value of a write while that write is still pending in the store buffer.
As an example of the type of errors TSO and SC prevent, suppose that the initial value of location X in the cache of a first processor is 0 and the initial value of location Y in the cache of a second processor is also 0. The first processor attempts to update location X to a value of 1 and the second processor attempts to update location Y to a value of 1. An illegal result occurs if a third processor reads X as 1 and Y as 0, while a fourth processor reads X as 0 and Y as 1. The third and fourth processors must read these two operations as having occurred in the same order.
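The impossibility of that illegal result can be checked mechanically. Under SC or TSO, all processors agree on a single total order of the two writes, and any reader's view corresponds to a prefix of that order. The short enumeration below (an illustrative sketch, not from the patent) shows that no single total order yields both readers' views:

```python
from itertools import permutations

# Under SC/TSO all processors agree on one total order of the two writes
# W_X (X := 1) and W_Y (Y := 1). A reader's snapshot corresponds to some
# prefix of that order. The "illegal" result from the text -- one reader
# sees X=1, Y=0 while another sees X=0, Y=1 -- would require both
# snapshots to come from prefixes of the SAME order.

def snapshots(order):
    """All (X, Y) pairs observable at some prefix of a total write order."""
    x = y = 0
    seen = {(x, y)}              # the state before either write
    for write in order:
        if write == "W_X":
            x = 1
        else:
            y = 1
        seen.add((x, y))
    return seen

illegal = False
for order in permutations(["W_X", "W_Y"]):
    views = snapshots(order)
    # Both readers must draw their snapshots from the same total order.
    if (1, 0) in views and (0, 1) in views:
        illegal = True

assert not illegal  # no single total order permits both observations
```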
Under the prior art constraints, it was generally believed that requests received from the system bus needed to be processed by the system processors in the order in which they were received. If a processor has a hierarchical cache system with multiple levels of cache, a FIFO path from the system bus through each level of cache must be maintained. The system bus thus behaves as a reference point at which memory accesses are ordered: it provides the information necessary to guarantee that all processors observe the same sequence of events, making it impossible, for example, to observe two writes in different orders.
Maintaining this strict FIFO order does, however, create a memory access time penalty. If a strict FIFO order is maintained on all coherence transactions, a pending request with a long memory access latency (for example, a request to access the shared main memory, which takes longer than a cache access) will penalize the other transactions behind it in the FIFO store path. The processor pauses and waits for the pending request to complete, wasting processor time that could be used to execute other requests if the FIFO constraint were relaxed. Accordingly, a method for relaxing this FIFO constraint would be desirable. Such a method could reduce the amount of time an individual processor is idle while waiting for a request to complete, and thus reduce the overall execution time of the multi-processor computer system.
BRIEF DESCRIPTION OF THE INVENTION
The present invention is a method for relaxing the first-in-first-out (FIFO) constraint on the processing of snoop requests received by local processors from the system bus, as well as of requests generated locally within the processor itself. These snoop requests carry coherence transaction information designed to maintain consistency among the multiple copies of memory data that may exist in the various processors' local caches. Relaxing this FIFO constraint allows certain snoop requests to be processed out of order without compromising the data coherence of the multi-processor system, enabling a more efficient utilization of processor resources and decreasing the overall multi-processor system execution time.
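The separation of locally and remotely generated requests mentioned in the Field of the Invention can be sketched as two queues whose relative order is relaxed. This is a simplified illustration under stated assumptions (the queue names and the "stalled" flag are inventions of this sketch, not the patent's mechanism):

```python
from collections import deque

# Sketch of the relaxation described above: locally generated requests
# and remotely generated (system bus) snoop requests are kept in
# separate queues, so a long-latency request at the head of one queue
# does not stall independent requests waiting in the other. FIFO order
# is still preserved *within* each queue. (The queue names and the
# "stalled" flag are illustrative assumptions.)

class RelaxedSnoopUnit:
    def __init__(self):
        self.remote_queue = deque()   # snoop requests from the system bus
        self.local_queue = deque()    # requests generated by this processor

    def submit(self, request, remote):
        (self.remote_queue if remote else self.local_queue).append(request)

    def next_request(self):
        # Prefer the remote queue, but let an independent local request
        # bypass a remote request that is stalled on a long memory access.
        if self.remote_queue and not self.remote_queue[0].get("stalled"):
            return self.remote_queue.popleft()
        if self.local_queue:
            return self.local_queue.popleft()
        if self.remote_queue:
            return self.remote_queue.popleft()   # the stall is unavoidable
        return None

unit = RelaxedSnoopUnit()
unit.submit({"op": "invalidate", "addr": 0x10, "stalled": True}, remote=True)
unit.submit({"op": "read", "addr": 0x20}, remote=False)
# The stalled remote request is bypassed by the independent local request.
assert unit.next_request()["addr"] == 0x20
```

A real implementation would also have to detect when the two queues touch the same address, in which case the original ordering must still be honored to preserve coherence.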
In a preferred embodiment of the present invention, each processor in a multi-processor system
