Split sparse directory for a distributed shared memory...
Patent number: 6,560,681
Type: Reexamination Certificate (active)
Filed: 1999-03-30
Issued: 2003-05-06
Examiner: Kim, Matthew (Department: 2186)
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
U.S. Class: C711S146000 (711/146)
BACKGROUND OF THE INVENTION
1. Technical Field
This invention relates generally to multiprocessor data processing systems, and more particularly to cache coherence for a plurality of multiprocessors with a multi-bus shared memory system.
2. Discussion of the Background Art
Some computer systems are built with multiple processors that can work in parallel on a program to speed up its execution. The architectures of such multiprocessor systems are classified according to the mechanisms used to communicate between the multiple processors. In shared-memory architectures, all processors access a single large memory and communicate with one another by writing to and reading from this shared memory.
A computer system node may be divided into a processor subsystem and a memory subsystem. The memory subsystem includes the main Dynamic Random Access Memory (DRAM) and provides data from memory in response to requests from any number of connected processors. Normally, the amount of time spent to access data in the memory subsystem is quite long relative to a processor's speed and therefore processors are often built with caches to improve their performance. A cache is a small memory connected between a processor and main memory that stores recently-used data from locations of main memory. The smallest unit of data that can be transferred into and out of a cache is called a “line.”
The processor subsystem includes the processors and one or more caches. A cache has a much faster access time than the main memory subsystem, but is usually much smaller. Since a smaller cache cannot hold all of the data that is in the main memory, it must store both the address (called the tag) and the value for any data it currently holds. When a processor requests a particular line, it first attempts to match the address against each of the tags currently in the cache to see whether the line is in the cache. If there is no match, the memory subsystem passes the request to the main memory.
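The tag-and-data organization described above can be pictured as a small record per cache line. The sketch below is purely illustrative and is not taken from the patent; the 64-byte line size is an assumption.

#include <stdint.h>
#include <stdbool.h>

#define LINE_SIZE 64          /* assumed line size in bytes */

/* One cache line: the tag records which main-memory segment the data
 * came from; the valid bit says whether the line holds anything at all. */
struct cache_line {
    bool     valid;
    uint64_t tag;
    uint8_t  data[LINE_SIZE];
};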
All caching schemes divide main memory into physically consecutive segments. A cache is usually organized as a series of lines totaling the same size as a segment. The tag is used to identify which segment is currently occupying the cache. If the line with a requested address is contained in the cache then the data in that line is delivered to the processor. If the line is not in the cache, then the segment of the main memory containing the line is fetched into the cache and the line is delivered to the processor.
In a direct-mapped cache memory, segments are put into cache lines whose line numbers can easily be calculated from the main memory address.
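Because the line number is derived directly from the address, a direct-mapped lookup needs only a little arithmetic. A minimal sketch in C, assuming a 64-byte line and 1024 cache lines (neither figure appears in the text):

#include <stdint.h>

#define LINE_SIZE 64            /* assumed */
#define NUM_LINES 1024          /* assumed */

/* In a direct-mapped cache the target line is computed from the address:
 * the low bits select the byte within the line, the next bits select the
 * line, and the remaining high bits form the tag. */
static inline uint64_t line_index(uint64_t addr) {
    return (addr / LINE_SIZE) % NUM_LINES;
}

static inline uint64_t line_tag(uint64_t addr) {
    return addr / (LINE_SIZE * (uint64_t)NUM_LINES);
}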
An associative cache differs from a direct-mapped cache in that a line from any segment may be loaded into any cache line. The cache line needs to store the segment number in addition to the data itself. In order to allow searching the cache, an associative cache includes circuitry to simultaneously check the segment number address against the segment number in every line of the cache. This additional circuitry makes associative caches more expensive.
Set-associative mapping combines the architectures of direct-mapped and associative caches. A set-associative cache is organized as a number of sets, each containing a number of lines. With set-associative mapping, the cache set is determined by the address segment number, but the line within the set is not. Typically, set-associative caches are two-way, meaning that each set contains two lines; this significantly improves the cache hit rate over a direct-mapped cache in providing the requested memory lines.
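A set-associative lookup combines the two ideas: the address selects the set, and the tags within that set are searched associatively. The sketch below continues the hypothetical cache_line structure and LINE_SIZE above, and assumes a two-way cache with 512 sets; a fully associative cache is the limiting case of a single set spanning every line.

#include <stddef.h>

#define NUM_SETS 512            /* assumed */
#define WAYS     2              /* two-way set-associative */

struct cache_set {
    struct cache_line way[WAYS];
};

static struct cache_set cache[NUM_SETS];

/* Returns a pointer to the matching line, or NULL on a cache miss. */
static struct cache_line *lookup(uint64_t addr) {
    uint64_t set = (addr / LINE_SIZE) % NUM_SETS;       /* set chosen by the address  */
    uint64_t tag = addr / (LINE_SIZE * (uint64_t)NUM_SETS);
    for (int w = 0; w < WAYS; w++) {                    /* associative search in set  */
        struct cache_line *ln = &cache[set].way[w];
        if (ln->valid && ln->tag == tag)
            return ln;
    }
    return NULL;                                        /* miss: fetch from main memory */
}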
FIG. 1 shows a flowchart of a typical prior art cache memory access algorithm 100, starting in step 102 with a new request for data in memory. In step 104 the requested data address is compared to the cache tags. Step 106 determines whether the requested data is in a cache. In step 106, if no valid copy of the data is in a cache, then in step 108 the memory subsystem fetches the requested data from main memory into cache memory. Once the requested data is in a cache, in step 110 the memory subsystem delivers the data to the processor that requested it. After the data is delivered to the processor, the memory access algorithm returns to step 102 to await a next memory request.
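In software terms, the flow of FIG. 1 is a probe followed by a fill on a miss. The following hedged paraphrase in C uses hypothetical helpers (probe_cache, fetch_into_cache, deliver) as stand-ins for the hardware actions of steps 104 through 110; none of these interfaces comes from the patent.

#include <stdint.h>
#include <stddef.h>

struct cache_line;                                           /* as sketched earlier    */
struct processor;                                            /* hypothetical requester */

/* Placeholders for the hardware steps of FIG. 1. */
struct cache_line *probe_cache(uint64_t addr);               /* steps 104/106 */
struct cache_line *fetch_into_cache(uint64_t addr);          /* step 108      */
void deliver(struct cache_line *ln, struct processor *p);    /* step 110      */

void memory_access(uint64_t addr, struct processor *requester) {
    struct cache_line *ln = probe_cache(addr);   /* compare address against cache tags */
    if (ln == NULL)                              /* no valid copy in the cache          */
        ln = fetch_into_cache(addr);             /* fill the line from main memory      */
    deliver(ln, requester);                      /* hand the data to the processor      */
    /* control then returns to await the next memory request (step 102) */
}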
In a shared-memory multiprocessor system, each processor typically has its own cache, so the system has multiple caches. Since each cache can hold a copy of a given data item, it is important to keep the states of all the caches consistent and up-to-date with the latest copy written by one of the processors. It is the responsibility of the memory subsystem to return from the caches or main memory the correct value as prescribed by the processor's memory model, which is a set of rules governing the operation of the memory subsystem. This is achieved through the use of a cache coherence protocol.
Therefore, in addition to the main memory, the memory subsystem includes a cache coherence directory which contains control information used by the cache coherence protocol to maintain cache coherence in the system. A conventional directory has an entry for each main memory location with state information indicating whether the memory location data may also exist in a cache somewhere in the system. The node where the main memory line resides is called the home node of that line. The node where the directory resides is called the local node and other nodes are called remote nodes. The coherence protocol keeps track of the state of each cache line and guarantees that the most recent copy of data is given to the processor. The protocol also specifies all transitions and transactions to be taken in response to a request. Any action taken on a cache line is reflected in the state stored in the directory. A common scheme uses three permanent states to accomplish this. The “Invalid” state exists when a line is not cached anywhere and main memory has the only copy. The “Shared” state exists when a remote node (group of processors) has a valid copy of the line. This is not to be confused with a “global shared state,” i.e., a global cache coherence state which exists when any cache in the local node or at least one remote cache has a valid copy of the line. These valid lines are read-only and are identical to the copy in main memory. The “Dirty” state exists when a line is valid in one cache only at a remote node. The copy may be modified by that processor and the main memory may contain old data.
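The three permanent states can be pictured as a small per-line directory entry. A minimal sketch, in which the bit-vector of sharers is an assumption; the text does not commit to any particular sharer representation.

#include <stdint.h>

/* Permanent directory states described above. */
enum dir_state {
    DIR_INVALID,   /* only main memory holds the line                 */
    DIR_SHARED,    /* one or more remote nodes hold read-only copies  */
    DIR_DIRTY      /* exactly one remote cache holds a modified copy  */
};

/* One directory entry per main-memory line (the sharer bit-vector is an
 * assumed format, not one specified in the text). */
struct dir_entry {
    enum dir_state state;
    uint64_t       sharers;   /* bit i set => node i may hold a copy */
};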
FIG. 2 shows a flowchart of a memory access algorithm 200 using a cache coherence directory. A new request for data in memory starts the algorithm in step 202. In step 204 the algorithm compares the requested data address to the directory tags in the cache coherence directory.
In step 206 the algorithm determines the state of the requested data from the cache coherence directory. If in step 206 the state is “Invalid” (i.e., there is no valid copy of the data in a cache), then in step 208 the algorithm fetches the requested data from main memory or local cache into cache memory. If in step 206 the state is “Shared” (i.e., the requested data is in a cache in a remote node), then in step 212 the algorithm fetches the data from memory and, if the request is a “store,” invalidates the cached copies. If in step 206 the state is “Dirty” (i.e., the most recent version of the data is valid in only one cache), then in step 210 the algorithm fetches the requested data from that cache.
Once a valid copy of the data has been fetched, in step 214 the algorithm delivers the data to the processor that requested it. After the data is delivered to the processor, the memory access algorithm returns to step 202 to await a next memory request.
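The three-way branch of FIG. 2 maps naturally onto a switch over the directory state. The sketch below continues the hypothetical dir_entry above; directory_lookup, fetch_from_memory, fetch_from_owner, invalidate_sharers, is_store, and deliver_reply are illustrative placeholders, not interfaces defined by the patent.

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

struct request;                        /* hypothetical request descriptor */
struct cache_line;

/* Placeholders for the directory controller's actions. */
struct dir_entry  *directory_lookup(uint64_t addr);                        /* step 204      */
struct cache_line *fetch_from_memory(uint64_t addr);                       /* steps 208/212 */
struct cache_line *fetch_from_owner(struct dir_entry *e, uint64_t addr);   /* step 210      */
void invalidate_sharers(struct dir_entry *e);
bool is_store(const struct request *req);
void deliver_reply(struct cache_line *ln, struct request *req);            /* step 214      */

void coherent_access(uint64_t addr, struct request *req) {
    struct dir_entry  *e  = directory_lookup(addr);    /* compare against directory tags */
    struct cache_line *ln = NULL;

    switch (e->state) {                                /* state recorded in the directory */
    case DIR_INVALID:                                  /* main memory has the only copy   */
        ln = fetch_from_memory(addr);
        break;
    case DIR_SHARED:                                   /* remote read-only copies exist   */
        ln = fetch_from_memory(addr);
        if (is_store(req))
            invalidate_sharers(e);                     /* a store invalidates the copies  */
        break;
    case DIR_DIRTY:                                    /* one remote cache owns the data  */
        ln = fetch_from_owner(e, addr);
        break;
    }
    deliver_reply(ln, req);                            /* hand the data to the requester  */
}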
The coherence protocol may use other transient states to indicate that a line is in transition. Given enough time, these transient states revert to one of the three permanent states.
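One common way to represent such transients is to extend the permanent-state enumeration with a "busy" value that is cleared when the outstanding reply arrives. This is a generic illustration, not the protocol the patent defines.

/* Permanent states plus an assumed transient state used while a remote
 * intervention is in flight; when the reply arrives, the entry reverts
 * to one of the three permanent states. */
enum dir_state_ext {
    EXT_INVALID,
    EXT_SHARED,
    EXT_DIRTY,
    EXT_BUSY       /* transient: request outstanding, queue or retry new requests */
};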
A cache coherence protocol is typically implemented by a finite state machine in wh
Inventors: Wolf-Dietrich Weber; James Christopher Wilson
Attorney/Agent: Matthew D. Anderson; Carr & Ferrell LLP
Assignee: Fujitsu Limited
Examiner: Kim, Matthew