Electrical computers and digital processing systems: memory – Storage accessing and control – Access timing
Reexamination Certificate
2000-01-28
2002-12-10
Robertson, David L. (Department: 2187)
Electrical computers and digital processing systems: memory
Storage accessing and control
Access timing
C711S148000
Reexamination Certificate
active
06493809
ABSTRACT:
TECHNICAL FIELD
This invention relates generally to multiprocessor systems that are comprised of a number of separate but interconnected processor nodes. More particularly, this invention relates to a method for reducing the time required to invalidate shared cache lines that reside on the separate nodes when a copy of the data in one of the lines is changed.
BACKGROUND
Multiprocessor computer systems by definition contain multiple processors that can execute multiple parts of a computer program or multiple distinct programs simultaneously, in a manner known as parallel computing. In general, multiprocessor systems execute multithreaded-programs or single-threaded programs faster than conventional single processor computers, such as personal computers (PCs), that must execute programs sequentially. The actual performance advantage is a function of a number of factors, including the degree to which parts of a multithreaded-program and/or multiple distinct programs can be executed in parallel and the architecture of the particular multiprocessor computer at hand.
Multiprocessor systems may be classified by how they share information among the processors. Shared memory multiprocessors offer a common physical memory address space that all processors can access. Multiple processes or multiple threads within the same process can communicate through shared variables in memory that allow them to read or write to the same memory location in the computer. Message passing multiprocessors, in contrast, have a separate memory space for each processor, requiring processes to communicate through explicit messages to each other.
Shared memory multiprocessors can further be classified by how their memory is physically organized. In distributed shared memory (DSM) machines, the memory is divided into modules physically placed near groups of one or more processors. Although all of the memory modules are globally accessible, a processor can access memory placed nearby faster than memory placed remotely. Because the memory access time differs based on memory location, distributed shared memory machines are also called non-uniform memory access (NUMA) machines. In centralized shared memory machines, on the other hand, all of the memory is physically in one location. Centralized shared memory machines are also called uniform memory access (UMA) machines because the memory is equidistant in time from each of the processors. Both forms of memory organization typically use high-speed cache memory in conjunction with main memory to reduce execution time.
Multiprocessor systems with distributed shared memory are often organized into nodes with one or more processors per node. Included in the node may be a node bus, local memory for the processors, a remote cache for caching data obtained from memory in other nodes, and logic for linking the node with other nodes in the computer. A processor in a node communicates directly with the local memory and communicates indirectly with memory on other nodes through the node's remote cache. For example, if the desired data is in local memory, a processor obtains the data directly from a block (or line) of local memory. However, if the desired data is stored in memory in another node, the processor must access its remote cache to obtain the data. A remote cache hit occurs if the data has been obtained recently and is presently stored in a line of the remote cache. Otherwise a cache miss occurs, and the processor must obtain the desired data from the local memory of another node through the linking logic and place the obtained data in its node's remote cache. Further information on multiprocessor computer systems in general and NUMA machines in particular can be found in a number of works including
Computer Architecture: A Quantitative Approach
(2
nd
Ed. 1996), by D. Patterson and J. Hennessy.
Data coherency is maintained among the multiple caches and memories of a distributed shared memory machine through a cache coherency protocol such as the protocol described in the Scalable Coherent Interface (SCI)(IEEE 1596). Central to this coherency protocol is the use of doubly linked sharing list structures to keep track of the cache lines of separate remote caches that share the same data. Before the data in one of the linked cache lines can change, such as by a processor writing to a cache line, the other cache lines on the list must be invalidated and the list purged (i.e., dissolved).
An SCI sharing list is constructed using tags that are associated with each line of memory and each line of a remote cache. A memory tag includes a state field and a head pointer that, when a sharing list exists for the memory line, points to the node that is the head of the list. A remote cache tag includes a state field, a backward pointer, and a forward pointer. The backward pointer points to the node that is the next list element toward the home node. The forward pointer points to the node that is the next list element toward the list tail. In the remote cache tag at the list head, the backward pointer points to the home memory whose data is cached. In the remote cache tag at the list tail, the forward pointer is null.
A sharing list is formed, increased, or dissolved whenever a processor tries to read from or write to a line of data that is not present in its remote cache or local memory. In these cases a processor then requests the data from the remote memory storing the data. If no cached copies of the line exist in the computer system, the remote memory responds with the data. A sharing list is then formed with a cache line on the requesting processor's node now storing the data. The pointers in the memory and remote cache tags are changed as described above to designate the node containing the cache line as the list head, with the remote cache tag's forward pointer set to null since there are no other list elements. If a cached copy of the data already exists in the computer system, the memory still responds with the data if it is valid; otherwise, the data is obtained from the present head of the list. Again, the pointers in the memory and remote cache tags are changed to designate the node of the requesting processor as the list head and the old head is moved down the list.
An old sharing list is dissolved and its cache lines invalidated when a processor writes to a memory line that corresponds to the sharing list. The SCI scheme for invalidating a sharing list is shown in
FIGS. 1A-C
, where a processor writes to a cache line that is shared by other nodes. In
FIG. 1A
, the state of a sharing list is shown before the scheme is initiated. Node N is the head of the list. As indicated by the bidirectional arrows, the remote cache tag for the shared line points forward to node Y, whose remote cache tag for the line points backward to node N. Similarly, the remote cache tag for the cache line on node Y points forward to node Z, whose remote cache tag for the share line points backward to node Y. Since the remote cache tag for the line on node Z does not point forward to another cache line in this example, node Z is the tail of the list. In
FIG. 1B
, node N issues an SCI invalidate request (
1
) to node Y to remove its cache line from the list. Node Y responds through its cache controller by issuing a bus invalidate request (
2
) on its node bus to its bus controller. The bus controller invalidates the copies of the cache line that exist in the processors' local caches and acknowledges to the cache controller when the invalidate is complete (
3
). The cache controller then updates the state of its cache line in the remote cache to invalid and issues an invalidate response to node N (
4
). This response confirms that the cache line has been invalidated and that node Y has been removed from the list. The response also includes the forward pointer to node Z. Using this forward pointer, node N then issues an SCI invalidate request to node Z to remove its cache line from the list (
5
). Node Z responds in the same manner as node Y. by issuing a bus invalidate
Lovett Thomas D.
Safranek Robert J.
Garnett Pryor A.
Klarquist & Sparkman, LLP
Robertson David L.
LandOfFree
Maintaining order of write operations in a multiprocessor... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Maintaining order of write operations in a multiprocessor..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Maintaining order of write operations in a multiprocessor... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2997166