Reexamination Certificate
2000-12-21
2003-10-21
Ray, Gopal C. (Department: 2181)
Electrical computers and digital data processing systems: input/
Intrasystem connection
Bus interface architecture
C710S317000, C711S141000, C709S213000, C700S005000
active
06636926
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a multiprocessor system configured with a plurality of processors to achieve high performance, and in particular to a shared memory multiprocessor that performs cache coherence control for access requests and to a node controller used with such a multiprocessor.
In a well-known method for implementing a shared memory multiprocessor, a plurality of nodes each configured with only processing units having cache memories are connected to each other by a single bus, and further a memory device and an I/O device are connected to the bus. The memory device and the I/O device are shared by the nodes both physically and logically, thereby making up what is called a shared memory multiprocessor. This system comprising a plurality of nodes connected by a single bus is inexpensive and simple to configure. Because there is only one path for transferring data between the nodes, however, the data bus becomes a bottleneck when an attempt is made to improve the performance of the system as a whole by increasing the number of nodes.
As a solution to this problem, there has been proposed a method in which a bus is used to transfer an access request (address) for the memory device or the I/O device, while a crossbar switch is used for data transfer.
The 1995 COMPCON95 Proceedings, pp. 102-109, entitled “RISC System/6000SMP System” (first reference) proposes a system having a physically-shared and logically-shared memory in which a bus is used for address transfer while a crossbar switch is used for data transfer requiring a high throughput.
Generally, a shared memory multiprocessor employing a bus for address transfer uses an address snoop system as a method of maintaining the data coherence between a memory device and the cache memories included in the nodes. In the address snoop system, an address is broadcast in order to maintain the data coherence between all the nodes connected to the bus.
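The broadcast behavior of an address snoop system can be illustrated with a minimal sketch. The names Node, snoop and broadcast_snoop are hypothetical and do not come from the references; the sketch assumes that each node simply reports whether its cache holds the requested address.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical node holding a set of cached line addresses."""
    name: str
    cached_lines: Set[int] = field(default_factory=set)

    def snoop(self, address: int) -> bool:
        # A snoop hit means this node's cache holds the requested line.
        return address in self.cached_lines


def broadcast_snoop(nodes: List[Node], requester: Node, address: int) -> List[str]:
    """Present the address to every node except the requester and
    return the names of the nodes that report a snoop hit."""
    return [n.name for n in nodes if n is not requester and n.snoop(address)]


nodes = [Node("n0", {0x100}), Node("n1", {0x100, 0x200}), Node("n2")]
hits = broadcast_snoop(nodes, nodes[0], 0x100)
```

Every node other than the requester is consulted on every request, which is why the address path becomes the limiting resource as nodes are added.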
In the system disclosed in the first reference described above, the data throughput can be improved by employing a crossbar switch in place of a bus for data transfer. The use of a single bus for address transfer as in the prior art, however, makes it impossible to realize an efficient address snoop system in keeping with the improved throughput.
In order to obviate the bottleneck posed when using a single bus for address transfer, on the other hand, “STARFIRE: extending the SMP Envelope”, 1998 MICRO January/February, pp. 39-49 (second reference) introduces a system which uses multiple buses for address transfer.
In the system according to the second reference described above, each node is not configured only with a processor having a cache memory; instead, each node is configured with a processor including a cache memory, a memory and an I/O device. This system is what is called a distributed shared memory multiprocessor (physically-distributed logically-shared memory multiprocessor), in which the memories and the I/O devices are distributed physically among the nodes but shared logically by the nodes. In the system according to the second reference, a plurality of nodes are coupled to each other by buses for addresses and by a crossbar switch for data. By use of four address buses, four address snoop operations can be performed in parallel. The physical address space is divided into four parts so that each address bus can snoop a different part of the address space at the same time.
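The division of the physical address space among four address buses can be sketched as follows. The text above states only that the space is divided into four parts; the 64-byte line size and the interleaving on the low-order bits of the line index are assumptions made for illustration, and select_address_bus is a hypothetical name.

```python
NUM_ADDRESS_BUSES = 4  # stated in the text above
LINE_SIZE = 64         # assumed cache line size in bytes


def select_address_bus(physical_address: int) -> int:
    """Map a physical address to one of the four address buses.

    Assumption: consecutive cache lines are interleaved across the
    buses, so the bus is chosen by the line index modulo the bus count.
    """
    line_index = physical_address // LINE_SIZE
    return line_index % NUM_ADDRESS_BUSES


# Four consecutive lines land on four different buses and can be
# snooped in parallel.
buses = [select_address_bus(a) for a in (0, 64, 128, 192)]
```

Under this assumed mapping, two requests contend for the same bus only when their line indices are congruent modulo four.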
The use of multiple buses for address transfer as in the second reference makes it possible to realize a more efficient address snoop than when using a single bus.
In the first and second references, however, the bus is used for address transfer and therefore the right to use the address bus is required to be secured even in the case where data coherence is not required between a cache memory and a memory device. Thus, the address bus cannot be used efficiently.
In order to obviate this problem, U.S. Pat. No. 6,011,791 (third reference) discloses what is called a physically-shared logically-shared memory multiprocessor in which the address bus is eliminated and the address is transferred through a crossbar switch that is also used for data. In this system, the address can be transferred only to a node intended as a transfer destination in the case where data coherence is not needed between the cache memory and the memory device.
SUMMARY OF THE INVENTION
The use of multiple buses for address transfer as in the second reference can realize the address snoop more efficiently than when a single bus is used. In the case where a multiplicity of nodes are involved, however, even the use of multiple buses cannot secure a throughput of the address snoop commensurate with the improved throughput of the data transfer by the crossbar switch.
According to the third reference, in which the address bus is eliminated and the address and the data are transferred through a single crossbar switch, a sufficient throughput of the address snoop cannot be secured in the case where the nodes are increased in number.
In all the conventional systems described above, an address is transferred to all the nodes in the case where data coherence is required between the cache memory and the memory device. According to the second reference, for example, an address is broadcast to all the nodes in the case where data coherence is required.
In view of this, the present inventors have conducted the following study. Specifically, in the case where data coherence is required, the address is required to be transferred only to the nodes having a cache (i.e. the nodes requiring cache coherence control for an access request), but the address transfer is not required to the nodes having no cache (i.e. the nodes requiring no cache coherence control for an access request). In the prior art, however, the address is transferred also to the nodes having no cache, thereby deteriorating the utilization efficiency of the path (regardless of whether the path is a crossbar switch or a bus). In the case where the nodes are increased in number, therefore, a sufficient throughput of the address snoop cannot be secured.
In the case where no data coherence is required between the cache memory and the memory device, the address is required to be transferred only to the nodes to which data coherence is required.
Specifically, the address is required to be transferred only to the nodes requiring data coherence, and therefore a means is required for one-to-many transfer (multicast) as well as one-to-all transfer (broadcast).
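The selection of address destinations discussed above can be sketched as follows. This is an illustration under stated assumptions, not the claimed mechanism: the names NodeConfig and address_destinations are hypothetical, and the sketch assumes a single home node that owns the addressed memory or I/O device and always receives the request.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical description of what a node contains."""
    node_id: int
    has_cache: bool
    has_memory: bool = False
    has_io: bool = False


def address_destinations(nodes: List[NodeConfig], requester_id: int,
                         home_id: int, coherence_required: bool) -> List[int]:
    """Return the node ids that should receive the address.

    Assumption: home_id is the node owning the addressed memory or I/O
    device; it is always included so the owner sees the request.
    """
    if not coherence_required:
        # One-to-one transfer: only the destination node is addressed.
        return [home_id]
    # Multicast: only the other nodes that have a cache need to snoop.
    targets = {n.node_id for n in nodes
               if n.has_cache and n.node_id != requester_id}
    targets.add(home_id)
    return sorted(targets)


nodes = [
    NodeConfig(0, has_cache=True, has_memory=True),
    NodeConfig(1, has_cache=True),
    NodeConfig(2, has_cache=False, has_memory=True),
    NodeConfig(3, has_cache=False, has_io=True),
]
multicast = address_destinations(nodes, requester_id=0, home_id=2,
                                 coherence_required=True)
unicast = address_destinations(nodes, requester_id=0, home_id=2,
                               coherence_required=False)
```

In this example a broadcast would deliver the address to three other nodes, whereas the multicast reaches two, because node 3 has no cache and is not the home node.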
The present inventors have proposed a shared memory multiprocessor system, in which each node is not configured only with processing units including cache memories but includes at least one processing unit each having a cache memory combined with at least one of a memory device and an I/O device, so that a plurality of the nodes have different configurations. Also in this distributed shared memory multiprocessor, the address is required to be transferred only to the nodes requiring cache coherence control for an access request but no address transfer is required to the nodes not requiring cache coherence control for an access request.
Accordingly, an object of the present invention is to provide a distributed shared memory multiprocessor configured with a plurality of different nodes and capable of efficient address snoop.
Another object of the invention is to provide a distributed shared memory multiprocessor configured with a plurality of nodes and capable of efficient address snoop, wherein the address is not transferred to the nodes not requiring coherence (i.e. the nodes not requiring cache coherence control for an access request) regardless of whether data coherence control is required or not between the cache memory and the memory device.
In order to achieve these objects, according to one aspect of the invention, there is provided a shared memory multiprocessor, wherein each node inc
Akashi Hideya
Hamanaka Naoki
Shonai Toru
Tsushima Yuji
Uehara Keitaro